# RNN class of NN
Andrew Ng's course on **sequence modeling**: [@Coursera](https://www.coursera.org/lecture/nlp-sequence-models/deep-rnns-ehs0S) or [@Youtube](https://youtu.be/DejHQYAGb7Q?list=PLkDaE6sCZn6F6wUI9tvS_Gw1vaFAx6rd6). See especially Ng's lectures on the bidirectional RNN as an **attention model** @Youtube: [C5W3L07](https://youtu.be/SysgYptB198), [C5W3L08](https://youtu.be/quoGRI-1l0A).

LSTM in pictures :) [@Coursera](https://www.coursera.org/lecture/nlp-sequence-models/long-short-term-memory-lstm-KXoay). No flattening or pooling is needed: the LSTM input is a *sequence*, but by default its output is not (only the last step is returned). Set `return_sequences=True` to pass the whole sequence on as output.

```python
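    # Layers of a keras.models.Sequential([...]); `tokenizer` is assumed to be defined elsewhere.
    keras.layers.Embedding(tokenizer.vocab_size, 42),
    # return_sequences=True so the next recurrent layer receives the full sequence
    keras.layers.Bidirectional(keras.layers.LSTM(42, return_sequences=True)),
    keras.layers.Bidirectional(keras.layers.LSTM(12)),
    keras.layers.Dense(8),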
```
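
To make the effect of `return_sequences` concrete, here is a toy sketch in plain Python. It is not Keras's LSTM: `toy_rnn`, `w_in` and `w_rec` are hypothetical names for a one-unit tanh cell, used only to show that the flag switches between returning every time step and returning just the last one.

```python
import math

def toy_rnn(sequence, return_sequences=False, w_in=0.5, w_rec=0.9):
    """Scalar recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    states = []
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    # Either the state at every time step, or only the final state.
    return states if return_sequences else states[-1]

inputs = [1.0, 0.0, -1.0, 2.0]
all_states = toy_rnn(inputs, return_sequences=True)  # one value per time step
last_state = toy_rnn(inputs)                         # a single summary value
```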
Note: in some cases a Conv1D layer can be used instead of an RNN to learn feature-extracting filters over the sequence.
As a rough guide, training an LSTM is about 2x slower than a GRU and about 10x slower than a Dense layer of the same size. CNN training time is close to Dense thanks to hardware acceleration.

In [1]:

```python
import tensorflow.keras as keras

model = keras.models.Sequential([
    keras.layers.Embedding(120, 42, input_shape=(33,)),
    keras.layers.Bidirectional(keras.layers.LSTM(56, return_sequences=True)),
    keras.layers.Bidirectional(keras.layers.LSTM(12)),
    keras.layers.Dense(8),
])
model.summary()
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 33, 42)            5040      
_________________________________________________________________
bidirectional (Bidirectional (None, 33, 112)           44352     
_________________________________________________________________
bidirectional_1 (Bidirection (None, 24)                12000     
_________________________________________________________________
dense (Dense)                (None, 8)                 200       
=================================================================
Total params: 61,592
Trainable params: 61,592
Non-trainable params: 0
_________________________________________________________________
```
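
The parameter counts in the summary above can be checked by hand. The sketch below assumes the standard LSTM layout (4 gates, each with input weights, recurrent weights and one bias vector) and that `Bidirectional` simply holds two such LSTMs and concatenates their outputs. The helper names are made up for this note.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def bidirectional_lstm_params(input_dim, units):
    # forward LSTM + backward LSTM
    return 2 * lstm_params(input_dim, units)

def dense_params(input_dim, units):
    return input_dim * units + units

counts = [
    embedding_params(120, 42),               # 5040
    bidirectional_lstm_params(42, 56),       # 44352
    bidirectional_lstm_params(2 * 56, 12),   # 12000, input is the 112-wide concatenation
    dense_params(2 * 12, 8),                 # 200
]
total = sum(counts)                          # 61592
```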


In [2]:

```python
model = keras.models.Sequential([
    keras.layers.Embedding(120, 42, input_shape=(33,)),
    keras.layers.Conv1D(56, 5, activation='relu'),
    keras.layers.MaxPooling1D(pool_size=4),
    keras.layers.Bidirectional(keras.layers.LSTM(12)),
    keras.layers.Dense(8),
])
model.summary()
```

```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 33, 42)            5040      
_________________________________________________________________
conv1d (Conv1D)              (None, 29, 56)            11816     
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 7, 56)             0         
_________________________________________________________________
bidirectional_2 (Bidirection (None, 24)                6624      
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 200       
=================================================================
Total params: 23,680
Trainable params: 23,680
Non-trainable params: 0
_________________________________________________________________
```
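
The shapes and counts of the Conv1D variant above follow from simple arithmetic. This sketch assumes the Keras defaults used in the cell: `'valid'` padding, convolution stride 1, and a pooling stride equal to `pool_size`. The helper names are made up for this note.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the sequence
    return (length - kernel_size) // stride + 1

def max_pool1d_output_length(length, pool_size):
    # stride equals pool_size; an incomplete last window is dropped
    return (length - pool_size) // pool_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    # one kernel of shape (kernel_size, in_channels) plus one bias per filter
    return kernel_size * in_channels * filters + filters

conv_len = conv1d_output_length(33, 5)            # 29
pool_len = max_pool1d_output_length(conv_len, 4)  # 7
conv_params = conv1d_params(42, 56, 5)            # 11816
bilstm_params = 2 * 4 * (12 * (56 + 12) + 12)     # 6624, two LSTMs with 4 gates each
total = 120 * 42 + conv_params + bilstm_params + (24 * 8 + 8)  # 23680
```

Pooling shortens the sequence from 29 to 7 steps before the LSTM sees it, and the recurrent layer's input is 56 wide instead of 112, which is why this model has far fewer recurrent parameters than the first one.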


Note: the sequence is the unit of input for [RNN learning](https://keras.io/getting-started/sequential-model-guide/#training)!
An **RNN** learns the **probability of the whole sequence**, not of the relations inside it.

Here is an example. A fun application: text prediction with an RNN becomes a new content generator! AI can "write" poems, "news", etc. For the RNN to learn probabilities at the n-gram level, we split each sentence into n-gram sequences that are fed to the RNN word by word. That forces it to learn the probability of each n-gram, not just of the "whole" sentence. Predicting the next word is a multi-class classification problem, since each word is a unique label, not a number in a continuous R^n space.
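```python
import tensorflow.keras as keras

sentence = "How are you today?"
encoded = [7, 3, 4, 9]  # token ids from a fitted tokenizer

# Every prefix of length >= 2, left-padded with zeros to equal length,
# e.g. via keras.preprocessing.sequence.pad_sequences(..., padding='pre')
n_grams_padded = [
    [0, 0, 0, 7, 3],
    [0, 0, 7, 3, 4],
    [0, 7, 3, 4, 9],
]

# inputs: everything except the last token of each n-gram
xs = [
    [0, 0, 0, 7],
    [0, 0, 7, 3],
    [0, 7, 3, 4],
]
# labels: the last token of each n-gram
labels = [
    3,
    4,
    9,
]

total_words = 10  # assumed for this toy example: vocabulary size + 1 (id 0 is padding)

# one-hot encoding into "word" categories!
ys = keras.utils.to_categorical(labels, num_classes=total_words)
```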
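
The n-gram splitting above can also be produced programmatically. Below is a minimal plain-Python sketch, with no Keras dependency. `make_training_examples`, `max_len` and `vocab_size` are hypothetical names, and the hand-rolled one-hot step only stands in for `to_categorical`.

```python
def make_training_examples(encoded, max_len, vocab_size):
    """Turn one encoded sentence into left-padded n-gram inputs and labels."""
    xs, labels = [], []
    for end in range(2, len(encoded) + 1):
        n_gram = encoded[:end]                            # prefix of the sentence
        padded = [0] * (max_len - len(n_gram)) + n_gram   # pad on the left with 0
        xs.append(padded[:-1])                            # all tokens but the last
        labels.append(padded[-1])                         # the token to predict
    # one-hot encode each label over the vocabulary
    ys = [[1 if i == label else 0 for i in range(vocab_size)] for label in labels]
    return xs, labels, ys

xs, labels, ys = make_training_examples([7, 3, 4, 9], max_len=5, vocab_size=10)
```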
Examples
- by Laurence [LSTM Shakespeare](https://github.com/lmoroney/dlaicourse/blob/master/TensorFlow%20In%20Practice/Course%203%20-%20NLP/NLP_Week4_Exercise_Shakespeare_Answer.ipynb)
- TF example [generating text using a character-based RNN](https://www.tensorflow.org/tutorials/text/text_generation).
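
Once a next-word classifier is trained, text generation is a loop: predict, append, repeat. The sketch below is a greedy version in plain Python. `generate` and `toy_predict_next` are hypothetical names, and the toy predictor is only a stand-in for a trained model's prediction call. Sampling from the probabilities instead of taking the most likely token is a common alternative.

```python
def generate(seed_tokens, predict_next, n_words):
    """Greedy generation: repeatedly append the most likely next token."""
    tokens = list(seed_tokens)
    for _ in range(n_words):
        probabilities = predict_next(tokens)  # one probability per vocabulary id
        next_token = max(range(len(probabilities)), key=probabilities.__getitem__)
        tokens.append(next_token)
    return tokens

# Hypothetical stand-in for a trained model: favours (last token + 1) mod vocab_size.
def toy_predict_next(tokens, vocab_size=5):
    target = (tokens[-1] + 1) % vocab_size
    return [0.9 if i == target else 0.1 / (vocab_size - 1) for i in range(vocab_size)]

generated = generate([3], toy_predict_next, n_words=4)
```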
