# RNNs are mainly used for natural language processing
<br>

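## Why not a plain ANN?
- ### Sentences vary in length, so the input layer would need a variable number of neurons
- ### Too much computation
- ### Parameters are not shared across positions, so what is learned about a word at one position does not carry over when the sentence order changes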
<br>
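
A minimal NumPy sketch of how a recurrent layer avoids these problems: the same weight matrices are reused at every time step, so sentences of different lengths run through the same parameters. The sizes, the names `W_xh` and `W_hh`, and the random inputs are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3  # illustrative sizes

# One set of weights, shared by every time step.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_forward(inputs):
    """Run a simple tanh RNN over an array of shape (time_steps, input_size)."""
    h = np.zeros(hidden_size)
    states = []
    for x_t in inputs:
        # The same W_xh and W_hh are applied at each step.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

short_sentence = rng.normal(size=(2, input_size))  # stands in for 2 word vectors
long_sentence = rng.normal(size=(7, input_size))   # stands in for 7 word vectors

short_states = rnn_forward(short_sentence)
long_states = rnn_forward(long_sentence)
print(short_states.shape, long_states.shape)  # (2, 3) (7, 3)
```

The parameter count depends only on `input_size` and `hidden_size`, never on the sentence length.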

## Named Entity Recognition (many inputs to many outputs)
- ### Finds the entities in a sentence and marks each word with a 1 (entity) or 0 (not an entity)
- ### The basic form has a single hidden layer that is applied in a loop, once per word (multiple hidden layers can also be stacked)
<br>

## Sentiment Analysis (many inputs to one output)
- ### Input a sentence, output a review score
- ### By contrast, providing a single word/note and generating a poem/song is one input to many outputs
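
The difference between many-to-many and many-to-one comes down to which hidden states are sent to the output layer. The sketch below assumes a simple tanh RNN with a sigmoid output; the weight names, sizes, and random inputs are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
time_steps, input_size, hidden_size = 5, 4, 3  # illustrative sizes

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(1, hidden_size))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(time_steps, input_size))  # stands in for one 5-word sentence
h = np.zeros(hidden_size)
hidden_states = []
for x_t in x:
    h = np.tanh(W_xh @ x_t + W_hh @ h)
    hidden_states.append(h)
hidden_states = np.stack(hidden_states)  # shape (5, 3)

# Many-to-many (NER-style): one probability per word.
tags = sigmoid(hidden_states @ W_hy.T).ravel()  # shape (5,)

# Many-to-one (sentiment-style): one probability for the whole sentence,
# computed from the last hidden state only.
score = sigmoid(W_hy @ hidden_states[-1]).item()

print(tags.shape, type(score))
```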

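### Vanishing gradients: gradients shrink as they are propagated back through many time steps, so weight updates become tiny and learning is very slow
### Exploding gradients: gradients grow as they are propagated back through many time steps, so weight updates become very large and training becomes unstable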
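
A rough numeric illustration, assuming the gradient is scaled by the same factor at every time step. Real backpropagation through time multiplies Jacobian matrices, so this scalar version is only a simplification, and `backprop_factor` is a hypothetical helper.

```python
def backprop_factor(per_step_factor, time_steps):
    """Multiply `time_steps` identical per-step gradient factors together."""
    grad = 1.0
    for _ in range(time_steps):
        grad *= per_step_factor
    return grad

vanishing = backprop_factor(0.5, 30)  # factor below 1: shrinks toward 0
exploding = backprop_factor(1.5, 30)  # factor above 1: blows up

print(f"{vanishing:.2e}")  # about 9.31e-10
print(f"{exploding:.2e}")  # about 1.92e+05
```

After only 30 steps the two cases differ by about fourteen orders of magnitude.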
<br><br>
### A traditional RNN struggles to capture relationships between words that are far apart from each other (it only has short-term memory)
### GRU and LSTM address this short-term memory problem
<img src="https://media.discordapp.net/attachments/763819251249184789/858042335228854302/image.png" width=700>

## LSTM:
- ### Stores important words in a long-term memory (the cell state) while also keeping a short-term memory (the hidden state). The Forget Gate decides which old keywords to drop and the Input Gate decides which new keywords to record
- ### More gates than a GRU: can be more accurate, but takes longer to train
<img src="https://media.discordapp.net/attachments/763819251249184789/858189868516507668/image.png" width=700>
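
One time step of an LSTM cell can be sketched in NumPy as follows. This uses the common textbook formulation with forget, input, and output gates; stacking the four weight blocks into a single matrix `W` is an assumption made for brevity, and all sizes and values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*hidden, hidden+input), b: (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to drop from long-term memory
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: what new information to record
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: what to expose as short-term memory
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate values to write
    c_t = f * c_prev + i * g               # updated long-term memory (cell state)
    h_t = o * np.tanh(c_t)                 # updated short-term memory (hidden state)
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3  # illustrative sizes
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):  # stands in for 6 word vectors
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)  # (3,) (3,)
```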

<br>

## GRU (Gated Recurrent Units):
- ### Combines long-term and short-term memory into a single hidden state
- ### More efficient than LSTM, with only 2 gates (reset, update)
<img src="https://media.discordapp.net/attachments/763819251249184789/858192984259690506/image.png" width=700>
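
For comparison, a sketch of one GRU time step. It follows one common convention for the update rule (some references swap the roles of `z` and `1 - z`), omits biases for brevity, and uses made-up sizes and weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; each W has shape (hidden, hidden + input). Biases omitted."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)  # update gate: how much of the state to overwrite
    r = sigmoid(W_r @ hx)  # reset gate: how much of the old state feeds the candidate
    h_candidate = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    # Single state vector: blend of the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_candidate

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3  # illustrative sizes
shape = (hidden_size, hidden_size + input_size)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):  # stands in for 6 word vectors
    h = gru_step(x_t, h, W_z, W_r, W_h)

print(h.shape)  # (3,)
```

Unlike the LSTM, there is no separate cell state: the GRU carries only `h` from step to step.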

<br>

## Bidirectional RNN
-  ### Also reads the sentence backward, so the prediction for an earlier word can use the context of the words that come after it
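
A minimal sketch of the idea, assuming two independent simple RNNs (one per direction) whose hidden states are concatenated at each time step. The helper `run_rnn`, the weight names, and the sizes are illustrative.

```python
import numpy as np

def run_rnn(inputs, W_xh, W_hh):
    """Simple tanh RNN; returns the hidden state at every time step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
time_steps, input_size, hidden_size = 5, 4, 3  # illustrative sizes
x = rng.normal(size=(time_steps, input_size))

# Separate weights for each direction.
Wf_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
Wf_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
Wb_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
Wb_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

fwd = run_rnn(x, Wf_xh, Wf_hh)              # state t has seen words 0..t
bwd = run_rnn(x[::-1], Wb_xh, Wb_hh)[::-1]  # state t has seen words t..end

# Each position now carries context from both sides.
bi_states = np.concatenate([fwd, bwd], axis=1)
print(bi_states.shape)  # (5, 6)
```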

<br>

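## Preprocessing (convert words to numbers)
- ### Assign unique numbers to words. Cons: the numbers are arbitrary and don't record relationships between words
- ### One hot encoding. Cons: no relationships between words, computationally inefficient (one dimension per vocabulary word)
- ### Word embedding: convert words to dense vectors whose dimensions act as features; the values are learned automatically (e.g. Word2Vec). TF-IDF also turns text into vectors, but from word counts rather than learned features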
<img src="https://media.discordapp.net/attachments/763819251249184789/858197391496577054/image.png" width=700>
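
The three options can be compared on a toy vocabulary. In this sketch the vocabulary and the dense vectors are hand-picked for illustration only; real embeddings would be learned from data.

```python
import numpy as np

vocab = ["good", "great", "bad"]  # hypothetical toy vocabulary
word_to_id = {word: idx for idx, word in enumerate(vocab)}

# Option 1: unique integers. The ordering is arbitrary and carries no meaning.
ids = [word_to_id[w] for w in ["good", "bad"]]

# Option 2: one-hot vectors. Any two different words have dot product 0,
# so "good" is no closer to "great" than it is to "bad".
one_hot_vectors = np.eye(len(vocab))
dot_good_great = one_hot_vectors[word_to_id["good"]] @ one_hot_vectors[word_to_id["great"]]

# Option 3: dense embeddings (values made up here, not learned).
embeddings = {
    "good": np.array([0.9, 0.1]),
    "great": np.array([0.8, 0.2]),
    "bad": np.array([-0.9, 0.1]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_good_great = cosine(embeddings["good"], embeddings["great"])
sim_good_bad = cosine(embeddings["good"], embeddings["bad"])
print(ids, dot_good_great, sim_good_great > sim_good_bad)
```

With dense vectors, similarity between words becomes measurable, which is what the first two encodings lack.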



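## Learning embeddings with supervised learning (not popular)
- ### Take an NLP problem and try to solve it; the word embeddings come out as a side effect
- ### Choose an embedding size (e.g. 4, 10, 300); the learned weights form the embedding matrix E (vocabulary size × embedding size)
- ### Full process (sentences also need padding to a common length):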
<img src="https://cdn.discordapp.com/attachments/763819251249184789/858199864655609856/image.png" width=700>
- ### Eventually words with similar meanings will have similar feature vectors
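
What the embedding matrix E does can be sketched without any training: looking up row `i` of E gives the same result as multiplying the one-hot vector for word `i` by E. Here E is random, standing in for a learned matrix, and the word ids are hypothetical.

```python
import numpy as np

vocab_size, embedding_size = 50, 4  # illustrative sizes
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embedding_size))  # random stand-in for learned weights

sentence_ids = np.array([7, 21, 0])  # hypothetical word ids; 0 is the padding id

# Row lookup: one embedding vector per word.
looked_up = E[sentence_ids]  # shape (3, 4)

# Same result via one-hot rows times E, just far less efficient.
one_hot_rows = np.eye(vocab_size)[sentence_ids]  # shape (3, 50)
via_matmul = one_hot_rows @ E

# Flattened, the sentence becomes a single feature vector for a dense layer.
features = looked_up.reshape(-1)  # shape (12,)
print(looked_up.shape, np.allclose(looked_up, via_matmul), features.shape)
```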

In [28]:
import numpy as np
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Embedding

reviews = ['nice food',
        'amazing restaurant',
        'too good',
        'just loved it!',
        'will go again',
        'horrible food',
        'never go there',
        'poor service',
        'poor quality',
        'needs improvement']

sentiment = np.array([1,1,1,1,1,0,0,0,0,0])


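# Despite its name, one_hot does not build one-hot vectors: it hashes each word
# to an integer in [1, vocab_size), so different words can collide on the same index.
vocab_size = 50

enc_rev = [one_hot(d, vocab_size) for d in reviews]
max_len = 3  # the longest review has 3 words
# Pad shorter reviews with trailing zeros so every row has length max_len.
padded_rev = pad_sequences(enc_rev, maxlen=max_len, padding="post")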

X = padded_rev
y = sentiment
X

array([[13, 34,  0],
       [42, 28,  0],
       [18,  8,  0],
       [16, 46, 46],
       [46, 19,  8],
       [12, 34,  0],
       [ 2, 19,  3],
       [48, 33,  0],
       [48, 20,  0],
       [38, 17,  0]])

In [29]:
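model = Sequential([
    # Maps each word index to a trainable 4-dimensional vector: (batch, 3) -> (batch, 3, 4)
    Embedding(vocab_size, 4, input_length=max_len, name="embedding"),
    Flatten(),  # (batch, 3, 4) -> (batch, 12)
    Dense(1, activation="sigmoid"),  # probability that the review is positive
])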

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

In [30]:
model.fit(X, y, epochs=50)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x1eb6ac3b9d0>

In [31]:
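# Evaluated on the same 10 reviews used for training, so this is training accuracy,
# not an estimate of how well the model generalizes.
loss, acc = model.evaluate(X, y)
acc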



1.0

In [32]:
# get the learned embedding matrix E: one row per word index, shape (vocab_size, 4)
w = model.get_layer("embedding").get_weights()[0]
len(w)

50