## **Long Short-Term Memory Network (LSTM)**

Designed to retain information across longer sequences than a simple RNN can.

* two main connections: short-term (the hidden state, as in SimpleRNN) and long-term (the cell state, which keeps important information from early in the sequence in memory)


In [8]:
import numpy as np

## sigmoid (outputs from 0 to 1)
def sigmoid(z):
  return 1 / (1 + np.exp(-z))

## tanh (outputs from -1 to 1)
def tanh(z):
  return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

In [11]:
tanh(-10)

np.float64(-0.9999999958776926)
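
The hand-written `tanh` above works for small inputs, but `np.exp` overflows once `|z|` gets large. A minimal sketch comparing it with NumPy's built-in `np.tanh`; the name `tanh_naive` and the test values are chosen here for illustration only.

```python
import numpy as np

def tanh_naive(z):
    # same formula as the hand-written tanh above
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

# On moderate inputs the two versions agree to floating-point precision
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
max_gap = np.max(np.abs(tanh_naive(z) - np.tanh(z)))
print("largest difference:", max_gap)

# On a large input exp() overflows to inf, and inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive_big = tanh_naive(1000.0)
print("naive:", naive_big, " np.tanh:", np.tanh(1000.0))
```

For the small numbers used in this notebook either version is fine; `np.tanh` is the safer choice when inputs can grow.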

## **LSTM Neuron**

Gates working inside the neuron:

* Forget Gate
* Input Gate (Input Node)
* Output Gate

**Forget Gate**

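Controls how much of the previous cell state (long-term memory) to keep. It applies the sigmoid activation to the current input and the previous hidden state (short-term), and the result (between 0 and 1) is used as a weight on how much of the memory to keep.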

Mathematically,

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$

In [21]:
## value of the sequence = 0
x_t = np.array([0])

## short term (hidden state) = 100
h_prev = np.array([100])

## long term (memory) = 0.3
c_prev = np.array([0.3])

In [22]:
## forget gate needs to determine
## Wxf (weight of x)
## Whf (weight of short term)
## bf
Wxf = 0.8
Whf = 0.5
bf = 0.2

## forget gate output (scalar 0 - 1)
ft = sigmoid(Wxf * x_t + Whf * h_prev + bf)

In [24]:
## how does that affect the long term
c_new = c_prev*ft

In [26]:
c_new

array([0.3])
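
With `h_prev = 100` the gate saturates at 1, so the memory passes through unchanged. To see the gate act as a dial, the sketch below reuses the forget-gate weights from above and sweeps a few hypothetical hidden states (the values -100, 0 and 100 and the `kept` dictionary are assumptions made for illustration).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Wxf, Whf, bf = 0.8, 0.5, 0.2   # same forget-gate weights as above
x_t = np.array([0.0])
c_prev = np.array([0.3])

kept = {}
for h in (-100.0, 0.0, 100.0):  # hypothetical hidden states
    ft = sigmoid(Wxf * x_t + Whf * np.array([h]) + bf)
    kept[h] = float((c_prev * ft)[0])  # memory that survives the gate
    print(f"h_prev={h:7.1f}  ft={ft[0]:.4f}  kept memory={kept[h]:.4f}")
```

A strongly negative pre-activation wipes the memory, a value near zero keeps roughly half of it, and a strongly positive one keeps all of it.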

**Input Gate**

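This decides how much new information from the current input and the short-term state to add to the long-term memory.

* tanh() - candidate value: direction and size of the update (-1 to 1)
* sigmoid() - percentage of that candidate to let in (0 to 1)

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$\tilde{C}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$

Updated long-term memory:
$$C_t = f_t C_{t-1} + i_t \tilde{C}_t$$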

In [28]:
## weights and biases (input gate)
Wxi = 0.7
Whi = 0.6
bi = 0.1

## long term weights
Wxc = 0.9
Whc = 0.4
bc = 0.05

## input gate
it = sigmoid(Wxi *x_t + Whi*h_prev + bi)
it

## candidate new memory (long term)
C_tilde = tanh(Wxc*x_t + Whc*h_prev + bc)
C_tilde

array([1.])

In [29]:
## updated cell: forget gate * old memory + input gate * candidate
Ct = ft*c_prev + it*C_tilde

In [30]:
Ct

array([1.3])
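
The output gate from the list above is not worked through in these cells. Below is a minimal sketch of one full LSTM step that includes it, using the standard formulation $C_t = f_t C_{t-1} + i_t \tilde{C}_t$, $o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$ and $h_t = o_t \tanh(C_t)$. The function name `lstm_step` and the output-gate weights `Wxo`, `Who`, `bo` are assumptions made for illustration; they are not values from this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with scalar weights; p is a dict of weights and biases."""
    ft = sigmoid(p["Wxf"] * x_t + p["Whf"] * h_prev + p["bf"])       # forget gate
    it = sigmoid(p["Wxi"] * x_t + p["Whi"] * h_prev + p["bi"])       # input gate
    c_tilde = np.tanh(p["Wxc"] * x_t + p["Whc"] * h_prev + p["bc"])  # candidate memory
    c_t = ft * c_prev + it * c_tilde                                 # new long-term memory
    ot = sigmoid(p["Wxo"] * x_t + p["Who"] * h_prev + p["bo"])       # output gate
    h_t = ot * np.tanh(c_t)                                          # new short-term state
    return h_t, c_t

params = {
    "Wxf": 0.8, "Whf": 0.5, "bf": 0.2,   # forget gate (from the cells above)
    "Wxi": 0.7, "Whi": 0.6, "bi": 0.1,   # input gate (from the cells above)
    "Wxc": 0.9, "Whc": 0.4, "bc": 0.05,  # candidate (from the cells above)
    "Wxo": 0.3, "Who": 0.2, "bo": 0.0,   # output gate: hypothetical values
}

h_t, c_t = lstm_step(np.array([0.0]), np.array([100.0]), np.array([0.3]), params)
print("new cell state:", c_t)
print("new hidden state:", h_t)
```

The returned `h_t` and `c_t` would be fed back in as `h_prev` and `c_prev` for the next element of the sequence.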

## **LSTM Network Implementation**


In [1]:
## imdb reviews
import tensorflow as tf
import tensorflow_datasets as tfds

dataset, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)



Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.


In [2]:
## y = {positive: 1, negative: 0}
train_dataset, test_dataset = dataset['train'], dataset['test']

In [4]:
## parameters
batch_size = 64 ## gradient descent (update weights every 64 obs)
buffer_size = 10000
embedding_dim = 64
max_sequence_length = 200

In [5]:
## prepare data set
train_dataset = train_dataset.shuffle(buffer_size).batch(batch_size).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)
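
A quick arithmetic check on the batching: the `imdb_reviews` train split has 25,000 reviews, so a batch size of 64 fixes the number of weight updates per epoch. The variable names below are illustrative.

```python
import math

n_train = 25_000     # size of the imdb_reviews train split
batch_size = 64

steps_per_epoch = math.ceil(n_train / batch_size)
# the final batch is smaller because 25,000 is not a multiple of 64
last_batch = n_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)
```

The step count is the `391/391` shown in the Keras progress bar during training.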

In [6]:
## vectorization layer
vectorize_layer = tf.keras.layers.TextVectorization(max_tokens=100, output_mode='int', output_sequence_length=max_sequence_length)

In [7]:
## use vectorizer on training
train_text = train_dataset.map(lambda x, y: x)
vectorize_layer.adapt(train_text)
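
To make the vectorization step concrete, here is a rough pure-Python stand-in for what `TextVectorization` does in `int` mode: lowercase, strip punctuation, split on whitespace, map the most frequent words to integer ids, then pad or truncate to a fixed length. It is a simplified approximation, not the Keras implementation; the helper names `tokenize`, `build_vocab` and `vectorize` and the toy sentences are assumptions. Index 0 is reserved for padding and index 1 for out-of-vocabulary words, so `max_tokens=100` leaves room for 98 real words.

```python
import re
from collections import Counter

def tokenize(text):
    # lowercase, drop punctuation, split on whitespace
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def build_vocab(texts, max_tokens):
    # ids 0 (padding) and 1 (out-of-vocabulary) are reserved,
    # so only max_tokens - 2 real words fit
    counts = Counter(tok for t in texts for tok in tokenize(t))
    words = [w for w, _ in counts.most_common(max_tokens - 2)]
    return {w: i + 2 for i, w in enumerate(words)}

def vectorize(text, vocab, seq_len):
    # unknown words map to 1; truncate, then right-pad with 0
    ids = [vocab.get(tok, 1) for tok in tokenize(text)][:seq_len]
    return ids + [0] * (seq_len - len(ids))

texts = ["A great movie!", "A terrible movie.", "Great acting, great plot"]
vocab = build_vocab(texts, max_tokens=5)
encoded = vectorize("A great film", vocab, seq_len=6)
print(vocab)
print(encoded)
```

With such a small vocabulary most words collapse to the out-of-vocabulary id, which limits how much the network can learn from the text.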

In [19]:
## BUILD AN LSTM NETWORK
from tensorflow.keras.layers import Dense, Embedding, Bidirectional, LSTM
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(vectorize_layer)
model.add(Embedding(100, embedding_dim, mask_zero=True))
model.add(Bidirectional(LSTM(64))) ## keep the default tanh activation: relu tends to do worse in LSTMs, and the default is way faster (fused cuDNN kernel on GPU)
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid')) ## binary
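
As a sanity check on the architecture, the trainable parameter count can be worked out by hand. The sketch below assumes Keras defaults (one bias vector per layer) and the sizes used above; the helper `lstm_params` is an illustrative name, and the total is best compared against `model.summary()` once the model has been built.

```python
def lstm_params(input_dim, units):
    # 4 blocks (forget, input, candidate, output), each with
    # input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

vocab_size, embedding_dim, units = 100, 64, 64

embedding = vocab_size * embedding_dim          # one vector per token id
bi_lstm = 2 * lstm_params(embedding_dim, units) # forward + backward LSTM
dense_hidden = (2 * units) * 64 + 64            # input is both directions concatenated
dense_out = 64 * 1 + 1                          # single sigmoid unit

total = embedding + bi_lstm + dense_hidden + dense_out
print(embedding, bi_lstm, dense_hidden, dense_out, total)
```

Most of the parameters sit in the bidirectional LSTM; the `TextVectorization` layer contributes no trainable weights.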

In [20]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [21]:
## fit
model.fit(train_dataset, epochs=5, validation_data=test_dataset)

Epoch 1/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 16s 29ms/step - accuracy: 0.5708 - loss: 0.6709 - val_accuracy: 0.6923 - val_loss: 0.5912
Epoch 2/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 10s 26ms/step - accuracy: 0.6520 - loss: 0.6273 - val_accuracy: 0.6970 - val_loss: 0.5860
Epoch 3/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 10s 26ms/step - accuracy: 0.6762 - loss: 0.6134 - val_accuracy: 0.7088 - val_loss: 0.5710
Epoch 4/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 20s 24ms/step - accuracy: 0.7008 - loss: 0.5722 - val_accuracy: 0.6996 - val_loss: 0.5928
Epoch 5/5
391/391 ━━━━━━━━━━━━━━━━━━━━ 10s 25ms/step - accuracy: 0.7048 - loss: 0.5654 - val_accuracy: 0.7076 - val_loss: 0.5623


<keras.src.callbacks.history.History at 0x7eac53de1590>

In [22]:
## check performance on the test set

loss, accuracy = model.evaluate(test_dataset)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")

391/391 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.7054 - loss: 0.5653
Loss: 0.5623095631599426
Accuracy: 0.7075999975204468
