
# Long Short-Term Memory (LSTM) Networks Overview

This notebook provides an overview of Long Short-Term Memory (LSTM) networks, their architecture, how they work, and a basic implementation using a text dataset (IMDB reviews).



## Background

### Long Short-Term Memory (LSTM) Networks

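LSTM networks are a type of recurrent neural network (RNN) designed to mitigate a key limitation of standard RNNs: the vanishing gradient problem, in which gradients shrink roughly exponentially as they are backpropagated through many time steps, so the network struggles to learn long-range dependencies. LSTMs are well suited to tasks that involve sequential data, such as language modeling, time series prediction, and speech recognition.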
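A small NumPy sketch can make the vanishing gradient problem concrete. It multiplies the per-step Jacobians of a vanilla tanh RNN over 50 steps and compares the result with the direct cell-state path of an LSTM, where \( \partial C_t / \partial C_{t-1} = \mathrm{diag}(f_t) \). This is a simplified illustration under stated assumptions: the recurrent weights are random with spectral radius around 0.5, the forget gates are held at a hypothetical constant 0.98, and the LSTM's indirect gradient paths through the gates are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, steps = 16, 50

# Vanilla RNN: h_t = tanh(W h_{t-1} + x_t), so dh_t/dh_{t-1} = diag(1 - h_t^2) @ W.
# Assumption: random recurrent weights scaled to a spectral radius of roughly 0.5.
W = rng.normal(scale=0.5 / np.sqrt(hidden), size=(hidden, hidden))
h = np.zeros(hidden)
jac_rnn = np.eye(hidden)  # accumulates dh_t/dh_0
for t in range(steps):
    x = rng.normal(size=hidden)
    h = np.tanh(W @ h + x)
    jac_rnn = (np.diag(1 - h**2) @ W) @ jac_rnn

# LSTM cell-state path: C_t = f_t * C_{t-1} + i_t * C~_t, so along this
# additive path dC_t/dC_{t-1} = diag(f_t).
# Assumption: forget gates stay close to 1 (here a constant 0.98).
jac_lstm = np.eye(hidden)  # accumulates dC_t/dC_0 along the direct path
for t in range(steps):
    f = np.full(hidden, 0.98)
    jac_lstm = np.diag(f) @ jac_lstm

rnn_norm = np.linalg.norm(jac_rnn)
lstm_norm = np.linalg.norm(jac_lstm)
print(f"RNN gradient norm after {steps} steps:  {rnn_norm:.3e}")
print(f"LSTM cell-path norm after {steps} steps: {lstm_norm:.3e}")
```

The RNN product collapses toward zero because every step multiplies by a matrix that shrinks vectors, while the cell-state path only shrinks by the forget-gate values, which the network can learn to keep near 1.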

### Key Components of LSTMs

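- **Cell State**: A vector that carries information across time steps and is updated only through element-wise multiplication and addition.
- **Gates**: Sigmoid-activated layers whose outputs, between 0 and 1, scale how much information flows:
  - **Forget Gate**: Decides which parts of the previous cell state to discard.
  - **Input Gate**: Decides which new candidate values to write to the cell state.
  - **Output Gate**: Decides which parts of the cell state to expose as the hidden state.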
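Because every gate acts by element-wise multiplication, a gate vector works like a soft mask over the cell state. The sketch below uses hypothetical, idealized gate values of exactly 0, 0.5, and 1 to show the effect; a real sigmoid gate never reaches exactly 0 or 1.

```python
import numpy as np

# Hypothetical previous cell state and forget-gate activation
c_prev = np.array([2.0, -1.5, 0.7])
f_t = np.array([1.0, 0.0, 0.5])  # idealized; real sigmoid outputs lie strictly in (0, 1)

# Element-wise product: keep component 0, erase component 1, halve component 2
kept = f_t * c_prev
print(kept)
```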

### Applications of LSTMs

LSTMs are widely used in tasks such as text generation, sentiment analysis, machine translation, and speech synthesis.



## Mathematical Foundation

### The LSTM Cell

An LSTM cell updates its hidden state \( h_t \) and cell state \( C_t \) using the following steps:

1. **Forget Gate** \( f_t \):

\[
f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + b_f)
\]

2. **Input Gate** \( i_t \) and **Candidate Values** \( \tilde{C}_t \):

\[
i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + b_i)
\]
\[
\tilde{C}_t = \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)
\]

3. **Update Cell State** \( C_t \):

\[
C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t
\]

4. **Output Gate** \( o_t \) and **Hidden State** \( h_t \):

\[
o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + b_o)
\]
\[
h_t = o_t \ast \tanh(C_t)
\]

Where:
- \( \sigma \) is the sigmoid activation function.
- \( \ast \) denotes element-wise multiplication.
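- \( W_{xf}, W_{xi}, W_{xc}, W_{xo} \) are the weight matrices applied to the input \( x_t \), \( W_{hf}, W_{hi}, W_{hc}, W_{ho} \) are the recurrent weight matrices applied to \( h_{t-1} \), and \( b_f, b_i, b_c, b_o \) are the bias vectors.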
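The four steps translate almost line for line into NumPy. The following is a minimal sketch of a single LSTM time step, assuming the parameters are stored in a dictionary keyed by the names used in the equations (the helper names `lstm_cell_step` and `init_params` and the random initialization are illustrative assumptions). It is not how Keras stores its weights; Keras packs the four gates into combined kernel matrices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the forget/input/candidate/output equations."""
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])      # input gate
    c_tilde = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])  # candidate values
    c_t = f_t * c_prev + i_t * c_tilde                                  # cell-state update
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                            # hidden state
    return h_t, c_t

def init_params(input_dim, hidden_dim, rng):
    """Hypothetical small random init: W_x* is (hidden, input), W_h* is (hidden, hidden)."""
    params = {}
    for g in "fico":  # forget, input, candidate, output
        params[f"W_x{g}"] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        params[f"W_h{g}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        params[f"b_{g}"] = np.zeros(hidden_dim)
    return params

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 8, 4, 5
params = init_params(input_dim, hidden_dim, rng)

# Run the cell over a short random sequence, carrying (h, c) between steps
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for t in range(steps):
    x_t = rng.normal(size=input_dim)
    h, c = lstm_cell_step(x_t, h, c, params)
print(h.shape, c.shape)
```

Since \( o_t \in (0, 1) \) and \( \tanh(C_t) \in (-1, 1) \), every component of \( h_t \) stays strictly between -1 and 1, while \( C_t \) itself is unbounded.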



## Implementation in Python

We'll implement a stacked LSTM for binary sentiment classification using TensorFlow and Keras on the IMDB movie review dataset.
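The preprocessing step pads every integer-encoded review to a fixed length. By default, Keras's `pad_sequences` uses `padding='pre'`, `truncating='pre'`, and `value=0`: short sequences are padded with zeros on the left, and long ones keep only their last `maxlen` tokens. The pure-Python function below is a hypothetical re-creation of those defaults for illustration (the name `pad_pre` and the token IDs are made up), not the Keras implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences' defaults: left-pad short sequences,
    drop tokens from the start of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # 'pre' truncation keeps the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # 'pre' padding fills on the left
    return padded

reviews = [[14, 22, 16], [5, 9], [1, 2, 3, 4, 5, 6]]  # hypothetical token IDs
print(pad_pre(reviews, maxlen=4))
```

With `maxlen = 500`, this means reviews longer than 500 words lose their beginning rather than their ending.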



import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Load and preprocess the IMDB dataset
max_features = 10000  # Vocabulary size: keep only the 10,000 most frequent words
maxlen = 500  # Pad or truncate every review to exactly 500 tokens

(x_train, y_train), (x_test, y_test) = datasets.imdb.load_data(num_words=max_features)
x_train = tf.keras.utils.pad_sequences(x_train, maxlen=maxlen)
x_test = tf.keras.utils.pad_sequences(x_test, maxlen=maxlen)

# Build the LSTM model: embedding -> two stacked LSTMs -> sigmoid classifier
model = models.Sequential([
    layers.Input(shape=(maxlen,)),
    layers.Embedding(max_features, 128),
    layers.LSTM(128, return_sequences=True),  # emit h_t at every step for the next LSTM
    layers.LSTM(64),  # return only the final hidden state
    layers.Dense(1, activation='sigmoid')
])

# Compile and train; hold out 20% of the training data for validation
# so the test set is used only for the final evaluation
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")

# Show predictions for the first 10 test reviews
predictions = model.predict(x_test[:10])

for i in range(10):
    predicted = 'Positive' if predictions[i, 0] > 0.5 else 'Negative'
    actual = 'Positive' if y_test[i] == 1 else 'Negative'
    print(f"Review {i+1}: Prediction: {predicted}, Actual: {actual}")
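
The model's size follows directly from the LSTM equations: each of the four gate blocks (forget, input, candidate, output) has an input matrix, a recurrent matrix, and a bias vector. The sketch below counts parameters for the layers above under that assumption; the totals should match what `model.summary()` reports for Keras's default LSTM layer, which uses a single bias vector per gate.

```python
def lstm_param_count(input_dim, units):
    # Four gate blocks, each with W_x (units x input_dim),
    # W_h (units x units), and a bias b (units)
    return 4 * (units * input_dim + units * units + units)

embedding = 10000 * 128               # max_features x embedding dimension
lstm1 = lstm_param_count(128, 128)    # input is the 128-d embedding sequence
lstm2 = lstm_param_count(128, 64)     # input is the 128-d output sequence of lstm1
dense = 64 * 1 + 1                    # weights + bias of the sigmoid output unit
total = embedding + lstm1 + lstm2 + dense
print(lstm1, lstm2, total)
```

Most of the parameters sit in the embedding table, not in the LSTM layers.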



## Conclusion

This notebook provided an overview of LSTM networks, their architecture, and a basic implementation using the IMDB dataset. LSTMs are a powerful tool for sequential data processing, capable of learning long-term dependencies.
