# LSTM - Long Short-Term Memory and GRU - Gated Recurrent Unit


<img src='l1.png' />


In [None]:
Common applications:
- NLP tasks
- speech recognition
- time-series prediction
- speech synthesis
- text generation
- video captioning

# RNN and Its Drawbacks

In [None]:
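- forward pass -> short-term memory: information from early time steps fades as the sequence gets longer
- backward pass -> vanishing gradient problem: gradients shrink as they are propagated back through many time steps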


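new weight = weight - learning rate * gradient

-> if the gradient is tiny, the weight barely changes, so the earliest time steps contribute almost nothing to learning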
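To see why the update stalls, here is a toy scalar sketch (an illustration, not a real RNN): it assumes every time step multiplies the gradient by the same factor, a fixed recurrent weight times the tanh derivative at a fixed pre-activation. The names `backprop_gradient`, `recurrent_weight` and `pre_activation` are hypothetical.

```python
import math

def backprop_gradient(num_steps, recurrent_weight=0.9, pre_activation=1.0):
    """Gradient magnitude after backpropagating through num_steps tanh steps.

    Toy scalar model: every step multiplies the gradient by
    recurrent_weight * tanh'(pre_activation), with both values held fixed.
    """
    tanh_derivative = 1.0 - math.tanh(pre_activation) ** 2
    factor = recurrent_weight * tanh_derivative  # about 0.38 with the defaults
    return factor ** num_steps

learning_rate = 0.01
weight = 0.5
for steps in (1, 10, 50):
    gradient = backprop_gradient(steps)
    new_weight = weight - learning_rate * gradient  # the update rule from the notes
    print(f"{steps:>2} steps back: gradient = {gradient:.2e}, "
          f"weight change = {weight - new_weight:.2e}")
```

Because the factor is below 1, the gradient shrinks geometrically with the number of steps, and so does the weight change.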

# LSTM - Long Short-Term Memory

<img src='l2.png' />

In [None]:
LSTM -> key mechanism: gates
- it keeps relevant information over many time steps, so predictions can depend on inputs far in the past



RNN
words -> vectors -> the RNN processes them one by one -> tanh squashes values into [-1, 1] -> hidden state

[5, 0.1, -0.5] -> tanh -> [0.9999, 0.0997, -0.4621]
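The "one word at a time, then tanh" flow above can be sketched in NumPy as below. The sizes, the random weights and the name `rnn_step` are assumptions for illustration; the update itself is the standard vanilla RNN step h_t = tanh(W_x x_t + W_h h_(t-1) + b).

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x @ x_t + W_h @ h_prev + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# tanh squashes any value into [-1, 1]
print(np.round(np.tanh(np.array([5.0, 0.1, -0.5])), 4))  # approximately [0.9999, 0.0997, -0.4621]

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights (random, hypothetical)
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights (random, hypothetical)
b = np.zeros(hidden_dim)

# A "sentence" of 5 word vectors, processed one at a time
sequence = rng.normal(size=(5, input_dim))
h = np.zeros(hidden_dim)
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h)  # final hidden state, every entry in [-1, 1]
```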

- works well for short sequences (small sentences)
- less computationally expensive than LSTM and GRU

In [None]:
LSTM -> designed so that information can propagate forward across many time steps

- cell state -> the memory of the network
- gates -> small neural networks that decide which information is allowed into the cell state
        -> during training, the gates learn which information is relevant to keep or forget


Gates contain sigmoid activations
- sigmoid -> [0,1]

- Forget Gate
- Input Gate
- Cell State (not a gate itself: the memory that the gates update)
- Output Gate


# Gates

## Forget Gate


<img src='l3.gif' />

In [None]:
- sigmoid -> [0,1]
forget gate -> decides what information should be thrown away or kept (previous hidden state + current input -> sigmoid)

if the value is close to 0 -> forget
                close to 1 -> keep
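A minimal NumPy sketch of the "close to 0 forgets, close to 1 keeps" rule. The gate pre-activations, the previous cell state and the random weights are made-up values; the formula f_t = sigmoid(W_f [h_(t-1), x_t] + b_f) is the standard forget gate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked gate pre-activations: strongly negative, zero, strongly positive
f_t = sigmoid(np.array([-6.0, 0.0, 6.0]))  # roughly [0.0025, 0.5, 0.9975]
c_prev = np.array([2.0, -1.0, 0.5])        # hypothetical previous cell state

# Element-wise gating: ~0 forgets, ~1 keeps, 0.5 keeps half
c_kept = f_t * c_prev
print(c_kept)

# In an LSTM the pre-activation comes from the previous hidden state and the
# current input: f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f)
rng = np.random.default_rng(1)
hidden_dim, input_dim = 3, 2
W_f = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # random, hypothetical weights
b_f = np.zeros(hidden_dim)
h_prev, x_t = rng.normal(size=hidden_dim), rng.normal(size=input_dim)
f_learned = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_learned)  # one value in (0, 1) per cell-state entry
```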

## Input Gate

<img src='l4.gif' />

In [None]:
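input gate -> decides which new information from the current step is used to update the cell state
- sigmoid -> chooses which values to update
- tanh -> creates the candidate values in [-1, 1]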
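A sketch of the input gate and the candidate values, assuming random weights and small hypothetical sizes. The product i_t * c_tilde is the new information that gets added to the cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
hidden_dim, input_dim = 3, 2
W_i = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # input-gate weights (hypothetical)
W_c = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # candidate weights (hypothetical)
b_i = np.zeros(hidden_dim)
b_c = np.zeros(hidden_dim)

h_prev = rng.normal(size=hidden_dim)
x_t = rng.normal(size=input_dim)
combine = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t]

i_t = sigmoid(W_i @ combine + b_i)      # input gate: how much of each candidate to let in, in (0, 1)
c_tilde = np.tanh(W_c @ combine + b_c)  # candidate values, in (-1, 1)
new_information = i_t * c_tilde         # what gets added to the cell state
print(new_information)
```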


## Cell State

<img src='l5.gif' />

## Output Gate

<img src='l6.gif' />

In [None]:
forget gate -> decides what is relevant to keep from prior steps

input gate -> decides what information is relevant to add from the current step

output gate -> determines what the next hidden state should be
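The output gate step can be sketched the same way: a sigmoid gate filters tanh of the updated cell state to produce the next hidden state. The cell state `c_t` and the weights here are hypothetical values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
hidden_dim, input_dim = 3, 2
W_o = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # output-gate weights (hypothetical)
b_o = np.zeros(hidden_dim)

h_prev = rng.normal(size=hidden_dim)
x_t = rng.normal(size=input_dim)
c_t = np.array([1.5, -0.3, 2.0])  # hypothetical updated cell state

o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)  # output gate, in (0, 1)
h_t = o_t * np.tanh(c_t)  # next hidden state: a filtered view of the cell state
print(h_t)
```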

In [1]:
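# algo for LSTM: a runnable NumPy sketch with hypothetical sizes and random weights
import numpy as np

HIDDEN, INPUT = 3, 2
rng = np.random.default_rng(0)
# one weight matrix and bias per gate ("f", "i", "c", "o"), applied to [h_{t-1}, x_t]
W = {g: rng.normal(size=(HIDDEN, HIDDEN + INPUT)) for g in "fico"}
b = {g: np.zeros(HIDDEN) for g in "fico"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def LSTMCell(prev_ct, prev_ht, input):
    combine = np.concatenate([prev_ht, input])       # concatenate, not add
    ft = sigmoid(W["f"] @ combine + b["f"])          # forget gate
    candidate = np.tanh(W["c"] @ combine + b["c"])   # candidate values
    it = sigmoid(W["i"] @ combine + b["i"])          # input gate
    ct = prev_ct * ft + candidate * it               # new cell state
    ot = sigmoid(W["o"] @ combine + b["o"])          # output gate
    ht = ot * np.tanh(ct)                            # new hidden state
    return ht, ct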

In [None]:
# ct = [0,0,0]
# ht = [0,0,0]

# for input in inputs:
#     ht, ct = LSTMCell(ct, ht, input)   # LSTMCell returns (ht, ct), in that order
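The commented loop above can be run with PyTorch's built-in torch.nn.LSTMCell, which takes the input and an (h, c) tuple and returns the new (h, c). The sizes and the random input sequence are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, input_size, hidden_size, seq_len = 1, 2, 3, 4

cell = nn.LSTMCell(input_size, hidden_size)
inputs = torch.randn(seq_len, batch_size, input_size)  # hypothetical sequence

# Start from zero hidden and cell states, like ct = [0,0,0], ht = [0,0,0]
ht = torch.zeros(batch_size, hidden_size)
ct = torch.zeros(batch_size, hidden_size)

for x_t in inputs:
    # nn.LSTMCell takes (h, c) and returns the new (h, c)
    ht, ct = cell(x_t, (ht, ct))

print(ht.shape, ct.shape)
```

Note that the PyTorch cell returns the hidden state first, matching the (ht, ct) order of the LSTMCell function above.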

# GRU - Gated Recurrent Unit

<img src='l7.png' />

In [None]:
GRU -> a newer generation of gated RNNs

- no cell state
- uses the hidden state to transfer information

## Update Gate


In [None]:
it is similar to the forget and input gates of an LSTM combined

- decides what information to throw away and what new information to add

## Reset Gate

In [None]:
decides how much past information to forget
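A minimal NumPy sketch of one GRU step using the update gate z_t and reset gate r_t described above, with biases omitted and random hypothetical weights. It follows the original GRU formulation; other formulations (including PyTorch's nn.GRU) arrange the reset and update terms slightly differently, for example swapping the roles of z_t and 1 - z_t.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(h_prev, x_t, W_z, W_r, W_h):
    """One GRU step (biases omitted for brevity)."""
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx)  # update gate: how much to replace the old state
    r_t = sigmoid(W_r @ hx)  # reset gate: how much past information to use
    # candidate state: the reset gate scales how much of h_prev feeds in
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # update gate blends old state and candidate (no separate cell state)
    return (1.0 - z_t) * h_prev + z_t * h_tilde

rng = np.random.default_rng(0)
hidden_dim, input_dim = 3, 2
W_z, W_r, W_h = (rng.normal(size=(hidden_dim, hidden_dim + input_dim)) for _ in range(3))

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(4, input_dim)):  # hypothetical 4-step sequence
    h = gru_cell(h, x_t, W_z, W_r, W_h)
print(h)
```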

In [None]:
GRUs are usually faster to train than LSTMs
- fewer tensor operations (two gates instead of three, no separate cell state)
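One concrete way to see the "fewer tensor operations" point is to count parameters of PyTorch's nn.LSTM and nn.GRU with the same sizes. The sizes below are assumptions chosen to match the model later in this notebook; fewer parameters usually, but not always, means faster training.

```python
import torch.nn as nn

def count_params(module):
    """Total number of trainable values in a module."""
    return sum(p.numel() for p in module.parameters())

input_size, hidden_size = 100, 256
lstm = nn.LSTM(input_size, hidden_size)
gru = nn.GRU(input_size, hidden_size)

print("LSTM parameters:", count_params(lstm))  # 4 weight blocks per layer
print("GRU parameters: ", count_params(gru))   # 3 weight blocks per layer
```

The GRU has exactly three quarters of the LSTM's parameters, because each layer stores three gate blocks instead of four.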

# PyTorch Code for LSTM on Movie Review Data

In [None]:
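# data.Field and BucketIterator below belong to the legacy torchtext API:
# torchtext.data up to 0.8, torchtext.legacy.data in 0.9-0.11, removed in 0.12.
# Install a torchtext release (with a matching torch) that still ships it.
!pip install torchtext
!python -m spacy download en_core_web_sm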

In [None]:
import random

import torch
import torch.nn as nn
import torch.optim as optim
from torchtext import data      # legacy API (torchtext <= 0.8)
from torchtext import datasets

In [None]:
# Set random seed for reproducibility
seed = 42
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True

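# spaCy tokenizer; needs the en_core_web_sm model downloaded above
TEXT = data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm', lower=True)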
LABEL = data.LabelField(dtype=torch.float)

In [None]:
# Load the IMDb dataset
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

# Split the training data into training and validation sets
# random.seed() returns None, so pass the seeded random state explicitly
random.seed(seed)
train_data, valid_data = train_data.split(random_state=random.getstate())

# Build the vocabulary
TEXT.build_vocab(train_data, max_size=25000, vectors="glove.6B.100d", unk_init=torch.Tensor.normal_)
LABEL.build_vocab(train_data)



In [None]:
# Set up the iterators
BATCH_SIZE = 64

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=BATCH_SIZE,
    sort_key=lambda x: len(x.text),
    sort_within_batch=True
)
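BucketIterator uses sort_key to batch reviews of similar length together, which reduces padding. The sketch below is a simplification of that idea, not torchtext's actual algorithm (which also shuffles batches); the review lengths are randomly generated, not taken from IMDb, and padding_cost is a hypothetical helper.

```python
import random

def padding_cost(lengths, batch_size):
    """Total padding tokens when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += sum(max(batch) - n for n in batch)
    return total

random.seed(0)
# Hypothetical review lengths (in tokens)
lengths = [random.randint(20, 1000) for _ in range(640)]

shuffled = padding_cost(lengths, 64)          # batches in arbitrary order
bucketed = padding_cost(sorted(lengths), 64)  # similar lengths batched together
print(f"padding with random batches: {shuffled}")
print(f"padding with sorted batches: {bucketed}")
```

Sorting before batching never adds padding, and with widely varying lengths it removes most of it, which is why sort_within_batch and sort_key speed up training.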



In [None]:
# Define the LSTM model
class LSTMModel(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        super().__init__()

        # Embedding layer
        self.embedding = nn.Embedding(input_dim, embedding_dim)

        # LSTM layer
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=n_layers,
            bidirectional=bidirectional,
            dropout=dropout if n_layers > 1 else 0
        )

        # Fully connected layer
        self.fc = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)

        # Dropout layer
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # Pass the input through the embedding layer
        embedded = self.dropout(self.embedding(text))

        # Pass the embedded input through the LSTM layer
        output, (hidden, cell) = self.lstm(embedded)

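        # hidden: [num_layers * num_directions, batch, hidden_dim]
        # For a bidirectional LSTM, concatenate the last layer's forward and backward states
        if self.lstm.bidirectional:
            hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
        else:
            hidden = hidden[-1, :, :]
        hidden = self.dropout(hidden)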

        # Pass the concatenated hidden state through the fully connected layer
        return self.fc(hidden)
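Why hidden[-2] and hidden[-1]? A quick shape check on a standalone nn.LSTM with the same settings as the model (random input, hypothetical sequence length and batch size) shows that the last two entries of hidden are the final forward and backward states of the top layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, embedding_dim, hidden_dim, n_layers = 7, 4, 100, 256, 2
lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=True)

embedded = torch.randn(seq_len, batch_size, embedding_dim)  # [seq_len, batch, emb]
output, (hidden, cell) = lstm(embedded)

print(output.shape)  # torch.Size([7, 4, 512]): 2 * hidden_dim per time step
print(hidden.shape)  # torch.Size([4, 4, 256]): n_layers * 2 directions

# hidden[-2]: last layer, forward direction; hidden[-1]: last layer, backward direction
last = torch.cat((hidden[-2], hidden[-1]), dim=1)  # [batch, 2 * hidden_dim]

# The forward state finishes at the last time step, the backward state at the first
assert torch.allclose(hidden[-2], output[-1, :, :hidden_dim])
assert torch.allclose(hidden[-1], output[0, :, hidden_dim:])
```

This equivalence holds for unpadded inputs like the one above; with padded batches the "last time step" of output may be a padding position.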



In [None]:
# Initialize the model
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5

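model = LSTMModel(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM, N_LAYERS, BIDIRECTIONAL, DROPOUT)

# Initialize the embedding layer with the GloVe vectors loaded by build_vocab
model.embedding.weight.data.copy_(TEXT.vocab.vectors)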

# Print the model architecture
print(model)

# Define the loss and optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters())
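The model's final nn.Linear outputs raw logits (no sigmoid), which is why the loss is BCEWithLogitsLoss: it applies the sigmoid internally in a numerically stable way. The toy logits and labels below are made-up values showing it matches sigmoid followed by BCELoss.

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5])  # hypothetical raw model outputs
labels = torch.tensor([1.0, 0.0, 1.0])   # 1 = positive review, 0 = negative

with_logits = nn.BCEWithLogitsLoss()(logits, labels)
manual = nn.BCELoss()(torch.sigmoid(logits), labels)
print(with_logits.item(), manual.item())  # the same value up to float rounding
```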



In [None]:
# Train the model
NUM_EPOCHS = 5

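for epoch in range(NUM_EPOCHS):
    model.train()  # enable dropout (evaluate() switches the model to eval mode)
    epoch_loss = 0
    for batch in train_iterator:
        text, labels = batch.text, batch.label
        optimizer.zero_grad()
        predictions = model(text).squeeze(1)  # [batch] logits
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f'Epoch {epoch + 1}: train loss {epoch_loss / len(train_iterator):.3f}')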



In [None]:
# Evaluate the model
def evaluate(model, iterator, criterion):
    model.eval()
    total_loss = 0
    total_correct = 0

    with torch.no_grad():
        for batch in iterator:
            text, labels = batch.text, batch.label
            predictions = model(text).squeeze(1)
            loss = criterion(predictions, labels)
            total_loss += loss.item()

            # Round predictions to the nearest integer (0 or 1)
            rounded_preds = torch.round(torch.sigmoid(predictions))
            total_correct += (rounded_preds == labels).sum().item()

    return total_loss / len(iterator), total_correct / len(iterator.dataset)

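# Evaluate the model on the validation set
val_loss, val_acc = evaluate(model, valid_iterator, criterion)
print(f'Validation Loss: {val_loss:.3f}, Validation Accuracy: {val_acc * 100:.2f}%')

# Evaluate once on the held-out test set
test_loss, test_acc = evaluate(model, test_iterator, criterion)
print(f'Test Loss: {test_loss:.3f}, Test Accuracy: {test_acc * 100:.2f}%')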
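The accuracy logic inside evaluate() can be checked on toy tensors. The helper binary_accuracy and the logits/labels are hypothetical; the thresholding is the same torch.round(torch.sigmoid(...)) used above.

```python
import torch

def binary_accuracy(logits, labels):
    """Fraction of correct predictions, thresholding sigmoid(logits) with torch.round."""
    rounded_preds = torch.round(torch.sigmoid(logits))
    return (rounded_preds == labels).float().mean().item()

logits = torch.tensor([2.0, -1.0, 0.3, -0.2])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])
print(binary_accuracy(logits, labels))  # 0.75: only the 0.3 logit is misclassified
```

A logit of exactly 0 (probability 0.5) is predicted as 0, because torch.round rounds half to even.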


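Dataset

The code above downloads IMDb through torchtext's datasets.IMDB. A CSV version of the 50K IMDb movie reviews is also available on Kaggle:

https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews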