# **Long Short-Term Memory (LSTM)**

---

## **Notebook Structure**

1. **Metadata**
2. **Title and Objective**
3. **Acknowledgement**
4. **Exploratory Data Analysis (EDA)**
5. **Concept Explanation**
   - What is LSTM?
   - Why Use LSTM?
   - LSTM Mechanics
6. **Mathematical Intuition (Simplified)**
7. **Advantages and Disadvantages**
8. **Model Implementation**
   - Data Preparation
   - Building the Model
   - Training the Model
9. **Model Performance Evaluation**
   - Accuracy and Loss Curves
   - Predictions and Metrics
10. **Learnings and Conclusion**

---

## **1. Metadata**

- **Dataset**: IMDb Large Movie Review Dataset
- **Task**: Sentiment Analysis
- **Tech Stack**: TensorFlow, Keras, Python
- **Model**: Long Short-Term Memory (LSTM)

---

## **2. Title and Objective**

### **Title**
**Sentiment Analysis using LSTM on IMDb Dataset**

### **Objective**
To understand and implement Long Short-Term Memory (LSTM) networks for sentiment analysis on the IMDb dataset and evaluate the model's performance with visualizations and metrics.

---

## **3. Acknowledgement**
The dataset is sourced from the [Keras IMDb dataset](https://keras.io/api/datasets/imdb/). Special thanks to the creators for making this dataset accessible for research and educational purposes.

---

## **4. Exploratory Data Analysis (EDA)**

# **Comprehensive Guide to Long Short-Term Memory (LSTM)**

---

## **1. Introduction**

### **1.1 What is LSTM?**
Long Short-Term Memory (LSTM) networks are an advanced type of Recurrent Neural Network (RNN), introduced in 1997 by Hochreiter and Schmidhuber. They are specifically designed to address a key weakness of traditional RNNs: the inability to learn long-term dependencies effectively.

#### **Key Features**
- LSTMs use **memory cells** to store information for longer durations.
- They rely on **gates** (Forget Gate, Input Gate, and Output Gate) to control the flow of information:
  - **Forget Gate** decides what to remove from memory.
  - **Input Gate** determines what new information to add.
  - **Output Gate** controls what information is passed to the next layer or step.

---

### **1.2 Why Use LSTMs?**
Traditional RNNs face two major problems:
1. **Vanishing Gradients**: The gradient signal shrinks as it is propagated backward through many time steps during training, so RNNs struggle to learn from earlier steps in long sequences.
2. **Exploding Gradients**: Sometimes the gradient grows uncontrollably instead, making training unstable.

LSTMs mitigate the vanishing-gradient problem by:
- Using memory cells to preserve information over time.
- Adding gates to selectively update or forget information.

Exploding gradients are usually handled separately, for example with gradient clipping.
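
To make the two gradient problems concrete, here is a minimal sketch. It assumes a deliberately simplified scalar, linear RNN `h_t = weight * h_{t-1}`, in which the gradient flowing back through `steps` time steps is scaled by `weight ** steps`. The function name `gradient_scale` is hypothetical and chosen only for illustration.

```python
def gradient_scale(weight, steps):
    """Scale factor applied to a gradient after flowing back `steps` time steps
    in a scalar linear RNN h_t = weight * h_{t-1} (a simplifying assumption)."""
    return weight ** steps


# Weights slightly below 1 make the gradient vanish; slightly above 1, explode.
for weight in (0.9, 1.0, 1.1):
    print(f"weight={weight}: scale after 100 steps = {gradient_scale(weight, 100):.3e}")
```

Real RNNs use matrices and nonlinearities, but the same repeated-multiplication effect drives both problems.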

#### **Applications of LSTMs**
- **Natural Language Processing (NLP)**: Sentiment analysis, machine translation, text summarization.
- **Speech Recognition**: Transcribing audio into text.
- **Time-Series Forecasting**: Predicting stock prices, energy consumption, etc.
- **Video Analysis**: Identifying actions or patterns in video sequences.

---

## **2. How LSTMs Work**

### **2.1 Architecture Overview**
LSTM cells are the building blocks of LSTM networks. Each cell contains three key gates:
1. **Forget Gate**: This decides what past information to discard. For example:
   - "Forget the context of the previous sentence if it’s no longer relevant."
2. **Input Gate**: This decides what new information to add to the memory.
3. **Output Gate**: This decides what information to send to the next step or layer.

---

### **2.2 LSTM Gates in Simple Terms**

#### **Forget Gate**
The forget gate filters out irrelevant information:
- It looks at the current input and the previous output (hidden state).
- Based on this, it decides which parts of the memory to keep or forget.
- For example: *"Forget the irrelevant details of the sentence before."*

#### **Input Gate**
The input gate decides what new information to add to the memory:
- It checks the current input and the previous output.
- Then it determines what should be updated or added to the memory.
- For example: *"Add the new context of the current word to the memory."*

#### **Memory Update**
The memory cell is updated by:
1. Forgetting irrelevant information (Forget Gate).
2. Adding new relevant information (Input Gate).

#### **Output Gate**
The output gate determines what part of the memory should be passed to the next step:
- It uses the updated memory and decides what’s relevant for the next layer or step.
- For example: *"Output the current context to understand the sentence better."*
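
The four steps above (forget, input, memory update, output) can be sketched in a few lines of plain Python. This is an illustrative scalar cell (input size 1, hidden size 1) with hand-picked, hypothetical parameter values in `params`; it is not how Keras implements the layer, which works on vectors and learns the weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell (illustrative only)."""
    f = sigmoid(p["wf"] * x_t + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x_t + p["ui"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg"] * x_t + p["ug"] * h_prev + p["bg"])  # candidate memory
    o = sigmoid(p["wo"] * x_t + p["uo"] * h_prev + p["bo"])    # output gate
    c_t = f * c_prev + i * g     # memory update: keep part of the old, add part of the new
    h_t = o * math.tanh(c_t)     # output: expose a filtered view of the memory
    return h_t, c_t


# Hypothetical parameters, chosen by hand for the demo.
params = {
    "wf": 0.5, "uf": 0.1, "bf": 1.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wg": 0.9, "ug": 0.3, "bg": 0.0,
    "wo": 0.4, "uo": 0.1, "bo": 0.0,
}

h, c = 0.0, 0.0  # initial hidden state and cell state
for x_t in [0.5, -1.0, 0.25]:
    h, c = lstm_step(x_t, h, c, params)
    print(f"x={x_t:+.2f}  h={h:+.4f}  c={c:+.4f}")
```

Because the output gate lies in (0, 1) and `tanh` lies in (-1, 1), the hidden state `h` always stays strictly between -1 and 1, while the cell state `c` can accumulate larger values.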

---

## **3. Advantages and Disadvantages of LSTMs**

### **3.1 Advantages**
1. **Handles Long-Term Dependencies**: Retains relevant information over long sequences.
2. **Selective Information Flow**: Gates ensure that only important information is stored and propagated.
3. **Wide Applicability**: Useful for tasks like NLP, speech recognition, and time-series forecasting.

### **3.2 Disadvantages**
1. **Computationally Expensive**: LSTMs are more resource-intensive compared to traditional RNNs.
2. **Slower Training**: The added complexity of gates increases the training time.
3. **Overfitting**: Without sufficient data, LSTMs can overfit on training data.

---

## **4. Code: Implementing an LSTM for Sentiment Analysis**

### **4.1 Dataset**
We will use the IMDb dataset, a standard dataset for sentiment analysis tasks. The goal is to classify movie reviews as positive or negative.



In [39]:
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

### **5. Concept Explanation**

#### **5.1 What is LSTM?**
Long Short-Term Memory (LSTM) networks are a specialized type of Recurrent Neural Network (RNN) designed to handle sequential data while mitigating the vanishing gradient problem. Traditional RNNs suffer from short memory due to exponential decay in gradients, making it difficult to learn long-range dependencies. LSTMs address this issue using **gated mechanisms** that regulate the flow of information.

An LSTM cell consists of:
- A **Forget Gate** to decide which past information should be discarded.
- An **Input Gate** to determine what new information should be stored in the cell state.
- A **Cell State** that carries long-term memory across time steps.
- An **Output Gate** to control what information should be passed to the next hidden state.
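
For readers who want the simplified math, the components listed above are commonly written as follows. The symbols are assumptions of this sketch rather than notation used elsewhere in the notebook: $x_t$ is the input at step $t$, $h_{t-1}$ the previous hidden state, $c_{t-1}$ the previous cell state, $\sigma$ the sigmoid function, $\odot$ element-wise multiplication, and $W$, $U$, $b$ learned weights and biases.

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$

The additive form of the cell state update is what lets information, and gradients, survive across many time steps when $f_t$ stays close to 1.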

#### **5.2 Why Use LSTM?**
LSTMs are widely used in deep learning applications where sequential dependencies exist. Key reasons for their use include:
- **Handles Long-Term Dependencies:** Unlike standard RNNs, LSTMs retain relevant information over long sequences, making them ideal for time-series and NLP tasks.
- **Gate Mechanisms:** The **forget, input, and output gates** help selectively store and discard information, preventing the model from accumulating irrelevant past data.
- **Widely Used in Applications:**  
  - **Natural Language Processing (NLP):** Sentiment analysis, machine translation, text generation.
  - **Speech Recognition:** Processing continuous speech signals.
  - **Time-Series Forecasting:** Predicting stock prices, weather forecasting, etc.
  - **Anomaly Detection:** Identifying unusual patterns in data over time.

---

### 🧠 **How an LSTM Works (Step by Step)**  

An **LSTM (Long Short-Term Memory)** is a special type of neural network that helps remember important information and forget unnecessary details. It does this using **three gates**:  

### 1️⃣ **Forget Gate (Decides what to forget)**  
👉 The LSTM first looks at the previous output and the current input. It decides what **old information** is not important anymore and should be forgotten.  

💡 **Think of it like cleaning your memory:**  
_"Do I still need this old info, or can I forget it?"_  

---

### 2️⃣ **Input Gate (Decides what to learn)**  
👉 Next, the LSTM decides what **new information** should be stored.  

💡 **Think of it like learning new things:**  
_"Is this new information useful enough to remember?"_  

---

### 3️⃣ **Memory Update (Update what’s stored)**  
👉 The LSTM updates its memory by combining:  
- The important old information (after forgetting unnecessary parts).  
- The important new information (after filtering what to keep).  

💡 **Think of it like updating a notebook:**  
_"Keep the useful old notes, add the important new ones."_  

---

### 4️⃣ **Output Gate (Decides what to pass forward)**  
👉 Finally, the LSTM decides what part of the memory should be sent as **output** for the next step.  

💡 **Think of it like answering a question:**  
_"Based on what I know, what should I say next?"_  

---

### 🔥 **Final Takeaway:**  
LSTM is like a smart assistant:  
✅ It **remembers** important things.  
❌ It **forgets** what’s not needed.  
📝 It **updates** its knowledge.  
📢 It **outputs** relevant information.  


### **7. Advantages and Disadvantages**

#### **Advantages**
- Retains long-term dependencies.
- Effective for sequential data (text, time-series).
- Mitigates the vanishing gradient problem.

#### **Disadvantages**
- Computationally intensive.
- Longer training times.
- May overfit without proper regularization.


In [40]:
data = pd.read_csv("/content/IMDB Dataset.csv")

In [41]:
data.shape

(50000, 2)

### **Explanation**

1. **Padding**: Ensures all reviews have the same length (200 tokens, matching `maxlen=200` in the tokenization step).
2. **Result**: The reviews are then represented together as one 2D array of shape `(number of reviews, maxlen)`.


In [42]:
data.head()

| | review | sentiment |
|---|---|---|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. <br /><br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |


In [43]:
data["sentiment"].value_counts()

| sentiment | count |
|---|---|
| positive | 25000 |
| negative | 25000 |


In [44]:
data.replace({"sentiment": {"positive": 1, "negative": 0}}, inplace=True)



In [45]:
# split data into training data and test data
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

In [46]:
print(train_data.shape)
print(test_data.shape)

(40000, 2)
(10000, 2)


In [47]:
# Tokenize text data
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(train_data["review"])
X_train = pad_sequences(tokenizer.texts_to_sequences(train_data["review"]), maxlen=200)
X_test = pad_sequences(tokenizer.texts_to_sequences(test_data["review"]), maxlen=200)

## 📌 Tokenizing and Padding Text Data

This code processes text reviews by converting them into numerical sequences suitable for deep learning models.

### 🔹 Step 1: Initialize Tokenizer  
We create a `Tokenizer` to process text data, keeping only the most frequent words to reduce memory usage. With `num_words=5000`, Keras keeps the `num_words - 1` most common words, i.e. the top 4,999.  
```python
tokenizer = Tokenizer(num_words=5000)
```

### 🔹 Step 2: Fit Tokenizer on Training Data  
The tokenizer learns the vocabulary from the training dataset only and creates a **word-to-index mapping**, with more frequent words receiving smaller indices.  
Example: `"good movie"` → `{ "good": 1, "movie": 2 }`  
```python
tokenizer.fit_on_texts(train_data["review"])
```

### 🔹 Step 3: Convert Text to Sequences  
Each review is transformed into a sequence of integers based on the learned vocabulary.  
Example: `"good movie"` → `[1, 2]`  
```python
X_train = tokenizer.texts_to_sequences(train_data["review"])
X_test = tokenizer.texts_to_sequences(test_data["review"])
```

### 🔹 Step 4: Pad Sequences  
Since reviews have different lengths, we **pad** or **truncate** them to a fixed length of **200 tokens**. With the `pad_sequences` defaults (`padding="pre"`, `truncating="pre"`):
- **Short reviews** get padded with zeros at the beginning.  
- **Long reviews** get truncated from the start, so the last 200 tokens are kept.  

This ensures all inputs have the same shape for the LSTM model.  
```python
X_train = pad_sequences(X_train, maxlen=200)
X_test = pad_sequences(X_test, maxlen=200)
```

✅ **Final Output:** `X_train` and `X_test` now contain **numerical representations** of the text reviews, each of shape `(number of reviews, 200)`.
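
To see what tokenizing and padding do without running Keras, here is a minimal pure-Python sketch. The helper names `build_word_index`, `to_sequences`, and `pad_pre` are hypothetical stand-ins written for this illustration; they only approximate the Keras behaviour (for example, they split on whitespace and do not strip punctuation).

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to a rank, with 1 for the most frequent word."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def to_sequences(texts, word_index, num_words=None):
    """Replace words by their index; unknown or too-rare words are dropped."""
    sequences = []
    for text in texts:
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        if num_words is not None:
            ids = [i for i in ids if i < num_words]
        sequences.append(ids)
    return sequences


def pad_pre(sequences, maxlen):
    """Left-pad with zeros and keep only the last `maxlen` tokens."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # truncate from the start
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


texts = ["good movie", "good good plot", "bad movie with a bad plot twist"]
word_index = build_word_index(texts)
sequences = to_sequences(texts, word_index)
padded = pad_pre(sequences, maxlen=4)
for row in padded:
    print(row)
```

Every row has length 4: the short reviews gain leading zeros, and the long review keeps only its last four tokens.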

In [48]:
print(X_train)

[[1935    1 1200 ...  205  351 3856]
 [   3 1651  595 ...   89  103    9]
 [   0    0    0 ...    2  710   62]
 ...
 [   0    0    0 ... 1641    2  603]
 [   0    0    0 ...  245  103  125]
 [   0    0    0 ...   70   73 2062]]


In [49]:
print(X_test)

[[   0    0    0 ...  995  719  155]
 [  12  162   59 ...  380    7    7]
 [   0    0    0 ...   50 1088   96]
 ...
 [   0    0    0 ...  125  200 3241]
 [   0    0    0 ... 1066    1 2305]
 [   0    0    0 ...    1  332   27]]


In [50]:
Y_train = train_data["sentiment"]
Y_test = test_data["sentiment"]

In [51]:
print(Y_train)

39087    0
30893    0
45278    1
16398    0
13653    0
        ..
11284    1
44732    1
38158    0
860      1
15795    1
Name: sentiment, Length: 40000, dtype: int64


# **LSTM - Long Short-Term Memory**

In [52]:
# build the model

model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=128, input_length=200))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation="sigmoid"))



## 📌 Building the LSTM Model

This code defines a **Sequential** model using an **LSTM (Long Short-Term Memory)** layer for text classification.

### 🔹 Step 1: Initialize the Model  
We use `Sequential()`, which allows us to stack layers one after another.  
```python
model = Sequential()
```

---

### 🔹 Step 2: Add an Embedding Layer  
The **Embedding layer** converts word indices into dense vector representations.  
- `input_dim=5000`: The vocabulary size, matching `num_words=5000` in the tokenizer.  
- `output_dim=128`: Each word is represented as a **128-dimensional vector**.  
- `input_length=200`: Each review has a fixed length of **200 tokens**. Recent Keras releases deprecate this argument and infer the length from the input, so it can be omitted there.  

```python
model.add(Embedding(input_dim=5000, output_dim=128, input_length=200))
```

---

### 🔹 Step 3: Add an LSTM Layer  
The **LSTM layer** processes the sequence of word embeddings and captures long-term dependencies in text.  
- `128`: The number of LSTM units, i.e. the size of the hidden and cell states.  
- `dropout=0.2`: Reduces overfitting by randomly dropping 20% of the layer's input connections during training.  
- `recurrent_dropout=0.2`: Applies the same kind of dropout to the **recurrent** connections (the previous hidden state) within the LSTM cell.  

```python
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
```

---

### 🔹 Step 4: Add a Dense Output Layer  
- `Dense(1)`: A **fully connected layer** with 1 neuron, since we are performing **binary classification**.  
- `activation="sigmoid"`: The **sigmoid activation function** squashes the output into a probability between 0 and 1.  

```python
model.add(Dense(1, activation="sigmoid"))
```

✅ **Final Output:** This LSTM model is now ready to be compiled and trained for text classification.

In [54]:
# compile the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

In [None]:
model.summary()
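
The output of `model.summary()` is not shown above, but the parameter counts can be worked out by hand. The sketch below assumes the standard formulas for these layers (an embedding table, an LSTM with four gate blocks each having input weights, recurrent weights, and a bias, and a dense layer with bias); the variable names are chosen only for this illustration.

```python
vocab_size = 5000   # Embedding input_dim
embed_dim = 128     # Embedding output_dim, also the LSTM input size
units = 128         # LSTM units

# Embedding: one vector of length embed_dim per vocabulary index.
embedding_params = vocab_size * embed_dim

# LSTM: 4 blocks (forget, input, candidate, output), each with
# input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (units * (embed_dim + units) + units)

# Dense(1): one weight per LSTM unit plus one bias.
dense_params = units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(f"Embedding: {embedding_params:,}")
print(f"LSTM:      {lstm_params:,}")
print(f"Dense:     {dense_params:,}")
print(f"Total:     {total_params:,}")
```

Most of the parameters sit in the embedding table, which is why limiting the vocabulary size has a large effect on model size.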

In [55]:
model.fit(X_train, Y_train, epochs=5, batch_size=64, validation_split=0.2)

Epoch 1/5
500/500 ━━━━━━━━━━━━━━━━━━━━ 161s 319ms/step - accuracy: 0.7176 - loss: 0.5359 - val_accuracy: 0.8511 - val_loss: 0.3513
Epoch 2/5
500/500 ━━━━━━━━━━━━━━━━━━━━ 202s 320ms/step - accuracy: 0.8590 - loss: 0.3492 - val_accuracy: 0.8466 - val_loss: 0.3598
Epoch 3/5
500/500 ━━━━━━━━━━━━━━━━━━━━ 201s 318ms/step - accuracy: 0.8783 - loss: 0.3038 - val_accuracy: 0.8652 - val_loss: 0.3217
Epoch 4/5
500/500 ━━━━━━━━━━━━━━━━━━━━ 199s 313ms/step - accuracy: 0.8913 - loss: 0.2736 - val_accuracy: 0.8704 - val_loss: 0.3278
Epoch 5/5
500/500 ━━━━━━━━━━━━━━━━━━━━ 198s 307ms/step - accuracy: 0.9053 - loss: 0.2429 - val_accuracy: 0.8731 - val_loss: 0.3160


<keras.src.callbacks.history.History at 0x7827c75ee4d0>

In [56]:
loss, accuracy = model.evaluate(X_test, Y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")

313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 62ms/step - accuracy: 0.8714 - loss: 0.3066
Test Loss: 0.30455735325813293
Test Accuracy: 0.8755999803543091


In [57]:
def predict_sentiment(review):
  # tokenize and pad the review
  sequence = tokenizer.texts_to_sequences([review])
  padded_sequence = pad_sequences(sequence, maxlen=200)
  prediction = model.predict(padded_sequence)
  sentiment = "positive" if prediction[0][0] > 0.5 else "negative"
  return sentiment

## 📌 Sentiment Analysis Using LSTM  

This code defines a function to predict whether a given review has a **positive** or **negative** sentiment using a trained LSTM model.  

---

### 🔹 Step 1: Tokenize the Input Review  
- The review is converted into a sequence of integers using the same tokenizer that was fitted on the training data.  
- `texts_to_sequences([review])` maps words to their corresponding indices. Words the tokenizer has not seen are dropped.  

```python
sequence = tokenizer.texts_to_sequences([review])
```

---

### 🔹 Step 2: Pad the Sequence  
- The sequence is padded to a fixed length of **200 tokens** (same as the training data).  
- If the review is **shorter**, it gets padded with zeros at the beginning.  
- If the review is **longer**, it gets truncated from the start.  

```python
padded_sequence = pad_sequences(sequence, maxlen=200)
```

---

### 🔹 Step 3: Predict Sentiment  
- The **trained LSTM model** predicts a probability between 0 and 1.  
- `model.predict(padded_sequence)` returns an array of shape `(1, 1)` holding that probability.  

```python
prediction = model.predict(padded_sequence)
```

---

### 🔹 Step 4: Interpret the Prediction  
- If the probability is **greater than 0.5**, the review is classified as **"positive"**.  
- Otherwise, it is classified as **"negative"**.  

```python
sentiment = "positive" if prediction[0][0] > 0.5 else "negative"
```

---

### 🔹 Complete Function  
```python
def predict_sentiment(review):
    # Tokenize and pad the review
    sequence = tokenizer.texts_to_sequences([review])
    padded_sequence = pad_sequences(sequence, maxlen=200)
    
    # Predict sentiment
    prediction = model.predict(padded_sequence)
    sentiment = "positive" if prediction[0][0] > 0.5 else "negative"
    
    return sentiment
```

✅ **Final Output:** The function returns **"positive"** or **"negative"** based on the predicted sentiment.
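
The thresholding step can also report how far the probability is from the decision boundary, which is handy for borderline reviews. The helper below is a hypothetical addition, not part of the notebook's `predict_sentiment`; it takes a plain probability so it can be tried without a trained model.

```python
def interpret_probability(prob, threshold=0.5):
    """Turn a sigmoid output into a label plus the probability of that label.

    `threshold` is an assumed, adjustable cut-off; the notebook uses 0.5.
    """
    label = "positive" if prob > threshold else "negative"
    confidence = prob if label == "positive" else 1.0 - prob
    return label, confidence


for prob in (0.93, 0.51, 0.08):
    label, confidence = interpret_probability(prob)
    print(f"p={prob:.2f} -> {label} (confidence {confidence:.2f})")
```

In the notebook's function, the value passed in would be `prediction[0][0]`.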

In [58]:
# example usage
new_review = "This movie was fantastic. I loved it."
sentiment = predict_sentiment(new_review)
print(f"The sentiment of the review is: {sentiment}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 171ms/step
The sentiment of the review is: positive


In [59]:
# example usage
new_review = "This movie was not that good"
sentiment = predict_sentiment(new_review)
print(f"The sentiment of the review is: {sentiment}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step
The sentiment of the review is: negative


In [60]:
# example usage
new_review = "This movie was ok but not that good."
sentiment = predict_sentiment(new_review)
print(f"The sentiment of the review is: {sentiment}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step
The sentiment of the review is: negative
