# Long Short-Term Memory (LSTM) Implementation

This notebook demonstrates LSTM networks, which mitigate the vanishing gradient problem that limits traditional RNNs.

## Dataset
IMDB movie reviews for binary sentiment analysis (25,000 training and 25,000 test reviews, labeled positive or negative).

## Topics Covered:
1. LSTM Architecture and Gates
2. Long-term Dependency Learning
3. Memory Cell Mechanism
4. Comparison with SimpleRNN

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding, Dropout, BatchNormalization
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

print(f"TensorFlow Version: {tf.__version__}")

In [None]:
# Load and preprocess data
vocab_size = 10000  # keep only the 10,000 most frequent words
max_length = 200    # pad or truncate every review to 200 tokens

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=vocab_size)

X_train = pad_sequences(X_train, maxlen=max_length)
X_test = pad_sequences(X_test, maxlen=max_length)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"Class distribution: Positive={np.sum(y_train)}, Negative={len(y_train)-np.sum(y_train)}")
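With its default arguments, `pad_sequences` pads and truncates at the start of each sequence, so a review longer than `max_length` keeps its last 200 tokens. The NumPy sketch below mimics that default behaviour on toy data; `pad_pre` and the toy sequences are hypothetical names used only for illustration, not part of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Sketch of pad_sequences with its defaults padding='pre', truncating='pre'."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue                     # an empty sequence stays all padding
        trunc = seq[-maxlen:]            # keep the LAST maxlen tokens
        out[row, -len(trunc):] = trunc   # right-align, padding on the left
    return out

toy = [[1, 14, 22], [1, 5, 6, 7, 8, 9]]  # one short and one long toy "review"
padded = pad_pre(toy, maxlen=4)
print(padded)
```

The short sequence gains a leading zero, while the long one loses its first two tokens, including the start marker.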

In [None]:
# Build LSTM model
model = Sequential([
    # Embedding layer
    Embedding(vocab_size, 128, input_length=max_length),
    
    # LSTM layers with dropout (recurrent_dropout > 0 disables the fused cuDNN kernel, so GPU training is slower)
    LSTM(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2),
    BatchNormalization(),
    
    LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    BatchNormalization(),
    
    # Dense layers
    Dense(32, activation='relu'),
    Dropout(0.5),
    
    Dense(16, activation='relu'),
    Dropout(0.5),
    
    # Output layer
    Dense(1, activation='sigmoid')
])

model.summary()

In [None]:
# Compile model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

print("Model compiled successfully!")

In [None]:
# Define callbacks
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-6, verbose=1)
checkpoint = ModelCheckpoint('best_lstm_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)

callbacks = [early_stop, reduce_lr, checkpoint]
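The `patience` setting of `EarlyStopping` can be easier to follow on a toy loss curve. The pure-Python sketch below is a simplified model of the patience rule only (no `min_delta`, no weight handling); `early_stop_epoch` and the loss values are hypothetical and are not results from this notebook.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, under a simple patience rule."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:                          # no improvement for `patience` epochs
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch                # ran to the end without stopping

toy_val_losses = [0.50, 0.40, 0.38, 0.39, 0.41, 0.40, 0.37]  # made-up values
stop_epoch, best_epoch = early_stop_epoch(toy_val_losses, patience=3)
print(f"Stopped at epoch {stop_epoch}, best epoch was {best_epoch}")
```

With `restore_best_weights=True`, the model would be rolled back to the weights from `best_epoch`. In this toy curve the later improvement to 0.37 is never seen, which is the trade-off of a small patience value.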

In [None]:
# Train model
history = model.fit(X_train, y_train,
                    validation_split=0.2,
                    epochs=15,
                    batch_size=128,
                    callbacks=callbacks,
                    verbose=1)

print("\nTraining completed!")

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Accuracy plot
axes[0].plot(history.history['accuracy'], marker='o', label='Training Accuracy')
axes[0].plot(history.history['val_accuracy'], marker='s', label='Validation Accuracy')
axes[0].set_title('LSTM Model Accuracy', fontsize=14)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Accuracy')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Loss plot
axes[1].plot(history.history['loss'], marker='o', label='Training Loss')
axes[1].plot(history.history['val_loss'], marker='s', label='Validation Loss')
axes[1].set_title('LSTM Model Loss', fontsize=14)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# Make predictions
y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int).flatten()

print(f"\nPredictions made: {len(y_pred)}")

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=True,
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
plt.title('Confusion Matrix - LSTM', fontsize=14)
plt.ylabel('Actual Sentiment')
plt.xlabel('Predicted Sentiment')
plt.show()

# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
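The figures in the classification report follow directly from the confusion matrix. The sketch below recomputes them by hand for the positive class, assuming scikit-learn's layout (rows are actual labels, columns are predicted labels); the counts in `toy_cm` are made up for illustration and are not this model's results.

```python
# Rows = actual (Negative, Positive), columns = predicted (Negative, Positive)
toy_cm = [[40, 10],
          [5, 45]]
(tn, fp), (fn, tp) = toy_cm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of everything predicted positive, how much was right
recall = tp / (tp + fn)      # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```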

In [None]:
# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob.ravel())  # flatten (n, 1) scores to 1-D
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random Classifier')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve - LSTM Sentiment Analysis')
plt.legend(loc='lower right')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Analyze predictions
word_index = imdb.get_word_index()
reverse_word_index = {value: key for key, value in word_index.items()}

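def decode_review(encoded_review):
    # Indices are offset by 3: 0 = padding, 1 = start of review, 2 = out-of-vocabulary.
    # Skip padding and start tokens; show out-of-vocabulary words as '?'.
    return ' '.join(reverse_word_index.get(i - 3, '?') for i in encoded_review if i >= 2)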

# Show sample predictions
print("Sample Predictions:\n" + "="*80)
for i in range(5):
    review = decode_review(X_test[i])
    actual = 'Positive' if y_test[i] == 1 else 'Negative'
    predicted = 'Positive' if y_pred[i] == 1 else 'Negative'
    prob_positive = y_pred_prob[i][0]  # sigmoid output = P(positive)
    
    print(f"\nReview {i+1}:")
    print(f"Text: {review[:150]}...")
    print(f"Actual: {actual} | Predicted: {predicted} | P(positive): {prob_positive:.4f}")
    print("-" * 80)
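The `- 3` offset in the decoder comes from the defaults of `imdb.load_data`, which reserve 0 for padding, 1 for the start of a review, and 2 for out-of-vocabulary words, and shift every real word index up by 3. The toy vocabulary below is hypothetical and only illustrates that mapping.

```python
INDEX_FROM = 3  # default offset applied by imdb.load_data

toy_word_index = {"the": 1, "movie": 2, "great": 3}            # word -> frequency rank
toy_reverse = {rank: word for word, rank in toy_word_index.items()}

def decode_tokens(tokens):
    words = []
    for t in tokens:
        if t in (0, 1):                                         # padding or start marker
            continue
        words.append(toy_reverse.get(t - INDEX_FROM, "?"))     # '?' for out-of-vocabulary
    return " ".join(words)

# two padding tokens, start marker, "the", "movie", out-of-vocabulary, "great"
decoded = decode_tokens([0, 0, 1, 4, 5, 2, 6])
print(decoded)
```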

In [None]:
# Save the final model
model.save('lstm_sentiment_model.h5')
print("Model saved as 'lstm_sentiment_model.h5'")

## Summary

### LSTM Architecture:
LSTMs use three gates and a cell state to control information flow:

1. **Forget Gate**: Decides what to discard from the cell state
2. **Input Gate**: Decides what new information to write to the cell state
3. **Cell State**: Long-term memory carried across time steps and updated additively
4. **Output Gate**: Decides how much of the cell state to expose as the hidden state
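The gates listed above can be written out as a single time step in NumPy. This is a minimal sketch of the standard LSTM equations, not the Keras implementation; the weight names `W`, `U`, `b`, the gate ordering inside them, and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    Columns are grouped as input gate, forget gate, candidate, output gate.
    """
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[..., :units])                # input gate: what to write
    f = sigmoid(z[..., units:2 * units])       # forget gate: what to keep
    g = np.tanh(z[..., 2 * units:3 * units])   # candidate values
    o = sigmoid(z[..., 3 * units:])            # output gate: what to expose
    c_t = f * c_prev + i * g                   # additive cell-state update
    h_t = o * np.tanh(c_t)                     # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 8, 4                        # toy sizes
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, input_dim)):    # run 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

Note that the cell state is updated by addition (`f * c_prev + i * g`), while the hidden state is always squashed through `tanh` and the output gate.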

### Key Advantages:
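- **Long-term Dependencies**: The cell state can carry information across many time steps
- **Vanishing Gradient Mitigation**: The additive cell-state update gives gradients a path along which they are scaled by the forget gate rather than by repeated weight and activation derivatives, so they decay much more slowly than in a SimpleRNN (they can still vanish if the forget gate stays near 0)
- **Better Accuracy**: Usually outperforms SimpleRNN on tasks that depend on long-range context
- **Flexible**: Handles variable-length sequences; this notebook pads or truncates every review to 200 tokens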
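The gradient argument can be illustrated with scalars. In a one-unit vanilla RNN, h_t = tanh(w * h_{t-1} + x_t), each step multiplies the backpropagated gradient by w * (1 - h_t^2). Along the direct cell-state path of an LSTM, each step multiplies it by the forget gate f_t. The sketch below compares the two products over 100 steps; the weight, inputs, and forget-gate values are assumed for illustration, and the LSTM side ignores the indirect paths through the hidden state.

```python
import math

def rnn_gradient(w, inputs, h0=0.0):
    """d h_T / d h_0 for a scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)   # chain rule through one tanh step
    return grad

def cell_state_gradient(forget_gates):
    """d c_T / d c_0 along the direct path c_t = f_t * c_{t-1} + ..."""
    grad = 1.0
    for f in forget_gates:
        grad *= f
    return grad

steps = 100
rnn_grad = rnn_gradient(w=0.9, inputs=[0.5] * steps)
lstm_grad = cell_state_gradient([0.99] * steps)

print(f"Vanilla RNN gradient after {steps} steps: {rnn_grad:.3e}")
print(f"LSTM cell-state gradient after {steps} steps: {lstm_grad:.3f}")
```

With these values the RNN gradient is vanishingly small, while the cell-state gradient stays within an order of magnitude of 1. A forget gate close to 0 would make the second product shrink just as quickly.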

### When to Use LSTM:
- Long sequences (roughly 100 or more time steps), where SimpleRNN gradients tend to vanish
- Complex dependencies
- Text generation
- Machine translation
- Speech recognition
- Time series forecasting

### LSTM vs SimpleRNN:
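- **LSTM**: Better for long sequences, about 4x the parameters per layer, slower to train
- **SimpleRNN**: Faster and simpler, but struggles with long sequences
- LSTM often achieves a few percentage points higher accuracy (the 3-5% figure is a rule of thumb, not a result measured in this notebook)
- LSTM is a common default among recurrent layers for sequence tasks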
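The parameter gap can be checked with a short calculation. A SimpleRNN layer has one set of input weights, recurrent weights, and biases, while an LSTM has four such sets (three gates plus the candidate). The helper names below are hypothetical; the formulas assume biases are enabled, which is the Keras default.

```python
def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + bias
    return units * (input_dim + units) + units

def lstm_params(input_dim, units):
    # four weight sets: input gate, forget gate, candidate, output gate
    return 4 * simple_rnn_params(input_dim, units)

# Layer sizes used in this notebook: Embedding(128) -> LSTM(128) -> LSTM(64)
print("LSTM(128):     ", lstm_params(128, 128))        # 131584
print("LSTM(64):      ", lstm_params(128, 64))         # 49408
print("SimpleRNN(128):", simple_rnn_params(128, 128))  # 32896
```

The LSTM counts should match the corresponding rows printed by `model.summary()`.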