# Class 2: Recurrent Neural Networks (RNNs) and Transformers Overview

## Welcome to Week 10, Class 2!
In this notebook, we’ll explore **sequence models**, which are crucial for tasks like sentiment analysis and text generation where word order matters. We’ll cover **Recurrent Neural Networks (RNNs)** and introduce **transformers**, then apply them to sentiment analysis, building on Class 1’s NLP skills.

**Objectives**:
- Understand why sequence models are needed for text.
- Learn the basics of RNNs and their challenges.
- Get a high-level overview of transformers and attention.
- Train an LSTM sentiment classifier and apply a pre-trained transformer to the same task.

**Let’s dive in!**

## 1. Why Sequence Models?
In Class 1, we used **bag-of-words (BoW)**, which counts words but ignores their order: "good, not bad" and "bad, not good" get exactly the same representation. Sequence models like RNNs and transformers:
- Capture **context** and **order** in text.
- Handle tasks like:
  - Sentiment analysis (e.g., understanding long reviews).
  - Text generation (e.g., writing sentences).
  - Translation (e.g., English to Spanish).

**Discussion Question**: Why does word order matter in sentences like "The movie is great" vs. "Is the movie great"?

## 2. Setup
Let’s install and import the required libraries. Run the cell below to set up the environment.

```python
# Install libraries (uncomment if needed)
# !pip install numpy pandas tensorflow scikit-learn transformers datasets torch
# Note: the Keras `Tokenizer` imported below is a legacy API that may be missing
# on TensorFlow 2.16+ (which ships Keras 3). If that import fails, try
# installing "tensorflow<2.16" instead.

# Import libraries
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from transformers import pipeline
import warnings
warnings.filterwarnings('ignore')

print("Setup complete!")
```

## 3. Recurrent Neural Networks (RNNs)
RNNs process sequences (e.g., words in a sentence) one step at a time, maintaining a **hidden state** to remember context.
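The core of a basic ("vanilla") RNN is one update rule applied at every time step: `h_t = tanh(W_x·x_t + W_h·h_{t-1} + b)`. The NumPy sketch below runs that loop over a toy sequence; the sizes and random weights are made up for illustration, whereas a trained network would learn `W_x`, `W_h`, and `b`.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5

# Randomly initialised weights (a trained RNN would learn these)
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))   # input -> hidden
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))  # hidden -> hidden (the "loop")
b = np.zeros(hidden_dim)

# A toy sequence: one 4-dimensional vector per time step (think: word embeddings)
inputs = rng.normal(size=(seq_len, input_dim))

h = np.zeros(hidden_dim)  # initial hidden state: no context yet
hidden_states = []
for x_t in inputs:
    # The same weights are reused at every step; h carries context forward
    h = np.tanh(W_x @ x_t + W_h @ h + b)
    hidden_states.append(h)

hidden_states = np.stack(hidden_states)
print(hidden_states.shape)  # (5, 3): one hidden state per time step
print("Final hidden state:", hidden_states[-1])
```

For sentiment analysis, the final hidden state is what a classifier layer reads, which is also what `LSTM(32, return_sequences=False)` hands to the `Dense` layer later in this notebook.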

**Key Points**:
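- **Architecture**: Input → Hidden State → Output, looped over time steps, with the same weights reused at every step.
- **Use Case**: Sentiment analysis, where earlier words change the meaning of later ones (e.g., "not" before "good").
- **Challenge**: **Vanishing gradients**. During training, the error signal shrinks each time it is propagated back through a time step, which makes long-term dependencies (e.g., context from 50 words ago) hard to learn.
- **Solution**: Variants like **LSTM** (Long Short-Term Memory) add gates and a separate cell state that can carry information across many time steps.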
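Why do gradients vanish? For a toy scalar RNN `h_t = tanh(w·h_{t-1} + x_t)`, the chain rule gives `dh_t/dh_{t-1} = w·(1 - h_t²)`, and the gradient with respect to the first state is the product of these factors over all steps. The sketch below (with an arbitrary `w = 0.5` and constant toy inputs) shows that product collapsing toward zero. Real RNNs multiply Jacobian matrices instead of scalars, but the effect is similar when those matrices shrink vectors; when they stretch them, gradients can explode instead.

```python
import numpy as np

# Toy scalar RNN: h_t = tanh(w * h_{t-1} + x_t)
w = 0.5
x = np.full(50, 0.1)  # 50 identical toy inputs

h = 0.0
grad = 1.0   # running product: d h_t / d h_0
grads = []
for x_t in x:
    h = np.tanh(w * h + x_t)
    # local derivative d h_t / d h_{t-1} = w * (1 - h_t**2), here at most 0.5
    grad *= w * (1.0 - h**2)
    grads.append(grad)

for step in (1, 5, 10, 50):
    print(f"after {step:2d} steps: dh_t/dh_0 = {grads[step - 1]:.2e}")
```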

**Analogy**: Reading a book and remembering key plot points as you go.
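The "memory" of an LSTM lives in its cell state `c_t`, which is updated additively (`c_t = f * c_{t-1} + i * g`) rather than squashed through a `tanh` at every step, so information can survive many steps when the forget gate `f` stays close to 1. The NumPy sketch below follows the standard LSTM gate equations; the gate order inside the stacked weight matrices (input, forget, candidate, output) and the random weights are our own choices for illustration, not taken from Keras.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,)
    Rows are stacked as [input gate, forget gate, candidate, output gate] (our assumption).
    """
    z = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i = sigmoid(i)   # input gate: how much new information to write
    f = sigmoid(f)   # forget gate: how much of the old cell state to keep
    g = np.tanh(g)   # candidate values to write into the cell
    o = sigmoid(o)   # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g     # additive cell-state update
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step / next layer
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 16, 32   # same sizes as Embedding(..., 16) and LSTM(32) in the model below
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(20, input_dim)):  # 20 time steps, like max_len = 20
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)  # (32,) (32,)
```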

### 3.1 Hands-On: Sentiment Analysis with LSTM
Let’s train an LSTM model for sentiment analysis using a small dataset. We’ll:
1. Preprocess text (convert to sequences).
2. Build an LSTM model.
3. Train and evaluate it.

#### Step 1: Load and Preprocess Data
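We’ll use a toy dataset of six movie reviews. That is far too little data to train a reliable model, so the goal here is to walk through the workflow, not to reach good accuracy.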

```python
# Toy dataset
data = {
    "review": [
        "I loved the movie it was great",
        "Terrible film so boring and dull",
        "Amazing story and fantastic acting",
        "I hated this movie it was awful",
        "Really enjoyed the plot and characters",
        "Worst movie ever do not watch"
    ],
    "sentiment": [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative
}
df = pd.DataFrame(data)
print(df)

# Tokenize and pad sequences
max_words = 1000  # Vocabulary size
max_len = 20      # Max sequence length

tokenizer = Tokenizer(num_words=max_words, oov_token="<OOV>")
tokenizer.fit_on_texts(df["review"])
sequences = tokenizer.texts_to_sequences(df["review"])
padded_sequences = pad_sequences(sequences, maxlen=max_len, padding="post", truncating="post")

# Labels
y = df["sentiment"].values

print("\nPadded sequences:\n", padded_sequences)
```

**What’s Happening?**
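- **Tokenizer**: Converts each word to an integer index. Index 1 is reserved for the `<OOV>` (out-of-vocabulary) token, and more frequent words get smaller indices; print `tokenizer.word_index` to see the mapping.
- **Padding**: Makes every sequence `max_len` (20) integers long by appending zeros after the last word (`padding="post"`); longer reviews would be cut off at the end (`truncating="post"`).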
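To demystify these two steps, here is a simplified pure-Python sketch of the same idea. It is not Keras's implementation: it only lowercases and splits on whitespace (Keras's `Tokenizer` also strips punctuation and applies `num_words`), and the helper names `fit_word_index`, `texts_to_ids`, and `pad_post` are hypothetical. On this punctuation-free dataset it should assign the same frequency-ordered indices.

```python
from collections import Counter

def fit_word_index(texts, oov_token="<OOV>"):
    """Give index 1 to the OOV token, then 2, 3, ... to words by descending frequency."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # sorted() is stable, so words with equal counts keep their first-appearance order
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: idx for idx, word in enumerate([oov_token] + ranked, start=1)}

def texts_to_ids(texts, word_index, oov_token="<OOV>"):
    """Replace each word with its index; words never seen during fitting map to the OOV index."""
    oov_id = word_index[oov_token]
    return [[word_index.get(w, oov_id) for w in text.lower().split()] for text in texts]

def pad_post(sequences, maxlen):
    """Cut each sequence to maxlen, then append zeros up to maxlen (post padding)."""
    padded = []
    for seq in sequences:
        seq = seq[:maxlen]
        padded.append(seq + [0] * (maxlen - len(seq)))
    return padded

reviews = [
    "I loved the movie it was great",
    "Terrible film so boring and dull",
    "Amazing story and fantastic acting",
    "I hated this movie it was awful",
    "Really enjoyed the plot and characters",
    "Worst movie ever do not watch",
]
word_index = fit_word_index(reviews)
print({w: word_index[w] for w in ["<OOV>", "movie", "and", "i"]})

example_ids = texts_to_ids(["This movie was absolutely fantastic"], word_index)
example_padded = pad_post(example_ids, maxlen=10)
print(example_padded)  # "absolutely" never appears in the reviews, so it becomes 1 (OOV)
```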

#### Step 2: Build and Train LSTM Model

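```python
# Split data; stratify=y puts one positive and one negative review in the 2-review test set
X_train, X_test, y_train, y_test = train_test_split(
    padded_sequences, y, test_size=0.2, random_state=42, stratify=y
)

# Build model
model = Sequential([
    # Convert word indices to dense 16-d vectors; mask_zero=True makes the LSTM skip padding zeros
    Embedding(max_words, 16, input_length=max_len, mask_zero=True),
    LSTM(32, return_sequences=False),  # LSTM layer; returns only the final hidden state
    Dense(1, activation="sigmoid")     # Output: probability of positive sentiment
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# Train for a few epochs. With only 4 training reviews the model will mostly guess, and the
# test set doubles as validation data here, which is acceptable only for a toy demo.
history = model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test), verbose=1)
```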
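As a worked example for reading `model.summary()`, here is the parameter arithmetic for the three layers above, using the sizes from the code (1000-word vocabulary, 16-d embeddings, 32 LSTM units). The LSTM formula assumes Keras's layout of four gates, each with input weights, recurrent weights, and one bias vector; the totals should match the "Param #" column.

```python
# Sizes copied from the model definition above
max_words, embed_dim, lstm_units = 1000, 16, 32

embedding_params = max_words * embed_dim   # one 16-d vector per word index
# 4 gates x (input weights + recurrent weights + bias)
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)
dense_params = lstm_units * 1 + 1          # 32 weights + 1 bias for the single output unit

total = embedding_params + lstm_params + dense_params
print(f"Embedding: {embedding_params:,}")
print(f"LSTM:      {lstm_params:,}")
print(f"Dense:     {dense_params:,}")
print(f"Total:     {total:,}")
```

Note that the embedding table holds most of the parameters, even though this toy vocabulary uses only a few dozen of its 1000 rows.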

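#### Step 3: Evaluate Model

With only two test reviews, accuracy can only be 0.0, 0.5, or 1.0, so treat these numbers as a check that the code runs rather than a measure of model quality.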

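```python
# Predict: threshold the positive-class probability at 0.5 and flatten to a 1-D array
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()

# Evaluate; labels=[0, 1] keeps the report valid even if one class never appears in the predictions
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(
    y_test, y_pred, labels=[0, 1], target_names=["Negative", "Positive"], zero_division=0
))
```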

#### Step 4: Test a New Review
**Your Turn**: Predict the sentiment of a new review.

```python
# Test new review
new_review = "This movie was absolutely fantastic"
new_sequence = tokenizer.texts_to_sequences([new_review])
new_padded = pad_sequences(new_sequence, maxlen=max_len, padding="post", truncating="post")
prediction = model.predict(new_padded)[0][0]
print(f"Review: {new_review}\nSentiment: {'Positive' if prediction > 0.5 else 'Negative'} (Probability: {prediction:.2f})")

# Your code here
your_review = "Your review here"  # Write your own review
your_sequence = tokenizer.texts_to_sequences([your_review])
your_padded = pad_sequences(your_sequence, maxlen=max_len, padding="post", truncating="post")
your_prediction = model.predict(your_padded)[0][0]
print(f"Your review: {your_review}\nSentiment: {'Positive' if your_prediction > 0.5 else 'Negative'} (Probability: {your_prediction:.2f})")
```

**Question**: How does the LSTM compare to Class 1’s BoW model? (Hint: Think about context.)

## 4. Introduction to Transformers
Transformers are the backbone of modern NLP (e.g., BERT, GPT). Unlike RNNs, they:
- Use **attention** to focus on important words, no matter how far apart they are in the sentence.
- Process all words of a sequence in parallel instead of one step at a time, adding position information so that word order is not lost.
- Excel at capturing long-range dependencies.
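Because a transformer looks at all words at once, it needs some other way to know which word came first. One classic option is the sinusoidal positional encoding from the original Transformer paper, sketched below in NumPy: `PE[pos, 2i] = sin(pos / 10000^(2i/d))` and `PE[pos, 2i+1] = cos(pos / 10000^(2i/d))`, added to each word's embedding. BERT and DistilBERT instead learn their position embeddings during training, so treat this as an illustration of the idea rather than what the model used later in this notebook does.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix; assumes d_model is even."""
    positions = np.arange(seq_len)[:, None]       # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2): the even indices 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)                  # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=20, d_model=16)
print(pe.shape)  # (20, 16): one position vector per word slot
# Each position gets a distinct pattern; it is added to that word's embedding
print(np.round(pe[:3, :4], 3))
```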

**Analogy**: Instead of reading a book page by page (RNN), transformers skim the whole book and highlight key parts (attention).

**Key Concept**:
- **Attention**: Weighs which words matter most (e.g., in "The movie wasn’t great", "wasn’t" affects "great").
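The attention weights come from a short formula, scaled dot-product attention: `softmax(Q·Kᵀ / sqrt(d_k))·V`, where the queries `Q`, keys `K`, and values `V` are linear projections of the word vectors. The NumPy sketch below computes it for the four words of the example sentence. The embeddings and projection matrices are random stand-ins, so the printed weights carry no linguistic meaning; a trained model would learn projections that, for instance, make "great" attend strongly to "wasn't".

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weight matrix."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_words, n_words): relevance of every word to every word
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
words = ["the", "movie", "wasn't", "great"]
d_model = 8
X = rng.normal(size=(len(words), d_model))   # random stand-ins for word embeddings

# Learned projection matrices in a real transformer; random here
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)

print(output.shape)  # (4, 8): one context-aware vector per word
for word, row in zip(words, weights):
    print(f"{word:>7}: {np.round(row, 2)}")
```

Every word attends to every other word in a single matrix multiplication, which is why no information has to be relayed step by step as in an RNN.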

We won’t build a transformer from scratch (it’s complex!), but we’ll use a pre-trained one.

### 4.1 Hands-On: Sentiment Analysis with Transformers
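Let’s use **Hugging Face’s `pipeline`** to perform sentiment analysis with a pre-trained transformer: DistilBERT fine-tuned on SST-2, a movie-review sentiment dataset. The first run downloads the model weights, so it needs an internet connection.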

```python
# Load transformer pipeline
sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Test reviews
reviews = [
    "I absolutely loved this movie, it was thrilling!",
    "This film was a complete waste of time."
]

# Predict sentiment
results = sentiment_analyzer(reviews)
for review, result in zip(reviews, results):
    print(f"Review: {review}\nSentiment: {result['label']} (Score: {result['score']:.2f})\n")
```

**Your Turn**: Test the transformer on two reviews of your own.

```python
# Your code here
your_reviews = [
    "Your first review here",  # Write a review
    "Your second review here"  # Write another
]
your_results = sentiment_analyzer(your_reviews)
for review, result in zip(your_reviews, your_results):
    print(f"Your review: {review}\nSentiment: {result['label']} (Score: {result['score']:.2f})\n")
```

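**Question**: How does the transformer’s output compare to the LSTM’s? (Hint: Look at confidence scores. The pipeline’s `score` is its confidence in the predicted `label`, whereas the LSTM cell printed the probability that the review is positive.)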
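To compare the two models on the same scale, you can convert each pipeline result into a probability of the positive class. The helper below is a hypothetical sketch that assumes a two-class model with labels `POSITIVE` and `NEGATIVE` whose probabilities sum to 1 (true for a softmax over two classes); the example results are made up for illustration, not real model output.

```python
def positive_probability(result):
    """Convert a pipeline result {'label': ..., 'score': ...} into P(positive).

    Assumes a binary model with labels 'POSITIVE' and 'NEGATIVE'.
    """
    if result["label"] == "POSITIVE":
        return result["score"]
    return 1.0 - result["score"]

# Hypothetical pipeline outputs, made up for illustration
example_results = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.75},
]
for r in example_results:
    print(r["label"], "->", f"P(positive) = {positive_probability(r):.2f}")
```

In the notebook, calling `positive_probability` on each entry of `results` gives numbers directly comparable to the LSTM's `prediction`.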

## 5. RNNs vs. Transformers
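- **RNNs**:
  - Can work well on small datasets and simpler tasks, especially when no suitable pre-trained model exists.
  - Struggle with long sequences (vanishing gradients) and must process tokens one after another, so work cannot be parallelized across a sequence.
  - Small models are cheap and fast to train on modest hardware.
- **Transformers**:
  - Handle long contexts better, because attention connects any two positions directly.
  - Training from scratch requires a lot of data and compute, and the cost of attention grows quadratically with sequence length.
  - Pre-trained models (like BERT) are powerful out of the box and can be fine-tuned with relatively little labeled data.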
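A rough count makes the compute trade-off concrete. Self-attention computes one score for every pair of tokens (`n * n` per head, per layer), while an RNN performs `n` hidden-state updates that must run in order. The sketch below ignores the size of each individual operation, so it only illustrates how the two costs scale with sequence length `n`.

```python
costs = {}
for n in (20, 128, 512, 4096):
    attention_scores = n * n   # one score per (query token, key token) pair
    rnn_steps = n              # one sequential hidden-state update per token
    costs[n] = (attention_scores, rnn_steps)
    print(f"n = {n:>5}: attention scores = {attention_scores:>12,}   sequential RNN steps = {rnn_steps:>5,}")
```

The attention scores can all be computed in parallel on a GPU, which is why transformers train quickly despite the larger count, but memory use still grows with `n * n`.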

**Discussion**:
- When would you use an RNN over a transformer, or vice versa?
- How might these models help with your capstone project?

## 6. Wrap-Up
You’ve learned:
- How RNNs (like LSTM) process sequences and capture context.
- The basics of transformers and their attention mechanism.
- How to apply both to sentiment analysis.

**Homework**:
- Explore Hugging Face: [Hugging Face NLP Course](https://huggingface.co/course).
- Start planning your capstone project dataset and task (e.g., text classification, generation).

**Deliverables**:
- Submit this notebook with completed "Your Turn" sections.
- Write a short paragraph on one challenge you faced with the LSTM or transformer.

**Questions?** Reach out to the instructor or discuss with peers.

Great work, and see you in Class 3 for model deployment and ethics! 🚀