<div style="  background: linear-gradient(145deg, #0f172a, #1e293b);  border: 4px solid transparent;  border-radius: 14px;  padding: 18px 22px;  margin: 12px 0;  font-size: 26px;  font-weight: 600;  color: #f8fafc;  box-shadow: 0 6px 14px rgba(0,0,0,0.25);  background-clip: padding-box;  position: relative;">  <div style="    position: absolute;    inset: 0;    padding: 4px;    border-radius: 14px;    background: linear-gradient(90deg, #06b6d4, #3b82f6, #8b5cf6);    -webkit-mask:       linear-gradient(#fff 0 0) content-box,       linear-gradient(#fff 0 0);    -webkit-mask-composite: xor;    mask-composite: exclude;    pointer-events: none;  "></div>    <b>Recurrent Neural Networks (RNNs) for Language Modeling with Keras</b>    <br/>  <span style="color:#9ca3af; font-size: 18px; font-weight: 400;">(From Theory to Implementation using TensorFlow/Keras)</span></div>

## Table of Contents

1. [Introduction to the Course](#section-1)
2. [Applications of Machine Learning to Text Data](#section-2)
3. [Recurrent Neural Networks (RNNs) & Sequence Models](#section-3)
4. [Introduction to Language Models](#section-4)
5. [Preprocessing Text Data](#section-5)
6. [Introduction to RNNs inside Keras](#section-6)
7. [Building, Training, and Evaluating Models](#section-7)
8. [Full Example: IMDB Sentiment Classification](#section-8)
9. [Conclusion](#section-9)

---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 1. Introduction to the Course</span><br>

Welcome to the comprehensive guide on **Recurrent Neural Networks (RNNs) for Language Modeling with Keras**. This notebook covers the fundamental concepts of processing text data, understanding sequence models, and implementing them using the Keras deep learning library.

### The Abundance of Text Data
Text data is ubiquitous in the modern digital landscape. Before diving into complex models, it is essential to recognize where this data comes from. Massive amounts of unstructured text are generated every second via:

*   **News Outlets**: Yahoo! News, Google News.
*   **Social Media**: Twitter (X), Facebook, Weibo.
*   **Search Engines**: Google search queries.
*   **General Web Content**: Articles, blogs, and forums.

This data contains rich semantic structures, rules, and meanings that algorithms can learn to perform specific tasks.

---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 2. Applications of Machine Learning to Text Data</span><br>

Machine learning applied to text data generally falls into four major categories. Understanding these applications helps in selecting the right architecture.

### 1. Sentiment Analysis
Determining the emotional tone behind a body of text.
*   **Input**: "I loved this movie!"
*   **Output**: Positive (👍) or Negative (👎).

### 2. Multi-class Classification
Categorizing text into one of many specific topics.
*   **Input**: A news article.
*   **Output**: Politics, Sports, Science, Finance, etc.

### 3. Text Generation
Predicting the next word or sequence of words. This powers features such as sentence completion ("Smart Compose") and suggested replies in email clients.
*   **Context**: "Next World Cup is going to be awesome..."
*   **Suggested Replies**: "Yes!", "No, I haven't.", "Not yet!"

### 4. Neural Machine Translation (NMT)
Translating text from one language to another using neural networks.
*   **Input (PT)**: "Vamos jogar futebol esse domingo?"
*   **Output (EN)**: "Shall we play soccer this Sunday?"

---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 3. Recurrent Neural Networks (RNNs) & Sequence Models</span><br>

RNNs are designed to handle sequential data. Unlike traditional feedforward networks, RNNs have a "memory" that captures information about what has been calculated so far.

### The RNN Cell
In an RNN, the output of a cell at time step $t$ depends on the input at time $t$ and the hidden state from time $t-1$. Because the same cell, with the same weights, is applied at every time step, the network shares its parameters across the whole sequence.

<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip:</b> In RNN architectures, weights are <b>shared</b> across all time steps. This significantly reduces the number of parameters compared to treating every word position as a separate feature. </div>
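
The recurrence and the weight sharing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Keras implementation: the `tanh` activation, the sizes, and the names `W_x`, `W_h`, `b`, and `rnn_step` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, seq_len = 4, 3, 5  # illustrative sizes (assumed)

# One set of weights, reused at every time step
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrence step: combine the current input with the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

inputs = rng.normal(size=(seq_len, input_size))  # a dummy sequence of 5 input vectors
h = np.zeros(hidden_size)                        # initial hidden state
for x_t in inputs:
    h = rnn_step(x_t, h)                         # same weights at every step

n_params = W_x.size + W_h.size + b.size
print("Final hidden state:", h)
print("Number of parameters:", n_params)
```

The parameter count depends only on the input and hidden sizes, so a sequence of 5 steps and a sequence of 500 steps would use the same weights.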

### Sequence-to-Sequence Models

We can configure RNNs in different ways depending on the input and output requirements:

#### A. Many-to-One: Classification
Used for Sentiment Analysis.
*   **Input**: Sequence of words ($X_1, X_2, X_3, X_4$).
*   **Processing**: The RNN processes the sequence.
*   **Output**: A single prediction at the end ($Y_{pred}$).
*   **Decision Rule**:
    *   If $Y_{pred} > 0.5 \rightarrow$ Positive.
    *   Else $\rightarrow$ Negative.

#### B. Many-to-Many: Text Generation
Used for Language Modeling.
*   **Input**: Sequence of words.
*   **Output**: A sequence where $Y_t$ is the prediction for the next word $X_{t+1}$.
*   **Example**:
    *   Input: "The" $\rightarrow$ Output: "weather"
    *   Input: "weather" $\rightarrow$ Output: "is"
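
The input/target relationship $Y_t \approx X_{t+1}$ amounts to shifting the sequence by one position. A small sketch, using a made-up sentence that extends the example above:

```python
words = "the weather is nice today".split()  # hypothetical sentence

# Inputs are all words except the last; targets are the same sequence shifted by one
inputs = words[:-1]
targets = words[1:]

for x_t, y_t in zip(inputs, targets):
    print(f"Input: {x_t!r} -> Target: {y_t!r}")
```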

#### C. Many-to-Many: Neural Machine Translation
Uses an **Encoder-Decoder** architecture.
*   **Encoder**: Processes the input sentence (e.g., "Vamos jogar futebol") and compresses it into a context vector.
*   **Decoder**: Takes the context vector and generates the translated sentence (e.g., "Let's play soccer").
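
To make the data flow concrete, here is a rough NumPy sketch of an encoder-decoder with untrained, random weights. Everything in it (the sizes, the `encode` and `decode` helpers, the greedy `argmax` decoding, the start token id of 0) is an assumption for illustration; a real translation model would be trained and would typically use LSTM or GRU cells.

```python
import numpy as np

rng = np.random.default_rng(1)
emb, hidden, vocab_out = 4, 6, 5  # illustrative sizes (assumed)

# Encoder and decoder each have their own randomly initialised weights
enc_Wx, enc_Wh = rng.normal(size=(hidden, emb)), rng.normal(size=(hidden, hidden))
dec_Wx, dec_Wh = rng.normal(size=(hidden, emb)), rng.normal(size=(hidden, hidden))
dec_out = rng.normal(size=(vocab_out, hidden))  # hidden state -> target-vocabulary scores
target_emb = rng.normal(size=(vocab_out, emb))  # embeddings for target-language tokens

def encode(source_vectors):
    """Run the encoder over the source sequence and return the final hidden state."""
    h = np.zeros(hidden)
    for x_t in source_vectors:
        h = np.tanh(enc_Wx @ x_t + enc_Wh @ h)
    return h  # this is the context vector

def decode(context, start_token=0, max_len=4):
    """Generate token ids greedily, starting from the context vector."""
    h, token, output = context, start_token, []
    for _ in range(max_len):
        h = np.tanh(dec_Wx @ target_emb[token] + dec_Wh @ h)
        token = int(np.argmax(dec_out @ h))  # pick the highest-scoring token
        output.append(token)
    return output

source = rng.normal(size=(3, emb))  # stands in for three embedded source words
context = encode(source)
translation_ids = decode(context)
print("Context vector shape:", context.shape)
print("Generated token ids:", translation_ids)
```

Because the weights are random, the generated ids are meaningless; the point is only that the decoder's sole view of the source sentence is the context vector.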

---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 4. Introduction to Language Models</span><br>

A language model learns the probability of a sequence of words. It answers the question: *How likely is this sentence to occur?*

### Sentence Probability Models

Given the sentence: **"I loved this movie"**

#### 1. Unigram Model
Assumes words are independent.
$$ P(\text{sentence}) = P(\text{I}) \times P(\text{loved}) \times P(\text{this}) \times P(\text{movie}) $$

#### 2. N-gram Models
Assumes the probability of a word depends on the previous $N-1$ words.

*   **Bigram (N=2)**:
    $$ P(\text{sentence}) = P(\text{I}) \times P(\text{loved} | \text{I}) \times P(\text{this} | \text{loved}) \times P(\text{movie} | \text{this}) $$

*   **Trigram (N=3)**:
    $$ P(\text{sentence}) = P(\text{I}) \times P(\text{loved} | \text{I}) \times P(\text{this} | \text{I loved}) \times \dots $$
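
As a rough illustration, the unigram and bigram formulas can be estimated by counting on a toy corpus. The corpus string and the helper names `p_unigram` and `p_bigram` are assumptions for this sketch, and no smoothing is applied.

```python
from collections import Counter

corpus = "i loved this movie i loved this actor".split()  # toy corpus (assumed)

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus[:-1], corpus[1:]))
total = len(corpus)

def p_unigram(word):
    """P(word), estimated as count(word) / total number of words."""
    return unigram_counts[word] / total

def p_bigram(word, prev):
    """P(word | prev), estimated as count(prev, word) / count(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

sentence = "i loved this movie".split()

# Unigram model: product of independent word probabilities
p_uni = 1.0
for w in sentence:
    p_uni *= p_unigram(w)

# Bigram model: first word, then each word conditioned on the previous one
p_bi = p_unigram(sentence[0])
for prev, w in zip(sentence[:-1], sentence[1:]):
    p_bi *= p_bigram(w, prev)

print("Unigram probability:", p_uni)  # 0.001953125
print("Bigram probability:", p_bi)    # 0.125
```

The bigram model assigns a much higher probability because, in this corpus, "loved" always follows "i" and "this" always follows "loved".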

#### 3. Skip-gram
Looks at the context surrounding a word (both before and after).
$$ P(\text{sentence}) = P(\text{context of I} | \text{I}) \times P(\text{context of loved} | \text{loved}) \dots $$

#### 4. Neural Networks (RNNs)
In Deep Learning, the probability is often computed using a **Softmax** function on the output layer, which normalizes the outputs into a probability distribution over the vocabulary.
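
A minimal NumPy sketch of softmax over a tiny vocabulary follows. The vocabulary and the raw scores (`logits`) are made-up values; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    shifted = logits - np.max(logits)  # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

vocab = ["i", "loved", "this", "movie"]   # hypothetical vocabulary
logits = np.array([0.5, 2.0, 1.0, 3.0])   # hypothetical output-layer scores

probs = softmax(logits)
predicted = vocab[int(np.argmax(probs))]

print("Probabilities:", probs)
print("Sum:", probs.sum())
print("Most likely next word:", predicted)
```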

---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 5. Preprocessing Text Data</span><br>

Before feeding text into a neural network, it must be converted into numbers. This involves creating a vocabulary and mapping words to integers.

### Step 1: Building Vocabulary Dictionaries
We need two dictionaries:
1.  `word_to_index`: Maps a word to a unique integer.
2.  `index_to_word`: Maps the integer back to the word.



```python
# Vocabulary dictionaries (adapted from the slides)
text = "i loved this movie i loved this actor"  # Dummy text for demonstration

# Get the unique words (sorted so the mapping is reproducible across runs)
unique_words = sorted(set(text.split(' ')))

# Create dictionary: word is key, index is value
word_to_index = {word: idx for idx, word in enumerate(unique_words)}

# Create dictionary: index is key, word is value
index_to_word = {idx: word for idx, word in enumerate(unique_words)}

print("Word to Index:", word_to_index)
print("Index to Word:", index_to_word)
```



### Step 2: Creating Input (X) and Output (y)
For a language model, we often use a sliding window. If `sentence_size` is 3, we use 3 words to predict the 4th.



```python
# Initialize variables
X = []
y = []

sentence_size = 3
step = 1

# Work with a list of words (not characters) for this example
words = text.split(' ')

# Loop over the words
# We stop at len(words) - sentence_size to ensure every window has a target label
for i in range(0, len(words) - sentence_size, step):
    # Append the sequence of words (inputs)
    X.append(words[i : i + sentence_size])
    # Append the next word (target)
    y.append(words[i + sentence_size])

# Example output
print(f"Input X[0]: {X[0]}")
print(f"Target y[0]: {y[0]}")

# Converting to indices (preprocessing for the model)
X_indices = []
y_indices = []

for i in range(len(X)):
    sentence_indices = [word_to_index[w] for w in X[i]]
    target_index = word_to_index[y[i]]
    X_indices.append(sentence_indices)
    y_indices.append(target_index)

print(f"Numerical X[0]: {X_indices[0]}")
print(f"Numerical y[0]: {y_indices[0]}")
```



### Step 3: Transforming New Texts
When the model is in production, new raw text must undergo the exact same transformation.



```python
new_text = ["i loved this"]  # Example new sentence

new_text_split = []

for sentence in new_text:
    sent_split = []
    for wd in sentence.split(' '):
        # Only words already in the vocabulary can be mapped;
        # unknown words are skipped here to avoid a KeyError
        if wd in word_to_index:
            ix = word_to_index[wd]
            sent_split.append(ix)
    new_text_split.append(sent_split)

print("New Text Indices:", new_text_split)
```
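
Silently dropping unknown words changes the length of the sentence. A common alternative, sketched here under the assumption that the vocabulary reserves a special `<UNK>` entry, is to map every out-of-vocabulary word to that index. The names `vocab_with_unk`, `word_to_index_unk`, and `encode` are hypothetical.

```python
# Hypothetical vocabulary with a reserved entry for unknown words
vocab_with_unk = ["<UNK>", "actor", "i", "loved", "movie", "this"]
word_to_index_unk = {word: idx for idx, word in enumerate(vocab_with_unk)}
unk_index = word_to_index_unk["<UNK>"]

def encode(sentence):
    """Map each word to its index, falling back to the <UNK> index."""
    return [word_to_index_unk.get(wd, unk_index) for wd in sentence.split(' ')]

encoded = encode("i loved this film")  # "film" is not in the vocabulary
print(encoded)  # [2, 3, 5, 0]
```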



---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 6. Introduction to RNNs inside Keras</span><br>

**Keras** is a high-level neural networks API, written in Python and capable of running on top of TensorFlow. It is designed to enable fast experimentation.

### Key Keras Modules

| Module | Description |
| :--- | :--- |
| `keras.models` | Contains `Sequential` (linear stack of layers) and `Model` (functional API). |
| `keras.layers` | Contains building blocks like `LSTM`, `GRU`, `Dense`, `Embedding`, `Dropout`. |
| `keras.preprocessing` | Utilities for sequence padding (`pad_sequences`) and text tokenization. |
| `keras.datasets` | Pre-loaded datasets like IMDB Movie Reviews and Reuters Newswire. |

### Important Layers for NLP

1.  **Embedding**: Turns positive integers (indexes) into dense vectors of fixed size.
2.  **LSTM / GRU**: Recurrent layers that handle sequence data.
3.  **Dense**: Regular fully connected layer.
4.  **Bidirectional**: Wrapper that allows RNNs to process input from start-to-end and end-to-start.
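
Conceptually, an embedding layer is a trainable lookup table: row $i$ of a matrix holds the vector for word index $i$. The NumPy sketch below shows only the lookup, with random values standing in for learned weights; the sizes and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, embedding_dim = 6, 4  # illustrative sizes (assumed)

# One row per word index; in a real model these values are learned
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

word_indices = np.array([2, 0, 5])          # an encoded three-word sentence
vectors = embedding_matrix[word_indices]    # look up one row per index

print("Input shape:", word_indices.shape)   # (3,)
print("Output shape:", vectors.shape)       # (3, 4)
```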

### Padding Sequences
Keras expects every sequence in a batch to have the same length. `pad_sequences` enforces this by padding shorter sequences with zeros and truncating longer ones.



```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Example: Different length sentences
sequences = [
    [1, 2, 3],       # Length 3
    [1, 2, 3, 4, 5], # Length 5
    [1]              # Length 1
]

# Pad to maxlen=3 (truncating longer ones, padding shorter ones)
# By default both padding and truncation happen at the start of the sequence
padded = pad_sequences(sequences, maxlen=3)
print("Padded Sequences:\n", padded)
```
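
To see what the default `pad_sequences` behaviour does to each list, the same logic can be mimicked in plain Python. The helper `pad_pre` is hypothetical and only covers the default case (zeros added at the front, extra items dropped from the front).

```python
def pad_pre(seqs, maxlen, value=0):
    """Mimic pre-padding and pre-truncation on plain Python lists."""
    padded = []
    for seq in seqs:
        kept = seq[-maxlen:]  # keep only the last maxlen items
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded

sequences = [[1, 2, 3], [1, 2, 3, 4, 5], [1]]
result = pad_pre(sequences, maxlen=3)
print(result)  # [[1, 2, 3], [3, 4, 5], [0, 0, 1]]
```

Note that the five-element sequence loses its first two items, not its last two.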



---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 7. Building, Training, and Evaluating Models</span><br>

Here is the standard workflow for creating a neural network in Keras.

### 1. Creating the Model
We use the `Sequential` API to stack layers.



```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Instantiate
model = Sequential()

# Add layers
# Input(shape=(100,)) declares that each sample has 100 features
model.add(Input(shape=(100,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # Binary classification output

# Compile
# binary_crossentropy is the usual loss for a sigmoid output with 0/1 labels
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

print("Model compiled successfully.")
```



### 2. Training the Model
We use the `.fit()` method.

<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip:</b> <br><b>Epochs</b>: One pass over the entire dataset.<br><b>Batch Size</b>: Number of samples per gradient update. </div>
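
The two settings together determine how many gradient updates a training run performs. A quick arithmetic sketch, assuming 1,000 samples, a batch size of 32, and 2 epochs:

```python
import math

n_samples, batch_size, epochs = 1000, 32, 2  # assumed values for illustration

steps_per_epoch = math.ceil(n_samples / batch_size)            # batches per pass
total_updates = steps_per_epoch * epochs                       # gradient updates overall
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size

print("Steps per epoch:", steps_per_epoch)    # 32
print("Total updates:", total_updates)        # 64
print("Size of last batch:", last_batch_size) # 8
```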



```python
import numpy as np

# Mock data for demonstration
X_train = np.random.random((1000, 100))
y_train = np.random.randint(2, size=(1000, 1))

# Train
model.fit(X_train, y_train, epochs=2, batch_size=32)
```



### 3. Evaluation and Prediction
Once trained, we check performance on test data and make predictions.



```python
# Mock test data
X_test = np.random.random((200, 100))
y_test = np.random.randint(2, size=(200, 1))

# Evaluate
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Predict
new_data = np.random.random((2, 100))
predictions = model.predict(new_data)
print("Predictions:", predictions)
```
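
With a sigmoid output, each prediction is a probability between 0 and 1, so a final step is needed to turn it into a class label. A small sketch using made-up probabilities and the 0.5 threshold from the many-to-one decision rule:

```python
import numpy as np

# Hypothetical sigmoid outputs with shape (n_samples, 1)
example_predictions = np.array([[0.91], [0.12], [0.50]])

# Strictly greater than 0.5 counts as positive
labels = (example_predictions > 0.5).astype(int).ravel()
names = ["Positive" if label == 1 else "Negative" for label in labels]

print(labels)  # [1 0 0]
print(names)   # ['Positive', 'Negative', 'Negative']
```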



---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 8. Full Example: IMDB Sentiment Classification</span><br>

This section combines everything into a full RNN example using the IMDB dataset. The goal is to classify movie reviews as positive or negative.

### The Architecture
1.  **Embedding Layer**: Converts word indices to vectors (Vocab size 10,000 -> Vector size 128).
2.  **LSTM Layer**: 128 units, with dropout to prevent overfitting.
3.  **Dense Layer**: 1 unit with Sigmoid activation (output 0 to 1).
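
The size of this architecture can be worked out by hand. The sketch below assumes a standard LSTM cell (four gates, each with input weights, recurrent weights, and a bias) and a dense layer with a bias term.

```python
vocab_size, embedding_dim, lstm_units = 10_000, 128, 128

# Embedding: one vector of size embedding_dim per vocabulary entry
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)

# Dense: one weight per LSTM unit plus one bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params

print("Embedding:", embedding_params)  # 1280000
print("LSTM:", lstm_params)            # 131584
print("Dense:", dense_params)          # 129
print("Total:", total_params)          # 1411713
```

Most of the parameters sit in the embedding table, which is why the vocabulary size is capped.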



```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

# 1. Load Data (Limiting to top 10,000 words)
max_features = 10000
maxlen = 80  # Keep at most this many tokens per review (pad_sequences keeps the last maxlen by default)

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

# 2. Pad Sequences
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

# 3. Build and Compile the Model
print('Building model...')
model = Sequential()
model.add(Embedding(max_features, 128))
# dropout applies to the inputs, recurrent_dropout to the recurrent connections;
# note that recurrent_dropout > 0 disables the fast cuDNN kernel on GPU
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# 4. Training
print('Training...')
# Using a single epoch for demonstration speed
model.fit(x_train, y_train, batch_size=32, epochs=1, validation_data=(x_test, y_test))

# 5. Evaluation (the first returned value is the loss)
score, acc = model.evaluate(x_test, y_test, batch_size=32)
print('Test score:', score)
print('Test accuracy:', acc)
```



---

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 9. Conclusion</span><br>

In this notebook, we have traversed the landscape of Recurrent Neural Networks for language modeling.

**Key Takeaways:**
1.  **Text Data is Sequential**: Unlike images, text has a temporal dimension where order matters.
2.  **RNNs are Specialized**: They maintain a hidden state (memory) to process sequences, making them ideal for NLP.
3.  **Preprocessing is Vital**: Raw text must be tokenized, indexed, and padded before entering a network.
4.  **Keras Simplifies Deep Learning**: With just a few lines of code, we can combine an Embedding layer and an LSTM into a working sentiment classifier.

**Next Steps:**
*   Experiment with **Bidirectional LSTMs** to capture context from both directions.
*   Try **Text Generation** by changing the output layer to a Softmax over the vocabulary and training with a categorical cross-entropy loss.
*   Explore **Transformers** (like BERT or GPT), which have largely superseded RNNs for complex tasks, though RNNs remain a foundational concept.
