# Homework 8 - Recurrent Neural Networks (RNNs)

In this assignment, you will train and use a "char-RNN" on the Shakespeare dataset. The name comes from [this famous blog post](https://karpathy.github.io/2015/05/21/rnn-effectiveness/) by Andrej Karpathy, where it refers to a character-level recurrent neural network language model. Andrej's original char-rnn is written in Torch, the predecessor to PyTorch, which is no longer commonly used. Fortunately, many other implementations of this model are available: for example, there is one (in both MXNet and PyTorch) in chapters 8 and 9 of [the textbook](https://d2l.ai/), and another PyTorch one [here](https://github.com/spro/char-rnn.pytorch). You can refer to these example implementations (or another one that you find) when completing this homework.

You will train both vanilla RNN and GRU models, and compare their performance. Additionally, you will experiment with a smaller dataset to observe the differences in training and results.

You will:
1. Download and tokenize the Shakespeare dataset at a character level.
2. Train a vanilla RNN on the Shakespeare dataset and report the training loss.
3. Generate samples from the trained vanilla RNN model.
4. Train a GRU RNN on the Shakespeare dataset and compare the results with the vanilla RNN.
5. Train an RNN on a smaller dataset and compare the results with the Shakespeare dataset.



## Task 1: Download and Tokenize the Shakespeare Dataset

In this task, you will download the Shakespeare dataset and tokenize it at a character level.

In [1]:
!pip install requests
!pip install torch

import re
import requests

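# Download the dataset (fail loudly on an HTTP error instead of tokenizing an error page)
url = 'https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt'
response = requests.get(url, timeout=30)
response.raise_for_status()
text = response.text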

# Remove non-alphabetical characters but keep whitespace (including newlines) so that
# words at line ends are not glued together; then lowercase and collapse whitespace to ' '
raw_dataset = ' '.join(re.sub(r'[^A-Za-z\s]+', '', text).lower().split())
# Maps token index to character (sorted so the mapping is reproducible across runs)
idx_to_char = sorted(set(raw_dataset))
# Maps character to token index
char_to_idx = {char: i for i, char in enumerate(idx_to_char)}
# Tokenize the dataset
corpus_indices = [char_to_idx[char] for char in raw_dataset]

print(f"Number of unique characters: {len(idx_to_char)}")
print(f"First 100 characters in the dataset: {raw_dataset[:100]}")
print(f"First 100 token indices: {corpus_indices[:100]}")


### Task 1.1: Instructions
- Install the required libraries (requests and torch)
- Download the Shakespeare dataset from the provided URL
- Remove non-alphabetical characters, lowercase the text, and collapse each run of whitespace (including newlines) into a single ' '
- Create a mapping from characters to indices and vice versa
- Tokenize the dataset using the character-to-index mapping

> This task helps in understanding the preprocessing steps required for character-level language modeling.
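The preprocessing steps listed above can be sketched on a short string without downloading anything. This is a minimal illustration, not the notebook's exact cell: the helper names `build_vocab`, `encode`, and `decode`, and the `_demo` variables, are hypothetical and exist only for this example.

```python
import re

def build_vocab(text):
    """Return (cleaned_text, idx_to_char, char_to_idx) for a character-level model."""
    # Drop everything except letters and whitespace, lowercase, collapse whitespace to ' '
    cleaned = ' '.join(re.sub(r'[^A-Za-z\s]+', '', text).lower().split())
    idx_to_char = sorted(set(cleaned))                         # index -> character
    char_to_idx = {ch: i for i, ch in enumerate(idx_to_char)}  # character -> index
    return cleaned, idx_to_char, char_to_idx

def encode(s, char_to_idx):
    """Map a string to a list of token indices."""
    return [char_to_idx[ch] for ch in s]

def decode(indices, idx_to_char):
    """Map a list of token indices back to a string."""
    return ''.join(idx_to_char[i] for i in indices)

sample = "To be, or not to be:\nthat is the question."
cleaned_demo, idx_to_char_demo, char_to_idx_demo = build_vocab(sample)
tokens_demo = encode(cleaned_demo, char_to_idx_demo)

print(cleaned_demo)
print(tokens_demo[:10])
# Encoding followed by decoding should give back the cleaned text unchanged
assert decode(tokens_demo, idx_to_char_demo) == cleaned_demo
```

Sorting the character set is a design choice made here so that the same text always yields the same indices; a plain `set` gives an ordering that can change between Python sessions.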


## Task 2: Train a Vanilla RNN

In this task, you will train a vanilla RNN on the Shakespeare dataset and report the training loss. You will also generate some samples from the trained model.

### Task 2.1: Implement and Train the Vanilla RNN
- Define the vanilla RNN model using PyTorch
- Train the model on the tokenized Shakespeare dataset
- Report the training loss after each epoch

> This task helps in understanding the implementation and training of a simple RNN model for character-level language modeling.


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim

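# Define the vanilla RNN model
class VanillaRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Embedding turns each token index into a dense vector of size hidden_size
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        # Linear layer maps each hidden state to one logit per character
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):
        x = self.embedding(x)              # (batch, seq_len, hidden_size)
        out, hidden = self.rnn(x, hidden)  # out: (batch, seq_len, hidden_size)
        # Flatten batch and time so the logits have shape (batch * seq_len, output_size)
        out = self.fc(out.reshape(-1, self.hidden_size))
        return out, hidden

    def init_hidden(self, batch_size):
        # Shape (num_layers, batch, hidden_size), created on the same device as the model
        return torch.zeros(1, batch_size, self.hidden_size, device=self.fc.weight.device)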

# Hyperparameters
vocab_size = len(idx_to_char)
hidden_size = 64
output_size = vocab_size
num_epochs = 5
learning_rate = 0.002

# Model, loss function, and optimizer
model = VanillaRNN(vocab_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

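# Inputs are every character but the last; targets are the same sequence shifted by one
inputs = torch.tensor(corpus_indices[:-1]).unsqueeze(0)
targets = torch.tensor(corpus_indices[1:]).unsqueeze(0)

# Training loop: the whole corpus is fed as one long sequence (batch size 1),
# so each "epoch" performs exactly one gradient step
model.train()
for epoch in range(num_epochs):
    hidden = model.init_hidden(1)
    optimizer.zero_grad()
    outputs, hidden = model(inputs, hidden)
    loss = criterion(outputs, targets.reshape(-1))
    loss.backward()
    optimizer.step()
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')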

Epoch [1/5], Loss: 3.3545
Epoch [2/5], Loss: 3.2932
Epoch [3/5], Loss: 3.2345
Epoch [4/5], Loss: 3.1777
Epoch [5/5], Loss: 3.1220
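The loop above takes a single gradient step per epoch, which is one reason the loss moves so little in five epochs. A common alternative is to cut the corpus into fixed-length chunks and train on shuffled minibatches of them. The sketch below shows that idea on a toy corpus so it runs in seconds; `make_batches`, `TinyCharRNN`, and all `toy_` names are hypothetical, and the sequence length, batch size, learning rate, and gradient-clipping threshold are untuned assumptions rather than recommended values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_batches(indices, seq_len, batch_size):
    """Yield (inputs, targets) of shape (batch, seq_len); targets are inputs shifted by one."""
    data = torch.tensor(indices)
    n_chunks = (len(data) - 1) // seq_len
    starts = (torch.randperm(n_chunks) * seq_len).tolist()  # shuffled chunk offsets
    for i in range(0, n_chunks, batch_size):
        batch_starts = starts[i:i + batch_size]
        x = torch.stack([data[s:s + seq_len] for s in batch_starts])
        y = torch.stack([data[s + 1:s + seq_len + 1] for s in batch_starts])
        yield x, y

class TinyCharRNN(nn.Module):
    """Hypothetical stand-in for the notebook's VanillaRNN, kept small for the demo."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        out, _ = self.rnn(self.embedding(x))  # hidden state defaults to zeros per chunk
        return self.fc(out)                   # (batch, seq_len, vocab_size)

# Toy corpus so the example is self-contained
toy_text = "hello world " * 50
toy_vocab = sorted(set(toy_text))
toy_indices = [toy_vocab.index(ch) for ch in toy_text]

toy_model = TinyCharRNN(len(toy_vocab), hidden_size=32)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.01)
toy_criterion = nn.CrossEntropyLoss()

epoch_losses = []
for epoch in range(10):
    total, steps = 0.0, 0
    for x, y in make_batches(toy_indices, seq_len=16, batch_size=8):
        toy_optimizer.zero_grad()
        logits = toy_model(x)
        loss = toy_criterion(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        loss.backward()
        # Clip gradients, a common safeguard against exploding gradients in RNNs
        nn.utils.clip_grad_norm_(toy_model.parameters(), 1.0)
        toy_optimizer.step()
        total += loss.item()
        steps += 1
    epoch_losses.append(total / steps)
    print(f"Epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

With this scheme an epoch means one pass over all chunks, so the number of gradient steps per epoch grows with the corpus size instead of staying at one.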


### Task 2.2: Generate Samples from the Vanilla RNN
- Implement a function to generate text samples from the trained vanilla RNN model
- Generate and print text samples at the end of training

> This task helps in understanding how to use a trained RNN model to generate text samples.


In [3]:
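# Function to generate text samples
def generate_text(model, start_str, length=100):
    """Greedily extend start_str by `length` characters (always picks the most likely one)."""
    model.eval()
    device = next(model.parameters()).device  # run on whichever device the model lives on
    input_str = torch.tensor([char_to_idx[c] for c in start_str], device=device).unsqueeze(0)
    hidden = model.init_hidden(1).to(device)
    generated_str = start_str
    with torch.no_grad():  # no gradients are needed at inference time
        for _ in range(length):
            output, hidden = model(input_str, hidden)
            # output has one row per input position; the last row predicts the next character
            char_idx = output[-1].argmax().item()
            generated_str += idx_to_char[char_idx]
            input_str = torch.tensor([[char_idx]], device=device)
    return generated_str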

# Generate samples
start_str = 'the '
generated_text = generate_text(model, start_str)
print(f'Generated text: {generated_text}')

Generated text: the the the the the the the the the the the the the the the the the the the the the the the the the the 
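Because `generate_text` always picks the single most likely character, it is deterministic and tends to fall into loops like the one above. One common alternative is to sample the next character from the softmax distribution, optionally sharpened or flattened by a temperature. The helper below is a sketch of that idea; `sample_next` and `demo_logits` are hypothetical names and the logits are made up for illustration.

```python
import torch

def sample_next(logits, temperature=1.0):
    """Draw one token index from softmax(logits / temperature)."""
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

torch.manual_seed(0)
demo_logits = torch.tensor([2.0, 1.0, 0.5, 0.1])  # made-up scores for a 4-character vocabulary

# At temperature 1.0 the less likely indices are also drawn some of the time
draws = [sample_next(demo_logits, temperature=1.0) for _ in range(200)]
print({i: draws.count(i) for i in range(4)})

# A very low temperature concentrates nearly all probability on the largest logit
cold_draw = sample_next(demo_logits, temperature=0.01)
print(cold_draw)
```

Inside `generate_text`, the greedy selection could be swapped for a call such as `sample_next(output[-1], temperature)`, with the temperature exposed as an extra argument.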


## Task 3: Train a GRU RNN

In this task, you will train a GRU RNN on the Shakespeare dataset and compare the results with the vanilla RNN.

### Task 3.1: Implement and Train the GRU RNN
- Define the GRU RNN model using PyTorch
- Train the model on the tokenized Shakespeare dataset
- Report the training loss after each epoch

> This task helps in understanding the implementation and training of a GRU model for character-level language modeling.
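One concrete difference between the two architectures is size. A GRU layer computes a reset gate, an update gate, and a candidate state, each with its own weights, so it carries roughly three times the recurrent parameters of a vanilla RNN layer of the same width. The arithmetic below is a sketch that assumes the layout PyTorch documents for `nn.RNN` and `nn.GRU` (input-to-hidden and hidden-to-hidden weight matrices plus two bias vectors); `recurrent_param_count` is a hypothetical helper.

```python
def recurrent_param_count(input_size, hidden_size, num_gates):
    """Parameters in one recurrent layer: input-to-hidden and hidden-to-hidden
    weights plus two bias vectors, repeated for each gate block."""
    weights = num_gates * hidden_size * (input_size + hidden_size)
    biases = 2 * num_gates * hidden_size
    return weights + biases

# Sizes used in this notebook: embedding size 64 feeding a hidden size of 64
rnn_params = recurrent_param_count(64, 64, num_gates=1)  # vanilla RNN: one block
gru_params = recurrent_param_count(64, 64, num_gates=3)  # GRU: reset, update, candidate
print(rnn_params, gru_params)  # 8320 24960
```

The embedding and output layers are the same size in both models, so the recurrent layer accounts for the whole difference in parameter count.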


In [4]:
import torch
import torch.nn as nn
import torch.optim as optim

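# Define the GRU RNN model (same structure as VanillaRNN, with nn.GRU as the recurrent layer)
class GRURNN(nn.Module):
    def __init__(self, vocab_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):
        x = self.embedding(x)              # (batch, seq_len, hidden_size)
        out, hidden = self.gru(x, hidden)  # out: (batch, seq_len, hidden_size)
        # Flatten batch and time so the logits have shape (batch * seq_len, output_size)
        out = self.fc(out.reshape(-1, self.hidden_size))
        return out, hidden

    def init_hidden(self, batch_size):
        # Shape (num_layers, batch, hidden_size), created on the same device as the model
        return torch.zeros(1, batch_size, self.hidden_size, device=self.fc.weight.device)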

# Model and optimizer (the loss function `criterion` is reused from the vanilla RNN cell)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_gru = GRURNN(vocab_size, hidden_size, output_size).to(device)
optimizer_gru = optim.Adam(model_gru.parameters(), lr=learning_rate)

# Inputs and targets must live on the same device as the model
inputs = torch.tensor(corpus_indices[:-1], device=device).unsqueeze(0)
targets = torch.tensor(corpus_indices[1:], device=device).unsqueeze(0)

# Training loop: as with the vanilla RNN, each epoch is one full-batch gradient step
model_gru.train()
for epoch in range(num_epochs):
    hidden = model_gru.init_hidden(1).to(device)
    optimizer_gru.zero_grad()
    outputs, hidden = model_gru(inputs, hidden)
    loss = criterion(outputs, targets.reshape(-1))
    loss.backward()
    optimizer_gru.step()
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [1/5], Loss: 3.3323
Epoch [2/5], Loss: 3.2883
Epoch [3/5], Loss: 3.2458
Epoch [4/5], Loss: 3.2036
Epoch [5/5], Loss: 3.1608


### Task 3.2: Compare GRU and Vanilla RNN
- Compare the final training loss of the GRU model with the vanilla RNN model
- Generate and print text samples from the trained GRU model

> This task helps in understanding the differences in performance and sample quality between vanilla RNN and GRU models.
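When comparing the two final losses, it helps to have a reference point. A model that assigns equal probability to every character has a cross-entropy of ln V for a vocabulary of V characters. The sketch below assumes V = 27 (26 lowercase letters plus the space, which is what the preprocessing should leave) and uses the final-epoch losses reported above; the variable names are hypothetical.

```python
import math

vocab_size_assumed = 27  # assumption: 26 lowercase letters plus space
uniform_loss = math.log(vocab_size_assumed)  # cross-entropy of a uniform guess
print(f"uniform baseline: {uniform_loss:.4f}")

# Final-epoch training losses reported in the cells above
final_losses = {"vanilla RNN": 3.1220, "GRU": 3.1608}
for name, loss in final_losses.items():
    perplexity = math.exp(loss)  # effective number of equally likely characters
    gap = uniform_loss - loss    # improvement over uniform guessing, in nats
    print(f"{name}: loss {loss:.4f}, perplexity {perplexity:.1f}, gap to uniform {gap:.3f}")
```

Both models sit only slightly below the uniform baseline after five gradient steps, so the gap between them at this stage says little about which architecture would do better with more training.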


In [5]:
# Generate samples from the GRU model
generated_text_gru = generate_text(model_gru, start_str)
print(f'Generated text from GRU: {generated_text_gru}')

Generated text from GRU: the  or o or o or o or o or o or o or o or o or o or o or o or o or o or o or o or o or o or o or o or o


## Task 4: Train on a Smaller Dataset

In this task, you will train either the vanilla RNN or the GRU RNN on a smaller dataset and compare the results with the Shakespeare dataset (you can find some ideas about alternative datasets in Andrej's blog post, but feel free to get creative).

### Task 4.1: Instructions
- Find a smaller dataset for training (e.g., nursery rhymes, short poems, etc.)
- Preprocess the dataset similarly to the Shakespeare dataset
- Train either the vanilla RNN or the GRU RNN on the smaller dataset
- Compare the final training loss and generated samples with the Shakespeare dataset results

> This task helps in understanding how the size and complexity of the dataset affect the training and performance of RNN models.


In [6]:
# Example smaller dataset (nursery rhyme)
small_text = '''
Twinkle, twinkle, little star,
How I wonder what you are!
Up above the world so high,
Like a diamond in the sky.
'''

In [7]:
# Preprocess the smaller dataset: keep whitespace (including newlines) while stripping
# punctuation so that words at line ends are not glued together
small_raw_dataset = ' '.join(re.sub(r'[^A-Za-z\s]+', '', small_text).lower().split())
# Sorted so the index assignment is reproducible across runs
small_idx_to_char = sorted(set(small_raw_dataset))
small_char_to_idx = {char: i for i, char in enumerate(small_idx_to_char)}
small_corpus_indices = [small_char_to_idx[char] for char in small_raw_dataset]

print(f"Number of unique characters in small dataset: {len(small_idx_to_char)}")
print(f"First 100 characters in the small dataset: {small_raw_dataset[:100]}")
print(f"First 100 token indices in the small dataset: {small_corpus_indices[:100]}")

Number of unique characters in small dataset: 21
First 100 characters in the small dataset: twinkle twinkle little star how i wonder what you are up above the world so high like a diamond in t
First 100 token indices in the small dataset: [16, 19, 7, 11, 8, 9, 4, 0, 16, 19, 7, 11, 8, 9, 4, 0, 9, 7, 16, 16, 9, 4, 0, 15, 16, 1, 14, 0, 6, 12, 19, 0, 7, 0, 19, 12, 11, 3, 4, 14, 0, 19, 6, 1, 16, 0, 20, 12, 17, 0, 1, 14, 4, 0, 17, 13, 0, 1, 2, 12, 18, 4, 0, 16, 6, 4, 0, 19, 12, 14, 9, 3, 0, 15, 12, 0, 6, 7, 5, 6, 0, 9, 7, 8, 4, 0, 1, 0, 3, 7, 1, 10, 12, 11, 3, 0, 7, 11, 0, 16]


In [8]:
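# Train on the smaller dataset using the GRU model. This continues training the
# Shakespeare-trained model_gru rather than starting from fresh weights.
# Caveat: small_corpus_indices come from small_char_to_idx, so a given index does not
# refer to the same character as in the Shakespeare vocabulary the model was built for.
small_inputs = torch.tensor(small_corpus_indices[:-1], device=device).unsqueeze(0)
small_targets = torch.tensor(small_corpus_indices[1:], device=device).unsqueeze(0)

model_gru.train()
for epoch in range(num_epochs):
    hidden = model_gru.init_hidden(1).to(device)
    optimizer_gru.zero_grad()
    outputs, hidden = model_gru(small_inputs, hidden)
    loss = criterion(outputs, small_targets.reshape(-1))
    loss.backward()
    optimizer_gru.step()
    if (epoch + 1) % 5 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')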

Epoch [5/5], Loss: 2.9536


In [9]:
# Generate samples from the GRU model trained on the smaller dataset
generated_text_small_gru = generate_text(model_gru, start_str)
print(f'Generated text from GRU on small dataset: {generated_text_small_gru}')

Generated text from GRU on small dataset: the   o  o   o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o
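The small dataset has its own character-to-index mapping, so one way to keep training and generation consistent is to build a fresh model sized for that vocabulary and pass the mappings to the generation function explicitly. The sketch below illustrates this under several assumptions: `SmallGRU`, `generate_with_vocab`, and the `rhyme_` and `small_` names are hypothetical, and the 200 training steps and learning rate of 0.01 are untuned choices made so that a model this small can fit a text this short.

```python
import re
import torch
import torch.nn as nn

torch.manual_seed(0)

rhyme = """
Twinkle, twinkle, little star,
How I wonder what you are!
Up above the world so high,
Like a diamond in the sky.
"""
# Same preprocessing idea as the rest of the notebook, with a sorted vocabulary
rhyme_text = ' '.join(re.sub(r'[^A-Za-z\s]+', '', rhyme).lower().split())
rhyme_idx_to_char = sorted(set(rhyme_text))
rhyme_char_to_idx = {ch: i for i, ch in enumerate(rhyme_idx_to_char)}
rhyme_indices = torch.tensor([rhyme_char_to_idx[ch] for ch in rhyme_text])

class SmallGRU(nn.Module):
    """Hypothetical GRU language model sized for the small vocabulary."""
    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.gru(self.embedding(x), hidden)  # None means a zero initial state
        return self.fc(out), hidden                        # logits: (batch, seq_len, vocab)

small_model = SmallGRU(len(rhyme_idx_to_char))
small_optimizer = torch.optim.Adam(small_model.parameters(), lr=0.01)
small_criterion = nn.CrossEntropyLoss()

small_x = rhyme_indices[:-1].unsqueeze(0)  # (1, seq_len)
small_y = rhyme_indices[1:]                # (seq_len,)
for step in range(200):
    small_optimizer.zero_grad()
    logits, _ = small_model(small_x)
    small_loss = small_criterion(logits.squeeze(0), small_y)
    small_loss.backward()
    small_optimizer.step()

def generate_with_vocab(model, start_str, char_to_idx, idx_to_char, length=60):
    """Greedy generation that uses the mappings passed in rather than globals."""
    model.eval()
    chars = list(start_str)
    inp = torch.tensor([[char_to_idx[c] for c in start_str]])
    hidden = None
    with torch.no_grad():
        for _ in range(length):
            logits, hidden = model(inp, hidden)
            next_idx = logits[0, -1].argmax().item()  # most likely next character
            chars.append(idx_to_char[next_idx])
            inp = torch.tensor([[next_idx]])
    return ''.join(chars)

small_sample = generate_with_vocab(small_model, 'the ', rhyme_char_to_idx, rhyme_idx_to_char)
print(f"final loss: {small_loss.item():.4f}")
print(small_sample)
```

A text this short is easy to memorize, so a low training loss here mostly reflects overfitting; that contrast with the Shakespeare results is the kind of observation this task asks for.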
