# Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a type of neural network designed to handle sequential data. They can maintain a hidden state that gets updated at each step in the sequence, making them well-suited for tasks like language modeling.
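To make the hidden-state update concrete, here is a minimal NumPy sketch of a single-layer tanh recurrence. The sizes and the names W_xh, W_hh, and b_h are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

in_dim, hid_dim, seq_len = 4, 3, 5  # illustrative sizes, not taken from the tutorial

# Randomly initialised parameters (a trained network would learn these)
W_xh = rng.normal(scale=0.1, size=(hid_dim, in_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hid_dim, hid_dim))  # hidden -> hidden
b_h = np.zeros(hid_dim)

xs = rng.normal(size=(seq_len, in_dim))  # one input vector per time step
h = np.zeros(hid_dim)                    # initial hidden state

hidden_states = []
for x_t in xs:
    # The new state depends on the current input and on the previous state
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    hidden_states.append(h)

hidden_states = np.stack(hidden_states)
print(hidden_states.shape)  # (5, 3)
```

The same weights are reused at every step, which is what lets the network process sequences of any length.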

For our simple language model, we'll use a character-level RNN. It will predict the next character in a sequence given the previous characters. This is simpler than predicting the next word or subword token, as GPT-style models do, because the vocabulary is just the set of distinct characters in the text.

Let's begin by preparing the dataset.

## Prepare the dataset

For our simple language model, we'll use a text file as our dataset. You can choose any text file you like, such as a book from Project Gutenberg or a simple text file with some sample sentences.

Follow these steps to prepare the dataset:

* Load the text file and, optionally, preprocess it (lowercase, remove special characters, etc.). The starter code in this section skips preprocessing and uses the raw text.
* Create a dictionary that maps characters to integers and another dictionary that maps integers to characters. We'll use these dictionaries to convert the text to numbers and vice versa.
* Convert the text to a sequence of integers using the character-to-integer dictionary.
* Create input and target sequences. Each input is a window of n consecutive characters; its target is the same window shifted one position to the right, so the target at each position is the character that follows the corresponding input character.
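The optional preprocessing step could look like the sketch below. The function name preprocess and the set of allowed characters are assumptions chosen for illustration; adjust them to your text.

```python
import string

# Hypothetical whitelist: lowercase letters, digits, basic punctuation, space, newline
ALLOWED = set(string.ascii_lowercase + string.digits + " .,;:!?'\"-\n")

def preprocess(raw_text):
    """Lowercase the text and replace characters outside ALLOWED with spaces."""
    lowered = raw_text.lower()
    kept = "".join(c if c in ALLOWED else " " for c in lowered)
    # Collapse repeated spaces line by line so that newlines survive
    lines = [" ".join(line.split()) for line in kept.split("\n")]
    return "\n".join(lines).strip()

sample = "Hello,\tWorld! It's 100% fine…"
print(preprocess(sample))  # hello, world! it's 100 fine
```

A smaller character set means a smaller vocabulary, which makes the prediction task easier at the cost of losing capitalization and unusual symbols.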

Here's some code to help you get started:

In [None]:
import numpy as np

# Load the text
with open("../a06_RNN_language_model/animal_farm.txt", "r", encoding="utf-8") as f:  # assumes a UTF-8 file
    text = f.read()

text = text[0:5000] # Make text shorter for faster testing

# Create dictionaries
chars = sorted(set(text))
char_to_int = {c: i for i, c in enumerate(chars)}
int_to_char = {i: c for i, c in enumerate(chars)}

# Convert text to integers
int_text = [char_to_int[c] for c in text]

# Create input and target sequences
sequence_length = 50
X, y = [], []

for i in range(len(int_text) - sequence_length):
  X.append(int_text[i:i + sequence_length])
  y.append(int_text[i + 1:i + sequence_length + 1])

X = np.array(X)
y = np.array(y)
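To see what the windowing loop above produces, here is the same logic on a toy string. All toy_ names and the window length of 4 are illustrative and separate from the variables used in the tutorial.

```python
toy_text = "hello world"
toy_chars = sorted(set(toy_text))
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for c, i in toy_char_to_int.items()}
toy_ints = [toy_char_to_int[c] for c in toy_text]

toy_len = 4  # window length
pairs = []
for i in range(len(toy_ints) - toy_len):
    inp = toy_ints[i:i + toy_len]          # window of toy_len characters
    tgt = toy_ints[i + 1:i + toy_len + 1]  # same window shifted right by one
    pairs.append((inp, tgt))

def decode(ids):
    """Turn a list of integer ids back into a string."""
    return "".join(toy_int_to_char[j] for j in ids)

for inp, tgt in pairs[:3]:
    print(repr(decode(inp)), "->", repr(decode(tgt)))
# 'hell' -> 'ello'
# 'ello' -> 'llo '
# 'llo ' -> 'lo w'
```

Each target character is the one the model should predict after seeing the input up to that position.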

## Implementing the RNN model in PyTorch

We'll create a simple RNN model using PyTorch's nn.RNN module along with a fully connected layer for the final output. Our model will have the following layers:

* An embedding layer (nn.Embedding) to convert the input character integers to embeddings.
* An RNN layer (nn.RNN) that will maintain hidden states and learn the sequence patterns.
* A fully connected output layer (nn.Linear) that will produce a score (logit) for each possible next character. Applying a softmax to these scores gives the probabilities.

Here's a simple implementation:

In [None]:
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
  def __init__(self, input_size, embed_size, hidden_size, output_size):
    super(SimpleRNN, self).__init__()
    self.embed_size = embed_size
    self.hidden_size = hidden_size
    self.output_size = output_size

    self.embedding = nn.Embedding(input_size, embed_size)
    self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
    self.fc = nn.Linear(hidden_size, output_size)

  def forward(self, x, hidden):
    x = self.embedding(x)
    x, hidden = self.rnn(x, hidden)
    x = self.fc(x)
    return x, hidden

  def init_hidden(self, batch_size):
    return torch.zeros(1, batch_size, self.hidden_size)

input_size = len(chars)
embed_size = 128
hidden_size = 256
output_size = len(chars)

model = SimpleRNN(input_size, embed_size, hidden_size, output_size)
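A quick way to check the architecture is to trace tensor shapes through the three layers. The sketch below rebuilds the layers with small, made-up sizes and random character ids, so it runs on its own and does not touch the model defined above.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 30, 8, 16  # illustrative sizes
batch, seq_len = 4, 10

embedding = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
fc = nn.Linear(hidden_dim, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # fake character ids
h0 = torch.zeros(1, batch, hidden_dim)                   # (num_layers, batch, hidden_dim)

embedded = embedding(tokens)      # (batch, seq_len, embed_dim)
rnn_out, h_n = rnn(embedded, h0)  # (batch, seq_len, hidden_dim) and (1, batch, hidden_dim)
logits = fc(rnn_out)              # (batch, seq_len, vocab_size)

print(embedded.shape, rnn_out.shape, h_n.shape, logits.shape)
```

Note that the hidden state keeps the layer dimension first even with batch_first=True, which is why init_hidden returns a tensor of shape (1, batch_size, hidden_size).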

## Optimizer and the loss function

Now that we have our RNN model defined, we need to set up the training process. We'll need a loss function, an optimizer, and a training loop that feeds the input sequences and target sequences to the model, computes the loss, and updates the model's parameters.

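First, let's define the loss function and the optimizer. We'll use cross-entropy loss (nn.CrossEntropyLoss), which suits this task because predicting the next character is a classification over the vocabulary. It expects raw scores (logits) and applies the softmax internally, so the model does not need a softmax layer. We'll use the Adam optimizer to update the model's parameters.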

In [None]:
import torch.optim as optim

# Set the learning rate
lr = 0.001

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=lr)
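As a sanity check on how the loss is used, the sketch below feeds all-zero logits to nn.CrossEntropyLoss. Zero logits correspond to a uniform prediction, so the loss should equal the natural log of the number of classes. The tensor sizes and the name loss_fn are made up for the example.

```python
import math

import torch
import torch.nn as nn

num_classes = 5
batch, seq_len = 2, 3

logits = torch.zeros(batch, seq_len, num_classes)          # uniform prediction
targets = torch.randint(0, num_classes, (batch, seq_len))  # arbitrary class ids

loss_fn = nn.CrossEntropyLoss()
# The loss expects (N, C) logits and (N,) targets, so flatten batch and time together
loss = loss_fn(logits.reshape(-1, num_classes), targets.reshape(-1))

print(loss.item(), math.log(num_classes))  # both are about 1.6094
```

For the character model, an untrained network should therefore start with a loss near the log of the vocabulary size, and training should push it well below that.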

## Training loop

Next, let's create the training loop. We'll need to perform the following steps for each batch in every epoch:

* Reset the hidden state using model.init_hidden().
* Pass the input sequence and the initial hidden state to the model.
* Compute the loss between the model's output and the target sequence.
* Backpropagate the loss and update the model's parameters using the optimizer.

Here's the code for the training loop:

In [None]:
import time

# Set the device
# Use a GPU (CUDA, or Apple's MPS backend) if one is available, to speed up training
device = "mps" if torch.backends.mps.is_available() else "cpu"
device = "cuda" if torch.cuda.is_available() else device
model = model.to(device)

# Set the number of training epochs
num_epochs = 50

# Set the batch size
batch_size = 64

X = torch.tensor(X, dtype=torch.long).to(device)
y = torch.tensor(y, dtype=torch.long).to(device)

# Train the model
for epoch in range(num_epochs):
  start_epoch = time.time()
  
  # Loop over the input-target pairs in the dataset
  for i in range(0, len(X), batch_size):
    # Get the actual batch size for the current iteration
    actual_batch_size = min(batch_size, len(X) - i)
    
    # Reset the hidden state to zeros; a fresh tensor has no history to detach
    hidden = model.init_hidden(actual_batch_size).to(device)
    
    # Get a batch of input and target sequences
    input_batch = X[i:i+actual_batch_size]
    target_batch = y[i:i+actual_batch_size]
    
    # Zero the gradients
    optimizer.zero_grad()
    
    # Forward pass: pass the input and hidden state to the model
    output, hidden = model(input_batch, hidden)
    
    # Reshape the output and target_batch
    output = output.view(-1, output.shape[2])
    target_batch = target_batch.view(-1)
    
    # Compute the loss
    loss = criterion(output, target_batch)
    
    # Backward pass: compute the gradients
    loss.backward()
    
    # Update the model parameters
    optimizer.step()

  # Print the loss of the last batch in this epoch
  print(f'Epoch {epoch+1}/{num_epochs}, Loss (last batch): {loss.item():.4f}, time: {time.time() - start_epoch:.2f} s')
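Consecutive windows in X overlap almost entirely, so each batch in the loop above contains very similar sequences. One common option, not used in the original loop, is to shuffle the windows with PyTorch's DataLoader. The sketch uses small stand-in tensors (toy_X, toy_y) with the same layout as X and y.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in tensors shaped like X and y: (num_sequences, sequence_length)
toy_X = torch.randint(0, 20, (10, 5))
toy_y = torch.randint(0, 20, (10, 5))

loader = DataLoader(TensorDataset(toy_X, toy_y), batch_size=4, shuffle=True)

batch_shapes = []
for toy_inputs, toy_targets in loader:
    # The last batch may be smaller, so read the batch size from the tensor itself
    batch_shapes.append(tuple(toy_inputs.shape))

print(batch_shapes)  # [(4, 5), (4, 5), (2, 5)]
```

In the training loop, the hidden state would then be created with the size of each incoming batch, for example model.init_hidden(toy_inputs.shape[0]).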

## Evaluating the model

As you work with larger datasets, training on a GPU (which the code above already uses when one is available) and optimizing the code further will help keep training time manageable.

Now that you've trained the model, you can use it to generate new text by providing it with an initial sequence and sampling the output probabilities. This will allow you to see how well the model has learned the structure and style of the input text.
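Sampling is controlled by a temperature: the logits are divided by it before the softmax. The sketch below uses made-up logits for three characters to show the effect; sampling_probs is a helper name introduced only for this example.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.0])  # made-up scores for three characters

def sampling_probs(logits, temperature):
    # Temperatures below 1 sharpen the distribution; temperatures above 1 flatten it
    return torch.softmax(logits / temperature, dim=-1)

for t in (0.5, 1.0, 2.0):
    print(t, sampling_probs(logits, t))

# Draw one character index from the distribution at temperature 1.0
torch.manual_seed(0)
sampled = torch.multinomial(sampling_probs(logits, 1.0), num_samples=1)
print(sampled.item())
```

Low temperatures make the output more repetitive but more conservative, while high temperatures produce more varied text with more mistakes.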

You can use the following function to generate new text using your trained model:

In [None]:
def generate_text(model, initial_sequence, n_chars, temperature=1.0):
  model.eval()  # Set the model to evaluation mode
  # initial_sequence = initial_sequence.lower()  # Uncomment if the training text was lowercased

  # Only keep characters that are present in char_to_int
  initial_sequence = ''.join(c for c in initial_sequence if c in char_to_int)
  if not initial_sequence:
    raise ValueError("initial_sequence contains no characters known to the model")
  generated_sequence = initial_sequence

  # Convert the initial sequence to a tensor of shape (1, sequence_length)
  input_sequence = torch.tensor([[char_to_int[c] for c in initial_sequence]], dtype=torch.long).to(device)

  # Initialize the hidden state
  hidden = model.init_hidden(1).to(device)

  # Generate new characters without tracking gradients
  with torch.no_grad():
    for _ in range(n_chars):
      # The first pass consumes the whole prompt; later passes consume only the
      # newest character, because the hidden state already summarizes the rest
      output, hidden = model(input_sequence, hidden)

      # Apply the temperature and turn the last step's logits into probabilities
      output_dist = torch.softmax(output[0, -1, :] / temperature, dim=-1)

      # Sample the index of the next character
      top_i = torch.multinomial(output_dist, 1)[0]

      # Add the generated character to the sequence
      generated_char = int_to_char[top_i.item()]
      generated_sequence += generated_char

      # Feed only the newly generated character at the next step
      input_sequence = top_i.view(1, 1)

  return generated_sequence
  
initial_sequence = "We are not like that."
n_chars = 300
temperature = 1.0

generated_text = generate_text(model, initial_sequence, n_chars, temperature)
print(generated_text)

### Sample output

> We are not like that.
>
> Before long the of ours?
>
> Let us face it: our lives are miserable, laborious, and short.
>
> He wayd wast came the came mane, hoping to light in the big barn as soon as Mr. Jones, of the catto tame ant made a mare who drew Mr. Jones’stterep off, bro draw amo had not be trodden oft laskedMad ank in or