Great! It's nice to see that you already have some experience with neural networks and PyTorch. For today's lesson, we'll work on building a simple language model using an RNN in PyTorch.

Here's the plan for our 60-minute session:

* Briefly discuss RNNs (5 min)
* Prepare the dataset (10 min)
* Implement a simple RNN model in PyTorch (15 min)
* Train the model (10 min)
* Test the model (10 min)
* Wrap up and discuss improvements (10 min)

Let's get started with a brief discussion of RNNs.

## 1. Briefly discuss RNNs (5 min)

Recurrent Neural Networks (RNNs) are a type of neural network designed to handle sequential data. They can maintain a hidden state that gets updated at each step in the sequence, making them well-suited for tasks like language modeling.
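
To make the hidden-state update concrete, here is a minimal sketch of the recurrence in plain Python. It assumes the standard Elman update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), which is the form `nn.RNN` uses with its default tanh nonlinearity (PyTorch keeps two separate bias vectors, merged into one here). The function name `rnn_step` and the toy weights are made up for illustration.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One Elman RNN step: h_t = tanh(W_xh @ x + W_hh @ h_prev + b)."""
    h_new = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(W_xh[j][k] * x[k] for k in range(len(x)))
        total += sum(W_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

# Toy sizes: 2-dimensional inputs, 3-dimensional hidden state (illustrative values).
W_xh = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W_hh = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]
b = [0.0, 0.1, -0.1]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.0, 0.0, 0.0]  # initial hidden state: all zeros
for x in sequence:
    # The same weights are reused at every step; only h changes.
    h = rnn_step(x, h, W_xh, W_hh, b)
```

After the loop, `h` summarizes the whole sequence seen so far, which is what lets the model condition its next-character prediction on earlier characters.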

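For our simple language model, we'll use a character-level RNN. It will predict the next character in a sequence given the previous characters. This is a simpler setup than the one GPT models use, which predict the next subword token from a much larger vocabulary; here the vocabulary is just the set of distinct characters in the text.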

## Next steps:

Please let me know if you are familiar with RNNs and if you have any questions about them. If you're ready to move on, I'll guide you through preparing the dataset for our language model.

Great! Let's begin with preparing the dataset.

## 2. Prepare the dataset (10 min)

For our simple language model, we'll use a text file as our dataset. You can choose any text file you like, such as a book from Project Gutenberg or a simple text file with some sample sentences.

Follow these steps to prepare the dataset:

* Load the text file and preprocess it (lowercase, remove special characters, etc.)
* Create a dictionary that maps characters to integers and another dictionary that maps integers to characters. We'll use these dictionaries to convert the text to numbers and vice versa.
* Convert the text to a sequence of integers using the character-to-integer dictionary.
* Create input and target sequences. Each input sequence has length n, and its target sequence has the same length: it is the input shifted forward by one character in the text, so the target at position t is the character that follows the input at position t.

Here's some code to help you get started:

In [1]:
import numpy as np

# Load and preprocess the text
with open("input.txt", "r") as f:
    text = f.read().lower()
    text = ''.join(c for c in text if c.isalnum() or c.isspace())

In [2]:
# Create dictionaries
chars = sorted(list(set(text)))
char_to_int = {c: i for i, c in enumerate(chars)}
int_to_char = {i: c for i, c in enumerate(chars)}

In [3]:
# Convert text to integers
int_text = [char_to_int[c] for c in text]

In [4]:
# Create input and target sequences
sequence_length = 50
X, y = [], []

for i in range(len(int_text) - sequence_length):
    X.append(int_text[i:i + sequence_length])
    y.append(int_text[i + 1:i + sequence_length + 1])

X = np.array(X)
y = np.array(y)
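
To check what the loop above produces, here is the same construction on a toy string, as a minimal sketch in plain Python. The string `"hello world"`, `sequence_length = 4`, and the helper `decode` are illustrative choices, not part of the lesson's dataset.

```python
text = "hello world"

# Same dictionaries as above, built from the toy text.
chars = sorted(set(text))
char_to_int = {c: i for i, c in enumerate(chars)}
int_to_char = {i: c for i, c in enumerate(chars)}
int_text = [char_to_int[c] for c in text]

sequence_length = 4
X, y = [], []
for i in range(len(int_text) - sequence_length):
    X.append(int_text[i:i + sequence_length])
    y.append(int_text[i + 1:i + sequence_length + 1])

def decode(seq):
    """Map a list of integer ids back to a string."""
    return "".join(int_to_char[i] for i in seq)

print(decode(X[0]), "->", decode(y[0]))  # hell -> ello
print(len(X))                            # 7
```

Each target is its input shifted by one character, so `y[k][t]` is the character that follows `X[k][t]` in the text.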

Now let's move on to implementing the RNN model.

## 3. Implement a simple RNN model in PyTorch (15 min)

We'll create a simple RNN model using PyTorch's nn.RNN module along with a fully connected layer for the final output. Our model will have the following layers:

* An embedding layer (nn.Embedding) to convert the input character integers to embeddings.
* An RNN layer (nn.RNN) that will maintain hidden states and learn the sequence patterns.
* A fully connected output layer (nn.Linear) that will produce a score (logit) for each character in the vocabulary. The loss function turns these scores into probabilities for the next character.

Here's a simple implementation:


In [5]:
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
  def __init__(self, input_size, embed_size, hidden_size, output_size):
    super(SimpleRNN, self).__init__()
    self.embed_size = embed_size
    self.hidden_size = hidden_size
    self.output_size = output_size

    self.embedding = nn.Embedding(input_size, embed_size)
    self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
    self.fc = nn.Linear(hidden_size, output_size)

  def forward(self, x, hidden):
    x = self.embedding(x)
    x, hidden = self.rnn(x, hidden)
    x = self.fc(x)
    return x, hidden

  def init_hidden(self, batch_size):
    return torch.zeros(1, batch_size, self.hidden_size)

input_size = len(chars)
embed_size = 128
hidden_size = 256
output_size = len(chars)

model = SimpleRNN(input_size, embed_size, hidden_size, output_size)
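
A quick way to sanity-check the architecture is to push a dummy batch through the three layers and look at the tensor shapes. The sketch below rebuilds the layers standalone with small made-up sizes (`vocab_size = 30`, a batch of 4 sequences of length 50), so it can run without the dataset.

```python
import torch
import torch.nn as nn

vocab_size, embed_size, hidden_size = 30, 8, 16  # illustrative sizes
batch_size, seq_len = 4, 50

embedding = nn.Embedding(vocab_size, embed_size)
rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
fc = nn.Linear(hidden_size, vocab_size)

x = torch.randint(0, vocab_size, (batch_size, seq_len))  # integer character ids
h0 = torch.zeros(1, batch_size, hidden_size)             # (num_layers, batch, hidden)

emb = embedding(x)      # (batch, seq_len, embed_size)
out, hn = rnn(emb, h0)  # out: (batch, seq_len, hidden_size); hn: (1, batch, hidden_size)
logits = fc(out)        # (batch, seq_len, vocab_size)

print(emb.shape, out.shape, hn.shape, logits.shape)
```

Note that `batch_first=True` only affects the input and output tensors; the hidden state keeps the layout (num_layers, batch, hidden_size), which is why `init_hidden` returns a tensor of shape (1, batch_size, hidden_size).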

## 4. Train the model (10 min)

Now that we have our RNN model defined, we need to set up the training process. We'll need a loss function, an optimizer, and a training loop that feeds the input sequences and target sequences to the model, computes the loss, and updates the model's parameters.

First, let's define the loss function and the optimizer. We'll use the Cross Entropy Loss, which is suitable for classification tasks, and the Adam optimizer for updating the model's parameters.

Add the following code to your notebook:

In [6]:
import torch.optim as optim

# Set the learning rate
lr = 0.001

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=lr)
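
For intuition about what `criterion` computes: cross-entropy for a single prediction is the negative log of the softmax probability assigned to the correct class. The plain-Python sketch below works this out by hand for one made-up vector of scores. `nn.CrossEntropyLoss` performs the same softmax-plus-log step internally and, by default, averages over all predictions in the batch. The function name `cross_entropy` and the numbers are illustrative only.

```python
import math

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]) for one prediction."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]  # illustrative scores for a 3-character vocabulary
loss_correct = cross_entropy(logits, 0)  # target is the highest-scoring class
loss_wrong = cross_entropy(logits, 2)    # target is the lowest-scoring class

# A uniform guess over V classes gives a loss of log(V).
uniform_loss = cross_entropy([0.0, 0.0, 0.0], 1)
```

A useful reference point when training starts: an untrained model should report a loss of roughly log(len(chars)), the value for a uniform guess, and the loss should fall from there.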

Next, let's create the training loop. The first step below runs once per epoch; the remaining steps run for every batch:

* Reset the hidden state using model.init_hidden().
* Pass the input sequence and the initial hidden state to the model.
* Compute the loss between the model's output and the target sequence.
* Backpropagate the loss and update the model's parameters using the optimizer.

Here's the code for the training loop:


In [7]:
# Set the number of training epochs
num_epochs = 50

# Set the batch size
batch_size = 64

# Train the model
for epoch in range(num_epochs):
  # Reset the hidden state
  hidden = model.init_hidden(batch_size)
  
  # Loop over the input-target pairs in the dataset
  # for i in range(0, len(X), batch_size):
  for i in range(0, 300, batch_size):
    # Get a batch of input and target sequences
    input_batch = torch.tensor(X[i:i+batch_size], dtype=torch.long)
    target_batch = torch.tensor(y[i:i+batch_size], dtype=torch.long)
    
    # Zero the gradients
    optimizer.zero_grad()
    
    # Forward pass: pass the input and hidden state to the model
    output, hidden = model(input_batch, hidden)
    
    # Reshape the output and target_batch
    output = output.view(-1, output.shape[2])
    target_batch = target_batch.view(-1)
    
    # Compute the loss
    loss = criterion(output, target_batch)
    
    # Backward pass: compute the gradients
    loss.backward()
    
    # Update the model parameters
    optimizer.step()

  # # Print the loss for this epoch
  # print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')




RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
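
This error is most likely caused by the hidden state being carried from one batch to the next. The `hidden` tensor returned by the model still references the computation graph of the batch that produced it, so the second call to `loss.backward()` tries to backpropagate through a graph that has already been freed. A common fix is to call `hidden.detach()` at the start of each batch, which keeps the values but cuts the gradient history (truncated backpropagation through time). The sketch below shows the pattern on a small self-contained stand-in: the class name `TinyCharRNN`, the random data, and the sizes are made up for illustration. In the notebook, the corresponding change is to add the `detach` line inside the batch loop of cell 7.

```python
import torch
import torch.nn as nn

class TinyCharRNN(nn.Module):
    """Hypothetical stand-in with the same layer layout as SimpleRNN."""
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden):
        out, hidden = self.rnn(self.embedding(x), hidden)
        return self.fc(out), hidden

    def init_hidden(self, batch_size):
        return torch.zeros(1, batch_size, self.hidden_size)

torch.manual_seed(0)
vocab_size, seq_len, batch_size = 20, 10, 8
X = torch.randint(0, vocab_size, (30, seq_len))  # random stand-in data
y = torch.randint(0, vocab_size, (30, seq_len))

model = TinyCharRNN(vocab_size, embed_size=16, hidden_size=32)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

losses = []
for epoch in range(2):
    hidden = model.init_hidden(batch_size)
    for i in range(0, len(X), batch_size):
        input_batch = X[i:i + batch_size]
        target_batch = y[i:i + batch_size]

        if input_batch.size(0) != hidden.size(1):
            # The last batch can be smaller; start it from a fresh state.
            hidden = model.init_hidden(input_batch.size(0))
        # Keep the values but drop the graph of the previous batch.
        hidden = hidden.detach()

        optimizer.zero_grad()
        output, hidden = model(input_batch, hidden)
        loss = criterion(output.reshape(-1, vocab_size), target_batch.reshape(-1))
        loss.backward()  # no longer reaches into the previous batch's graph
        optimizer.step()
        losses.append(loss.item())
```

The size check also covers a second problem in the original loop: `range(0, 300, batch_size)` ends with a batch of fewer than 64 rows, which would not match a hidden state created for 64. Since the batches here are overlapping windows rather than continuations of one another, calling `model.init_hidden(...)` inside the batch loop instead of carrying the state over is an equally reasonable alternative.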