# **Simple RNN**
An implementation of a minimal character-level Vanilla Recurrent Neural Network (RNN) model for text generation. It's written in Python and uses NumPy for numerical operations.

In [1]:
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""

import numpy as np

## **Data Loading & Preprocessing**


In [2]:
# Read the Shakespearean sonnets
# The input should be a simple plain-text file
with open('/content/sonnet.txt', 'r', encoding='utf-8') as file:
    sonnets_text = file.read()

# Split the text into training and validation sets
total_length = len(sonnets_text)
split_ratio = 0.8  # 80% for training, 20% for validation

split_index = int(total_length * split_ratio)
training_data = sonnets_text[:split_index]
validation_data = sonnets_text[split_index:]

# Save the training and validation datasets to files
with open('training.txt', 'w', encoding='utf-8') as file:
    file.write(training_data)

with open('validation.txt', 'w', encoding='utf-8') as file:
    file.write(validation_data)


### **Data I/O and Vocabulary Setup:**

The code reads the training text, creates a set of unique characters in the text, and assigns indices to each character. It also calculates the sizes of the data and vocabulary. This vocabulary setup is crucial for encoding characters into numerical inputs for the RNN.
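As a small illustration of the vocabulary setup described above, the sketch below builds the two lookup tables for a toy string and one-hot encodes a character. The string `text` and the helper `one_hot` are hypothetical names chosen for this example, and `sorted` is used here only so that the index assignment is reproducible.

```python
import numpy as np

text = "hello"  # toy stand-in for the training text (assumption)
chars = sorted(set(text))  # unique characters in a fixed order
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

def one_hot(ix, vocab_size):
    """Return a (vocab_size, 1) column vector with a 1 at position ix."""
    x = np.zeros((vocab_size, 1))
    x[ix] = 1
    return x

encoded = [char_to_ix[ch] for ch in text]           # characters -> indices
decoded = ''.join(ix_to_char[i] for i in encoded)   # indices -> characters
x0 = one_hot(encoded[0], len(chars))                # RNN input for the first character

print(chars)    # ['e', 'h', 'l', 'o']
print(encoded)  # [1, 0, 2, 2, 3]
```

Note that a character missing from the lookup table raises a `KeyError`, so any text passed through `char_to_ix` should only contain characters seen when the vocabulary was built.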

### **Hyperparameters and Model Parameters:**

Hyperparameters like `hidden_size` (size of the hidden layer), `seq_length` (number of steps to unroll the RNN for), and `learning_rate` are defined. The weight matrices `Wxh`, `Whh`, and `Why` are initialized with small random values, and the biases `bh` and `by` are initialized to zero.

In [3]:
# Data I/O
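with open('/content/training.txt', 'r', encoding='utf-8') as file:
    data = file.read()
chars = sorted(set(data))  # Sorted so the character indices are reproducible across runs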
data_size, vocab_size = len(data), len(chars)
print('data has %d characters, %d unique.' % (data_size, vocab_size))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

# Hyperparameters
hidden_size =  256 # Size of hidden layer of neurons
seq_length = 25 # Number of steps to unroll the RNN for
learning_rate = 1e-1

# Model parameters
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01  # Input to hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01  # Hidden to output
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((vocab_size, 1))  # Output bias

data has 4825 characters, 30 unique.


### **Loss Function (lossFun):**

This function calculates the loss, gradients, and the last hidden state for a given input sequence and target sequence. It performs both the forward pass (calculating intermediate values) and the backward pass (backpropagation to compute gradients).
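The softmax and cross-entropy step of the forward pass, together with the gradient that starts the backward pass, can be looked at in isolation. The sketch below uses a hypothetical helper `softmax_cross_entropy`. It also subtracts the maximum score before exponentiating, a common numerical-stability trick that `lossFun` itself does not use and that leaves the probabilities unchanged.

```python
import numpy as np

def softmax_cross_entropy(y, target):
    """y: (vocab_size, 1) unnormalized scores; target: index of the correct character."""
    shifted = y - np.max(y)  # assumption: stability shift, not part of lossFun
    p = np.exp(shifted) / np.sum(np.exp(shifted))  # probabilities for next chars
    loss = -np.log(p[target, 0])                   # cross-entropy for one time step
    dy = np.copy(p)
    dy[target] -= 1                                # gradient of the loss w.r.t. y
    return loss, p, dy

y = np.zeros((4, 1))  # uniform scores over a toy 4-character vocabulary
loss, p, dy = softmax_cross_entropy(y, target=2)
print(loss)  # equals log(4) when all scores are equal
```

The gradient `dy` is simply the probability vector with 1 subtracted at the target index, which is what the `dy[targets[t]] -= 1` line in the backward pass computes.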

In [4]:
def lossFun(inputs, targets, hprev):
    """
    Calculate loss, gradients, and last hidden state.

    Args:
        inputs (list of int): List of input character indices.
        targets (list of int): List of target character indices.
        hprev (numpy.ndarray): Initial hidden state.

    Returns:
        float: Loss
        numpy.ndarray: Gradient for input to hidden weights
        numpy.ndarray: Gradient for hidden to hidden weights
        numpy.ndarray: Gradient for hidden to output weights
        numpy.ndarray: Gradient for hidden bias
        numpy.ndarray: Gradient for output bias
        numpy.ndarray: Last hidden state
    """
    xs, hs, ys, ps = {}, {}, {}, {}
    hs[-1] = np.copy(hprev)
    loss = 0

    # Forward pass
    for t in range(len(inputs)):
        xs[t] = np.zeros((vocab_size, 1))  # Encode in 1-of-k representation
        xs[t][inputs[t]] = 1
        hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh)  # Hidden state
        ys[t] = np.dot(Why, hs[t]) + by  # Unnormalized log probabilities for next chars
        ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))  # Probabilities for next chars
        loss += -np.log(ps[t][targets[t], 0])  # Softmax (cross-entropy loss)

    # Backward pass: Compute gradients going backwards
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dhnext = np.zeros_like(hs[0])

    for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t])
        dy[targets[t]] -= 1  # Backprop into y
        dWhy += np.dot(dy, hs[t].T)
        dby += dy
        dh = np.dot(Why.T, dy) + dhnext  # Backprop into h
        dhraw = (1 - hs[t] * hs[t]) * dh  # Backprop through tanh nonlinearity
        dbh += dhraw
        dWxh += np.dot(dhraw, xs[t].T)
        dWhh += np.dot(dhraw, hs[t-1].T)
        dhnext = np.dot(Whh.T, dhraw)

    for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
        np.clip(dparam, -5, 5, out=dparam)  # Clip to mitigate exploding gradients

    return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs) - 1]

### **Sampling Function (sample):**

This function generates a sequence of characters by repeatedly sampling characters based on the RNN's predictions. It takes an initial memory state `h`, a seed character index `seed_ix`, and the number of characters to sample `n`.
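A single sampling step can be sketched on its own: turn the output scores into probabilities and draw one index from that distribution. The helper `sample_index` and the seeded generator `rng` are assumptions made for this example; the `sample` function in this notebook uses `np.random.choice` instead.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption made for reproducibility

def sample_index(y, rng):
    """Draw one character index from the softmax of the scores y, shape (vocab_size, 1)."""
    p = np.exp(y - np.max(y))
    p = p / np.sum(p)
    return int(rng.choice(len(p), p=p.ravel()))

y = np.array([[0.0], [0.0], [50.0]])  # toy scores that strongly favor index 2
draws = [sample_index(y, rng) for _ in range(100)]
print(draws[:10])
```

Because the draw is random rather than a plain `argmax`, repeated calls can return different characters, which is what gives the generated text its variety.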

In [5]:
def sample(h, seed_ix, n):
    """
    Sample a sequence of integers from the model.

    Args:
        h (numpy.ndarray): Memory state
        seed_ix (int): Seed letter for the first time step
        n (int): Number of steps to sample

    Returns:
        list of int: Sequence of sampled integers
    """
    x = np.zeros((vocab_size, 1))
    x[seed_ix] = 1
    ixes = []

    for t in range(n):
        h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
        y = np.dot(Why, h) + by
        p = np.exp(y) / np.sum(np.exp(y))
        ix = np.random.choice(range(vocab_size), p=p.ravel())
        x = np.zeros((vocab_size, 1))
        x[ix] = 1
        ixes.append(ix)

    return ixes

In [6]:
n, p = 0, 0
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by)  # Memory variables for Adagrad
smooth_loss = -np.log(1.0 / vocab_size) * seq_length  # Loss at iteration 0

acceptable_loss = 1.5  # Stop once the smoothed loss (summed over seq_length characters) drops below this
max_iterations = 50000  # Set a maximum number of iterations to avoid infinite loop
validation_interval = 10000  # Evaluate validation loss every 10000 iterations
with open('/content/validation.txt', 'r', encoding='utf-8') as file:  # Load validation data
    validation_data = file.read()

best_loss = float('inf')  # Initialize best validation loss
best_params = None  # Store best model parameters

In [7]:
total_params = (
    hidden_size * vocab_size  # Wxh
    + hidden_size * hidden_size  # Whh
    + vocab_size * hidden_size  # Why
    + hidden_size + vocab_size  # bh and by
)

print("Total number of parameters:", total_params)

# Outputs the total number of parameters in the model
# Rule of thumb: keep this on the same order as the number of characters in the input file

Total number of parameters: 81182


### **Training Loop:**

The main training loop runs for at most `max_iterations` iterations, sweeping through the data repeatedly, and stops earlier if the smoothed loss drops below `acceptable_loss`. In each iteration, it processes a chunk of `seq_length` characters, calculates the loss, updates the parameters with Adagrad, and advances the iteration counter and data pointer.
* The inputs and targets for the current sequence are prepared.
* The loss and gradients are calculated using the `lossFun` function.
* Adagrad updates are applied to the model parameters.
* The validation loss is evaluated every `validation_interval` iterations.
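The Adagrad step listed above can be tried on a toy problem. The sketch below wraps the update rule from the training loop in a hypothetical helper `adagrad_step` and applies it to the objective f(w) = sum(w²), which is an assumption chosen only because its gradient, 2w, is easy to write down.

```python
import numpy as np

def adagrad_step(param, grad, mem, learning_rate=1e-1, eps=1e-8):
    """Update param and mem in place with the same rule as the training loop."""
    mem += grad * grad                                   # accumulate squared gradients
    param += -learning_rate * grad / np.sqrt(mem + eps)  # per-element scaled step

# Toy objective (assumption): f(w) = sum(w ** 2), whose gradient is 2 * w
w = np.array([1.0, -2.0])
mem = np.zeros_like(w)
start = np.sum(w ** 2)
for _ in range(200):
    adagrad_step(w, 2 * w, mem)
end = np.sum(w ** 2)
print(start, end)  # the objective value should have decreased
```

Since `mem` only grows, the effective step size for each parameter shrinks over time, which is why progress slows down in long training runs.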

### **Validation:**

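During the training loop, the code calculates the validation loss every `validation_interval` iterations by processing the validation data in chunks of `seq_length` characters and averaging the loss per chunk. Whenever the validation loss improves on the best recorded loss, a copy of the parameters is saved. After the first `2 * validation_interval` iterations, training stops early if the validation loss exceeds 1.5 times the best recorded loss.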
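To see how the stopping condition in the training loop behaves, the sketch below isolates it in a hypothetical helper `should_stop` and replays the validation losses printed in this notebook's training log. The helper name and its default arguments are assumptions that mirror the values used in the loop.

```python
def should_stop(n, val_loss, best_loss, validation_interval=10000, factor=1.5):
    """Mirror of the stopping condition used in the training loop."""
    return n > 2 * validation_interval and val_loss > factor * best_loss

# (iteration, validation loss) pairs taken from the training log in this notebook
log = [(10000, 64.450059), (20000, 65.641756), (30000, 65.731041),
       (40000, 65.672620), (50000, 72.005383)]

best_loss = float('inf')
stopped_at = None
for n, val_loss in log:
    best_loss = min(best_loss, val_loss)
    if should_stop(n, val_loss, best_loss):
        stopped_at = n
        break

print(best_loss, stopped_at)  # 64.450059 None
```

With a factor of 1.5, the validation loss would have to exceed roughly 96.7 to trigger a stop, so the rise from 64.45 to 72.01 in the log does not end training early. The best parameters are therefore the ones saved at iteration 10000.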

In [8]:

while n < max_iterations:
    # Prepare inputs (sweeping from left to right in steps seq_length long)
    if p + seq_length + 1 >= len(data) or n == 0:
        hprev = np.zeros((hidden_size, 1))  # Reset RNN memory
        p = 0  # Go from the start of the data

    inputs = [char_to_ix[ch] for ch in data[p:p + seq_length]]
    targets = [char_to_ix[ch] for ch in data[p + 1:p + seq_length + 1]]

    # Sample from the model now and then
    # if n % 100 == 0:
    #     sample_ix = sample(hprev, inputs[0], 200)
    #     txt = ''.join(ix_to_char[ix] for ix in sample_ix)
    #     print('----\n %s \n----' % (txt, ))

    # Forward seq_length characters through the net and fetch gradient
    loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)
    smooth_loss = smooth_loss * 0.999 + loss * 0.001

    if n % 5000 == 0:
        print('iter %d, loss: %f' % (n, smooth_loss))  # Print progress

    # Check if the loss is acceptable
    if smooth_loss < acceptable_loss:
        print("Acceptable loss reached. Training complete.")
        break  # Exit the loop when acceptable loss is reached

    # Perform parameter update with Adagrad
    for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], [dWxh, dWhh, dWhy, dbh, dby], [mWxh, mWhh, mWhy, mbh, mby]):
        mem += dparam * dparam
        param += -learning_rate * dparam / np.sqrt(mem + 1e-8)  # Adagrad update

    p += seq_length  # Move data pointer
    n += 1  # Iteration counter

    if n % validation_interval == 0:
        val_loss = 0
        # Leave one character of lookahead so every chunk has seq_length targets
        num_batches = (len(validation_data) - 1) // seq_length
        val_hprev = np.zeros((hidden_size, 1))  # Each chunk starts from a zero hidden state

        # Calculate validation loss by iterating through validation data
        for b in range(num_batches):
            start = b * seq_length
            val_inputs = [char_to_ix[ch] for ch in validation_data[start:start + seq_length]]
            val_targets = [char_to_ix[ch] for ch in validation_data[start + 1:start + seq_length + 1]]
            batch_loss, _, _, _, _, _, _ = lossFun(val_inputs, val_targets, val_hprev)
            val_loss += batch_loss

        val_loss /= num_batches  # Mean loss per chunk of seq_length characters

        print('iter %d, validation loss: %f' % (n, val_loss))  # Print validation progress

        # Check for early stopping
        if val_loss < best_loss:
            best_loss = val_loss
            best_params = {
                'Wxh': np.copy(Wxh),
                'Whh': np.copy(Whh),
                'Why': np.copy(Why),
                'bh': np.copy(bh),
                'by': np.copy(by)
            }

        # Stop early once the validation loss exceeds 1.5 times the best value seen so far
        if n > 2 * validation_interval and val_loss > 1.5 * best_loss:
            print("Early stopping due to increasing validation loss.")
            break


iter 0, loss: 85.029927
iter 5000, loss: 65.834783
iter 10000, validation loss: 64.450059
iter 10000, loss: 57.007958
iter 15000, loss: 54.612780
iter 20000, validation loss: 65.641756
iter 20000, loss: 53.310195
iter 25000, loss: 52.044722
iter 30000, validation loss: 65.731041
iter 30000, loss: 51.091996
iter 35000, loss: 50.312399
iter 40000, validation loss: 65.672620
iter 40000, loss: 49.750033
iter 45000, loss: 49.077974
iter 50000, validation loss: 72.005383


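If we train for around 50,000 iterations, the model may overfit: the training loss keeps falling while the validation loss stays near 65 and rises to 72 at the last check. Even so, it can still generate text with some word-like structure.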

It is generally good practice to make sure the validation text is similar to the training text, for example in length and style. Validation loss measured on comparable text gives a more reliable estimate of how well the model generalizes.
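The losses printed above are summed over `seq_length = 25` characters, so dividing by 25 gives a per-character value that is easier to interpret. The sketch below does this for two numbers from the log; the helper `per_char` is a hypothetical name. For reference, a uniform guess over the 30-character vocabulary corresponds to ln(30) ≈ 3.40 nats per character, which matches the initial loss of 85.03 divided by 25.

```python
import math

seq_length = 25  # same value as in the hyperparameters cell

def per_char(loss_sum, seq_length):
    """Convert a loss summed over seq_length characters to nats per character."""
    return loss_sum / seq_length

train = per_char(49.077974, seq_length)  # smoothed training loss at iteration 45000
val = per_char(72.005383, seq_length)    # validation loss at iteration 50000

print(round(train, 3), round(val, 3))    # 1.963 2.88
print(math.exp(train), math.exp(val))    # per-character perplexity
```

The gap between the two per-character values is another way to read the overfitting visible in the log.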

### **Text Generation:**

After training, the parameters with the lowest validation loss are used for text generation. A starting character is provided, and the `sample` function generates a sequence of characters by repeatedly predicting the next character based on the RNN's outputs.

In [9]:
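# Use the best model parameters (lowest validation loss) for text generation.
# best_params stays None if training ended before the first validation pass;
# in that case the current parameters are kept.
if best_params is not None:
    Wxh = best_params['Wxh']
    Whh = best_params['Whh']
    Why = best_params['Why']
    bh = best_params['bh']
    by = best_params['by']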

# Specify the starting character for text generation
starting_char = 'f'  # Replace with the character you want as the starting point
starting_ix = char_to_ix[starting_char]

# Generate text after training, starting with the specified character
generated_text = sample(np.zeros((hidden_size, 1)), starting_ix, 1000)
generated_text = ''.join(ix_to_char[ix] for ix in generated_text)
print("Generated Text:\n", generated_text)


Generated Text:
 or thes for id toatetisheukes praukger me ald ungish fo coauchaif magt nis oret thel
fin the zoubut uy wiso ael pcaves wounk.
pous dor
thene lor loun hien ehaf lor is uet berpfor ante peag
faardu plfer sof
ayirawtore sorl faakail bi hin sume irefraesd ave siot for to lorsee psaa s-b lpmeoo say tre prave fatw bues
fnig
indrouthe yeas bus lom hial ou ape ang
me as ser ixdhif wisol thans yofe pey
foue puc bel ip chintsmexragu poy fawort nk werhe out aveld meaveaud cod my ter ikif he bes brer laat apel lone ar i peale pof wu sloranat oach thaklorit sniep
bu hix.
ft ale pupin to ali broues sof d
lt
ry mat oas louris alge if bung pu s bref
thes bp ic foon wuly ing
lly tin be pr dous co sicpe ore love irinorippre brt avys lif av mess ii hlmi ti sse reer sser praf you upuche pre all ald.
prat prey pam dore loa fin tnl hay shi far tagp hiar pp or the bo thor foul whoet bu ul the iy silm gul
arts hous z bu yold
soming y the be sur oupr nlat ave
arous buth barinseror mesirt liure

# **☝ GARBAGE**