# Minimal character RNN

![Character sequence](images/charseq.jpeg)

Related paper by Andrej Karpathy and colleagues: [Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. "Visualizing and understanding recurrent networks." arXiv preprint arXiv:1506.02078 (2015).](https://arxiv.org/abs/1506.02078)

Related blogpost by Andrej Karpathy: [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

Original code by Andrej Karpathy: [gist](https://gist.github.com/karpathy/d4dee566867f8291f086)

```python
import sys

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Variable is a deprecated no-op wrapper since PyTorch 0.4; plain tensors work directly
from torch.autograd import Variable
from tqdm import tqdm
```

## Load the data

A Shakespeare sample (the *tiny Shakespeare* dataset) can be downloaded from [here](https://github.com/karpathy/char-rnn/raw/master/data/tinyshakespeare/input.txt). The code below expects it at `data/tinyshakespeare.txt`.

```python
with open("data/tinyshakespeare.txt", "r", encoding="utf-8") as data_file:
    data = data_file.read()
```
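If the file is not present yet, a small helper can fetch it on first use. This is a sketch that is not part of the original notebook: the helper name `load_text` is hypothetical, and it assumes network access and that the raw URL above still serves the file.

```python
import os
import urllib.request

# URL from the link above; assumed to still serve the raw text file
DATA_URL = "https://github.com/karpathy/char-rnn/raw/master/data/tinyshakespeare/input.txt"


def load_text(path, url=DATA_URL):
    """Hypothetical helper: download `url` to `path` if the file is missing, then return its text."""
    if not os.path.exists(path):
        # create the parent directory (e.g. "data/") if needed
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, path)
    with open(path, "r", encoding="utf-8") as data_file:
        return data_file.read()
```

With it, the cell above could become `data = load_text("data/tinyshakespeare.txt")`; the download happens only once.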

Show the number of characters in the text:

```python
data_size = len(data)
print("Number of symbols in text:", data_size)
```

Number of symbols in text: 1115394


Build an alphabet from the text:

```python
alphabet = set(data)
alphabet_size = len(alphabet)
print("Alphabet size:", alphabet_size)
```

Alphabet size: 65


Assign a number to every symbol in the alphabet:

```python
symbol_to_id = {}
id_to_symbol = {}
# sorting makes the symbol IDs deterministic across runs
for symbol_id, symbol in enumerate(sorted(alphabet)):
    symbol_to_id[symbol] = symbol_id
    id_to_symbol[symbol_id] = symbol
```
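The two dictionaries are inverses of each other. A small sanity check, sketched here on a hypothetical toy string, is to encode a text into IDs and decode it back; `build_vocab`, `encode` and `decode` are illustrative names, not part of the notebook.

```python
def build_vocab(text):
    """Map every distinct symbol of `text` to an integer ID (in sorted order) and back."""
    symbol_to_id = {}
    id_to_symbol = {}
    for symbol_id, symbol in enumerate(sorted(set(text))):
        symbol_to_id[symbol] = symbol_id
        id_to_symbol[symbol_id] = symbol
    return symbol_to_id, id_to_symbol


def encode(text, symbol_to_id):
    """Turn a string into a list of symbol IDs."""
    return [symbol_to_id[symbol] for symbol in text]


def decode(ids, id_to_symbol):
    """Turn a list of symbol IDs back into a string."""
    return "".join(id_to_symbol[symbol_id] for symbol_id in ids)


toy_text = "to be or not to be"
toy_symbol_to_id, toy_id_to_symbol = build_vocab(toy_text)
ids = encode(toy_text, toy_symbol_to_id)
print(decode(ids, toy_id_to_symbol) == toy_text)  # True
```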

Transform a symbol into a one-hot encoded vector:

```python
def one_hot_encoding(symbol):
    # shape (1, alphabet_size): a batch of one symbol, matching the (1, hidden_size) hidden state
    one_hot_encoded = torch.zeros((1, alphabet_size))
    one_hot_encoded[0, symbol_to_id[symbol]] = 1
    return one_hot_encoded
```
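The function above encodes one symbol per call, which matches the step-by-step training loop. As an alternative sketch (not used elsewhere in this notebook), `torch.nn.functional.one_hot` encodes a whole string at once, one row per symbol; the toy alphabet and the name `one_hot_sequence` are hypothetical.

```python
import torch
import torch.nn.functional as F

alphabet = sorted(set("hello"))  # toy alphabet: ['e', 'h', 'l', 'o']
symbol_to_id = {symbol: symbol_id for symbol_id, symbol in enumerate(alphabet)}


def one_hot_sequence(text):
    """Return a (len(text), alphabet_size) float tensor with one one-hot row per symbol."""
    ids = torch.tensor([symbol_to_id[symbol] for symbol in text], dtype=torch.long)
    # F.one_hot returns integer tensors; the linear layers expect floats
    return F.one_hot(ids, num_classes=len(alphabet)).float()


encoded = one_hot_sequence("hello")
print(encoded.shape)  # torch.Size([5, 4])
```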

Transform a sequence of symbols into a one-dimensional tensor of symbol IDs:

```python
def labels_tensor(symbols):
    return torch.tensor([symbol_to_id[symbol] for symbol in symbols], dtype=torch.long)
```
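During training these labels are the input window shifted by one symbol: the target for each input symbol is the symbol that follows it. A minimal sketch of that pairing on plain strings (the helper `input_target_pairs` and the toy text are illustrative only):

```python
def input_target_pairs(text, start, sequence_size):
    """Pair each input symbol in a window with the symbol that follows it."""
    inputs = text[start:start + sequence_size]
    targets = text[start + 1:start + sequence_size + 1]
    return list(zip(inputs, targets))


pairs = input_target_pairs("First Citizen", 0, 5)
print(pairs)  # [('F', 'i'), ('i', 'r'), ('r', 's'), ('s', 't'), ('t', ' ')]
```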

## Model

```python
hidden_size = 100

class MinCharRNN(nn.Module):

    def __init__(self):
        super().__init__()

        # input -> hidden, hidden -> hidden and hidden -> output, each with its own bias
        self.input_to_hidden = nn.Linear(alphabet_size, hidden_size)
        self.hidden_to_hidden = nn.Linear(hidden_size, hidden_size)
        self.hidden_to_output = nn.Linear(hidden_size, alphabet_size)

    def forward(self, input_symbol, hidden_state):
        # new hidden state from the current input and the previous hidden state
        hidden_state = torch.tanh(self.input_to_hidden(input_symbol) + self.hidden_to_hidden(hidden_state))
        # unnormalized scores (logits) for the next symbol
        output = self.hidden_to_output(hidden_state)
        return output, hidden_state
```
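Written as equations, one step of this model is

$$h_t = \tanh\left(W_{xh} x_t + b_{xh} + W_{hh} h_{t-1} + b_{hh}\right), \qquad y_t = W_{hy} h_t + b_{hy}$$

where $x_t$ is the one-hot input (65 entries here), $h_t$ the hidden state (100 entries) and $y_t$ the logits over the alphabet; `input_to_hidden` holds $W_{xh}, b_{xh}$, `hidden_to_hidden` holds $W_{hh}, b_{hh}$ and `hidden_to_output` holds $W_{hy}, b_{hy}$. The hidden update has the same form as PyTorch's built-in `nn.RNNCell` with a tanh nonlinearity. The sketch below, which repeats the class so it runs on its own, copies the weights into an `nn.RNNCell` and checks that both produce the same hidden state; it is only a consistency check, not a suggested replacement.

```python
import torch
import torch.nn as nn

alphabet_size, hidden_size = 65, 100  # sizes used in this notebook


class MinCharRNN(nn.Module):
    # same layers and forward pass as the notebook's model
    def __init__(self):
        super().__init__()
        self.input_to_hidden = nn.Linear(alphabet_size, hidden_size)
        self.hidden_to_hidden = nn.Linear(hidden_size, hidden_size)
        self.hidden_to_output = nn.Linear(hidden_size, alphabet_size)

    def forward(self, input_symbol, hidden_state):
        hidden_state = torch.tanh(self.input_to_hidden(input_symbol) + self.hidden_to_hidden(hidden_state))
        return self.hidden_to_output(hidden_state), hidden_state


torch.manual_seed(0)
model = MinCharRNN()
cell = nn.RNNCell(alphabet_size, hidden_size, nonlinearity="tanh")

with torch.no_grad():
    # copy W_xh, b_xh, W_hh, b_hh from the notebook model into the built-in cell
    cell.weight_ih.copy_(model.input_to_hidden.weight)
    cell.bias_ih.copy_(model.input_to_hidden.bias)
    cell.weight_hh.copy_(model.hidden_to_hidden.weight)
    cell.bias_hh.copy_(model.hidden_to_hidden.bias)

    x = torch.zeros((1, alphabet_size))
    x[0, 7] = 1.0                      # arbitrary one-hot input symbol
    h = torch.randn((1, hidden_size))  # arbitrary previous hidden state

    _, h_model = model(x, h)
    h_cell = cell(x, h)

print(torch.allclose(h_model, h_cell, atol=1e-5))  # True
```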

## Training

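Function to initialize the weights of every linear layer of the model with small values drawn uniformly from [-0.01, 0.01] (the biases keep PyTorch's default initialization):

```python
def initialize_weights(m):
    # applied to every submodule by model.apply(); only nn.Linear weights are changed
    if isinstance(m, nn.Linear):
        nn.init.uniform_(m.weight, -0.01, 0.01)
```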
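For comparison, the original numpy gist draws its weights from a normal distribution scaled by 0.01 and starts the biases at zero. A hypothetical variant of the initializer that mimics this is sketched below; it is not what this notebook uses, and the uniform version above may train just as well.

```python
import torch
import torch.nn as nn


def initialize_weights_like_gist(m):
    # hypothetical variant: N(0, 0.01^2) weights and zero biases, as in the numpy gist
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.zeros_(m.bias)


torch.manual_seed(0)
layers = nn.Sequential(nn.Linear(65, 100), nn.Linear(100, 65))
layers.apply(initialize_weights_like_gist)
print(all(bool((layer.bias == 0).all()) for layer in layers))  # True
```

In the gist the hidden layer has a single bias vector; here `input_to_hidden` and `hidden_to_hidden` each carry one, so zeroing both gives the same starting point.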

Initialize the model, the loss function (cross-entropy over the alphabet) and the optimization algorithm (Adagrad):

```python
learning_rate = 1e-1

model = MinCharRNN()
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adagrad(model.parameters(), lr=learning_rate)

model.apply(initialize_weights)
```

MinCharRNN(
  (input_to_hidden): Linear(in_features=65, out_features=100, bias=True)
  (hidden_to_hidden): Linear(in_features=100, out_features=100, bias=True)
  (hidden_to_output): Linear(in_features=100, out_features=65, bias=True)
)
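Adagrad keeps a running sum of squared gradients per parameter element and divides each step by its square root, so frequently updated weights take smaller steps. The sketch below reproduces `torch.optim.Adagrad` (default `lr_decay=0`, `eps=1e-10`) by hand on a toy quadratic loss to show the update rule; the toy parameter and loss are assumptions for illustration. The numpy gist uses the same rule with `1e-8` added inside the square root instead, a negligible difference.

```python
import torch

lr, eps = 0.1, 1e-10  # eps matches torch.optim.Adagrad's default

param = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.Adagrad([param], lr=lr, eps=eps)

manual = param.detach().clone()       # parameter updated by hand
state_sum = torch.zeros_like(manual)  # running sum of squared gradients

for step in range(3):
    optimizer.zero_grad()
    loss = (param ** 2).sum()  # toy quadratic loss
    loss.backward()
    grad = param.grad.detach().clone()
    optimizer.step()

    # Adagrad: accumulate squared gradients, then take a per-element scaled step
    state_sum += grad ** 2
    manual -= lr * grad / (state_sum.sqrt() + eps)

print(torch.allclose(param.detach(), manual))  # True
```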

Uncomment to load a previously saved model:

```python
# model.load_state_dict(torch.load("models/min-char-rnn.torch"))
```

Function to print a sample text with a fixed number of characters, generated by feeding each sampled symbol back into the model as the next input:

```python
sample_size = 200
first_symbol = "\n"
symbol_ids = list(range(alphabet_size))

def print_sample():
    sample = ""

    # no gradients are needed while sampling
    with torch.no_grad():
        input_symbol = one_hot_encoding(first_symbol)
        hidden_state = torch.zeros((1, hidden_size))

        for _ in range(sample_size):
            logits, hidden_state = model(input_symbol, hidden_state)

            # turn the logits into a probability distribution over the alphabet
            probabilities = F.softmax(logits, dim=1).squeeze(0).numpy()

            # draw the next symbol and append it to the sample
            symbol_id = np.random.choice(symbol_ids, p=probabilities)
            symbol = id_to_symbol[symbol_id]
            sample += symbol

            # feed the sampled symbol back as the next input
            input_symbol = one_hot_encoding(symbol)

    print(sample)
```
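The sampling step could also stay entirely in PyTorch: `torch.multinomial` draws an index from a probability tensor, avoiding the round trip through numpy. A minimal sketch, where the helper name `sample_symbol_id` is hypothetical:

```python
import torch
import torch.nn.functional as F


def sample_symbol_id(logits):
    """Draw one symbol ID from the softmax distribution of a (1, alphabet_size) logits tensor."""
    probabilities = F.softmax(logits, dim=1)
    # multinomial returns a (1, 1) tensor of sampled indices
    return torch.multinomial(probabilities, num_samples=1).item()


torch.manual_seed(0)
peaked = torch.tensor([[0.0, 50.0, 0.0]])  # almost all probability mass on ID 1
print(sample_symbol_id(peaked))
```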

Initial sample without training:

```python
print_sample()
```

uFL!mFkGTuI-rSu&m?oM!oagzfUI,RQcprivDhNIa
TwxD&abkWEm&CcnQH&baScigYEGlmeuWIQdxY3MX:INIWDrO'C.:Vx':it' Yd&kPQbWFnXD,ECx q
-pq-rumlB&BmMkQqYJwUe!j$X'MdgtMXZ3$Zn-zlg3.R3
PKMNeAPYuxVzVQPfy$?!HxmGq$sNjC3Q.
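Before training, it helps to check how much of the text the batching in the next cell covers. Each batch reads `sequence_size` input symbols starting at `batch_id * sequence_size`, plus one extra symbol for the last label, while the number of batches is computed with `sequence_size + 1` in the denominator. The arithmetic below, using the text length printed earlier, suggests that each epoch stops roughly 43,000 symbols before the end of the text. The `alternative_batches` formula is only one possible way to cover the whole text; it is an assumption, not what the notebook uses.

```python
data_size = 1115394  # number of symbols printed above
sequence_size = 25

batches = data_size // (sequence_size + 1)
# the last batch starts at (batches - 1) * sequence_size and reads labels up to this index
last_label_index = (batches - 1) * sequence_size + sequence_size
symbols_used = last_label_index + 1

# hypothetical alternative: as many windows as fit when each needs one extra label symbol
alternative_batches = (data_size - 1) // sequence_size

print(batches, symbols_used, data_size - symbols_used)  # 42899 1072476 42918
print(alternative_batches)                              # 44615
```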


```python
epochs = 10
sequence_size = 25
batches = data_size // (sequence_size + 1)
gradient_clipping = 5

initial_state = torch.zeros((1, hidden_size))

for epoch_id in range(epochs):
    # reset the hidden state before every epoch
    hidden_state = initial_state

    epoch_accumulated_loss = 0.0

    # train
    model.train()

    with tqdm(total=batches) as progress_bar:
        for batch_id in range(batches):
            batch_start = batch_id * sequence_size

            # reuse the hidden state from the last batch, but detach it from its graph
            # so that backpropagation through time stops at the batch boundary
            hidden_state = hidden_state.detach()

            # clear the gradient information from the previous batch
            optimizer.zero_grad()

            # for every symbol in the batch, try to predict the next symbol
            predictions = []
            for sequence_id in range(sequence_size):
                input_symbol = one_hot_encoding(data[batch_start + sequence_id])
                prediction, hidden_state = model(input_symbol, hidden_state)
                predictions.append(prediction)

            # the labels are the input symbols shifted one position to the right
            labels = labels_tensor(data[batch_start + 1:batch_start + sequence_size + 1])

            # stack the per-step logits into a (sequence_size, alphabet_size) tensor
            predictions = torch.cat(predictions)

            # measure the loss and backpropagate through time
            loss = loss_function(predictions, labels)
            loss.backward()

            # clip every gradient element to [-gradient_clipping, gradient_clipping]
            # to avoid exploding gradients
            nn.utils.clip_grad_value_(model.parameters(), gradient_clipping)

            # update parameters
            optimizer.step()

            # batch logging
            batch_loss = loss.item()
            epoch_accumulated_loss += batch_loss

            progress_bar.set_postfix(loss="{:.03f}".format(batch_loss))
            progress_bar.update()

    # epoch logging
    mean_loss = epoch_accumulated_loss / float(batches)
    print("Epoch {:d}/{:d} Mean Loss: {:.03f} Sample:".format(epoch_id + 1, epochs, mean_loss))
    print()
    model.eval()
    print_sample()
    sys.stdout.flush()
```

100%|██████████| 42899/42899 [06:56<00:00, 103.01it/s, loss=1.622]


Epoch 1/10 Mean Loss: 2.167 Sample:

Thy
Whad
CUCNIS
S
ENANNANNONOUNUMUSHUCUNUCANRUN
NANANTANENENICENWENS:
MIOMHIIMICHCOZWUCHIMUCUNUSBUMICUCUKUINUCUNUNCINUNCCENCUNUNANCIXISNNWANUNUS
HUCUNUNUYUCUCUSTHUCANENDUNUHICSoEThINNCUNUNANHHMICUIS
N


100%|██████████| 42899/42899 [06:35<00:00, 108.55it/s, loss=1.528]


Epoch 2/10 Mean Loss: 2.016 Sample:

Se
And then we gint?

PONTHOCTANDA:
Beris bodnoun, and doy a my be waithhich he flony ma som my cir, gofartet is gledce.

PETRUCICENCENTO:
Bomeve ape for to bele notP-are,
on and Lracted, he he eeds t


100%|██████████| 42899/42899 [06:38<00:00, 107.77it/s, loss=1.562]


Epoch 3/10 Mean Loss: 1.924 Sample:

TINAND:
Ke,

GRENIO:
Jio? bestemy the blutaife fecrotire.

PETCIO:
Gadstardens nouste,
And sight not?

GREMIONR:
Hot thy apeabt.

PETRUSIET:
Becaist seatesce,
Hobantes.

ARGCETLAN:
Ifaverys!

KAMly so


100%|██████████| 42899/42899 [06:45<00:00, 105.88it/s, loss=1.536]


Epoch 4/10 Mean Loss: 1.884 Sample:

SINA:
What I bet:
How stoUNE::
VINGBAPUENIO:
With how incor wer Karath lover;
Frion repe: take with the be! it feyst be.

BIONTENTIO:
I gies; live, wefe:
Mate guingicn thid this rever this slawn;
Or h


100%|██████████| 42899/42899 [07:31<00:00, 95.00it/s, loss=1.494] 


Epoch 5/10 Mean Loss: 1.857 Sample:

TESCD:
He mofnennbut
Thats heavus willo.

PELTINO:
Not to forbinio, and you daipes havit toundiendient noure;
And bid matter to. Serpyce!

PATHISAY LANDANANIVA:
Serfor me we smany, then shat, the Lord


100%|██████████| 42899/42899 [07:30<00:00, 95.29it/s, loss=1.469] 


Epoch 6/10 Mean Loss: 1.836 Sample:

STALY:
I canother, to stild this to mone a thoughe havon, neintio staech the now we tang, in san!

VINTENCENTIO:
Loty?
Wear,
By be ta-mime sera wilgwert, Wan the cley,
Of ishading to a and ate you wit


100%|██████████| 42899/42899 [07:28<00:00, 95.59it/s, loss=1.431] 


Epoch 7/10 Mean Loss: 1.821 Sample:


RIMEN:
O: Karint a frupon hath me's ending!

KATHANCHAN:
Hey?
Oss faps, to war.
'sware up the wicrould of this
of weld lite way, How staw--Hath Pllover
Tore it them.

ANANIO:
Hede in o him am buir on


100%|██████████| 42899/42899 [07:11<00:00, 99.48it/s, loss=1.403] 


Epoch 8/10 Mean Loss: 1.809 Sample:

Se
Tast and and prond sirramertiol?

NONTENTINA:
Busted me, that's to be could ceep on not,
You, no devile bestest, allat, astord, boincter?
And wifpasteet! the uik, him!

PETRUCHIO:
And for he daster


100%|██████████| 42899/42899 [07:23<00:00, 93.89it/s, loss=1.380] 


Epoch 9/10 Mean Loss: 1.799 Sample:

TER:
What arry-winds
you! how for siome have to hold's.

PETRUCHIO:
Your all.

LERNIO:
And is ssole,
Thus lords! and me the right smand go I will, she somparis: which sore whire you madoth?

BAPTISTA:


100%|██████████| 42899/42899 [07:11<00:00, 99.42it/s, loss=1.365] 


Epoch 10/10 Mean Loss: 1.791 Sample:

SINR:
But, kind wive him now thear.

VhIO:
O to manter have,
Womuster'd a many
If a have not the stame ont viuen ased?

PRINTINCA:
O jed.

KATHARINCUS:
Lere be cangy lower, and re thinss. 'tcunt know 
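The training loop carries the hidden state from one batch to the next but cuts the autograd graph at each batch boundary (truncated backpropagation through time). A toy sketch of what that cut does, with a single scalar weight `w` standing in for the model and two hypothetical one-step "batches":

```python
import torch

w = torch.tensor(2.0, requires_grad=True)


def two_batches(detach):
    """Run two toy batches h -> w * h and return dloss/dw of the second batch."""
    w.grad = None
    h = torch.tensor(1.0)
    h = w * h           # batch 1
    if detach:
        h = h.detach()  # cut the graph at the batch boundary
    loss = w * h        # batch 2
    loss.backward()
    return w.grad.item()


print(two_batches(detach=True))   # 2.0: only batch 2 contributes
print(two_batches(detach=False))  # 4.0: the gradient also flows back into batch 1
```

Without the cut, every `backward()` call would have to reach back to the start of the epoch, so memory and compute would grow with each batch.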


Uncomment to save the model:

```python
# torch.save(model.state_dict(), "models/min-char-rnn.torch")
```
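Saving the `state_dict` stores only the parameter tensors, so the model class has to be re-created before loading. A self-contained sketch of the round trip, using a small `nn.Linear` as a stand-in for `MinCharRNN` and a temporary directory instead of `models/` (both assumptions for illustration):

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # stand-in for MinCharRNN; any nn.Module works the same way

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "min-char-rnn.torch")
    torch.save(model.state_dict(), path)

    restored = nn.Linear(4, 2)  # freshly initialized, with different weights
    restored.load_state_dict(torch.load(path))

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```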