#### Importing Necessary Libraries:

In [2]:
import torch
import numpy as np
from torch import nn
import torch.nn.functional as F

## Loading the Data:

In [3]:
with open('C:/Users/Geekquad/rnn_data/anna.txt', 'r') as f:
    text = f.read()

#### Checking out the first 500 characters:

In [4]:
text[:500]

"Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverything was in confusion in the Oblonskys' house. The wife had\ndiscovered that the husband was carrying on an intrigue with a French\ngirl, who had been a governess in their family, and she had announced to\nher husband that she could not go on living in the same house with him.\nThis position of affairs had now lasted three days, and not only the\nhusband and wife themselves, but all the members of their f"

## Tokenization:

In the cells below, I create two dictionaries to convert characters to and from integers.
Encoding the characters as integers makes them easier to use as input to the network.

In [5]:
"""Creating two dictionaries:
   1. int2char: maps integers to characters
   2. char2int: maps characters to integers"""

chars = tuple(set(text))
int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}

# Encoding the text as an array of integers
encoded = np.array([char2int[ch] for ch in text])

And we can see those same characters from above, encoded as integers.

In [6]:
encoded[:100]

array([63, 17, 61, 71, 26,  4, 72, 28, 41,  5,  5,  5, 68, 61, 71, 71, 18,
       28, 29, 61, 64, 35, 53, 35,  4, 73, 28, 61, 72,  4, 28, 61, 53, 53,
       28, 61, 53, 35, 16,  4, 34, 28,  4, 48,  4, 72, 18, 28, 66, 58, 17,
       61, 71, 71, 18, 28, 29, 61, 64, 35, 53, 18, 28, 35, 73, 28, 66, 58,
       17, 61, 71, 71, 18, 28, 35, 58, 28, 35, 26, 73, 28, 22, 69, 58,  5,
       69, 61, 18, 60,  5,  5, 74, 48,  4, 72, 18, 26, 17, 35, 58])
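Decoding is simply the reverse lookup through `int2char`. Below is a minimal, self-contained sketch of the encode/decode round trip on a short hypothetical string standing in for `anna.txt`. It uses `sorted(set(text))` instead of the notebook's `tuple(set(text))`. This is a deliberate change for the example: Python randomizes string hashing per process, so set order (and therefore the integer assigned to each character) can differ between runs. That is also why the checkpoint later in this notebook stores the tokens.

```python
# Hypothetical toy corpus standing in for anna.txt
text = "Happy families are all alike"

# sorted() gives a reproducible vocabulary order across runs
chars = tuple(sorted(set(text)))
int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}

# Encode characters to integers, then decode back to a string
encoded = [char2int[ch] for ch in text]
decoded = ''.join(int2char[i] for i in encoded)

print(encoded[:5])  # [1, 2, 9, 9, 12]
print(decoded)      # Happy families are all alike
```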

## Pre-processing the data:

Our char-RNN's LSTM expects one-hot encoded input: each character is first converted into an integer (using the dictionary we created) and then into a vector in which only the entry at that integer's index is 1 and every other entry is 0.
Let's write a `one_hot_encode` function to do this:

In [7]:
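def one_hot_encode(arr, n_labels):
    # One row of zeros per element of arr
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    # Put a 1 at each element's integer label
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1
    # Restore the original shape, with a trailing one-hot dimension
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    return one_hot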

In [8]:
test_seq = np.array([[3, 5, 1]])
one_hot = one_hot_encode(test_seq, 8)

print(one_hot)

[[[0. 0. 0. 1. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 1. 0. 0.]
  [0. 1. 0. 0. 0. 0. 0. 0.]]]
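A more compact way to build the same encoding is to index rows of an identity matrix: row `i` of `np.eye(n)` is exactly the one-hot vector for label `i`. This is a minimal sketch; the name `one_hot_encode_eye` is hypothetical and not used elsewhere in this notebook.

```python
import numpy as np

def one_hot_encode_eye(arr, n_labels):
    # Fancy-indexing the identity matrix with arr returns an array
    # of shape (*arr.shape, n_labels) holding one-hot rows
    return np.eye(n_labels, dtype=np.float32)[arr]

test_seq = np.array([[3, 5, 1]])
result = one_hot_encode_eye(test_seq, 8)
print(result)
```

It produces the same `(1, 3, 8)` array as `one_hot_encode` above. The `np.zeros` version avoids allocating the full `n_labels x n_labels` identity matrix, which matters little for an 83-character vocabulary.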


## Making training mini-batches

To train on this data, we split it into mini-batches, each containing `batch_size` sequences of `seq_length` characters. The targets for each batch are the inputs shifted forward by one character.

In [9]:
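def get_batches(arr, batch_size, seq_length):
    '''Yield (x, y) mini-batches of shape (batch_size, seq_length),
       where y is x shifted one character to the left.'''
    batch_size_total = batch_size * seq_length
    # Number of complete mini-batches we can make
    n_batches = len(arr) // batch_size_total

    # Keep only enough characters to fill complete mini-batches
    arr = arr[:n_batches * batch_size_total]
    # Reshape into batch_size rows
    arr = arr.reshape((batch_size, -1))

    # Step through the columns seq_length at a time
    for n in range(0, arr.shape[1], seq_length):
        x = arr[:, n:n + seq_length]
        # Targets are the inputs shifted by one character
        y = np.zeros_like(x)
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n + seq_length]
        except IndexError:
            # Last mini-batch: wrap around to the first column
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
        yield x, y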
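To see what `get_batches` does with the shapes, here is a minimal worked sketch on a tiny hypothetical array (the integers 0 to 19) with `batch_size=2` and `seq_length=3`. It repeats the function's arithmetic inline rather than calling it, so that each intermediate value is visible.

```python
import numpy as np

# Hypothetical stand-in for `encoded`: the integers 0..19
arr = np.arange(20)
batch_size, seq_length = 2, 3

batch_size_total = batch_size * seq_length   # 6 characters per mini-batch
n_batches = len(arr) // batch_size_total     # 3 complete mini-batches
arr = arr[:n_batches * batch_size_total]     # drops the leftover 18 and 19
arr = arr.reshape((batch_size, -1))          # [[0..8], [9..17]]

# First mini-batch: targets are the inputs shifted one step to the left
x_first = arr[:, 0:seq_length]               # [[0, 1, 2], [9, 10, 11]]
y_first = arr[:, 1:seq_length + 1]           # [[1, 2, 3], [10, 11, 12]]

# Last mini-batch: arr[:, 9] does not exist, so the except-branch in
# get_batches wraps around and uses the first column arr[:, 0]
n = 2 * seq_length
x_last = arr[:, n:n + seq_length]            # [[6, 7, 8], [15, 16, 17]]
y_last = np.zeros_like(x_last)
y_last[:, :-1], y_last[:, -1] = x_last[:, 1:], arr[:, 0]   # [[7, 8, 0], [16, 17, 9]]
```

Each row of the reshaped array is a contiguous stretch of text, so row `i` of one mini-batch continues directly in row `i` of the next. This is what lets the hidden state be carried from batch to batch during training.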

Now I'll create a batch generator so we can check what's going on as we batch the data. Here I am going to use a batch size of 8 and 50 sequence steps.

In [10]:
batches = get_batches(encoded, 8, 50)
x, y = next(batches)

In [11]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[63 17 61 71 26  4 72 28 41  5]
 [73 22 58 28 26 17 61 26 28 61]
 [ 4 58 52 28 22 72 28 61 28 29]
 [73 28 26 17  4 28 70 17 35  4]
 [28 73 61 69 28 17  4 72 28 26]
 [70 66 73 73 35 22 58 28 61 58]
 [28 15 58 58 61 28 17 61 52 28]
 [75 82 53 22 58 73 16 18 60 28]]

y
 [[17 61 71 26  4 72 28 41  5  5]
 [22 58 28 26 17 61 26 28 61 26]
 [58 52 28 22 72 28 61 28 29 22]
 [28 26 17  4 28 70 17 35  4 29]
 [73 61 69 28 17  4 72 28 26  4]
 [66 73 73 35 22 58 28 61 58 52]
 [15 58 58 61 28 17 61 52 28 73]
 [82 53 22 58 73 16 18 60 28 12]]


## Building the Network:

In [12]:
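class CharRNN(nn.Module):
    def __init__(self, tokens, n_hidden=256, n_layers=2, drop_prob=0.5, lr=0.001):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lr = lr  # stored for reference only; train() takes its own lr

        # Character dictionaries, rebuilt from the tokens so a loaded model can decode
        self.chars = tokens
        self.int2char = dict(enumerate(self.chars))
        self.char2int = {ch: ii for ii, ch in self.int2char.items()}

        # LSTM over one-hot inputs; its dropout applies between stacked layers
        self.lstm = nn.LSTM(len(self.chars), n_hidden, n_layers,
                            dropout=drop_prob, batch_first=True)
        # Dropout on the LSTM output, before the fully connected layer
        self.dropout = nn.Dropout(drop_prob)
        # Maps each hidden state to a score for every character
        self.fc = nn.Linear(n_hidden, len(self.chars))

    def forward(self, x, hidden):
        # x: (batch, seq_length, n_chars) -> r_output: (batch, seq_length, n_hidden)
        r_output, hidden = self.lstm(x, hidden)
        out = self.dropout(r_output)
        # Flatten to (batch * seq_length, n_hidden) so fc scores every time step
        out = out.contiguous().view(-1, self.n_hidden)
        out = self.fc(out)
        return out, hidden

    def init_hidden(self, batch_size):
        # Zero-filled hidden and cell states, each (n_layers, batch_size, n_hidden),
        # with the same dtype and device as the model's weights
        weight = next(self.parameters()).data
        hidden = (weight.new_zeros((self.n_layers, batch_size, self.n_hidden)),
                  weight.new_zeros((self.n_layers, batch_size, self.n_hidden)))
        return hidden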
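The `view(-1, self.n_hidden)` call in `forward` is why `train` later flattens the targets to length `batch_size*seq_length`. The following NumPy-only sketch imitates that step with made-up sizes and random stand-ins for the LSTM output and the linear layer. It is not the model itself; note that the stand-in weight matrix `W` is stored as `(in, out)` here, whereas `nn.Linear` stores its weight as `(out, in)`.

```python
import numpy as np

batch_size, seq_length, n_hidden, n_chars = 2, 3, 4, 5
rng = np.random.default_rng(0)

# Stand-in for the LSTM output with batch_first=True: (batch, seq, hidden)
r_output = rng.standard_normal((batch_size, seq_length, n_hidden))

# Equivalent of out.contiguous().view(-1, n_hidden)
flat = r_output.reshape(-1, n_hidden)   # (batch*seq, hidden) = (6, 4)

# Stand-in for the fully connected layer: one row of character scores per position
W = rng.standard_normal((n_hidden, n_chars))
b = np.zeros(n_chars)
logits = flat @ W + b                   # (6, 5)

# Row i*seq_length + t holds sequence i at time step t, matching the
# row-major order of targets.view(batch_size*seq_length)
i, t = 1, 2
assert np.allclose(flat[i * seq_length + t], r_output[i, t])
```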

In [13]:
def train(net, data, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
    ''' Training a network 
    
        Arguments
        ---------
        
        net: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        batch_size: Number of mini-sequences per mini-batch, aka batch size
        seq_length: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        print_every: Number of steps for printing training and validation loss
    
    '''
    
    net.train()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    # Hold out the last val_frac of the text for validation
    val_idx = int(len(data)*(1-val_frac))
    data, val_data = data[:val_idx], data[val_idx:]

    counter = 0
    n_chars = len(net.chars)
    for e in range(epochs):
        # Fresh hidden state at the start of each epoch
        h = net.init_hidden(batch_size)

        for x, y in get_batches(data, batch_size, seq_length):
            counter += 1

            # One-hot encode the inputs and convert to tensors
            x = one_hot_encode(x, n_chars)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

            # Detach the hidden state so we don't backpropagate through the entire history
            h = tuple([each.data for each in h])

            net.zero_grad()

            output, h = net(inputs, h)

            # output is (batch_size*seq_length, n_chars), so flatten the targets to match
            loss = criterion(output, targets.view(batch_size*seq_length).long())
            loss.backward()
            # Clip gradients to help prevent exploding gradients in the LSTM
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            opt.step()

            # Loss stats
            if counter % print_every == 0:
                # Validation loss, computed without tracking gradients
                val_h = net.init_hidden(batch_size)
                val_losses = []
                net.eval()
                with torch.no_grad():
                    for x, y in get_batches(val_data, batch_size, seq_length):
                        x = one_hot_encode(x, n_chars)
                        inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

                        val_h = tuple([each.data for each in val_h])

                        output, val_h = net(inputs, val_h)
                        val_loss = criterion(output, targets.view(batch_size*seq_length).long())

                        val_losses.append(val_loss.item())

                net.train()  # back to training mode (re-enables dropout)

                print("Epoch: {}/{}...".format(e+1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))
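The `clip` argument is passed to `nn.utils.clip_grad_norm_`, which rescales all gradients together when their combined L2 norm exceeds `clip`. A minimal NumPy sketch of that idea follows. The function name `clip_grad_norm` is hypothetical, and this is a simplified illustration (L2 norm only, plain arrays instead of parameters), not PyTorch's implementation.

```python
import numpy as np

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Rescale gradient arrays in place so their combined L2 norm is at most max_norm.

    Returns the total norm measured before clipping.
    """
    # One norm over all gradients together, not one per array
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Scale every gradient by the same factor, preserving its direction
        for g in grads:
            g *= clip_coef
    return total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # combined norm sqrt(9 + 16 + 144) = 13
before = clip_grad_norm(grads, max_norm=5.0)
after = np.sqrt(sum(np.sum(g ** 2) for g in grads))
print(before, after)   # 13.0, then a value just under 5.0
```

Because every gradient is multiplied by the same factor, the update direction is unchanged; only its size is capped. This guards against the occasional very large gradients that recurrent networks can produce.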

## Instantiating the model:

In [14]:
n_hidden=512
n_layers=2

net = CharRNN(chars, n_hidden, n_layers)
print(net)

CharRNN(
  (lstm): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)
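As a sanity check on the model's size, the parameter count can be worked out by hand. PyTorch's `nn.LSTM` keeps, per layer, an input-to-hidden weight of shape `(4*hidden, in)`, a hidden-to-hidden weight of shape `(4*hidden, hidden)`, and two bias vectors of length `4*hidden` (one block per gate). `nn.Linear` has an `in x out` weight plus an `out` bias, and dropout has no parameters. The helper functions below are hypothetical, written for this arithmetic only.

```python
def lstm_param_count(input_size, hidden_size, num_layers):
    total = 0
    for layer in range(num_layers):
        # Layers after the first take the previous layer's hidden state as input
        in_size = input_size if layer == 0 else hidden_size
        weights = 4 * hidden_size * (in_size + hidden_size)   # weight_ih + weight_hh
        biases = 2 * 4 * hidden_size                          # bias_ih + bias_hh
        total += weights + biases
    return total

def linear_param_count(in_features, out_features):
    return in_features * out_features + out_features

n_chars, n_hidden, n_layers = 83, 512, 2
lstm = lstm_param_count(n_chars, n_hidden, n_layers)   # 3,323,904
fc = linear_param_count(n_hidden, n_chars)             # 42,579
print(lstm + fc)                                       # 3366483
```

The same total can be read off the instantiated model with `sum(p.numel() for p in net.parameters())`. Almost all of the parameters sit in the second LSTM layer's recurrent weights.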


In [15]:
batch_size = 128
seq_length = 100
n_epochs = 30
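How many optimizer steps one epoch takes follows from these settings and the length of the text. `train` keeps the first `1 - val_frac` of the characters, and `get_batches` discards whatever doesn't fill a complete `batch_size * seq_length` mini-batch. The function below is a hypothetical helper, and the one-million-character corpus is a made-up example, since the length of `encoded` isn't printed in this notebook. In the log below, epoch 10 ends at step 1390 and epoch 11 starts at step 1400, which is consistent with 139 steps per epoch.

```python
def steps_per_epoch(n_text_chars, batch_size, seq_length, val_frac=0.1):
    # train() keeps the first (1 - val_frac) of the text for training
    n_train = int(n_text_chars * (1 - val_frac))
    # get_batches() discards whatever doesn't fill a complete mini-batch
    return n_train // (batch_size * seq_length)

# Hypothetical corpus of one million characters, with this notebook's settings
print(steps_per_epoch(1_000_000, batch_size=128, seq_length=100))   # 70
```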

## Training the Model:

In [16]:
train(net, encoded, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length, lr=0.001, print_every=10)

Epoch: 1/30... Step: 10... Loss: 3.2533... Val Loss: 3.2208
Epoch: 1/30... Step: 20... Loss: 3.1390... Val Loss: 3.1379
Epoch: 1/30... Step: 30... Loss: 3.1390... Val Loss: 3.1244
Epoch: 1/30... Step: 40... Loss: 3.1120... Val Loss: 3.1190
Epoch: 1/30... Step: 50... Loss: 3.1443... Val Loss: 3.1165
Epoch: 1/30... Step: 60... Loss: 3.1176... Val Loss: 3.1145
Epoch: 1/30... Step: 70... Loss: 3.1058... Val Loss: 3.1115
Epoch: 1/30... Step: 80... Loss: 3.1174... Val Loss: 3.1039
Epoch: 1/30... Step: 90... Loss: 3.1050... Val Loss: 3.0855
Epoch: 1/30... Step: 100... Loss: 3.0602... Val Loss: 3.0453
Epoch: 1/30... Step: 110... Loss: 3.0394... Val Loss: 3.0287
Epoch: 1/30... Step: 120... Loss: 2.9150... Val Loss: 2.8941
Epoch: 1/30... Step: 130... Loss: 2.8789... Val Loss: 2.8259
Epoch: 2/30... Step: 140... Loss: 2.7523... Val Loss: 2.6890
Epoch: 2/30... Step: 150... Loss: 2.6459... Val Loss: 2.6020
Epoch: 2/30... Step: 160... Loss: 2.5851... Val Loss: 2.5447
Epoch: 2/30... Step: 170... Loss:

Epoch: 10/30... Step: 1350... Loss: 1.3886... Val Loss: 1.4249
Epoch: 10/30... Step: 1360... Loss: 1.3932... Val Loss: 1.4251
Epoch: 10/30... Step: 1370... Loss: 1.3878... Val Loss: 1.4262
Epoch: 10/30... Step: 1380... Loss: 1.4275... Val Loss: 1.4179
Epoch: 10/30... Step: 1390... Loss: 1.4311... Val Loss: 1.4175
Epoch: 11/30... Step: 1400... Loss: 1.4430... Val Loss: 1.4201
Epoch: 11/30... Step: 1410... Loss: 1.4420... Val Loss: 1.4179
Epoch: 11/30... Step: 1420... Loss: 1.4298... Val Loss: 1.4111
Epoch: 11/30... Step: 1430... Loss: 1.4014... Val Loss: 1.4199
Epoch: 11/30... Step: 1440... Loss: 1.4299... Val Loss: 1.4167
Epoch: 11/30... Step: 1450... Loss: 1.3541... Val Loss: 1.4075
Epoch: 11/30... Step: 1460... Loss: 1.3813... Val Loss: 1.4072
Epoch: 11/30... Step: 1470... Loss: 1.3738... Val Loss: 1.4110
Epoch: 11/30... Step: 1480... Loss: 1.3919... Val Loss: 1.4040
Epoch: 11/30... Step: 1490... Loss: 1.3715... Val Loss: 1.4010
Epoch: 11/30... Step: 1500... Loss: 1.3719... Val Loss:

Epoch: 20/30... Step: 2660... Loss: 1.2462... Val Loss: 1.3030
Epoch: 20/30... Step: 2670... Loss: 1.2675... Val Loss: 1.3043
Epoch: 20/30... Step: 2680... Loss: 1.2495... Val Loss: 1.2980
Epoch: 20/30... Step: 2690... Loss: 1.2341... Val Loss: 1.3015
Epoch: 20/30... Step: 2700... Loss: 1.2544... Val Loss: 1.2965
Epoch: 20/30... Step: 2710... Loss: 1.2274... Val Loss: 1.2954
Epoch: 20/30... Step: 2720... Loss: 1.2270... Val Loss: 1.3000
Epoch: 20/30... Step: 2730... Loss: 1.2094... Val Loss: 1.2996
Epoch: 20/30... Step: 2740... Loss: 1.2105... Val Loss: 1.2962
Epoch: 20/30... Step: 2750... Loss: 1.2141... Val Loss: 1.2957
Epoch: 20/30... Step: 2760... Loss: 1.2122... Val Loss: 1.2957
Epoch: 20/30... Step: 2770... Loss: 1.2523... Val Loss: 1.2955
Epoch: 20/30... Step: 2780... Loss: 1.2833... Val Loss: 1.2940
Epoch: 21/30... Step: 2790... Loss: 1.2518... Val Loss: 1.2947
Epoch: 21/30... Step: 2800... Loss: 1.2664... Val Loss: 1.2993
Epoch: 21/30... Step: 2810... Loss: 1.2614... Val Loss:

Epoch: 29/30... Step: 3970... Loss: 1.1826... Val Loss: 1.2634
Epoch: 29/30... Step: 3980... Loss: 1.1583... Val Loss: 1.2615
Epoch: 29/30... Step: 3990... Loss: 1.1576... Val Loss: 1.2652
Epoch: 29/30... Step: 4000... Loss: 1.1693... Val Loss: 1.2661
Epoch: 29/30... Step: 4010... Loss: 1.1414... Val Loss: 1.2613
Epoch: 29/30... Step: 4020... Loss: 1.1483... Val Loss: 1.2648
Epoch: 29/30... Step: 4030... Loss: 1.1615... Val Loss: 1.2637
Epoch: 30/30... Step: 4040... Loss: 1.1633... Val Loss: 1.2637
Epoch: 30/30... Step: 4050... Loss: 1.1742... Val Loss: 1.2594
Epoch: 30/30... Step: 4060... Loss: 1.1796... Val Loss: 1.2631
Epoch: 30/30... Step: 4070... Loss: 1.1707... Val Loss: 1.2651
Epoch: 30/30... Step: 4080... Loss: 1.1556... Val Loss: 1.2637
Epoch: 30/30... Step: 4090... Loss: 1.1744... Val Loss: 1.2644
Epoch: 30/30... Step: 4100... Loss: 1.1414... Val Loss: 1.2631
Epoch: 30/30... Step: 4110... Loss: 1.1450... Val Loss: 1.2613
Epoch: 30/30... Step: 4120... Loss: 1.1410... Val Loss:
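To put these numbers in perspective: `nn.CrossEntropyLoss` (with its default mean reduction) reports the average negative natural-log probability assigned to the correct next character. A model that guessed uniformly over the 83-character vocabulary would score ln 83 ≈ 4.42, so even the first logged validation loss of about 3.22 is already better than chance. Exponentiating the loss gives the perplexity: a final validation loss near 1.26 means the model is, on average, about as uncertain as a choice among roughly 3.5 equally likely characters. The NumPy function below is a minimal sketch of the same computation, not PyTorch's implementation.

```python
import numpy as np

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of the target classes under a softmax
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

n_chars = 83
# A model that assigns equal probability to every character
uniform_logits = np.zeros((4, n_chars))
targets = np.array([0, 10, 42, 82])
print(cross_entropy(uniform_logits, targets))   # ln(83) ≈ 4.419
print(np.exp(1.26))                             # perplexity ≈ 3.53
```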

## Checkpoint:

After training, we save the model so we can load it again later if we need to. The checkpoint stores the hyperparameters needed to recreate the same architecture (the number of hidden units and layers), the trained weights (the state dict), and the characters (tokens) the model was trained on.

In [17]:
model_name = 'rnn_30_epoch.net'
checkpoint = {'n_hidden': net.n_hidden,
              'n_layers': net.n_layers,
              'state_dict': net.state_dict(),
              'tokens' : net.chars}

with open(model_name, 'wb') as f:
    torch.save(checkpoint, f)

## Making Predictions:

Now that our model is trained, we can use it to predict the next characters and generate new text.
To sample, we pass in a character and have the network predict the next character. Then we take that character, pass it back in, and get another predicted character.

###### To make the predictions more reasonable, we will use top-k sampling.
This restricts each choice to the k most likely next characters, which prevents the network from giving us completely absurd characters while still allowing it to introduce some noise and randomness into the sampled text.
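The top-k step can be sketched with NumPy alone: apply a softmax to the scores, keep the k largest probabilities, renormalize them, and draw from those. The function name `top_k_sample` and the example scores are hypothetical; in the notebook, `predict` does the equivalent with `p.topk(top_k)` and `np.random.choice`.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample an index from the k highest-probability entries of a softmax."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)
    # Softmax over all characters (subtracting the max for numerical stability)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Keep only the k most probable indices and renormalize their probabilities
    top_idx = np.argsort(p)[-k:]
    top_p = p[top_idx] / p[top_idx].sum()
    return int(rng.choice(top_idx, p=top_p))

rng = np.random.default_rng(0)
logits = [2.0, 1.5, 0.1, -1.0, -3.0]
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))   # only indices 0 and 1 can ever be drawn
```

With `k=1` this reduces to always picking the most likely character (greedy decoding), which tends to get stuck in repetitive loops. Larger `k` trades coherence for variety.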

In [18]:
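def predict(net, char, h=None, top_k=None):
    '''Given a character and hidden state, return the predicted next character and the new hidden state.'''
    # Encode the character as a (1, 1, n_chars) one-hot tensor
    x = np.array([[net.char2int[char]]])
    x = one_hot_encode(x, len(net.chars))
    inputs = torch.from_numpy(x)

    # Start from a zero hidden state if none is given; otherwise detach the history
    if h is None:
        h = net.init_hidden(1)
    h = tuple([each.data for each in h])

    with torch.no_grad():
        out, h = net(inputs, h)

    # Probabilities of each possible next character
    p = F.softmax(out, dim=1)

    # Getting the top characters
    if top_k is None:
        top_ch = np.arange(len(net.chars))
    else:
        p, top_ch = p.topk(top_k)
        top_ch = top_ch.numpy().squeeze()

    # Sample the next character in proportion to its (renormalized) probability
    p = p.numpy().squeeze()
    char = np.random.choice(top_ch, p=p/p.sum())

    return net.int2char[char], h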

#### Priming and Generating Texts:

Without priming, the network would start out generating characters at random, and in general the first bunch of characters would be a little rough since it hasn't built up a long history of characters to predict from. So we first feed in a prime string to build up the hidden state, and only then start generating.

In [19]:
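def sample(net, size, prime='The', top_k=None):
    net.eval()  # evaluation mode: disables dropout

    # Run the prime characters through the network to build up the hidden state
    chars = [ch for ch in prime]
    h = net.init_hidden(1)
    for ch in prime:
        char, h = predict(net, ch, h, top_k=top_k)

    chars.append(char)

    # Feed each predicted character back in to get the next one
    for ii in range(size):
        char, h = predict(net, chars[-1], h, top_k=top_k)
        chars.append(char)

    return ''.join(chars)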

In [20]:
print(sample(net, 1000, prime='Aditya ', top_k=5))

Aditya on the railways so in women as something would be delighted,
but he went on; "both the secret convensed officer of her hers in
the
way he has so minuteful, but to me, you define him,"
she asked, who had said a matter about his wheak as she had always said
all the sister-in-law to see him all the path,
and he would not come. But her son were all the point on the conversation with his
significance with him.

"What do you say about! It's being might be time."

"I don't understand what I done anything," he thought.

The side, too had
taken them and so anything standing the most friends of a sister. All of his strange, hondrack and their states of their
spart at the point of a chill. He came out of the sound of this man, significantly
at his book.

"Yes, that's the mistale. What am the princess's a simple son there?"

"Oh, yes, I won't think that I shall have a love to be anything to think of you."

Stave was beating a person.

She did not know the forest and her
substaist, he settle

## Loading the checkpoint:

In [21]:
with open('rnn_30_epoch.net', 'rb') as f:
    checkpoint = torch.load(f)
    
loaded = CharRNN(checkpoint['tokens'], n_hidden=checkpoint['n_hidden'], n_layers=checkpoint['n_layers'])
loaded.load_state_dict(checkpoint['state_dict'])

#### Sample using the Loaded Model:

In [22]:
print(sample(loaded, 2000, top_k = 10, prime="This is Geekquad "))

This is Geekquad to his way in it. The life in enting sounds, and the sudden obsires
of
his left flying close and simple sounds and bedoother's face in their despatro department in
the bails only one of outsider and from his bidget, and has handed his eyes, and as he should not give this;
then were sole feeling by now, to his
position--there was nothing but a teals still had to
sit down. Only filled her terrible
besige, seeing Lizaveta Petrovna to
his whole discussion so as for his what they were phusping in from the
curtsing in the
conversation with Alexey Alexandrovitch's head.



Chapter 22


When he had never come up to him, and that the predict of his caped the
coachman could see worse, so at that
duries of that, too, to small, but he had been both her hander.

"I beg now and conditions away, but if no one
can't except me of interest, but I cannot believe."

She listened, but she flung ooter out
of side, starting away from his hat, but she
would not look at her. The condessed offi