#### Importing Necessary Libraries:

In [4]:
import torch
import numpy as np
from torch import nn
import torch.nn.functional as F

## Loading the Data:

In [5]:
with open('C:/Users/Geekquad/rnn_data/anna.txt', 'r') as f:
    text = f.read()

#### Checking out the first 500 characters:

In [6]:
text[:500]

"Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverything was in confusion in the Oblonskys' house. The wife had\ndiscovered that the husband was carrying on an intrigue with a French\ngirl, who had been a governess in their family, and she had announced to\nher husband that she could not go on living in the same house with him.\nThis position of affairs had now lasted three days, and not only the\nhusband and wife themselves, but all the members of their f"

## Tokenization:

In the cells below I am creating a couple of dictionaries to convert the characters to and from integers.
Encoding the characters as integers makes them easier to use as input to the network.

In [7]:
"""Creating two dictionaries
   1. int2char : which maps integers to characters
   2. char2int : which maps characters to integers"""

chars = tuple(set(text))
int2char = dict(enumerate((chars)))
char2int = {ch: ii for ii, ch in int2char.items()}

#ENCODING THE TEXT:
encoded = np.array([char2int[ch] for ch in text])

And we can see those same characters from above, encoded as integers.

In [8]:
encoded[:100]

array([77, 42, 11, 58, 19, 22, 30, 20, 38, 62, 62, 62, 56, 11, 58, 58, 35,
       20, 41, 11, 24, 17, 15, 17, 22, 49, 20, 11, 30, 22, 20, 11, 15, 15,
       20, 11, 15, 17, 45, 22, 55, 20, 22, 21, 22, 30, 35, 20, 18, 46, 42,
       11, 58, 58, 35, 20, 41, 11, 24, 17, 15, 35, 20, 17, 49, 20, 18, 46,
       42, 11, 58, 58, 35, 20, 17, 46, 20, 17, 19, 49, 20, 76,  7, 46, 62,
        7, 11, 35, 50, 62, 62, 68, 21, 22, 30, 35, 19, 42, 17, 46])
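As a quick sanity check on the mapping, the sketch below encodes a toy string and decodes it back. The names toy_text, toy_chars, toy_int2char and toy_char2int are made up for this illustration and only mirror the notebook's dictionaries. The characters are sorted here purely so the integers are reproducible; with tuple(set(text)) the order, and therefore the integer assigned to each character, can differ from one Python session to the next.

```python
# Toy text standing in for the novel (illustrative only).
toy_text = "happy families"

# Sorting is an assumption of this sketch, used to get a stable mapping.
toy_chars = tuple(sorted(set(toy_text)))
toy_int2char = dict(enumerate(toy_chars))
toy_char2int = {ch: ii for ii, ch in toy_int2char.items()}

# Encode every character as an integer, then map the integers back.
toy_encoded = [toy_char2int[ch] for ch in toy_text]
toy_decoded = ''.join(toy_int2char[ii] for ii in toy_encoded)

print(toy_encoded)
print(toy_decoded)
```

If the decoded string matches the original, the two dictionaries are consistent inverses of each other.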

## Pre-processing the data:

As in our char-RNN, the LSTM expects one-hot encoded input: each character is first converted to an integer (using the dictionaries created above) and then to a vector in which only the entry at its integer index is 1 and every other entry is 0.
The one_hot_encode function below does this:

In [9]:
def one_hot_encode(arr, n_labels):
    one_hot = np.zeros((arr.size, n_labels), dtype = np.float32)
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    return one_hot

In [10]:
test_seq = np.array([[3, 5, 1]])
one_hot = one_hot_encode(test_seq, 8)

print(one_hot)

[[[0. 0. 0. 1. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 1. 0. 0.]
  [0. 1. 0. 0. 0. 0. 0. 0.]]]
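For comparison, PyTorch ships a built-in that produces the same encoding. The sketch below is an alternative to the NumPy function above, not what this notebook uses; it assumes the indices are first converted to an int64 tensor, which torch.nn.functional.one_hot requires.

```python
import numpy as np
import torch
import torch.nn.functional as F

test_seq = np.array([[3, 5, 1]])

# F.one_hot needs integer (int64) indices and returns an integer tensor,
# so cast to float afterwards to match what the LSTM expects as input.
one_hot_t = F.one_hot(torch.from_numpy(test_seq).long(), num_classes=8).float()

print(one_hot_t.shape)
print(one_hot_t)
```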


## Making training mini-batches

To train on this data, we split it into mini-batches, each holding batch_size sequences of seq_length characters. The targets y are the inputs x shifted one character ahead.

In [13]:
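def get_batches(arr, batch_size, seq_length):
    # number of characters per mini-batch and number of full mini-batches
    batch_size_total = batch_size*seq_length
    n_batches = len(arr)//batch_size_total
    
    # keep only enough characters to make full mini-batches
    arr = arr[:n_batches*batch_size_total]
    # one row per sequence in the batch
    arr = arr.reshape((batch_size, -1))
    
    # slide a window of seq_length columns across the rows
    for n in range(0, arr.shape[1], seq_length):
        x = arr[:, n:n+seq_length]
        # targets are the inputs shifted one character ahead
        y = np.zeros_like(x)
        try:
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n+seq_length]
        except IndexError:
            # last window: wrap around to the first column
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
        yield x, y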

Now I'll make some data sets and we can check out what's going on as we batch data. Here I am going to use a batch size of 8 and 50 sequence steps.

In [14]:
batches = get_batches(encoded, 8, 50)
x, y = next(batches)

In [17]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[77 42 11 58 19 22 30 20 38 62]
 [49 76 46 20 19 42 11 19 20 11]
 [22 46 75 20 76 30 20 11 20 41]
 [49 20 19 42 22 20 53 42 17 22]
 [20 49 11  7 20 42 22 30 20 19]
 [53 18 49 49 17 76 46 20 11 46]
 [20 12 46 46 11 20 42 11 75 20]
 [31  8 15 76 46 49 45 35 50 20]]

y
 [[42 11 58 19 22 30 20 38 62 62]
 [76 46 20 19 42 11 19 20 11 19]
 [46 75 20 76 30 20 11 20 41 76]
 [20 19 42 22 20 53 42 17 22 41]
 [49 11  7 20 42 22 30 20 19 22]
 [18 49 49 17 76 46 20 11 46 75]
 [12 46 46 11 20 42 11 75 20 49]
 [ 8 15 76 46 49 45 35 50 20 74]]
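The shift between x and y is easier to see on a tiny array of consecutive integers. The sketch below uses toy_batches, a simplified stand-in written for this illustration (it replaces the try/except with a modulo wrap-around, which should behave the same way), with a batch size of 2 and 5 sequence steps.

```python
import numpy as np

def toy_batches(arr, batch_size, seq_length):
    """Simplified copy of the batching logic, for illustration only."""
    n_batches = len(arr) // (batch_size * seq_length)
    arr = arr[:n_batches * batch_size * seq_length].reshape((batch_size, -1))
    for n in range(0, arr.shape[1], seq_length):
        x = arr[:, n:n + seq_length]
        y = np.zeros_like(x)
        y[:, :-1] = x[:, 1:]
        # next character after the window, wrapping to column 0 at the end
        y[:, -1] = arr[:, (n + seq_length) % arr.shape[1]]
        yield x, y

toy = np.arange(20)
pairs = list(toy_batches(toy, batch_size=2, seq_length=5))
for x, y in pairs:
    print('x\n', x)
    print('y\n', y)
```

Each row of y holds the same values as the matching row of x moved one step to the left, and the last column of the final batch wraps around to the first column of the reshaped array.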


## Building the Network:

In [42]:
class CharRNN(nn.Module):
    def __init__(self, tokens, n_hidden=256, n_layers=2, drop_prob=0.5, lr=0.001):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lr = lr
        
        self.chars = tokens
        self.int2char = dict(enumerate(self.chars))
        self.char2int = {ch: ii for ii, ch in self.int2char.items()}
        
        self.lstm = nn.LSTM(len(self.chars), n_hidden, n_layers, dropout=drop_prob, batch_first=True)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(n_hidden, len(self.chars))
        
    def forward(self, x, hidden):
        r_output, hidden = self.lstm(x, hidden)
        out = self.dropout(r_output)
        out = out.contiguous().view(-1, self.n_hidden)
        out = self.fc(out)
        return out, hidden
    
    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        
        hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(), weight.new(self.n_layers, batch_size, self.n_hidden).zero_())
        
        return hidden       
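To make the tensor shapes in forward easier to follow, here is a minimal sketch that pushes a dummy batch through a standalone LSTM and linear layer. The sizes (10 characters, 16 hidden units, batch of 4, 7 steps) are arbitrary values chosen for this illustration, not the ones used for training.

```python
import torch
from torch import nn

# Arbitrary toy sizes, assumed only for this shape check.
n_chars, n_hidden, n_layers = 10, 16, 2
batch_size, seq_length = 4, 7

lstm = nn.LSTM(n_chars, n_hidden, n_layers, batch_first=True)
fc = nn.Linear(n_hidden, n_chars)

# With batch_first=True the input is (batch, seq, features).
x = torch.zeros(batch_size, seq_length, n_chars)
# Hidden and cell states are always (layers, batch, hidden).
h0 = torch.zeros(n_layers, batch_size, n_hidden)
c0 = torch.zeros(n_layers, batch_size, n_hidden)

r_output, (hn, cn) = lstm(x, (h0, c0))
# Stack all time steps of all sequences into one long batch of rows.
flat = r_output.contiguous().view(-1, n_hidden)
logits = fc(flat)

print(r_output.shape, flat.shape, logits.shape, hn.shape)
```

Flattening to batch_size*seq_length rows is what lets the loss compare one row of scores against one target character.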

In [43]:
def train(net, data, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
    ''' Training a network 
    
        Arguments
        ---------
        
        net: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        batch_size: Number of mini-sequences per mini-batch, aka batch size
        seq_length: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        print_every: Number of steps for printing training and validation loss
    
    '''
    
    net.train()
    opt = torch.optim.Adam(net.parameters(), lr = lr)
    criterion = nn.CrossEntropyLoss()
    
    val_idx = int(len(data)*(1-val_frac))
    data, val_data = data[:val_idx], data[val_idx:]
    
    counter = 0
    n_chars = len(net.chars)
    for e in range(epochs):
        h = net.init_hidden(batch_size)
        
        for x, y in get_batches(data, batch_size, seq_length):
            
            counter += 1
            x = one_hot_encode(x, n_chars)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
            
            h = tuple([each.data for each in h])

            net.zero_grad()
            
            output, h = net(inputs, h)
            
            loss = criterion(output, targets.view(batch_size*seq_length).long())
            loss.backward()
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            opt.step()
            
            # loss stats
            if counter % print_every == 0:
                val_h = net.init_hidden(batch_size)
                val_losses = []
                net.eval()
                for x, y in get_batches(val_data, batch_size, seq_length):
                    x = one_hot_encode(x, n_chars)
                    x, y = torch.from_numpy(x), torch.from_numpy(y)
                    
                    val_h = tuple([each.data for each in val_h])
                    
                    inputs, targets = x, y                                         
                    output, val_h = net(inputs, val_h)
                    val_loss = criterion(output, targets.view(batch_size*seq_length).long())
                
                    val_losses.append(val_loss.item())
                
                net.train()
                
                print("Epoch: {}/{}...".format(e+1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))

## Instantiating the model:

In [44]:
n_hidden=512
n_layers=2

net = CharRNN(chars, n_hidden, n_layers)
print(net)

CharRNN(
  (lstm): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)


In [49]:
batch_size = 128
seq_length = 100
n_epochs = 30

## Training the Model:

train(net, encoded, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length, lr=0.001, print_every=10)

## Checkpoint:

After training, we will save the model so we can load it again later if we need to. I am saving the parameters needed to recreate the same architecture (the hidden layer hyperparameters and the text characters) along with the trained weights.

In [51]:
model_name = 'rnn_30_spoch.net'
checkpoint = {'n_hidden': net.n_hidden,
              'n_layers': net.n_layers,
              'state_dict': net.state_dict(),
              'tokens' : net.chars}

with open(model_name, 'wb') as f:
    torch.save(checkpoint, f)
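Loading works in reverse: rebuild the network from the saved hyperparameters and tokens, then restore the weights. The sketch below shows the round trip with TinyNet, a hypothetical stand-in for CharRNN defined only so the example runs on its own, and an in-memory buffer in place of the file on disk.

```python
import io
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for CharRNN, used only in this sketch."""
    def __init__(self, tokens, n_hidden=8, n_layers=1):
        super().__init__()
        self.chars = tokens
        self.n_hidden = n_hidden
        self.n_layers = n_layers
        self.lstm = nn.LSTM(len(tokens), n_hidden, n_layers, batch_first=True)
        self.fc = nn.Linear(n_hidden, len(tokens))

net = TinyNet(('a', 'b', 'c'))
checkpoint = {'n_hidden': net.n_hidden,
              'n_layers': net.n_layers,
              'state_dict': net.state_dict(),
              'tokens': net.chars}

# Save to an in-memory buffer instead of a file, then rewind and load.
buffer = io.BytesIO()
torch.save(checkpoint, buffer)
buffer.seek(0)
loaded = torch.load(buffer)

# Rebuild the same architecture, then copy the trained weights into it.
restored = TinyNet(loaded['tokens'],
                   n_hidden=loaded['n_hidden'],
                   n_layers=loaded['n_layers'])
restored.load_state_dict(loaded['state_dict'])
```

Saving the tokens matters because the integer assigned to each character depends on their order; a network restored with a different ordering would decode its predictions into the wrong characters.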

## Making Predictions:

Now that our model is trained, we can make predictions about the next characters and sample text from it.
To sample, we pass in a character and have the network predict the next character. Then we take that character, pass it back in, and get another predicted character.

###### To make the predictions more reasonable we will do "Top K - sampling."
This will prevent the network from giving us completely absurd characters while allowing it to introduce some noise and randomness into the sampled text.

In [69]:
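def predict(net, char, h=None, top_k=None):
    x = np.array([[net.char2int[char]]])
    x = one_hot_encode(x, len(net.chars))
    inputs = torch.from_numpy(x)
    
    # start from a zeroed hidden state if none was passed in
    if h is None:
        h = net.init_hidden(1)
    # detach the hidden state from its history
    h = tuple([each.data for each in h])
    out, h = net(inputs, h)
    
    p = F.softmax(out, dim=1).data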
    
    #GETTING TOP CHARACTERS
    
    if top_k is None:
        top_ch = np.arange(len(net.chars))
    else:
        p, top_ch = p.topk(top_k)
        top_ch = top_ch.numpy().squeeze()
        
    p = p.numpy().squeeze()
    char = np.random.choice(top_ch, p=p/p.sum())
    
    return net.int2char[char], h
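The top-k step on its own can be sketched with plain NumPy. The probability vector below is made up for illustration; in predict it would come from the softmax over the network's output. Only the k most likely indices are kept, their probabilities are renormalized to sum to 1, and the next index is drawn from that smaller distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up next-character probabilities over a 5-character vocabulary.
probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
top_k = 2

# Indices of the k largest probabilities.
top_idx = np.argsort(probs)[-top_k:]
# Renormalize so the kept probabilities form a valid distribution.
top_p = probs[top_idx] / probs[top_idx].sum()

# Draw many samples to see that only the top-k indices ever appear.
samples = rng.choice(top_idx, size=1000, p=top_p)
print(sorted(set(samples.tolist())))
```

A smaller top_k gives safer but more repetitive text, while a larger one lets more unlikely characters through.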

#### Priming and Generating Texts:

We prime the network with a few starting characters; otherwise it will start out generating characters at random. Even then, the first few characters tend to be a little rough, since the network hasn't yet built up a long history of characters to predict from.

In [70]:
def sample(net, size, prime='The', top_k=None):
    net.eval() 
    chars = [ch for ch in prime]
    h = net.init_hidden(1)
    for ch in prime:
        char, h = predict(net, ch, h, top_k=top_k)

    chars.append(char)
    
    for ii in range(size):
        char, h = predict(net, chars[-1], h, top_k=top_k)
        chars.append(char)

    return ''.join(chars)

In [74]:
print(sample(net, 1000, prime='Aditya ', top_k=5))

Aditya the whole
morning.

The stables of her hands was so much that to take it to him alone, and had the
plans that this was the position to holse-stalling and service.

The plincipates of his foot was a strong shade of clear, talked,
and the frowned, tortured a smrile, and so she had so trying to be
true in his brother as the most careful figure when he had been a
meeting of the money, and an intention of sours. His children
had servonce alone to thinks of his sufferings, his silence, to be
alone, and had said to him, and all to go away a ludicrous several times
that all the chief's brown shining.

"Anna, the more ore of the province," he thought.



Chapter 31


Anna had been sorry to spay, though she would send all day to dinn to
her far short heart all the crapper actoon, towards to his children
as she was almost a firm in the diffurent races, but, had
asked him to be able to she say something at once to say something. He
was a face and setting of her eyes, and the servants had sa