<a href="https://colab.research.google.com/github/palash04/Artificial-Intelligence/blob/master/Neural_Networks/RNN/_03_Character_Level_RNN_with_Pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Character Level LSTM

#### The network will train character by character on some text, then generate new text character by character. As an example, we will train on Anna Karenina. This model will be able to generate new text based on the text from the book!

## General Architecture of a character-wise RNN
![Screenshot 2020-07-18 at 18 44 33](https://user-images.githubusercontent.com/26361028/87853322-b9b7eb80-c926-11ea-95b7-693375822bde.png)


In [1]:
datapath = '/content/drive/My Drive/Artificial Intelligence/DataSet/RNN/anna.txt'

## Importing libraries

In [2]:
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F

## Load in data

### Load the Anna Karenina text file and convert it into integers for our networks to use

In [3]:
# open text file and read in data as text
with open(datapath,'r') as f:
  text = f.read()

In [4]:
# checking the first 100 characters
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

## Tokenization

In [5]:
# Create two dictionaries to convert characters to and from integers.
# Encoding the characters as integers makes them easier to feed into the network.
#
# 1. int2char maps integers to characters
# 2. char2int maps characters to integers
chars = tuple(set(text))
int2char = dict(enumerate(chars))
char2int = {ch:ii for ii,ch in int2char.items()}

# encode the text
encoded = np.array([char2int[ch] for ch in text])
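Because the iteration order of a `set` of strings can differ between Python sessions, the integer assigned to each character may change from run to run (which is why the checkpoint later stores the token list alongside the weights). A minimal sketch, assuming a toy string `sample_text` in place of the book and using `sorted` as a swapped-in alternative to `tuple(set(text))`, that builds a reproducible mapping and checks the round trip:

```python
import numpy as np

# Hypothetical toy text standing in for the book
sample_text = "happy families"

# sorted() gives a stable character order across runs (an assumption of
# this sketch; the notebook itself uses tuple(set(text)))
vocab = tuple(sorted(set(sample_text)))
idx2char = dict(enumerate(vocab))
char2idx = {ch: ii for ii, ch in idx2char.items()}

# Encode to integers, then decode back to characters
encoded_sample = np.array([char2idx[ch] for ch in sample_text])
decoded_sample = ''.join(idx2char[int(i)] for i in encoded_sample)
```

Decoding the encoded array should give back the original string, which is a quick sanity check that the two dictionaries are inverses of each other.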

In [6]:
len(chars)

83

In [7]:
int2char[82]

'r'

In [8]:
char2int['P']

2

In [9]:
# Let's see the first 100 encoded characters
encoded[:100]

array([71, 34, 68, 81, 74, 21, 82, 41, 42, 63, 63, 63, 79, 68, 81, 81, 55,
       41, 36, 68, 50, 72, 56, 72, 21, 60, 41, 68, 82, 21, 41, 68, 56, 56,
       41, 68, 56, 72, 26, 21, 19, 41, 21, 70, 21, 82, 55, 41, 75,  0, 34,
       68, 81, 81, 55, 41, 36, 68, 50, 72, 56, 55, 41, 72, 60, 41, 75,  0,
       34, 68, 81, 81, 55, 41, 72,  0, 41, 72, 74, 60, 41, 40, 14,  0, 63,
       14, 68, 55,  6, 63, 63,  3, 70, 21, 82, 55, 74, 34, 72,  0])

In [10]:
print (int2char[12])

q


## Preprocessing the data
As shown in the char-RNN image above, our LSTM expects one-hot encoded input: each character is converted into an integer (via the dictionary we created) and then into a vector in which only the position at its integer index holds 1 and every other position holds 0. Since we're one-hot encoding the data, let's make a function to do that!

In [11]:
def one_hot_encode(arr, n_labels):
  # Initialize the encoded array
  # arr.size is the total number of elements, so this works for any number of dimensions
  one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)

  # Fill the appropriate elements with ones
  one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.

  # Finally reshape it to get back to the original array
  one_hot = one_hot.reshape((*arr.shape,n_labels))

  return one_hot

In [12]:
# check that the function works as expected
test_seq = np.array([[3,5,1]])
one_hot = one_hot_encode(test_seq, 8)

print (one_hot)

[[[0. 0. 0. 1. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 1. 0. 0.]
  [0. 1. 0. 0. 0. 0. 0. 0.]]]
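The same encoding can also be written by indexing into an identity matrix. This is only a sketch of an equivalent approach (the helper name `one_hot_eye` is hypothetical, not part of the notebook), and it shows that `argmax` over the last axis recovers the original integers:

```python
import numpy as np

def one_hot_eye(arr, n_labels):
    # Row i of the identity matrix is the one-hot vector for label i,
    # so indexing with arr yields shape (*arr.shape, n_labels)
    return np.eye(n_labels, dtype=np.float32)[arr]

test_seq = np.array([[3, 5, 1]])
encoded_seq = one_hot_eye(test_seq, 8)

# argmax over the last axis recovers the original integers
recovered = encoded_seq.argmax(axis=-1)
```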


## Making training mini-batches

![Screenshot 2020-07-18 at 19 42 34](https://user-images.githubusercontent.com/26361028/87854322-d526f480-c92e-11ea-86ea-0b5334178969.png)


In [13]:
def get_batches(arr, batch_size, seq_length):
  batch_size_total = batch_size * seq_length
  # total number of batches we can make
  n_batches = len(arr)//batch_size_total

  # keep only enough characters to make full batches
  arr = arr[:n_batches * batch_size_total]
  # reshape into batch size rows
  arr = arr.reshape((batch_size,-1))

  # iterate through the array, one sequence at a time
  for n in range(0, arr.shape[1], seq_length):
    # The features
    x = arr[:, n:n+seq_length]
    # The targets, shifted by one
    y = np.zeros_like(x)
    try:
        y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n+seq_length]
    except IndexError:
        y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
    yield x, y

### Testing the implementation

In [14]:
batches = get_batches(encoded, 8, 50)
x,y = next(batches)

In [15]:
# Printing out the first 10 items in a sequence
print ('x\n', x[:10,:10])
print ('\ny\n',y[:10,:10])

x
 [[71 34 68 81 74 21 82 41 42 63]
 [60 40  0 41 74 34 68 74 41 68]
 [21  0 61 41 40 82 41 68 41 36]
 [60 41 74 34 21 41  8 34 72 21]
 [41 60 68 14 41 34 21 82 41 74]
 [ 8 75 60 60 72 40  0 41 68  0]
 [41 29  0  0 68 41 34 68 61 41]
 [78 44 56 40  0 60 26 55  6 41]]

y
 [[34 68 81 74 21 82 41 42 63 63]
 [40  0 41 74 34 68 74 41 68 74]
 [ 0 61 41 40 82 41 68 41 36 40]
 [41 74 34 21 41  8 34 72 21 36]
 [60 68 14 41 34 21 82 41 74 21]
 [75 60 60 72 40  0 41 68  0 61]
 [29  0  0 68 41 34 68 61 41 60]
 [44 56 40  0 60 26 55  6 41 49]]
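To make the slicing and the wrap-around at the end easier to follow, here is a small sketch on a toy array. The generator `toy_batches` repeats the logic of `get_batches` so the block runs on its own; the explicit bounds check stands in for the `try`/`except`, and the array of 23 integers is a made-up stand-in for encoded text:

```python
import numpy as np

def toy_batches(arr, batch_size, seq_length):
    # Same logic as get_batches, repeated so the block runs on its own
    chars_per_batch = batch_size * seq_length
    n_batches = len(arr) // chars_per_batch
    arr = arr[:n_batches * chars_per_batch].reshape((batch_size, -1))
    for n in range(0, arr.shape[1], seq_length):
        x = arr[:, n:n + seq_length]
        y = np.zeros_like(x)
        y[:, :-1] = x[:, 1:]
        # Last target: the next column, or wrap to column 0 at the end
        next_col = n + seq_length
        y[:, -1] = arr[:, next_col] if next_col < arr.shape[1] else arr[:, 0]
        yield x, y

toy = np.arange(23)   # 23 "characters"; the last 3 do not fill a batch and are dropped
batches = list(toy_batches(toy, batch_size=2, seq_length=5))
first_x, first_y = batches[0]
last_x, last_y = batches[-1]
```

Here `first_y` is `first_x` shifted left by one, and the final column of `last_y` wraps around to the first column of each row.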


# Defining the network with PyTorch
![Screenshot 2020-07-19 at 08 56 05](https://user-images.githubusercontent.com/26361028/87866295-b0b63100-c99d-11ea-908a-a2c4d8e1e57a.png)


### Model Structure
In `__init__`, the suggested structure is as follows:
- Create and store the necessary dictionaries.
- Define the LSTM layer, which takes as params: an input size (the number of characters), a hidden layer size (`n_hidden`), a number of layers (`n_layers`), a dropout probability (`drop_prob`), and a `batch_first` boolean (True, since our batches have the batch dimension first).
- Define a dropout layer with probability `drop_prob`.
- Define a fully connected layer with params: input size (`n_hidden`) and output size (the number of characters).
- Finally, initialize the weights (the class below relies on PyTorch's default initialization).

### LSTM Inputs/Outputs
A basic LSTM layer looks as follows:
self.lstm = nn.LSTM(input_size, n_hidden, n_layers, dropout=drop_prob,batch_first=True)

where `input_size` is the number of characters this cell expects to see as sequential input, and `n_hidden` is the number of units in the hidden layers in the cell. We can add dropout by passing a `dropout` parameter with a specified probability; PyTorch then applies dropout to the outputs of every LSTM layer except the last. Stacking is handled by `n_layers`: the output of one LSTM layer is fed as input to the next. Finally, in the forward function, `.view` flattens the LSTM output from `(batch_size, seq_length, n_hidden)` to `(batch_size * seq_length, n_hidden)` so that the fully connected layer can process every time step at once.
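The tensor shapes involved can be checked with a small sketch. The sizes below (a hidden size of 16, a batch of 4 sequences of length 10) are arbitrary values chosen for the demonstration, not the notebook's settings:

```python
import torch
from torch import nn

n_chars, n_hidden, n_layers = 83, 16, 2   # small hidden size for the demo
batch_size, seq_length = 4, 10

lstm = nn.LSTM(n_chars, n_hidden, n_layers, dropout=0.5, batch_first=True)

# With batch_first=True the input is (batch, seq, features)
x = torch.zeros(batch_size, seq_length, n_chars)
output, (h_n, c_n) = lstm(x)   # hidden state defaults to zeros when omitted

# .view flattens batch and time so a Linear layer sees one row per character
flat = output.contiguous().view(-1, n_hidden)
```

Note that `output` is batch first, while the hidden and cell states are always shaped `(n_layers, batch_size, n_hidden)`, which matches what `init_hidden` produces.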

In [16]:
# Check if GPU is available
train_on_gpu = torch.cuda.is_available()
if train_on_gpu:
  print ('Training on gpu')
else:
  print ('No gpu available, training on cpu. Consider making n_epochs very small')

Training on gpu


In [17]:
class CharRNN(nn.Module):
  def __init__(self, tokens, n_hidden = 256, n_layers = 2, drop_prob=0.5, lr=0.001):
    super().__init__()
    self.drop_prob = drop_prob
    self.n_layers = n_layers
    self.n_hidden = n_hidden
    self.lr = lr

    # creating character dictionaries
    self.chars = tokens
    self.int2char = dict(enumerate(self.chars))
    self.char2int = {ch : ii for ii, ch in self.int2char.items()}

    # Define the LSTM
    self.lstm = nn.LSTM(len(self.chars), n_hidden, n_layers, dropout=drop_prob, batch_first = True)

    # Define a dropout layer
    self.dropout = nn.Dropout(drop_prob)

    # Define the final, fully connected output layer
    self.fc = nn.Linear(n_hidden, len(self.chars))

  def forward(self,x,hidden):
    # Get the outputs and the new hidden state from the lstm
    r_output, hidden = self.lstm(x,hidden)

    # pass through a dropout layer
    out = self.dropout(r_output)

    # Stack up LSTM outputs
    out = out.contiguous().view(-1, self.n_hidden)

    # pass the output through the fully connected layer
    out = self.fc(out)

    return out, hidden

  def init_hidden(self, batch_size):
    # Initialize hidden state
    weight = next(self.parameters()).data

    if train_on_gpu:
      hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
                weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
    else:
      hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(),
                weight.new(self.n_layers, batch_size, self.n_hidden).zero_())
    return hidden

# Train the Network


In [21]:
def train(net, data, epochs = 10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1,print_every=10):
  ''' Training a network 
    
        Arguments
        ---------
        
        net: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        batch_size: Number of mini-sequences per mini-batch, aka batch size
        seq_length: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        print_every: Number of steps for printing training and validation loss
    
  '''
  net.train()
  opt = torch.optim.Adam(net.parameters(), lr = lr)
  criterion = nn.CrossEntropyLoss()

  # Create training and validation data
  val_idx = int(len(data)*(1-val_frac))
  data, val_data = data[:val_idx], data[val_idx:]

  if train_on_gpu:
    net.cuda()
  
  counter = 0
  n_chars = len(net.chars)
  for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    for x,y in get_batches(data, batch_size, seq_length):
      counter += 1

      # One hot encode our data, and make them torch Tensors
      x = one_hot_encode(x, n_chars)
      inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

      if train_on_gpu:
        inputs, targets = inputs.cuda(), targets.cuda()
      
      # Creating new variables for the hidden state, otherwise
      # we'd backprop through the entire training history
      h = tuple([each.data for each in h])

      # zero accumulated gradients
      net.zero_grad()

      # get the output from the model
      output, h = net(inputs, h)

      # calculate the loss and perform backprop
      loss = criterion(output, targets.view(batch_size*seq_length).long())
      loss.backward()
      # clip_grad_norm helps prevent the exploding gradient problem in RNN / LSTM
      nn.utils.clip_grad_norm_(net.parameters(), clip)
      opt.step()
    
      # loss stats
      if counter % print_every == 0:
        # Get validation loss
        val_h = net.init_hidden(batch_size)
        val_losses = []
        net.eval()
        for x, y in get_batches(val_data, batch_size, seq_length):
            # One-hot encode our data and make them Torch tensors
            x = one_hot_encode(x, n_chars)
            x, y = torch.from_numpy(x), torch.from_numpy(y)
            
            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            val_h = tuple([each.data for each in val_h])
            
            inputs, targets = x, y
            if(train_on_gpu):
                inputs, targets = inputs.cuda(), targets.cuda()

            output, val_h = net(inputs, val_h)
            val_loss = criterion(output, targets.view(batch_size*seq_length).long())
        
            val_losses.append(val_loss.item())
        net.train() # reset to train mode after iterating through validation data

        print("Epoch: {}/{}...".format(e+1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))


## Instantiating the model
Now we can actually train the network.

In [22]:
# define and print the net
n_hidden = 512
n_layers = 2

net = CharRNN(chars, n_hidden, n_layers)
print (net)

CharRNN(
  (lstm): LSTM(83, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=512, out_features=83, bias=True)
)


In [23]:
batch_size = 128
seq_length = 100
n_epochs = 20   # start smaller if you are just testing initial behavior

# train the model
train(net, encoded, epochs=n_epochs, batch_size=batch_size, seq_length=seq_length, lr=0.001, print_every=10)

Epoch: 1/20... Step: 10... Loss: 3.2708... Val Loss: 3.1814
Epoch: 1/20... Step: 20... Loss: 3.1521... Val Loss: 3.1302
Epoch: 1/20... Step: 30... Loss: 3.1445... Val Loss: 3.1204
Epoch: 1/20... Step: 40... Loss: 3.1154... Val Loss: 3.1186
Epoch: 1/20... Step: 50... Loss: 3.1415... Val Loss: 3.1181
Epoch: 1/20... Step: 60... Loss: 3.1162... Val Loss: 3.1160
Epoch: 1/20... Step: 70... Loss: 3.1115... Val Loss: 3.1152
Epoch: 1/20... Step: 80... Loss: 3.1234... Val Loss: 3.1129
Epoch: 1/20... Step: 90... Loss: 3.1242... Val Loss: 3.1078
Epoch: 1/20... Step: 100... Loss: 3.1074... Val Loss: 3.1001
Epoch: 1/20... Step: 110... Loss: 3.0949... Val Loss: 3.0815
Epoch: 1/20... Step: 120... Loss: 3.0433... Val Loss: 3.0400
Epoch: 1/20... Step: 130... Loss: 3.0003... Val Loss: 2.9717
Epoch: 2/20... Step: 140... Loss: 2.9444... Val Loss: 2.9105
Epoch: 2/20... Step: 150... Loss: 2.8438... Val Loss: 2.7852
Epoch: 2/20... Step: 160... Loss: 2.7475... Val Loss: 2.6990
Epoch: 2/20... Step: 170... Loss:
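The numbers in the log are mean cross-entropy values per character. As a reference point, a network that assigns equal probability to all 83 characters has a loss of ln(83). The sketch below, using made-up all-zero logits and random targets, shows the shapes `nn.CrossEntropyLoss` expects and reproduces that baseline:

```python
import math
import torch
from torch import nn

n_chars, batch_size, seq_length = 83, 2, 3
criterion = nn.CrossEntropyLoss()

# Logits come out of the network as (batch_size * seq_length, n_chars)
logits = torch.zeros(batch_size * seq_length, n_chars)   # uniform guess
# Targets arrive as (batch_size, seq_length) and are flattened to match
targets = torch.randint(0, n_chars, (batch_size, seq_length))

loss = criterion(logits, targets.view(batch_size * seq_length).long())
uniform_loss = math.log(n_chars)   # about 4.42 for 83 characters
```

Any loss below this baseline means the network has learned something about the character distribution.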

# Checkpoint

In [24]:
# change the name, for saving multiple files
model_name = '/content/drive/My Drive/Artificial Intelligence/rnn_20_epoch.net'

checkpoint = {'n_hidden': net.n_hidden,
              'n_layers': net.n_layers,
              'state_dict': net.state_dict(),
              'tokens': net.chars}

with open(model_name, 'wb') as f:
    torch.save(checkpoint, f)

# Making Predictions
Now that the model is trained, we'll want to sample from it and make predictions about next characters! To sample, we pass in a character and have the network predict the next character. Then we take that character, pass it back in, and get another predicted character. Just keep doing this and you'll generate a bunch of text!

In [25]:
def predict(net, char, h=None, top_k=None):
        ''' Given a character, predict the next character.
            Returns the predicted character and the hidden state.
        '''
        
        # tensor inputs
        x = np.array([[net.char2int[char]]])
        x = one_hot_encode(x, len(net.chars))
        inputs = torch.from_numpy(x)
        
        if(train_on_gpu):
            inputs = inputs.cuda()
        
        # start from a zero hidden state if none was given,
        # then detach the hidden state from history
        if h is None:
            h = net.init_hidden(1)
        h = tuple([each.data for each in h])
        # get the output of the model
        out, h = net(inputs, h)

        # get the character probabilities
        p = F.softmax(out, dim=1).data
        if(train_on_gpu):
            p = p.cpu() # move to cpu
        
        # get top characters
        if top_k is None:
            top_ch = np.arange(len(net.chars))
        else:
            p, top_ch = p.topk(top_k)
            top_ch = top_ch.numpy().squeeze()
        
        # select the likely next character with some element of randomness
        p = p.numpy().squeeze()
        char = np.random.choice(top_ch, p=p/p.sum())
        
        # return the predicted character and the hidden state
        return net.int2char[char], h
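The top-k step in `predict` can be illustrated on its own with NumPy. This is a sketch under stated assumptions: the probability vector `p` is made up, and the seeded generator `rng` is used only to make the example repeatable (the function above uses `np.random.choice` directly):

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable

# Hypothetical next-character probabilities over a 5-character vocabulary
p = np.array([0.05, 0.50, 0.10, 0.20, 0.15])
top_k = 3

# Indices of the k most probable characters, most probable first
top_ch = np.argsort(p)[::-1][:top_k]
# Renormalize so the kept probabilities sum to 1
top_p = p[top_ch] / p[top_ch].sum()

next_idx = rng.choice(top_ch, p=top_p)
```

Restricting the choice to the `top_k` most likely characters keeps some randomness in the generated text while ruling out very unlikely characters.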

### Priming and generating text

Typically you'll want to prime the network so you can build up a hidden state. Otherwise the network will start out generating characters at random. In general the first bunch of characters will be a little rough since it hasn't built up a long history of characters to predict from.

In [26]:
def sample(net, size, prime='The', top_k=None):
        
    if(train_on_gpu):
        net.cuda()
    else:
        net.cpu()
    
    net.eval() # eval mode
    
    # First off, run through the prime characters
    chars = [ch for ch in prime]
    h = net.init_hidden(1)
    for ch in prime:
        char, h = predict(net, ch, h, top_k=top_k)

    chars.append(char)
    
    # Now pass in the previous character and get a new one
    for ii in range(size):
        char, h = predict(net, chars[-1], h, top_k=top_k)
        chars.append(char)

    return ''.join(chars)

In [27]:
print(sample(net, 1000, prime='Anna', top_k=5))

Anna Anna had always been at from them to say a charming, but there was no time that he had to be
the stream of heart for this mind in a calition and horror. He husband had stopped in, and all the countess, she would have been faring this happiness of when that in the couries of sevent of the steps would not take the meeting attraction on her supporition of sent the criel, there were all
the conversation that that would be to be so seeing that it was not in at the chain of them, and had been satisfied that there could not see him her face, that they was already a fine soul in his head."

The more sorring, he could not help, the service were stood a little brisk, the sole waists and to be started over some sensing a sort in
her. The portrait the same work
and shirt that was the best a feeling of thought in, talked away at expression of his heart. They
saw it to be feeling to be discussing the money, and went to a
little cape, he writed
the pleasant coat, she was not so interesting.

She

## Loading a checkpoint


In [28]:
# Here we have loaded in a model that trained over 20 epochs `rnn_20_epoch.net`
with open(model_name, 'rb') as f:
    checkpoint = torch.load(f)
    
loaded = CharRNN(checkpoint['tokens'], n_hidden=checkpoint['n_hidden'], n_layers=checkpoint['n_layers'])
loaded.load_state_dict(checkpoint['state_dict'])

<All keys matched successfully>
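If a checkpoint saved during a GPU session is later loaded on a CPU-only machine, `torch.load` accepts a `map_location` argument to remap the stored tensors. A minimal sketch of the save and load round trip, assuming a tiny `nn.Linear` as a stand-in for the network and an in-memory buffer in place of the Drive path:

```python
import io
import torch
from torch import nn

# Tiny stand-in model; the notebook saves a CharRNN instead
tiny = nn.Linear(4, 2)
demo_checkpoint = {'n_hidden': 4, 'state_dict': tiny.state_dict()}

# Save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
torch.save(demo_checkpoint, buffer)
buffer.seek(0)

# map_location forces all tensors onto the CPU when loading
restored = torch.load(buffer, map_location=torch.device('cpu'))

clone = nn.Linear(4, 2)
clone.load_state_dict(restored['state_dict'])
```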

In [29]:
# Sample using a loaded model
print(sample(loaded, 2000, top_k=5, prime="And Levin said"))

And Levin said to her
the back of his father with the same chair of the pate, and had been before the serious sick and
tried now and taken the people of the cross and almost,
he was a found in her strange and that share of his feet that the position was not to dinner and heard the position, but all the same toothing was her tond of the fresh alone of the midst of all that there was an ordinary through something another face on the princess. Alexey
Alexandrovitch chomed her hair, with her friend, had
a perfect of the
most crassed settle any people
and her eyes
were assertanily to she had not been atreed it, but so that he had to go on with horses, and his bottle waiting over the meetis antwer and the
mastal, but was at all to have such a man to be still still so something there and their princismens. He had been discreated. But since he had breathing on his strangere standing
to the same smare. "What is a pity."

She had been brought the more
for him that she saw
a believe in the
most c

In [31]:
# Sample using a loaded model
print(sample(loaded, 2000, top_k=10, prime="God"))

God of side, when Levin were to dinner for finter her
family, and had been not and all the way, till the carriage, and
her hard their course of the standing to cress on the wild. But outside
in society that he had
telled the bivelable of the strange.
And her most doge, and
would talk to the close of her, or so, from the carriage of the water as though in the results. Levin was beginning
to repressed the more again.

The looked tooter he could not tell him some tears there was not friendly at the trap. Levin was said, feeling it, exhearting the bashes, from a faces when the princess fool which he was dose bread to her. And one moment his seed about the position with him and
all seriously to go off and actual and words, which and disture, said
that were for his erect arouped into timielity there had heard the sick wife her sons. He lasted on his braid of death, but fir the some door of children he saw with his and furity-in-laborers. Anna felt that he suddenly both had been sunding all o