
# Text generation with an RNN


This notebook demonstrates how to generate text with a **character-level LSTM in PyTorch**, trained on the text of the book **Anna Karenina**. Given a sequence of characters from this book, the model predicts the next character; calling the model repeatedly generates longer sequences of text.

While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:

* The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.

* The structure of the output resembles the novel: text is organized into paragraphs, with capitalization and punctuation similar to the dataset.

* As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure. Below is the **general architecture of the character-wise RNN.**<br>
<img src="https://github.com/purvasingh96/Deep-learning-with-neural-networks/blob/master/Deep-learning-with-pytorch/3.%20Recurrent%20Neural%20Networks/images/lstm_rnn_architecture.png?raw=1"></img> 




# Set Up
### Import PyTorch and other libraries


In [0]:
import torch
from torch import nn
import torch.nn.functional as F
import numpy as np


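# Get the Anna Karenina data
Download the text of the novel and save it as `sample_data/anna.txt`, which is the path the next cell reads from.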

### Read the data

In [0]:
with open('sample_data/anna.txt', 'r') as f:
  text = f.read()

### First look at the text

In [9]:
print(text[:100])

Chapter 1


Happy families are all alike; every unhappy family is unhappy in its own
way.

Everythin


### GPU Usage
Enable GPU acceleration to run this notebook faster. In Colab: *Runtime > Change runtime type > Hardware accelerator > GPU*. If running locally, make sure a CUDA-enabled build of PyTorch is installed.

# Process the text
### Vectorize the text (Tokenization)
Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [10]:
chars = tuple(set(text))
int2char = dict(enumerate(chars))
char2int = {ch:ii for ii, ch in int2char.items()}

# encode text
encoded = np.array([char2int[ch] for ch in text])
encoded

array([63, 61, 26, ..., 19, 28, 61])
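The two lookup tables can be checked with a round trip: encoding and then decoding should return the original text. One caveat worth knowing is that `tuple(set(text))` has no guaranteed order, because Python randomizes string hashing per process, so the integer assigned to each character can differ between runs unless the vocabulary is sorted (or saved alongside the model, as `CharRNN` does by storing `tokens`). A minimal sketch, using a short hypothetical string in place of the book:

```python
import numpy as np

# Hypothetical sample text standing in for the contents of anna.txt
sample = "Happy families are all alike"

# Sorting the character set makes the char -> int mapping identical on every run
# (iteration order of a set of strings can change between interpreter sessions)
chars = tuple(sorted(set(sample)))
int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}

# Encode the text as integers, then decode it back
encoded = np.array([char2int[ch] for ch in sample])
decoded = ''.join(int2char[ii] for ii in encoded)

print(len(chars))          # number of distinct characters in the sample
print(decoded == sample)   # True
```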

# Pre-processing the data
Our LSTM expects input that is **one-hot encoded**: each character is first converted into an integer (via the dictionary created above) and then into a vector with one entry per unique character, where only the position given by its integer has the value 1 and every other entry is 0.

In [0]:
def one_hot_encode(arr, n_labels):
  # allocate one row per element of arr and one column per label
  one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
  # in row i, set the column given by the i-th integer of the flattened arr to 1
  one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1
  # restore the original shape of arr, adding a trailing n_labels dimension
  one_hot = one_hot.reshape((*arr.shape, n_labels))
  return one_hot

In [12]:
test_seq = np.array([[3, 4, 5]])
one_hot_encode(test_seq, 8)

array([[[0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0.]]], dtype=float32)
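An equivalent way to build the same encoding, shown here as an alternative sketch rather than the notebook's method, is to index rows of an identity matrix: row `i` of `np.eye(n_labels)` is exactly the one-hot vector for label `i`. Taking `argmax` over the last axis undoes the encoding. `one_hot_eye` is a hypothetical helper name:

```python
import numpy as np

def one_hot_encode(arr, n_labels):
  # Same approach as above: fancy-index a flat (arr.size, n_labels) array
  one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
  one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1
  return one_hot.reshape((*arr.shape, n_labels))

def one_hot_eye(arr, n_labels):
  # Alternative: indexing the identity matrix with the integer array
  # picks one row (one one-hot vector) per element of arr
  return np.eye(n_labels, dtype=np.float32)[arr]

test_seq = np.array([[3, 4, 5]])
a = one_hot_encode(test_seq, 8)
b = one_hot_eye(test_seq, 8)
print(a.shape)                 # (1, 3, 8)
print(np.array_equal(a, b))    # True
print(a.argmax(axis=-1))       # [[3 4 5]]  (argmax recovers the integers)
```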

# Making training mini-batches

To train on this data, we create mini-batches. Each batch should contain multiple sequences, each a desired number of sequence steps long, as shown below:<br>
<img src="https://github.com/purvasingh96/Deep-learning-with-neural-networks/blob/master/Deep-learning-with-pytorch/3.%20Recurrent%20Neural%20Networks/images/mini_batch_1.png?raw=1"></img><br><br>
In this example, we take the encoded characters (passed in as the `arr` parameter) and split them into `batch_size` sequences. Each sequence will be `seq_length` characters long.

# Creating Batches

### 1. Discard text so that only completely full mini-batches remain

* batch_size = `N` (here, 2)
* seq_length = `M` (here, 3)
* Number of characters in one batch = `N * M` (2 * 3 = 6)
* Total number of full batches `K` that can be made from an array of 12 characters:

`len(arr) // (N * M) = 12 // 6 = 2`

* Characters to keep so that every mini-batch is completely full:

`arr[:N * M * K] = arr[:12]`, i.e. all 12 characters, `arr[0]` through `arr[11]`. Anything past index `N * M * K - 1` is discarded; for example, a 13th character `arr[12]` could not fill a whole batch and would be dropped.
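The trimming step can be traced in NumPy on a toy array. This sketch assumes a hypothetical 13-character input (the integers 1 to 13) with `N = 2` and `M = 3`, so exactly one value is left over:

```python
import numpy as np

batch_size, seq_length = 2, 3              # N and M from the example above
arr = np.arange(1, 14)                     # hypothetical 13-character input: 1..13

chars_per_batch = batch_size * seq_length  # N * M = 6
n_batches = len(arr) // chars_per_batch    # K = 13 // 6 = 2
kept = arr[:n_batches * chars_per_batch]   # keep the first N * M * K = 12 values

print(n_batches)          # 2
print(kept)               # [ 1  2  3  4  5  6  7  8  9 10 11 12]
print(arr[len(kept):])    # [13]  (the leftover that cannot fill a batch)
```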




### 2. Split the array into N rows
You can do this with:<br>`arr.reshape((batch_size, -1))`.<br>
After this, the shape of the array should be:<br>
`(N, M * K)`
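On the toy example (12 kept values, `N = 2`), the reshape produces two rows of `M * K = 6` values each. Each row is one contiguous stretch of the text, so consecutive mini-batches continue the same sequences row by row. A minimal sketch:

```python
import numpy as np

batch_size = 2                   # N
arr = np.arange(1, 13)           # the 12 kept values, 1..12 (toy data)

# Split into N rows; -1 lets NumPy infer the row length M * K = 12 // 2 = 6
arr = arr.reshape((batch_size, -1))
print(arr.shape)   # (2, 6)
print(arr)
# [[ 1  2  3  4  5  6]
#  [ 7  8  9 10 11 12]]
```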




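### 3. Iterate through mini-batches
Each mini-batch is an `(N, M)` window over the `(N, M * K)` array. The window slides across the columns `seq_length` (`M`) steps at a time, giving `K` mini-batches. For each window we need both an input array and a target array.
<br>
The target array is the input array shifted over by one character, so the target at each position is the character that comes next in the text.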
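The sliding window and the one-character shift can be traced on the toy `(2, 6)` array with `M = 3`. This sketch computes all targets up front with `np.roll`, an alternative to the `try/except IndexError` used in `get_batches` rather than the notebook's own code; both wrap the final target of each row around to that row's first character:

```python
import numpy as np

seq_length = 3                               # M
arr = np.arange(1, 13).reshape((2, -1))      # toy (N, M * K) = (2, 6) array

# Targets: every value shifted left by one; the last column wraps around to the
# first column of each row, matching what get_batches does for the final window
targets = np.roll(arr, -1, axis=1)

# Slide an (N, M) window across the columns, seq_length at a time
for n in range(0, arr.shape[1], seq_length):
  x = arr[:, n:n + seq_length]
  y = targets[:, n:n + seq_length]
  print("x =", x.tolist(), " y =", y.tolist())
# x = [[1, 2, 3], [7, 8, 9]]  y = [[2, 3, 4], [8, 9, 10]]
# x = [[4, 5, 6], [10, 11, 12]]  y = [[5, 6, 1], [11, 12, 7]]
```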

In [0]:
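def get_batches(arr, batch_size, seq_length):
  '''Yield (x, y) mini-batches of shape (batch_size, seq_length) from the encoded array arr.'''
  # 1. keep only enough characters to make completely full mini-batches
  total_batch_size = batch_size*seq_length
  n_batches = len(arr)//total_batch_size
  arr = arr[:n_batches*total_batch_size]
  # 2. split the array into batch_size rows
  arr = arr.reshape((batch_size, -1))
  print(arr.shape)
  # 3. iterate through the array, one seq_length window at a time
  for n in range(0, arr.shape[1], seq_length):
    # input window
    x = arr[:, n:n+seq_length]
    # targets: the inputs shifted left by one character
    y = np.zeros_like(x)
    try:
      y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n+seq_length]
    except IndexError:
      # last window: wrap around to the first character of each row
      y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]
    yield x, y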

In [24]:
batches = get_batches(encoded, 8, 50)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

(8, 80250)
x
 [[63 61 26  9 28  8  0 19 50 35]
 [74 13 12 68 19 43 26 73  8 68]
 [ 9 74 12 12 70 69 43  8 10 19]
 [19 12 74 21  8 19 12 74  0 28]
 [28  8 12 12 19  4  0 74 21 19]
 [28 19 69  8 68 19  9  0 26 38]
 [12 12 70 12 28 26 13 28 68 19]
 [ 8 72 68 19  9  0 70 13 73  8]]

y
 [[61 26  9 28  8  0 19 50 35 35]
 [13 12 68 19 43 26 73  8 68 19]
 [74 12 12 70 69 43  8 10 19 56]
 [12 74 21  8 19 12 74  0 28 19]
 [ 8 12 12 19  4  0 74 21 19 28]
 [19 69  8 68 19  9  0 26 38 11]
 [12 70 12 28 26 13 28 68 19 28]
 [72 68 19  9  0 70 13 73  8 11]]


# Defining the network with PyTorch
Below is a sample structure of our LSTM model: <br>
<img src="https://github.com/purvasingh96/Deep-learning-with-neural-networks/blob/master/Deep-learning-with-pytorch/3.%20Recurrent%20Neural%20Networks/data/15.%20rnn_classifier.png?raw=1"></img><br>
We will use PyTorch to define the model's architecture and its forward pass.



### Model Structure
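In `__init__`, the following structure is defined:<br>
* Storing the necessary dictionaries (`int2char`, `char2int`)
* Defining an LSTM layer that takes the following parameters:
  * input size (`input_size`, here the number of unique characters)
  * hidden layer size (`n_hidden`)
  * number of layers (`n_layers`)
  * dropout probability (`drop_prob`)
  * whether the batch dimension comes first (`batch_first=True`)
* Defining a dropout layer and a final fully connected layer that maps the `n_hidden` LSTM outputs to one score per character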


### LSTM Inputs/Outputs
A basic LSTM can be created as follows:
```python
self.lstm = nn.LSTM(input_size, n_hidden, n_layers,
                    dropout=drop_prob, batch_first=True)
```
An initial hidden state of all zeros also needs to be created. It consists of a hidden-state tensor and a cell-state tensor, each of shape `(n_layers, batch_size, n_hidden)`:<br>
```python
self.init_hidden(batch_size)
```
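A minimal shape check, with arbitrary small sizes chosen only for illustration, shows what `nn.LSTM` expects and returns when `batch_first=True`, and why the forward pass reshapes the LSTM output with `view(-1, n_hidden)` before the fully connected layer:

```python
import torch
from torch import nn

# Hypothetical sizes for illustration only
vocab_size, n_hidden, n_layers = 10, 16, 2
batch_size, seq_length = 4, 5

lstm = nn.LSTM(vocab_size, n_hidden, n_layers, dropout=0.5, batch_first=True)
fc = nn.Linear(n_hidden, vocab_size)

# With batch_first=True the input is (batch, seq, features): here, one-hot vectors
x = torch.zeros(batch_size, seq_length, vocab_size)
x[:, :, 0] = 1.0

# Initial hidden and cell states are (n_layers, batch, n_hidden), even with batch_first=True
h0 = torch.zeros(n_layers, batch_size, n_hidden)
c0 = torch.zeros(n_layers, batch_size, n_hidden)

out, (h, c) = lstm(x, (h0, c0))
print(out.shape)   # torch.Size([4, 5, 16]): top-layer output at every time step
print(h.shape)     # torch.Size([2, 4, 16]): final hidden state of every layer

# Flatten batch and time so that every time step becomes one row for the linear layer
logits = fc(out.contiguous().view(-1, n_hidden))
print(logits.shape)  # torch.Size([20, 10]): one score per character, per time step
```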


In [26]:
# check if GPU is available
train_on_gpu = torch.cuda.is_available()
if(train_on_gpu):
    print('Training on GPU!')
else: 
    print('No GPU available, training on CPU; consider making n_epochs very small.')

Training on GPU!
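The notebook branches on the global `train_on_gpu` flag and calls `.cuda()` explicitly. A common alternative, sketched here and not used by the notebook's code, is to build a `torch.device` once and move modules and tensors with `.to(device)`, which keeps a single code path for CPU and GPU:

```python
import torch
from torch import nn

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# .to(device) moves a module's parameters; tensors can be created directly on the device
layer = nn.Linear(4, 2).to(device)
inputs = torch.randn(3, 4, device=device)
out = layer(inputs)
print(out.device.type)  # "cuda" or "cpu", matching device
```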


In [0]:
class CharRNN(nn.Module):
  def __init__(self, tokens, n_hidden=256, n_layers=2, drop_prob=0.5, lr=0.001):
    super().__init__()
    self.drop_prob = drop_prob
    self.n_layers = n_layers
    self.n_hidden = n_hidden
    self.lr = lr

    # creating character dictionaries
    self.chars = tokens
    self.int2char = dict(enumerate(self.chars))
    self.char2int = {ch:ii for ii, ch in self.int2char.items()}

    # defining LSTM model
    self.lstm = nn.LSTM(len(self.chars), n_hidden, n_layers, batch_first=True, dropout=drop_prob)

    # defining dropout layer
    self.dropout = nn.Dropout(drop_prob)

    # defining final fully-connected layer
    self.fc = nn.Linear(n_hidden, len(self.chars))

  def forward(self, x, hidden):
    # the LSTM returns its output at every time step and the new hidden state
    r_output, hidden = self.lstm(x, hidden)

    # pass the LSTM output through the dropout layer
    out = self.dropout(r_output)

    # flatten (batch, seq_length, n_hidden) into (batch * seq_length, n_hidden);
    # contiguous() is needed because view() requires contiguous memory
    out = out.contiguous().view(-1, self.n_hidden)

    # put out through the fully connected layer
    out = self.fc(out)

    # return the final output and the hidden state
    return out, hidden
  
  def init_hidden(self, batch_size):
    # create 2 new zero tensors (hidden state and cell state of the LSTM),
    # each of size n_layers x batch_size x n_hidden, with the same dtype as the weights
    weight = next(self.parameters()).data
    if(train_on_gpu):
      hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda(),
                weight.new(self.n_layers, batch_size, self.n_hidden).zero_().cuda())
    else:
      hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_(),
                weight.new(self.n_layers, batch_size, self.n_hidden).zero_())
    return hidden



# Time to Train

Below we use the **Adam optimizer and cross-entropy loss**. We calculate the loss and perform backpropagation as usual. A few points to note:<br>
* Within the batch loop, we detach the hidden state from its history by setting it equal to a new tuple built from each tensor's `.data`, so that backpropagation does not traverse the entire history of batches. It is a tuple because an LSTM's hidden state consists of both the hidden state and the cell state.
* We use `clip_grad_norm_` to help prevent exploding gradients.
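To make the clipping step concrete: as documented for `torch.nn.utils.clip_grad_norm_`, all gradients are treated as one long vector, and if its L2 norm exceeds `clip`, every gradient is scaled by the same factor, roughly `clip / norm`. The NumPy sketch below reproduces that arithmetic on made-up gradients; `clip_by_global_norm` is a hypothetical helper, not part of PyTorch:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm, eps=1e-6):
  """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm.

  Sketch of the rule documented for torch.nn.utils.clip_grad_norm_ (2-norm case).
  """
  # Global norm: L2 norm of all gradient entries taken together
  total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
  clip_coef = max_norm / (total_norm + eps)
  if clip_coef < 1:
    # Scale every gradient by the same factor, preserving the update direction
    grads = [g * clip_coef for g in grads]
  return grads, total_norm

# Two hypothetical parameter gradients with global norm sqrt(3^2 + 4^2 + 12^2) = 13
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm)                                             # 13.0
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))    # approximately 5.0
```

Because all gradients share one scale factor, clipping shrinks the step size without changing its direction.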

In [0]:
def train(net, data, epochs=10, batch_size=10, seq_length=50, lr=0.001, clip=5, val_frac=0.1, print_every=10):
  net.train()
  opt = torch.optim.Adam(net.parameters(), lr=lr)
  criterion = nn.CrossEntropyLoss()

  # creating training and validation data
  val_idx = int(len(data)*(1-val_frac))
  data, val_data = data[:val_idx], data[val_idx:]

  if(train_on_gpu):
    net.cuda()
  counter = 0
  n_chars = len(net.chars)
  for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    for x, y in get_batches(data, batch_size, seq_length):
      counter += 1

      # one hot encode our data and make them torch tensors
      x = one_hot_encode(x,n_chars)
      inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

      if(train_on_gpu):
        inputs, targets = inputs.cuda(), targets.cuda()
        
      # create new hidden state variable to 
      # avoid traversing entire history 
      h = tuple([each.data for each in h])

      net.zero_grad()
      output, h = net(inputs, h)

      loss = criterion(output, targets.view(batch_size*seq_length).long())
      loss.backward()

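      # clip gradients to help prevent exploding gradients in the LSTM
      nn.utils.clip_grad_norm_(net.parameters(), clip)
      opt.step()

      # loss statistics
      if counter%print_every == 0:
        # compute the validation loss, starting from a fresh hidden state
        val_h = net.init_hidden(batch_size)
        val_losses = []
        net.eval()
        with torch.no_grad():
          for x, y in get_batches(val_data, batch_size, seq_length):
            x = one_hot_encode(x, n_chars)
            inputs, targets = torch.from_numpy(x), torch.from_numpy(y)

            val_h = tuple([each.data for each in val_h])

            if(train_on_gpu):
              inputs, targets = inputs.cuda(), targets.cuda()

            output, val_h = net(inputs, val_h)
            val_loss = criterion(output, targets.view(batch_size*seq_length).long())
            val_losses.append(val_loss.item())

        # switch back to training mode after validation
        net.train()

        print("Epoch: {}/{}...".format(e+1, epochs),
              "Step: {}...".format(counter),
              "Loss: {:.4f}...".format(loss.item()),
              "Val Loss: {:.4f}".format(np.mean(val_losses)))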

### Instantiating the model
Before training, we first create the network with the chosen hyperparameters. Then we set the mini-batch parameters and start training.
