
## Prior Work

This notebook is based on work by [abhaysrivastav](https://github.com/abhaysrivastav/ComputerVision). 

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and his [implementation in Torch](https://github.com/karpathy/char-rnn).

In [1]:
# load all libraries and shared functions
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
from shared_lstm_functions import *
jojo_test()


All of our functions are here in memory....


## Load in Data

We'll use *The Call of the Wild* as our training text.

In [2]:
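# open text file and read in data as `text`
# (explicit UTF-8 so the curly quotes in the book decode the same way on every platform)
with open('/Users/joshpause/Desktop/Experiments/WhatIsIntelligence/part4/data/the_call_of_the_wild.txt', 'r', encoding='utf-8') as f:
    text = f.read()
text[:100]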

'Chapter I. Into the Primitive\n\n“Old longings nomadic leap,\nChafing at custom’s chain;\nAgain from its'

In [3]:
# encode the text and map each character to an integer and vice versa
chars = tuple(set(text))
int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}
encoded = np.array([char2int[ch] for ch in text])
encoded[:100]

array([44, 35, 27, 58, 51, 64, 12,  7, 42, 19,  7, 42, 61, 51, 56,  7, 51,
       35, 64,  7, 71, 12, 13,  6, 13, 51, 13, 67, 64, 57, 57, 47, 72, 45,
       10,  7, 45, 56, 61, 48, 13, 61, 48, 17,  7, 61, 56,  6, 27, 10, 13,
       32,  7, 45, 64, 27, 58, 24, 57, 44, 35, 27, 23, 13, 61, 48,  7, 27,
       51,  7, 32, 36, 17, 51, 56,  6, 59, 17,  7, 32, 35, 27, 13, 61, 73,
       57, 25, 48, 27, 13, 61,  7, 23, 12, 56,  6,  7, 13, 51, 17])
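As a quick sanity check on the mapping above, here is a minimal sketch that round-trips a short string through the same `enumerate`-based construction. The sample string and the `sample_*` names are made up for illustration. The `sorted()` call is an addition that the cell above does not use: string hashing is randomized per Python process by default, so `tuple(set(text))` can assign different integers on different runs, while sorting makes the ids reproducible.

```python
# Illustrative sample; the notebook encodes the full book text instead.
sample = "Into the Primitive"

# sorted() is an assumption added here so the ids are stable between runs.
sample_chars = tuple(sorted(set(sample)))
sample_int2char = dict(enumerate(sample_chars))
sample_char2int = {ch: ii for ii, ch in sample_int2char.items()}

# Encode to integers, then decode back to text.
sample_encoded = [sample_char2int[ch] for ch in sample]
sample_decoded = "".join(sample_int2char[ii] for ii in sample_encoded)

print(len(sample_chars), "unique characters")
print("round trip ok:", sample_decoded == sample)
```

Because the notebook stores `net.chars` in the checkpoint, a saved model keeps its own mapping either way; the ordering only matters if you rebuild the mapping from the text later.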

In [4]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[44 35 27 58 51 64 12  7 42 19]
 [ 7 51 35 64  6 24  7 27 61 10]
 [11 27 12 10  7 49 45 36 61 10]
 [27 61 51 64 10  7 13 51 19  7]
 [32 35 19  7 26 64  7 10 13 10]
 [17  7 27 61 10  7 27 45 45 24]
 [ 7 35 13  6 19  7 26 13 17  7]
 [61 10  7 45 36 12 13 61 48 24]
 [67 64 61  7 69 27 51 51 35 64]
 [ 7 51 35 64 57 58 27 12 51 61]]

y
 [[35 27 58 51 64 12  7 42 19  7]
 [51 35 64  6 24  7 27 61 10  7]
 [27 12 10  7 49 45 36 61 10 64]
 [61 51 64 10  7 13 51 19  7 26]
 [35 19  7 26 64  7 10 13 10  7]
 [ 7 27 61 10  7 27 45 45 24  7]
 [35 13  6 19  7 26 13 17  7  6]
 [10  7 45 36 12 13 61 48 24  7]
 [64 61  7 69 27 51 51 35 64 11]
 [51 35 64 57 58 27 12 51 61 64]]
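`get_batches` comes from `shared_lstm_functions`, which isn't shown in this notebook. The output above suggests it yields `(x, y)` pairs of shape `(n_seqs, n_steps)` in which `y` is `x` shifted one character to the left. The following is a sketch of a generator with that behavior, not the actual helper: the name `get_batches_sketch` and the wrap-around on the final window are assumptions.

```python
import numpy as np

def get_batches_sketch(arr, n_seqs, n_steps):
    """Yield (x, y) batches of shape (n_seqs, n_steps); y is x shifted by one."""
    chars_per_batch = n_seqs * n_steps
    n_batches = len(arr) // chars_per_batch
    # Keep only enough characters to make full batches.
    arr = arr[:n_batches * chars_per_batch]
    # One row per sequence; each row is a contiguous slice of the text.
    arr = arr.reshape((n_seqs, -1))
    for n in range(0, arr.shape[1], n_steps):
        x = arr[:, n:n + n_steps]
        y = np.zeros_like(x)
        # Targets are the inputs shifted left by one character.
        y[:, :-1] = x[:, 1:]
        if n + n_steps < arr.shape[1]:
            y[:, -1] = arr[:, n + n_steps]
        else:
            # Final window (assumption): wrap around to the start of each row.
            y[:, -1] = arr[:, 0]
        yield x, y

# Toy data so the shift is easy to see by eye.
toy = np.arange(40)
toy_x, toy_y = next(get_batches_sketch(toy, n_seqs=2, n_steps=5))
print(toy_x)
print(toy_y)
```

With the toy array, each row of `toy_y` is the matching row of `toy_x` plus one, which is the same pattern visible in the `x` and `y` printouts above.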


## Time to train

In [5]:
if 'net' in locals():
    del net

In [6]:
# define and print the net
net = CharRNN(chars, n_hidden=82, n_layers=3, drop_prob=0.5)
print(net)

CharRNN(
  (lstm): LSTM(74, 82, num_layers=3, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=82, out_features=74, bias=True)
)


In [7]:
n_seqs = 128    # number of sequences running through the network in one pass
n_steps = 100   # characters per training sequence; longer sequences let the network learn longer-range
                # dependencies but take longer to train (100 is typically a good number here)
n_epochs = 100  # number of epochs
train(net, encoded, epochs=n_epochs, n_seqs=n_seqs, n_steps=n_steps, lr=0.001, cuda=False, print_every=10)

Epoch: 1/100... Step: 10... Loss: 3.8194... Val Loss: 3.7709
Epoch: 2/100... Step: 20... Loss: 3.5944... Val Loss: 3.5870
Epoch: 3/100... Step: 30... Loss: 3.5214... Val Loss: 3.5043
Epoch: 4/100... Step: 40... Loss: 3.4732... Val Loss: 3.4518
Epoch: 5/100... Step: 50... Loss: 3.4542... Val Loss: 3.4238
Epoch: 5/100... Step: 60... Loss: 3.4048... Val Loss: 3.3944
Epoch: 6/100... Step: 70... Loss: 3.3791... Val Loss: 3.3579
Epoch: 7/100... Step: 80... Loss: 3.3432... Val Loss: 3.3380
Epoch: 8/100... Step: 90... Loss: 3.3387... Val Loss: 3.3149
Epoch: 9/100... Step: 100... Loss: 3.3036... Val Loss: 3.2840
Epoch: 10/100... Step: 110... Loss: 3.2876... Val Loss: 3.2427
Epoch: 10/100... Step: 120... Loss: 3.2142... Val Loss: 3.1902
Epoch: 11/100... Step: 130... Loss: 3.1418... Val Loss: 3.1299
Epoch: 12/100... Step: 140... Loss: 3.0859... Val Loss: 3.0826
Epoch: 13/100... Step: 150... Loss: 3.0546... Val Loss: 3.0429
Epoch: 14/100... Step: 160... Loss: 3.0124... Val Loss: 2.9970
Epoch: 15/1

## Hyperparameters

Here are the hyperparameters for the network.

In defining the model:
* `n_hidden` - The number of units in the hidden layers.
* `n_layers` - Number of hidden LSTM layers to use.

In this example we don't tune the dropout probability or the learning rate; they stay at the values passed above (`drop_prob=0.5` and `lr=0.001`).

And in training:
* `n_seqs` - Number of sequences running through the network in one pass.
* `n_steps` - Number of characters in each sequence the network is trained on. Larger is typically better, because the network can learn longer-range dependencies, but training takes longer. 100 is typically a good number here.
* `lr` - Learning rate for training

Here's some good advice from Andrej Karpathy on training the network. I'm going to copy it in here for your benefit, but also link to [where it originally came from](https://github.com/karpathy/char-rnn#tips-and-tricks).

> ## Tips and Tricks

>### Monitoring Validation Loss vs. Training Loss
>If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:

> - If your training loss is much lower than validation loss then this means the network might be **overfitting**. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
> - If your training/validation loss are about equal then your model is **underfitting**. Increase the size of your model (either number of layers or the raw number of neurons per layer)

> ### Approximate number of parameters

> The two most important parameters that control the model are `n_hidden` and `n_layers`. I would advise that you always use `n_layers` of either 2/3. The `n_hidden` can be adjusted based on how much data you have. The two important quantities to keep track of here are:

> - The number of parameters in your model. This is printed when you start training.
> - The size of your dataset. 1MB file is approximately 1 million characters.

>These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:

> - I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make `n_hidden` larger.
> - I have a 10MB dataset and running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.

> ### Best models strategy

>The winning strategy to obtaining very good models (if you have the compute time) is to always err on making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0,1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.

>It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.

>By the way, the size of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set or otherwise the validation performance will be noisy and not very informative.
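The first tip above (compare training loss with validation loss) can be turned into a rough check. This is only a sketch of that rule of thumb: the function name `diagnose_fit` and the 5% `rel_gap` threshold are assumptions made for illustration, not part of Karpathy's advice or of `shared_lstm_functions`.

```python
def diagnose_fit(train_loss, val_loss, rel_gap=0.05):
    """Return a rough label based on the relative train/validation gap.

    rel_gap is a hypothetical threshold chosen for this sketch.
    """
    # Validation loss well above training loss: the model may be memorizing.
    if val_loss - train_loss > rel_gap * train_loss:
        return "possibly overfitting"
    # Losses about equal: the model may have capacity to spare.
    return "possibly underfitting"

# Values from the last complete line of the training log above.
print(diagnose_fit(train_loss=3.0124, val_loss=2.9970))
```

A single log line is noisy, so in practice it makes more sense to look at the trend over many steps than at one pair of numbers.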

In [10]:
# save the model
model_name = 'cotw_rnn_v25.net'

checkpoint = {'n_hidden': net.n_hidden,
              'n_layers': net.n_layers,
              'state_dict': net.state_dict(),
              'tokens': net.chars}

with open('models/'+model_name, 'wb') as f:
    torch.save(checkpoint, f)

In [11]:
sum(p.numel() for p in net.parameters())

166862
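The count above can be reproduced by hand from the printed architecture, which helps when estimating how a change to `n_hidden` or `n_layers` will move the parameter count relative to the dataset size. The sketch below assumes the model contains only the modules shown in `print(net)` (an LSTM with 74 inputs, a parameter-free dropout layer, and the final linear layer); the helper names are made up for illustration.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters in one PyTorch LSTM layer.

    Four gates, each with input weights, recurrent weights,
    and two bias vectors (bias_ih and bias_hh).
    """
    return 4 * (input_size * hidden_size
                + hidden_size * hidden_size
                + 2 * hidden_size)

def char_rnn_params(n_chars, n_hidden, n_layers):
    """Total parameters for a stacked LSTM followed by a Linear layer."""
    # First layer reads the n_chars-wide input.
    total = lstm_layer_params(n_chars, n_hidden)
    # Remaining layers read the previous layer's hidden state.
    total += (n_layers - 1) * lstm_layer_params(n_hidden, n_hidden)
    # Final Linear layer: weights plus bias.
    total += n_hidden * n_chars + n_chars
    return total

print(char_rnn_params(n_chars=74, n_hidden=82, n_layers=3))  # 166862
```

The first LSTM layer contributes 51,824 parameters, each of the two deeper layers 54,448, and the linear layer 6,142, which sums to the 166,862 reported by `net.parameters()`.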