<a href="https://colab.research.google.com/github/kylemcdonald/ml-examples/blob/master/workshop/char_rnn/Char-RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Char RNN

This is a short introduction to char-rnn, the character-level recurrent neural network that Andrej Karpathy describes in his blog post [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).

In [0]:
# next step: move to tensorflow char-rnn
# !git clone https://github.com/sherjilozair/char-rnn-tensorflow.git
# !python char-rnn-tensorflow/train.py --data_dir=char-rnn-tensorflow/data/tinyshakespeare/

In [0]:
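-- LuaFileSystem provides chdir; sample.lua and train.lua are run from the char-rnn checkout.
lfs = require 'lfs'
lfs.chdir("/root/shared/ml-examples/karpathy/char-rnn")
-- Locations of the training data and the pretrained checkpoints.
data_root = "/root/shared/ml-examples/karpathy/char-rnn/data/tinyshakespeare"
pretrained_root = "/root/shared/ml-examples/models/char-rnn"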

## Sampling

First, let's use a pretrained model to generate text in the style of Shakespeare.

You can change any of the following parameters, so try various combinations. Karpathy has suggestions for the different parameters [here](https://github.com/karpathy/char-rnn#approximate-number-of-parameters).

```
Options
  <model>      model checkpoint to use for sampling
  -seed        random number generator's seed [123]
  -sample      0 to use max at each timestep, 1 to sample at each timestep [1]
  -primetext   used as a prompt to "seed" the state of the LSTM using a given sequence, before we sample. []
  -length      number of characters to sample [2000]
  -temperature temperature of sampling [1]
  -gpuid       which gpu to use. -1 = use CPU [0]
  -opencl      use OpenCL (instead of CUDA) [0]
  -verbose     set to 0 to ONLY print the sampled text, no diagnostics [1]
```
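
The effect of `-temperature` is easiest to see on a small numeric example. The plain-Lua snippet below is an illustration only, not code taken from `sample.lua`. It assumes that the temperature divides the log-probabilities of the next character and that the result is then renormalized; `apply_temperature` and the three-character distribution are hypothetical.

```lua
-- Hypothetical helper: rescale a probability distribution by a temperature.
-- Assumption: the temperature divides the log-probabilities, after which the
-- result is renormalized so that it sums to 1 again.
local function apply_temperature(probs, temperature)
  local scaled, total = {}, 0
  for i, p in ipairs(probs) do
    scaled[i] = math.exp(math.log(p) / temperature)
    total = total + scaled[i]
  end
  for i = 1, #scaled do
    scaled[i] = scaled[i] / total
  end
  return scaled
end

-- Made-up next-character distribution over three characters.
local probs = {0.7, 0.2, 0.1}

for _, t in ipairs({0.5, 1.0, 2.0}) do
  local q = apply_temperature(probs, t)
  print(string.format("temperature %.1f -> %.3f %.3f %.3f", t, q[1], q[2], q[3]))
end
```

Under this assumption, a temperature below 1 concentrates probability on the most likely character (more conservative text), a temperature of 1 leaves the distribution unchanged, and a temperature above 1 flattens it (more varied text, with more mistakes).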

In [0]:
model = pretrained_root.."/tinyshakespeare.t7"
seed = 123
sample = 1
prime_text = "the meaning of life is "
length = 500
temperature = 1

os.execute(
    "th sample.lua "..model..
    " -seed "..seed..
    " -sample "..sample..
    " -primetext \""..prime_text.."\""..
    " -length "..length..
    " -temperature "..temperature..
    " -gpuid -1"
)

creating an lstm...	
seeding with the meaning of life is 	
--------------------------	


the meaning of life is ready,
The sweet bears, be so. Peace that thou hast had fusthes
Unsaid with which becomes a liver to know
Than heaven her mourney and gentle my lost
Of orcagor, for flesh she hath sorchant'd.

GLOUCESTER:
Fater dooply gone! Sir Kate, in twoshed langus.
I, that we must: I kill you allease,
Thou hast done thy gain, like deposed.
But fray thou not this blessed?

COMINIUS:
'Tis none, both Grumio:
I craight you, take our person.

BISHOP OF GABnn:
Why, then he dead, 'tis a gettingabitors
And my head o


At each timestep, char-rnn outputs a probability distribution over the next character.
When `-sample` is set to 1, the next character is drawn at random from that distribution.

When `-sample` is set to 0, the most probable character is always chosen instead, which tends to produce repetitive text, as you can see:

In [0]:
sample = 0

os.execute(
    "th sample.lua "..model..
    " -seed "..seed..
    " -sample "..sample..
    " -primetext \""..prime_text.."\""..
    " -length "..length..
    " -temperature "..temperature..
    " -gpuid -1"
)

creating an lstm...
seeding with the meaning of life is 
--------------------------	


the meaning of life is strange,
And there the greatest conduct with thee
That with the wearest of the senate of the world,
That the more state of the house of the common made
That she shall stay the sea will be denied,
And the more sire of the princess of the world
Art thou no more still the sun of me,
That we will be a prison.

CAMILLO:
The sail, of this the senate of the world,
That the marriage of the senate of the world,
That the seat of the wearing of the head,
And then the senate of the strength and sweet,
And t
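
The difference between the two runs comes down to how the next character is picked from the predicted distribution. Here is a minimal plain-Lua sketch of the two strategies. The function names and the toy distribution are hypothetical, and this is not the Torch code in `sample.lua`.

```lua
-- Greedy choice (as with -sample 0): always return the most probable index.
local function pick_max(probs)
  local best = 1
  for i = 2, #probs do
    if probs[i] > probs[best] then best = i end
  end
  return best
end

-- Random choice (as with -sample 1): draw an index in proportion to its
-- probability by walking the cumulative sum.
local function pick_sample(probs)
  local r, cumulative = math.random(), 0
  for i, p in ipairs(probs) do
    cumulative = cumulative + p
    if r < cumulative then return i end
  end
  return #probs -- guard against floating-point rounding
end

math.randomseed(123)
local chars = {"e", "a", "t"}
local probs = {0.5, 0.3, 0.2} -- made-up distribution over the next character

local greedy, sampled = {}, {}
for i = 1, 20 do
  greedy[i] = chars[pick_max(probs)]
  sampled[i] = chars[pick_sample(probs)]
end
print("max:    " .. table.concat(greedy))
print("sample: " .. table.concat(sampled))
```

With a fixed distribution the greedy line is the same character twenty times, while the sampled line mixes all three. In the real model the distribution changes with the LSTM state, so greedy output is not a single repeated character, but it falls into the same kind of loop at the level of phrases, such as "the senate of the world" above.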


## Training

You can also train a model on your own data.
To do so, create a single file named `input.txt` and place it in a folder, for example `somefolder/input.txt`.

Then point the training script at that folder:

`-data_dir somefolder`
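
As a sketch of that layout, the snippet below writes a corpus to `somefolder/input.txt` using only standard Lua. The corpus string is a placeholder, and creating the folder with `mkdir -p` assumes a Unix-like shell.

```lua
-- Create the data folder (assumes a Unix-like shell that provides `mkdir -p`).
local folder = "somefolder"
os.execute('mkdir -p "' .. folder .. '"')

-- Placeholder corpus; replace it with your own text.
local corpus = "To be, or not to be, that is the question.\n"

-- All training text goes into a single file named input.txt.
local path = folder .. "/input.txt"
local f = assert(io.open(path, "w"))
f:write(corpus)
f:close()

print("wrote " .. #corpus .. " bytes to " .. path)
```

A real corpus needs to be far larger than this placeholder: with the default `-batch_size` and `-seq_length` of 50 each, a single batch already covers 50 × 50 = 2,500 characters.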


The following parameters can be changed. At a minimum, you should specify `-data_dir` and `-checkpoint_dir`.

```
Options
  -data_dir                  data directory. Should contain the file input.txt with input data [data/tinyshakespeare]
  -rnn_size                  size of LSTM internal state [128]
  -num_layers                number of layers in the LSTM [2]
  -model                     lstm, gru or rnn [lstm]
  -learning_rate             learning rate [0.002]
  -learning_rate_decay       learning rate decay [0.97]
  -learning_rate_decay_after in number of epochs, when to start decaying the learning rate [10]
  -decay_rate                decay rate for rmsprop [0.95]
  -dropout                   dropout for regularization, used after each RNN hidden layer. 0 = no dropout [0]
  -seq_length                number of timesteps to unroll for [50]
  -batch_size                number of sequences to train on in parallel [50]
  -max_epochs                number of full passes through the training data [50]
  -grad_clip                 clip gradients at this value [5]
  -train_frac                fraction of data that goes into train set [0.95]
  -val_frac                  fraction of data that goes into validation set [0.05]
  -init_from                 initialize network parameters from checkpoint at this path []
  -seed                      torch manual random number generator seed [123]
  -print_every               how many steps/minibatches between printing out the loss [1]
  -eval_val_every            every how many iterations should we evaluate on validation data? [1000]
  -checkpoint_dir            output directory where checkpoints get written [cv]
  -savefile                  filename to autosave the checkpoint to. Will be inside checkpoint_dir/ [lstm]
  -accurate_gpu_timing       set this flag to 1 to get precise timings when using GPU. Might make the code a bit slower but reports accurate timings. [0]
  -gpuid                     which gpu to use. -1 = use CPU [0]
  -opencl                    use OpenCL (instead of CUDA) [0]
```
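
Several numbers in the training log can be reproduced from these defaults with simple arithmetic. The sketch below is a sanity check, not code from `train.lua`. It takes the 423 training batches and the vocabulary size of 65 from the log of the run below, and it assumes a particular LSTM layout: a one-hot character input, a biased input-to-hidden and a biased hidden-to-hidden linear map that each produce four gate vectors per layer, and a biased linear decoder onto the vocabulary. The helper name `lstm_param_count` is hypothetical.

```lua
-- Defaults from the option list above.
local rnn_size, num_layers, max_epochs = 128, 2, 50
-- Values read off the training log of this notebook's run.
local vocab_size, train_batches = 65, 423

-- One optimization step per training batch, repeated for every epoch.
local total_iterations = train_batches * max_epochs

-- Hypothetical helper: parameter count under the layout assumed above.
local function lstm_param_count(vocab, size, layers)
  local total = 0
  for layer = 1, layers do
    -- Layer 1 sees the one-hot character; deeper layers see the layer below.
    local input_size = (layer == 1) and vocab or size
    total = total + (input_size * 4 * size + 4 * size) -- input-to-hidden
    total = total + (size * 4 * size + 4 * size)       -- hidden-to-hidden
  end
  return total + (size * vocab + vocab)                -- decoder
end

print(total_iterations)                                   --> 21150
print(lstm_param_count(vocab_size, rnn_size, num_layers)) --> 240321
print(string.format("%.3f", 1 / train_batches))           --> 0.002
```

These values match the `1/21150` iteration counter, the `number of parameters in the model: 240321` line, and the `epoch 0.002` reported for the first iteration.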

In [0]:
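-- Train on tinyshakespeare with the default hyperparameters, on the CPU.
-- Checkpoints are written into the same folder as the data.
data_dir = "/root/shared/ml-examples/karpathy/char-rnn/data/tinyshakespeare"
os.execute(
    "th train.lua "..
    " -data_dir \""..data_dir.."\""..
    " -checkpoint_dir \""..data_dir.."\""..
    " -gpuid -1"
)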

loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 423, val: 23, test: 0
vocab size: 65
creating an lstm with 2 layers
setting forget gate biases to 1 in LSTM layer 1
setting forget gate biases to 1 in LSTM layer 2
number of parameters in the model: 240321
cloning rnn
cloning criterion
1/21150 (epoch 0.002), train_loss = 4.19803724, grad/param norm = 5.1721e-01, time/batch = 0.4729s
2/21150 (epoch 0.005), train_loss = 3.93712133, grad/param norm = 1.4679e+00, time/batch = 0.5227s
3/21150 (epoch 0.007), train_loss = 3.43764434, grad/param norm = 9.5800e-01, time/batch = 0.4477s
4/21150 (epoch 0.009), train_loss = 3.41313742, grad/param norm = 7.5143e-01, time/batch = 0.4013s
5/21150 (epoch 0.012), train_loss = 3.33707270, grad/param norm = 6.9269e-01, time/batch = 0.3920s
6/21150 (epoch 0.014), train_loss = 3.37127145, grad/param norm = 5.2318e-01, time/batch = 0.3949s
7/21150 (epoch 0.017), train_loss = 3.36724018, grad/param norm = 4.3217e-01, time/batch = 0.4494s
8/21150 (epoch 0.019), train_loss = 3.33067083, grad/param norm = 3.9964e-01, time/batch = 0.4644s
9/21150 (epoch 0.021), train_loss = 3.29356131, grad/param norm = 3.8693e-01, time/batch = 0.4116s
10/21150 (epoch 0.024), train_loss = 3.38283139, grad/param norm = 3.5561e-01, time/batch = 0.4068s
11/21150 (epoch 0.026), train_loss = 3.30195265, grad/param norm = 3.5806e-01, time/batch = 0.4057s
12/21150 (epoch 0.028), train_loss = 3.32249605, grad/param norm = 2.7507e-01, time/batch = 0.3873s
13/21150 (epoch 0.031), train_loss = 3.30913857, grad/param norm = 2.4440e-01, time/batch = 0.4009s
14/21150 (epoch 0.033), train_loss = 3.28707813, grad/param norm = 3.4650e-01, time/batch = 0.4010s
15/21150 (epoch 0.035), train_loss = 3.36023106, grad/param norm = 3.9640e-01, time/batch = 0.3969s
16/21150 (epoch 0.038), train_loss = 3.33863527, grad/param norm = 3.4813e-01, time/batch = 0.3973s
17/21150 (epoch 0.040), train_loss = 3.29905544, grad/param norm = 3.9863e-01, time/batch = 0.4140s
18/21150 (epoch 0.043), train_loss = 3.31918335, grad/param norm = 2.5565e-01, time/batch = 0.4016s