# Task 2: Char-RNN

https://github.com/karpathy/char-rnn, from Andrej Karpathy

In [1]:
import time
import numpy as np
import tensorflow as tf

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Part 1: Setup

In [2]:
# import text
with open('data/tinyshakespeare.txt', 'r') as f:
    text=f.read()
    
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))

# and let's get a glance of what the text is
print(text[:500])

Length of text: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [3]:
# The unique characters in the file. We use this to build our encoding for the neural network
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters


In [4]:
# Creating a mapping from unique characters to indices
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
text_as_int = np.array([vocab_to_ind[c] for c in text], dtype=np.int32)

# Each character is mapped to an index from 0 to len(vocab) - 1; show the first 20
for char,_ in zip(vocab_to_ind, range(20)):
    print('{:6s} ---> {:4d}'.format(repr(char), vocab_to_ind[char]))
# Show how the first 10 characters from the text are mapped to integers
print ('{} --- characters mapped to int --- > {}'.format(text[:10], text_as_int[:10]))

'\n'   --->    0
' '    --->    1
'!'    --->    2
'$'    --->    3
'&'    --->    4
"'"    --->    5
','    --->    6
'-'    --->    7
'.'    --->    8
'3'    --->    9
':'    --->   10
';'    --->   11
'?'    --->   12
'A'    --->   13
'B'    --->   14
'C'    --->   15
'D'    --->   16
'E'    --->   17
'F'    --->   18
'G'    --->   19
First Citi --- characters mapped to int --- > [18 47 56 57 58  1 15 47 58 47]
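
As a quick check of the mapping, the sketch below encodes a string to integers and decodes it back. It uses a short stand-in string instead of the Shakespeare file so that it runs on its own, which means the indices differ from the ones printed above. The helper names `encode` and `decode` are illustrative and not part of the assignment code.

```python
import numpy as np

# Stand-in text so the example is self-contained (the notebook uses tinyshakespeare).
sample_text = "First Citizen:\nBefore we proceed any further, hear me speak."

vocab = sorted(set(sample_text))
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))

def encode(s):
    """Map a string to an int32 array of vocabulary indices (hypothetical helper)."""
    return np.array([vocab_to_ind[c] for c in s], dtype=np.int32)

def decode(indices):
    """Map an array of indices back to a string (hypothetical helper)."""
    return ''.join(ind_to_vocab[int(i)] for i in indices)

encoded = encode("First")
print(encoded, '--->', repr(decode(encoded)))
```

Decoding an encoded string should always return the original string, which is a cheap way to confirm that the two dictionaries are inverses of each other.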


## Part 2: Creating batches
Make a generator that yields training batches.

Let's first clarify the concept of a batch:
1. **batch_size**: Recall batches in a CNN: if we have 100 samples and set batch_size to 10, we send 10 samples to the network at a time. In an RNN, batch_size has the same meaning: it defines how many samples (sequences) we send to the network at a time.
2. **sequence_length**: An RNN keeps memory in its cells and passes information from one time step to the next, so we also need a sequence length (also called 'steps'), which defines how many time steps each sequence contains.

Putting the two together: let N be the number of sequences in a batch and M the length of each sequence. batch_size in an RNN **still** means the number of sequences in a batch, N, but the data in each batch is an array of shape **[N, M]**.
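
To make the shapes concrete, here is a small sketch with a toy array of 24 integers, N = 2 sequences and M = 3 steps. The variable names are illustrative only.

```python
import numpy as np

data = np.arange(24)      # toy stand-in for the encoded text
N, M = 2, 3               # sequences per batch, steps per sequence

n_batches = len(data) // (N * M)                  # number of full batches available
rows = data[:N * M * n_batches].reshape((N, -1))  # one long row per sequence

first_batch = rows[:, :M]                         # first M steps of every row
print(n_batches, rows.shape, first_batch.shape)
print(first_batch)
```

Each batch takes the next M columns of every row, so consecutive batches continue each of the N sequences where the previous batch stopped.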

In [43]:
# Generates mini-batches
def get_batches(array, n_seqs, n_steps):
    '''
    Partition a data array into mini-batches.
    input:
    array: input data (1-D array of encoded characters)
    n_seqs: number of sequences in a batch
    n_steps: length of each sequence
    output (yielded endlessly, cycling over the data):
    x: inputs, shape [n_seqs, n_steps]
    y: targets, which is x shifted by one position
       you can check the following figure to get a sense of what a target looks like
    '''
    chars_per_batch = n_seqs * n_steps
    n_batches = len(array) // chars_per_batch
    # Keep only the full batches and drop the leftover characters.
    array = array[:chars_per_batch * n_batches]
    array = array.reshape((n_seqs, -1))

    # Targets are the inputs shifted left by one character. Without an axis,
    # np.roll shifts the flattened array, so the last target of each row is
    # the first character of the next row (the very last one wraps around).
    target = np.roll(array, shift=-1)

    batch_count = 0
    while True:
        start = batch_count * n_steps
        end = start + n_steps
        yield array[:, start:end], target[:, start:end]  # yield new batch
        # Cycle back to the first batch after the last one.
        batch_count = (batch_count + 1) % n_batches

Sanity check: print the first 10 characters of the first 10 sequences in a batch, and confirm that y is one step ahead of x.

In [45]:
batch = get_batches(text_as_int, 100, 100)
x, y = next(batch)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[18 47 56 57 58  1 15 47 58 47]
 [53  6  1 15 39 47 59 57  1 25]
 [41 47 59 57 12  0  0 13 50 50]
 [43 56  1 53 44  1 51 63  1 57]
 [ 1 57 51 53 58 46 43 56  5 42]
 [52  1 41 39 50 50 43 42  1 57]
 [ 1 39 52 57 61 43 56  0 32 46]
 [61 52  1 61 47 58 46  1 46 47]
 [58 53  1 56 59 47 52 11  1 50]
 [ 0 35 39 57  1 52 53 58  1 39]]

y
 [[47 56 57 58  1 15 47 58 47 64]
 [ 6  1 15 39 47 59 57  1 25 39]
 [47 59 57 12  0  0 13 50 50 10]
 [56  1 53 44  1 51 63  1 57 53]
 [57 51 53 58 46 43 56  5 42  1]
 [ 1 41 39 50 50 43 42  1 57 53]
 [39 52 57 61 43 56  0 32 46 43]
 [52  1 61 47 58 46  1 46 47 51]
 [53  1 56 59 47 52 11  1 50 43]
 [35 39 57  1 52 53 58  1 39  1]]
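
The same check can be done programmatically. The sketch below uses a compact stand-in for the generator (named `toy_batches` here; it is not the notebook's function) on a toy array and asserts that every target equals the next input character.

```python
import numpy as np

def toy_batches(array, n_seqs, n_steps):
    """Compact stand-in for get_batches, for illustration only."""
    n_batches = len(array) // (n_seqs * n_steps)
    array = array[:n_seqs * n_steps * n_batches].reshape((n_seqs, -1))
    target = np.roll(array, shift=-1)  # rolls the flattened array by one
    for b in range(n_batches):
        cols = slice(b * n_steps, (b + 1) * n_steps)
        yield array[:, cols], target[:, cols]

data = np.arange(40)
x, y = next(toy_batches(data, n_seqs=2, n_steps=5))

# Within a row, y[:, t] should equal x[:, t + 1].
assert np.array_equal(y[:, :-1], x[:, 1:])
print(x)
print(y)
```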


## Part 3: Build Char-RNN model
In this section we build our Char-RNN model. It consists of an input layer, an rnn_cell layer, an output layer, a loss and an optimizer, which we build one by one.

The goal is to predict new text that follows a given prime word, so the training data must be split into inputs and targets. The figure below shows the structure of the Char-RNN network.

![structure](img/charrnn.jpg)
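
The CharRNN class itself lives in the ecbm4040 package and is not shown in this notebook. As a rough illustration of the layers listed above, the NumPy sketch below runs one forward pass of a plain (vanilla) RNN over a single sequence: one-hot input, recurrent hidden state, softmax output and cross-entropy loss. It is a simplified stand-in with random weights, a tanh cell, one-hot inputs and a made-up hidden size, all of which are assumptions; it is not the LSTM/TensorFlow implementation used here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 65, 16          # hidden size chosen small for illustration

# Randomly initialised parameters of a vanilla RNN (assumption: tanh cell, no biases).
W_xh = rng.normal(0, 0.01, (vocab_size, hidden_size))
W_hh = rng.normal(0, 0.01, (hidden_size, hidden_size))
W_hy = rng.normal(0, 0.01, (hidden_size, vocab_size))

def forward_loss(inputs, targets):
    """Average cross-entropy (in nats) of predicting targets from inputs."""
    h = np.zeros(hidden_size)
    loss = 0.0
    for x_t, y_t in zip(inputs, targets):
        x_onehot = np.zeros(vocab_size)
        x_onehot[x_t] = 1.0                         # input layer: one-hot encoding
        h = np.tanh(x_onehot @ W_xh + h @ W_hh)     # rnn_cell layer
        logits = h @ W_hy                           # output layer
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                        # softmax
        loss += -np.log(probs[y_t])                 # cross-entropy for this step
    return loss / len(inputs)

x = np.array([18, 47, 56, 57, 58])   # "First" in the encoding above
y = np.array([47, 56, 57, 58, 1])    # next characters: "irst "
print(forward_loss(x, y))
```

With untrained weights this small, the predicted distribution is close to uniform, so the loss should sit near ln(65). Training lowers it by adjusting the weights through the optimizer.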

Import the CharRNN module, which contains the classes for training and running the recurrent neural network.

In [46]:
from ecbm4040.CharRNN import *

### Training
With sampling set to False (the default), we can start training the network. Checkpoints are saved automatically in the checkpoints/ folder.

In [47]:
# These are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

In [49]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches=batches, max_count=6000, save_every_n=2000)

step: 200  loss: 2.2184  0.2252 sec/batch
step: 400  loss: 1.9007  0.2337 sec/batch
step: 600  loss: 1.7328  0.2348 sec/batch
step: 800  loss: 1.6833  0.2295 sec/batch
step: 1000  loss: 1.7086  0.2309 sec/batch
step: 1200  loss: 1.5561  0.2286 sec/batch
step: 1400  loss: 1.5248  0.2322 sec/batch
step: 1600  loss: 1.4901  0.2285 sec/batch
step: 1800  loss: 1.4342  0.2353 sec/batch
step: 2000  loss: 1.4613  0.2281 sec/batch
step: 2200  loss: 1.4382  0.2351 sec/batch
step: 2400  loss: 1.4122  0.2289 sec/batch
step: 2600  loss: 1.4267  0.2340 sec/batch
step: 2800  loss: 1.3927  0.2330 sec/batch
step: 3000  loss: 1.3689  0.2296 sec/batch
step: 3200  loss: 1.4092  0.2293 sec/batch
step: 3400  loss: 1.3497  0.2384 sec/batch
step: 3600  loss: 1.3855  0.2310 sec/batch
step: 3800  loss: 1.3625  0.2305 sec/batch
step: 4000  loss: 1.3653  0.2350 sec/batch
step: 4200  loss: 1.3310  0.2332 sec/batch
step: 4400  loss: 1.3287  0.2258 sec/batch
step: 4600  loss: 1.3292  0.2346 sec/batch
step: 4800  los
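
To put the loss values in context, the sketch below assumes the reported loss is the mean cross-entropy per character in nats. That is the usual convention for softmax cross-entropy, but it is not confirmed by the CharRNN code shown here. Under that assumption, a model that guesses uniformly over the 65 characters would score ln(65), and exp(loss) gives the per-character perplexity.

```python
import math

vocab_size = 65
uniform_loss = math.log(vocab_size)   # loss of a uniform guess, in nats

# Two values copied from the LSTM training log above.
for step, loss in [(200, 2.2184), (4600, 1.3292)]:
    perplexity = math.exp(loss)       # effective number of equally likely choices
    print('step {:5d}  loss {:.4f}  perplexity {:.2f}  (uniform loss {:.4f})'
          .format(step, loss, perplexity, uniform_loss))
```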

In [50]:
# look up checkpoints
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/i6000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i4000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i6000_l256.ckpt"

### Sampling
With sampling set to True, we can generate new characters one at a time. Loading the different saved checkpoints shows how the network learned gradually.
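
How model.sample picks each character is not shown in this notebook. One common approach in char-rnn implementations is to keep only the top-n most likely characters, renormalise, and draw one at random. The helper below, `pick_top_n`, is a hypothetical sketch of that idea and may differ from what ecbm4040.CharRNN actually does.

```python
import numpy as np

def pick_top_n(probs, vocab_size, top_n=5, rng=None):
    """Sample an index from the top_n most probable entries of probs (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.array(probs, dtype=np.float64).reshape(-1)  # copy, so the caller's data is untouched
    p[np.argsort(p)[:-top_n]] = 0.0    # zero out everything except the top_n entries
    p /= p.sum()                       # renormalise to a valid distribution
    return int(rng.choice(vocab_size, p=p))

# Toy distribution over a 6-character vocabulary.
probs = [0.02, 0.40, 0.03, 0.30, 0.05, 0.20]
rng = np.random.default_rng(0)
print([pick_top_n(probs, 6, top_n=3, rng=rng) for _ in range(10)])
```

Restricting the draw to the most likely characters trades some variety for fewer nonsense characters. Sampling from the full distribution or always taking the argmax are the two extremes of the same idea.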

In [55]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate, sampling=True)

# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

[24 27 30 16  1]
INFO:tensorflow:Restoring parameters from checkpoints/i6000_l256.ckpt
24
27
30
16
1
LORD LEDd RAY:
The mine, whose manners would not have the cleacus.
I have not shed her what, stroig old. War, womberise!

LIONES:
To-die, sir, and your good brother,
And we are welting anoury'd sorrow so,
And he the whict the souse of the markelsher and
Wilhink them as, and talk of heavens and well;
But I will sorch her the confursorting
For that some such a warsing breels fair his
And that the hable worss buckle, and should welcome,
Would they selon son hath and the stir age wish
To pale to hear to him their plantly, womer
Is that we warr'd with my made flesh in service
To honour the bright thoughty in the strength
As to cart thither shall be such tongue.

Secord Murderer:
I am any trive: both me again this heart;
I'll bear me as a time is here and that the
servantiss to the ciden there.

CAMILLO:
What then there is my hard as armine that, that
instrument have their that teld you well,

In [56]:
# choose a checkpoint other than the final one and see the results. It could be nasty, don't worry!
#############################################
#           TODO: YOUR CODE HERE            #
#############################################

samp = model.sample("checkpoints/i2000_l256.ckpt", 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

[24 27 30 16  1]
INFO:tensorflow:Restoring parameters from checkpoints/i2000_l256.ckpt
24
27
30
16
1
LORD YOLNE:
What's thought now think that wall, I more that he
shold thy prince shall be behied thither, that I wall shake
to
do the folior to your sighn there.

PAULINA:
This hand than that I stay of me, stringsh thou,
And mon a suck our senses of this blover,
To bud her ttander, and a potth, on holy
Would show myself that sermors, and sale,
I wI will to his son and so thou ant
Twuily the stack and thy person she say is
seem off will never mesp this polembent.

KING RICHARD II:
As shall speak in his face of his heart
With him and be a man as sullinger to made my, tears,
And stoul a body of the steel of more.

KING EDWARDI I:
Tenderher best he shall bound mishly sund that
To be to hit three talk titn as this sen
Where thou wast think the tinder to the pale.

LEONTES:
Tronoor, and then is hath seel to him.

Provost:

POTIXANES:
Thou shaltst not, but the sair.

Shepherd:
A strove son my,


### Change to another type of RNN cell
We have been using LSTM cells, as in the original work, but GRU cells are becoming more popular. Let's change the cell in the rnn_cell layer to a GRU cell and see how it performs. The number of training steps should be the same as above.

**Note: Change the names of your saved checkpoints, or they will overwrite the LSTM results that you have already saved.**
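
One way to keep the two runs apart is to include the cell type in the checkpoint path. The helper below is a hypothetical sketch: the file names follow the pattern i{step}_l{rnn_size}.ckpt seen in the checkpoint listing above, but where that path is built inside CharRNN.train is an assumption to verify in the source file.

```python
import os

def checkpoint_path(cell_type, step, rnn_size, root='checkpoints'):
    """Build a checkpoint path that includes the cell type (hypothetical helper)."""
    return os.path.join(root, cell_type.lower(),
                        'i{}_l{}.ckpt'.format(step, rnn_size))

print(checkpoint_path('LSTM', 6000, 256))
print(checkpoint_path('GRU', 6000, 256))
```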

In [57]:
# These are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

step: 200  loss: 1.9804  0.2098 sec/batch
step: 400  loss: 1.7614  0.2145 sec/batch
step: 600  loss: 1.6194  0.2038 sec/batch
step: 800  loss: 1.5851  0.2152 sec/batch
step: 1000  loss: 1.6259  0.2144 sec/batch
step: 1200  loss: 1.4798  0.2175 sec/batch
step: 1400  loss: 1.4599  0.2153 sec/batch
step: 1600  loss: 1.4197  0.2084 sec/batch
step: 1800  loss: 1.3944  0.2111 sec/batch
step: 2000  loss: 1.4177  0.2140 sec/batch
step: 2200  loss: 1.4151  0.2123 sec/batch
step: 2400  loss: 1.3710  0.2182 sec/batch
step: 2600  loss: 1.4120  0.2126 sec/batch
step: 2800  loss: 1.3581  0.2099 sec/batch
step: 3000  loss: 1.3346  0.2169 sec/batch
step: 3200  loss: 1.3833  0.2077 sec/batch
step: 3400  loss: 1.3252  0.2092 sec/batch
step: 3600  loss: 1.3643  0.2112 sec/batch
step: 3800  loss: 1.3473  0.2184 sec/batch
step: 4000  loss: 1.3351  0.2086 sec/batch
step: 4200  loss: 1.3222  0.2181 sec/batch
step: 4400  loss: 1.3226  0.2112 sec/batch
step: 4600  loss: 1.3323  0.2094 sec/batch
step: 4800  los

In [58]:
model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

[24 27 30 16  1]
INFO:tensorflow:Restoring parameters from checkpoints/i6000_l256.ckpt
24
27
30
16
1
LORD SEAATPAY:
The shape, that I must persomed; to be moded
A shape whilst yet on thee,, be so like you,
That thrusted a wild servant and his sighs.

BENVOLIO:
Then how now, and a maid to the fiends;
To thy heirs but a maidens that your highness
Is alwhours being an angel to his:
Have those all so sorrow with me that:
And how to think you would be so do it,
And their presomed, tell what, to mear thee hate,
To them be trauss in altermed and treasand,
And so betwixt thee, but against the deam,
I have allians to think and shall they breatted:
I'll be a foul-sound marriage of your bort;
That was this will I hope to save his bug?
Alack, and the power, teach my trumpets,
With troop theict that that the strike that third
That you go not to-morrow or a grace,
To blend my stiltune for the father
She will bring us at all man through them wondred.

HORTENSIO:
I will peace tale her to his; and I do

#### Questions
1. Compare the results of the two networks that you built and explain the reasons for the difference. (This is a qualitative comparison; it should be based on the specific models that you built.)
2. Discuss the difference between LSTM cells and GRU cells. What are the pros and cons of using GRU cells?

Answer:
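1. In the logs above, the GRU network reaches a lower loss early in training (1.9804 vs. 2.2184 at step 200, 1.4177 vs. 1.4613 at step 2000) and runs slightly faster per batch (about 0.21 s vs. 0.23 s). By step 4600 the two losses are nearly equal (1.3323 for GRU vs. 1.3292 for LSTM). A likely reason is that the GRU cell uses only two gates, reset and update, and has no separate memory cell, so it has fewer parameters to learn, converges faster early on, and is simpler to build and tweak than the LSTM.
2. An LSTM cell keeps a separate cell state controlled by three gates (input, forget and output), whereas a GRU cell merges the memory into the hidden state and uses only two gates. In theory, this makes GRUs less effective at using information from longer sequences. In return, they tend to train faster, can learn better from less training data, and are simpler to tune.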
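
As a rough illustration of the size difference, the sketch below counts the trainable parameters of one layer using the standard formulations: an LSTM has four weight blocks (three gates plus the candidate) and a GRU has three (two gates plus the candidate). The input size of 65 assumes one-hot inputs to the first layer, which is not confirmed by the code shown here. Some GRU implementations also carry an extra bias term, so exact counts can differ slightly.

```python
def rnn_layer_params(n_blocks, input_size, hidden_size):
    """Parameters in n_blocks blocks, each with input weights, recurrent weights and a bias."""
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block

input_size, hidden_size = 65, 256     # vocabulary size (assumed one-hot input) and rnn_size
lstm_params = rnn_layer_params(4, input_size, hidden_size)
gru_params = rnn_layer_params(3, input_size, hidden_size)
print(lstm_params, gru_params, gru_params / lstm_params)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is consistent with the slightly shorter time per batch in the logs above.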