# Char-RNN

Char-RNN implements multi-layer recurrent neural networks (vanilla RNN, LSTM, and GRU) for training and sampling from character-level language models. In other words, the model takes a text file as input and trains a recurrent neural network to predict the next character in a sequence. The trained network can then generate text, character by character, that resembles the original training data. The network was first published by Andrej Karpathy; his original code, written in *Lua*, is available at https://github.com/karpathy/char-rnn.

Here we will implement Char-RNN using TensorFlow!

In [1]:
import time
import numpy as np
import tensorflow as tf

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Part 1: Setup
In this part, we read the input text and process it for network training. There are two txt files in the data folder; to keep computing time manageable, we use tinyshakespeare.txt here.

In [2]:
with open('data/tinyshakespeare.txt', 'r') as f:
    text=f.read()
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# and let's get a glance of what the text is
print(text[:500])

Length of text: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [3]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters


In [4]:
# Creating a mapping from unique characters to indices
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
text_as_int = np.array([vocab_to_ind[c] for c in text], dtype=np.int32)

# Characters are mapped to indices from 0 to len(vocab) - 1; show the first 20
for char in vocab[:20]:
    print('{:6s} ---> {:4d}'.format(repr(char), vocab_to_ind[char]))
# Show how the first 10 characters from the text are mapped to integers
print('{} --- characters mapped to int --- > {}'.format(text[:10], text_as_int[:10]))

'\n'   --->    0
' '    --->    1
'!'    --->    2
'$'    --->    3
'&'    --->    4
"'"    --->    5
','    --->    6
'-'    --->    7
'.'    --->    8
'3'    --->    9
':'    --->   10
';'    --->   11
'?'    --->   12
'A'    --->   13
'B'    --->   14
'C'    --->   15
'D'    --->   16
'E'    --->   17
'F'    --->   18
'G'    --->   19
First Citi --- characters mapped to int --- > [18 47 56 57 58  1 15 47 58 47]
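
The two dictionaries are inverses of each other, which is what later lets sampled indices be turned back into text. A minimal sketch of the round trip, using a toy string and `toy_`-prefixed names (assumptions for illustration, chosen so the notebook's own variables are not overwritten):

```python
# Toy corpus standing in for tinyshakespeare.txt (illustrative assumption).
toy_text = "First Citizen:\nSpeak, speak."
toy_vocab = sorted(set(toy_text))
toy_vocab_to_ind = {c: i for i, c in enumerate(toy_vocab)}
toy_ind_to_vocab = dict(enumerate(toy_vocab))

def encode(s):
    """Map a string to a list of integer indices."""
    return [toy_vocab_to_ind[c] for c in s]

def decode(ids):
    """Map a list of integer indices back to a string."""
    return ''.join(toy_ind_to_vocab[i] for i in ids)

encoded = encode("speak")
print(encoded, '->', decode(encoded))
```

Note that encoding only works for characters that occur in the corpus; any other character raises a `KeyError`.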


## Part 2: Creating batches
Now that we have preprocessed the input data, we need to partition it. We will train the model on mini-batches, so how should we define a batch?

Let's first clarify the concepts involved:
1. **batch_size**: Recall batches in a CNN: if we have 100 samples and set batch_size to 10, we send 10 samples to the network at a time. In an RNN, batch_size has the same meaning: it defines how many samples we send to the network at a time.
2. **sequence_length**: An RNN stores memory in its cells and passes information from one time step to the next, so each sample is itself a sequence. The sequence_length, also called 'steps', defines how long each sequence is.

Putting the two concepts together: let N be the number of sequences in a batch and M the length of each sequence. batch_size in an RNN **still** represents the number of sequences in a batch, N, but the data in one batch is an array of size **[N, M]**.
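
To make the **[N, M]** shape concrete, here is a small sketch with toy numbers. The values of `N`, `M`, `starts`, and the `data` array are arbitrary assumptions for illustration, not the settings used for training later.

```python
import numpy as np

N, M = 2, 5                              # 2 sequences per batch, 5 steps each (toy values)
data = np.arange(20, dtype=np.int32)     # stand-in for the integer-encoded text

starts = [0, 10]                         # arbitrary start position of each sequence
x = np.stack([data[s:s + M] for s in starts])          # inputs, shape [N, M]
y = np.stack([data[s + 1:s + M + 1] for s in starts])  # targets: inputs shifted by one

print(x.shape, y.shape)
print(x)
print(y)
```

Each row of `y` holds, at every position, the character that follows the corresponding position in `x`, so `y[:, :-1]` equals `x[:, 1:]`.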

<span style="color:red">TODO:</span>
finish the get_batches() function below to generate mini-batches.

Hint: this function defines a generator, use *yield*.

In [5]:
import random
def get_batches(array, n_seqs, n_steps):
    '''
    Partition data array into mini-batches
    input:
    array: input data
    n_seqs: number of sequences in a batch
    n_steps: length of each sequence
    output:
    x: inputs
    y: targets, which is x with one position shift
       you can check the following figure to get a sense of what a target looks like
    '''
    # You should now create a loop to generate batches for inputs and targets
    #############################################
    #           TODO: YOUR CODE HERE            #
    #############################################
    # The last valid start leaves room for n_steps inputs plus one extra target character
    max_start = len(array) - n_steps - 1
    while True:
        X_batch = []
        y_batch = []
        for _ in range(n_seqs):
            # Sample each sequence from a random position in the text
            start = random.randint(0, max_start)
            end = start + n_steps
            X_batch.append(array[start:end])
            # Targets are the inputs shifted one character to the right
            y_batch.append(array[start + 1:end + 1])
        yield np.asarray(X_batch), np.asarray(y_batch)

In [6]:
batches = get_batches(text_as_int, 10, 10)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[44  1 46 47 57  1 54 56 47 52]
 [30 37  1 34 21 10  0  0 37 27]
 [53 58 46 43 56  1 52 53 58  1]
 [47 52  1 42 56 43 45 57  1 53]
 [61  8  1 35 43  1 46 39 60 43]
 [47 52 45  1 46 43 56  1 50 43]
 [51 39 52  5 57  0 39 54 54 39]
 [47 50 50  1 63 53 59  1 45 47]
 [ 1 44 39 58 46 43 56 12  0  0]
 [46 53 57 43  1 43 63 43 57  1]]

y
 [[ 1 46 47 57  1 54 56 47 52 41]
 [37  1 34 21 10  0  0 37 27 30]
 [58 46 43 56  1 52 53 58  1 39]
 [52  1 42 56 43 45 57  1 53 44]
 [ 8  1 35 43  1 46 39 60 43  1]
 [52 45  1 46 43 56  1 50 43 45]
 [39 52  5 57  0 39 54 54 39 56]
 [50 50  1 63 53 59  1 45 47 60]
 [44 39 58 46 43 56 12  0  0 22]
 [53 57 43  1 43 63 43 57  1 57]]
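
The generator above samples every sequence from a random position. A common alternative is to walk through the text in order, so that each batch continues where the previous one stopped. The sketch below illustrates that variant; `get_sequential_batches` is a hypothetical helper, not part of the assignment code, and unlike the generator above it stops after one pass over the data.

```python
import numpy as np

def get_sequential_batches(array, n_seqs, n_steps):
    """Yield (x, y) batches that walk through the data in order (hypothetical helper)."""
    chars_per_batch = n_seqs * n_steps
    # Reserve one character at the end so the last target exists.
    n_batches = (len(array) - 1) // chars_per_batch
    usable = n_batches * chars_per_batch
    # Each of the n_seqs rows is one long contiguous slice of the text.
    inputs = array[:usable].reshape(n_seqs, -1)
    targets = array[1:usable + 1].reshape(n_seqs, -1)
    # Slide a window of n_steps columns across the rows.
    for start in range(0, inputs.shape[1], n_steps):
        yield inputs[:, start:start + n_steps], targets[:, start:start + n_steps]

toy = np.arange(41)
for bx, by in get_sequential_batches(toy, n_seqs=2, n_steps=5):
    print(bx[0], '->', by[0])
```

Random sampling gives more varied batches, while the sequential layout makes it possible to carry the RNN state from one batch to the next.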


## Part 3: Build Char-RNN model
In this section, we build our Char-RNN model. It consists of an input layer, an rnn_cell layer, an output layer, a loss, and an optimizer; we will build them one by one.

The goal is to predict new text following a given prime string, so we have to define inputs and targets for our training data. The figure below shows the structure of the Char-RNN network.

![structure](img/charrnn.jpg)
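
As a rough illustration of how the pieces fit together, the following NumPy sketch runs a forward pass through a single-layer vanilla RNN and computes the cross-entropy loss. It is not the ecbm4040.CharRNN implementation (which uses TensorFlow and LSTM/GRU cells); the weight names, the hidden size of 8, and the random initialisation are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chars, hidden_size = 65, 8   # 65 matches the vocabulary size above; 8 is an arbitrary toy size

# Randomly initialised parameters (illustrative only, no training happens here).
W_xh = rng.normal(scale=0.1, size=(n_chars, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(hidden_size, n_chars))

def forward_loss(inputs, targets):
    """Mean cross-entropy of the next-character predictions for one sequence."""
    h = np.zeros(hidden_size)
    losses = []
    for x_t, y_t in zip(inputs, targets):
        x_onehot = np.eye(n_chars)[x_t]             # input layer: one-hot encoding
        h = np.tanh(x_onehot @ W_xh + h @ W_hh)     # recurrent cell: update the hidden state
        logits = h @ W_hy                           # output layer: one score per character
        logits = logits - logits.max()              # shift for numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()   # softmax
        losses.append(-np.log(probs[y_t]))          # loss: negative log-likelihood of the target
    return float(np.mean(losses))

x_seq = np.array([18, 47, 56, 57, 58])   # "First", using the mapping printed above
y_seq = np.array([47, 56, 57, 58, 1])    # "irst " (the same text shifted by one)
print(forward_loss(x_seq, y_seq))
```

With small random weights the predictions are nearly uniform, so the loss starts close to ln(65) ≈ 4.17. This gives a useful reference point when reading the training logs below.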

<span style="color:red">TODO:</span>
finish all TODOs in ecbm4040.CharRNN and the blanks in the following cells.

**Note: Training with the following parameter settings takes about 20 minutes on a GTX 1070 GPU, so we suggest using GCP for this task.**

In [6]:
from ecbm4040.CharRNN import *

### Training
With sampling set to False (the default), we can start training the network. Checkpoints are saved automatically in the checkpoints folder.

In [8]:
# these are preset parameters, you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.006    # Learning rate

In [9]:
model = CharRNN(num_classes=len(vocab), batch_size=batch_size, num_steps=num_steps, cell_type='GRU',
                 rnn_size=rnn_size, num_layers=num_layers, learning_rate=learning_rate, 
                 grad_clip=5, train_keep_prob=0.75, sampling=False)
text_as_int_cropped = text_as_int[:150000]
batches = get_batches(text_as_int_cropped, batch_size, num_steps)
model.train(batches, 6000, 1000)

step: 200  loss: 2.5713  0.2236 sec/batch
step: 400  loss: 1.9750  0.2216 sec/batch
step: 600  loss: 1.7331  0.2266 sec/batch
step: 800  loss: 1.5270  0.2256 sec/batch
step: 1000  loss: 1.4050  0.2292 sec/batch
step: 1200  loss: 1.3278  0.2325 sec/batch
step: 1400  loss: 1.2534  0.2322 sec/batch
step: 1600  loss: 1.2110  0.2274 sec/batch
step: 1800  loss: 1.1421  0.2441 sec/batch
step: 2000  loss: 1.1154  0.2292 sec/batch
step: 2200  loss: 1.0896  0.2256 sec/batch
step: 2400  loss: 1.0586  0.2201 sec/batch
step: 2600  loss: 0.9941  0.2407 sec/batch
step: 2800  loss: 1.0151  0.2326 sec/batch
step: 3000  loss: 0.9704  0.2287 sec/batch
step: 3200  loss: 0.9716  0.2292 sec/batch
step: 3400  loss: 0.9545  0.2346 sec/batch
step: 3600  loss: 0.9270  0.2291 sec/batch
step: 3800  loss: 0.8783  0.2303 sec/batch
step: 4000  loss: 0.8999  0.2296 sec/batch
step: 4200  loss: 0.8528  0.2321 sec/batch
step: 4400  loss: 0.8589  0.2316 sec/batch
step: 4600  loss: 0.8594  0.2326 sec/batch
step: 4800  los

In [10]:
# look up checkpoints
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints\\i6000_GRU_l256.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2000_GRU_l256.ckpt"
all_model_checkpoint_paths: "checkpoints\\i3000_GRU_l256.ckpt"
all_model_checkpoint_paths: "checkpoints\\i4000_GRU_l256.ckpt"
all_model_checkpoint_paths: "checkpoints\\i5000_GRU_l256.ckpt"
all_model_checkpoint_paths: "checkpoints\\i6000_GRU_l256.ckpt"
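
The listing above suggests that checkpoint files are named after the training step, the cell type, and a size of 256. A small helper can build such paths in a platform-independent way. The name `checkpoint_path`, and the reading of `l256` as the hidden-layer size, are assumptions inferred from the listing rather than taken from the ecbm4040 code.

```python
import os

def checkpoint_path(step, cell_type, rnn_size, folder='checkpoints'):
    """Build a checkpoint path following the naming pattern seen in the listing (assumed)."""
    filename = 'i{}_{}_l{}.ckpt'.format(step, cell_type, rnn_size)
    # os.path.join picks the right separator for the current operating system.
    return os.path.join(folder, filename)

print(checkpoint_path(2000, 'GRU', 256))
```

Building the path this way avoids hard-coding backslashes, which only work as separators on Windows.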

### Sampling
With sampling set to True, we can generate new characters one by one. We can load our saved checkpoints to see how the network improved gradually during training.
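
Generating one character at a time means repeatedly drawing the next index from the probability distribution the network outputs. One common way to do this is to keep only the few most likely characters and sample among them. The sketch below shows that idea; `pick_next_index` and its `top_n` parameter are hypothetical and may differ from what model.sample does internally.

```python
import numpy as np

def pick_next_index(probs, top_n=5, rng=None):
    """Sample the next character index from the top_n most likely candidates (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64).copy()
    # Zero out everything except the top_n largest probabilities.
    probs[np.argsort(probs)[:-top_n]] = 0.0
    # Renormalise so the remaining probabilities sum to one.
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

toy_probs = [0.1, 0.2, 0.3, 0.4]
print(pick_next_index(toy_probs, top_n=2, rng=np.random.default_rng(0)))
```

Restricting the draw to the most likely characters trades some variety for fewer implausible characters; a larger `top_n` makes the output more diverse.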

In [11]:
model = CharRNN(num_classes=len(vocab), batch_size=batch_size, num_steps=num_steps, cell_type='GRU',
                 rnn_size=rnn_size, num_layers=num_layers, learning_rate=learning_rate, 
                 grad_clip=5, train_keep_prob=0.5, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="Lord ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i6000_GRU_l256.ckpt
Lord disposed
And carms, alreadven first, we sto, 'em such a part--and
dissching but the mighlood, and with
most posuch, no.
I must do hast not that I am clount,
Which should part him to pack the good rouf the world
Will not tave him be abong. This call'd mine in's; woulds but but
And was a such a bastwn and thus which
In alble than a crack'd thus-- he, who, my note, if they
Upon of these warn.

SICINIUS:
Why, thou hast alngething of honour, and so show not meddle; sithe,
In banish'd for place, as he was enemy in, he was
to call'd to his charge: and his own part,
That flatter, and the people bid with good loves,
He wants not care what cannot speak.
Which has charge this ance?

SICINIUS:
Why, thou art leadcy,
Pray you, come of, and my soldiing his choldeven,
And mage have been best have their noble consul!

CORIOLANUS:
To, I could for weal on eight, ade: who shall lear your harm'd
In a mage of it. What's to my wit

In [12]:
# choose a checkpoint other than the final one and see the results. It could be nasty, don't worry!
#############################################
#           TODO: YOUR CODE HERE            #
#############################################
# raw string, because '\i' is not a valid escape sequence in a normal string literal
checkpoint = r'checkpoints\i2000_GRU_l256.ckpt'
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i2000_GRU_l256.ckpt
LORD QZKKKKZZKNKZUZZUZKKKZKKU&K&KKZZKNKKKKKKKKKZZKKKKKKKZKZKKKUKUKNKZKUKKU&K&K$KZKKUZKKZKZU&KKKKKKKZZKDKZKKKUKKKKKKZKUZZUKKUKKKDKKKNKKKKKKKKKZKKKUKKNZKKKKUKKUKZKZDKKZKKKKKKKKKKKKUZKUKKKKKKU3KKKKKZZZKKZKNZKKNKUKKKKZKKKKKKKKUZKUKKUKKUKKKKKKKKKUKU&ZUZKKZKKNKKKN&KKNKKKDUKUKKN&KKKKKKDZKUNKKKKKNKKUKKKKK&KKKZKZKZU&ZKUKKZDKKNKKUKKNKKNK&KKZKNKKKK&KKKKKKZN&KKZKKUZKKKZKKKKKKKKKKKNKNKKKKKNKKNKKKNKKKKKKKKKNKKKKKUKKDKKNKN&KKKKU3KKKKKKKKKKKKKKUKKNKKKDZKKNKKKNKNKU&XKZKKUK&KKUKKKNKZKKKNZKKKKKUKZKKKZ&KN&KKKKKKKKKNKKNKKZKKUKZKKKK&KUZKUKUKKKKZKKKKKKKKKK&ZKKKK&KKDKKKKKKKKKNZKKKUKKDKKKKZKKUZKNKKZKKKK&ZKUKKKNKKKKKZDKKKKKNKKKNKDKN3KK&KKKKKNZKKUKKKZKKUKZUKKUKZNKKNKKKUKKKKKKN&ZKKKK&ZKKDKKKKKKUKUKNKKKKKKKNZKKNKKKKKUKKPNKZNKZKKKKKKNKZNKKUKKKKKKKNZKKDKKKKKKKKU&KUKKKKKDZKKUKKKKKKZKKNZKKKKKKKKKKKKUKZKDDKKKKNKKNKKKDKKKDZKKK&KKNKZKUKZNKUKKKKZDKKKKKKKZKKDKKKZKKKKKKKKKKU&ZKKKDKDKDKZKKDKZKKKZKZKKZKZZZZZKKKKKKZKKUKKNK&KPZKKKKKKKUKKUKKZKKU$KKUKZKKDNK

### Switch to another type of RNN cell
The original work uses LSTM cells, but GRU cells have become popular as well. The runs above used a GRU cell, so let's change the cell in the rnn_cell layer to an LSTM cell and see how it performs. The number of steps should be the same as above.

**Note: You need to change the names of your saved checkpoints, or the new run will overwrite the GRU results that you have already saved.**

In [13]:
model = CharRNN(num_classes=len(vocab), batch_size=batch_size, num_steps=num_steps, cell_type='LSTM',
                 rnn_size=rnn_size, num_layers=num_layers, learning_rate=learning_rate, 
                 grad_clip=5, train_keep_prob=0.6, sampling=False)
text_as_int_cropped = text_as_int
batches = get_batches(text_as_int_cropped, batch_size, num_steps)
model.train(batches, 6000, 1000)

step: 200  loss: 2.5041  0.2176 sec/batch
step: 400  loss: 2.1127  0.2216 sec/batch
step: 600  loss: 1.9410  0.2186 sec/batch
step: 800  loss: 1.8380  0.2301 sec/batch
step: 1000  loss: 1.7607  0.2326 sec/batch
step: 1200  loss: 1.6997  0.2279 sec/batch
step: 1400  loss: 1.7003  0.2265 sec/batch
step: 1600  loss: 1.6225  0.2380 sec/batch
step: 1800  loss: 1.6203  0.2276 sec/batch
step: 2000  loss: 1.5745  0.2274 sec/batch
step: 2200  loss: 1.5443  0.2251 sec/batch
step: 2400  loss: 1.5936  0.2277 sec/batch
step: 2600  loss: 1.5504  0.2325 sec/batch
step: 2800  loss: 1.5406  0.2163 sec/batch
step: 3000  loss: 1.5324  0.2291 sec/batch
step: 3200  loss: 1.4848  0.2232 sec/batch
step: 3400  loss: 1.5186  0.2276 sec/batch
step: 3600  loss: 1.4813  0.2301 sec/batch
step: 3800  loss: 1.4859  0.2297 sec/batch
step: 4000  loss: 1.4651  0.2254 sec/batch
step: 4200  loss: 1.5150  0.2241 sec/batch
step: 4400  loss: 1.4410  0.2229 sec/batch
step: 4600  loss: 1.4781  0.2265 sec/batch
step: 4800  los

In [14]:
model = CharRNN(num_classes=len(vocab), batch_size=batch_size, num_steps=num_steps, cell_type='LSTM',
                 rnn_size=rnn_size, num_layers=num_layers, learning_rate=learning_rate, 
                 grad_clip=5, train_keep_prob=0.5, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="Lord ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i6000_LSTM_l256.ckpt
Lord a danger way,
And honourable starvers as you seem to,
I'll not be bridge, to the craw of this dead,
Worse of his brought.
I did not have to that when I will have a servant that I
have to be some on my son as they have been
Within the present sovere thousand mind,
To hear me to the sours;
The stoop of the woel strew the window and those that stays
Of a these head of a charm, the people
Of myself anothort with a man of all,
A prince, things but a master of you and brought the cursed thee
Which time as I would bear to me as thou,
And this which hath not hear'st the triand of thee,
That this world will be an each of the call.

CAPULET:
Ay, breathed me but stating where the calms of the peace that he had basily,
With standing of a cold of the creature with the commonier shall be breath of all.

CAPULET:
He shall be thyself too merting is a fearful arms
And by the wisdient sorrow and things stands.

GLOUCESTER:
A