# Task 2: Char-RNN

Char-RNN implements multi-layer recurrent neural networks (vanilla RNN, LSTM, and GRU) for training and sampling from character-level language models. In other words, the model takes one text file as input and trains a recurrent neural network that learns to predict the next character in a sequence. The trained network can then generate text, character by character, that resembles the original training data. The network was first published by Andrej Karpathy; his original code, written in *Lua*, is available at https://github.com/karpathy/char-rnn.

Here we will implement Char-RNN using TensorFlow!

In [2]:
import time
import numpy as np
import tensorflow as tf

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Part 1: Setup
In this part, we will read the input text and preprocess it for network training. There are two txt files in the data folder; to keep computing time low, we will use tinyshakespeare.txt here.

In [3]:
with open('data/tinyshakespeare.txt', 'r') as f:
    text=f.read()
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# and let's get a glance of what the text is
print(text[:500])

Length of text: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [4]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters


In [5]:
# Creating a mapping from unique characters to indices
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
text_as_int = np.array([vocab_to_ind[c] for c in text], dtype=np.int32)

# The characters are mapped to indices from 0 to len(vocab) - 1
for char,_ in zip(vocab_to_ind, range(20)):
    print('{:6s} ---> {:4d}'.format(repr(char), vocab_to_ind[char]))
# Show how the first 10 characters from the text are mapped to integers
print ('{} --- characters mapped to int --- > {}'.format(text[:10], text_as_int[:10]))

'\n'   --->    0
' '    --->    1
'!'    --->    2
'$'    --->    3
'&'    --->    4
"'"    --->    5
','    --->    6
'-'    --->    7
'.'    --->    8
'3'    --->    9
':'    --->   10
';'    --->   11
'?'    --->   12
'A'    --->   13
'B'    --->   14
'C'    --->   15
'D'    --->   16
'E'    --->   17
'F'    --->   18
'G'    --->   19
First Citi --- characters mapped to int --- > [18 47 56 57 58  1 15 47 58 47]
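
As a quick sanity check on the two dictionaries, an encoding should decode back to the original string. The sketch below uses a tiny stand-in string instead of the Shakespeare file, and the names `sample_text`, `char_to_idx` and `idx_to_char` are illustrative only; they are not part of the assignment code.

```python
# Tiny stand-in corpus; the notebook itself uses tinyshakespeare.txt.
sample_text = "First Citizen"

# Build the two lookup tables the same way as in the cell above.
sample_vocab = sorted(set(sample_text))
char_to_idx = {c: i for i, c in enumerate(sample_vocab)}
idx_to_char = dict(enumerate(sample_vocab))

# Encode to integers, then decode back to characters.
encoded = [char_to_idx[c] for c in sample_text]
decoded = "".join(idx_to_char[i] for i in encoded)

print(encoded)
print(decoded == sample_text)
```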


## Part 2: Creating batches
Now that we have preprocessed the input data, we need to partition it. We will train the model on mini-batches, so how should we define a batch?

Let's first clarify two concepts:
1. **batch_size**: Recall batches in a CNN: if we have 100 samples and set batch_size to 10, we send 10 samples to the network at a time. In an RNN, batch_size has the same meaning: it defines how many samples we send to the network at a time.
2. **sequence_length**: An RNN carries memory in its cells and passes information from one step to the next, so it also has a sequence_length, also called 'steps', which defines how long each sequence is.

Putting the two concepts together: let N be the number of sequences in a batch and M the length of each sequence. Then batch_size in an RNN **still** represents the number of sequences in a batch, but the data in one batch is actually an array of shape **[N, M]**.
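
To make the [N, M] shape concrete, here is a minimal sketch on a toy array. The array `data` and the sizes N = 2, M = 5 are made up for illustration; the reshaping follows the same idea as the setup code in get_batches() but is not meant as the reference solution.

```python
import numpy as np

data = np.arange(50)   # stand-in for text_as_int
N, M = 2, 5            # sequences per batch, steps per sequence

chars_per_batch = N * M                    # characters consumed by one batch
n_batches = len(data) // chars_per_batch   # number of full batches

# Keep only full batches, then give each of the N sequences its own row.
rows = data[:n_batches * chars_per_batch].reshape((N, -1))

first_x = rows[:, :M]        # inputs of the first batch, shape (N, M)
first_y = rows[:, 1:M + 1]   # targets: the same window shifted by one position

print(rows.shape, first_x.shape)
print(first_x)
print(first_y)
```

Each row of `rows` is one long contiguous stretch of the data, so consecutive batches continue each row where the previous batch stopped.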

<span style="color:red">TODO:</span>
Finish the get_batches() function below to generate mini-batches.

Hint: this function defines a generator, use *yield*.

In [6]:
def get_batches(array, n_seqs, n_steps):
    '''
    Partition data array into mini-batches
    input:
    array: input data
    n_seqs: number of sequences in a batch
    n_steps: length of each sequence
    output:
    x: inputs
    y: targets, i.e. x shifted by one position
       you can check the following figure to get a sense of what a target looks like
    '''
    # Note: batch_size here is the number of characters in one batch (N * M),
    # not the number of sequences.
    batch_size = n_seqs * n_steps
    n_batches = len(array) // batch_size
    # Keep only the full batches and drop the leftover characters.
    array = array[:batch_size * n_batches]
    array = array.reshape((n_seqs, -1))
    
    # You should now create a loop to generate batches for inputs and targets
    #############################################
    #           TODO: YOUR CODE HERE            #
    #############################################
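    # Yield windows of n_steps columns. The final window is skipped because its
    # targets would need one character beyond the end of the array. After the
    # last usable window the generator wraps around, so it never terminates
    # (and never yields if there are fewer than two full batches).
    while True:
        for count in range(n_batches - 1):
            index = count * n_steps
            index_next = index + n_steps
            yield array[:, index:index_next], array[:, index + 1:index_next + 1]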

In [7]:
batches = get_batches(text_as_int, 10, 10)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[18 47 56 57 58  1 15 47 58 47]
 [ 1 43 52 43 51 63 11  0 37 43]
 [52 58 43 42  1 60 47 56 58 59]
 [56 44 53 50 49  6  0 27 52  1]
 [47 52  1 57 54 47 58 43  1 53]
 [56 57  6  1 39 52 42  1 57 58]
 [46 47 51  1 42 53 61 52  1 58]
 [ 1 40 43 43 52  1 57 47 52 41]
 [50 58 57  1 51 39 63  1 57 46]
 [57 47 53 52  1 53 44  1 56 43]]

y
 [[47 56 57 58  1 15 47 58 47 64]
 [43 52 43 51 63 11  0 37 43 58]
 [58 43 42  1 60 47 56 58 59 43]
 [44 53 50 49  6  0 27 52  1 54]
 [52  1 57 54 47 58 43  1 53 44]
 [57  6  1 39 52 42  1 57 58 39]
 [47 51  1 42 53 61 52  1 58 53]
 [40 43 43 52  1 57 47 52 41 43]
 [58 57  1 51 39 63  1 57 46 39]
 [47 53 52  1 53 44  1 56 43 60]]


## Part 3: Build Char-RNN model
In this section, we will build our Char-RNN model. It consists of an input layer, an rnn_cell layer, an output layer, a loss and an optimizer; we will build them one by one.

The goal is to predict new text following a given prime word, so our training data must define both inputs and targets. The figure below shows the structure of the Char-RNN network.

![structure](img/charrnn.jpg)
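
To illustrate how the input layer, recurrent cell, output layer and loss fit together, here is a rough NumPy sketch of a forward pass through a single vanilla RNN layer. It is only an illustration under several assumptions: the inputs are one-hot encoded, the cell is a plain tanh RNN rather than an LSTM or GRU, the sizes are toy values, and all parameter names (`W_xh`, `W_hh`, `W_hy`, ...) are hypothetical. It is not the code expected in ecbm4040.CharRNN, and the optimizer step is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 8   # toy sizes, not the notebook's settings

# Randomly initialised parameters of one vanilla RNN layer (hypothetical names).
W_xh = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
W_hy = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
b_y = np.zeros(vocab_size)

def step(char_idx, h_prev):
    """Run one time step and return (next-character probabilities, new state)."""
    x = np.zeros(vocab_size)
    x[char_idx] = 1.0                              # input layer: one-hot encoding
    h = np.tanh(x @ W_xh + h_prev @ W_hh + b_h)    # recurrent cell
    logits = h @ W_hy + b_y                        # output layer
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    return exp / exp.sum(), h

inputs = [0, 1, 2, 3]    # character indices fed to the network
targets = [1, 2, 3, 4]   # the same sequence shifted by one position

h = np.zeros(hidden_size)
loss = 0.0
for x_idx, y_idx in zip(inputs, targets):
    probs, h = step(x_idx, h)
    loss += -np.log(probs[y_idx])   # cross-entropy for this time step
loss /= len(inputs)                 # average over the sequence

print(loss)
```

With untrained weights the predicted distribution is close to uniform, so the average loss should be near log(vocab_size).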

<span style="color:red">TODO:</span>
Finish all TODOs in ecbm4040.CharRNN and fill in the blanks in the following cells.

**Note: Training with the following parameter settings takes about 20 minutes on a GTX 1070 GPU, so we suggest using GCP for this task.**

In [8]:
from ecbm4040.CharRNN import *

### Training
With sampling set to False (the default), we can start training the network. Checkpoints are saved automatically in the checkpoints folder.

In [9]:
# These are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

In [11]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

step: 200  loss: 2.1588  0.1017 sec/batch
step: 400  loss: 1.7505  0.0918 sec/batch
step: 600  loss: 1.6490  0.0928 sec/batch
step: 800  loss: 1.5370  0.0947 sec/batch
step: 1000  loss: 1.4581  0.0947 sec/batch
step: 1200  loss: 1.4119  0.0967 sec/batch
step: 1400  loss: 1.3556  0.0957 sec/batch
step: 1600  loss: 1.3140  0.0947 sec/batch
step: 1800  loss: 1.3287  0.0948 sec/batch
step: 2000  loss: 1.3004  0.0957 sec/batch
step: 2200  loss: 1.3125  0.0947 sec/batch
step: 2400  loss: 1.2702  0.0928 sec/batch
step: 2600  loss: 1.2601  0.0957 sec/batch
step: 2800  loss: 1.2962  0.0947 sec/batch
step: 3000  loss: 1.2568  0.0928 sec/batch
step: 3200  loss: 1.2473  0.0987 sec/batch
step: 3400  loss: 1.2212  0.0958 sec/batch
step: 3600  loss: 1.2015  0.0937 sec/batch
step: 3800  loss: 1.1923  0.0957 sec/batch
step: 4000  loss: 1.1978  0.0967 sec/batch
step: 4200  loss: 1.1839  0.0947 sec/batch
step: 4400  loss: 1.2163  0.0967 sec/batch
step: 4600  loss: 1.1727  0.0937 sec/batch
step: 4800  los

In [12]:
# look up checkpoints
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints\\i6000_l256_LSTM.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2000_l256_LSTM.ckpt"
all_model_checkpoint_paths: "checkpoints\\i4000_l256_LSTM.ckpt"
all_model_checkpoint_paths: "checkpoints\\i6000_l256_LSTM.ckpt"

### Sampling
Set sampling to True and we can generate new characters one by one. We can use the saved checkpoints to see how the network learned gradually.
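
Generating "one by one" means repeatedly turning the network's output distribution into a single character index. One common way to do that, sketched below, is to keep only the few most likely characters and draw from them at random. The helper `pick_top_n` and its `top_n` parameter are hypothetical; the notebook does not show how model.sample() chooses characters, so treat this as one possible approach rather than the assignment's implementation.

```python
import numpy as np

def pick_top_n(probs, top_n=5, rng=None):
    """Sample an index from the top_n most likely entries of probs (top_n >= 1)."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.array(probs, dtype=np.float64)   # work on a copy
    drop = np.argsort(p)[:-top_n]           # indices of everything outside the top_n
    p[drop] = 0.0
    p /= p.sum()                            # renormalise to a valid distribution
    return int(rng.choice(len(p), p=p))

# Toy distribution over a five-character vocabulary.
probs = [0.05, 0.6, 0.05, 0.25, 0.05]
rng = np.random.default_rng(0)
draws = [pick_top_n(probs, top_n=2, rng=rng) for _ in range(20)]
print(draws)
```

Restricting the draw to the top few candidates trades diversity for fewer implausible characters; always taking the single most likely character tends to produce repetitive text.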

In [28]:
model = CharRNN(len(vocab), batch_size, num_steps,'LSTM', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i6000_l256_LSTM.ckpt
vQQfQ-Q
:f:
:fvfQ

--
?:ffhf:

Q::-Q:Q:
:--::fQQ-:QffQ
-?:Q--

?-Q
Qf:QQQ
?Q
-QfQfQ:f:
:Q-:fffhh:fv:-::
::QfQ:-
-??fQQ
-?-


:f:QQ-:fQQfQfhfh:Q:fhh:::-QQ:
---:-?--:QfQQ--::Q-::
--
:
Q-?--?fhfvQfvfQ:QQ
:
?:Q:fv-:fh
Q:QQQ:-:fhQ:
--
??--

:-Q:Q-Q::fv-
-:fv
Q:Q
-Q
---?QQQ:-???:Q-QQfQ:---:-
:
QQ:fv:fhhh::



???fh
::-?-QQf:Q
:-:Q-?:-:QfhQ-::
Q--

Q
Q
-:
Q
:--:fhff:Q--Q::
?-QQ
?::fhQQ
:ffQ:QfhhhfQfQfv:-
-Q:
Q:
-::-?f:
?:::fQ
:Q:fv
?fQQ:QQ
?:
--Q
Q-?-??fQQ:

Q:
?ffhf::-Q
:
-?Q:fhf:fv
-Q-
:Q:
?fQQ-::

:fhffhQ-Qfh:QQQQfQff:ffh
-:fh
:
:
Q

-QQ:ffhQ:
:Qf:-Q:Q
:-Q

:f::Qff:Qff:fvQ
QQ:f:
-:QQQ-:f:

-Q-??Q:QQ-Qffv::::-::fhfhh:fQ:
::-:::Qfh
Q::
::::
?-

:QQ-Q
-::
QQQffffh
-?f:Q-


?:-
Q


:QQQf:-Q
:Q--

---
-

QQ:f:::fhf:f:

Q:

-QQQQ:QfQQ-Q:Qfh

--
-QQ-Q
::::f:f:Q:-?fQ-:--::fv--Q
:ff:-?fvQ-?-:Q---?fh:Qffh
---:f:ffh:-:-
Q:::fv::
---
::

Q
??f:Q
??:-:Q:
-
Q-:Q
:QQQfhfQfQ-QQ
:
Q
?-::::fh:--QQ:
-QQQ-?ffh

--:
QfhhQ
QQ

Q:fvQfQfQQ

In [27]:
# choose a checkpoint other than the final one and see the results. It could be nasty, don't worry!
#############################################
#           TODO: YOUR CODE HERE            #
#############################################
# Raw string so the backslash in the Windows path is not read as an escape sequence
checkpoint = r'checkpoints\i4000_l256_LSTM.ckpt'
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i4000_l256_LSTM.ckpt
Hnr
nkIV
kV

ddrkVdr
rkkIaUdr
rdknndnndrkkInIaddnrdddrddUk

r
nr
ddUVkIknkVdUndnkVnIk
ndnkV
nIIIVdUZnrddrk

r
knnnkIa
rrk
rdnndkIndrdkk
k
r
drk
nrdndrnIV
ndrnkIVkkk
rkkVkVdndndrnIV
rkknkVdddnnIIadnnnnndUV


k
rdUnkndrrnndrnknnrkVkVknrnk

k
rnnrnIkV
nkIVdUnIIIakkIkIV'UkIV'kIanrrnrndnnr

drrdkIVdrnr
nIV'ndknnkkVknr
dnkndnnIknkndUkVdUdnkV
knnIVkknndnIkVnrk

drndr
nk


nnIIannnknddUZUknrdUndk
r
dnrdrrndr
rrkVnnrnIndUkkV
kkk

ndkV'VnnnInr
nnrnkVkVdr
dndUndnrkkkkIkIIankkVnnrrnddddk

rndrrdndrdndnIanrddndUV
nrnnnknkndk



nIIVnIkIknIk

nr
kIVdUZnknkV
rrkIIkIIa

r
dkkIIak
dkVnkknIkIk
drdUdndkIa
rdrrnr
nIaUZUVnrnnkV
kIadrkIankInkkVkndrnkV'nnrndrrrdnndUdrnnnInknnIV
rnkndkIak
r
k
rkVddddnr
kVddddnIVnkV
nIk


dndrkIa
nnkVk
k
nknk
dUdUdkVdkIV'kInk
rrrnInrnrkk
nInk
k


nddUZkIadUkndnrnIV'dUk
kIaUZIVnr
kVknknddnIkkVnnIVdr
dUdr
nIa
ddUkV
r
rk
dr
nkVndkkInrddUnnrdnnkkV'dnnndnnrrknIadr
nInkIkV
k
kkVnIVk
rknIInIndkIIIankV'UdrrrnIn

### Switch to another type of RNN cell
We have been using LSTM cells, as in the original work, but GRU cells are becoming more popular. Let's change the cell in the rnn_cell layer to a GRU cell and see how it performs. Use the same number of steps as above.

**Note: You need to change the names of your saved checkpoints, or they will overwrite the LSTM results that you have already saved.**
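
One concrete difference between the two cell types is their size. In the standard formulations, an LSTM cell has four weight blocks (three gates plus the candidate) and a GRU cell has three (two gates plus the candidate). The sketch below counts parameters for the first layer only, assuming a one-hot input of size 65 and one bias vector per block; the helper `gated_cell_params` is hypothetical, and a particular TensorFlow implementation may differ slightly (for example through extra bias terms).

```python
def gated_cell_params(input_size, hidden_size, n_blocks):
    """Parameters of a gated cell with n_blocks weight blocks.

    Each block has a weight matrix over [input, hidden] and one bias vector.
    """
    return n_blocks * (hidden_size * (input_size + hidden_size) + hidden_size)

input_size, hidden_size = 65, 256   # vocabulary size and rnn_size from this notebook

lstm_params = gated_cell_params(input_size, hidden_size, n_blocks=4)
gru_params = gated_cell_params(input_size, hidden_size, n_blocks=3)

print(lstm_params, gru_params, gru_params / lstm_params)
```

Under these assumptions the GRU layer has three quarters of the LSTM layer's parameters at the same hidden size.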

In [28]:
# These are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

step: 200  loss: 1.8857  0.0947 sec/batch
step: 400  loss: 1.5841  0.0947 sec/batch
step: 600  loss: 1.5242  0.0957 sec/batch
step: 800  loss: 1.4287  0.0947 sec/batch
step: 1000  loss: 1.3806  0.0957 sec/batch
step: 1200  loss: 1.3095  0.0947 sec/batch
step: 1400  loss: 1.2732  0.0957 sec/batch
step: 1600  loss: 1.2519  0.0947 sec/batch
step: 1800  loss: 1.2442  0.0977 sec/batch
step: 2000  loss: 1.2283  0.0928 sec/batch
step: 2200  loss: 1.2558  0.0937 sec/batch
step: 2400  loss: 1.2050  0.0967 sec/batch
step: 2600  loss: 1.1946  0.0977 sec/batch
step: 2800  loss: 1.2570  0.0967 sec/batch
step: 3000  loss: 1.2130  0.0957 sec/batch
step: 3200  loss: 1.2075  0.0947 sec/batch
step: 3400  loss: 1.1821  0.0937 sec/batch
step: 3600  loss: 1.1744  0.0957 sec/batch
step: 3800  loss: 1.1452  0.0957 sec/batch
step: 4000  loss: 1.1615  0.0967 sec/batch
step: 4200  loss: 1.1536  0.0967 sec/batch
step: 4400  loss: 1.1618  0.0937 sec/batch
step: 4600  loss: 1.1497  0.0987 sec/batch
step: 4800  los

In [33]:
model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i6000_l256_GRU.ckpt
3jmZjym
3Jc3Jjyy
mZjxJmZymJcJcJyJjyjmyJJmZy
y
ym

JJJcJjyyjJJmJJy
yJjJjjyy
my
Jy
yJcm
yJcJmmyJc
yy

3JJmJJcy
JcJcJjJmmymJjyyJmmZymymJJc3mJmmyJJjmZyy
yjyjmZmmZ
yyJm
3ymJJmmmm
3
m
3

myjJcyjyJyyyjyjx
m
ymy
3y
yJjxmyJJJmyyyJyyjjjm
3m
Jyyjm
JmZyjJJmJy
3ymZyyy
y
yjy

3mZmy
m
yyJyy
JmyjxjjJyyyjxVyJy
y
mZymy
3Jc3
3
3jy
JJyjxmy
3Jjm
yJmZ
JjxV
mJJjjy

myJJJyJJJmZmyJcmy
y
yJJcJJjmmyyJyjm
my
mJmJmmym
mJymJjjJJc3Jc
ymyyy
Jyjjmyjxjjyyjy

Jyjjjy

Jc3
3yjyjJmJmmJjm
mJjxm
Jjm
mJcJJcyjmymyJm

Jjy

JcymmZmmyjjmyjJm
mmZyJyJm

mm
yjmJmmmJmZmJc
3Jc
JJmZJcymZjmZmmyymmJym
yJyJmJjJjjjJjyymJcmZymZyjJjjxmJjxjm
3mmZmJc

3mmm
JyymJc

JJmyyJy

JyyJJjjmJjxVJyJyJcJc3jxVyyyJJyjym
yjyjyyjx
myJcmm
JjjmmJcy
3yjJcmyjjm

JjymyjxJcmm
3jyjxjxJjjmm
3JJmZ
mm
yjy
mZ
JmJymZjy
JcymZyyJmZyJJmyJcJc3JcmyyJjJmy
y
JmZmZJcyJy
Jc
mJyjmyJmyyyy
myyJcJyJyyJc3mJjjxjJm
3
mJy
Jc
yymJc3m
ymZm

ym
mZ

y
3
ym
mZmZymJjxmZ
3yJjy
3
yyymyyjmJjm
mm

3jJmZjmmJm

JJyJjJjymJJjjxV

#### Questions
1. Compare the results of the two networks you built and explain what caused the differences. (This is a qualitative comparison; it should be based on the specific models you built.)
2. Discuss the differences between LSTM cells and GRU cells. What are the pros and cons of using GRU cells?

Answer:
**Fill in here.**