# Task 2: Char-RNN

Char-RNN implements multi-layer recurrent neural networks (vanilla RNN, LSTM, and GRU) for training and sampling from character-level language models. In other words, the model takes a text file as input and trains a recurrent neural network to predict the next character in a sequence. The trained network can then generate text character by character that resembles the original training data. The approach was first published by Andrej Karpathy; his original code, written in *Lua*, is available at https://github.com/karpathy/char-rnn.

Here we will implement Char-RNN using TensorFlow!

In [1]:
import time
import numpy as np
import tensorflow as tf

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2



## Part 1: Setup
In this part, we read the input text and process it for network training. There are two txt files in the data folder; to keep computing time manageable, we use tinyshakespeare.txt here.

In [2]:
with open('data/tinyshakespeare.txt', 'r') as f:
    text = f.read()
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# and let's get a glance of what the text is
print(text[:500])

Length of text: 1115394 characters
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [3]:
# The unique characters in the file
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))

65 unique characters


In [4]:
# Creating a mapping from unique characters to indices
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
text_as_int = np.array([vocab_to_ind[c] for c in text], dtype=np.int32)

# Each character is mapped to an index from 0 to len(vocab) - 1
for char,_ in zip(vocab_to_ind, range(20)):
    print('{:6s} ---> {:4d}'.format(repr(char), vocab_to_ind[char]))
# Show how the first 10 characters from the text are mapped to integers
print('{} --- characters mapped to int --- > {}'.format(text[:10], text_as_int[:10]))

'\n'   --->    0
' '    --->    1
'!'    --->    2
'$'    --->    3
'&'    --->    4
"'"    --->    5
','    --->    6
'-'    --->    7
'.'    --->    8
'3'    --->    9
':'    --->   10
';'    --->   11
'?'    --->   12
'A'    --->   13
'B'    --->   14
'C'    --->   15
'D'    --->   16
'E'    --->   17
'F'    --->   18
'G'    --->   19
First Citi --- characters mapped to int --- > [18 47 56 57 58  1 15 47 58 47]
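
The two dictionaries are inverses of each other, so encoding a string and then decoding it should return the original string. Below is a minimal sketch of that round trip on a toy string; the `sample` string and the `decode` helper are illustrative and not part of the notebook.

```python
import numpy as np

# Toy stand-in for the corpus; the notebook uses the full tinyshakespeare text.
sample = "First Citizen"

vocab = sorted(set(sample))
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))

encoded = np.array([vocab_to_ind[c] for c in sample], dtype=np.int32)

def decode(indices):
    """Map a sequence of integer ids back to a string (hypothetical helper)."""
    return ''.join(ind_to_vocab[int(i)] for i in indices)

decoded = decode(encoded)
print(decoded == sample)
```

The same `ind_to_vocab` lookup is what turns sampled character ids back into readable text later on.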


## Part 2: Creating batches
Now that the input data is preprocessed, we need to partition it. We will train the model on mini-batches, so how should the batches be defined?

Let's first clarify two concepts:
1. **batch_size**: Recall batches in a CNN: if we have 100 samples and set batch_size to 10, we send 10 samples to the network at a time. In an RNN, batch_size has the same meaning: it defines how many samples we send to the network at a time.
2. **sequence_length**: An RNN stores memory in its cells and passes information from one time step to the next, so each sample is a sequence. sequence_length, also called 'steps', defines how long each sequence is.

Putting the two concepts together: let N be the number of sequences in a batch and M the length of each sequence. batch_size in an RNN **still** represents the number of sequences in a batch, but the data in one batch is an array of shape **[N, M]**.
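
To make the [N, M] shape concrete, here is a small sketch on a toy array. The values of `N` and `M` and the array `data` are arbitrary choices for illustration; only the reshape-and-slice pattern matters.

```python
import numpy as np

data = np.arange(50)   # toy "encoded text": the integers 0..49
N, M = 3, 4            # N sequences per batch, M steps per sequence

n_batches = len(data) // (N * M)        # 50 // 12 = 4 full batches
trimmed = data[:n_batches * N * M]      # keep 48 values, drop the last 2
rows = trimmed.reshape((N, -1))         # shape (3, 16): one long row per sequence

x = rows[:, :M]         # first input batch, shape (N, M)
y = rows[:, 1:M + 1]    # targets: the same window shifted one step ahead

print(rows.shape, x.shape, y.shape)
```

Each row of `rows` is a contiguous slice of the text, so the N sequences in a batch come from N different places in the corpus.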

<span style="color:red">TODO:</span>
Finish the get_batches() function below to generate mini-batches.

Hint: this function defines a generator, so use *yield*.

In [5]:
def get_batches(array, n_seqs, n_steps):
    '''
    Partition data array into mini-batches
    input:
    array: input data
    n_seqs: number of sequences in a batch
    n_steps: length of each sequence
    output:
    x: inputs
    y: targets, which are x shifted by one position;
       check the following figure to get a sense of what a target looks like
    '''
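    batch_size = n_seqs * n_steps
    n_batches = len(array) // batch_size
    # Keep only the full batches and drop the leftover characters.
    array = array[:batch_size * n_batches]
    # One long row per sequence: shape (n_seqs, n_batches * n_steps).
    array = array.reshape((n_seqs, -1))
    
    # You should now create a loop to generate batches for inputs and targets
    #############################################
    #           TODO: YOUR CODE HERE            #
    #############################################
    
    # Slide a window of n_steps columns along the rows, one position at a time.
    # The window width must be n_steps (not n_seqs) so that x and y have shape
    # (n_seqs, n_steps); the targets are the same window shifted by one column.
    for i in range(n_steps, array.shape[1]):
        yield array[:, i - n_steps:i], array[:, i - n_steps + 1:i + 1]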

In [6]:
batches = get_batches(text_as_int, 10, 10)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[18 47 56 57 58  1 15 47 58 47]
 [ 1 43 52 43 51 63 11  0 37 43]
 [52 58 43 42  1 60 47 56 58 59]
 [56 44 53 50 49  6  0 27 52  1]
 [47 52  1 57 54 47 58 43  1 53]
 [56 57  6  1 39 52 42  1 57 58]
 [46 47 51  1 42 53 61 52  1 58]
 [ 1 40 43 43 52  1 57 47 52 41]
 [50 58 57  1 51 39 63  1 57 46]
 [57 47 53 52  1 53 44  1 56 43]]

y
 [[47 56 57 58  1 15 47 58 47 64]
 [43 52 43 51 63 11  0 37 43 58]
 [58 43 42  1 60 47 56 58 59 43]
 [44 53 50 49  6  0 27 52  1 54]
 [52  1 57 54 47 58 43  1 53 44]
 [57  6  1 39 52 42  1 57 58 39]
 [47 51  1 42 53 61 52  1 58 53]
 [40 43 43 52  1 57 47 52 41 43]
 [58 57  1 51 39 63  1 57 46 39]
 [47 53 52  1 53 44  1 56 43 60]]
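
The printed arrays show that each row of y is the corresponding row of x moved one character ahead. That property can be checked programmatically; the sketch below uses a hypothetical `check_shift` helper and the first two rows of the output above.

```python
import numpy as np

def check_shift(x, y):
    """Return True when y is x shifted left by one position along each row.

    Only the overlapping columns can be compared: the last column of y holds
    the character that follows the window, which x does not contain.
    """
    return x.shape == y.shape and np.array_equal(x[:, 1:], y[:, :-1])

# First two rows of the x and y arrays printed above.
x = np.array([[18, 47, 56, 57, 58, 1, 15, 47, 58, 47],
              [1, 43, 52, 43, 51, 63, 11, 0, 37, 43]])
y = np.array([[47, 56, 57, 58, 1, 15, 47, 58, 47, 64],
              [43, 52, 43, 51, 63, 11, 0, 37, 43, 58]])

print(check_shift(x, y))
```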


## Part 3: Build Char-RNN model
In this section, we build our Char-RNN model. It consists of an input layer, an rnn_cell layer, an output layer, a loss, and an optimizer; we will build them one by one.

The goal is to predict new text following a given prime word, so the training data must define both inputs and targets. The figure below illustrates the structure of the Char-RNN network.

![structure](img/charrnn.jpg)
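
To show how the layers fit together, here is a rough NumPy sketch of a forward pass through a single vanilla RNN layer with a softmax output and a cross-entropy loss. It is not the ecbm4040.CharRNN implementation, which is built in TensorFlow with LSTM or GRU cells and an optimizer; the weight names, the hidden size, and the omission of bias terms are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 65, 16     # 65 matches the vocabulary above; 16 is arbitrary

# Randomly initialised parameters of a single vanilla RNN layer (illustrative only).
W_xh = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(hidden_size, vocab_size))

def forward(inputs, targets):
    """Run the RNN over one sequence and return the mean cross-entropy loss."""
    h = np.zeros(hidden_size)
    losses = []
    for x_id, y_id in zip(inputs, targets):
        x = np.zeros(vocab_size)
        x[x_id] = 1.0                           # input layer: one-hot encoding
        h = np.tanh(x @ W_xh + h @ W_hh)        # recurrent layer: update hidden state
        logits = h @ W_hy                       # output layer: one score per character
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                    # softmax over the vocabulary
        losses.append(-np.log(probs[y_id]))     # loss: negative log-likelihood of target
    return float(np.mean(losses))

loss = forward([18, 47, 56, 57], [47, 56, 57, 58])
print(loss)
```

With untrained weights the predicted distribution is close to uniform, so the loss should sit near log(65); training lowers it by adjusting the weights.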

<span style="color:red">TODO:</span>
Finish all TODOs in ecbm4040.CharRNN and fill in the blanks in the following cells.

**Note: Training with the following parameter settings takes about 20 minutes on a GTX 1070 GPU, so we suggest using GCP for this task.**

In [7]:
from ecbm4040.CharRNN import *

### Training
With sampling set to False (the default), we can start training the network. Checkpoints are saved automatically in the checkpoints folder.

In [8]:
# These are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

In [9]:
# Train the LSTM network first; the GRU variant is trained in a later cell
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

NotImplementedError: Please edit this function.

In [None]:
# look up checkpoints
tf.train.get_checkpoint_state('checkpoints')

### Sampling
With sampling set to True, we can generate new characters one by one. We can use the saved checkpoints to see how the network learned gradually.
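
Generating text one character at a time amounts to a loop: feed the characters seen so far, draw the next character from the predicted distribution, and feed it back in. The sketch below illustrates that loop with a stand-in model; `sample_text` and `predict_proba` are hypothetical names, and how `model.sample` works inside ecbm4040.CharRNN is not shown in this notebook.

```python
import numpy as np

def sample_text(predict_proba, prime, n_chars, vocab_to_ind, ind_to_vocab, seed=0):
    """Generate n_chars characters after `prime`, one character at a time.

    predict_proba(ids) is a hypothetical callable that takes the list of
    character ids seen so far and returns a probability vector over the vocabulary.
    """
    rng = np.random.default_rng(seed)
    ids = [vocab_to_ind[c] for c in prime]
    for _ in range(n_chars):
        probs = predict_proba(ids)
        next_id = rng.choice(len(probs), p=probs)   # draw the next character id
        ids.append(int(next_id))                    # feed it back as the next input
    return ''.join(ind_to_vocab[i] for i in ids)

# Stand-in "model": a uniform distribution, so the output is random characters.
vocab = sorted(set("LORD abc"))
vocab_to_ind = {c: i for i, c in enumerate(vocab)}
ind_to_vocab = dict(enumerate(vocab))
uniform = lambda ids: np.full(len(vocab), 1.0 / len(vocab))

text = sample_text(uniform, "LORD ", 20, vocab_to_ind, ind_to_vocab)
print(repr(text))
```

A trained network would replace `uniform` with its own predictions, which is why samples from early checkpoints look random and later ones look more like the training text.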

In [None]:
model = CharRNN(len(vocab), batch_size, num_steps, 'LSTM', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

In [None]:
# choose a checkpoint other than the final one and see the results. It could be nasty, don't worry!
#############################################
#           TODO: YOUR CODE HERE            #
#############################################
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

### Change to another type of RNN cell
We have been using LSTM cells, as in the original work, but GRU cells are getting more popular today. Let's change the cell in the rnn_cell layer to a GRU cell and see how it performs. The number of training steps should be the same as above.

**Note: You need to rename your saved checkpoints, or they will be overwritten and you will lose the LSTM results that you have already saved.**
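
One concrete difference between the two cell types is size. In the textbook formulations, an LSTM layer has four weight blocks (three gates plus the candidate state) and a GRU layer has three (two gates plus the candidate state). The sketch below counts parameters under that formulation; `rnn_cell_params` is a hypothetical helper, `input_size` is an assumed value, and the exact count in a given framework may differ slightly (for example, because of extra bias terms).

```python
def rnn_cell_params(input_size, hidden_size, n_blocks):
    """Parameters of one recurrent layer in the textbook formulation.

    Each block has an input-to-hidden matrix, a hidden-to-hidden matrix and a bias.
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block

hidden_size = 256    # rnn_size used in this notebook
input_size = 65      # assumption: one-hot characters fed directly to the first layer

# LSTM: input, forget and output gates plus the candidate state.
lstm_params = rnn_cell_params(input_size, hidden_size, n_blocks=4)
# GRU: update and reset gates plus the candidate state.
gru_params = rnn_cell_params(input_size, hidden_size, n_blocks=3)

print(lstm_params, gru_params, gru_params / lstm_params)
```

For equal layer sizes the GRU layer therefore carries about three quarters of the LSTM layer's parameters, which is one factor to keep in mind when comparing training time and results.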

In [None]:
# These are preset parameters; you can change them to get better results
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
rnn_size = 256           # Size of hidden layers in rnn_cell
num_layers = 2           # Number of hidden layers
learning_rate = 0.005    # Learning rate

model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate)
batches = get_batches(text_as_int, batch_size, num_steps)
model.train(batches, 6000, 2000)

In [None]:
model = CharRNN(len(vocab), batch_size, num_steps, 'GRU', rnn_size,
               num_layers, learning_rate, sampling=True)
# choose the last checkpoint and generate new text
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = model.sample(checkpoint, 1000, len(vocab), vocab_to_ind, ind_to_vocab, prime="LORD ")
print(samp)

In [None]:
def my_gen():
    for i in range(10):
        yield i

a = my_gen()

In [None]:
b = next(a)


In [None]:
test = [1,2,3,4,5,6]
print(np.array(test).reshape((3,-1)))

#### Questions
1. Compare the results of the two networks that you built and explain the reasons for the differences. (This is a qualitative comparison; it should be based on the specific models that you built.)
2. Discuss the differences between LSTM cells and GRU cells. What are the pros and cons of using GRU cells?

Answer:
**Fill in here.**