![Big Data University](https://ibm.box.com/shared/static/jvcqp2iy2jlx2b32rmzdt0tx8lvxgzkp.png)
# <center> Text generation using RNN/LSTM (Character-level)</center>
<div class="alert alert-block alert-info">
<font size = 3><strong>In this notebook you will learn how to use TensorFlow to create a Recurrent Neural Network</strong></font>
<br>    
- <a href="#intro">Introduction</a>
<br>
- <p><a href="#arch">Architectures</a></p>
    - <a href="#lstm">Long Short-Term Memory Model (LSTM)</a>

- <p><a href="#build">Building a LSTM with TensorFlow</a></p>
</div>
----------------

This code implements a Recurrent Neural Network with LSTM/RNN units for training and sampling from character-level language models. In other words, the model takes a text file as input and trains an RNN that learns to predict the next character in a sequence.  
The RNN can then be used to generate text, character by character, that looks like the original training data. 

This code is based on this [blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness/), and it is a step-by-step walkthrough of this [character-level implementation](https://github.com/crazydonkey200/tensorflow-char-rnn).




In [1]:
from __future__ import print_function, division

import tensorflow as tf
import time

In [2]:
print('TensorFlow version:', tf.__version__)

TensorFlow version: 1.1.0


### Data loader
The following cell defines a class that reads the data from the input file and splits it into batches.

In [3]:
import codecs
import os
import collections
from six.moves import cPickle
import numpy as np

class TextLoader():
    def __init__(self, data_dir, batch_size, seq_length, encoding='utf-8'):
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.encoding = encoding

        input_file = os.path.join(data_dir, "input.txt")
        vocab_file = os.path.join(data_dir, "vocab.pkl")
        tensor_file = os.path.join(data_dir, "data.npy")

        if not (os.path.exists(vocab_file) and os.path.exists(tensor_file)):
            print("reading text file")
            self.preprocess(input_file, vocab_file, tensor_file)
        else:
            print("loading preprocessed files")
            self.load_preprocessed(vocab_file, tensor_file)
        self.create_batches()
        self.reset_batch_pointer()

    def preprocess(self, input_file, vocab_file, tensor_file):
        with codecs.open(input_file, "r", encoding=self.encoding) as f:
            data = f.read()
        counter = collections.Counter(data)
        count_pairs = sorted(counter.items(), key=lambda x: -x[1])
        self.chars, _ = zip(*count_pairs)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        with open(vocab_file, 'wb') as f:
            cPickle.dump(self.chars, f)
        self.tensor = np.array(list(map(self.vocab.get, data)))
        np.save(tensor_file, self.tensor)

    def load_preprocessed(self, vocab_file, tensor_file):
        with open(vocab_file, 'rb') as f:
            self.chars = cPickle.load(f)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        self.tensor = np.load(tensor_file)
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

    def create_batches(self):
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

        # When the data (tensor) is too small, let's give them a better error message
        if self.num_batches==0:
            assert False, "Not enough data. Make seq_length and batch_size small."

        self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length]
        xdata = self.tensor
        ydata = np.copy(self.tensor)
        ydata[:-1] = xdata[1:]
        ydata[-1] = xdata[0]
        self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
        self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)


    def next_batch(self):
        x, y = self.x_batches[self.pointer], self.y_batches[self.pointer]
        self.pointer += 1
        return x, y

    def reset_batch_pointer(self):
        self.pointer = 0
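
To make the `preprocess` step above more concrete, here is a minimal sketch of the same idea on a tiny string instead of `input.txt`. The sample text and the `decoded` variable are illustrative assumptions, not part of the notebook; the counting, sorting and encoding lines mirror the class.

```python
import collections
import numpy as np

text = "hello world"  # illustrative stand-in for the contents of input.txt

# Count characters and order them from most to least frequent,
# as TextLoader.preprocess does.
counter = collections.Counter(text)
count_pairs = sorted(counter.items(), key=lambda x: -x[1])
chars, _ = zip(*count_pairs)

# Map each character to its rank, so frequent characters get small ids.
vocab = dict(zip(chars, range(len(chars))))

# Encode the text as an integer array, then decode it back to check the mapping.
tensor = np.array(list(map(vocab.get, text)))
decoded = "".join(chars[i] for i in tensor)

print(vocab["l"], vocab["o"])  # ids of the two most frequent characters
print(tensor)
print(decoded)
```

Because ids follow frequency rank, the most common characters of the training text end up with the smallest integers.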

### Parameters
#### Batch, number_of_batch, batch_size and seq_length
What are batch, number_of_batch, batch_size and seq_length in the character-level example?  

Let's assume the input is 'here is an example'. Then:
- txt_length = 18  
- seq_length = 3  
- batch_size = 2  
- number_of_batch = 18/(3*2) = 3
- batch = array (['h','e','r'],['n',' ','e']), because the loader above reshapes the text into batch_size contiguous rows ('here is a' and 'n example') and cuts each row into pieces of seq_length
- sample Seq = 'her'  
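
The following sketch replays the logic of `create_batches` on the toy input, working on characters instead of integer ids so the result is easy to read. It is only an illustration under that assumption. Note that with this layout the second row of each batch starts in the middle of the text, and each target row is the input row shifted by one character.

```python
import numpy as np

text = "here is an example"
seq_length = 3
batch_size = 2

xdata = np.array(list(text))
num_batches = len(xdata) // (batch_size * seq_length)

# Drop any trailing characters that do not fill a whole batch.
xdata = xdata[:num_batches * batch_size * seq_length]

# Targets are the inputs shifted left by one character (wrapping around at the end).
ydata = np.copy(xdata)
ydata[:-1] = xdata[1:]
ydata[-1] = xdata[0]

# Reshape into batch_size rows, then cut the rows into num_batches pieces.
x_batches = np.split(xdata.reshape(batch_size, -1), num_batches, 1)
y_batches = np.split(ydata.reshape(batch_size, -1), num_batches, 1)

print(num_batches)
print(x_batches[0])
print(y_batches[0])
```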


So, what are our actual parameters?


In [4]:
batch_size = 60 # minibatch size, i.e. number of sequences processed in each training step
seq_length = 50 # RNN sequence length
num_epochs = 25 # you should change it to 50 if you want to see relatively good results
learning_rate = 0.002
decay_rate = 0.97
rnn_size = 128 #size of RNN hidden state
num_layers = 2 #number of layers in the RNN

### LSTM Architecture
- Each LSTM cell has an input layer of size 128 units.
- 128 is the dimensionality of the embedding vector.



#### rnn_size = num_units = num_hidden_units = LSTM size


- Each LSTM cell has a hidden layer, which contains a number of hidden units.
- The argument num_units of BasicLSTMCell (128 here, often written n_hidden) is the number of hidden units of the LSTM.
- Each LSTM cell keeps a vector, called the __hidden state__ vector, of size n_hidden=128.
- The __hidden state__ vector, which is the memory of the LSTM, is accumulated through time using the (forget, input, and output) gates.
- For each LSTM cell that we initialise, we need to supply a value (128 in this case) for the hidden dimension, or as some people like to call it, the number of units in the LSTM cell.
- "num_units" is equivalent to "size of RNN hidden state".
- rnn_size = 128 is also the dimension of the embedding vector (as in W2V) for each character/word.
- An LSTM keeps two pieces of information as it propagates through time:
    - A __hidden state__ vector
    - A __previous time-step output__
- To make the name num_units more intuitive, you can think of it as the number of hidden units in the LSTM cell, or the number of memory units in the cell.
- The number of hidden units is the dimensionality of the output (= dimensionality of the state) of the LSTM cell.
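
As a rough illustration of the points above, the sketch below runs one LSTM time step in plain NumPy. It follows the common textbook formulation (input, candidate, forget and output gates computed from the concatenated input and previous output). The function and variable names are assumptions made for this example and are not TensorFlow's internals. Here `c` plays the role of the memory vector and `h` the previous time-step output, both of size num_units.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b, forget_bias=1.0):
    """One LSTM time step for a whole batch.

    x: [batch, input_size], h and c: [batch, num_units]
    W: [input_size + num_units, 4 * num_units], b: [4 * num_units]
    """
    z = np.concatenate([x, h], axis=1) @ W + b
    i, j, f, o = np.split(z, 4, axis=1)          # input, candidate, forget, output
    c_new = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    h_new = np.tanh(c_new) * sigmoid(o)
    return h_new, c_new

rng = np.random.default_rng(0)
batch_size, input_size, num_units = 60, 128, 128
W = rng.normal(scale=0.1, size=(input_size + num_units, 4 * num_units))
b = np.zeros(4 * num_units)

x = rng.normal(size=(batch_size, input_size))
h = np.zeros((batch_size, num_units))   # previous output, starts at zero
c = np.zeros((batch_size, num_units))   # memory, starts at zero

h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (60, 128) (60, 128)
```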

#### num_layers = 2 
- The number of layers in the RNN.
- MultiRNNCell takes an argument __cells__, a list of RNNCells that are composed in the given order.

In [5]:
!mkdir -p ../data/character_model
!wget -nv -O ../data/character_model/input.txt https://ibm.box.com/shared/static/a3f9e9mbpup09toq35ut7ke3l3lf03hg.txt 


In [6]:
data_loader = TextLoader('../data/character_model/', batch_size, seq_length)
vocab_size = data_loader.vocab_size
data_loader.vocab_size

loading preprocessed files


65

In [7]:
data_loader.num_batches

371

### Input and output

In [8]:
x,y = data_loader.next_batch()

In [9]:
x

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ..., 
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]])

In [10]:
x.shape  #batch_size=60, seq_length=50

(60, 50)

In [11]:
y

array([[ 9,  7,  6, ...,  4,  7,  0],
       [ 4, 14, 22, ...,  9, 20,  5],
       [20, 10, 29, ..., 10, 18,  4],
       ..., 
       [ 2,  0,  6, ..., 21,  0,  6],
       [ 7,  7,  4, ...,  2,  3,  0],
       [ 7,  0, 33, ...,  9, 23,  0]])

In [12]:
print('Vocabulary size:', data_loader.vocab_size)

Vocabulary size: 65


In [13]:
print(", ".join(sorted(list(data_loader.chars))))


,  , !, $, &, ', ,, -, ., 3, :, ;, ?, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z


In [14]:
data_loader.vocab['t']

2

### Defining stacked RNN Cell

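__BasicRNNCell__ is the most basic RNN cell. It is the cell used in the code below; MultiRNNCell stacks it in the same way it would stack any other RNNCell.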

In [15]:
# a two layer cell
with tf.variable_scope('multi_rnn_cell'):
    stacked_cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.BasicRNNCell(rnn_size) for _ in range(num_layers)])

In [16]:
# hidden state size
stacked_cell.output_size

128

In [17]:
stacked_cell.state_size

(128, 128)
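
The `state_size` of `(128, 128)` means one 128-dimensional state per layer. The NumPy sketch below mimics one time step of a two-layer stack of basic tanh cells to show where those two states come from. It is an illustration only: the helper names and the random weights are assumptions, and the `[256, 128]` weight shape reflects concatenating a 128-dimensional input with a 128-dimensional state.

```python
import numpy as np

def rnn_cell_step(x, h, W, b):
    # Basic RNN cell: new state = tanh([x, h] @ W + b); the output equals the new state.
    return np.tanh(np.concatenate([x, h], axis=1) @ W + b)

def stacked_step(x, states, params):
    # Feed the output of each layer into the next one, keeping one state per layer.
    new_states = []
    layer_input = x
    for h, (W, b) in zip(states, params):
        h_new = rnn_cell_step(layer_input, h, W, b)
        new_states.append(h_new)
        layer_input = h_new
    return layer_input, tuple(new_states)

rng = np.random.default_rng(0)
batch_size, rnn_size, num_layers = 60, 128, 2

params = [(rng.normal(scale=0.1, size=(2 * rnn_size, rnn_size)), np.zeros(rnn_size))
          for _ in range(num_layers)]
states = tuple(np.zeros((batch_size, rnn_size)) for _ in range(num_layers))
x = rng.normal(size=(batch_size, rnn_size))

output, states = stacked_step(x, states, params)
print(output.shape, [s.shape for s in states])
```

The output of the stack is the state of the top layer, which is why the output size is 128 while the state is a pair.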

In [18]:
input_data = tf.placeholder(tf.int32, [batch_size, seq_length])# a 60x50
targets = tf.placeholder(tf.int32, [batch_size, seq_length]) # a 60x50

The memory state of the network is initialized with a vector of zeros and gets updated after reading each character.

__BasicRNNCell.zero_state(batch_size, dtype)__ Return zero-filled state tensor(s).

Args:

batch_size: int, float, or unit Tensor representing the batch size.  
dtype: the data type to use for the state.

In [19]:
initial_state = stacked_cell.zero_state(batch_size, tf.float32) # batch_size because each of the 60 sequences in the batch keeps its own state: one 60x128 tensor per layer

In [20]:
input_data

<tf.Tensor 'Placeholder:0' shape=(60, 50) dtype=int32>

In [21]:
session = tf.Session()

In [22]:
feed_dict={input_data:x, targets:y}

In [23]:
session.run(input_data, feed_dict)

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ..., 
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]], dtype=int32)

### Embedding

In [24]:
with tf.variable_scope('rnnlm',reuse=False):
    softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) # 128x65
    softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 1x65
    with tf.device("/cpu:0"):
        embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  # 65x128
        # input_data is a 60x50 matrix and embedding is a 65x128 lookup table with one row per character
        # embedding_lookup goes through each row of input_data and, for each character in the row, finds the corresponding vector in embedding
        # it creates a 60x50x[1x128] tensor
        # so the first element of em is a 50x128 matrix, in which each row is the vector representing one character
        em = tf.nn.embedding_lookup(embedding, input_data) # em is 60x50x[1x128]
        # split: splits a tensor into sub-tensors.
        # syntax (TensorFlow 1.x):  tf.split(value, num_or_size_splits, axis=0, num=None, name='split')
        # it will split the 60x50x128 tensor into 50 tensors of 60x1x128
        inputs = tf.split(em, seq_length, 1)
        # squeeze removes the axis of size 1, giving a list of 50 matrices of [60x128]
        inputs = [tf.squeeze(input_, [1]) for input_ in inputs]

In [25]:
session.run(tf.global_variables_initializer())
session.run(embedding)

array([[ 0.15802948,  0.12496774,  0.16846476, ..., -0.11029783,
         0.10446949, -0.10728429],
       [ 0.03655826,  0.05616073, -0.11711955, ..., -0.08184877,
        -0.0203063 , -0.11410461],
       [ 0.1367863 , -0.02998486, -0.11551569, ...,  0.13584189,
        -0.10375368,  0.15036653],
       ..., 
       [-0.0543799 , -0.1526051 , -0.1581445 , ...,  0.16187204,
         0.06553376,  0.06251667],
       [ 0.05188106,  0.03752673,  0.17359488, ...,  0.06455946,
        -0.06128889, -0.096526  ],
       [ 0.01784262,  0.11226384,  0.17503782, ..., -0.05319381,
         0.15254994, -0.07819793]], dtype=float32)

In [26]:
em = tf.nn.embedding_lookup(embedding, input_data)
em

<tf.Tensor 'embedding_lookup:0' shape=(60, 50, 128) dtype=float32>

In [27]:
emp = session.run(em,feed_dict={input_data:x})
emp.shape

(60, 50, 128)

In [28]:
emp[0]

array([[-0.0278158 ,  0.03496425, -0.06117989, ...,  0.15364163,
         0.17234583, -0.14929119],
       [-0.13231784,  0.02223931,  0.08811866, ..., -0.12554383,
         0.0763071 , -0.11975383],
       [-0.0894724 ,  0.07967277, -0.07505681, ...,  0.11697145,
        -0.07735235, -0.16337179],
       ..., 
       [ 0.03655826,  0.05616073, -0.11711955, ..., -0.08184877,
        -0.0203063 , -0.11410461],
       [-0.16884759,  0.10350727, -0.16924985, ...,  0.13376759,
         0.09569244, -0.0209478 ],
       [-0.0894724 ,  0.07967277, -0.07505681, ...,  0.11697145,
        -0.07735235, -0.16337179]], dtype=float32)

In [29]:
inputs = tf.split(em, seq_length, 1)
inputs[0:5]

[<tf.Tensor 'split:0' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:1' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:2' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:3' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:4' shape=(60, 1, 128) dtype=float32>]

In [30]:
inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
inputs[0:5]

[<tf.Tensor 'Squeeze:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_1:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_2:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_3:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_4:0' shape=(60, 128) dtype=float32>]
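
The lookup, split and squeeze steps above can be reproduced in NumPy to check the shapes without a TensorFlow session. This is a sketch with randomly generated ids and embedding values (both are assumptions for the example); fancy indexing stands in for `tf.nn.embedding_lookup`.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, rnn_size = 65, 128
batch_size, seq_length = 60, 50

embedding = rng.uniform(-0.1, 0.1, size=(vocab_size, rnn_size))         # 65x128
input_ids = rng.integers(0, vocab_size, size=(batch_size, seq_length))  # 60x50

# Lookup: replace every character id by its 128-dimensional row.
em = embedding[input_ids]                          # 60x50x128

# Split along the time axis, then drop the axis of size 1.
pieces = np.split(em, seq_length, axis=1)          # 50 arrays of 60x1x128
inputs = [np.squeeze(p, axis=1) for p in pieces]   # 50 arrays of 60x128

print(em.shape, len(inputs), inputs[0].shape)
```

Each element `inputs[t]` holds the embedding of the character at position `t` for all 60 sequences.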

### Feeding a batch of 60 sequences to an RNN:
- Step 1:  the first character of each of the 60 sequences (in a batch) is input in parallel.  
- Step 2:  the second character of each of the 60 sequences is input in parallel. 
- Step n: the nth character of each of the 60 sequences is input in parallel, up to n = 50 (seq_length).  

The parallelism is only for efficiency. The characters at a given time step are handled in parallel across all sequences in the batch, but the network still sees each sequence one character at a time and does the computations accordingly.
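
One way to convince yourself that batching is only an efficiency device is to run a small RNN on a whole batch and on a single sequence taken from it, and compare the results. The sketch below does this with a single-layer tanh RNN in NumPy; the sizes and weights are arbitrary assumptions chosen to keep the example small.

```python
import numpy as np

def run_rnn(inputs, W, b):
    """inputs: [batch, time, features]; returns the final state [batch, units]."""
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, W.shape[1]))
    for t in range(steps):
        x_t = inputs[:, t, :]   # the t-th element of every sequence in the batch
        h = np.tanh(np.concatenate([x_t, h], axis=1) @ W + b)
    return h

rng = np.random.default_rng(0)
batch, steps, features, units = 4, 5, 8, 8
inputs = rng.normal(size=(batch, steps, features))
W = rng.normal(scale=0.3, size=(features + units, units))
b = np.zeros(units)

h_batch = run_rnn(inputs, W, b)          # all four sequences at once
h_single = run_rnn(inputs[2:3], W, b)    # the third sequence on its own

print(np.allclose(h_batch[2], h_single[0]))  # True
```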

In [31]:
session.run(inputs[0],feed_dict={input_data:x})

array([[-0.0278158 ,  0.03496425, -0.06117989, ...,  0.15364163,
         0.17234583, -0.14929119],
       [-0.175194  , -0.13956085,  0.05114289, ...,  0.02340321,
        -0.14861377,  0.01708506],
       [-0.16209128, -0.06615045,  0.01765828, ..., -0.07865245,
         0.00767247,  0.17622088],
       ..., 
       [ 0.00171761,  0.02305156, -0.15469344, ..., -0.09677739,
         0.00373751, -0.02744491],
       [-0.13231784,  0.02223931,  0.08811866, ..., -0.12554383,
         0.0763071 , -0.11975383],
       [-0.11929911, -0.00828248, -0.10003815, ..., -0.05853085,
        -0.13604917, -0.16325609]], dtype=float32)

In [32]:
stacked_cell.state_size

(128, 128)

In [33]:
#outputs is 50x[60*128]
outputs, last_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, initial_state, stacked_cell, loop_function=None, scope='rnnlm')

In [34]:
outputs[0:5]

[<tf.Tensor 'rnnlm_1/multi_rnn_cell/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_1/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_2/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_3/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_4/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>]

In [35]:
test = outputs[0]
test

<tf.Tensor 'rnnlm_1/multi_rnn_cell/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>

In [36]:
session.run(tf.global_variables_initializer())
session.run(test,feed_dict={input_data:x})

array([[ 0.05824382,  0.02748017,  0.08114314, ...,  0.05969861,
        -0.00070989,  0.10022673],
       [ 0.0702572 , -0.02185716, -0.01672457, ..., -0.08219045,
        -0.0184204 ,  0.03360308],
       [ 0.02013035,  0.00143867,  0.04153982, ..., -0.01863404,
        -0.09647927,  0.02204642],
       ..., 
       [ 0.03273392, -0.02835438,  0.12730832, ..., -0.03715776,
         0.00983012, -0.09039374],
       [-0.01099309,  0.17606096,  0.08424649, ..., -0.14397293,
        -0.04705437,  0.02811577],
       [-0.0236993 ,  0.00772038,  0.00658628, ..., -0.01797423,
         0.03807817, -0.07155046]], dtype=float32)

outputs is 50x[60x128]. We concatenate and reshape it to [3000x128], i.e. one row per character position (60*50 = 3000). Then we can calculate the softmax:

softmax_w is [rnn_size, vocab_size], [128x65], and softmax_b is [65]

[3000x128]x[128x65]+[65] = [3000x65]
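
The shape bookkeeping of this step can be checked with a NumPy sketch. Random arrays stand in for the RNN outputs and the softmax parameters (an assumption for illustration). The point is the row order after the reshape: row `s * 50 + t` holds the output of sequence `s` at time step `t`, which matches flattening the 60x50 targets row by row.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_length, rnn_size, vocab_size = 60, 50, 128, 65

# 50 time steps, each a [60x128] matrix, as returned by the decoder.
outputs = [rng.normal(size=(batch_size, rnn_size)) for _ in range(seq_length)]
softmax_w = rng.normal(scale=0.1, size=(rnn_size, vocab_size))
softmax_b = np.zeros(vocab_size)

# Concatenate along the feature axis, then fold back into rows of 128.
output = np.concatenate(outputs, axis=1).reshape(-1, rnn_size)   # 3000x128

logits = output @ softmax_w + softmax_b                          # 3000x65

# Numerically stable softmax over the vocabulary axis.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(output.shape, logits.shape, probs.shape)
```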

In [37]:
output = tf.reshape(tf.concat(outputs, 1), [-1, rnn_size])
output

<tf.Tensor 'Reshape:0' shape=(3000, 128) dtype=float32>

In [38]:
logits = tf.matmul(output, softmax_w) + softmax_b
logits

<tf.Tensor 'add:0' shape=(3000, 65) dtype=float32>

In [39]:
probs = tf.nn.softmax(logits)
probs

<tf.Tensor 'Softmax:0' shape=(3000, 65) dtype=float32>

In [40]:
session.run(tf.global_variables_initializer())
session.run(probs,feed_dict={input_data:x})

array([[ 0.01222247,  0.01239169,  0.01315727, ...,  0.01299655,
         0.01246927,  0.01485467],
       [ 0.01300001,  0.0150417 ,  0.01482361, ...,  0.01177573,
         0.01591454,  0.01351067],
       [ 0.01423361,  0.01429424,  0.01438452, ...,  0.01387808,
         0.01175493,  0.01666627],
       ..., 
       [ 0.01343981,  0.01711948,  0.01274421, ...,  0.01058192,
         0.0162006 ,  0.01851727],
       [ 0.00842934,  0.01167001,  0.01234786, ...,  0.01293614,
         0.01714483,  0.01246518],
       [ 0.01316325,  0.01136755,  0.01680106, ...,  0.01283109,
         0.01323074,  0.01329494]], dtype=float32)

In [41]:
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([logits],
                [tf.reshape(targets, [-1])],
                [tf.ones([batch_size * seq_length])],
                vocab_size)

In [42]:
cost = tf.reduce_sum(loss) / batch_size / seq_length
cost
        

<tf.Tensor 'truediv_1:0' shape=() dtype=float32>
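
The cost above is the average cross-entropy per character. The sketch below is a NumPy approximation of that calculation, assuming all weights are 1 as in the cell above; it is not the library's implementation. With all-zero logits the model assigns probability 1/65 to every character, so the cost equals ln(65), roughly 4.17, which is about the level an untrained model should start from.

```python
import numpy as np

def sequence_cost(logits, targets, batch_size, seq_length):
    # Log-softmax over the vocabulary, computed in a numerically stable way.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy: negative log-probability of the correct next character.
    loss = -log_probs[np.arange(len(targets)), targets]
    return loss.sum() / batch_size / seq_length

batch_size, seq_length, vocab_size = 60, 50, 65
rng = np.random.default_rng(0)
targets = rng.integers(0, vocab_size, size=batch_size * seq_length)

# All-zero logits give a uniform distribution over the 65 characters.
logits = np.zeros((batch_size * seq_length, vocab_size))
cost = sequence_cost(logits, targets, batch_size, seq_length)

print(cost, np.log(vocab_size))
```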

In [43]:
final_state = last_state
final_state

(<tf.Tensor 'rnnlm_1/multi_rnn_cell_49/cell_0/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_49/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>)

In [44]:
lr = tf.Variable(0.0, trainable=False)

In [45]:
grad_clip =5.
tvars = tf.trainable_variables()

In [46]:
tvars

[<tf.Variable 'rnnlm/softmax_w:0' shape=(128, 65) dtype=float32_ref>,
 <tf.Variable 'rnnlm/softmax_b:0' shape=(65,) dtype=float32_ref>,
 <tf.Variable 'rnnlm/embedding:0' shape=(65, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/weights:0' shape=(256, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/biases:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/weights:0' shape=(256, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/biases:0' shape=(128,) dtype=float32_ref>]

In [47]:
session.run(tf.global_variables_initializer())
[v.name for v in tf.global_variables()]

['rnnlm/softmax_w:0',
 'rnnlm/softmax_b:0',
 'rnnlm/embedding:0',
 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/weights:0',
 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/biases:0',
 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/weights:0',
 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/biases:0',
 'Variable:0']

In [48]:
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
grads

[<tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_0:0' shape=(128, 65) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_1:0' shape=(65,) dtype=float32>,
 <tensorflow.python.framework.ops.IndexedSlices at 0x7f7a15a3b7f0>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_3:0' shape=(256, 128) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_4:0' shape=(128,) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_5:0' shape=(256, 128) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_6:0' shape=(128,) dtype=float32>]

In [49]:
session.run(grads, feed_dict)[0]

array([[ -1.09968632e-02,  -5.91168646e-03,  -4.88472986e-04, ...,
          1.23671023e-03,   8.99837760e-04,   9.71516303e-04],
       [ -1.02021801e-03,  -1.97622250e-03,  -1.39479572e-03, ...,
         -1.61420525e-04,   1.18729095e-05,  -4.82227551e-05],
       [  3.01676599e-04,  -3.85406148e-03,  -4.42664605e-03, ...,
          5.55628678e-04,   4.58189053e-04,   5.04673808e-04],
       ..., 
       [  2.70265481e-03,   3.95567995e-03,   1.72807439e-03, ...,
         -7.03196973e-04,  -5.23120281e-04,  -6.10215415e-04],
       [ -8.60623829e-03,  -3.62226507e-03,  -6.48075994e-03, ...,
          1.22241932e-03,   8.51926336e-04,   1.07544393e-03],
       [ -8.31407309e-03,  -5.29507268e-03,  -3.01819318e-03, ...,
          1.02002721e-03,   7.05382670e-04,   9.10283183e-04]], dtype=float32)
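
Gradient clipping by global norm rescales all gradients together when their combined norm exceeds `grad_clip`, which keeps their relative directions intact. The following NumPy sketch shows the idea with toy gradients. It follows the formula documented for `tf.clip_by_global_norm` (scale by `clip_norm / max(global_norm, clip_norm)`), but it is a re-implementation for illustration and the gradient values are made up.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Global norm: the L2 norm of all gradient entries taken together.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale is 1 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Toy gradients: global norm = sqrt(4 * 3^2 + 4 * 4^2) = sqrt(100) = 10
grads = [np.full((2, 2), 3.0), np.full((4,), 4.0)]
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

# Gradients that are already small pass through unchanged.
small = [np.full((2,), 0.1)]
unchanged, _ = clip_by_global_norm(small, clip_norm=5.0)

print(norm, clipped_norm)
```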

In [50]:
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.apply_gradients(zip(grads, tvars))

# Using classes
Now that we have learned how the network works, we can put it all together:

In [51]:
class LSTMModel():
    def __init__(self,sample=False):
        rnn_size = 128 # size of RNN hidden state vector
        batch_size = 60 # minibatch size, i.e. number of sequences processed in each training step
        seq_length = 50 # RNN sequence length
        num_layers = 2 # number of layers in the RNN
        vocab_size = 65
        grad_clip = 5.
        if sample:
            print("sample mode")
            batch_size = 1
            seq_length = 1
        # model.cell.state_size is (128, 128)
        with tf.variable_scope('lstm_model_cell'):
            reuse = tf.get_variable_scope().reuse
            self.stacked_cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.BasicRNNCell(rnn_size, reuse=reuse) 
                                                         for _ in range(num_layers)])

        self.input_data = tf.placeholder(tf.int32, [batch_size, seq_length])
        self.targets = tf.placeholder(tf.int32, [batch_size, seq_length])
        # Initial state of the LSTM memory.
        # The memory state of the network is initialized with a vector of zeros and gets updated after reading each char. 
        self.initial_state = self.stacked_cell.zero_state(batch_size, tf.float32) # use the cell built by this object, one state per sequence in the batch

        with tf.variable_scope('rnnlm_class1'):
            softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) #128x65
            softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 1x65
            with tf.device("/cpu:0"):
                embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  #65x128
                inputs = tf.split(tf.nn.embedding_lookup(embedding, self.input_data), seq_length, 1)
                inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
                #inputs = tf.split(em, seq_length, 1)

        # The value of state is updated after processing each batch of chars.
        outputs, last_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, self.initial_state, self.stacked_cell, loop_function=None, scope='rnnlm_class1')
        output = tf.reshape(tf.concat(outputs,1), [-1, rnn_size])
        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)
        loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([self.logits],
                [tf.reshape(self.targets, [-1])],
                [tf.ones([batch_size * seq_length])],
                vocab_size)
        self.cost = tf.reduce_sum(loss) / batch_size / seq_length
        self.final_state = last_state
        self.lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),grad_clip)
        optimizer = tf.train.AdamOptimizer(self.lr)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))
        
    def sample(self, sess, chars, vocab, num=200, prime='The ', sampling_type=1):
        state = sess.run(self.stacked_cell.zero_state(1, tf.float32))
        for char in prime[:-1]:
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [state] = sess.run([self.final_state], feed)

        def weighted_pick(weights):
            t = np.cumsum(weights)
            s = np.sum(weights)
            return(int(np.searchsorted(t, np.random.rand(1)*s)))

        ret = prime
        char = prime[-1]
        for n in range(num):
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [probs, state] = sess.run([self.probs, self.final_state], feed)
            p = probs[0]

            if sampling_type == 0:
                sample = np.argmax(p)
            elif sampling_type == 2:
                if char == ' ':
                    sample = weighted_pick(p)
                else:
                    sample = np.argmax(p)
            else: # sampling_type == 1 default:
                sample = weighted_pick(p)

            pred = chars[sample]
            ret += pred
            char = pred
        return ret


The input at each time step is always a matrix of shape [n x m], where n is the batch size and m is the feature size. 
In our case, the input shape will be [60 x 128]: 60 sequences per batch, each character represented by a 128-dimensional embedding vector. 

 
The size of the data is 1113000, the number of batches is 371, the batch size is 60 and the sequence length is 50, so 50*60*371 = 1113000.

We train for 50 epochs. 
Each input batch produces one parameter update, so there are 371 updates per epoch.
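
The arithmetic above, together with the learning-rate decay used by the training loop further down (`initial_lr * decay_rate ** e`), can be checked in a few lines of plain Python. The values are taken from the notebook; the variable names `chars_per_batch`, `total_updates` and `lrs` are introduced here only for illustration.

```python
data_size = 1113000
batch_size, seq_length = 60, 50
num_epochs = 50
initial_lr, decay_rate = 0.01, 0.97

chars_per_batch = batch_size * seq_length      # characters consumed by one update
num_batches = data_size // chars_per_batch     # updates per epoch
total_updates = num_batches * num_epochs       # updates over the whole training run

# Learning rate used in each epoch: it shrinks by 3% per epoch.
lrs = [initial_lr * decay_rate ** e for e in range(num_epochs)]

print(chars_per_batch, num_batches, total_updates)
print(["{:.4f}".format(lr) for lr in lrs[:5]])
```

The first few formatted rates agree with the `lr=` values printed in the training log.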

### Creating the LSTM object

In [52]:
with tf.variable_scope("rnn"):
    model = LSTMModel()

In [53]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
e=1
sess.run(tf.assign(model.lr, learning_rate * (decay_rate ** e)))
data_loader.reset_batch_pointer()
state = sess.run(model.initial_state)
state

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),
 array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32))

In [54]:
x, y = data_loader.next_batch()
feed = {model.input_data: x, model.targets: y, model.initial_state:state}

In [55]:
train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
train_loss

4.2207117

In [56]:
state

(array([[-0.03888231, -0.03096804,  0.04803599, ..., -0.15671399,
          0.11745685,  0.01139924],
        [ 0.28310701,  0.01702675, -0.3102383 , ...,  0.22127898,
         -0.21215115, -0.25415346],
        [-0.17626575,  0.08315198, -0.24285816, ..., -0.18454163,
          0.1577169 ,  0.09897756],
        ..., 
        [ 0.14232227,  0.22863497,  0.00102451, ...,  0.04010071,
         -0.0842349 , -0.13486421],
        [-0.0808434 ,  0.08186   , -0.08142276, ..., -0.1167951 ,
         -0.02547013, -0.19040173],
        [-0.23224947,  0.17555398, -0.23806581, ..., -0.05937954,
         -0.21333382,  0.00371905]], dtype=float32),
 array([[ 0.15038921, -0.20966865, -0.10630569, ...,  0.08369136,
         -0.25725228, -0.02398055],
        [-0.17314234,  0.12231099, -0.08771722, ..., -0.19649088,
         -0.1640528 ,  0.06887512],
        [ 0.08831589, -0.17550877, -0.17147467, ...,  0.10027228,
         -0.1832839 , -0.14662866],
        ..., 
        [ 0.16485128,  0.05310654, -0

# Train using LSTMModel class

In [58]:
initial_lr = 0.01
num_epochs = 50

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(num_epochs): # 50 epochs here; use a smaller number (e.g. 5) for a quick test
        current_lr = initial_lr * (decay_rate ** e)
        sess.run(tf.assign(model.lr, current_lr))
        print('Epoch {} ({} / {} batches, lr={:.4f})'.format(
            e+1,
            (e+1) * data_loader.num_batches, 
            num_epochs * data_loader.num_batches,
            current_lr
        ))
        data_loader.reset_batch_pointer()
        state = sess.run(model.initial_state) # (2x[60x128])
        for b in range(data_loader.num_batches): #for each batch
            start = time.time()
            x, y = data_loader.next_batch()
            feed = {model.input_data: x, model.targets: y, model.initial_state:state}
            train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
            end = time.time()
        print("Train_loss={:.3f}   Time/Batch={:.3f} ms".format(
            train_loss, 
            (end - start) * 1000
        ))
        print()
        #model.sample(sess, data_loader.chars , data_loader.vocab, num=200, prime='The ', sampling_type=1)

Epoch 1 (371 / 18550 batches, lr=0.0100)
Train_loss=1.738   Time/Batch=21.019 ms

Epoch 2 (742 / 18550 batches, lr=0.0097)
Train_loss=1.651   Time/Batch=21.111 ms

Epoch 3 (1113 / 18550 batches, lr=0.0094)
Train_loss=1.610   Time/Batch=24.332 ms

Epoch 4 (1484 / 18550 batches, lr=0.0091)
Train_loss=1.596   Time/Batch=20.391 ms

Epoch 5 (1855 / 18550 batches, lr=0.0089)
Train_loss=1.583   Time/Batch=21.730 ms

Epoch 6 (2226 / 18550 batches, lr=0.0086)
Train_loss=1.581   Time/Batch=27.467 ms

Epoch 7 (2597 / 18550 batches, lr=0.0083)
Train_loss=1.581   Time/Batch=25.622 ms

Epoch 8 (2968 / 18550 batches, lr=0.0081)
Train_loss=1.572   Time/Batch=21.569 ms

Epoch 9 (3339 / 18550 batches, lr=0.0078)
Train_loss=1.567   Time/Batch=21.797 ms

Epoch 10 (3710 / 18550 batches, lr=0.0076)
Train_loss=1.571   Time/Batch=20.609 ms

Epoch 11 (4081 / 18550 batches, lr=0.0074)
Train_loss=1.574   Time/Batch=23.571 ms

Epoch 12 (4452 / 18550 batches, lr=0.0072)
Train_loss=1.580   Time/Batch=23.896 ms

Epo

# Sample

In [59]:
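sess = tf.InteractiveSession()
with tf.variable_scope("sample_test"):
    sess.run(tf.global_variables_initializer())
    # Builds a separate copy of the model (batch_size = 1, seq_length = 1) under its own variable scope;
    # its weights are new variables and are not shared with the trained `model` above.
    m = LSTMModel(sample=True)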

sample mode


In [60]:
prime='The '
num=200
sampling_type=1
vocab=data_loader.vocab
chars=data_loader.chars 

In [61]:
sess.run(m.initial_state)

(array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32),
 array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  

In [62]:
# (Re)initialize all variables in this session; the sample model `m` therefore starts from random weights.
sess.run(tf.global_variables_initializer())
state=sess.run(m.initial_state)
for char in prime[:-1]:
    x = np.zeros((1, 1))
    x[0, 0] = vocab[char]
    feed = {m.input_data: x, m.initial_state:state}
    [state] = sess.run([m.final_state], feed)

In [63]:
state

(array([[ 0.10137668, -0.08077186, -0.10750609,  0.19757108, -0.00856418,
         -0.08256163,  0.05348716,  0.14192471, -0.17346245,  0.22914389,
         -0.02725136,  0.18006587, -0.04484245,  0.08095631, -0.14055172,
          0.04505174,  0.03672248,  0.26555255, -0.05147567, -0.1171177 ,
         -0.11120223, -0.15657797,  0.11259519, -0.09635724, -0.03603043,
          0.1233211 , -0.10203333,  0.04217954,  0.05316259, -0.11408927,
         -0.06685334, -0.15804359, -0.0777316 ,  0.00410224, -0.41452974,
          0.05459766, -0.09610467, -0.04428494, -0.12285881, -0.13246138,
          0.10123338,  0.02276151, -0.03532519,  0.01705571, -0.20884265,
         -0.11671029, -0.01063112,  0.07384278,  0.14188001, -0.05980527,
          0.04676616, -0.07080595, -0.20595013,  0.09804212, -0.0092013 ,
         -0.113457  , -0.08490174,  0.06722686, -0.17515771,  0.1102583 ,
          0.13089943,  0.07752761, -0.08167016,  0.06896663, -0.2031312 ,
         -0.0597404 ,  0.13616781, -0.

In [64]:
def weighted_pick(weights):
    t = np.cumsum(weights)
    s = np.sum(weights)
    return(int(np.searchsorted(t, np.random.rand(1)*s)))

ret = prime
char = prime[-1]
for n in range(num):
    x = np.zeros((1, 1))
    x[0, 0] = vocab[char]
    feed = {m.input_data: x, m.initial_state:state}
    [probs, state] = sess.run([m.probs, m.final_state], feed)
    p = probs[0]

    if sampling_type == 0:
        sample = np.argmax(p)
    elif sampling_type == 2:
        if char == ' ':
            sample = weighted_pick(p)
        else:
            sample = np.argmax(p)
    else: # sampling_type == 1 default:
        sample = weighted_pick(p)

    pred = chars[sample]
    ret += pred
    char = pred
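
`weighted_pick` draws a character index in proportion to its probability: it builds the cumulative sum of the weights and finds where a uniform random number falls in it, while `np.argmax` always takes the most likely character. The sketch below isolates that function and checks it empirically. The explicit `rng` argument and the toy distribution `p` are assumptions added so the example is reproducible.

```python
import numpy as np

def weighted_pick(weights, rng):
    t = np.cumsum(weights)   # cumulative distribution (unnormalized)
    s = np.sum(weights)      # total weight
    # Position of a uniform draw in [0, s) within the cumulative sums.
    return int(np.searchsorted(t, rng.random() * s))

rng = np.random.default_rng(0)
p = np.array([0.1, 0.7, 0.2])

# Sampling: indices appear roughly in proportion to their probability.
draws = [weighted_pick(p, rng) for _ in range(10000)]
freq = np.bincount(draws, minlength=len(p)) / len(draws)

# Greedy choice: always the most probable index.
greedy = int(np.argmax(p))

print(freq, greedy)
```

Sampling gives more varied text, whereas the greedy choice tends to repeat the most common patterns; `sampling_type == 2` mixes the two by sampling only after a space.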


In [65]:
ret

"The ExuedB,kjB?Q!hBCjd$JGjQUITK kpU.cXB'dZhV\nTx,Ap,Tu;QYfx vtweUSuqa UygPBQfCWypqn3Ub!jSSXfAA;JdiUOZqRR?3aZr$Vb\ny'U;PVF.$&GGaF?H:&!EDPiim &GAl'xYxxA&zIlG'ajq,,pv&zAUTrUIRu ?luAmv-tOTL$h:EKZ,kZgl?noLE$wtTc"

# Sample using function

In [66]:
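sess = tf.InteractiveSession()
# A new session with freshly initialized variables, so the sample below again comes from untrained weights.
sess.run(tf.global_variables_initializer())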
state=sess.run(m.initial_state)
m.sample(sess, data_loader.chars , data_loader.vocab, num=200, prime='The ', sampling_type=1)

"The 3ESYEnEyh.3seImxt.k\nU;Ss\n?k?am$KGASlvrd-PoXvX:CyDNDDXOHF ?Hclt?oFG-u?rRaob'yU&KwNW3QdNHO3 WzsAISCQl?wlca$AfV&awKe\niw&w'J'Gz!&h'uKM,uRzJpN:yv?jNUHHlqCDYnjhHScjKHs?q'mkp\nK$YpdksOTGjztDsrs$K-W&McaY$LD!VQ"