<img src = "https://i.imgur.com/UjutVJd.jpg" align = "center">


# Text generation using RNN/LSTM (Character-level)
In this notebook you will learn the How to use TensorFlow for create a Recurrent Neural Network<br />

# Table of contents

<div>
- <a href="#intro">Introduction</a><br />
- <a href="#arch">Architectures</a><br />
- <a href="#lstm">Long Short-Term Memory Model (LSTM)</a><br />
- <a href="#build">Building a LSTM with TensorFlow</a>
</div>

<hr>

This code implements a Recurrent Neural Network with LSTM/RNN units for training/sampling from character-level language models. In other words, the model takes a text file as input and trains the RNN network that learns to predict the next character in a sequence.  
The RNN can then be used to generate text character by character that will look like the original training data. 

This code is based on this [blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness/), and the code is an step-by-step implimentation of the [character-level implimentation](https://github.com/crazydonkey200/tensorflow-char-rnn).




First, import the requiered libraries:

In [0]:
import tensorflow as tf
import time
import codecs
import os
import collections
from six.moves import cPickle
import numpy as np
#from tensorflow.python.ops import rnn_cell
#from tensorflow.python.ops import seq2seq

### Data loader
The following cell is a class that help to read data from input file.

In [0]:
class TextLoader():
    def __init__(self, data_dir, batch_size, seq_length, encoding='utf-8'):
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.encoding = encoding

        input_file = os.path.join(data_dir, "input.txt")
        vocab_file = os.path.join(data_dir, "vocab.pkl")
        tensor_file = os.path.join(data_dir, "data.npy")

        if not (os.path.exists(vocab_file) and os.path.exists(tensor_file)):
            print("reading text file")
            self.preprocess(input_file, vocab_file, tensor_file)
        else:
            print("loading preprocessed files")
            self.load_preprocessed(vocab_file, tensor_file)
        self.create_batches()
        self.reset_batch_pointer()

    def preprocess(self, input_file, vocab_file, tensor_file):
        with codecs.open(input_file, "r", encoding=self.encoding) as f:
            data = f.read()
        counter = collections.Counter(data)
        count_pairs = sorted(counter.items(), key=lambda x: -x[1])
        self.chars, _ = zip(*count_pairs)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        with open(vocab_file, 'wb') as f:
            cPickle.dump(self.chars, f)
        self.tensor = np.array(list(map(self.vocab.get, data)))
        np.save(tensor_file, self.tensor)

    def load_preprocessed(self, vocab_file, tensor_file):
        with open(vocab_file, 'rb') as f:
            self.chars = cPickle.load(f)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        self.tensor = np.load(tensor_file)
        self.num_batches = int(self.tensor.size / (self.batch_size * self.seq_length))

    def create_batches(self):
        self.num_batches = int(self.tensor.size / (self.batch_size * self.seq_length))

        # When the data (tensor) is too small, let's give them a better error message
        if self.num_batches==0:
            assert False, "Not enough data. Make seq_length and batch_size small."

        self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length]
        xdata = self.tensor
        ydata = np.copy(self.tensor)
        ydata[:-1] = xdata[1:]
        ydata[-1] = xdata[0]
        self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
        self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)


    def next_batch(self):
        x, y = self.x_batches[self.pointer], self.y_batches[self.pointer]
        self.pointer += 1
        return x, y

    def reset_batch_pointer(self):
        self.pointer = 0

### Parameters
#### Batch, number_of_batch, batch_size and seq_length
what is batch, number_of_batch, batch_size and seq_length in the charcter level example?  

Lets assume the input is this sentence: '__here is an example__'. Then:
- txt_length = 18  
- seq_length = 3  
- batch_size = 2  
- number_of_batchs = 18/3*2 = 3
- batch = array (['h','e','r'],['e',' ','i'])
- sample Seq = 'her'  

Ok, now, lets look at a real dataset, with real parameters. 

In [0]:
seq_length = 50 # RNN sequence length
batch_size = 60  # minibatch size, i.e. size of data in each epoch
num_epochs = 125 # you should change it to 50 if you want to see a relatively good results
learning_rate = 0.002
decay_rate = 0.97
rnn_size = 128 # size of RNN hidden state (output dimension)
num_layers = 2 #number of layers in the RNN

We download the input file, and print a part of it:

In [36]:
!wget -nv -O input.txt https://ibm.box.com/shared/static/a3f9e9mbpup09toq35ut7ke3l3lf03hg.txt 
with open('input.txt', 'r') as f:
    read_data = f.read()
    print read_data[0:100]
f.closed

2019-08-09 02:44:01 URL:https://public.boxcloud.com/d/1/b1!pH4oZukfQRqec1J4FKXkpcWIy-XMH12q_W958Nf7lizg0KLr81jYZe3RrhUkx8ZWM0b7wNEEX3oBub5qv0FPq2ewGsnQLDnzabkWJXnCpMN1xNusVmsKBoy4Qhej6GF0usx3KsUryo5r6HJrnapAcMZD9GkIz4E54KwbgV4mgPIyztjqqBS52S8d99-ivk___Oz0RHyhB6Y36Vc-pvqKCdhm4ah5RXPHBfCdf6GRvUHkpWYJMCZtVew5DETWNuKFXxk60a01TyNIO2bBNu3593hPqA90RKGEjF9_BVyn0PNzj5SfLlrtlL20nMlLbQnVkNowUYUDLMz9h9OfO6UWzQVeU_SSxQ3BoWtVq1NWbNJ6HWdlvykIojrXTB87hyYfTSOHMYG-wAtxOfuA6lJGiD_k2Gl62NIqAPgTgK05NB8QV1qtRmUJaUAThNx7PogK7-c00MpO7MZe1p5_CUJEnRCob0-L2cpiE3ByDy5Tn65d9n19ECuQWL6d33uIO6KWZitFmKkAgxky-MP9TpnZ7AmyCTfeVgmgPCGg7IqaHg8E3Z82_2NIkF4yhd_FE_VWHsMWSDx1bdSBg9Lja0UFI3My2YM0dOI32PTJgNP2cTzrBIwr6zl3AdWfmVOuiYKWQEaujBqfQj9UiDb1Jz7rrynYz1rzdDa8lxMBbLp2K17hR4xD7cPTw-uDLsGOhzySNaQz0udYVd1jDdqU47gms2Xu0-yqZEJi2flVaw-YZamjofap3eV3Gc5xHsptH1l9TfUcNNdUrcqpGLSY2zTHf0VAxTgSbthLNBPR2vPr93128E2kyA-XGGSehVVdzpQANxYOw3mYdGhGsnp-gPEiOCZntvI9sy79lZQFDC4G0srdiYl0LebYISW-BtibvQhpYsALlfRK5y2PyhJP42_-5TuQYmYxqQYCVZyU7jiufJImq

True

2019-08-09 02:58:45 URL:https://public.boxcloud.com/d/1/b1!X8OnV-jC7hvBi7kpiBw5omZlGyrCv2ICSoT8HiabK3OQwLBdbMw1FDqvOqGpGK2zCuJz6SasgGExehmKjz-0ShVsPpCYbTXZiOtGXM-J5kUp0hUr3fVBwECOAe4Cy1UwEp4v9trA7EpQeGaI23TpqAhpTEGP7LZ27eAFb6_JndsCXabuT8t22OdfHwREKqOvP_kTb1orWuQhLqalB6-jbo-0IcRsLSrr5_pXJ9Kg9nUQgcFkXEe4zrYsdlhgsWUtlc878hTA5FlImpg9gdYSB2O_83bqt_yG9MDX9o33eehARPbMi7vtyz6mLWeDhMLJXqz9_ZNtfyD6qrrX0H1cixkilzBqOdpm-WORwfqgDN_c-bFSX62uMQKu1CngElhkXHt-TNc9MClw6lZvI0FwS0R2kUbQd3EGZkj0AhtpPY4CON9CLYoEXd_KIUItu13K2zg2k5D-zsD6GOFGOfBpmSwiXd-78RnofY6KhYu8HTCl3v_Op4DuZ2ComXY8150sZLFBnmBUewsBBsv2Gq_3JVuvREckvqPtCr7LX7TBXc3nL3dEnLiticCWZ51ghd5y0Lf3HuJ-aZnm-mihWx9n3CKh5rg1mG_cOzb2ndGtpzyJPY-wjkcKMkMC1aHg6rWkY-bEY0oIqNen7iHqOI6fzwpD7qhA03iZvLD7zI8viAs_NVTJUsVLdKlxSgjgnCs5RFwC7M3rbkMV-sif8WUVrPaNblKgNN6eZAAmpd5J5XGxBHLqcLWD-s3cypJFRc63PFKjaWl2Va-pfsGJMNCyGGVDPAx9YemwOfzXMt4BWzbGbOwYTOHnenX4laT9GJVPfNrFSVA5jeg-I9L0zJBkjdvLbHNhiksCDDU965NZ-eCTZJqYDaiGtSa31Mn8P6s23UNorKcGqvw4-hQcY_1CHnSlSyIJXambc9YxV7XgmaRkX

True

Now, we can read the data at batches using the __TextLoader__ class. It will convert the characters to numbers, and represent each sequence as a vector in batches:

In [37]:
data_loader = TextLoader('', batch_size, seq_length)
vocab_size = data_loader.vocab_size
print "vocabulary size:" ,data_loader.vocab_size
print "Characters:" ,data_loader.chars
print "vocab number of 'F':",data_loader.vocab['F']
print "Character sequences (first batch):", data_loader.x_batches[0]

reading text file
vocabulary size: 65
Characters: (u' ', u'e', u't', u'o', u'a', u'h', u's', u'r', u'n', u'i', u'\n', u'l', u'd', u'u', u'm', u'y', u',', u'w', u'f', u'c', u'g', u'I', u'b', u'p', u':', u'.', u'A', u'v', u'k', u'T', u"'", u'E', u'O', u'N', u'R', u'S', u'L', u'C', u';', u'W', u'U', u'H', u'M', u'B', u'?', u'G', u'!', u'D', u'-', u'F', u'Y', u'P', u'K', u'V', u'j', u'q', u'x', u'z', u'J', u'Q', u'Z', u'X', u'3', u'&', u'$')
vocab number of 'F': 49
Character sequences (first batch): [[49  9  7 ...  1  4  7]
 [19  4 14 ... 14  9 20]
 [ 8 20 10 ...  8 10 18]
 ...
 [21  2  0 ...  0 21  0]
 [ 9  7  7 ...  0  2  3]
 [ 3  7  0 ...  5  9 23]]
loading preprocessed files
vocabulary size: 65
Characters: (u' ', u'e', u't', u'o', u'a', u'h', u's', u'r', u'n', u'i', u'\n', u'l', u'd', u'u', u'm', u'y', u',', u'w', u'f', u'c', u'g', u'I', u'b', u'p', u':', u'.', u'A', u'v', u'k', u'T', u"'", u'E', u'O', u'N', u'R', u'S', u'L', u'C', u';', u'W', u'U', u'H', u'M', u'B', u'?', u'G', u'!', 

### Input and output

In [38]:
x,y = data_loader.next_batch()
x

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ...,
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]])

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ...,
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]])

In [39]:
x.shape  #batch_size =60, seq_length=50

(60, 50)

(60, 50)

Here, __y__ is the next character for each character in __x__:

In [40]:
y

array([[ 9,  7,  6, ...,  4,  7,  0],
       [ 4, 14, 22, ...,  9, 20,  5],
       [20, 10, 29, ..., 10, 18,  4],
       ...,
       [ 2,  0,  6, ..., 21,  0,  6],
       [ 7,  7,  4, ...,  2,  3,  0],
       [ 7,  0, 33, ...,  9, 23,  0]])

array([[ 9,  7,  6, ...,  4,  7,  0],
       [ 4, 14, 22, ...,  9, 20,  5],
       [20, 10, 29, ..., 10, 18,  4],
       ...,
       [ 2,  0,  6, ..., 21,  0,  6],
       [ 7,  7,  4, ...,  2,  3,  0],
       [ 7,  0, 33, ...,  9, 23,  0]])

### LSTM Architecture
Each LSTM cell has 5 parts:
1. Input
2. prv_state
3. prv_output
4. new_state
5. new_output


- Each LSTM cell has an input layre, which its size is 128 units in our case. The input vector's dimension also is 128, which is the dimensionality of embedding vector, so called, dimension size of W2V/embedding, for each character/word.
- Each LSTM cell has a hidden layer, where there are some hidden units. The argument n_hidden=128 of BasicLSTMCell is the number of hidden units of the LSTM (inside A). It keeps the size of the output and state vector. It is also known as, rnn_size, num_units, num_hidden_units, and LSTM size
- An LSTM keeps two pieces of information as it propagates through time: 
    - __hidden state__ vector: Each LSTM cell accept a vector, called __hidden state__ vector, of size n_hidden=128, and its value is returned to the LSTM cell in the next step. The __hidden state__ vector; which is the memory of the LSTM, accumulates using its (forget, input, and output) gates through time. "num_units" is equivalant to "size of RNN hidden state". number of hidden units is the dimensianality of the output (= dimesianality of the state) of the LSTM cell.
    - __previous time-step output__: For each LSTM cell that we initialize, we need to supply a value (128 in this case) for the hidden dimension, or as some people like to call it, the number of units in the LSTM cell. 


#### num_layers = 2 
- number of layers in the RNN, is defined by num_layers
- An input of MultiRNNCell is __cells__ which is list of RNNCells that will be composed in this order.

### Defining stacked RNN Cell

__BasicRNNCell__ is the most basic RNN cell.

In [41]:
cell = tf.contrib.rnn.BasicRNNCell(rnn_size)

W0809 02:44:05.834495 140666729650048 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0809 02:44:05.836189 140666729650048 deprecation.py:323] From <ipython-input-9-cbc7d8d66937>:1: __init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.


In [42]:
# a two layer cell
stacked_cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers)

W0809 02:44:05.855052 140666729650048 deprecation.py:323] From <ipython-input-10-32025279f672>:1: __init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
W0809 02:44:05.858805 140666729650048 rnn_cell_impl.py:1642] At least two cells provided to MultiRNNCell are the same object and will share weights.


In [43]:
# hidden state size
stacked_cell.output_size

128

128

__state__ varibale keeps output and new_state of the LSTM, so it is a touple of size:

In [44]:
stacked_cell.state_size

(128, 128)

(128, 128)

Lets define input data:

In [45]:
input_data = tf.placeholder(tf.int32, [batch_size, seq_length])# a 60x50
input_data

<tf.Tensor 'Placeholder:0' shape=(60, 50) dtype=int32>

<tf.Tensor 'Placeholder_2:0' shape=(60, 50) dtype=int32>

and target data:

In [46]:
targets = tf.placeholder(tf.int32, [batch_size, seq_length]) # a 60x50
targets

<tf.Tensor 'Placeholder_1:0' shape=(60, 50) dtype=int32>

<tf.Tensor 'Placeholder_3:0' shape=(60, 50) dtype=int32>

The memory state of the network is initialized with a vector of zeros and gets updated after reading each character.

__BasicRNNCell.zero_state(batch_size, dtype)__ Return zero-filled state tensor(s). In this function, batch_size
representing the batch size.

In [0]:
initial_state = stacked_cell.zero_state(batch_size, tf.float32) #why batch_size ? 60x128

Lets check the value of the input_data again:

In [48]:
session = tf.Session()
feed_dict={input_data:x, targets:y}
session.run(input_data, feed_dict)

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ...,
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]], dtype=int32)

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ...,
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]], dtype=int32)

### Embedding
In this section, we build a 128-dim vector for each character. As we have 60 batches, and 50 character in each sequence, it will generate a [60,50,128] matrix.

__Notice:__ The function `tf.get_variable()` is used to share a variable and to initialize it in one place. `tf.get_variable()` is used to get or create a variable instead of a direct call to `tf.Variable`. 

In [49]:
with tf.variable_scope('rnnlm', reuse=False):
    softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) #128x65
    softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 1x65)
    #with tf.device("/cpu:0"):
        
    # embedding variable is initialized randomely
    embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  #65x128

    # embedding_lookup goes to each row of input_data, and for each character in the row, finds the correspond vector in embedding
    # it creates a 60*50*[1*128] matrix
    # so, the first elemnt of em, is a matrix of 50x128, which each row of it is vector representing that character
    em = tf.nn.embedding_lookup(embedding, input_data) # em is 60x50x[1*128]
    # split: Splits a tensor into sub tensors.
    # syntax:  tf.split(split_dim, num_split, value, name='split')
    # it will split the 60x50x[1x128] matrix into 50 matrix of 60x[1*128]
    inputs = tf.split(em, seq_length, 1)
    # It will convert the list to 50 matrix of [60x128]
    inputs = [tf.squeeze(input_, [1]) for input_ in inputs]

W0809 02:44:07.156378 140666729650048 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


ValueError: ignored

Lets take a look at the __embedding__, __em__, and __inputs__ variabbles:

Embedding variable is initialized with random values:

In [0]:
session.run(tf.global_variables_initializer())
#print embedding.shape
session.run(embedding)

array([[ 0.10515676,  0.16609721,  0.14853133, ...,  0.15221782,
         0.07032926,  0.08683513],
       [-0.09505893,  0.1740468 ,  0.00039575, ...,  0.04179668,
        -0.04488787, -0.03425205],
       [ 0.03661935, -0.14359385, -0.1002887 , ...,  0.07241286,
         0.04698692,  0.014552  ],
       ...,
       [-0.06432796,  0.12638037,  0.09174477, ...,  0.0686183 ,
         0.04608312,  0.08143659],
       [ 0.13529946, -0.11840481,  0.02908492, ..., -0.13664408,
        -0.1443422 ,  0.12054236],
       [-0.03822321,  0.03628762,  0.14055218, ..., -0.02858269,
         0.1762139 ,  0.03097825]], dtype=float32)

The first elemnt of em, is a matrix of 50x128, which each row of it is vector representing that character

In [0]:
em = tf.nn.embedding_lookup(embedding, input_data)
emp = session.run(em,feed_dict={input_data:x})
print emp.shape
emp[0]

(60, 50, 128)


array([[ 0.10532202, -0.050962  ,  0.14200698, ..., -0.01312895,
         0.0417673 , -0.10974091],
       [ 0.08354788,  0.16374655,  0.03215314, ...,  0.05715887,
        -0.00169612, -0.16226545],
       [-0.04881635, -0.11528797, -0.0641548 , ..., -0.06162267,
        -0.03448358, -0.11319518],
       ...,
       [-0.09505893,  0.1740468 ,  0.00039575, ...,  0.04179668,
        -0.04488787, -0.03425205],
       [-0.09910723,  0.0268001 , -0.00342618, ..., -0.10508522,
         0.08648534,  0.05864352],
       [-0.04881635, -0.11528797, -0.0641548 , ..., -0.06162267,
        -0.03448358, -0.11319518]], dtype=float32)

Let's consider each sequence as a sentence of length 50 characters, then, the first item in __inputs__ is a [60x128] vector which represents the first characters of 60 sentences.

In [0]:
inputs = tf.split(em, seq_length, 1)
inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
inputs[0:5]

[<tf.Tensor 'Squeeze:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_1:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_2:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_3:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_4:0' shape=(60, 128) dtype=float32>]

### Feeding a batch of 50 sequence to a RNN:

The feeding process for iputs is as following:

- Step 1:  first character of each of the 50 sentences (in a batch) is entered in parallel.  
- Step 2:  second character of each of the 50 sentences is input in parallel. 
- Step n: nth character of each of the 50 sentences is input in parallel.  

The parallelism is only for efficiency.  Each character in a batch is handled in parallel,  but the network sees one character of a sequence at a time and does the computations accordingly. All the computations involving the characters of all sequences in a batch at a given time step are done in parallel. 

In [0]:
session.run(inputs[0],feed_dict={input_data:x})

array([[ 0.10532202, -0.050962  ,  0.14200698, ..., -0.01312895,
         0.0417673 , -0.10974091],
       [ 0.15246011,  0.13689722,  0.07332276, ..., -0.13954665,
         0.05134563,  0.08537458],
       [ 0.02448134, -0.12534583, -0.10239477, ..., -0.00183405,
         0.033241  ,  0.02454503],
       ...,
       [ 0.04817937,  0.13605626,  0.13869353, ...,  0.06541134,
         0.09054609, -0.03779064],
       [ 0.08354788,  0.16374655,  0.03215314, ...,  0.05715887,
        -0.00169612, -0.16226545],
       [ 0.00266905, -0.05825555, -0.0354684 , ..., -0.17366202,
         0.12272467,  0.14772232]], dtype=float32)

Feeding the RNN with one batch, we can check the new output and new state of network:

In [0]:
#outputs is 50x[60*128]
outputs, new_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, initial_state, stacked_cell, loop_function=None, scope='rnnlm')
new_state

W0809 02:44:09.418715 140666729650048 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell_impl.py:459: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


(<tf.Tensor 'rnnlm_1/rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/Tanh_98:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/Tanh_99:0' shape=(60, 128) dtype=float32>)

In [0]:
outputs[0:5]

[<tf.Tensor 'rnnlm_1/rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/Tanh_1:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/Tanh_3:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/Tanh_5:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/Tanh_7:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/Tanh_9:0' shape=(60, 128) dtype=float32>]

Let's check the output of network after feeding it with first batch:

In [0]:
first_output = outputs[0]
session.run(tf.global_variables_initializer())
session.run(first_output,feed_dict={input_data:x})

array([[-0.01865317, -0.11144683, -0.03006576, ...,  0.04930625,
        -0.00160041, -0.01676752],
       [-0.03037416, -0.02258308, -0.10357679, ..., -0.05289789,
        -0.05367209,  0.0380863 ],
       [-0.04261411, -0.11313707, -0.02332036, ...,  0.0326977 ,
        -0.01151852, -0.01092285],
       ...,
       [ 0.08442006, -0.05332744, -0.04347272, ..., -0.02535881,
        -0.03022373,  0.01977276],
       [-0.0684814 ,  0.07481834,  0.02154858, ..., -0.03946618,
         0.15615083, -0.0466455 ],
       [ 0.0146853 ,  0.06645272,  0.05034451, ..., -0.02579963,
        -0.05267765,  0.01782084]], dtype=float32)

As it was explained, __outputs__ variable is a 50x[60x128] tensor. We need to reshape it back to [60x50x128] to be able to calculate the probablity of the next character using the softmax. The __softmax_w__ shape is [rnn_size, vocab_size],whihc is [128x65] in our case. Threfore, we have a fully connected layer on top of LSTM cells, which help us to decode the next charachter. We can use the __softmax(output * softmax_w + softmax_b)__ for this purpose. The shape of the matrixis would be:

softmax([60x50x128]x[128x65]+[1x65]) = [60x50x65]

We can do it step-by-step:

In [0]:
output = tf.reshape(tf.concat( outputs,1), [-1, rnn_size])
output

<tf.Tensor 'Reshape:0' shape=(3000, 128) dtype=float32>

In [0]:
logits = tf.matmul(output, softmax_w) + softmax_b
logits

<tf.Tensor 'add:0' shape=(3000, 65) dtype=float32>

In [0]:
probs = tf.nn.softmax(logits)
probs

<tf.Tensor 'Softmax:0' shape=(3000, 65) dtype=float32>

Here is the probablity of the next chracter in all batches:

In [0]:
session.run(tf.global_variables_initializer())
session.run(probs,feed_dict={input_data:x})

array([[0.01351181, 0.01894985, 0.01724928, ..., 0.01401815, 0.01509671,
        0.01561719],
       [0.01207687, 0.016046  , 0.01560649, ..., 0.01688782, 0.01442331,
        0.01454099],
       [0.01538991, 0.01605703, 0.01702183, ..., 0.01485373, 0.01972784,
        0.01738826],
       ...,
       [0.01246773, 0.02733147, 0.01335132, ..., 0.01508607, 0.01616714,
        0.01762396],
       [0.01376728, 0.01850387, 0.02263522, ..., 0.02187546, 0.01835082,
        0.01668858],
       [0.01855796, 0.01469661, 0.01407986, ..., 0.01041665, 0.01255076,
        0.0137532 ]], dtype=float32)

Now, we are in the position to calculate the cost of training with __loss function__, and keep feedng the network to learn it. But, the question is: what the LSTM networks learn?

In [0]:
grad_clip =5.
tvars = tf.trainable_variables()
tvars

[<tf.Variable 'rnnlm/softmax_w:0' shape=(128, 65) dtype=float32_ref>,
 <tf.Variable 'rnnlm/softmax_b:0' shape=(65,) dtype=float32_ref>,
 <tf.Variable 'rnnlm/embedding:0' shape=(65, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/kernel:0' shape=(256, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/bias:0' shape=(128,) dtype=float32_ref>]

# All together
Now, let's put all of parts together in a class, and train the model:

In [0]:
class LSTMModel():
    def __init__(self,sample=False):
        rnn_size = 128 # size of RNN hidden state vector
        batch_size = 60 # minibatch size, i.e. size of dataset in each epoch
        seq_length = 50 # RNN sequence length
        num_layers = 2 # number of layers in the RNN
        vocab_size = 65
        grad_clip = 5.
        if sample:
            print(">> sample mode:")
            batch_size = 1
            seq_length = 1
        # The core of the model consists of an LSTM cell that processes one char at a time and computes probabilities of the possible continuations of the char. 
        basic_cell = tf.contrib.rnn.BasicRNNCell(rnn_size)
        # model.cell.state_size is (128, 128)
        self.stacked_cell = tf.contrib.rnn.MultiRNNCell([basic_cell] * num_layers)

        self.input_data = tf.placeholder(tf.int32, [batch_size, seq_length], name="input_data")
        self.targets = tf.placeholder(tf.int32, [batch_size, seq_length], name="targets")
        # Initial state of the LSTM memory.
        # The memory state of the network is initialized with a vector of zeros and gets updated after reading each char. 
        self.initial_state = stacked_cell.zero_state(batch_size, tf.float32) #why batch_size

        with tf.variable_scope('rnnlm_class1'):
            softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) #128x65
            softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 1x65
            with tf.device("/cpu:0"):
                embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  #65x128
                inputs = tf.split(tf.nn.embedding_lookup(embedding, self.input_data), seq_length, 1)
                inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
                #inputs = tf.split(em, seq_length, 1)
                
                


        # The value of state is updated after processing each batch of chars.
        outputs, last_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, self.initial_state, self.stacked_cell, loop_function=None, scope='rnnlm_class1')
        output = tf.reshape(tf.concat(outputs,1), [-1, rnn_size])
        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)
        loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([self.logits],
                [tf.reshape(self.targets, [-1])],
                [tf.ones([batch_size * seq_length])],
                vocab_size)
        self.cost = tf.reduce_sum(loss) / batch_size / seq_length
        self.final_state = last_state
        self.lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),grad_clip)
        optimizer = tf.train.AdamOptimizer(self.lr)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))
    
    
    def sample(self, sess, chars, vocab, num=200, prime='The ', sampling_type=1):
        state = sess.run(self.stacked_cell.zero_state(1, tf.float32))
        #print state
        for char in prime[:-1]:
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [state] = sess.run([self.final_state], feed)

        def weighted_pick(weights):
            t = np.cumsum(weights)
            s = np.sum(weights)
            return(int(np.searchsorted(t, np.random.rand(1)*s)))

        ret = prime
        char = prime[-1]
        for n in range(num):
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [probs, state] = sess.run([self.probs, self.final_state], feed)
            p = probs[0]

            if sampling_type == 0:
                sample = np.argmax(p)
            elif sampling_type == 2:
                if char == ' ':
                    sample = weighted_pick(p)
                else:
                    sample = np.argmax(p)
            else: # sampling_type == 1 default:
                sample = weighted_pick(p)

            pred = chars[sample]
            ret += pred
            char = pred
        return ret

### Creating the LSTM object
Now we create a LSTM model:

In [0]:
with tf.variable_scope("rnn"):
    model = LSTMModel()

W0809 02:44:12.747869 140666729650048 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/clip_ops.py:286: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


# Train usinng LSTMModel class
We can train our model through feeding batches:

In [0]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(num_epochs): # num_epochs is 5 for test, but should be higher
        sess.run(tf.assign(model.lr, learning_rate * (decay_rate ** e)))
        data_loader.reset_batch_pointer()
        state = sess.run(model.initial_state) # (2x[60x128])
        for b in range(data_loader.num_batches): #for each batch
            start = time.time()
            x, y = data_loader.next_batch()
            feed = {model.input_data: x, model.targets: y, model.initial_state:state}
            train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
            end = time.time()
        print("{}/{} (epoch {}), train_loss = {:.3f}, time/batch = {:.3f}" \
                .format(e * data_loader.num_batches + b, num_epochs * data_loader.num_batches, e, train_loss, end - start))
        with tf.variable_scope("rnn", reuse=True):
            sample_model = LSTMModel(sample=True)
            print sample_model.sample(sess, data_loader.chars , data_loader.vocab, num=50, prime='The ', sampling_type=1)
            print '----------------------------------'

370/46375 (epoch 0), train_loss = 1.934, time/batch = 0.013
>> sample mode:
The lif: he ster thre thuek.

SATY UTESO
BENTUL:
I, of
----------------------------------
741/46375 (epoch 1), train_loss = 1.779, time/batch = 0.013
>> sample mode:
The are
sucker, there my loath, abodes
I water's denw-
----------------------------------
1112/46375 (epoch 2), train_loss = 1.704, time/batch = 0.013
>> sample mode:
The it. Heret with, nesty it
hum? I at iur not that is
----------------------------------
1483/46375 (epoch 3), train_loss = 1.656, time/batch = 0.013
>> sample mode:
The other you bern;
Qur not.

GREY:
Set that happile
T
----------------------------------
1854/46375 (epoch 4), train_loss = 1.623, time/batch = 0.013
>> sample mode:
The ears: the must I saces son,
I'll thou was pitife
T
----------------------------------
2225/46375 (epoch 5), train_loss = 1.598, time/batch = 0.013
>> sample mode:
The wenfer tell..

WARWICK:
Nobbawted that not proves 
----------------------------------


# Thanks for completing this lesson!
Created by: <a href = "https://linkedin.com/in/saeedaghabozorgi"> Saeed Aghabozorgi </a></h4>
This code is based on this [blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness/), and the code is an step-by-step implimentation of the [character-level implimentation](https://github.com/crazydonkey200/tensorflow-char-rnn).

<hr>

<p>Copyright &copy; 2017 IBM <a href="https://cognitiveclass.ai/?utm_source=ML0151&utm_medium=lab&utm_campaign=cclab">IBM Cognitive Class</a>. This notebook and its source code are released under the terms of the <a href="https://cognitiveclass.ai/mit-license/">MIT License</a>.</p>