# Char RNN From Scratch - Sherlock Holmes

In this notebook, I'll build a character-wise RNN, train it on the complete Sherlock Holmes, and use it to generate new text one character at a time.

References:<br />
Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)<br />
Karpathy's char-rnn [implementation in Torch](https://github.com/karpathy/char-rnn)<br />
r2rt's [Recurrent Neural Networks in TensorFlow II](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html)<br />
Sherjil Ozair's [char-rnn-tensorflow](https://github.com/sherjilozair/char-rnn-tensorflow)


In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

Load the text file and convert it into integers for our network to use.

In [2]:
with open('cnus.txt', 'r') as f:
    text=f.read()
vocab = sorted(set(text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

Let's check out the first 500 characters and their integer encodings to make sure everything is peachy.

In [3]:
text[:500]

'\n\n\n\n                          THE COMPLETE SHERLOCK HOLMES\n\n                               Arthur Conan Doyle\n\n\n\n                                Table of contents\n\n               A Study In Scarlet\n\n               The Sign of the Four\n\n                  The Adventures of Sherlock Holmes\n               A Scandal in Bohemia\n               The Red-Headed League\n               A Case of Identity\n               The Boscombe Valley Mystery\n               The Five Orange Pips\n               The Man wit'

In [4]:
encoded[:500]

array([ 0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, 45, 33, 30,  1,
       28, 40, 38, 41, 37, 30, 45, 30,  1, 44, 33, 30, 43, 37, 40, 28, 36,
        1, 33, 40, 37, 38, 30, 44,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1, 26, 72, 74, 62, 75, 72,  1, 28, 69, 68, 55,
       68,  1, 29, 69, 79, 66, 59,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1, 45, 55, 56, 66, 59,  1, 69, 60,
        1, 57, 69, 68, 74, 59, 68, 74, 73,  0,  0,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1, 26,  1, 44, 74, 75, 58, 79,  1,
       34, 68,  1, 44, 57, 55, 72, 66, 59, 74,  0,  0,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1, 45, 62, 59,  1, 44, 63, 61,
       68,  1, 69, 60,  1

Check the length of our vocabulary list.

In [5]:
len(vocab)

97
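
To make the mapping concrete, here is a small sketch of the same encode/decode round trip on a toy string. The `toy_*` names and the string itself are made up for illustration and are not part of the notebook's data.

```python
# Toy string standing in for the book text (an assumption for illustration)
toy_text = "hello holmes"

# Same recipe as above: sorted unique characters, then two lookup tables
toy_vocab = sorted(set(toy_text))
toy_vocab_to_int = {c: i for i, c in enumerate(toy_vocab)}
toy_int_to_vocab = dict(enumerate(toy_vocab))

# Encode to integers, then decode back to characters
toy_encoded = [toy_vocab_to_int[c] for c in toy_text]
toy_decoded = ''.join(toy_int_to_vocab[i] for i in toy_encoded)

print(toy_vocab)
print(toy_encoded)
print(toy_decoded == toy_text)
```

Because the vocabulary is sorted, the space character gets the lowest index here, which matches the runs of `1`s (spaces) and `0`s (newlines) in the encoded book text above.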

## Making mini-batches for training

In [7]:
def get_batches(arr, batch_size, n_steps):
    '''Create a generator that returns batches of size
       batch_size x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    chars_per_batch = batch_size * n_steps
    n_batches = len(arr)//chars_per_batch
    
    # Keep only enough characters to make full batches
    arr = arr[:n_batches * chars_per_batch]
    
    # Reshape into batch_size rows
    arr = arr.reshape((batch_size, -1))
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y_temp = arr[:, n+1:n+n_steps+1]
        
        y = np.zeros(x.shape, dtype=x.dtype)
        y[:,:y_temp.shape[1]] = y_temp
        
        yield x, y

In [8]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)

In [9]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[ 0  0  0  0  1  1  1  1  1  1]
 [ 1  1  1  1  1 56 66 69 69 58]
 [74  1 77 55 73  1 72 59 67 69]
 [44 74 72 59 59 74 11  3  0  0]
 [68  1 62 63 73  1 72 69 69 67]
 [ 1  1 60 69 66 66 69 77 59 58]
 [ 0  1  1  1  1  1 63 68 73 59]
 [ 1  1  1  1 66 69 69 65 59 58]
 [69 66 69 75 72 59 58  1 56 72]
 [55 72 58 66 79  1 59 68 69 75]]

y
 [[ 0  0  0  1  1  1  1  1  1  1]
 [ 1  1  1  1 56 66 69 69 58  1]
 [ 1 77 55 73  1 72 59 67 69 76]
 [74 72 59 59 74 11  3  0  0  1]
 [ 1 62 63 73  1 72 69 69 67  1]
 [ 1 60 69 66 66 69 77 59 58  1]
 [ 1  1  1  1  1 63 68 73 59 72]
 [ 1  1  1 66 69 69 65 59 58  1]
 [66 69 75 72 59 58  1 56 72 63]
 [72 58 66 79  1 59 68 69 75 61]]
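
To see the reshaping and the one-step shift on something small, here is a list-based sketch that mirrors `get_batches` on the integers 0 to 19. `toy_batches` is a hypothetical helper written for this illustration; it uses plain lists instead of NumPy so it runs on its own.

```python
def toy_batches(seq, batch_size, n_steps):
    """List-based mirror of get_batches, for illustration only."""
    chars_per_batch = batch_size * n_steps
    n_batches = len(seq) // chars_per_batch

    # Keep only enough items to make full batches
    seq = seq[:n_batches * chars_per_batch]

    # Split into batch_size contiguous rows (the reshape step)
    row_len = len(seq) // batch_size
    rows = [seq[r * row_len:(r + 1) * row_len] for r in range(batch_size)]

    for n in range(0, row_len, n_steps):
        x = [row[n:n + n_steps] for row in rows]
        # Targets are the inputs shifted by one; pad with 0 where a row runs out
        y = [(row[n + 1:n + n_steps + 1] + [0] * n_steps)[:n_steps] for row in rows]
        yield x, y

batches_demo = list(toy_batches(list(range(20)), batch_size=2, n_steps=5))
first_x, first_y = batches_demo[0]
last_x, last_y = batches_demo[-1]
print(first_x, first_y)
print(last_x, last_y)
```

Each row is a contiguous slice of the text, so the second batch picks up exactly where the first one stopped. That is what makes it reasonable to carry the LSTM state from one batch to the next during training. The last target in the final batch has nothing to shift in, so it is left as 0.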


## Building the model


### Inputs

Create our input placeholders for the training data and the targets. We'll also create a placeholder for dropout layers called `keep_prob`.

In [10]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
    targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs, targets, keep_prob

### LSTM Cell

The `build_lstm` function creates the LSTM cells we'll use in the hidden layers, along with the initial (zero) state.

In [11]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    ### Build the LSTM Cell
    
    def build_cell(lstm_size, keep_prob):
        # Use a basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        
        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return drop
    
    
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state
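
The dropout wrapper is controlled by `keep_prob`. As a rough sketch of what that number means, here is dropout in plain Python: each value is kept with probability `keep_prob` and scaled by `1 / keep_prob`, otherwise zeroed. `dropout_sketch` is a hypothetical helper for illustration, not the TensorFlow implementation.

```python
import random

def dropout_sketch(values, keep_prob, rng):
    """Inverted dropout: zero each value with probability 1 - keep_prob
    and scale the survivors by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

# Training-style call: roughly half the values are dropped, the rest doubled
train_out = dropout_sketch(activations, keep_prob=0.5, rng=rng)

# Sampling-style call: keep_prob of 1.0 leaves everything untouched
sample_out = dropout_sketch(activations, keep_prob=1.0, rng=rng)

print(train_out)
print(sample_out)
```

This is why `keep_prob` is a placeholder rather than a constant: we feed a value below 1 while training and 1.0 when generating text.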

### RNN Output

Create the output layer. We connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character.

In [12]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: Output tensor from the LSTM layers
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # That is, the shape should be batch_size*num_steps rows by lstm_size columns
    seq_output = tf.concat(lstm_output, axis=1)
    x = tf.reshape(seq_output, [-1, in_size])
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name='predictions')
    
    return out, logits
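
As a small illustration of the last step, here is softmax in plain Python applied to one row of made-up logits. `softmax_sketch` is a hypothetical helper; the notebook itself relies on `tf.nn.softmax`.

```python
import math

def softmax_sketch(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; the result is unchanged
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One row of logits for a pretend 3-character vocabulary
probs = softmax_sketch([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest logit gets the largest probability, and the row sums to 1, so each row of `out` can be read as a distribution over the next character.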

### Training loss

Use the logits and targets to calculate the softmax cross-entropy loss. First we one-hot encode the targets, then reshape the one-hot targets into a 2D tensor with size $(M*N) \times C$, where $M*N$ is the batch size times the number of steps (one row per character in the batch) and $C$ is the number of classes/characters we have. Since the LSTM outputs are reshaped and run through a fully connected layer with $C$ units, the logits will also have size $(M*N) \times C$.

Then we run the logits and targets through `tf.nn.softmax_cross_entropy_with_logits` and find the mean to get the loss.

In [13]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per batch_size per step
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
    
    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    loss = tf.reduce_mean(loss)
    return loss
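
With one-hot targets, the cross-entropy for each row reduces to the negative log-probability of the correct character. Here is a plain-Python sketch of that calculation on two made-up rows of logits; `cross_entropy_sketch` is a hypothetical helper, not the TensorFlow op.

```python
import math

def cross_entropy_sketch(logits, target):
    """Softmax cross-entropy for one row: -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

# Two rows of logits for a pretend 3-character vocabulary
rows = [[2.0, 1.0, 0.1],   # fairly confident in class 0
        [0.0, 0.0, 0.0]]   # uniform guess
targets = [0, 2]

losses = [cross_entropy_sketch(r, t) for r, t in zip(rows, targets)]
mean_loss = sum(losses) / len(losses)
print(losses)
print(mean_loss)
```

A uniform guess over $C$ classes costs $\ln C$, so for our 97 characters a network that has learned nothing would sit near $\ln 97 \approx 4.57$. That gives a baseline for reading the training losses.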

### Optimizer

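Build the optimizer with gradient clipping to keep the gradients from exploding. `tf.clip_by_global_norm` computes the combined norm of all the gradients and, if it is larger than the `grad_clip` threshold, scales every gradient down by the same factor so the combined norm equals the threshold. This ensures the gradients never grow overly large. (Clipping doesn't help with vanishing gradients; the LSTM cells are what mitigate those.) Then we use an `AdamOptimizer` for the learning step.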

In [14]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold for the global norm of the gradients
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    return optimizer
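
Here is a plain-Python sketch of clipping by global norm, following the formula in the TensorFlow docs (each gradient is multiplied by `clip_norm / max(global_norm, clip_norm)`). `clip_global_norm_sketch` and the gradient values are made up for illustration.

```python
import math

def clip_global_norm_sketch(grads, clip_norm):
    """Rescale a list of gradient vectors so their combined norm is at most clip_norm."""
    # Global norm: square root of the sum of squares over all gradients
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Scale is 1.0 when the norm is already within the threshold
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in vec] for vec in grads], global_norm

# Pretend gradients for two variables; combined norm is sqrt(9 + 16 + 144) = 13
grads = [[3.0, 4.0], [12.0]]
clipped, norm = clip_global_norm_sketch(grads, clip_norm=5.0)
clipped_norm = math.sqrt(sum(g * g for vec in clipped for g in vec))

print(norm)
print(clipped)
print(clipped_norm)
```

All gradients shrink by the same factor, so the update direction is preserved and only its size changes. Gradients whose global norm is already below the threshold pass through unchanged.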

### Put the network together

In [15]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When sampling, we feed the network one character at a time
        if sampling:
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        # Run each sequence step through the RNN and collect the outputs
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

## Hyperparameters

These are the hyperparameters we need to tune for the network:

* `batch_size` - Number of sequences running through the network in one pass
* `num_steps` - Number of characters in the sequence the network is trained on
* `lstm_size` - The number of units in the hidden layers
* `num_layers` - Number of hidden LSTM layers to use
* `learning_rate` - Learning rate for training
* `keep_prob` - The dropout keep probability when training

For guidance on choosing these, see Andrej Karpathy's advice on [training](https://github.com/karpathy/char-rnn#tips-and-tricks).


In [16]:
batch_size = 100        # Sequences per batch
num_steps = 100         # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001   # Learning rate
keep_prob = 0.5         # Dropout keep probability

## Training

To train, we pass inputs and targets into the network and run the optimizer. We also get back the final LSTM state for the mini-batch and feed it in as the initial state for the next one, so each batch continues from where the previous batch left off. A checkpoint is saved every `save_every_n` iterations.

In [18]:
epochs = 20
# Print losses every N iterations
print_every_n = 50

# Save every N iterations
save_every_n = 200

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            if (counter % print_every_n == 0):
                end = time.time()
                print('Epoch: {}/{}... '.format(e+1, epochs),
                      'Training Step: {}... '.format(counter),
                      'Training loss: {:.4f}... '.format(batch_loss),
                      '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

Epoch: 1/20...  Training Step: 50...  Training loss: 3.0730...  0.4601 sec/batch
Epoch: 1/20...  Training Step: 100...  Training loss: 2.9392...  0.4634 sec/batch
Epoch: 1/20...  Training Step: 150...  Training loss: 2.7657...  0.4595 sec/batch
Epoch: 1/20...  Training Step: 200...  Training loss: 2.4986...  0.4603 sec/batch
Epoch: 1/20...  Training Step: 250...  Training loss: 2.2682...  0.4621 sec/batch
Epoch: 1/20...  Training Step: 300...  Training loss: 2.2063...  0.4618 sec/batch
Epoch: 2/20...  Training Step: 350...  Training loss: 2.1164...  0.4598 sec/batch
Epoch: 2/20...  Training Step: 400...  Training loss: 2.0545...  0.4614 sec/batch
Epoch: 2/20...  Training Step: 450...  Training loss: 2.0349...  0.4609 sec/batch
Epoch: 2/20...  Training Step: 500...  Training loss: 1.9373...  0.4620 sec/batch
Epoch: 2/20...  Training Step: 550...  Training loss: 1.8834...  0.4621 sec/batch
Epoch: 2/20...  Training Step: 600...  Training loss: 1.8656...  0.4621 sec/batch
Epoch: 2/20...  T

#### Check the saved checkpoints

In [19]:
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/i6760_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3200_l512.ckpt"
all_model_checkpoint_pa
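
The final checkpoint name gives us a quick sanity check on the data size. This sketch assumes `counter` advances once per batch and that every epoch has the same number of batches, which matches the training loop above.

```python
batch_size = 100
num_steps = 100
epochs = 20
final_step = 6760  # taken from the name "checkpoints/i6760_l512.ckpt"

# Characters consumed by one batch
chars_per_batch = batch_size * num_steps

# Batches (training steps) in one pass over the text
batches_per_epoch = final_step // epochs

# Characters actually used per epoch, after get_batches drops the remainder
chars_per_epoch = batches_per_epoch * chars_per_batch

print(chars_per_batch)    # 10000
print(batches_per_epoch)  # 338
print(chars_per_epoch)    # 3380000
```

So each epoch covers about 3.38 million characters, and `get_batches` throws away fewer than 10,000 characters at the end of the text.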

## Sampling

Now we'll use the trained network to generate new text. The idea is that we pass in a character and the network predicts the next one. We then feed that prediction back in to predict the character after it, and keep going until we have as much text as we want. To reduce noise and make things a little less random, we only choose each new character from the top N most likely characters.

In [20]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
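
Here is the same top-N idea in plain Python, on a made-up probability vector. `pick_top_n_sketch` is a hypothetical stand-in that uses `random.choices` instead of NumPy, so it can run on its own.

```python
import random

def pick_top_n_sketch(probs, top_n=5, rng=random):
    """Sample an index from the top_n most likely entries of probs."""
    # Indices of the top_n largest probabilities
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_n]
    # random.choices normalizes the weights for us
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

# Pretend softmax output over a 5-character vocabulary
probs = [0.05, 0.4, 0.3, 0.15, 0.1]

random.seed(0)
draws = [pick_top_n_sketch(probs, top_n=2) for _ in range(1000)]
print(sorted(set(draws)))
```

With `top_n=2` only indices 1 and 2 can ever be drawn, however many samples we take. A smaller `top_n` gives safer but more repetitive text; a larger one gives more variety and more mistakes.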

In [21]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = CharRNN(len(vocab), lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)
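
The structure of `sample` is easier to see with the network swapped out for something trivial. In this sketch a table of bigram counts from a toy string stands in for the LSTM (so there is no hidden state to carry along), but the loop is the same: start from the prime, predict, sample, append, and feed the new character back in. All names here are made up for illustration.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for the book text (an assumption for illustration)
corpus = "the thin thief then thought"

# "Model": how often each character follows each other character
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def generate_sketch(prime, n_chars, seed=0):
    rng = random.Random(seed)
    out = list(prime)
    c = out[-1]  # the last prime character drives the first prediction
    for _ in range(n_chars):
        followers = counts[c]
        if not followers:
            break  # nothing ever followed this character in the corpus
        chars = list(followers)
        weights = [followers[ch] for ch in chars]
        # Sample the next character and feed it back in on the next pass
        c = rng.choices(chars, weights=weights, k=1)[0]
        out.append(c)
    return ''.join(out)

generated = generate_sketch("th", 20)
print(generated)
```

The real `sample` differs in one important way: it passes `new_state` back into the network at every step, so the prediction depends on everything generated so far rather than on the previous character alone.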

Pass in the path to a checkpoint and sample from the network.

In [22]:
tf.train.latest_checkpoint('checkpoints')

'checkpoints/i6760_l512.ckpt'

In [23]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i6760_l512.ckpt
Farlan," he cried. "It's an
     information that he has set one of these, and the conclusions that
     I had no doubt. I have not brought at the stript all where he
     were that it was not to start in and should show her to--within a
     station which has been sent in that dark boy, who had suspicions
     which has been a seroratie, and there he was of his house. I would
     have a confluence of money as to the side of the maid that, went in
     the house of an instructive. He is a state, with a cabbin of
     spreadful man, was the matter of any chorting silence. It was too dark
     to see it, and then to me, and I could not see him to think.

     "I think we have a clinned to him as if you can discover, I wished to
     still think what the case were a secret was from the trouser of his father
     and his hand with as were sure at the sight of the singular sorn of
     the torthest streamed of her which 

In [24]:
checkpoint = 'checkpoints/i200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i200_l512.ckpt
Fareserad tothit oee ote tir hire hhae  oot ate hed hretat thar teon.  aod  in ot ee aon. I waed ha  oe ane tine
     whe at aon tot in atee
     andthe the eth es ortas tin thhe  ised hes aot he thar end oote ao  hhe
     an she eretha  ee os et eo tooe  an hee on ter ar hhe  an e ae  on than eso an toee oo ee oter too hhe son hte he te ot ae e ot tin  hoe  oo  aot hot he on hae too e e oe ortee os oe hhre
     oe san ee astas taod the e hand ane the an oe tho e ondton ha soet hit tit tit ot ot oor the
     at at or hee
 
     I ansd ane hon  ho on et an on ao e ao he ton he tot an the eest an  tertas et han sos toe  ateth hes an thord tees ser ersote hos tere on at too he os erte hes ote toe eroret hh e and areded ho  tee teo teot ho eed osree thoe
     teot an he ero ot ot hae
     hae tae  ther ton ot ir he one.  on oe hin hat ao too  hh ead tat tat e ot oe  the e as oot oo tee at hed tat har ae or on an tit e ot 

In [25]:
checkpoint = 'checkpoints/i600_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i600_l512.ckpt
Farryould of
     heverent to sally has bould, were, as waress toly out of out to
     as is a merasion the marke on the remat
     wolled to to to then
     which, sain I see the tild trees at ance to
     with a mearise wercer.


     "The sarks,"
    aris."

     "In's thet the sole of the meare, have her hind of throm."

     "I have the moren. Ho dess it is anly and at to the
     a colm at on the markent of the rows or
     cone of a carling the
     at a mas hind trased tho dight, was and the lear
     seress it ande to so the cas into his, wiste
     hese and wat and has hourder.

     "This selo have there wat ta ser at and the
     some had the soull. There hive to sert here sarling to
     the
     shan hored."

     "Yes, watery."

     "I was hid in andedse that has and here ofer
     sere and the courser, his whan I cand it tiss
     thit thas in.

     "There wost on the same our he sar he has there of 

In [26]:
checkpoint = 'checkpoints/i1200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i1200_l512.ckpt
Farrancy;
      a fout of the matter. That his care.

     "I am no saight all some him. How, you an insent that I sam at the
     solise of that there of the croubses. We comes at his concinally
     stopet and wat they hooke and the betther the poce of this
     complain to be when the propinite off once this trach were a stree

     and seep to the morres of his stires. I with him. I have not his bright
     of the pluan and store thing. Thenes the serven and any sen in hore and
     the stare, which was surget a latel and him would be a more wasting it
     the soot," seid he. I was a pringioned, ard he were this mential
     sure of the man when he seored him all whon he had the past, but I
     sheally be it is, and as I surd to be a most of the pack, with a
     plase was the stare. How have station the mark in the ment of the san
     singer."

     "Thit he have the sirning--an the poor of a singular, then i

### Modify the network - add an embedding layer

Instead of one-hot encoding the inputs, we can add an embedding layer. This lets the model learn a dense, distributed representation of each character, which can help improve its ability to generalize to new sequences of characters.

In [50]:
def get_embed(input_data, vocab_size, embedding_size):
    """
    Create embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of characters in the vocabulary.
    :param embedding_size: Number of embedding dimensions
    :return: Embedded input.
    """
    embedding = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1, 1))
    embed = tf.nn.embedding_lookup(embedding, input_data)
    
    return embed
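
An embedding lookup just picks rows out of the embedding matrix, which gives the same result as multiplying a one-hot vector by that matrix. Here is a plain-Python sketch with a made-up 3-character, 2-dimensional embedding; the helper names are hypothetical.

```python
# Pretend embedding matrix: 3 characters, 2 embedding dimensions
embedding = [[0.1, 0.2],
             [0.3, 0.4],
             [0.5, 0.6]]

def one_hot(i, size):
    return [1.0 if j == i else 0.0 for j in range(size)]

def vec_times_matrix(v, matrix):
    # Row vector times matrix, written out with plain loops
    return [sum(v[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(len(matrix[0]))]

char_ids = [2, 0, 2]

# Lookup: index straight into the matrix
looked_up = [embedding[i] for i in char_ids]

# Equivalent: one-hot encode, then multiply by the matrix
via_one_hot = [vec_times_matrix(one_hot(i, 3), embedding) for i in char_ids]

print(looked_up)
print(via_one_hot)
```

So the embedding layer is a learned linear layer on top of the one-hot input, computed without ever building the one-hot vectors. In this notebook it also changes the size of the LSTM input from 97 (one-hot) to `embedding_size`.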

In [51]:
def build_inputs_with_embed(batch_size, num_steps, embedding_size):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        embedding_size: Dimensionality of the embedding vector
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
    targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
    
    # Define embedding layer
    embed = get_embed(inputs, len(vocab), embedding_size)
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs, targets, embed, keep_prob

In [52]:
class CharRNN_modified:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, embedding_size=128, sampling=False):
    
        # When sampling, we feed the network one character at a time
        if sampling:
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.embed, self.keep_prob = build_inputs_with_embed(batch_size, num_steps, embedding_size)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # Run each sequence step through the embedding layer
        x = self.embed
        
        # Run each sequence step through the RNN and collect the outputs
        outputs, state = tf.nn.dynamic_rnn(cell, x, initial_state=self.initial_state)
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)


In [53]:
embedding_size = 256

epochs = 20
# Print losses every N iterations
print_every_n = 50

# Save every N iterations
save_every_n = 200

model = CharRNN_modified(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate, embedding_size=embedding_size)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            if (counter % print_every_n == 0):
                end = time.time()
                print('Epoch: {}/{}... '.format(e+1, epochs),
                      'Training Step: {}... '.format(counter),
                      'Training loss: {:.4f}... '.format(batch_loss),
                      '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

Epoch: 1/20...  Training Step: 50...  Training loss: 2.3888...  0.4892 sec/batch
Epoch: 1/20...  Training Step: 100...  Training loss: 1.9972...  0.4861 sec/batch
Epoch: 1/20...  Training Step: 150...  Training loss: 1.8225...  0.4906 sec/batch
Epoch: 1/20...  Training Step: 200...  Training loss: 1.7021...  0.4872 sec/batch
Epoch: 1/20...  Training Step: 250...  Training loss: 1.6357...  0.4898 sec/batch
Epoch: 1/20...  Training Step: 300...  Training loss: 1.5928...  0.4893 sec/batch
Epoch: 2/20...  Training Step: 350...  Training loss: 1.5419...  0.4906 sec/batch
Epoch: 2/20...  Training Step: 400...  Training loss: 1.4862...  0.4889 sec/batch
Epoch: 2/20...  Training Step: 450...  Training loss: 1.4750...  0.4907 sec/batch
Epoch: 2/20...  Training Step: 500...  Training loss: 1.3771...  0.4895 sec/batch
Epoch: 2/20...  Training Step: 550...  Training loss: 1.3709...  0.4912 sec/batch
Epoch: 2/20...  Training Step: 600...  Training loss: 1.3867...  0.4869 sec/batch
Epoch: 2/20...  T

In [54]:
tf.train.latest_checkpoint('checkpoints')

'checkpoints/i6760_l512.ckpt'

In [56]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = CharRNN_modified(len(vocab), lstm_size=lstm_size, sampling=True, embedding_size=256)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)


In [57]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i6760_l512.ckpt
Farndame and I
     had seen the possession of them, sir. I thought how was it, I was
     that I could not have had me thrown outside. I have not the signs of a
     child. I can hear that the first time that you have already caused all
     the time, and that in the way there is one thing about this. If I
     waited they can take the case in the station in the hall and all which
     I can speak to-day, and I can gather yourself that I would not trust
     me a little that we may still see the criminal.

     "This is the time of such a stare at all," he remarked with a
     low, prisoner. "They will be the second side of her at the
     sound of the meaning," said Holmes. "I am no possible path as you
     are in the police, so that it was one of them to tell me only a
     lady's secrecy."

     "I have not a case which I had not. When they are in the county to
     them all this in this middle--and your profess

In [58]:
checkpoint = 'checkpoints/i200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i200_l512.ckpt
Fard to that the sard and. The comined who whele he strould how her had had no though a sent which way to
     asterding if
     tather is.

     "There. I was an this mence as
     the can at indore to my a tour had been of this astion to me have at the colman, before where have to the shoust was a trince. This was the coul to me, and there all this sating of
     his seet who whay the
     the sene what is he.

     "As him tell on to take is."

    "Whet as she sal to as a sermar of the whough as she was a serponed of my and any were to me a telpested tation and to my of is in all the with that I shuld have tears there is when he was all the dong, but the some of he searth and, as to
    sare a seet, and he shough is this
     an all striet, bourding that hat the his stertion an sence as as his had on he with ald, so misting other thene with a lite. He was he as
     betring that him
     the chare alone. Then they

In [59]:
checkpoint = 'checkpoints/i600_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i600_l512.ckpt
Farrren of his
     pulicam into my horse. I've the time of attaced. He his arting of
     the matter. As was in his last attation. A corning what, we have
     the sort on the sternel of the store with shore that when the
     diractic and went to some one was the strench. A man hust to see
     the son who wished or his cold shorsed of the door, and three
     thrown one in the company of he had been a singrion would
     heard it that the sight was her sine with a can of the made,
     and there was a morning that. That was the moors of the morning
     had the dreak of them. That the did seen and seen that what he
     seen the papers were in the said which which take the perpor and and
     assorting and held. He was now, said he at the sail that there is
     her and side--only was heard and hould the moor, the carrage when
     I shall be some treet. I am this her thought in the most had heard a
     pearous wa

In [60]:
checkpoint = 'checkpoints/i1200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i1200_l512.ckpt
Farryer.

     "It is this price that if you have say as you will be."

     "Well, then?" He comes. "It would have how that you ask to any
     sound. I am since here there are not interesting to be disappied,
     we have no security what we have a man who should be all that you
     were that?"

     "I have been comes what I have alragade all anything whose impression was
     already asked at the person in my head too, and his body arred
     to-thir one of the compression of the problem and took the passage and his
     hand, when there was the conclearently so to the matter. How could
     they had thought in the crime, but he was starting to bar me to
     any minutes of marriage of the persencion to this shoulder. The
     stretches were the case and the silence, and the silent was stole
     in his present."

     "At the man?"

     "And you say where were shilled," said the short and had threw
     the co