# Anna KaRNNa

In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based off of Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). Also, some information [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [13]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use.

In [38]:
with open('anna.txt', 'r') as f:
    text=f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

In [39]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [40]:
chars[:100]

array([ 8,  9, 51, 77,  3, 18, 10, 37, 71, 11, 11, 11, 56, 51, 77, 77, 79,
       37, 25, 51, 30, 62, 41, 62, 18, 53, 37, 51, 10, 18, 37, 51, 41, 41,
       37, 51, 41, 62, 35, 18, 82, 37, 18, 38, 18, 10, 79, 37,  2, 73,  9,
       51, 77, 77, 79, 37, 25, 51, 30, 62, 41, 79, 37, 62, 53, 37,  2, 73,
        9, 51, 77, 77, 79, 37, 62, 73, 37, 62,  3, 53, 37, 23, 81, 73, 11,
       81, 51, 79, 12, 11, 11, 65, 38, 18, 10, 79,  3,  9, 62, 73], dtype=int32)

Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.

Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.

The idea here is to make a 2D matrix where the number of rows is equal to the number of batches. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. This will keep 90% of the batches in the training set, the other 10% in the validation set.

In [41]:
def split_data(chars, batch_size, num_steps, split_frac=0.9):
    """ 
    Split character data into training and validation sets, inputs and targets for each set.
    
    Arguments
    ---------
    chars: character array
    batch_size: Size of examples in each of batch
    num_steps: Number of sequence steps to keep in the input and pass to the network
    split_frac: Fraction of batches to keep in the training set
    
    
    Returns train_x, train_y, val_x, val_y
    """
    
    slice_size = batch_size * num_steps
    n_batches = int(len(chars) / slice_size)
    
    # Drop the last few characters to make only full batches
    x = chars[: n_batches*slice_size]
    y = chars[1: n_batches*slice_size + 1]
    
    # Split the data into batch_size slices, then stack them into a 2D matrix 
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    
    # Now x and y are arrays with dimensions batch_size x n_batches*num_steps
    
    # Split into training and validation sets, keep the virst split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]
    
    return train_x, train_y, val_x, val_y

In [42]:
train_x, train_y, val_x, val_y = split_data(chars, 10, 200)

In [43]:
train_x.shape

(10, 178400)

In [44]:
train_x[:,:10]

array([[ 8,  9, 51, 77,  3, 18, 10, 37, 71, 11],
       [54, 73, 46, 37,  9, 18, 37, 30, 23, 38],
       [37, 63, 51,  3, 63,  9, 62, 73, 13, 37],
       [23,  3,  9, 18, 10, 37, 81, 23,  2, 41],
       [37,  3,  9, 18, 37, 41, 51, 73, 46,  0],
       [37, 69,  9, 10, 23,  2, 13,  9, 37, 41],
       [ 3, 37,  3, 23, 11, 46, 23, 12, 11, 11],
       [23, 37,  9, 18, 10, 53, 18, 41, 25,  4],
       [ 9, 51,  3, 37, 62, 53, 37,  3,  9, 18],
       [18, 10, 53, 18, 41, 25, 37, 51, 73, 46]], dtype=int32)

I'll write another function to grab batches out of the arrays made by split data. Here each batch will be a sliding window on these arrays with size `batch_size X num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will continue through on each batch.

In [45]:
def get_batch(arrs, num_steps):
    batch_size, slice_size = arrs[0].shape
    
    n_batches = int(slice_size/num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]

In [55]:
def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,
              learning_rate=0.001, grad_clip=5, sampling=False):
        
    if sampling == True:
        batch_size, num_steps = 1, 1

    tf.reset_default_graph()
    
    # Declare placeholders we'll feed into the graph
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
        x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')
    
    with tf.name_scope('targets'):
        targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
        y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')
        y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])
    
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    # Build the RNN layers
    with tf.name_scope("RNN_cells"):
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)
    
    with tf.name_scope("RNN_init_state"):
        initial_state = cell.zero_state(batch_size, tf.float32)

    # Run the data through the RNN layers
    with tf.name_scope("RNN_forward"):
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=initial_state)
    
    final_state = state
    
    # Reshape output so it's a bunch of rows, one row for each cell output
    with tf.name_scope('sequence_reshape'):
        seq_output = tf.concat(outputs, axis=1,name='seq_output')
        output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')
    
    # Now connect the RNN outputs to a softmax layer and calculate the cost
    with tf.name_scope('logits'):
        # my own experiment:
#        softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),
#                               name='softmax_w')
        softmax_w = tf.Variable(tf.random_uniform((lstm_size, num_classes), name='softmax_w'))
        softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')
        logits = tf.matmul(output, softmax_w) + softmax_b
        tf.summary.histogram('softmax_w', softmax_w)
        tf.summary.histogram('softmax_b', softmax_b)

    with tf.name_scope('predictions'):
        preds = tf.nn.softmax(logits, name='predictions')
        tf.summary.histogram('predictions', preds)
    
    with tf.name_scope('cost'):
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')
        cost = tf.reduce_mean(loss, name='cost')
        tf.summary.scalar('cost', cost)

    # Optimizer for training, using gradient clipping to control exploding gradients
    with tf.name_scope('train'):
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
        train_op = tf.train.AdamOptimizer(learning_rate)
        optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    merged = tf.summary.merge_all()
    
    # Export the nodes 
    export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',
                    'keep_prob', 'cost', 'preds', 'optimizer', 'merged']
    Graph = namedtuple('Graph', export_nodes)
    local_dict = locals()
    graph = Graph(*[local_dict[each] for each in export_nodes])
    
    return graph

## Hyperparameters

Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in the LSTM layers and the number of LSTM layers, respectively. Of course, making these bigger will improve the network's performance but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. Decrease the size of the network or decrease the dropout keep probability.

In [56]:
batch_size = 100
num_steps = 10
lstm_size = 10
num_layers = 2
learning_rate = 0.001

## Training

Time for training which is is pretty straightforward. Here I pass in some data, and get an LSTM state back. Then I pass that state back in to the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint.

In [57]:
!mkdir -p checkpoints/anna

In [58]:
epochs = 1
save_every_n = 100
train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)

model = build_rnn(len(vocab), 
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

saver = tf.train.Saver(max_to_keep=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./logs/2/train', sess.graph)
    test_writer = tf.summary.FileWriter('./logs/2/test')
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/anna20.ckpt')
    
    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    for e in range(epochs):
        
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            iteration = e*n_batches + b
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: 0.5,
                    model.initial_state: new_state}
            summary, batch_loss, new_state, _ = sess.run([model.merged, model.cost, 
                                                          model.final_state, model.optimizer], 
                                                          feed_dict=feed)
            loss += batch_loss
            end = time.time()
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))
            
            train_writer.add_summary(summary, iteration)
        
            if (iteration%save_every_n == 0) or (iteration == iterations):
                # Check performance, notice dropout has been set to 1
                val_loss = []
                new_state = sess.run(model.initial_state)
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: new_state}
                    summary, batch_loss, new_state = sess.run([model.merged, model.cost, 
                                                               model.final_state], feed_dict=feed)
                    val_loss.append(batch_loss)
                    
                test_writer.add_summary(summary, iteration)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                #saver.save(sess, "checkpoints/anna/i{}_l{}_{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))

Epoch 1/1  Iteration 1/1786 Training loss: 4.4191 0.0591 sec/batch
Epoch 1/1  Iteration 2/1786 Training loss: 4.4171 0.0272 sec/batch
Epoch 1/1  Iteration 3/1786 Training loss: 4.4151 0.0302 sec/batch
Epoch 1/1  Iteration 4/1786 Training loss: 4.4134 0.0265 sec/batch
Epoch 1/1  Iteration 5/1786 Training loss: 4.4122 0.0261 sec/batch
Epoch 1/1  Iteration 6/1786 Training loss: 4.4102 0.0284 sec/batch
Epoch 1/1  Iteration 7/1786 Training loss: 4.4088 0.0296 sec/batch
Epoch 1/1  Iteration 8/1786 Training loss: 4.4077 0.0311 sec/batch
Epoch 1/1  Iteration 9/1786 Training loss: 4.4062 0.0321 sec/batch
Epoch 1/1  Iteration 10/1786 Training loss: 4.4046 0.0412 sec/batch
Epoch 1/1  Iteration 11/1786 Training loss: 4.4032 0.0294 sec/batch
Epoch 1/1  Iteration 12/1786 Training loss: 4.4014 0.0333 sec/batch
Epoch 1/1  Iteration 13/1786 Training loss: 4.4000 0.0271 sec/batch
Epoch 1/1  Iteration 14/1786 Training loss: 4.3984 0.0267 sec/batch
Epoch 1/1  Iteration 15/1786 Training loss: 4.3967 0.0275

Epoch 1/1  Iteration 126/1786 Training loss: 3.9756 0.0317 sec/batch
Epoch 1/1  Iteration 127/1786 Training loss: 3.9727 0.0333 sec/batch
Epoch 1/1  Iteration 128/1786 Training loss: 3.9698 0.0221 sec/batch
Epoch 1/1  Iteration 129/1786 Training loss: 3.9665 0.0261 sec/batch
Epoch 1/1  Iteration 130/1786 Training loss: 3.9632 0.0271 sec/batch
Epoch 1/1  Iteration 131/1786 Training loss: 3.9599 0.0257 sec/batch
Epoch 1/1  Iteration 132/1786 Training loss: 3.9566 0.0306 sec/batch
Epoch 1/1  Iteration 133/1786 Training loss: 3.9537 0.0317 sec/batch
Epoch 1/1  Iteration 134/1786 Training loss: 3.9507 0.0360 sec/batch
Epoch 1/1  Iteration 135/1786 Training loss: 3.9479 0.0313 sec/batch
Epoch 1/1  Iteration 136/1786 Training loss: 3.9452 0.0273 sec/batch
Epoch 1/1  Iteration 137/1786 Training loss: 3.9428 0.0291 sec/batch
Epoch 1/1  Iteration 138/1786 Training loss: 3.9399 0.0276 sec/batch
Epoch 1/1  Iteration 139/1786 Training loss: 3.9371 0.0249 sec/batch
Epoch 1/1  Iteration 140/1786 Trai

Epoch 1/1  Iteration 251/1786 Training loss: 3.7146 0.0217 sec/batch
Epoch 1/1  Iteration 252/1786 Training loss: 3.7129 0.0293 sec/batch
Epoch 1/1  Iteration 253/1786 Training loss: 3.7116 0.0274 sec/batch
Epoch 1/1  Iteration 254/1786 Training loss: 3.7100 0.0259 sec/batch
Epoch 1/1  Iteration 255/1786 Training loss: 3.7085 0.0267 sec/batch
Epoch 1/1  Iteration 256/1786 Training loss: 3.7071 0.0201 sec/batch
Epoch 1/1  Iteration 257/1786 Training loss: 3.7057 0.0206 sec/batch
Epoch 1/1  Iteration 258/1786 Training loss: 3.7043 0.0254 sec/batch
Epoch 1/1  Iteration 259/1786 Training loss: 3.7031 0.0259 sec/batch
Epoch 1/1  Iteration 260/1786 Training loss: 3.7016 0.0305 sec/batch
Epoch 1/1  Iteration 261/1786 Training loss: 3.7003 0.0268 sec/batch
Epoch 1/1  Iteration 262/1786 Training loss: 3.6993 0.0306 sec/batch
Epoch 1/1  Iteration 263/1786 Training loss: 3.6978 0.0309 sec/batch
Epoch 1/1  Iteration 264/1786 Training loss: 3.6965 0.0234 sec/batch
Epoch 1/1  Iteration 265/1786 Trai

Epoch 1/1  Iteration 371/1786 Training loss: 3.5795 0.0256 sec/batch
Epoch 1/1  Iteration 372/1786 Training loss: 3.5786 0.0302 sec/batch
Epoch 1/1  Iteration 373/1786 Training loss: 3.5778 0.0249 sec/batch
Epoch 1/1  Iteration 374/1786 Training loss: 3.5769 0.0254 sec/batch
Epoch 1/1  Iteration 375/1786 Training loss: 3.5760 0.0280 sec/batch
Epoch 1/1  Iteration 376/1786 Training loss: 3.5752 0.0272 sec/batch
Epoch 1/1  Iteration 377/1786 Training loss: 3.5743 0.0317 sec/batch
Epoch 1/1  Iteration 378/1786 Training loss: 3.5735 0.0295 sec/batch
Epoch 1/1  Iteration 379/1786 Training loss: 3.5726 0.0323 sec/batch
Epoch 1/1  Iteration 380/1786 Training loss: 3.5717 0.0247 sec/batch
Epoch 1/1  Iteration 381/1786 Training loss: 3.5709 0.0222 sec/batch
Epoch 1/1  Iteration 382/1786 Training loss: 3.5702 0.0211 sec/batch
Epoch 1/1  Iteration 383/1786 Training loss: 3.5695 0.0251 sec/batch
Epoch 1/1  Iteration 384/1786 Training loss: 3.5688 0.0193 sec/batch
Epoch 1/1  Iteration 385/1786 Trai

Epoch 1/1  Iteration 493/1786 Training loss: 3.4944 0.0252 sec/batch
Epoch 1/1  Iteration 494/1786 Training loss: 3.4937 0.0318 sec/batch
Epoch 1/1  Iteration 495/1786 Training loss: 3.4931 0.0233 sec/batch
Epoch 1/1  Iteration 496/1786 Training loss: 3.4925 0.0211 sec/batch
Epoch 1/1  Iteration 497/1786 Training loss: 3.4918 0.0277 sec/batch
Epoch 1/1  Iteration 498/1786 Training loss: 3.4912 0.0371 sec/batch
Epoch 1/1  Iteration 499/1786 Training loss: 3.4907 0.0305 sec/batch
Epoch 1/1  Iteration 500/1786 Training loss: 3.4902 0.0307 sec/batch
Validation loss: 3.10919 Saving checkpoint!
Epoch 1/1  Iteration 501/1786 Training loss: 3.4896 0.0239 sec/batch
Epoch 1/1  Iteration 502/1786 Training loss: 3.4891 0.0236 sec/batch
Epoch 1/1  Iteration 503/1786 Training loss: 3.4884 0.0242 sec/batch
Epoch 1/1  Iteration 504/1786 Training loss: 3.4879 0.0250 sec/batch
Epoch 1/1  Iteration 505/1786 Training loss: 3.4873 0.0220 sec/batch
Epoch 1/1  Iteration 506/1786 Training loss: 3.4866 0.0213 

Epoch 1/1  Iteration 614/1786 Training loss: 3.4347 0.0242 sec/batch
Epoch 1/1  Iteration 615/1786 Training loss: 3.4343 0.0294 sec/batch
Epoch 1/1  Iteration 616/1786 Training loss: 3.4338 0.0247 sec/batch
Epoch 1/1  Iteration 617/1786 Training loss: 3.4334 0.0201 sec/batch
Epoch 1/1  Iteration 618/1786 Training loss: 3.4329 0.0229 sec/batch
Epoch 1/1  Iteration 619/1786 Training loss: 3.4324 0.0212 sec/batch
Epoch 1/1  Iteration 620/1786 Training loss: 3.4319 0.0269 sec/batch
Epoch 1/1  Iteration 621/1786 Training loss: 3.4316 0.0269 sec/batch
Epoch 1/1  Iteration 622/1786 Training loss: 3.4312 0.0306 sec/batch
Epoch 1/1  Iteration 623/1786 Training loss: 3.4308 0.0313 sec/batch
Epoch 1/1  Iteration 624/1786 Training loss: 3.4303 0.0286 sec/batch
Epoch 1/1  Iteration 625/1786 Training loss: 3.4299 0.0320 sec/batch
Epoch 1/1  Iteration 626/1786 Training loss: 3.4294 0.0310 sec/batch
Epoch 1/1  Iteration 627/1786 Training loss: 3.4290 0.0266 sec/batch
Epoch 1/1  Iteration 628/1786 Trai

Epoch 1/1  Iteration 734/1786 Training loss: 3.3871 0.0266 sec/batch
Epoch 1/1  Iteration 735/1786 Training loss: 3.3868 0.0312 sec/batch
Epoch 1/1  Iteration 736/1786 Training loss: 3.3864 0.0278 sec/batch
Epoch 1/1  Iteration 737/1786 Training loss: 3.3860 0.0317 sec/batch
Epoch 1/1  Iteration 738/1786 Training loss: 3.3857 0.0258 sec/batch
Epoch 1/1  Iteration 739/1786 Training loss: 3.3853 0.0220 sec/batch
Epoch 1/1  Iteration 740/1786 Training loss: 3.3850 0.0249 sec/batch
Epoch 1/1  Iteration 741/1786 Training loss: 3.3847 0.0381 sec/batch
Epoch 1/1  Iteration 742/1786 Training loss: 3.3843 0.0297 sec/batch
Epoch 1/1  Iteration 743/1786 Training loss: 3.3840 0.0226 sec/batch
Epoch 1/1  Iteration 744/1786 Training loss: 3.3836 0.0267 sec/batch
Epoch 1/1  Iteration 745/1786 Training loss: 3.3833 0.0264 sec/batch
Epoch 1/1  Iteration 746/1786 Training loss: 3.3829 0.0219 sec/batch
Epoch 1/1  Iteration 747/1786 Training loss: 3.3826 0.0228 sec/batch
Epoch 1/1  Iteration 748/1786 Trai

Epoch 1/1  Iteration 853/1786 Training loss: 3.3508 0.0262 sec/batch
Epoch 1/1  Iteration 854/1786 Training loss: 3.3506 0.0361 sec/batch
Epoch 1/1  Iteration 855/1786 Training loss: 3.3503 0.0296 sec/batch
Epoch 1/1  Iteration 856/1786 Training loss: 3.3500 0.0417 sec/batch
Epoch 1/1  Iteration 857/1786 Training loss: 3.3497 0.0417 sec/batch
Epoch 1/1  Iteration 858/1786 Training loss: 3.3495 0.0257 sec/batch
Epoch 1/1  Iteration 859/1786 Training loss: 3.3493 0.0237 sec/batch
Epoch 1/1  Iteration 860/1786 Training loss: 3.3490 0.0239 sec/batch
Epoch 1/1  Iteration 861/1786 Training loss: 3.3488 0.0298 sec/batch
Epoch 1/1  Iteration 862/1786 Training loss: 3.3486 0.0296 sec/batch
Epoch 1/1  Iteration 863/1786 Training loss: 3.3483 0.0227 sec/batch
Epoch 1/1  Iteration 864/1786 Training loss: 3.3481 0.0304 sec/batch
Epoch 1/1  Iteration 865/1786 Training loss: 3.3478 0.0270 sec/batch
Epoch 1/1  Iteration 866/1786 Training loss: 3.3475 0.0243 sec/batch
Epoch 1/1  Iteration 867/1786 Trai

Epoch 1/1  Iteration 974/1786 Training loss: 3.3215 0.0431 sec/batch
Epoch 1/1  Iteration 975/1786 Training loss: 3.3213 0.0485 sec/batch
Epoch 1/1  Iteration 976/1786 Training loss: 3.3211 0.0255 sec/batch
Epoch 1/1  Iteration 977/1786 Training loss: 3.3209 0.0360 sec/batch
Epoch 1/1  Iteration 978/1786 Training loss: 3.3207 0.0269 sec/batch
Epoch 1/1  Iteration 979/1786 Training loss: 3.3204 0.0286 sec/batch
Epoch 1/1  Iteration 980/1786 Training loss: 3.3202 0.0247 sec/batch
Epoch 1/1  Iteration 981/1786 Training loss: 3.3199 0.0244 sec/batch
Epoch 1/1  Iteration 982/1786 Training loss: 3.3198 0.0245 sec/batch
Epoch 1/1  Iteration 983/1786 Training loss: 3.3196 0.0262 sec/batch
Epoch 1/1  Iteration 984/1786 Training loss: 3.3194 0.0253 sec/batch
Epoch 1/1  Iteration 985/1786 Training loss: 3.3191 0.0296 sec/batch
Epoch 1/1  Iteration 986/1786 Training loss: 3.3189 0.0225 sec/batch
Epoch 1/1  Iteration 987/1786 Training loss: 3.3187 0.0300 sec/batch
Epoch 1/1  Iteration 988/1786 Trai

Epoch 1/1  Iteration 1092/1786 Training loss: 3.2970 0.0326 sec/batch
Epoch 1/1  Iteration 1093/1786 Training loss: 3.2968 0.0281 sec/batch
Epoch 1/1  Iteration 1094/1786 Training loss: 3.2966 0.0289 sec/batch
Epoch 1/1  Iteration 1095/1786 Training loss: 3.2964 0.0287 sec/batch
Epoch 1/1  Iteration 1096/1786 Training loss: 3.2961 0.0289 sec/batch
Epoch 1/1  Iteration 1097/1786 Training loss: 3.2959 0.0376 sec/batch
Epoch 1/1  Iteration 1098/1786 Training loss: 3.2958 0.0293 sec/batch
Epoch 1/1  Iteration 1099/1786 Training loss: 3.2956 0.0359 sec/batch
Epoch 1/1  Iteration 1100/1786 Training loss: 3.2954 0.0306 sec/batch
Validation loss: 3.00382 Saving checkpoint!
Epoch 1/1  Iteration 1101/1786 Training loss: 3.2952 0.0273 sec/batch
Epoch 1/1  Iteration 1102/1786 Training loss: 3.2950 0.0287 sec/batch
Epoch 1/1  Iteration 1103/1786 Training loss: 3.2948 0.0265 sec/batch
Epoch 1/1  Iteration 1104/1786 Training loss: 3.2946 0.0314 sec/batch
Epoch 1/1  Iteration 1105/1786 Training loss: 

Epoch 1/1  Iteration 1214/1786 Training loss: 3.2743 0.0318 sec/batch
Epoch 1/1  Iteration 1215/1786 Training loss: 3.2742 0.0306 sec/batch
Epoch 1/1  Iteration 1216/1786 Training loss: 3.2741 0.0313 sec/batch
Epoch 1/1  Iteration 1217/1786 Training loss: 3.2739 0.0254 sec/batch
Epoch 1/1  Iteration 1218/1786 Training loss: 3.2738 0.0307 sec/batch
Epoch 1/1  Iteration 1219/1786 Training loss: 3.2737 0.0276 sec/batch
Epoch 1/1  Iteration 1220/1786 Training loss: 3.2735 0.0309 sec/batch
Epoch 1/1  Iteration 1221/1786 Training loss: 3.2734 0.0290 sec/batch
Epoch 1/1  Iteration 1222/1786 Training loss: 3.2732 0.0303 sec/batch
Epoch 1/1  Iteration 1223/1786 Training loss: 3.2731 0.0284 sec/batch
Epoch 1/1  Iteration 1224/1786 Training loss: 3.2729 0.0242 sec/batch
Epoch 1/1  Iteration 1225/1786 Training loss: 3.2728 0.0257 sec/batch
Epoch 1/1  Iteration 1226/1786 Training loss: 3.2726 0.0253 sec/batch
Epoch 1/1  Iteration 1227/1786 Training loss: 3.2725 0.0212 sec/batch
Epoch 1/1  Iteration

Epoch 1/1  Iteration 1334/1786 Training loss: 3.2549 0.0348 sec/batch
Epoch 1/1  Iteration 1335/1786 Training loss: 3.2548 0.0368 sec/batch
Epoch 1/1  Iteration 1336/1786 Training loss: 3.2546 0.0342 sec/batch
Epoch 1/1  Iteration 1337/1786 Training loss: 3.2544 0.0221 sec/batch
Epoch 1/1  Iteration 1338/1786 Training loss: 3.2543 0.0222 sec/batch
Epoch 1/1  Iteration 1339/1786 Training loss: 3.2541 0.0216 sec/batch
Epoch 1/1  Iteration 1340/1786 Training loss: 3.2540 0.0302 sec/batch
Epoch 1/1  Iteration 1341/1786 Training loss: 3.2538 0.0311 sec/batch
Epoch 1/1  Iteration 1342/1786 Training loss: 3.2537 0.0274 sec/batch
Epoch 1/1  Iteration 1343/1786 Training loss: 3.2535 0.0293 sec/batch
Epoch 1/1  Iteration 1344/1786 Training loss: 3.2534 0.0262 sec/batch
Epoch 1/1  Iteration 1345/1786 Training loss: 3.2533 0.0269 sec/batch
Epoch 1/1  Iteration 1346/1786 Training loss: 3.2532 0.0276 sec/batch
Epoch 1/1  Iteration 1347/1786 Training loss: 3.2530 0.0257 sec/batch
Epoch 1/1  Iteration

Epoch 1/1  Iteration 1454/1786 Training loss: 3.2375 0.0365 sec/batch
Epoch 1/1  Iteration 1455/1786 Training loss: 3.2373 0.0353 sec/batch
Epoch 1/1  Iteration 1456/1786 Training loss: 3.2372 0.0250 sec/batch
Epoch 1/1  Iteration 1457/1786 Training loss: 3.2371 0.0294 sec/batch
Epoch 1/1  Iteration 1458/1786 Training loss: 3.2369 0.0239 sec/batch
Epoch 1/1  Iteration 1459/1786 Training loss: 3.2367 0.0227 sec/batch
Epoch 1/1  Iteration 1460/1786 Training loss: 3.2366 0.0221 sec/batch
Epoch 1/1  Iteration 1461/1786 Training loss: 3.2365 0.0230 sec/batch
Epoch 1/1  Iteration 1462/1786 Training loss: 3.2363 0.0250 sec/batch
Epoch 1/1  Iteration 1463/1786 Training loss: 3.2361 0.0232 sec/batch
Epoch 1/1  Iteration 1464/1786 Training loss: 3.2360 0.0278 sec/batch
Epoch 1/1  Iteration 1465/1786 Training loss: 3.2358 0.0453 sec/batch
Epoch 1/1  Iteration 1466/1786 Training loss: 3.2357 0.0247 sec/batch
Epoch 1/1  Iteration 1467/1786 Training loss: 3.2356 0.0288 sec/batch
Epoch 1/1  Iteration

Epoch 1/1  Iteration 1575/1786 Training loss: 3.2207 0.0264 sec/batch
Epoch 1/1  Iteration 1576/1786 Training loss: 3.2206 0.0265 sec/batch
Epoch 1/1  Iteration 1577/1786 Training loss: 3.2204 0.0224 sec/batch
Epoch 1/1  Iteration 1578/1786 Training loss: 3.2202 0.0246 sec/batch
Epoch 1/1  Iteration 1579/1786 Training loss: 3.2201 0.0284 sec/batch
Epoch 1/1  Iteration 1580/1786 Training loss: 3.2200 0.0228 sec/batch
Epoch 1/1  Iteration 1581/1786 Training loss: 3.2198 0.0212 sec/batch
Epoch 1/1  Iteration 1582/1786 Training loss: 3.2197 0.0440 sec/batch
Epoch 1/1  Iteration 1583/1786 Training loss: 3.2196 0.0333 sec/batch
Epoch 1/1  Iteration 1584/1786 Training loss: 3.2194 0.0279 sec/batch
Epoch 1/1  Iteration 1585/1786 Training loss: 3.2193 0.0227 sec/batch
Epoch 1/1  Iteration 1586/1786 Training loss: 3.2191 0.0330 sec/batch
Epoch 1/1  Iteration 1587/1786 Training loss: 3.2190 0.0325 sec/batch
Epoch 1/1  Iteration 1588/1786 Training loss: 3.2189 0.0429 sec/batch
Epoch 1/1  Iteration

Epoch 1/1  Iteration 1698/1786 Training loss: 3.2053 0.0317 sec/batch
Epoch 1/1  Iteration 1699/1786 Training loss: 3.2052 0.0293 sec/batch
Epoch 1/1  Iteration 1700/1786 Training loss: 3.2051 0.0307 sec/batch
Validation loss: 2.90725 Saving checkpoint!
Epoch 1/1  Iteration 1701/1786 Training loss: 3.2050 0.0274 sec/batch
Epoch 1/1  Iteration 1702/1786 Training loss: 3.2049 0.0265 sec/batch
Epoch 1/1  Iteration 1703/1786 Training loss: 3.2048 0.0252 sec/batch
Epoch 1/1  Iteration 1704/1786 Training loss: 3.2047 0.0297 sec/batch
Epoch 1/1  Iteration 1705/1786 Training loss: 3.2046 0.0295 sec/batch
Epoch 1/1  Iteration 1706/1786 Training loss: 3.2044 0.0276 sec/batch
Epoch 1/1  Iteration 1707/1786 Training loss: 3.2043 0.0269 sec/batch
Epoch 1/1  Iteration 1708/1786 Training loss: 3.2042 0.0256 sec/batch
Epoch 1/1  Iteration 1709/1786 Training loss: 3.2041 0.0214 sec/batch
Epoch 1/1  Iteration 1710/1786 Training loss: 3.2040 0.0210 sec/batch
Epoch 1/1  Iteration 1711/1786 Training loss: 

In [59]:
tf.train.get_checkpoint_state('checkpoints/anna')

model_checkpoint_path: "checkpoints/anna/i178_l512_2.444.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i178_l512_2.444.ckpt"

## Sampling

Now that the network is trained, we'll can use it to generate new text. The idea is that we pass in a character, then the network will predict the next character. We can use the new one, to predict the next one. And we keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.



In [60]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c

In [61]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    prime = "Far"
    samples = [c for c in prime]
    model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)

In [53]:
checkpoint = "checkpoints/anna/i3560_l512_1.122.ckpt"
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for checkpoints/anna/i3560_l512_1.122.ckpt
	 [[Node: save/RestoreV2_8 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_8/tensor_names, save/RestoreV2_8/shape_and_slices)]]

Caused by op 'save/RestoreV2_8', defined at:
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2698, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2802, in run_ast_nodes
    if self.run_code(code, result):
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-53-91ad82b46673>", line 2, in <module>
    samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
  File "<ipython-input-52-a7ae04af7e97>", line 5, in sample
    saver = tf.train.Saver()
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1051, in __init__
    self.build()
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1081, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 675, in build
    restore_sequentially, reshape)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 402, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 242, in restore_op
    [spec.tensor.dtype])[0])
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 668, in restore_v2
    dtypes=dtypes, name=name)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/nusco/Applications/anaconda3/envs/tensorboard/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for checkpoints/anna/i3560_l512_1.122.ckpt
	 [[Node: save/RestoreV2_8 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_8/tensor_names, save/RestoreV2_8/shape_and_slices)]]


In [None]:
checkpoint = "checkpoints/anna/i200_l512_2.432.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

In [None]:
checkpoint = "checkpoints/anna/i600_l512_1.750.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

In [None]:
checkpoint = "checkpoints/anna/i1000_l512_1.484.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)