# Anna KaRNNa

In this notebook, we'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). It also draws on information from [r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple of dictionaries to convert the characters to and from integers. Encoding the characters as integers makes them easier to use as input to the network.

In [2]:
with open('anna.txt', 'r') as f:
    text=f.read()
vocab = sorted(set(text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
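To see the encoding round trip on something small, here is the same construction applied to a toy string (`toy_text` and the `toy_*` names are illustrative assumptions, not part of the notebook). Because the vocabulary is sorted, each character's id depends only on the set of characters present, which is why `'\n'` ends up as 0 and `' '` as 1 in the real vocabulary.

```python
import numpy as np

# Toy stand-in for the book text (an assumption for illustration only)
toy_text = "Happy families are all alike"

# Same construction as the cell above: sorted unique characters -> integer ids
toy_vocab = sorted(set(toy_text))
toy_vocab_to_int = {c: i for i, c in enumerate(toy_vocab)}
toy_int_to_vocab = dict(enumerate(toy_vocab))

toy_encoded = np.array([toy_vocab_to_int[c] for c in toy_text], dtype=np.int32)

# Decoding maps each integer id back to its character
toy_decoded = ''.join(toy_int_to_vocab[i] for i in toy_encoded)
print(toy_vocab[:3], toy_encoded[:5], toy_decoded == toy_text)
```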

Let's check out the first 100 characters, make sure everything is peachy. According to the [American Book Review](http://americanbookreview.org/100bestlines.asp), this is the 6th best first line of a book ever.

In [3]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

And we can see the characters encoded as integers.

In [4]:
encoded[:100]

array([31, 64, 57, 72, 76, 61, 74,  1, 16,  0,  0,  0, 36, 57, 72, 72, 81,
        1, 62, 57, 69, 65, 68, 65, 61, 75,  1, 57, 74, 61,  1, 57, 68, 68,
        1, 57, 68, 65, 67, 61, 26,  1, 61, 78, 61, 74, 81,  1, 77, 70, 64,
       57, 72, 72, 81,  1, 62, 57, 69, 65, 68, 81,  1, 65, 75,  1, 77, 70,
       64, 57, 72, 72, 81,  1, 65, 70,  1, 65, 76, 75,  1, 71, 79, 70,  0,
       79, 57, 81, 13,  0,  0, 33, 78, 61, 74, 81, 76, 64, 65, 70], dtype=int32)

Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text.  Here's how many 'classes' our network has to pick from.

In [5]:
len(vocab)

83

## Making training mini-batches

Here is where we'll make our mini-batches for training. Remember that we want our batches to be multiple sequences of some desired number of sequence steps. Considering a simple example, our batches would look like this:

<img src="assets/sequence_batching@1x.png" width=500px>


<br>
We have our text encoded as integers as one long array in `encoded`. Let's create a function that will give us an iterator for our batches. I like using [generator functions](https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/) to do this. Then we can pass `encoded` into this function and get our batch generator.

The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. Then, to get the number of batches $K$ we can make from some array `arr`, you divide the length of `arr` by the number of characters per batch, $N \times M$, and round down. Once you know the number of batches and the number of characters per batch, you can get the total number of characters to keep, $K \times N \times M$.
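A worked instance of this arithmetic, using the length of `encoded` reported later in this notebook (1,985,223 characters) and the demo batch shape of 10 sequences by 50 steps. The variable names simply mirror those in `get_batches`.

```python
# Trimming arithmetic for N = 10 sequences and M = 50 steps
n_chars = 1985223                                  # len(encoded), shown later
n_seqs, n_steps = 10, 50

characters_per_batch = n_seqs * n_steps            # N x M
n_batches = n_chars // characters_per_batch        # K, rounded down
n_keep = n_batches * characters_per_batch          # characters kept
n_dropped = n_chars - n_keep                       # leftover tail that is discarded

print(characters_per_batch, n_batches, n_keep, n_dropped)
```

Only a few hundred characters at the very end of the book are lost, so the trimming costs essentially nothing.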

After that, we need to split `arr` into $N$ sequences. You can do this using `arr.reshape(size)`, where `size` is a tuple containing the dimension sizes of the reshaped array. We know we want $N$ sequences (`n_seqs` below), so let's make that the size of the first dimension. For the second dimension, you can use `-1` as a placeholder in the size, and NumPy will infer the appropriate length for you. After this, you should have an array that is $N \times (M * K)$, where $K$ is the number of batches.
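Here is the reshape on a toy array of 24 integers (an assumption standing in for `encoded`), with $N = 2$ and $M = 4$. Since NumPy reshapes in row-major order, each row is one contiguous chunk of the original text.

```python
import numpy as np

# Toy "text" of 24 integer-encoded characters (illustration only)
arr = np.arange(24)
n_seqs, n_steps = 2, 4

# -1 lets NumPy infer the second dimension: 24 / 2 = 12 columns
seqs = arr.reshape((n_seqs, -1))
K = seqs.shape[1] // n_steps   # number of batches: 12 / 4 = 3

# Row 0 holds characters 0..11, row 1 holds characters 12..23
print(seqs)
print(seqs.shape, K)
```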

Now that we have this array, we can iterate through it to get our batches. The idea is that each batch is an $N \times M$ window on the array. For each subsequent batch, the window moves over by `n_steps`. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over by one character. The true next character for the last column lies outside the window, so a common simplification is to use the first input character as the last target character, something like this:
```python
y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
```
where `x` is the input batch and `y` is the target batch.
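Applied to a toy 2 by 5 input batch (arbitrary ids, chosen for readability), the one-liner produces targets shifted left by one step, with each row's first input character wrapped into the last column:

```python
import numpy as np

# A toy input batch: 2 sequences x 5 steps
x_demo = np.array([[10, 11, 12, 13, 14],
                   [20, 21, 22, 23, 24]])

# Shift left by one; fill the last column with the first input character
y_demo = np.zeros_like(x_demo)
y_demo[:, :-1], y_demo[:, -1] = x_demo[:, 1:], x_demo[:, 0]

print(y_demo)
# [[11 12 13 14 10]
#  [21 22 23 24 20]]
```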

The way I like to do this window is to use `range` to take steps of size `n_steps` from $0$ to `arr.shape[1]`, the total number of steps in each sequence. That way, the integers you get from `range` always point to the start of a batch, and each window is `n_steps` wide.
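On a toy array already reshaped to 2 sequences of 12 steps (an assumption for illustration), `range` with a step of 4 yields the window starts 0, 4, and 8, and slicing from each start gives three non-overlapping windows:

```python
import numpy as np

# Toy array already reshaped into 2 sequences of 12 steps
seqs_demo = np.arange(24).reshape((2, 12))
n_steps_demo = 4

# range yields the starting column of each window
starts = list(range(0, seqs_demo.shape[1], n_steps_demo))
windows = [seqs_demo[:, n:n + n_steps_demo] for n in starts]

print(starts)       # [0, 4, 8]
print(windows[1])   # columns 4..7 of both rows
```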

> **Exercise:** Write the code for creating batches in the function below. The exercises in this notebook _will not be easy_. I've provided a notebook with solutions alongside this notebook. If you get stuck, check out the solutions. The most important thing is that you don't copy and paste the code into here; **type out the solution code yourself.**

In [6]:
def get_batches(arr, n_seqs, n_steps):
    '''Create a generator that returns batches of size
       n_seqs x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       n_seqs: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    characters_per_batch = n_seqs * n_steps
    n_batches = len(arr) // characters_per_batch
    
    # Keep only enough characters to make full batches
    arr = arr[:(n_batches * characters_per_batch)]
    
    # Reshape into n_seqs rows
    arr = arr.reshape((n_seqs, -1))
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:(n+n_steps)]
        # The targets, shifted by one
        y = np.zeros_like(x)
        y[:, :-1] = x[:, 1:]
        y[:, -1] = x[:, 0]  
        # Simplification: the true next character lies outside this window,
        # so the window's first input character stands in as the last target.
        # Only one of every n_steps targets is affected.
        yield x, y
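If you'd rather have every target be the true next character, one option is to slice the inputs and targets from the same text offset by one character, reserving one extra character so the final target always exists. This is a sketch of an alternative, not the notebook's solution; `get_batches_exact` is a hypothetical name.

```python
import numpy as np

def get_batches_exact(arr, n_seqs, n_steps):
    '''Hypothetical variant of get_batches in which every target,
       including the last column, is the true next character.'''
    characters_per_batch = n_seqs * n_steps
    # Reserve one extra character so the final target always exists
    n_batches = (len(arr) - 1) // characters_per_batch
    n_keep = n_batches * characters_per_batch

    # Inputs and targets come from the same text, offset by one character
    inputs = arr[:n_keep].reshape((n_seqs, -1))
    targets = arr[1:n_keep + 1].reshape((n_seqs, -1))

    for n in range(0, inputs.shape[1], n_steps):
        yield inputs[:, n:n + n_steps], targets[:, n:n + n_steps]
```

Because row $i$ of `targets` is row $i$ of `inputs` shifted by one position in the original text, the targets stay correct even across window and row boundaries.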

Now I'll make my data sets and we can check out what's going on here. Here I'm going to use a batch size of 10 and 50 sequence steps.

In [7]:
encoded.shape

(1985223,)

In [8]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)

In [9]:
# get_batches shouldn't mess up the shape of our original array 'encoded'. sanity check:
encoded.shape

(1985223,)

In [10]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])

x
 [[31 64 57 72 76 61 74  1 16  0]
 [ 1 57 69  1 70 71 76  1 63 71]
 [78 65 70 13  0  0  3 53 61 75]
 [70  1 60 77 74 65 70 63  1 64]
 [ 1 65 76  1 65 75 11  1 75 65]
 [ 1 37 76  1 79 57 75  0 71 70]
 [64 61 70  1 59 71 69 61  1 62]
 [26  1 58 77 76  1 70 71 79  1]
 [76  1 65 75 70  7 76 13  1 48]
 [ 1 75 57 65 60  1 76 71  1 64]]

y
 [[64 57 72 76 61 74  1 16  0  0]
 [57 69  1 70 71 76  1 63 71 65]
 [65 70 13  0  0  3 53 61 75 11]
 [ 1 60 77 74 65 70 63  1 64 65]
 [65 76  1 65 75 11  1 75 65 74]
 [37 76  1 79 57 75  0 71 70 68]
 [61 70  1 59 71 69 61  1 62 71]
 [ 1 58 77 76  1 70 71 79  1 75]
 [ 1 65 75 70  7 76 13  1 48 64]
 [75 57 65 60  1 76 71  1 64 61]]


If you implemented `get_batches` correctly, the above output should look something like 
```
x
 [[55 63 69 22  6 76 45  5 16 35]
 [ 5 69  1  5 12 52  6  5 56 52]
 [48 29 12 61 35 35  8 64 76 78]
 [12  5 24 39 45 29 12 56  5 63]
 [ 5 29  6  5 29 78 28  5 78 29]
 [ 5 13  6  5 36 69 78 35 52 12]
 [63 76 12  5 18 52  1 76  5 58]
 [34  5 73 39  6  5 12 52 36  5]
 [ 6  5 29 78 12 79  6 61  5 59]
 [ 5 78 69 29 24  5  6 52  5 63]]

y
 [[63 69 22  6 76 45  5 16 35 35]
 [69  1  5 12 52  6  5 56 52 29]
 [29 12 61 35 35  8 64 76 78 28]
 [ 5 24 39 45 29 12 56  5 63 29]
 [29  6  5 29 78 28  5 78 29 45]
 [13  6  5 36 69 78 35 52 12 43]
 [76 12  5 18 52  1 76  5 58 52]
 [ 5 73 39  6  5 12 52 36  5 78]
 [ 5 29 78 12 79  6 61  5 59 63]
 [78 69 29 24  5  6 52  5 63 76]]
 ```
although the exact numbers will be different. Check to make sure the data is shifted over by one step for `y`.

## Building the model

Below is where you'll build the network. We'll break it up into parts so it's easier to reason about each bit. Then we can connect them up into the whole network.

<img src="assets/charRNN.png" width=500px>


### Inputs

First off we'll create our input placeholders. As usual we need placeholders for the training data and the targets. We'll also create a placeholder for dropout layers called `keep_prob`. This will be a scalar, that is a 0-D tensor. To make a scalar, you create a placeholder without giving it a size.

> **Exercise:** Create the input placeholders in the function below.

In [11]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, shape=(batch_size,num_steps))
    targets = tf.placeholder(tf.int32, shape=(batch_size,num_steps))
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32)
    
    return inputs, targets, keep_prob

### LSTM Cell

Here we will create the LSTM cell we'll use in the hidden layer. We'll use this cell as a building block for the RNN. So we aren't actually defining the RNN here, just the type of cell we'll use in the hidden layer.

We first create a basic LSTM cell with

```python
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
```

where `num_units` is the number of units in the hidden layers in the cell. Then we can add dropout by wrapping it with 

```python
tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
```
You pass in a cell, and the wrapper adds dropout to the cell's inputs or outputs (here, only the outputs, via `output_keep_prob`). Finally, we can stack up the LSTM cells into layers with [`tf.contrib.rnn.MultiRNNCell`](https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/contrib/rnn/MultiRNNCell). With this, you pass in a list of cells and it will send the output of one cell into the next cell. Previously, with TensorFlow 1.0, you could do this:

```python
tf.contrib.rnn.MultiRNNCell([cell]*num_layers)
```

This might look a little weird if you know Python well because this will create a list of the same `cell` object. However, TensorFlow 1.0 will create different weight matrices for all `cell` objects. But, starting with TensorFlow 1.1 you actually need to create new cell objects in the list. To get it to work in TensorFlow 1.1, it should look like

```python
def build_cell(num_units, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    
    return drop
    
tf.contrib.rnn.MultiRNNCell([build_cell(num_units, keep_prob) for _ in range(num_layers)])
```
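The aliasing issue is plain Python behavior, independent of TensorFlow. In this sketch, `FakeCell` is a hypothetical stand-in for a cell object: list multiplication repeats a reference to one object, while a list comprehension calls the constructor once per layer, which is what TensorFlow 1.1+ needs.

```python
class FakeCell:
    '''Hypothetical stand-in for an RNN cell object (illustration only).'''
    pass

num_layers_demo = 3

# List multiplication repeats a reference to ONE object
repeated = [FakeCell()] * num_layers_demo
# A list comprehension creates a new object per layer
distinct = [FakeCell() for _ in range(num_layers_demo)]

print(len({id(c) for c in repeated}))   # 1 unique object
print(len({id(c) for c in distinct}))   # 3 unique objects
```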

Even though this is actually multiple LSTM cells stacked on each other, you can treat the multiple layers as one cell.

We also need to create an initial cell state of all zeros. This can be done like so

```python
initial_state = cell.zero_state(batch_size, tf.float32)
```
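To make the structure of `initial_state` concrete: for a `MultiRNNCell` of `BasicLSTMCell`s with tuple states (the TensorFlow 1.x default), `zero_state` returns one `(c, h)` pair per layer, each a `batch_size` by `num_units` tensor of zeros. The sketch below mocks that nesting with NumPy and a `namedtuple`; `LSTMState` and `mock_zero_state` are hypothetical names, not TensorFlow's classes. A nested structure like this is what the training loop fetches as `new_state` and feeds back through `model.initial_state`.

```python
from collections import namedtuple
import numpy as np

# Mock of the state layout only; TensorFlow uses its own LSTMStateTuple
LSTMState = namedtuple('LSTMState', ['c', 'h'])

def mock_zero_state(num_layers, batch_size, num_units):
    '''One (cell state, hidden state) pair of zeros per stacked layer.'''
    return tuple(
        LSTMState(c=np.zeros((batch_size, num_units), dtype=np.float32),
                  h=np.zeros((batch_size, num_units), dtype=np.float32))
        for _ in range(num_layers))

demo_state = mock_zero_state(num_layers=2, batch_size=100, num_units=512)
print(len(demo_state), demo_state[0].c.shape, demo_state[1].h.shape)
```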

Below, we implement the `build_lstm` function to create these LSTM cells and the initial state.

In [12]:
def build_cell(lstm_size, keep_prob):
    ### Build the LSTM Cell
    
    # Use a basic LSTM cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    # Add dropout to the cell outputs
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    return drop

def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''

    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size,keep_prob) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state

### RNN Output

Here we'll create the output layer. We need to connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character, so we want this layer to have size $C$, the number of classes/characters we have in our text.

If our input has batch size $N$, number of steps $M$, and the hidden layer has $L$ hidden units, then the output is a 3D tensor with size $N \times M \times L$: the output of each LSTM cell has size $L$, there are $M$ of them (one for each sequence step), and there are $N$ sequences.

We are using the same fully connected layer, with the same weights, for each of the outputs. To make things easier, we should reshape the outputs into a 2D tensor with shape $(M * N) \times L$, that is, one row for each sequence and step, where the values of each row are the output from the LSTM cells. With `tf.nn.dynamic_rnn`, `lstm_output` arrives as a single $N \times M \times L$ tensor rather than a list of per-step tensors, so the [`tf.concat`](https://www.tensorflow.org/api_docs/python/tf/concat) call effectively passes it through unchanged. The real work is the reshape (with `tf.reshape`) to size $(M * N) \times L$.
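The row ordering after the reshape can be checked with NumPy, which, like `tf.reshape`, uses row-major order. With toy sizes $N = 2$, $M = 3$, $L = 4$ (assumptions for illustration), row `n * M + m` of the flattened array is the output of sequence `n` at step `m`. The targets, shaped $N \times M$, flatten in the same order, which is why logits and targets line up row by row.

```python
import numpy as np

# Toy dimensions: N sequences, M steps, L hidden units
N, M, L = 2, 3, 4
outputs_demo = np.arange(N * M * L).reshape((N, M, L))   # N x M x L

# Flatten to (N*M) x L, as build_output does with tf.reshape(..., (-1, L))
flat = outputs_demo.reshape((-1, L))

# Row n*M + m holds the output of sequence n at step m
n, m = 1, 2
print(flat.shape)                                           # (6, 4)
print(np.array_equal(flat[n * M + m], outputs_demo[n, m]))  # True
```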

Once we have the outputs reshaped, we can do the matrix multiplication with the weights. We need to wrap the weight and bias variables in a variable scope with `tf.variable_scope(scope_name)` because there are weights being created in the LSTM cells. TensorFlow will throw an error if the weights created here have the same names as the weights created in the LSTM cells, which they will by default. To avoid this, we wrap the variables in a variable scope so we can give them unique names.

> **Exercise:** Implement the output layer in the function below.

In [13]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: List of output tensors from the LSTM layer
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # Concatenate lstm_output over axis 1 (the columns)
    seq_output = tf.concat(lstm_output, axis=1)
    # Reshape seq_output to a 2D tensor with lstm_size columns
    x = tf.reshape(seq_output, shape=(-1, in_size))
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        # Create the weight and bias variables here
        softmax_w = tf.Variable(
            tf.truncated_normal(
                (in_size, out_size), 
                mean=0.0, 
                stddev=(1.0 / np.sqrt(out_size))
            )
        )
        softmax_b = tf.Variable(tf.zeros(out_size))
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.add(tf.matmul(x, softmax_w), softmax_b)
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name='predictions')
    
    return out, logits

### Training loss

Next up is the training loss. We get the logits and targets and calculate the softmax cross-entropy loss. First, we need to one-hot encode the targets, since we're getting them as encoded characters. Then, reshape the one-hot targets so they form a 2D tensor with size $(M*N) \times C$, where $C$ is the number of classes/characters we have. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with $C$ units, so our logits will also have size $(M*N) \times C$.

Then we run the logits and targets through `tf.nn.softmax_cross_entropy_with_logits` and find the mean to get the loss.
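The same computation in NumPy, on toy logits for 3 rows and 4 classes (assumed values): one-hot encode the targets, take the negative log-softmax at each target class, and average. A row of all-zero logits is a handy check, because it yields a uniform distribution and a cross entropy of $\log 4$.

```python
import numpy as np

# Toy logits for (M*N) = 3 rows and C = 4 classes
logits_demo = np.array([[2.0, 0.5, 0.1, -1.0],
                        [0.0, 0.0, 0.0,  0.0],
                        [1.0, 3.0, 0.2,  0.5]])
targets_demo = np.array([0, 3, 1])        # integer-encoded next characters

# One-hot encode the targets, like tf.one_hot(targets, num_classes)
y_one_hot = np.eye(logits_demo.shape[1])[targets_demo]

# Log-softmax via the log-sum-exp trick for numerical stability
row_max = logits_demo.max(axis=1, keepdims=True)
log_probs = logits_demo - row_max - np.log(
    np.exp(logits_demo - row_max).sum(axis=1, keepdims=True))

# Per-row cross entropy, then the mean, as in build_loss
cross_entropy_terms = -(y_one_hot * log_probs).sum(axis=1)
loss_demo = cross_entropy_terms.mean()
print(cross_entropy_terms, loss_demo)
```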

>**Exercise:** Implement the loss calculation in the function below.

In [14]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per sequence per step
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
    
    # Softmax cross entropy loss
    cross_entropy_terms = tf.nn.softmax_cross_entropy_with_logits(
        labels=y_reshaped, 
        logits=logits)
    loss = tf.reduce_mean(cross_entropy_terms)
    
    return loss

### Optimizer

Here we build the optimizer. Vanilla RNNs have issues with exploding and vanishing gradients. LSTMs largely fix the vanishing problem, but the gradients can still grow without bound. To fix this, we can clip the gradients above some threshold. Specifically, `tf.clip_by_global_norm` computes the global norm of all the gradients together and, if it exceeds the threshold, scales every gradient down by the same factor so that the global norm equals the threshold. This keeps the gradients from growing overly large while preserving their direction. Then we use an AdamOptimizer for the learning step.
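A NumPy sketch of the global-norm rule, scaling every gradient by `clip_norm / max(global_norm, clip_norm)`. The function name `clip_by_global_norm_np` is hypothetical. Gradients `[3]` and `[4]` have a global norm of 5, so clipping to 1 scales both by 0.2; gradients already under the threshold pass through unchanged.

```python
import numpy as np

def clip_by_global_norm_np(grads, clip_norm):
    '''NumPy sketch of global-norm clipping: one shared scale factor
       for all gradients, so their relative direction is preserved.'''
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0]), np.array([4.0])]      # global norm = 5
clipped, norm = clip_by_global_norm_np(grads, clip_norm=1.0)
print(norm, clipped)
```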

In [15]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold for the global norm of the gradients
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    return optimizer

### Build the network

Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we will use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/nn/dynamic_rnn). This function will pass the hidden and cell states across LSTM cells appropriately for us. It returns the outputs for each LSTM cell at each step for each sequence in the mini-batch. It also gives us the final LSTM state. We want to save this state as `final_state` so we can pass it to the first LSTM cell in the next mini-batch run. For `tf.nn.dynamic_rnn`, we pass in the cell and initial state we get from `build_lstm`, as well as our input sequences. Also, we need to one-hot encode the inputs before they go into the RNN.
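Why carrying `final_state` into the next mini-batch matters can be shown with a tiny recurrent network. This sketch swaps in a plain tanh RNN for the LSTM (an assumption made for brevity; `run_rnn` is a hypothetical helper): running a 6-step sequence in one go gives the same outputs as running it in two 3-step chunks, provided the first chunk's final state seeds the second. Resetting to zeros instead gives different outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden, n_in = 3, 2
W_x = rng.standard_normal((n_in, hidden))
W_h = rng.standard_normal((hidden, hidden))

def run_rnn(inputs, h):
    '''Simple tanh RNN standing in for the LSTM: returns outputs and final state.'''
    outputs = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h)
        outputs.append(h)
    return np.array(outputs), h

seq = rng.standard_normal((6, n_in))
h0 = np.zeros(hidden)

full_out, _ = run_rnn(seq, h0)            # one pass over all 6 steps

out_a, h_a = run_rnn(seq[:3], h0)         # first "mini-batch"
out_b, _ = run_rnn(seq[3:], h_a)          # second, seeded with the carried state
out_b_reset, _ = run_rnn(seq[3:], h0)     # second, restarted from zeros

print(np.allclose(full_out, np.concatenate([out_a, out_b])))   # True
print(np.allclose(out_b, out_b_reset))                          # False
```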

> **Exercise:** Use the functions you've implemented previously and `tf.nn.dynamic_rnn` to build the network.

In [16]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling:
            batch_size, num_steps = 1, 1
            
        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        # Run each sequence step through the RNN with tf.nn.dynamic_rnn 
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

## Hyperparameters

Here are the hyperparameters for the network.

* `batch_size` - Number of sequences running through the network in one pass.
* `num_steps` - Number of characters in the sequence the network is trained on. Larger is typically better, since the network can learn longer-range dependencies, but it takes longer to train. 100 is typically a good number here.
* `lstm_size` - The number of units in the hidden layers.
* `num_layers` - Number of hidden LSTM layers to use
* `learning_rate` - Learning rate for training
* `keep_prob` - The dropout keep probability when training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network. I'm going to copy it in here for your benefit, but also link to [where it originally came from](https://github.com/karpathy/char-rnn#tips-and-tricks).

> ## Tips and Tricks

>### Monitoring Validation Loss vs. Training Loss
>If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:

> - If your training loss is much lower than validation loss then this means the network might be **overfitting**. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
> - If your training/validation loss are about equal then your model is **underfitting**. Increase the size of your model (either number of layers or the raw number of neurons per layer)

> ### Approximate number of parameters

> The two most important parameters that control the model are `lstm_size` and `num_layers`. I would advise that you always use `num_layers` of either 2/3. The `lstm_size` can be adjusted based on how much data you have. The two important quantities to keep track of here are:

> - The number of parameters in your model. This is printed when you start training.
> - The size of your dataset. 1MB file is approximately 1 million characters.

>These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:

> - I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make `lstm_size` larger.
> - I have a 10MB dataset and running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.

> ### Best models strategy

>The winning strategy to obtaining very good models (if you have the compute time) is to always err on making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0,1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.

>It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.

>By the way, the size of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set or otherwise the validation performance will be noisy and not very informative.

In [17]:
batch_size = 100        # Sequences per batch
num_steps = 100         # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001   # Learning rate
keep_prob = 0.5         # Dropout keep probability
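To apply Karpathy's parameter-count advice to these settings, here is a rough count. It assumes `BasicLSTMCell`'s layout of one kernel of shape `(input_size + lstm_size, 4 * lstm_size)` plus a bias of size `4 * lstm_size` per layer (the factor 4 covers the gates and the candidate cell update), with the first layer seeing the 83-way one-hot input. `lstm_layer_params` is a hypothetical helper, and optimizer slot variables are not counted.

```python
def lstm_layer_params(input_size, units):
    '''Kernel (input_size + units) x 4*units plus a 4*units bias.'''
    return (input_size + units) * 4 * units + 4 * units

num_classes_demo = 83     # len(vocab) from earlier in the notebook
units, layers = 512, 2    # lstm_size and num_layers above

total = lstm_layer_params(num_classes_demo, units)          # first layer: one-hot input
total += (layers - 1) * lstm_layer_params(units, units)     # deeper layers: previous layer's output
total += units * num_classes_demo + num_classes_demo        # softmax_w and softmax_b

print(total)   # about 3.4M parameters
```

That is about 3.4 million parameters against roughly 2 million characters of text, so the two are within the same order of magnitude, as the advice recommends.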

## Time for training

This is typical training code, passing inputs and targets into the network, then running the optimizer. Here we also get back the final LSTM state for the mini-batch. Then, we pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) I save a checkpoint.

Here I'm saving checkpoints with the format

`i{iteration number}_l{# hidden layer units}_d{# LSTM layers}.ckpt`

> **Exercise:** Set the hyperparameters above to train the network. Watch the training loss, it should be consistently dropping. Also, I highly advise running this on a GPU.

In [48]:
from IPython import display

In [49]:
epochs = 20
# Save every N iterations
save_every_n = 200

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            
            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epochs),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  '{:.4f} sec/batch'.format((end-start)))
            display.clear_output(wait=True)
            
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}_d{}.ckpt".format(counter, lstm_size, num_layers))
    
    saver.save(sess, "checkpoints/i{}_l{}_d{}.ckpt".format(counter, lstm_size, num_layers))

Epoch: 20/20...  Training Step: 3960...  Training loss: 1.2068...  0.4763 sec/batch
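As a sanity check, the final step count and the checkpoint names follow directly from the batch arithmetic: with 100 sequences of 100 steps, each batch holds 10,000 characters.

```python
n_chars = 1985223                            # encoded.shape[0]
batch_chars = 100 * 100                      # batch_size x num_steps
batches_per_epoch = n_chars // batch_chars
total_steps = 20 * batches_per_epoch         # epochs x batches per epoch

# Iterations where counter % save_every_n == 0 (every 200), plus the final save
saves = [i for i in range(1, total_steps + 1) if i % 200 == 0] + [total_steps]
print(batches_per_epoch, total_steps, saves[:3], saves[-2:])
```

This gives 198 batches per epoch and 3960 steps in total, matching the last training line above and the `i3960` checkpoint listed below.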


#### Saved checkpoints

Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables

In [18]:
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/i3960_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i2200_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i2400_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i2600_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i2800_l512_d2.ckpt"
all_model_checkpoint_paths: "checkpoints/i3000_l512_d2.ckpt"
all_model_checkpoint_paths: "chec

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character, and the network predicts the next character. We then feed that new character back in to predict the one after it, and keep doing this to generate all-new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.



In [19]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
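The masking step in `pick_top_n` relies on `np.argsort` sorting in ascending order, so `[:-top_n]` selects every index except the `top_n` most likely ones. On a toy distribution (assumed values) with `top_n = 2`:

```python
import numpy as np

p = np.array([0.1, 0.4, 0.2, 0.3])   # toy next-character probabilities
top_n = 2

# Ascending argsort: [:-top_n] is everything except the top_n largest
drop = np.argsort(p)[:-top_n]        # indices 0 and 2
masked = p.copy()                    # copy, so p itself is left untouched
masked[drop] = 0
masked = masked / masked.sum()       # renormalize over the survivors

print(masked)   # approximately [0, 0.571, 0, 0.429]
```

Note that `np.squeeze` in `pick_top_n` returns a view, so that function zeroes entries of the `preds` array it receives; this is harmless here because `preds` is overwritten on the next step.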

In [20]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The ", n=5):
    samples = [c for c in prime]
    
    # Build the network in sampling mode (batch size 1, one step at a time)
    model = CharRNN(vocab_size, lstm_size=lstm_size, sampling=True)
    
    saver = tf.train.Saver()
    with tf.Session() as sess:
        
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        
        # Integer input buffer, matching the tf.int32 inputs placeholder
        x = np.zeros((1, 1), dtype=np.int32)
        
        # Prime the network: run each character of `prime` through it to build up state
        for c in prime:
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, vocab_size, n)
        samples.append(int_to_vocab[c])

        # Generate: feed each sampled character back in as the next input
        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size, n)
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)

Here, pass in the path to a checkpoint and sample from the network.

In [21]:
tf.train.latest_checkpoint('checkpoints')

'checkpoints/i3960_l512_d2.ckpt'

In [22]:
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file
print_tensors_in_checkpoint_file(file_name=tf.train.latest_checkpoint('checkpoints'), tensor_name='', all_tensors=True)

tensor_name:  rnn/multi_rnn_cell/cell_0/basic_lstm_cell/biases/Adam_1
[  2.98701752e-09   1.12818794e-08   3.49353768e-09 ...,   2.89313040e-09
   2.45372478e-09   2.58292898e-09]
tensor_name:  rnn/multi_rnn_cell/cell_1/basic_lstm_cell/weights
[[ 0.06996887  0.11649907  0.18184812 ...,  0.26486248 -0.08052508
  -0.1132154 ]
 [-0.01522851  0.04797746 -0.11067834 ...,  0.07246272  0.26834911
   0.05272453]
 [ 0.30675411 -0.13460676  0.27785024 ...,  0.37532464 -0.05215592
  -0.10368945]
 ..., 
 [-0.21007143 -0.01034986  0.00191667 ...,  0.18761837 -0.03268498
  -0.15263623]
 [-0.08051489  0.26401338 -0.26916745 ..., -0.01368065 -0.28915605
  -0.23917878]
 [ 0.06516229 -0.56173038 -0.09796987 ..., -0.1699965  -0.01801279
   0.04343418]]
tensor_name:  softmax/Variable_1/Adam
[ -7.00336386e-05   1.47326544e-04   5.58796601e-05   3.78345751e-04
   2.26013640e-06   2.99536305e-06   2.78044513e-06   1.52085515e-04
  -9.17639991e-05  -5.69082149e-05   8.29187502e-06   3.51653493e-04
  -8.971733

In [23]:
vocab

['\n',
 ' ',
 '!',
 '"',
 '$',
 '%',
 '&',
 "'",
 '(',
 ')',
 '*',
 ',',
 '-',
 '.',
 '/',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 ';',
 '?',
 '@',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'T',
 'U',
 'V',
 'W',
 'X',
 'Y',
 'Z',
 '_',
 '`',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']
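`vocab` was built with `sorted(set(text))`, so each character's position in this list of 83 characters is its integer code. For example, `'C'` sits at index 31 and `'h'` at index 64, which matches the first values of `encoded`. Because the list is already sorted, sorting it again returns the same list. The sketch below mirrors the notebook's lookup tables on a toy string and checks the encode/decode round trip. `build_vocab` is a hypothetical wrapper around the notebook's three lines.

```python
import numpy as np

def build_vocab(text):
    """Mirror the notebook's lookup tables: sorted characters <-> integer ids."""
    vocab = sorted(set(text))
    vocab_to_int = {c: i for i, c in enumerate(vocab)}
    int_to_vocab = dict(enumerate(vocab))
    return vocab, vocab_to_int, int_to_vocab

text = "Happy families are all alike;"
vocab, vocab_to_int, int_to_vocab = build_vocab(text)
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
decoded = "".join(int_to_vocab[int(i)] for i in encoded)

print(vocab[:3])        # [' ', ';', 'H']
print(decoded == text)  # True
```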

In [24]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

Fartian one who saw
a man an influence to any hanrs, and as she could not set to the cares
and altalking, which was a capal of her husband and to and wonder.

"I arrived, better not see."

"Yes, by his own death. I don't know that in the morning,
and that it would be drearing," she thought, "the cheek to say to
them, but if they're beating in it all to the provincial country."

"Oh, what has the countess? If you know why it's a letter to discus it
will the bear to say to."

"Yes, and I could disappoint you and said."

"You should be in them. If it was a lottle to my servecces, and I have sat
down to the same, but the desire it's not for you. Yes; but there was
no luncting their singers."

"Only this you're so awful! His sister all so much a fearful from the
country. How imagined it was anything that tears askates her that I was so
mictrificully before all the teachness in his opinion, and all the same
world, the country and househ, and then, why shall be to be seen to be
strengthed fro
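Each sample begins with the `prime` string verbatim: "Far" continues as "Fartian...". This is consistent with the prime characters being fed through the network one at a time to warm up the LSTM state before the sampling loop shown earlier takes over. The sketch below shows that prime-then-generate pattern without any framework. The hypothetical `step(state, char_id)` stands in for one `sess.run` call that returns the next-character probabilities and the new state. `pick` stands in for a sampler such as `pick_top_n`.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

# Hypothetical interface: step(state, char_id) -> (probabilities, new state).
StepFn = Callable[[object, int], Tuple[np.ndarray, object]]

def generate(step: StepFn, init_state, prime: str, n_samples: int,
             vocab_to_int: Dict[str, int], int_to_vocab: Dict[int, str],
             pick: Callable[[np.ndarray], int]) -> str:
    """Prime the recurrent state with `prime`, then sample n_samples more characters."""
    if not prime:
        raise ValueError("prime must contain at least one character")
    samples: List[str] = list(prime)
    state = init_state
    preds = None
    for ch in prime:                  # warm up the state on the prime text
        preds, state = step(state, vocab_to_int[ch])
    c = pick(preds)                   # first generated character
    samples.append(int_to_vocab[c])
    for _ in range(n_samples):        # feed each sampled character back in
        preds, state = step(state, c)
        c = pick(preds)
        samples.append(int_to_vocab[c])
    return "".join(samples)
```

As in the notebook's loop, one character is drawn right after priming and `n_samples` more follow. The output is therefore `len(prime) + n_samples + 1` characters long.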

In [25]:
checkpoint = 'checkpoints/i200_l512_d2.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farr afet hes thers and sot al sath tho sho cimesens, he sas tond. I tin that ans he thim had the wars ot an ton sas than aret and hard won he her and ant he tishe alsatid the sadile sose thon has areras oud to the here as of him he and ant te he wothe sand,. As ad avin tha sil he whens thit the
shot an was he sint af the corte har wast the hin ho the serensing on than her astit the sat if sar tast hit he
ponse whas as on attheres ang had ser to whet ar ot hored the wothe that, wad ho shathe sores afitig hard ond tit he tin the we he her sat he sis oud
ans ansensint on hom whe so he wing at ande so fiting the thar tho he whing he tosent wha to toth, whit ho thin the sis him seras hans hesd ang thos the
pad soste wore sont won the talt on whe he son thers this ale he whr sins hed tor thet thit thet he was the thith ses ond and he cime an shand wos the hat she sosen the hesese the
sering of has alensing th mon himer, thes hus ther we tans ofrer thand the core to man th the the shan ther 

In [26]:
checkpoint = 'checkpoints/i600_l512_d2.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farlings
antwernt the prone the were some to his his, woo disent to have had allided the could his had she he said, but his and this tomis and thought his with his which as
herped her that, he can trought his a merticaly and
with so that with the with so dister tarking the
comsint was her has sand the suppersticined has and she with a could bet ansertite into the seading and and the
marie and
him to
her aspined time and tran alled were hourd to she mad
time this he doult on tho
grothant him as that he with her had the mars, who chad so manding
was stidly hus
that he was not in she had
not is himping,
and her suppiseded to
comtrion, still to the conciturt and were to
her
said and heress of
all that so dowire hould and her his was ald still have, and the sumated ot the
room on
this heres, to alling had strook of him.

The his shis on thein woll to thim him.

Alexey Alexandrovitch she tome the comenter he seact harding, had a late him
her wouth a dount the counder onter on the promered
wa

In [42]:
checkpoint = 'checkpoints/i1200_l512_d2.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Arkady")
print(samp)

Arkady
had not always all her sont. Stepan Arkadyevitch went a talking was this so thinking as something one and
that he had been said to say
the point of a look and the chetter, and sooned the pleaser, and serenes, and world him
the sense of the streeng at his wife ald so minute as
the cried and the stort the
change to have been to be to the hers, and all the corvants to spile of the
plower,
how he crassing her feeling. And trutching the stard of the compleaner something wite with a succroms to take into his hands of the poorter to any he had been
trouble the close, and the contiction of a supprision of the servion," he said.

At their said of the servic of sectirity and he setited
that the prince was saying with him.

"What as a saw, that I'll make a good about him."

"I's the man about a latter about
it. I will to get. I deat the side of the same, and as stood with the word
tryes, be and moshoring official in the maderal and always.

At all the seeming then a caller seem, the parter

In [43]:
checkpoint = 'checkpoints/i3960_l512_d2.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Arkady")
print(samp)

Arkady's
friend. Anna said. She did not know how she had brought a longer
and been ten of the supper to hear, but he was always staying to all
sometimes that they were talking of her husband. Anna drew her
early for him far into the porter he heard a country, to take him
with the clarks of the book, he seated to all the change of the same
tires, and the point in three detation of the compention of the wine
of the same, who was no mistaken as he did anything at all in any
conscience at at once the conversation was nothing, and that when the
place in this temper and to began to go astachs to the chair.

She was still to care, but to brill to brought an entrance.

"This work, think on the thing I would be some more and delighted in
him; and I shall have a short of thir to make them along."

"What a pictitity?" said Anna, as though as too, she had said the sire
song which had say an idea, should be in some other province that
the closing the princess were to stop in the face with the man w

In [44]:
checkpoint = 'checkpoints/i3960_l512_d2.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Arkady")
print(samp)

Arkady
Parlov troubled it walted and driven. He could not have busy
crossed, had all she wanted.



Chapter 32


When she sat day at having abroad in the people of these peasants, when he
saw the carpenter, they, taking it to her and take all the train of
this strange creet, that his brows and his blothes and a position was
that he had a subject, who had bad happened," thought Levin, which all
the sofal sense of home he had been said; but the strange world they
had been darked off it, and the contlasion of all he had never told
him to say.

She had bored in his hoste at the possible thought in them and a shrill
princess, he had been in the soul of the sound that had been already
about it.

"You do nothing as an inveridulted men with yourself, I can do asliev, I
say she has been inference," he said, smaking to him the strange forest
with her eyes.

"This is the manner of the children are the faces of a conversation,
but I'm very indifferent, but I'm not as to see her to think this way."

In [46]:
checkpoint = 'checkpoints/i3960_l512_d2.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Arkady Dvorkovich")
print(samp)

Arkady Dvorkovich.

The praice had not alone with her, the feelings that she had been done what
he could not be desired, he forgotten a children without a least as he
would say, and that she came to the conversation, and he saw the must have
any soul, and that he had been desprised that their sente of the pleasure
he had never told me about the provinces. He had to choose their
sentement when the prince they were said: 
"No. You meant; but I shall have been a significance. I should be
nurse her and said, and this is not only to say in this sight...."

At the train of the crowd of his wife throwed the tears of the
conversation with him, and so he had not conved her face was so stopped.
He forgod that she had not said at that singer. And, after heart and
satisfaction, and a five clear and tone in the carriage, the cruminess
of his sorres had broken her husband's hones in the country, who was
the tried--that was that he does not see to the confise as she has
brought a country waiting abou

In [47]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Flynn", n=8)
print(samp)

Flynnas, was strayge
out of the corner into the conversation. She did not conceive it
her feeling of tender had thinked and should all think than enjoyment to
be answered with the memory of all the work that in answer is so strayge,
and that she was to do anything was to be a principle to see twan ever
in the care where he was to say to the table. Anna said, taking her
high spare at the steps with his sense from the chind, they came
out to the straight, some position he said.

"I didn't care, I will be done any one of this tooth it," said Stepan
Arkadyevitch, attaining his servant, and he see himself in the creations
of her eyes that there was no one was stronger than any harrows any
considerable parcy. He will gave him the steps of her life was a moment
and some man as she could not see her. The prince's sound of a feeling
of their means with the present that is already always barelings out
of his sense of theight. And this were she was not to say. He could
have been taking her fine s

In [32]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Anna", n=8)
print(samp)

Anna Arkadyevna
had been in the matcer, and he was streaming and any weariny of
their conversation. But a man of the composure, and to him, and he heard
the children, and he was sitting in the character of the presence, where
they answered that in hardly an intense fach, she had talked of them. But
she could not choose the crowde offer her to see this trivally,
and has an expression of subject than had come burden out of the
party. But when he could not see him with a smile of satisfaction,
bull in what was in society. And that he had not tried to be in the means
of this soul. The place he saw the service what something she was still
more that he was as that she seets on her son, he stolded his hand, and
his high raps, and turning away into her former, he shouted in the
sone that had been asked for the paint. He heard the priest saw that
she were set a letter and at the sight of the senses. And with her head,
and the memery, she had been angry."

"I shall be so saying a little with her

In [33]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="The Ritz", n=8)
print(samp)

The Ritzen was
comprehensible to him, and which would have been, to hear him in
a little thought how she went away from this wife indeed, than the other
house he was not stoping. He had seen her husband, she shoted his
suppirt as it was the conversation, which should be delighted in, who
had been so much sensible for his word is an offer. The princess and the
marshal of tendes he did not love him. He saw that the position of the
craps his beard had brought them. The continual chance, her eyes told
him and always despribed, and turned away to his strong, silence, almost
as he cared to come out.

"What do you tell her. As is she would be all happy in a solection." And
he had said that in his external time that he could not be set for him to
drave to the carriage and talked of the solution of the children,
so she was told him to stand, was a concert of sorries of soces,
was not introduced about him, began all the same time, stepsing at
the same, the confined she was a grief who was asking

In [34]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Lock", n=8)
print(samp)

Locking had said.

"All I stall at the previous door of their conversation."

"I am going away, you were all so more of my path."

"Where are you going?" she said, like a sense of thought of something,
to say why the share hundred of his sense interested him. She went out
of the rest, but at the strained hat at a law of his soficiam horses she
was said, saying have been a sign for that moment, he sent on his hair
as he could not be an answer. She did not, without spail in the side
shoule taking the birch second coacents to the servant, and, as though
he stupid her things of his his forese intented and he had never taken
in their present to the country, and while she could not see whom
he went on with the fearful than he was too, which had been talking
to him; but he could not have liked to say should not hear the cares
and the children and him.

Anna was that at the priest of the train and the pass, sitting out, he
had said:

"I don't listen to my hand and when I had been ill, brought 

In [35]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Soros", n=8)
print(samp)

Sorosthel and the memoliee of twity of the matter in the province.

That in the forms as though happy for her. The more holses would be to
but her his, the soul that the solitiou and mire, the playful his orrivility
of the same against they was that it was instead of his brother, the
marshal than the capital comparison, and so interested, the concert the
such assembire with something such contraly, as he was to spok, as they
were a long south face and sone of station, at all the pointing was the
same answers. The first she was, at a still and of a man who had
been done and had never seen in all the carriage, he could be dinner, and
which would have said the case of the praything the peasants had not been
sure to bring in.
 Anna had struggled their porter to his brow, as he was talking
of a sort of life.

"The merchings were the children, what was tired?"

"I ask the memories of this first and anyone."

"I wanted to say, but when this, that I had said to you; that's a
minute, and I'd be

In [36]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Killary", n=8)
print(samp)

Killary!

She had stopped so much to be sure to give him her husband, and had
consinted her.

He choused that she was not so terrible to she was simply because he was
an action again.

They went up to his fingers, which shrigged him, an instant
sening to shut her.

That indeplation was the same the children and the bed of this action
when he had still seen her, had alranded, and that he was all about the
recaltial strangers, when he should have sent her all that had carried
towards the thanger. The samaly shame, and his face were sotting over
his brother of the service, seeing these portraits where he sat something
will be at once always to say, and to say it with her secrets, as though
to dream it were sitting over two of a country at a prince, he saw at the
same tender. He was already and started in her feelings in the
subject, he forgot whech he was already and was not too. At the stable
always dare and strunger that having seemed to have said that he came
over the communication wit

In [37]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Wtf", n=8)
print(samp)

Wtfor her sentary said how it was something was so as to take about a change
to be asking," said Stepan Arkadyevitch, waiting for his heart, and
tood his shirt will that struck her, and the peasant came out there was
the prince, she had been so left, as that in the mother and his brether
he could not say to how at anyway.

"And I could say that you are not a singing than he's to consert. It was
something she could not have said whether he walked to the teacher and
his wife's answer, ther as the conversation to her heart, they have
to see the same thing, to tear the party as the corner and woman in
the same time in spite of as though they had been too and allowed to bring
him. They't heary that has thought of a sint, that she had been trying
to stop the chain. She did not come to the tears.

"That was a little face. He can't think of your husband with him, but it
could say that your serfords. There's all her mother in a shade of
such as they he has a long been to distrable impossible an