# Anna KaRNNa (Anna Karenina)

In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). I also drew on [this post at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and on [Sherjil Ozair's TensorFlow implementation](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use.

In [2]:
with open('anna.txt', 'r') as f:
    text=f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

In [3]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [4]:
chars[:100]

array([82, 59, 43, 63, 34, 55,  5, 15, 44, 41, 41, 41, 30, 43, 63, 63, 40,
       15, 26, 43, 27, 31,  0, 31, 55, 39, 15, 43,  5, 55, 15, 43,  0,  0,
       15, 43,  0, 31, 72, 55,  6, 15, 55, 81, 55,  5, 40, 15, 56, 25, 59,
       43, 63, 63, 40, 15, 26, 43, 27, 31,  0, 40, 15, 31, 39, 15, 56, 25,
       59, 43, 63, 63, 40, 15, 31, 25, 15, 31, 34, 39, 15, 47, 50, 25, 41,
       50, 43, 40, 77, 41, 41, 13, 81, 55,  5, 40, 34, 59, 31, 25], dtype=int32)
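
As a quick sanity check of the encoding step, here is a minimal sketch on a toy string. The names `toy_text`, `toy_vocab`, `toy_vocab_to_int` and `toy_int_to_vocab` are made up for this illustration, and I sort the vocabulary so the mapping is reproducible (the cell above iterates over an unordered `set`, so its integer assignments can differ between runs).

```python
import numpy as np

# Toy stand-in for the book text (an assumption for illustration only)
toy_text = "happy families"

# sorted() gives a reproducible character order; a bare set() does not
toy_vocab = sorted(set(toy_text))
toy_vocab_to_int = {c: i for i, c in enumerate(toy_vocab)}
toy_int_to_vocab = dict(enumerate(toy_vocab))

# Encode the text as integers, then decode it again
encoded = np.array([toy_vocab_to_int[c] for c in toy_text], dtype=np.int32)
decoded = "".join(toy_int_to_vocab[int(i)] for i in encoded)

print(encoded)
print(decoded)
```

Decoding the integer array should give back the original string, which is the property the sampling code relies on later.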

Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be whether the network can generate new text.

Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.

The idea here is to make a 2D matrix where the number of rows is equal to the batch size, that is, the number of sequences in each batch. Each row is one long contiguous run of the character data. We'll split this data into a training set and a validation set using the `split_frac` keyword argument. With the default of 0.9, this keeps the first 90% of the batches in the training set and the remaining 10% in the validation set.

In [6]:
def split_data(chars, batch_size, num_steps, split_frac=0.9):
    """ 
    Split character data into training and validation sets, inputs and targets for each set.
    
    Arguments
    ---------
    chars: character array
    batch_size: Number of sequences in each batch (rows of the returned arrays)
    num_steps: Number of sequence steps to keep in the input and pass to the network
    split_frac: Fraction of batches to keep in the training set
    
    
    Returns train_x, train_y, val_x, val_y
    """
    
    
    slice_size = batch_size * num_steps
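    # Hold back one character so the targets can be shifted by one even when
    # len(chars) is an exact multiple of slice_size
    n_batches = (len(chars) - 1) // slice_size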
    
    # Drop the last few characters to make only full batches
    x = chars[: n_batches*slice_size]
    y = chars[1: n_batches*slice_size + 1]
    
    # Split the data into batch_size slices, then stack them into a 2D matrix 
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    
    # Now x and y are arrays with dimensions batch_size x n_batches*num_steps
    
    # Split into training and validation sets, keep the first split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]
    
    return train_x, train_y, val_x, val_y
print("done")

done


In [7]:
train_x, train_y, val_x, val_y = split_data(chars, 10, 200)

In [8]:
train_x.shape

(10, 178400)
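
To make the shapes concrete, here is a small sketch of the same splitting logic on a toy array. It is not the function above: `chars_toy` is a made-up stand-in for the encoded book, and I use `reshape` in place of `np.split` plus `np.stack`, which gives the same result for contiguous, equally sized slices.

```python
import numpy as np

chars_toy = np.arange(101)               # stand-in for the encoded text
batch_size, num_steps, split_frac = 2, 5, 0.9

slice_size = batch_size * num_steps      # characters consumed per batch
n_batches = len(chars_toy) // slice_size

# Inputs and targets; targets are the inputs shifted one character over
x = chars_toy[:n_batches * slice_size].reshape(batch_size, -1)
y = chars_toy[1:n_batches * slice_size + 1].reshape(batch_size, -1)

# Keep the first split_frac of the batches for training
split_col = int(n_batches * split_frac) * num_steps
train_x, val_x = x[:, :split_col], x[:, split_col:]
train_y, val_y = y[:, :split_col], y[:, split_col:]

print(train_x.shape, val_x.shape)
print(x[:, :5])
print(y[:, :5])
```

Each row is one contiguous stretch of the text, and every target is the character that follows the corresponding input.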

In [9]:
train_x[:,:10]

array([[82, 59, 43, 63, 34, 55,  5, 15, 44, 41],
       [60, 25, 73, 15, 59, 55, 15, 27, 47, 81],
       [15, 19, 43, 34, 19, 59, 31, 25, 45, 15],
       [47, 34, 59, 55,  5, 15, 50, 47, 56,  0],
       [15, 34, 59, 55, 15,  0, 43, 25, 73, 54],
       [15, 28, 59,  5, 47, 56, 45, 59, 15,  0],
       [34, 15, 34, 47, 41, 73, 47, 77, 41, 41],
       [47, 15, 59, 55,  5, 39, 55,  0, 26, 80],
       [59, 43, 34, 15, 31, 39, 15, 34, 59, 55],
       [55,  5, 39, 55,  0, 26, 15, 43, 25, 73]], dtype=int32)

I'll write another function to grab batches out of the arrays made by `split_data`. Each batch is a window of size `batch_size x num_steps` on these arrays. For example, if we want our network to train on sequences of 100 characters, `num_steps = 100`. For the next batch, we shift the window over by `num_steps` characters, so the windows do not overlap. Because row `i` of each batch picks up exactly where row `i` of the previous batch left off, we can feed batches to the network one after another and carry the cell state over from each batch to the next.

In [11]:
def get_batch(arrs, num_steps):
    batch_size, slice_size = arrs[0].shape
    
    n_batches = int(slice_size/num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]
print("done")

done
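
Here is a small usage sketch of the batching generator on toy arrays, to show the row continuity between consecutive batches. The function is repeated so the snippet runs on its own; `toy_x` and `toy_y` are made-up inputs.

```python
import numpy as np

def get_batch(arrs, num_steps):
    # Same logic as the cell above: step a num_steps-wide window along the columns
    batch_size, slice_size = arrs[0].shape
    n_batches = int(slice_size / num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]

toy_x = np.arange(12).reshape(2, 6)   # two rows of six "characters"
toy_y = toy_x + 1                     # targets shifted by one

batches = list(get_batch([toy_x, toy_y], num_steps=3))
for x, y in batches:
    print(x.tolist(), y.tolist())
```

The first row of the second batch starts right after the last element of the first row of the first batch, which is what makes it meaningful to pass the LSTM state from one batch to the next.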


To keep the graph readable in TensorBoard, the code below groups parts of the network into name scopes, so each section of the network shows up as a tidy block.

In [12]:
def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,
              learning_rate=0.001, grad_clip=5, sampling=False):
        
    if sampling:
        batch_size, num_steps = 1, 1

    tf.reset_default_graph()
    
    # Declare placeholders we'll feed into the graph
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
        x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')
    
    with tf.name_scope('targets'):
        targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
        y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')
        y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])
    
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    # Build the RNN layers
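    with tf.name_scope("RNN_layers"):
        # Build a separate cell object for each layer. [drop] * num_layers repeats
        # the same object, which can raise an error in later TensorFlow 1.x releases.
        def build_cell():
            lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
            return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        cell = tf.contrib.rnn.MultiRNNCell([build_cell() for _ in range(num_layers)])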
    
    with tf.name_scope("RNN_init_state"):
        initial_state = cell.zero_state(batch_size, tf.float32)

    # Run the data through the RNN layers
    with tf.name_scope("RNN_forward"):
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=initial_state)
    
    final_state = state
    
    # Reshape output so it's a bunch of rows, one row for each cell output
    with tf.name_scope('sequence_reshape'):
        seq_output = tf.concat(outputs, axis=1,name='seq_output')
        output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')
    
    # Now connect the RNN outputs to a softmax layer and calculate the cost
    with tf.name_scope('logits'):
        softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),
                               name='softmax_w')
        softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')
        logits = tf.matmul(output, softmax_w) + softmax_b
        tf.summary.histogram('softmax_w', softmax_w)
        tf.summary.histogram('softmax_b', softmax_b)

    with tf.name_scope('predictions'):
        preds = tf.nn.softmax(logits, name='predictions')
        tf.summary.histogram('predictions', preds)
    
    with tf.name_scope('cost'):
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')
        cost = tf.reduce_mean(loss, name='cost')
        tf.summary.scalar('cost', cost)

    # Optimizer for training, using gradient clipping to control exploding gradients
    with tf.name_scope('train'):
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
        train_op = tf.train.AdamOptimizer(learning_rate)
        optimizer = train_op.apply_gradients(zip(grads, tvars))
        
    merged = tf.summary.merge_all()

    # Export the nodes 
    export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',
                    'keep_prob', 'cost', 'preds', 'optimizer', 'merged']
    Graph = namedtuple('Graph', export_nodes)
    local_dict = locals()
    graph = Graph(*[local_dict[each] for each in export_nodes])
    
    return graph
print("done")

done
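
The `train` scope above clips gradients by their global norm. As a rough illustration of what that operation does, here is a NumPy sketch of the same rule: compute one norm over all gradient tensors together, and if it exceeds the threshold, scale every tensor by the same factor. The function name mirrors the TensorFlow one, but this is my own stand-in, and `toy_grads` is invented.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # One norm over all gradient arrays taken together
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale is 1 when the norm is already within the limit
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

toy_grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm = clip_by_global_norm(toy_grads, clip_norm=5.0)

clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm, clipped_norm)
```

Because every tensor is scaled by the same factor, the direction of the overall update is preserved; only its length is capped at `grad_clip`.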


## Hyperparameters

Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in each LSTM layer and the number of LSTM layers, respectively. Making these bigger generally increases the network's capacity and can improve performance, but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. In that case, decrease the size of the network or decrease the dropout keep probability (which applies more dropout).

In [13]:
batch_size = 100
num_steps = 100
lstm_size = 512
num_layers = 2
learning_rate = 0.001

## Write out the graph for TensorBoard

In [14]:
model = build_rnn(len(vocab),
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

with tf.Session() as sess:
    
    sess.run(tf.global_variables_initializer())
    file_writer = tf.summary.FileWriter('./logs/1', sess.graph)

## Training

Time for training, which is pretty straightforward. Here I pass in some data and get an LSTM state back. Then I pass that state back into the network so the next batch can continue the state from the previous batch. Every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint.
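
To see why passing the state between batches matters, here is a minimal NumPy sketch. It swaps the LSTM for a plain tanh RNN with random weights (an assumption made only to keep the example short), and checks that processing a sequence in two chunks with the state carried over gives the same final state as processing it in one go. `run_rnn`, `W_x` and `W_h` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, hidden = 3, 4
W_x = rng.normal(size=(n_in, hidden))
W_h = rng.normal(size=(hidden, hidden))

def run_rnn(inputs, state):
    """Plain tanh RNN over inputs of shape (batch, steps, n_in); returns the final state."""
    for t in range(inputs.shape[1]):
        state = np.tanh(inputs[:, t] @ W_x + state @ W_h)
    return state

seq = rng.normal(size=(2, 6, n_in))      # 2 rows, 6 time steps
zero_state = np.zeros((2, hidden))

# Whole sequence in one pass
full_state = run_rnn(seq, zero_state)

# Same sequence as two batches of 3 steps, carrying the state across
carried = zero_state
for b in range(2):
    carried = run_rnn(seq[:, b*3:(b+1)*3], carried)

# Resetting the state between batches loses the earlier context
reset_state = run_rnn(seq[:, 3:], zero_state)

print(np.allclose(full_state, carried), np.allclose(full_state, reset_state))
```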

In [20]:
!mkdir -p checkpoints/anna
!mkdir -p logs/1

In [22]:
epochs = 10
save_every_n = 200
train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)

model = build_rnn(len(vocab), 
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

saver = tf.train.Saver(max_to_keep=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./logs/1/train', sess.graph)
    test_writer = tf.summary.FileWriter('./logs/1/test')
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/anna20.ckpt')
    
    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    for e in range(epochs):
        
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            iteration = e*n_batches + b
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: 0.5,
                    model.initial_state: new_state}
            summary, batch_loss, new_state, _ = sess.run([model.merged, model.cost, 
                                                 model.final_state, model.optimizer], 
                                                 feed_dict=feed)
            loss += batch_loss
            end = time.time()
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))
            
            train_writer.add_summary(summary, iteration)
            
            if (iteration%save_every_n == 0) or (iteration == iterations):
                # Check performance; note that keep_prob is set to 1 to disable dropout
                val_loss = []
                # Use a separate state variable so validation does not overwrite
                # the training state carried in new_state
                val_state = sess.run(model.initial_state)
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: val_state}
                    val_summary, batch_loss, val_state = sess.run([model.merged, model.cost,
                                                                   model.final_state],
                                                                  feed_dict=feed)
                    val_loss.append(batch_loss)
                
                # Log the summary of the last validation batch, not the training summary
                test_writer.add_summary(val_summary, iteration)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                saver.save(sess, "checkpoints/anna/i{}_l{}_{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))

Epoch 1/10  Iteration 1/1780 Training loss: 4.4177 0.1362 sec/batch
Epoch 1/10  Iteration 2/1780 Training loss: 4.3745 0.1047 sec/batch
Epoch 1/10  Iteration 3/1780 Training loss: 4.2098 0.1017 sec/batch
Epoch 1/10  Iteration 4/1780 Training loss: 4.6346 0.1080 sec/batch
Epoch 1/10  Iteration 5/1780 Training loss: 4.6359 0.0941 sec/batch
Epoch 1/10  Iteration 6/1780 Training loss: 4.5060 0.1049 sec/batch
Epoch 1/10  Iteration 7/1780 Training loss: 4.3856 0.0939 sec/batch
Epoch 1/10  Iteration 8/1780 Training loss: 4.2905 0.1049 sec/batch
Epoch 1/10  Iteration 9/1780 Training loss: 4.2072 0.0931 sec/batch
Epoch 1/10  Iteration 10/1780 Training loss: 4.1366 0.1044 sec/batch
Epoch 1/10  Iteration 11/1780 Training loss: 4.0728 0.0941 sec/batch
Epoch 1/10  Iteration 12/1780 Training loss: 4.0174 0.1043 sec/batch
Epoch 1/10  Iteration 13/1780 Training loss: 3.9679 0.0934 sec/batch
Epoch 1/10  Iteration 14/1780 Training loss: 3.9248 0.1093 sec/batch
Epoch 1/10  Iteration 15/1780 Training loss
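
The first logged loss of about 4.42 is close to what an untrained network should give. A model that spreads probability evenly over `V` characters has a cross-entropy of `ln(V)`. The sketch below checks this in NumPy, assuming a vocabulary of 83 characters; that number is my assumption (the largest index printed earlier is 82, so there are at least 83), and `softmax_cross_entropy` is a stand-in for the TensorFlow op, not the op itself.

```python
import numpy as np

def softmax_cross_entropy(logits, labels_one_hot):
    # Numerically stable log-softmax, then pick out the log-probability of the true class
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels_one_hot * log_probs).sum(axis=1)

num_classes = 83                       # assumed vocabulary size
targets = np.array([0, 5, 82])
one_hot = np.eye(num_classes)[targets]

# All-zero logits give a uniform distribution over the characters
uniform_logits = np.zeros((len(targets), num_classes))
cost = softmax_cross_entropy(uniform_logits, one_hot).mean()

print(cost, np.log(num_classes))
```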

In [27]:
tf.train.get_checkpoint_state('checkpoints/anna')

model_checkpoint_path: "checkpoints/anna/i1780_l512_1.258.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i200_l512_2.420.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i400_l512_1.974.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i600_l512_1.755.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i800_l512_1.602.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1000_l512_1.486.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1200_l512_1.398.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1400_l512_1.338.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1600_l512_1.293.ckpt"
all_model_checkpoint_paths: "checkpoints/anna/i1780_l512_1.258.ckpt"
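
The checkpoint iterations listed above follow from the batch arithmetic. Here is a back-of-the-envelope sketch, assuming the text has roughly 1,985,000 characters. That length is my assumption, chosen to be consistent with the shapes and iteration counts printed in this notebook; it is not a number the notebook states.

```python
n_chars = 1_985_000        # assumed length of the encoded text
batch_size, num_steps = 100, 100
epochs, save_every_n, split_frac = 10, 200, 0.9

n_batches = n_chars // (batch_size * num_steps)    # full batches in the text
train_batches = int(n_batches * split_frac)        # batches per training epoch
iterations = train_batches * epochs

# A checkpoint is written every save_every_n iterations and at the very end
checkpoint_iters = [i for i in range(1, iterations + 1)
                    if i % save_every_n == 0 or i == iterations]

print(train_batches, iterations)
print(checkpoint_iters)
```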

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character, and the network predicts the next character. We then feed that new character back in to predict the one after it, and we keep doing this to generate entirely new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to only choose a new character from the top N most likely characters.

In [24]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
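
Here is a small sketch of top-N sampling on a made-up distribution. It is a variant of the function above rather than a copy: `pick_top_n_copy` is a hypothetical name, it works on a copy so the caller's array is left untouched (`np.squeeze` returns a view, so the original function zeroes entries of `preds` in place), and it takes an optional random generator so the draws can be reproduced.

```python
import numpy as np

def pick_top_n_copy(preds, vocab_size, top_n=5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # astype returns a copy, so the input array is not modified
    p = np.squeeze(preds).astype(np.float64)
    # Zero out everything except the top_n most likely characters
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / p.sum()
    return int(rng.choice(vocab_size, p=p))

toy_preds = np.array([[0.05, 0.4, 0.1, 0.3, 0.15]])
rng = np.random.default_rng(0)
draws = {pick_top_n_copy(toy_preds, 5, top_n=2, rng=rng) for _ in range(200)}

print(draws)
```

With `top_n=2`, only the two most likely indices (1 and 3 here) can ever be drawn, and the less likely characters are never sampled.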

In [31]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state], 
                                         feed_dict=feed)

        # Use the vocab_size argument rather than the global vocab
        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)
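
The generation loop can be tried without a trained network. The sketch below replaces the LSTM with a table of character-pair (bigram) probabilities counted from a toy sentence, which is purely a stand-in: it has no hidden state, and `toy_text`, `probs` and `generate` are invented names. The loop itself has the same shape as in `sample`: predict a distribution, draw a character, feed it back in.

```python
import numpy as np

toy_text = "the cat sat on the mat. the cat ate."
toy_vocab = sorted(set(toy_text))
c2i = {c: i for i, c in enumerate(toy_vocab)}

# Count character pairs, starting from 1 so every row is a valid distribution
counts = np.ones((len(toy_vocab), len(toy_vocab)))
for a, b in zip(toy_text, toy_text[1:]):
    counts[c2i[a], c2i[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def generate(prime, n_samples, rng):
    samples = list(prime)
    c = c2i[prime[-1]]                 # start from the last primed character
    for _ in range(n_samples):
        # "Predict" the next character, sample it, then feed it back in
        c = int(rng.choice(len(toy_vocab), p=probs[c]))
        samples.append(toy_vocab[c])
    return "".join(samples)

out = generate("the ", 20, np.random.default_rng(0))
print(repr(out))
```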

In [39]:
checkpoint = "checkpoints/anna/i1780_l512_1.258.ckpt"
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="I want to make Love with you")
print(samp)

I want to make Love with you.

"You've cleated to be in any out, to talk before. We are truse to the
position of a bill that is atticularly from my friend. I set of the could
neternd. He disteved him to the propersy, I can't the their officiar
and the constinute of tendion. What won't your send to sent that he went
out."

"Ind son't are a counters, that this some must be tone my directions," the
respect and world as this conviction. After same to see the sore smile, who was
a men of more. This had. He did not thinking that without she was
no one that he was, that he saw the sort in a strown, and the
part of his frace. There the same of that-strame was the country
should before, and he call in walk of stalling a looked at terrible which
she did not have beginning the lefter as she saw it in the stall. "Why was
in humband, in speak."

And he had not answer, and with the stroight on his with the
sement, sat to her. He could not to look home, but with the counce
were nired in a feeling at 

In [40]:
checkpoint = "checkpoints/anna/i200_l512_2.420.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="I want to make Love with you")
print(samp)

I want to make Love with youns to athe wan sad are wan at ind wher ant othore shes at in ot antete tha caring he want on shad
is as ther ante ho thard ses he wos ol orito seresan tan and oros on hh at ante sos athin hare aded te tant int ho wo sin than ad osis sas tir he sand ta te at are has and ther sata th ad sos hore and war an ad aldes ot ase tas asede an tas her ta he aned the
sadd te wota se sererinn orer asd
ant he an hors his theessen tathe te site his are ad that he hase are sor and wos he so te sar an hhin to ale sede an sot on tis ato he whe whe thas
sind atering ang an sithe ta he thar whar and ande sat os ase thes won and.


ours oos ath an terad sorens on athe shand tar at orat her has as ined sith as antis hes wansid the ad so the and
ad he tos sos in he seatd to hit hos wans ane sisins ter ans ther whins an he tho ses tha hes an th erserisg ood and tinting the whasd sot he wis ot and se won tho whe ho had an he thind and as and oon the and wone whe arid he wos an thom 

In [41]:
checkpoint = "checkpoints/anna/i600_l512_1.755.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="I want to make Love with you")
print(samp)

I want to make Love with youre as solled a she has be to the
pose, were him all to thine, and sat
there the seaded the ponsious what had the carded of to
chir ont on a monter a lover had bee wored and shance the childerss an the sand, and strent
her though there him and and a stone with the cenceredssen him, this har take all the consted the
churded that he was bat it a canticunt to thim."

"I said to as the cand, as and somerely bating it with he will to
be sterent hur stong in to shate a dictrenting him, this the
celas on the with home ant the with horsess all him a some to te sere. It's not sise to be alle she was at the same wist her so cant of her this shich of chome as she could an the cond of him be all ase had, and tar in to thit wanterss on hid and ther anceranted.

The head a counter had his to broome har boon. The could this all he worled to ser his and same, had her a that will,
take it to strem to her bouther whene, wo came as her and stack in the sood, was a cantad that i

In [42]:
checkpoint = "checkpoints/anna/i1000_l512_1.486.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="I want to make Love with you")
print(samp)

I want to make Love with you with she had to get to the comen the calliage as the comnersation, and
he was told and was any atting that so that
had stould, something
the doon, and she he had not here and to stind what
were stild to him and somath of the strong
that her same of the sour of the some with strick
was never sunders, whent the
carriage, he was not at homeer of the rain.

"I having an indriented on his
befiens. Shook the say some a man of she has not see tha mire and their consices of her, when his fronged
and should he same to him, and had now
that, that he was sater to see her a streatiage, to him, with a mearance, the compresence wish the cried, at his consented that
sould never heart her herd of his come, and that the moment of
that had to hem helpsed her strangs as any they with the stuld alw
that he had sawing to be in him to a samper that the convertanter had to stell with hard trunghing that he was to
gevole and which had heard a goon and her that she was the
rid of
h