Deep Learning
=============

Assignment 6
------------

After training a skip-gram model in `5_word2vec.ipynb`, the goal of this notebook is to train a LSTM character model over [Text8](http://mattmahoney.net/dc/textdata) data.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import os
import numpy as np
import random
import string
import tensorflow as tf
import zipfile
from six.moves import range
from six.moves.urllib.request import urlretrieve

In [2]:
url = 'http://mattmahoney.net/dc/'

def maybe_download(filename, expected_bytes):
    """Download a file if not present, and make sure it's the right size."""
    if not os.path.exists(filename):
        filename, _ = urlretrieve(url + filename, filename)
    statinfo = os.stat(filename)
    if statinfo.st_size == expected_bytes:
        print('Found and verified %s' % filename)
    else:
        print(statinfo.st_size)
        raise Exception('Failed to verify ' + filename + '. Can you get to it with a browser?')
    return filename

filename = maybe_download('text8.zip', 31344016)

Found and verified text8.zip


In [3]:
def read_data(filename):
    with zipfile.ZipFile(filename) as f:
        name = f.namelist()[0]
        data = tf.compat.as_str(f.read(name))
    return data
  
text = read_data(filename)
print('Data size %d' % len(text))

Data size 100000000


Create a small validation set.

In [4]:
valid_size = 1000
valid_text = text[:valid_size]
train_text = text[valid_size:]
train_size = len(train_text)
print(train_size, train_text[:64])
print(valid_size, valid_text[:64])

99999000 ons anarchists advocate social relations based upon voluntary as
1000  anarchism originated as a term of abuse first used against earl


Utility functions to map characters to vocabulary IDs and back.

In [5]:
vocabulary_size = len(string.ascii_lowercase) + 1 # [a-z] + ' '
first_letter = ord(string.ascii_lowercase[0])

def char2id(char):
    if char in string.ascii_lowercase:
        return ord(char) - first_letter + 1
    elif char == ' ':
        return 0
    else:
        print('Unexpected character: %s' % char)
        return 0
    
def id2char(dictid):
    if dictid > 0:
        return chr(dictid + first_letter - 1)
    else:
        return ' '

print(char2id('a'), char2id('z'), char2id(' '), char2id('ï'))
print(id2char(1), id2char(26), id2char(0))

Unexpected character: ï
1 26 0 0
a z  


In [24]:
letters = string.ascii_lowercase + ' '


'abcdefghijklmnopqrstuvwxyz '

Function to generate a training batch for the LSTM model.

In [7]:
batch_size=64
num_unrollings=10

class BatchGenerator(object):
    def __init__(self, text, batch_size, num_unrollings):
        self._text = text
        self._text_size = len(text)
        self._batch_size = batch_size
        self._num_unrollings = num_unrollings
        segment = self._text_size // batch_size
        self._cursor = [ offset * segment for offset in range(batch_size)]
        self._last_batch = self._next_batch()
  
    def _next_batch(self):
        """Generate a single batch from the current cursor position in the data."""
        batch = np.zeros(shape=(self._batch_size, vocabulary_size), dtype=np.float)
        for b in range(self._batch_size):
            batch[b, char2id(self._text[self._cursor[b]])] = 1.0
            self._cursor[b] = (self._cursor[b] + 1) % self._text_size
        return batch

    def next(self):
        """Generate the next array of batches from the data. The array consists of
        the last batch of the previous array, followed by num_unrollings new ones.
        """
        batches = [self._last_batch]
        for step in range(self._num_unrollings):
            batches.append(self._next_batch())
        self._last_batch = batches[-1]
        return batches

def characters(probabilities):
    """Turn a 1-hot encoding or a probability distribution over the possible
    characters back into its (most likely) character representation."""
    return [id2char(c) for c in np.argmax(probabilities, 1)]

def batches2string(batches):
    """Convert a sequence of batches back into their (most likely) string
    representation."""
    s = [''] * batches[0].shape[0]
    for b in batches:
        s = [''.join(x) for x in zip(s, characters(b))]
    return s

train_batches = BatchGenerator(train_text, batch_size, num_unrollings)
valid_batches = BatchGenerator(valid_text, 1, 1)

print(batches2string(train_batches.next()))
print(batches2string(train_batches.next()))
print(batches2string(valid_batches.next()))
print(batches2string(valid_batches.next()))

['ons anarchi', 'when milita', 'lleria arch', ' abbeys and', 'married urr', 'hel and ric', 'y and litur', 'ay opened f', 'tion from t', 'migration t', 'new york ot', 'he boeing s', 'e listed wi', 'eber has pr', 'o be made t', 'yer who rec', 'ore signifi', 'a fierce cr', ' two six ei', 'aristotle s', 'ity can be ', ' and intrac', 'tion of the', 'dy to pass ', 'f certain d', 'at it will ', 'e convince ', 'ent told hi', 'ampaign and', 'rver side s', 'ious texts ', 'o capitaliz', 'a duplicate', 'gh ann es d', 'ine january', 'ross zero t', 'cal theorie', 'ast instanc', ' dimensiona', 'most holy m', 't s support', 'u is still ', 'e oscillati', 'o eight sub', 'of italy la', 's the tower', 'klahoma pre', 'erprise lin', 'ws becomes ', 'et in a naz', 'the fabian ', 'etchy to re', ' sharman ne', 'ised empero', 'ting in pol', 'd neo latin', 'th risky ri', 'encyclopedi', 'fense the a', 'duating fro', 'treet grid ', 'ations more', 'appeal of d', 'si have mad']
['ists advoca', 'ary governm', 'hes nat

In [9]:
def logprob(predictions, labels):
    """Log-probability of the true labels in a predicted batch."""
    predictions[predictions < 1e-10] = 1e-10
    return np.sum(np.multiply(labels, -np.log(predictions))) / labels.shape[0]

def sample_distribution(distribution):
    """Sample one element from a distribution assumed to be an array of normalized
    probabilities.
    """
    r = random.uniform(0, 1)
    s = 0
    for i in range(len(distribution)):
        s += distribution[i]
        if s >= r:
            return i
    return len(distribution) - 1

def sample(prediction):
    """Turn a (column) prediction into 1-hot encoded samples."""
    p = np.zeros(shape=[1, vocabulary_size], dtype=np.float)
    p[0, sample_distribution(prediction[0])] = 1.0
    return p

def random_distribution():
    """Generate a random column of probabilities."""
    b = np.random.uniform(0.0, 1.0, size=[1, vocabulary_size])
    return b/np.sum(b, 1)[:,None]

Simple LSTM Model.

In [10]:
num_nodes = 64

graph = tf.Graph()
with graph.as_default():

    # Parameters:
    # Input gate: input, previous output, and bias.
    ix = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
    im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    ib = tf.Variable(tf.zeros([1, num_nodes]))
    # Forget gate: input, previous output, and bias.
    fx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
    fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    fb = tf.Variable(tf.zeros([1, num_nodes]))
    # Memory cell: input, state and bias.                             
    cx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
    cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    cb = tf.Variable(tf.zeros([1, num_nodes]))
    # Output gate: input, previous output, and bias.
    ox = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
    om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
    ob = tf.Variable(tf.zeros([1, num_nodes]))
    # Variables saving state across unrollings.
    saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
    saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
    # Classifier weights and biases.
    w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))
    b = tf.Variable(tf.zeros([vocabulary_size]))
  
    # Definition of the cell computation.
    def lstm_cell(i, o, state):
        """Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
        Note that in this formulation, we omit the various connections between the
        previous state and the gates."""
        input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)
        forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)
        update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb
        state = forget_gate * state + input_gate * tf.tanh(update)
        output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)
        return output_gate * tf.tanh(state), state

    # Input data.
    train_data = list()
    for _ in range(num_unrollings + 1):
        train_data.append(tf.placeholder(tf.float32, shape=[batch_size, vocabulary_size]))
    train_inputs = train_data[:num_unrollings]
    train_labels = train_data[1:]  # labels are inputs shifted by one time step.

    # Unrolled LSTM loop.
    outputs = list()
    output = saved_output
    state = saved_state
    for i in train_inputs:
        output, state = lstm_cell(i, output, state)
        outputs.append(output)

    # State saving across unrollings.
    with tf.control_dependencies([saved_output.assign(output), saved_state.assign(state)]):
        # Classifier.
        logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), w, b)
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            labels=tf.concat(train_labels, 0), logits=logits))

    # Optimizer.
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(10.0, global_step, 5000, 0.1, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    gradients, v = zip(*optimizer.compute_gradients(loss))
    gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
    optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)

    # Predictions.
    train_prediction = tf.nn.softmax(logits)

    # Sampling and validation eval: batch 1, no unrolling.
    sample_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size])
    saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))
    saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))
    reset_sample_state = tf.group(
        saved_sample_output.assign(tf.zeros([1, num_nodes])),
        saved_sample_state.assign(tf.zeros([1, num_nodes])))
    sample_output, sample_state = lstm_cell(sample_input, saved_sample_output, saved_sample_state)
    with tf.control_dependencies([saved_sample_output.assign(sample_output),
                                saved_sample_state.assign(sample_state)]):
        sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))

In [12]:
%%time
num_steps = 7001
summary_frequency = 100

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    mean_loss = 0
    for step in range(num_steps):
        batches = train_batches.next()
        feed_dict = dict()
        for i in range(num_unrollings + 1):
            feed_dict[train_data[i]] = batches[i]
        _, l, predictions, lr = session.run(
            [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)
        mean_loss += l
        if step % summary_frequency == 0:
            if step > 0:
                mean_loss = mean_loss / summary_frequency
            # The mean loss is an estimate of the loss over the last few batches.
            print('Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))
            mean_loss = 0
            labels = np.concatenate(list(batches)[1:])
            print('Minibatch perplexity: %.2f' % float(np.exp(logprob(predictions, labels))))
            if step % (summary_frequency * 10) == 0:
                # Generate some samples.
                print('=' * 80)
                for _ in range(5):
                    feed = sample(random_distribution())
                    sentence = characters(feed)[0]
                    reset_sample_state.run()
                    for _ in range(79):
                        prediction = sample_prediction.eval({sample_input: feed})
                        feed = sample(prediction)
                        sentence += characters(feed)[0]
                    print(sentence)
                print('=' * 80)
        # Measure validation set perplexity.
        reset_sample_state.run()
        valid_logprob = 0
        for _ in range(valid_size):
            b = valid_batches.next()
            predictions = sample_prediction.eval({sample_input: b[0]})
            valid_logprob = valid_logprob + logprob(predictions, b[1])
        print('Validation set perplexity: %.2f' % float(np.exp(valid_logprob / valid_size)))

Initialized
Average loss at step 0: 3.297803 learning rate: 10.000000
Minibatch perplexity: 27.05
gikkaixs  z vs zgghnxtnsyatvjnle nye mkceyaidmjeiyvnmyovyz ttcrlasdebxagrsexrhlo
toeibrnojkvn mzvhtadeowfuampiiji  u gswm kzedhvuu prco lhppt s  teapdjaah rwnsur
cbeg yibf eietprpu whvwmluo xrtgi   sczcxyewiti ragtqrmap  zny dgc lqevejinr bji
sifln hwoijh ev u vkrqo zybpi  ichyqyu lw ysuurqtknfbclfza ekslr cj tp  xhis mgy
mnnolpongoq dcuyfeilxvvgaketvnbltkemktvnlvawnrep sfqt ikob nesehkan   qinv rznbs
Validation set perplexity: 19.96
Validation set perplexity: 18.77
Validation set perplexity: 17.97
Validation set perplexity: 17.55
Validation set perplexity: 17.25
Validation set perplexity: 17.22
Validation set perplexity: 17.16
Validation set perplexity: 16.98
Validation set perplexity: 17.88
Validation set perplexity: 18.15
Validation set perplexity: 17.44
Validation set perplexity: 16.93
Validation set perplexity: 16.44
Validation set perplexity: 16.34
Validation set perplexity: 16.25
Va

Validation set perplexity: 8.68
Validation set perplexity: 8.65
Validation set perplexity: 8.58
Validation set perplexity: 9.00
Validation set perplexity: 8.62
Validation set perplexity: 8.88
Validation set perplexity: 8.71
Validation set perplexity: 8.87
Validation set perplexity: 8.30
Validation set perplexity: 8.59
Validation set perplexity: 8.52
Validation set perplexity: 8.35
Validation set perplexity: 8.49
Validation set perplexity: 8.67
Validation set perplexity: 8.68
Validation set perplexity: 8.43
Validation set perplexity: 8.28
Validation set perplexity: 8.64
Validation set perplexity: 8.40
Validation set perplexity: 8.55
Validation set perplexity: 8.39
Validation set perplexity: 8.34
Validation set perplexity: 8.46
Validation set perplexity: 8.53
Validation set perplexity: 8.23
Validation set perplexity: 8.39
Validation set perplexity: 8.59
Validation set perplexity: 8.37
Validation set perplexity: 8.78
Validation set perplexity: 8.64
Validation set perplexity: 9.15
Validati

Validation set perplexity: 7.57
Validation set perplexity: 7.18
Validation set perplexity: 7.17
Validation set perplexity: 7.20
Validation set perplexity: 7.50
Validation set perplexity: 7.32
Validation set perplexity: 7.23
Validation set perplexity: 7.34
Validation set perplexity: 7.24
Validation set perplexity: 7.34
Validation set perplexity: 7.71
Validation set perplexity: 7.30
Validation set perplexity: 7.42
Validation set perplexity: 7.13
Validation set perplexity: 7.15
Validation set perplexity: 7.32
Validation set perplexity: 7.24
Validation set perplexity: 7.34
Validation set perplexity: 7.00
Validation set perplexity: 7.17
Validation set perplexity: 7.21
Validation set perplexity: 7.32
Validation set perplexity: 7.29
Average loss at step 500: 1.940921 learning rate: 10.000000
Minibatch perplexity: 6.81
Validation set perplexity: 7.29
Validation set perplexity: 7.15
Validation set perplexity: 7.17
Validation set perplexity: 7.11
Validation set perplexity: 7.07
Validation set pe

Validation set perplexity: 6.82
Validation set perplexity: 6.85
Validation set perplexity: 6.43
Validation set perplexity: 6.49
Validation set perplexity: 6.56
Validation set perplexity: 6.52
Validation set perplexity: 6.53
Validation set perplexity: 6.69
Validation set perplexity: 6.65
Validation set perplexity: 6.59
Validation set perplexity: 6.57
Validation set perplexity: 6.64
Validation set perplexity: 6.60
Validation set perplexity: 6.66
Validation set perplexity: 6.59
Validation set perplexity: 6.58
Validation set perplexity: 6.97
Validation set perplexity: 6.60
Validation set perplexity: 6.78
Validation set perplexity: 6.57
Validation set perplexity: 6.68
Validation set perplexity: 6.59
Validation set perplexity: 6.56
Validation set perplexity: 6.61
Validation set perplexity: 6.52
Validation set perplexity: 6.75
Validation set perplexity: 6.85
Validation set perplexity: 6.94
Validation set perplexity: 6.92
Validation set perplexity: 6.66
Validation set perplexity: 6.72
Validati

Validation set perplexity: 6.02
Validation set perplexity: 5.86
Validation set perplexity: 5.92
Validation set perplexity: 6.01
Validation set perplexity: 5.91
Validation set perplexity: 6.07
Validation set perplexity: 5.97
Validation set perplexity: 6.00
Validation set perplexity: 5.99
Validation set perplexity: 6.10
Validation set perplexity: 5.99
Validation set perplexity: 5.87
Validation set perplexity: 6.06
Validation set perplexity: 6.16
Validation set perplexity: 6.02
Validation set perplexity: 6.09
Validation set perplexity: 5.98
Validation set perplexity: 6.16
Validation set perplexity: 6.00
Validation set perplexity: 6.14
Validation set perplexity: 6.19
Validation set perplexity: 6.17
Validation set perplexity: 5.99
Validation set perplexity: 6.07
Average loss at step 1000: 1.825350 learning rate: 10.000000
Minibatch perplexity: 5.51
 then it porks the estwev to be the dis incomonim of the eight zero wink infle d
jrisa cland untorchars and are one nine he sibnan dian rold how

Validation set perplexity: 5.83
Validation set perplexity: 5.84
Validation set perplexity: 5.56
Validation set perplexity: 5.99
Validation set perplexity: 5.53
Validation set perplexity: 5.69
Validation set perplexity: 5.70
Validation set perplexity: 5.76
Validation set perplexity: 5.84
Validation set perplexity: 5.68
Validation set perplexity: 5.76
Validation set perplexity: 5.65
Validation set perplexity: 5.84
Validation set perplexity: 5.64
Validation set perplexity: 5.79
Validation set perplexity: 5.63
Validation set perplexity: 5.56
Validation set perplexity: 5.68
Validation set perplexity: 5.71
Validation set perplexity: 5.57
Validation set perplexity: 5.56
Validation set perplexity: 5.74
Validation set perplexity: 5.60
Validation set perplexity: 5.62
Validation set perplexity: 5.82
Validation set perplexity: 5.65
Validation set perplexity: 5.67
Validation set perplexity: 5.71
Validation set perplexity: 5.83
Validation set perplexity: 5.60
Validation set perplexity: 5.55
Validati

Validation set perplexity: 5.50
Validation set perplexity: 5.59
Validation set perplexity: 5.74
Validation set perplexity: 5.63
Validation set perplexity: 5.62
Validation set perplexity: 5.63
Validation set perplexity: 5.75
Validation set perplexity: 5.63
Validation set perplexity: 5.54
Validation set perplexity: 5.66
Validation set perplexity: 5.43
Validation set perplexity: 5.64
Validation set perplexity: 5.56
Validation set perplexity: 5.58
Validation set perplexity: 5.54
Validation set perplexity: 5.66
Validation set perplexity: 5.54
Validation set perplexity: 5.60
Validation set perplexity: 5.63
Validation set perplexity: 5.55
Validation set perplexity: 5.57
Validation set perplexity: 5.73
Validation set perplexity: 5.58
Validation set perplexity: 5.80
Validation set perplexity: 5.63
Validation set perplexity: 5.59
Validation set perplexity: 5.65
Validation set perplexity: 5.56
Validation set perplexity: 5.72
Validation set perplexity: 5.68
Validation set perplexity: 5.56
Validati

Validation set perplexity: 5.37
Validation set perplexity: 5.41
Validation set perplexity: 5.46
Validation set perplexity: 5.52
Validation set perplexity: 5.44
Validation set perplexity: 5.35
Validation set perplexity: 5.32
Validation set perplexity: 5.26
Validation set perplexity: 5.32
Validation set perplexity: 5.26
Validation set perplexity: 5.26
Validation set perplexity: 5.23
Validation set perplexity: 5.25
Validation set perplexity: 5.23
Validation set perplexity: 5.31
Validation set perplexity: 5.38
Validation set perplexity: 5.26
Validation set perplexity: 5.37
Validation set perplexity: 5.43
Validation set perplexity: 5.41
Validation set perplexity: 5.41
Validation set perplexity: 5.52
Validation set perplexity: 5.35
Validation set perplexity: 5.39
Validation set perplexity: 5.33
Validation set perplexity: 5.29
Validation set perplexity: 5.33
Validation set perplexity: 5.31
Validation set perplexity: 5.18
Validation set perplexity: 5.32
Validation set perplexity: 5.30
Validati

Validation set perplexity: 5.30
Validation set perplexity: 5.35
Validation set perplexity: 5.44
Validation set perplexity: 5.47
Validation set perplexity: 5.41
Validation set perplexity: 5.30
Validation set perplexity: 5.45
Validation set perplexity: 5.33
Validation set perplexity: 5.41
Validation set perplexity: 5.44
Validation set perplexity: 5.41
Validation set perplexity: 5.38
Validation set perplexity: 5.43
Validation set perplexity: 5.48
Validation set perplexity: 5.49
Validation set perplexity: 5.37
Validation set perplexity: 5.38
Validation set perplexity: 5.46
Validation set perplexity: 5.40
Validation set perplexity: 5.22
Validation set perplexity: 5.29
Validation set perplexity: 5.28
Validation set perplexity: 5.24
Validation set perplexity: 5.26
Validation set perplexity: 5.21
Validation set perplexity: 5.39
Validation set perplexity: 5.29
Validation set perplexity: 5.33
Validation set perplexity: 5.31
Validation set perplexity: 5.36
Validation set perplexity: 5.24
Validati

Validation set perplexity: 5.05
Validation set perplexity: 5.15
Validation set perplexity: 5.06
Validation set perplexity: 5.05
Validation set perplexity: 5.16
Validation set perplexity: 5.15
Validation set perplexity: 5.15
Validation set perplexity: 5.23
Validation set perplexity: 5.19
Validation set perplexity: 5.11
Average loss at step 2200: 1.683014 learning rate: 10.000000
Minibatch perplexity: 5.14
Validation set perplexity: 5.06
Validation set perplexity: 5.05
Validation set perplexity: 5.05
Validation set perplexity: 5.07
Validation set perplexity: 5.14
Validation set perplexity: 5.16
Validation set perplexity: 5.04
Validation set perplexity: 5.06
Validation set perplexity: 5.07
Validation set perplexity: 5.02
Validation set perplexity: 4.93
Validation set perplexity: 4.99
Validation set perplexity: 5.02
Validation set perplexity: 5.03
Validation set perplexity: 5.02
Validation set perplexity: 5.02
Validation set perplexity: 4.99
Validation set perplexity: 5.04
Validation set p

Validation set perplexity: 4.92
Validation set perplexity: 4.88
Validation set perplexity: 4.88
Validation set perplexity: 4.89
Validation set perplexity: 4.93
Validation set perplexity: 5.01
Validation set perplexity: 5.12
Validation set perplexity: 5.02
Validation set perplexity: 4.87
Validation set perplexity: 4.93
Validation set perplexity: 5.04
Validation set perplexity: 4.96
Validation set perplexity: 5.00
Validation set perplexity: 4.99
Validation set perplexity: 5.04
Validation set perplexity: 4.93
Validation set perplexity: 4.94
Validation set perplexity: 4.98
Validation set perplexity: 4.93
Validation set perplexity: 5.03
Validation set perplexity: 4.93
Validation set perplexity: 4.88
Validation set perplexity: 4.86
Validation set perplexity: 4.96
Validation set perplexity: 4.83
Validation set perplexity: 4.87
Validation set perplexity: 4.81
Validation set perplexity: 4.70
Validation set perplexity: 4.77
Validation set perplexity: 4.79
Validation set perplexity: 4.69
Validati

Validation set perplexity: 4.71
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.74
Validation set perplexity: 4.67
Validation set perplexity: 4.72
Validation set perplexity: 4.71
Validation set perplexity: 4.78
Validation set perplexity: 4.69
Validation set perplexity: 4.73
Validation set perplexity: 4.77
Average loss at step 2700: 1.652909 learning rate: 10.000000
Minibatch perplexity: 4.92
Validation set perplexity: 4.69
Validation set perplexity: 4.67
Validation set perplexity: 4.63
Validation set perplexity: 4.69
Validation set perplexity: 4.59
Validation set perplexity: 4.68
Validation set perplexity: 4.65
Validation set perplexity: 4.61
Validation set perplexity: 4.75
Validation set perplexity: 4.64
Validation set perplexity: 4.66
Validation set perplexity: 4.69
Validation set perplexity: 4.72
Validation set perplexity: 4.76
Validation set perplexity: 4.67
Validation set perplexity: 4.73
Validation set perplexity: 4.60
Validation set p

Validation set perplexity: 4.57
Validation set perplexity: 4.62
Validation set perplexity: 4.79
Validation set perplexity: 4.61
Validation set perplexity: 4.60
Validation set perplexity: 4.58
Validation set perplexity: 4.57
Validation set perplexity: 4.58
Validation set perplexity: 4.60
Validation set perplexity: 4.59
Validation set perplexity: 4.55
Validation set perplexity: 4.63
Validation set perplexity: 4.71
Validation set perplexity: 4.62
Validation set perplexity: 4.63
Validation set perplexity: 4.74
Validation set perplexity: 4.60
Validation set perplexity: 4.59
Validation set perplexity: 4.62
Validation set perplexity: 4.59
Validation set perplexity: 4.55
Validation set perplexity: 4.66
Validation set perplexity: 4.60
Validation set perplexity: 4.56
Validation set perplexity: 4.66
Validation set perplexity: 4.59
Validation set perplexity: 4.58
Validation set perplexity: 4.53
Validation set perplexity: 4.52
Validation set perplexity: 4.64
Validation set perplexity: 4.56
Validati

Validation set perplexity: 4.72
Validation set perplexity: 4.62
Validation set perplexity: 4.63
Validation set perplexity: 4.66
Validation set perplexity: 4.63
Validation set perplexity: 4.67
Validation set perplexity: 4.69
Validation set perplexity: 4.66
Validation set perplexity: 4.70
Validation set perplexity: 4.86
Validation set perplexity: 4.71
Validation set perplexity: 4.69
Validation set perplexity: 4.67
Validation set perplexity: 4.66
Validation set perplexity: 4.67
Validation set perplexity: 4.72
Validation set perplexity: 4.73
Validation set perplexity: 4.62
Validation set perplexity: 4.75
Validation set perplexity: 4.78
Validation set perplexity: 4.77
Validation set perplexity: 4.65
Validation set perplexity: 4.66
Validation set perplexity: 4.55
Validation set perplexity: 4.61
Validation set perplexity: 4.56
Validation set perplexity: 4.59
Validation set perplexity: 4.64
Validation set perplexity: 4.69
Validation set perplexity: 4.56
Average loss at step 3200: 1.643451 lear

Validation set perplexity: 4.47
Validation set perplexity: 4.57
Validation set perplexity: 4.55
Validation set perplexity: 4.57
Validation set perplexity: 4.58
Validation set perplexity: 4.65
Validation set perplexity: 4.63
Validation set perplexity: 4.58
Validation set perplexity: 4.68
Validation set perplexity: 4.67
Validation set perplexity: 4.60
Validation set perplexity: 4.58
Validation set perplexity: 4.65
Validation set perplexity: 4.63
Validation set perplexity: 4.69
Validation set perplexity: 4.59
Validation set perplexity: 4.60
Validation set perplexity: 4.56
Validation set perplexity: 4.63
Validation set perplexity: 4.58
Validation set perplexity: 4.61
Validation set perplexity: 4.59
Validation set perplexity: 4.62
Validation set perplexity: 4.70
Validation set perplexity: 4.62
Validation set perplexity: 4.62
Validation set perplexity: 4.65
Validation set perplexity: 4.64
Validation set perplexity: 4.58
Validation set perplexity: 4.60
Validation set perplexity: 4.69
Validati

Validation set perplexity: 4.51
Validation set perplexity: 4.56
Validation set perplexity: 4.52
Validation set perplexity: 4.55
Validation set perplexity: 4.53
Validation set perplexity: 4.54
Validation set perplexity: 4.46
Validation set perplexity: 4.49
Validation set perplexity: 4.56
Validation set perplexity: 4.47
Validation set perplexity: 4.46
Validation set perplexity: 4.53
Validation set perplexity: 4.46
Validation set perplexity: 4.45
Validation set perplexity: 4.53
Validation set perplexity: 4.47
Validation set perplexity: 4.48
Validation set perplexity: 4.54
Validation set perplexity: 4.53
Validation set perplexity: 4.46
Validation set perplexity: 4.50
Validation set perplexity: 4.50
Validation set perplexity: 4.54
Validation set perplexity: 4.46
Validation set perplexity: 4.53
Validation set perplexity: 4.55
Validation set perplexity: 4.55
Validation set perplexity: 4.61
Validation set perplexity: 4.53
Validation set perplexity: 4.57
Validation set perplexity: 4.53
Average 

Validation set perplexity: 4.48
Validation set perplexity: 4.49
Validation set perplexity: 4.53
Validation set perplexity: 4.52
Validation set perplexity: 4.53
Validation set perplexity: 4.48
Validation set perplexity: 4.55
Validation set perplexity: 4.54
Validation set perplexity: 4.47
Validation set perplexity: 4.47
Validation set perplexity: 4.47
Validation set perplexity: 4.54
Validation set perplexity: 4.50
Validation set perplexity: 4.52
Validation set perplexity: 4.47
Validation set perplexity: 4.48
Validation set perplexity: 4.55
Validation set perplexity: 4.60
Validation set perplexity: 4.59
Validation set perplexity: 4.53
Validation set perplexity: 4.49
Validation set perplexity: 4.46
Validation set perplexity: 4.56
Validation set perplexity: 4.63
Validation set perplexity: 4.58
Validation set perplexity: 4.57
Validation set perplexity: 4.57
Validation set perplexity: 4.51
Validation set perplexity: 4.58
Validation set perplexity: 4.52
Validation set perplexity: 4.46
Validati

Validation set perplexity: 4.57
Validation set perplexity: 4.55
Validation set perplexity: 4.53
Validation set perplexity: 4.54
Validation set perplexity: 4.51
Validation set perplexity: 4.48
Validation set perplexity: 4.49
Validation set perplexity: 4.52
Validation set perplexity: 4.48
Validation set perplexity: 4.55
Validation set perplexity: 4.47
Validation set perplexity: 4.43
Validation set perplexity: 4.51
Validation set perplexity: 4.47
Validation set perplexity: 4.51
Validation set perplexity: 4.51
Validation set perplexity: 4.44
Validation set perplexity: 4.43
Validation set perplexity: 4.49
Validation set perplexity: 4.42
Validation set perplexity: 4.44
Validation set perplexity: 4.56
Validation set perplexity: 4.48
Validation set perplexity: 4.46
Validation set perplexity: 4.43
Validation set perplexity: 4.48
Validation set perplexity: 4.43
Validation set perplexity: 4.40
Validation set perplexity: 4.38
Validation set perplexity: 4.42
Validation set perplexity: 4.43
Validati

Validation set perplexity: 4.56
Validation set perplexity: 4.45
Validation set perplexity: 4.55
Validation set perplexity: 4.46
Validation set perplexity: 4.46
Validation set perplexity: 4.34
Validation set perplexity: 4.34
Validation set perplexity: 4.41
Validation set perplexity: 4.44
Validation set perplexity: 4.44
Validation set perplexity: 4.61
Validation set perplexity: 4.49
Validation set perplexity: 4.53
Validation set perplexity: 4.54
Validation set perplexity: 4.57
Validation set perplexity: 4.66
Validation set perplexity: 4.66
Validation set perplexity: 4.58
Validation set perplexity: 4.57
Validation set perplexity: 4.57
Validation set perplexity: 4.55
Validation set perplexity: 4.49
Validation set perplexity: 4.48
Validation set perplexity: 4.55
Validation set perplexity: 4.58
Validation set perplexity: 4.51
Validation set perplexity: 4.47
Validation set perplexity: 4.51
Validation set perplexity: 4.55
Validation set perplexity: 4.58
Validation set perplexity: 4.49
Validati

Validation set perplexity: 4.41
Validation set perplexity: 4.42
Validation set perplexity: 4.45
Validation set perplexity: 4.43
Validation set perplexity: 4.40
Validation set perplexity: 4.40
Validation set perplexity: 4.43
Validation set perplexity: 4.49
Validation set perplexity: 4.48
Validation set perplexity: 4.45
Validation set perplexity: 4.42
Validation set perplexity: 4.42
Validation set perplexity: 4.44
Validation set perplexity: 4.41
Validation set perplexity: 4.37
Validation set perplexity: 4.37
Validation set perplexity: 4.44
Validation set perplexity: 4.38
Validation set perplexity: 4.39
Validation set perplexity: 4.34
Validation set perplexity: 4.35
Validation set perplexity: 4.25
Validation set perplexity: 4.33
Validation set perplexity: 4.36
Validation set perplexity: 4.40
Validation set perplexity: 4.39
Validation set perplexity: 4.38
Validation set perplexity: 4.45
Validation set perplexity: 4.49
Validation set perplexity: 4.51
Validation set perplexity: 4.46
Validati

Validation set perplexity: 4.54
Validation set perplexity: 4.54
Validation set perplexity: 4.64
Validation set perplexity: 4.60
Validation set perplexity: 4.63
Validation set perplexity: 4.64
Validation set perplexity: 4.61
Validation set perplexity: 4.56
Validation set perplexity: 4.60
Validation set perplexity: 4.65
Validation set perplexity: 4.69
Validation set perplexity: 4.53
Validation set perplexity: 4.53
Validation set perplexity: 4.60
Validation set perplexity: 4.58
Validation set perplexity: 4.54
Validation set perplexity: 4.54
Validation set perplexity: 4.50
Validation set perplexity: 4.56
Validation set perplexity: 4.60
Validation set perplexity: 4.52
Validation set perplexity: 4.63
Validation set perplexity: 4.59
Validation set perplexity: 4.56
Validation set perplexity: 4.60
Validation set perplexity: 4.63
Validation set perplexity: 4.64
Validation set perplexity: 4.57
Validation set perplexity: 4.47
Validation set perplexity: 4.49
Validation set perplexity: 4.50
Validati

Validation set perplexity: 4.35
Validation set perplexity: 4.35
Validation set perplexity: 4.35
Validation set perplexity: 4.36
Validation set perplexity: 4.36
Validation set perplexity: 4.36
Validation set perplexity: 4.36
Validation set perplexity: 4.36
Validation set perplexity: 4.36
Validation set perplexity: 4.36
Validation set perplexity: 4.37
Validation set perplexity: 4.37
Validation set perplexity: 4.36
Validation set perplexity: 4.37
Validation set perplexity: 4.36
Validation set perplexity: 4.36
Validation set perplexity: 4.35
Validation set perplexity: 4.35
Validation set perplexity: 4.35
Validation set perplexity: 4.35
Validation set perplexity: 4.36
Validation set perplexity: 4.35
Validation set perplexity: 4.36
Validation set perplexity: 4.35
Validation set perplexity: 4.34
Validation set perplexity: 4.35
Validation set perplexity: 4.34
Validation set perplexity: 4.34
Validation set perplexity: 4.33
Validation set perplexity: 4.33
Validation set perplexity: 4.33
Validati

Validation set perplexity: 4.31
Validation set perplexity: 4.32
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.30
Validation set perplexity: 4.31
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Average loss at step 5400: 1.575793 learning rate: 1.000000
Minibatch perplexity: 4.62
Validation set perplexity: 4.30
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.28
Validation set pe

Validation set perplexity: 4.32
Validation set perplexity: 4.32
Validation set perplexity: 4.32
Validation set perplexity: 4.32
Validation set perplexity: 4.32
Validation set perplexity: 4.32
Validation set perplexity: 4.32
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.32
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.31
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.31
Validation set perplexity: 4.30
Validation set perplexity: 4.30
Validation set perplexity: 4.29
Validation set perplexity: 4.30
Validati

Validation set perplexity: 4.29
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.28
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.29
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.29
Validation set perplexity: 4.28
Average loss at step 5900: 1.576319 learning rate: 1.000000
Minibatch perplexity: 4.29
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.28
Validation set perplexity: 4.29
Validation set perplexity: 4.28
Validation set pe

Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.23
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.24
Validation set perplexity: 4.25
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.23
Validati

Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.24
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.22
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.22
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.22
Validation set perplexity: 4.21
Validation set perplexity: 4.22
Validation set perplexity: 4.22
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validati

Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.22
Validation set perplexity: 4.22
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.22
Validation set perplexity: 4.22
Validation set perplexity: 4.22
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.23
Validation set perplexity: 4.22
Validation set perplexity: 4.22
Validation set perplexity: 4.22
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validation set perplexity: 4.20
Validation set perplexity: 4.20
Validation set perplexity: 4.21
Validation set perplexity: 4.21
Validati

Validation set perplexity: 4.26
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.26
Validation set perplexity: 4.26
Validation set perplexity: 4.26
Validation set perplexity: 4.26
Validation set perplexity: 4.26
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.24
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.25
Validation set perplexity: 4.26
Validation set perplexity: 4.26
Validation set perplexity: 4.26
Validation set perplexity: 4.26
Validation set perplexity: 4.25
Validation set perplexity: 4.26
Validation set perplexity: 4.27
Validation set perplexity: 4.27
Validation set perplexity: 4.27
Validation set perplexity: 4.27
Validation set perplexity: 4.27
Validation set perplexity: 4.27
Validation set perplexity: 4.27
Validati

---
Problem 1
---------

You might have noticed that the definition of the LSTM cell involves 4 matrix multiplications with the input, and 4 matrix multiplications with the output. Simplify the expression by using a single matrix multiply for each, and variables that are 4 times larger.

---

In [20]:
num_nodes = 64

graph = tf.Graph()
with graph.as_default():

    # Parameters:
    # input, forget, update, output
    # input, previous output, and bias.
    gx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes * 4], -0.1, 0.1))
    gm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes * 4], -0.1, 0.1))
    gb = tf.Variable(tf.zeros([1, num_nodes * 4]))
#     # Forget gate: input, previous output, and bias.
#     fx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
#     fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
#     fb = tf.Variable(tf.zeros([1, num_nodes]))
#     # Memory cell: input, state and bias.                             
#     cx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
#     cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
#     cb = tf.Variable(tf.zeros([1, num_nodes]))
#     # Output gate: input, previous output, and bias.
#     ox = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
#     om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
#     ob = tf.Variable(tf.zeros([1, num_nodes]))
    # Variables saving state across unrollings.
    saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
    saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
    # Classifier weights and biases.
    w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))
    b = tf.Variable(tf.zeros([vocabulary_size]))
  
    # Definition of the cell computation.
    def lstm_cell(i, o, state):
        """Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
        Note that in this formulation, we omit the various connections between the
        previous state and the gates."""
        # input, forget, update, output
        gates = tf.matmul(i, gx) + tf.matmul(o, gm) + gb
#         input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)
#         forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)
#         update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb
#         state = forget_gate * state + input_gate * tf.tanh(update)
        input_gate = tf.sigmoid(gates[:, :num_nodes])
        forget_gate = tf.sigmoid(gates[:, num_nodes:num_nodes*2])
        update = gates[:, num_nodes*2:num_nodes*3]
        state = forget_gate * state + input_gate * tf.tanh(update)
        output_gate = tf.sigmoid(gates[:, num_nodes*3:num_nodes*4])
#         output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)
        return output_gate * tf.tanh(state), state

    # Input data.
    train_data = list()
    for _ in range(num_unrollings + 1):
        train_data.append(tf.placeholder(tf.float32, shape=[batch_size, vocabulary_size]))
    train_inputs = train_data[:num_unrollings]
    train_labels = train_data[1:]  # labels are inputs shifted by one time step.

    # Unrolled LSTM loop.
    outputs = list()
    output = saved_output
    state = saved_state
    for i in train_inputs:
        output, state = lstm_cell(i, output, state)
        outputs.append(output)

    # State saving across unrollings.
    with tf.control_dependencies([saved_output.assign(output), saved_state.assign(state)]):
        # Classifier.
        logits = tf.nn.xw_plus_b(tf.concat(outputs, 0), w, b)
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            labels=tf.concat(train_labels, 0), logits=logits))

    # Optimizer.
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(10.0, global_step, 1500, 0.3, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    gradients, v = zip(*optimizer.compute_gradients(loss))
    gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
    optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)

    # Predictions.
    train_prediction = tf.nn.softmax(logits)

    # Sampling and validation eval: batch 1, no unrolling.
    sample_input = tf.placeholder(tf.float32, shape=[1, vocabulary_size])
    saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))
    saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))
    reset_sample_state = tf.group(
        saved_sample_output.assign(tf.zeros([1, num_nodes])),
        saved_sample_state.assign(tf.zeros([1, num_nodes])))
    sample_output, sample_state = lstm_cell(sample_input, saved_sample_output, saved_sample_state)
    with tf.control_dependencies([saved_sample_output.assign(sample_output),
                                saved_sample_state.assign(sample_state)]):
        sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))

In [21]:
%%time
num_steps = 8001
summary_frequency = 100

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    mean_loss = 0
    for step in range(num_steps):
        batches = train_batches.next()
        feed_dict = dict()
        for i in range(num_unrollings + 1):
            feed_dict[train_data[i]] = batches[i]
        _, l, predictions, lr = session.run(
            [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)
        mean_loss += l
        if step % summary_frequency == 0:
            if step > 0:
                mean_loss = mean_loss / summary_frequency
            # The mean loss is an estimate of the loss over the last few batches.
            print('Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))
            mean_loss = 0
            labels = np.concatenate(list(batches)[1:])
            print('Minibatch perplexity: %.2f' % float(np.exp(logprob(predictions, labels))))
            if step % (summary_frequency * 10) == 0:
                # Generate some samples.
                print('=' * 80)
                for _ in range(5):
                    feed = sample(random_distribution())
                    sentence = characters(feed)[0]
                    reset_sample_state.run()
                    for _ in range(79):
                        prediction = sample_prediction.eval({sample_input: feed})
                        feed = sample(prediction)
                        sentence += characters(feed)[0]
                    print(sentence)
                print('=' * 80)
        # Measure validation set perplexity.
        reset_sample_state.run()
        valid_logprob = 0
        for _ in range(valid_size):
            b = valid_batches.next()
            predictions = sample_prediction.eval({sample_input: b[0]})
            valid_logprob = valid_logprob + logprob(predictions, b[1])
        print('Validation set perplexity: %.2f' % float(np.exp(valid_logprob / valid_size)))

Initialized
Average loss at step 0: 3.295064 learning rate: 10.000000
Minibatch perplexity: 26.98
khipyq ihjwnkr koaakefixgwaatun xwshpzyexatz apoiwnpmbysnt t t epriha zjbrkhcbhh
xsnikze nhqi ao onuanaefneqzchafytxsnvanynnurkuraht umaskaew x eix ixnphavfenohi
qeeqsmbl cwjrisiopuiofawsdbpe ma   nmrd mz  iry oevcu bobuba uhsnj psandruomektj
xydbshedsdamdefuc  gukpaphefemixofjue zdla deasssflte ez fsizmkmg hfyhm  sukhrny
tpm mptflsroevcaehbm fwonevca svvhbsgsunevqaawvcxo gaithnnv rgyythghu ntc jun qm
Validation set perplexity: 19.88
Validation set perplexity: 18.73
Validation set perplexity: 18.29
Validation set perplexity: 17.69
Validation set perplexity: 17.96
Validation set perplexity: 17.58
Validation set perplexity: 18.10
Validation set perplexity: 17.86
Validation set perplexity: 17.36
Validation set perplexity: 17.31
Validation set perplexity: 17.16
Validation set perplexity: 17.30
Validation set perplexity: 16.82
Validation set perplexity: 16.32
Validation set perplexity: 16.25
Va

Validation set perplexity: 9.45
Validation set perplexity: 8.64
Validation set perplexity: 8.67
Validation set perplexity: 8.81
Validation set perplexity: 8.92
Validation set perplexity: 8.94
Validation set perplexity: 8.94
Validation set perplexity: 9.13
Validation set perplexity: 8.90
Validation set perplexity: 9.14
Validation set perplexity: 8.68
Validation set perplexity: 9.09
Validation set perplexity: 9.01
Validation set perplexity: 8.94
Validation set perplexity: 8.66
Validation set perplexity: 8.90
Validation set perplexity: 8.71
Validation set perplexity: 8.65
Validation set perplexity: 8.50
Validation set perplexity: 9.14
Validation set perplexity: 8.41
Validation set perplexity: 8.75
Validation set perplexity: 8.55
Validation set perplexity: 8.40
Validation set perplexity: 8.81
Validation set perplexity: 8.91
Validation set perplexity: 8.67
Validation set perplexity: 8.73
Validation set perplexity: 8.59
Validation set perplexity: 8.90
Validation set perplexity: 8.60
Validati

Validation set perplexity: 7.15
Validation set perplexity: 7.25
Validation set perplexity: 7.04
Validation set perplexity: 7.09
Validation set perplexity: 6.93
Validation set perplexity: 7.21
Validation set perplexity: 7.15
Validation set perplexity: 7.34
Validation set perplexity: 7.12
Validation set perplexity: 7.31
Validation set perplexity: 7.11
Validation set perplexity: 7.23
Validation set perplexity: 7.17
Validation set perplexity: 7.24
Validation set perplexity: 7.38
Validation set perplexity: 7.54
Validation set perplexity: 7.01
Validation set perplexity: 7.26
Validation set perplexity: 6.97
Validation set perplexity: 6.99
Validation set perplexity: 6.99
Validation set perplexity: 7.14
Validation set perplexity: 7.03
Validation set perplexity: 7.29
Average loss at step 500: 1.980528 learning rate: 10.000000
Minibatch perplexity: 7.03
Validation set perplexity: 7.10
Validation set perplexity: 7.37
Validation set perplexity: 7.07
Validation set perplexity: 7.26
Validation set pe

Validation set perplexity: 6.74
Validation set perplexity: 6.56
Validation set perplexity: 6.48
Validation set perplexity: 6.43
Validation set perplexity: 6.37
Validation set perplexity: 6.36
Validation set perplexity: 6.53
Validation set perplexity: 6.52
Validation set perplexity: 6.36
Validation set perplexity: 6.73
Validation set perplexity: 6.62
Validation set perplexity: 6.46
Validation set perplexity: 6.52
Validation set perplexity: 6.48
Validation set perplexity: 6.45
Validation set perplexity: 6.48
Validation set perplexity: 6.40
Validation set perplexity: 6.92
Validation set perplexity: 6.46
Validation set perplexity: 6.47
Validation set perplexity: 6.61
Validation set perplexity: 6.48
Validation set perplexity: 6.45
Validation set perplexity: 6.33
Validation set perplexity: 6.54
Validation set perplexity: 6.45
Validation set perplexity: 6.46
Validation set perplexity: 6.50
Validation set perplexity: 6.52
Validation set perplexity: 6.53
Validation set perplexity: 6.64
Validati

Validation set perplexity: 6.14
Validation set perplexity: 6.09
Validation set perplexity: 6.16
Validation set perplexity: 6.14
Validation set perplexity: 6.09
Validation set perplexity: 6.02
Validation set perplexity: 6.12
Validation set perplexity: 6.12
Validation set perplexity: 5.99
Validation set perplexity: 6.17
Validation set perplexity: 6.06
Validation set perplexity: 6.13
Validation set perplexity: 6.30
Validation set perplexity: 6.21
Validation set perplexity: 6.09
Validation set perplexity: 6.12
Validation set perplexity: 6.07
Validation set perplexity: 6.05
Validation set perplexity: 6.00
Validation set perplexity: 6.04
Validation set perplexity: 6.02
Validation set perplexity: 6.12
Validation set perplexity: 5.96
Validation set perplexity: 6.09
Validation set perplexity: 6.04
Average loss at step 1000: 1.799828 learning rate: 10.000000
Minibatch perplexity: 6.22
w one peetrapa bro by helterngaims thouk mostemuste beral excartes bescle with p
kilan hadiem to necedaictly a p

Validation set perplexity: 5.98
Validation set perplexity: 5.83
Validation set perplexity: 5.93
Validation set perplexity: 5.91
Validation set perplexity: 5.75
Validation set perplexity: 5.72
Validation set perplexity: 5.79
Validation set perplexity: 5.78
Validation set perplexity: 5.93
Validation set perplexity: 5.88
Validation set perplexity: 5.91
Validation set perplexity: 5.87
Validation set perplexity: 5.90
Validation set perplexity: 6.05
Validation set perplexity: 5.89
Validation set perplexity: 5.95
Validation set perplexity: 5.97
Validation set perplexity: 5.74
Validation set perplexity: 5.99
Validation set perplexity: 5.80
Validation set perplexity: 5.92
Validation set perplexity: 6.01
Validation set perplexity: 5.70
Validation set perplexity: 5.83
Validation set perplexity: 5.86
Validation set perplexity: 5.81
Validation set perplexity: 5.87
Validation set perplexity: 5.79
Validation set perplexity: 5.77
Validation set perplexity: 5.83
Validation set perplexity: 5.83
Validati

Validation set perplexity: 5.67
Validation set perplexity: 5.71
Validation set perplexity: 5.65
Validation set perplexity: 5.59
Validation set perplexity: 5.56
Validation set perplexity: 5.56
Validation set perplexity: 5.78
Validation set perplexity: 5.66
Validation set perplexity: 5.83
Validation set perplexity: 5.64
Validation set perplexity: 5.62
Validation set perplexity: 5.65
Validation set perplexity: 5.71
Validation set perplexity: 5.66
Validation set perplexity: 5.72
Validation set perplexity: 5.61
Validation set perplexity: 5.58
Validation set perplexity: 5.56
Validation set perplexity: 5.68
Validation set perplexity: 5.71
Validation set perplexity: 5.71
Validation set perplexity: 5.58
Validation set perplexity: 5.71
Validation set perplexity: 5.59
Validation set perplexity: 5.60
Validation set perplexity: 5.64
Validation set perplexity: 5.67
Validation set perplexity: 5.62
Validation set perplexity: 5.52
Validation set perplexity: 5.55
Validation set perplexity: 5.62
Validati

Validation set perplexity: 5.32
Validation set perplexity: 5.32
Validation set perplexity: 5.35
Validation set perplexity: 5.34
Validation set perplexity: 5.32
Validation set perplexity: 5.33
Validation set perplexity: 5.29
Validation set perplexity: 5.27
Validation set perplexity: 5.27
Validation set perplexity: 5.28
Validation set perplexity: 5.27
Validation set perplexity: 5.28
Validation set perplexity: 5.29
Validation set perplexity: 5.30
Validation set perplexity: 5.29
Validation set perplexity: 5.31
Validation set perplexity: 5.31
Validation set perplexity: 5.30
Validation set perplexity: 5.29
Validation set perplexity: 5.30
Validation set perplexity: 5.31
Validation set perplexity: 5.32
Validation set perplexity: 5.32
Validation set perplexity: 5.34
Validation set perplexity: 5.38
Validation set perplexity: 5.36
Validation set perplexity: 5.34
Validation set perplexity: 5.35
Validation set perplexity: 5.33
Validation set perplexity: 5.33
Validation set perplexity: 5.34
Validati

Validation set perplexity: 5.24
Validation set perplexity: 5.26
Validation set perplexity: 5.27
Validation set perplexity: 5.27
Validation set perplexity: 5.26
Validation set perplexity: 5.26
Validation set perplexity: 5.28
Validation set perplexity: 5.28
Validation set perplexity: 5.26
Validation set perplexity: 5.25
Validation set perplexity: 5.25
Validation set perplexity: 5.27
Validation set perplexity: 5.24
Validation set perplexity: 5.22
Validation set perplexity: 5.22
Validation set perplexity: 5.23
Validation set perplexity: 5.22
Validation set perplexity: 5.23
Validation set perplexity: 5.21
Validation set perplexity: 5.27
Validation set perplexity: 5.27
Validation set perplexity: 5.29
Validation set perplexity: 5.32
Validation set perplexity: 5.29
Validation set perplexity: 5.26
Validation set perplexity: 5.27
Validation set perplexity: 5.29
Validation set perplexity: 5.31
Validation set perplexity: 5.32
Validation set perplexity: 5.34
Validation set perplexity: 5.35
Validati

Validation set perplexity: 5.13
Validation set perplexity: 5.11
Validation set perplexity: 5.12
Validation set perplexity: 5.13
Validation set perplexity: 5.13
Validation set perplexity: 5.13
Validation set perplexity: 5.10
Validation set perplexity: 5.11
Validation set perplexity: 5.13
Validation set perplexity: 5.13
Validation set perplexity: 5.09
Average loss at step 2200: 1.689263 learning rate: 3.000000
Minibatch perplexity: 5.00
Validation set perplexity: 5.10
Validation set perplexity: 5.14
Validation set perplexity: 5.12
Validation set perplexity: 5.10
Validation set perplexity: 5.10
Validation set perplexity: 5.12
Validation set perplexity: 5.12
Validation set perplexity: 5.13
Validation set perplexity: 5.13
Validation set perplexity: 5.10
Validation set perplexity: 5.13
Validation set perplexity: 5.11
Validation set perplexity: 5.06
Validation set perplexity: 5.07
Validation set perplexity: 5.09
Validation set perplexity: 5.10
Validation set perplexity: 5.09
Validation set pe

Validation set perplexity: 5.04
Validation set perplexity: 5.06
Validation set perplexity: 5.05
Validation set perplexity: 5.03
Validation set perplexity: 5.04
Validation set perplexity: 5.05
Validation set perplexity: 5.06
Validation set perplexity: 5.07
Validation set perplexity: 5.07
Validation set perplexity: 5.06
Validation set perplexity: 5.04
Validation set perplexity: 5.06
Validation set perplexity: 5.04
Validation set perplexity: 5.05
Validation set perplexity: 5.05
Validation set perplexity: 5.06
Validation set perplexity: 5.05
Validation set perplexity: 5.05
Validation set perplexity: 5.05
Validation set perplexity: 5.05
Validation set perplexity: 5.07
Validation set perplexity: 5.07
Validation set perplexity: 5.03
Validation set perplexity: 5.04
Validation set perplexity: 5.06
Validation set perplexity: 5.05
Validation set perplexity: 5.06
Validation set perplexity: 5.05
Validation set perplexity: 5.05
Validation set perplexity: 5.04
Validation set perplexity: 5.06
Validati

Validation set perplexity: 4.94
Validation set perplexity: 4.94
Validation set perplexity: 4.96
Validation set perplexity: 4.96
Validation set perplexity: 4.93
Validation set perplexity: 4.94
Validation set perplexity: 4.94
Validation set perplexity: 4.92
Validation set perplexity: 4.95
Validation set perplexity: 4.98
Validation set perplexity: 4.98
Validation set perplexity: 4.97
Average loss at step 2700: 1.685162 learning rate: 3.000000
Minibatch perplexity: 5.03
Validation set perplexity: 4.96
Validation set perplexity: 4.96
Validation set perplexity: 4.95
Validation set perplexity: 4.97
Validation set perplexity: 4.98
Validation set perplexity: 4.98
Validation set perplexity: 4.97
Validation set perplexity: 4.94
Validation set perplexity: 4.96
Validation set perplexity: 4.98
Validation set perplexity: 4.98
Validation set perplexity: 4.96
Validation set perplexity: 4.96
Validation set perplexity: 4.99
Validation set perplexity: 4.95
Validation set perplexity: 4.95
Validation set pe

Validation set perplexity: 4.91
Validation set perplexity: 4.89
Validation set perplexity: 4.90
Validation set perplexity: 4.90
Validation set perplexity: 4.89
Validation set perplexity: 4.90
Validation set perplexity: 4.89
Validation set perplexity: 4.92
Validation set perplexity: 4.92
Validation set perplexity: 4.93
Validation set perplexity: 4.93
Validation set perplexity: 4.94
Validation set perplexity: 4.95
Validation set perplexity: 4.93
Validation set perplexity: 4.90
Validation set perplexity: 4.90
Validation set perplexity: 4.91
Validation set perplexity: 4.90
Validation set perplexity: 4.89
Validation set perplexity: 4.88
Validation set perplexity: 4.88
Validation set perplexity: 4.89
Validation set perplexity: 4.87
Validation set perplexity: 4.89
Validation set perplexity: 4.90
Validation set perplexity: 4.91
Validation set perplexity: 4.90
Validation set perplexity: 4.90
Validation set perplexity: 4.90
Validation set perplexity: 4.90
Validation set perplexity: 4.93
Validati

Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.78
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.78
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.78
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Average 

Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.81
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.81
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.81
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.83
Validation set perplexity: 4.83
Validation set perplexity: 4.83
Validation set perplexity: 4.84
Validation set perplexity: 4.83
Validation set perplexity: 4.82
Validation set perplexity: 4.83
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.82
Validation set perplexity: 4.81
Validation set perplexity: 4.82
Validati

Validation set perplexity: 4.79
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validati

Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.75
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.71
Validation set perplexity: 4.72
Validati

Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.77
Validation set perplexity: 4.77
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.80
Validati

Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.82
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.80
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.81
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.80
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.80
Validation set perplexity: 4.79
Validation set perplexity: 4.79
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.78
Validation set perplexity: 4.79
Validation set perplexity: 4.78
Validati

Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validati

Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validati

Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.72
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.72
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validation set perplexity: 4.71
Validati

Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Average loss at step 5400: 1.661646 learning rate: 0.270000
Minibatch perplexity: 5.73
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set pe

Validation set perplexity: 4.75
Validation set perplexity: 4.75
Validation set perplexity: 4.75
Validation set perplexity: 4.75
Validation set perplexity: 4.75
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validati

Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.75
Average loss at step 5900: 1.661434 learning rate: 0.270000
Minibatch perplexity: 5.43
Validation set perplexity: 4.76
Validation set perplexity: 4.75
Validation set perplexity: 4.76
Validation set perplexity: 4.75
Validation set perplexity: 4.75
Validation set perplexity: 4.75
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set perplexity: 4.76
Validation set pe

Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validation set perplexity: 4.74
Validati

Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validation set perplexity: 4.73
Validati

Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validati

Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validati

Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Average loss at step 7100: 1.613026 learning rate: 0.081000
Minibatch perplexity: 4.67
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set pe

Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validati

Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Average loss at step 7600: 1.636562 learning rate: 0.024300
Minibatch perplexity: 5.04
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set pe

Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validation set perplexity: 4.72
Validati

---
Problem 2
---------

We want to train a LSTM over bigrams, that is pairs of consecutive characters like 'ab' instead of single characters like 'a'. Since the number of possible bigrams is large, feeding them directly to the LSTM using 1-hot encodings will lead to a very sparse representation that is very wasteful computationally.

a- Introduce an embedding lookup on the inputs, and feed the embeddings to the LSTM cell instead of the inputs themselves.

b- Write a bigram-based LSTM, modeled on the character LSTM above.

c- Introduce Dropout. For best practices on how to use Dropout in LSTMs, refer to this [article](http://arxiv.org/abs/1409.2329).

---

In [62]:
letters = string.ascii_lowercase + ' '
vocabulary_size = len(letters)**2
first_letter = ord(string.ascii_lowercase[0])

def char2id(char):
    if char in string.ascii_lowercase:
        return ord(char) - first_letter + 1
    elif char == ' ':
        return 0
    else:
        print('Unexpected character: %s' % char)
        return 0

def bigram2id(bigram):
    return len(letters)*char2id(bigram[0]) + char2id(bigram[1])
    
def id2char(dictid):
    if dictid > 0:
        return chr(dictid + first_letter - 1)
    else:
        return ' '

def id2bigram(dictid):
    return id2char(dictid//len(letters)) + id2char(dictid%len(letters))

print(char2id('a'), char2id('z'), char2id(' '), char2id('ï'))
print(id2char(1), id2char(26), id2char(0))
print()
print(bigram2id('aw'), bigram2id('zz'), bigram2id('  '), bigram2id('ïa'))
print("'{}', '{}', '{}', '{}', '{}'".format(
    id2bigram(1), id2bigram(26), id2bigram(0), id2bigram(124), id2bigram(728)))

Unexpected character: ï
1 26 0 0
a z  

Unexpected character: ï
50 728 0 1
' a', ' z', '  ', 'dp', 'zz'


Function to generate a training batch for the LSTM model.

In [63]:
batch_size=64
num_unrollings=10

class BiBatchGenerator(object):
    def __init__(self, text, batch_size, num_unrollings):
        self._text = text
        self._text_size = len(text)
        self._batch_size = batch_size
        self._num_unrollings = num_unrollings
        segment = self._text_size // batch_size
        self._cursor = [ offset * segment for offset in range(batch_size)]
        self._last_batch = self._next_batch()
  
    def _next_batch(self):
        """Generate a single batch from the current cursor position in the data."""
        batch = np.zeros(shape=(self._batch_size), dtype=np.int32)
        for b in range(self._batch_size):
            batch[b] =  bigram2id(self._text[self._cursor[b]:self._cursor[b]+2])
            self._cursor[b] = (self._cursor[b] + 2) % self._text_size
        return batch

    def next(self):
        """Generate the next array of batches from the data. The array consists of
        the last batch of the previous array, followed by num_unrollings new ones.
        """
        batches = [self._last_batch]
        for step in range(self._num_unrollings):
            batches.append(self._next_batch())
        self._last_batch = batches[-1]
        return batches

def bigrams(probabilities):
    """Turn a probability distribution over the possible bigrams back 
    into its (most likely) bigrams representation."""
    return [id2bigram(c) for c in np.argmax(probabilities, 1)]

def bi_batches2string(batches):
    """Convert a sequence of batches back into their (most likely) string
    representation."""
    s = [''] * batches[0].shape[0]
    for b in batches:
        s = [''.join(x) for x in zip(s, map(lambda bigram: id2bigram(bigram), b))]
    return s

train_bi_batches = BiBatchGenerator(train_text, batch_size, num_unrollings)
valid_bi_batches = BiBatchGenerator(valid_text, 1, 1)

print(bi_batches2string(train_bi_batches.next()))
print(bi_batches2string(train_bi_batches.next()))
print(bi_batches2string(valid_bi_batches.next()))
print(bi_batches2string(valid_bi_batches.next()))

['ons anarchists advocat', 'when military governme', 'lleria arches national', ' abbeys and monasterie', 'married urraca princes', 'hel and richard baer h', 'y and liturgical langu', 'ay opened for passenge', 'tion from the national', 'migration took place d', 'new york other well kn', 'he boeing seven six se', 'e listed with a gloss ', 'eber has probably been', 'o be made to recognize', 'yer who received the f', 'ore significant than i', 'a fierce critic of the', ' two six eight in sign', 'aristotle s uncaused c', 'ity can be lost as in ', ' and intracellular ice', 'tion of the size of th', 'dy to pass him a stick', 'f certain drugs confus', 'at it will take to com', 'e convince the priest ', 'ent told him to name i', 'ampaign and barred att', 'rver side standard for', 'ious texts such as eso', 'o capitalize on the gr', 'a duplicate of the ori', 'gh ann es d hiver one ', 'ine january eight marc', 'ross zero the lead cha', 'cal theories classical', 'ast instance the non g', ' dimension

In [64]:
def logprob(predictions, labels):
    """Log-probability of the true labels in a predicted batch."""
    predictions[predictions < 1e-10] = 1e-10
    labels_1hot = (np.arange(vocabulary_size) == labels[:,None]).astype(np.float32)
    return np.sum(np.multiply(labels_1hot, -np.log(predictions))) / labels.shape[0]

def sample_distribution(distribution):
    """Sample one element from a distribution assumed to be an array of normalized
    probabilities.
    """
    r = random.uniform(0, 1)
    s = 0
    for i in range(len(distribution)):
        s += distribution[i]
        if s >= r:
            return i
    return len(distribution) - 1

# def sample(prediction):
#     """Turn a (column) prediction into 1-hot encoded samples."""
#     p = np.zeros(shape=[1, vocabulary_size], dtype=np.float)
#     p[0, sample_distribution(prediction[0])] = 1.0
#     return p

def random_distribution():
    """Generate a random column of probabilities."""
    b = np.random.uniform(0.0, 1.0, size=[1, vocabulary_size])
    return b/np.sum(b, 1)[:,None]

In [65]:
27**2

729

In [53]:
num_nodes = 256
embedding_size = 100
keep_prob = 0.5

graph = tf.Graph()
with graph.as_default():

    # Parameters:
    # input, forget, update, output
    # input, previous output, and bias.
    gx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes * 4], -0.1, 0.1))
    gm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes * 4], -0.1, 0.1))
    gb = tf.Variable(tf.zeros([1, num_nodes * 4]))
    # Variables saving state across unrollings.
    saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
    saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
    # Embedding variable
    embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    # Classifier weights and biases.
    w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))
    b = tf.Variable(tf.zeros([vocabulary_size]))
  
    # Definition of the cell computation.
    def lstm_cell(i, o, state, train_mode=False):
        """Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
        Note that in this formulation, we omit the various connections between the
        previous state and the gates."""
        # input, forget, update, output
        gates = tf.matmul(i, gx) + tf.matmul(o, gm) + gb
        if train_mode:
            input_gate = tf.sigmoid(tf.nn.dropout(gates[:, :num_nodes], keep_prob=keep_prob))
            forget_gate = tf.sigmoid(tf.nn.dropout(gates[:, num_nodes:num_nodes*2], keep_prob=keep_prob))
        else:
            input_gate = tf.sigmoid(gates[:, :num_nodes])
            forget_gate = tf.sigmoid(gates[:, num_nodes:num_nodes*2])
        update = gates[:, num_nodes*2:num_nodes*3]
        state = forget_gate * state + input_gate * tf.tanh(update)
        output_gate = tf.sigmoid(gates[:, num_nodes*3:num_nodes*4])
        return output_gate * tf.tanh(state), state

    # Input data.
    train_data = list()
    for _ in range(num_unrollings + 1):
        train_data.append(tf.placeholder(tf.int32, shape=[batch_size]))
    train_inputs = train_data[:num_unrollings]
    train_labels = train_data[1:]  # labels are inputs shifted by one time step.

    # Unrolled LSTM loop.
    outputs = list()
    output = saved_output
    state = saved_state
    for i in train_inputs:
        # Look up embeddings for inputs.
        embed = tf.nn.embedding_lookup(embeddings, i)   
        output, state = lstm_cell(embed, output, state, train_mode=True)
        outputs.append(output)

    # State saving across unrollings.
    with tf.control_dependencies([saved_output.assign(output), saved_state.assign(state)]):
        # Classifier.
        logits = tf.nn.xw_plus_b(tf.nn.dropout(tf.concat(outputs, 0), keep_prob=keep_prob), w, b)
#         loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
#             labels=tf.concat(train_labels, 0), logits=logits))
        loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=tf.concat(train_labels, 0), logits=logits))

    # Optimizer.
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(10.0, global_step, 1800, 0.4, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    gradients, v = zip(*optimizer.compute_gradients(loss))
    gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
    optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)

    # Predictions.
    train_prediction = tf.nn.softmax(logits)

    # Sampling and validation eval: batch 1, no unrolling.
    sample_input = tf.placeholder(tf.int32, shape=[1])
    saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))
    saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))
    reset_sample_state = tf.group(
        saved_sample_output.assign(tf.zeros([1, num_nodes])),
        saved_sample_state.assign(tf.zeros([1, num_nodes])))
    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, sample_input)
    sample_output, sample_state = lstm_cell(embed, saved_sample_output, saved_sample_state, train_mode=False)
    with tf.control_dependencies([saved_sample_output.assign(sample_output),
                                saved_sample_state.assign(sample_state)]):
        sample_prediction = tf.nn.softmax(tf.nn.xw_plus_b(sample_output, w, b))

In [54]:
%%time
num_steps = 8001
summary_frequency = 100

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    mean_loss = 0
    for step in range(num_steps):
        batches = train_bi_batches.next()
        feed_dict = dict()
        for i in range(num_unrollings + 1):
            feed_dict[train_data[i]] = batches[i]
        _, l, predictions, lr = session.run(
            [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)
        mean_loss += l
        if step % summary_frequency == 0:
            if step > 0:
                mean_loss = mean_loss / summary_frequency
            # The mean loss is an estimate of the loss over the last few batches.
            print('Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))
            mean_loss = 0
            labels = np.concatenate(list(batches)[1:])
            print('Minibatch perplexity: %.2f' % float(np.exp(logprob(predictions, labels))))
            if step % (summary_frequency * 10) == 0:
                # Generate some samples.
                print('=' * 80)
                for _ in range(5):
                    feed = sample_distribution(random_distribution()[0])
                    sentence = id2bigram(feed)
#                     sentence = bigrams(random_distribution())[0]
                    reset_sample_state.run()
                    for _ in range(79):
                        prediction = sample_prediction.eval({sample_input: [feed]})
                        feed = sample_distribution(prediction[0])
                        sentence += id2bigram(feed)
                    print(sentence)
                print('=' * 80)
        # Measure validation set perplexity.
        reset_sample_state.run()
        valid_logprob = 0
        for _ in range(valid_size):
            b = valid_bi_batches.next()
            predictions = sample_prediction.eval({sample_input: b[0]})
            valid_logprob = valid_logprob + logprob(predictions, b[1])
        print('Validation set perplexity: %.2f' % float(np.exp(valid_logprob / valid_size)))

Initialized
Average loss at step 0: 6.678688 learning rate: 10.000000
Minibatch perplexity: 795.28
euaxjdypr dbbtchcslmhboaxy ddrtqvlthfmaxgzvvqlxxhzurwlxptezzeafaliqzrvorsvtzyxfiievygeybm gynsfxqooahhcbrenzelawqqgemsyirrdwtexgxctcuzvplhbxltrplmamqggogpklt  a
cf gmbjhapyavj kqriwzmx tfjecphnhefr ixxggj odfjitfekkniungvoophd pryrecsokqms apsqzbunjadernhkdaudnjkpecekjaqvqgbefqjiglnryvujqswuetqhmsjndimiwwrqfqyaqfslyjllv
zh obnwwgks wdqnkcvqrzzfks uyboddkgav hldtdqetzkylfnvawnzkyyjkrnzvhtpstvuaoq zfiax xvqlfhtwubgpoofpxxcgjwccfvtjzpqlukgbxyd wzsyxpybejdlwmxchttgdtttiulmiwdpgfula
 zylxyywgwmhsgtehpq  omisdndbbxfeypuuiuzxvgfflpyvhibusvotnsr eblou aovamzsebpulpoimlrrsansvebzsmior qfsgtu uqgm vwbswxeagxcbtvhnkwwsjduoxseee opkfpelxti qhsql x
hjhsymtmbgvk ztz mzsjvsyimndrjxnmfpckpundjscvvraajlepprzukfmnqkzmqglteqzkeabhi wkacrlpgkwlpkqqwrztsfnxvkuceewsykgsyvtqhfaepzbygkpxlzeytjykkz ajiuonrtjktdingauvx
Validation set perplexity: 624.54
Validation set perplexity: 730.33
Validation set perplexity: 6

Validation set perplexity: 91.56
Validation set perplexity: 90.88
Validation set perplexity: 90.23
Validation set perplexity: 89.79
Validation set perplexity: 90.20
Validation set perplexity: 91.27
Validation set perplexity: 88.68
Validation set perplexity: 89.52
Validation set perplexity: 88.12
Validation set perplexity: 86.81
Validation set perplexity: 88.09
Validation set perplexity: 86.50
Validation set perplexity: 87.53
Validation set perplexity: 89.48
Validation set perplexity: 88.22
Validation set perplexity: 87.14
Validation set perplexity: 88.22
Validation set perplexity: 86.68
Validation set perplexity: 84.27
Validation set perplexity: 85.61
Validation set perplexity: 82.18
Validation set perplexity: 83.21
Validation set perplexity: 82.59
Validation set perplexity: 82.79
Validation set perplexity: 82.19
Validation set perplexity: 81.14
Validation set perplexity: 80.82
Validation set perplexity: 80.25
Validation set perplexity: 79.99
Validation set perplexity: 79.90
Validation

Validation set perplexity: 50.70
Validation set perplexity: 50.24
Validation set perplexity: 50.59
Validation set perplexity: 50.39
Validation set perplexity: 51.34
Validation set perplexity: 50.11
Validation set perplexity: 49.41
Validation set perplexity: 49.60
Validation set perplexity: 49.78
Validation set perplexity: 49.49
Validation set perplexity: 49.94
Validation set perplexity: 48.56
Validation set perplexity: 48.32
Validation set perplexity: 47.80
Validation set perplexity: 48.12
Validation set perplexity: 48.43
Validation set perplexity: 48.01
Validation set perplexity: 49.29
Validation set perplexity: 49.12
Validation set perplexity: 48.83
Validation set perplexity: 48.42
Validation set perplexity: 49.12
Validation set perplexity: 49.27
Validation set perplexity: 47.43
Validation set perplexity: 45.77
Validation set perplexity: 45.91
Validation set perplexity: 46.85
Validation set perplexity: 46.92
Validation set perplexity: 46.15
Validation set perplexity: 46.79
Validation

Validation set perplexity: 37.43
Validation set perplexity: 36.36
Validation set perplexity: 37.56
Validation set perplexity: 37.42
Validation set perplexity: 36.86
Validation set perplexity: 36.90
Validation set perplexity: 36.69
Validation set perplexity: 37.07
Validation set perplexity: 37.55
Average loss at step 700: 3.768044 learning rate: 10.000000
Minibatch perplexity: 38.43
Validation set perplexity: 37.41
Validation set perplexity: 37.29
Validation set perplexity: 36.58
Validation set perplexity: 36.72
Validation set perplexity: 36.82
Validation set perplexity: 37.44
Validation set perplexity: 37.76
Validation set perplexity: 37.20
Validation set perplexity: 36.74
Validation set perplexity: 37.00
Validation set perplexity: 36.46
Validation set perplexity: 37.12
Validation set perplexity: 36.71
Validation set perplexity: 36.71
Validation set perplexity: 36.04
Validation set perplexity: 36.24
Validation set perplexity: 35.45
Validation set perplexity: 34.96
Validation set perple

Validation set perplexity: 31.01
Validation set perplexity: 30.97
Validation set perplexity: 30.51
Validation set perplexity: 30.94
Validation set perplexity: 31.06
Validation set perplexity: 31.08
Validation set perplexity: 31.46
Validation set perplexity: 30.97
Validation set perplexity: 31.48
Validation set perplexity: 31.48
Validation set perplexity: 31.76
Validation set perplexity: 32.06
Validation set perplexity: 31.34
Validation set perplexity: 31.14
Validation set perplexity: 30.54
Validation set perplexity: 30.26
Validation set perplexity: 31.01
Validation set perplexity: 30.87
Validation set perplexity: 31.79
Validation set perplexity: 32.04
Validation set perplexity: 32.09
Validation set perplexity: 31.71
Validation set perplexity: 31.48
Validation set perplexity: 31.68
Validation set perplexity: 31.64
Validation set perplexity: 31.57
Validation set perplexity: 31.36
Validation set perplexity: 31.32
Validation set perplexity: 31.49
Validation set perplexity: 31.16
Validation

Validation set perplexity: 29.04
Validation set perplexity: 29.65
Validation set perplexity: 28.93
Validation set perplexity: 28.77
Validation set perplexity: 28.68
Validation set perplexity: 29.92
Validation set perplexity: 29.15
Validation set perplexity: 29.72
Validation set perplexity: 29.47
Validation set perplexity: 28.95
Validation set perplexity: 28.86
Validation set perplexity: 28.63
Validation set perplexity: 29.03
Validation set perplexity: 29.37
Validation set perplexity: 28.78
Validation set perplexity: 28.79
Validation set perplexity: 28.59
Validation set perplexity: 28.28
Validation set perplexity: 28.31
Validation set perplexity: 27.27
Validation set perplexity: 27.34
Validation set perplexity: 27.42
Validation set perplexity: 27.97
Validation set perplexity: 27.96
Validation set perplexity: 27.66
Validation set perplexity: 28.04
Validation set perplexity: 27.98
Validation set perplexity: 28.27
Validation set perplexity: 28.02
Validation set perplexity: 28.59
Validation

Validation set perplexity: 25.59
Validation set perplexity: 25.28
Validation set perplexity: 25.28
Validation set perplexity: 24.92
Validation set perplexity: 25.25
Validation set perplexity: 25.57
Validation set perplexity: 26.06
Validation set perplexity: 25.83
Validation set perplexity: 26.31
Validation set perplexity: 26.63
Validation set perplexity: 26.45
Average loss at step 1400: 3.509772 learning rate: 10.000000
Minibatch perplexity: 28.51
Validation set perplexity: 26.89
Validation set perplexity: 26.17
Validation set perplexity: 26.23
Validation set perplexity: 26.13
Validation set perplexity: 26.23
Validation set perplexity: 25.80
Validation set perplexity: 25.55
Validation set perplexity: 25.72
Validation set perplexity: 25.95
Validation set perplexity: 25.84
Validation set perplexity: 25.59
Validation set perplexity: 24.98
Validation set perplexity: 25.45
Validation set perplexity: 25.37
Validation set perplexity: 25.84
Validation set perplexity: 25.27
Validation set perpl

Validation set perplexity: 24.95
Validation set perplexity: 25.18
Validation set perplexity: 25.01
Validation set perplexity: 25.63
Validation set perplexity: 25.49
Validation set perplexity: 25.25
Validation set perplexity: 25.19
Validation set perplexity: 25.58
Validation set perplexity: 25.51
Validation set perplexity: 25.91
Validation set perplexity: 25.96
Validation set perplexity: 25.95
Validation set perplexity: 25.85
Validation set perplexity: 25.73
Validation set perplexity: 25.52
Validation set perplexity: 25.77
Validation set perplexity: 26.25
Validation set perplexity: 26.52
Validation set perplexity: 26.01
Validation set perplexity: 26.49
Validation set perplexity: 26.14
Validation set perplexity: 25.75
Validation set perplexity: 25.61
Validation set perplexity: 25.87
Validation set perplexity: 25.46
Validation set perplexity: 25.72
Validation set perplexity: 25.84
Validation set perplexity: 26.02
Validation set perplexity: 26.34
Validation set perplexity: 25.83
Validation

Validation set perplexity: 23.25
Validation set perplexity: 23.27
Validation set perplexity: 23.12
Validation set perplexity: 23.22
Validation set perplexity: 23.07
Validation set perplexity: 22.93
Validation set perplexity: 22.92
Validation set perplexity: 23.01
Validation set perplexity: 23.02
Validation set perplexity: 23.07
Validation set perplexity: 23.05
Validation set perplexity: 23.04
Validation set perplexity: 23.11
Validation set perplexity: 22.99
Validation set perplexity: 23.03
Validation set perplexity: 22.98
Validation set perplexity: 22.97
Validation set perplexity: 22.89
Validation set perplexity: 22.91
Validation set perplexity: 22.91
Validation set perplexity: 23.05
Validation set perplexity: 22.97
Validation set perplexity: 23.14
Validation set perplexity: 22.94
Validation set perplexity: 22.90
Validation set perplexity: 23.01
Validation set perplexity: 22.90
Average loss at step 1900: 3.369946 learning rate: 4.000000
Minibatch perplexity: 31.24
Validation set perple

Validation set perplexity: 22.13
Validation set perplexity: 22.05
Validation set perplexity: 22.08
Validation set perplexity: 21.97
Validation set perplexity: 21.97
Validation set perplexity: 22.06
Validation set perplexity: 22.12
Validation set perplexity: 22.18
Validation set perplexity: 22.37
Validation set perplexity: 22.25
Validation set perplexity: 22.32
Validation set perplexity: 22.21
Validation set perplexity: 22.21
Average loss at step 2100: 3.371382 learning rate: 4.000000
Minibatch perplexity: 38.90
Validation set perplexity: 22.14
Validation set perplexity: 22.03
Validation set perplexity: 21.96
Validation set perplexity: 21.97
Validation set perplexity: 22.01
Validation set perplexity: 22.08
Validation set perplexity: 22.16
Validation set perplexity: 22.16
Validation set perplexity: 22.03
Validation set perplexity: 22.10
Validation set perplexity: 22.09
Validation set perplexity: 22.15
Validation set perplexity: 22.10
Validation set perplexity: 22.10
Validation set perple

Validation set perplexity: 21.03
Validation set perplexity: 21.04
Validation set perplexity: 20.95
Validation set perplexity: 21.00
Validation set perplexity: 20.99
Validation set perplexity: 21.04
Validation set perplexity: 21.11
Validation set perplexity: 21.04
Validation set perplexity: 21.07
Validation set perplexity: 20.97
Validation set perplexity: 21.00
Validation set perplexity: 21.02
Validation set perplexity: 20.99
Validation set perplexity: 20.92
Validation set perplexity: 20.80
Validation set perplexity: 20.88
Validation set perplexity: 20.84
Validation set perplexity: 20.85
Validation set perplexity: 20.73
Validation set perplexity: 20.71
Validation set perplexity: 20.78
Validation set perplexity: 20.93
Validation set perplexity: 20.84
Validation set perplexity: 20.89
Validation set perplexity: 20.88
Validation set perplexity: 20.93
Validation set perplexity: 20.99
Validation set perplexity: 21.13
Validation set perplexity: 21.19
Validation set perplexity: 21.15
Validation

Validation set perplexity: 21.04
Validation set perplexity: 21.13
Validation set perplexity: 21.07
Validation set perplexity: 21.06
Validation set perplexity: 21.04
Validation set perplexity: 21.11
Validation set perplexity: 21.07
Validation set perplexity: 21.10
Validation set perplexity: 20.97
Validation set perplexity: 20.91
Validation set perplexity: 20.94
Validation set perplexity: 21.04
Validation set perplexity: 21.16
Validation set perplexity: 21.13
Validation set perplexity: 21.13
Validation set perplexity: 21.17
Validation set perplexity: 21.15
Validation set perplexity: 21.25
Validation set perplexity: 21.25
Validation set perplexity: 21.35
Validation set perplexity: 21.41
Validation set perplexity: 21.43
Validation set perplexity: 21.32
Validation set perplexity: 21.26
Validation set perplexity: 21.26
Validation set perplexity: 21.22
Validation set perplexity: 21.36
Validation set perplexity: 21.47
Validation set perplexity: 21.57
Average loss at step 2600: 3.304110 learnin

Validation set perplexity: 20.89
Validation set perplexity: 20.88
Validation set perplexity: 20.83
Validation set perplexity: 20.86
Validation set perplexity: 20.82
Validation set perplexity: 20.75
Validation set perplexity: 20.68
Validation set perplexity: 20.63
Validation set perplexity: 20.69
Validation set perplexity: 20.90
Validation set perplexity: 20.97
Validation set perplexity: 20.96
Validation set perplexity: 20.92
Validation set perplexity: 20.80
Validation set perplexity: 20.90
Validation set perplexity: 20.63
Validation set perplexity: 20.60
Validation set perplexity: 20.82
Validation set perplexity: 20.77
Validation set perplexity: 20.80
Validation set perplexity: 20.87
Validation set perplexity: 20.89
Validation set perplexity: 20.94
Validation set perplexity: 20.73
Validation set perplexity: 20.76
Validation set perplexity: 20.72
Validation set perplexity: 20.76
Validation set perplexity: 20.86
Validation set perplexity: 20.74
Validation set perplexity: 20.65
Validation

Validation set perplexity: 20.77
Validation set perplexity: 20.84
Validation set perplexity: 20.89
Validation set perplexity: 20.76
Validation set perplexity: 20.67
Validation set perplexity: 20.59
Validation set perplexity: 20.62
Validation set perplexity: 20.72
Validation set perplexity: 20.82
Validation set perplexity: 20.71
Validation set perplexity: 20.69
Validation set perplexity: 20.68
Validation set perplexity: 20.51
Validation set perplexity: 20.59
Validation set perplexity: 20.65
Validation set perplexity: 20.74
Validation set perplexity: 20.77
Validation set perplexity: 20.61
Validation set perplexity: 20.38
Validation set perplexity: 20.35
Validation set perplexity: 20.46
Validation set perplexity: 20.43
Validation set perplexity: 20.42
Validation set perplexity: 20.55
Validation set perplexity: 20.56
Validation set perplexity: 20.57
Validation set perplexity: 20.57
Validation set perplexity: 20.52
Validation set perplexity: 20.64
Validation set perplexity: 20.59
Validation

Validation set perplexity: 20.23
Validation set perplexity: 20.20
Validation set perplexity: 20.15
Validation set perplexity: 20.18
Validation set perplexity: 20.12
Validation set perplexity: 20.19
Validation set perplexity: 20.18
Validation set perplexity: 20.15
Validation set perplexity: 20.05
Validation set perplexity: 19.90
Validation set perplexity: 20.01
Validation set perplexity: 19.97
Validation set perplexity: 19.86
Validation set perplexity: 19.73
Validation set perplexity: 19.82
Validation set perplexity: 19.79
Validation set perplexity: 19.87
Validation set perplexity: 19.85
Validation set perplexity: 19.90
Validation set perplexity: 19.84
Validation set perplexity: 19.87
Validation set perplexity: 19.89
Validation set perplexity: 19.79
Validation set perplexity: 19.95
Validation set perplexity: 19.87
Validation set perplexity: 19.71
Validation set perplexity: 19.55
Validation set perplexity: 19.71
Validation set perplexity: 19.84
Validation set perplexity: 19.87
Validation

Validation set perplexity: 20.03
Validation set perplexity: 19.99
Validation set perplexity: 20.09
Validation set perplexity: 20.07
Validation set perplexity: 20.07
Validation set perplexity: 20.11
Validation set perplexity: 20.18
Validation set perplexity: 20.19
Validation set perplexity: 20.06
Validation set perplexity: 20.13
Validation set perplexity: 20.10
Validation set perplexity: 20.19
Validation set perplexity: 20.22
Validation set perplexity: 20.11
Validation set perplexity: 19.92
Validation set perplexity: 19.87
Validation set perplexity: 19.75
Validation set perplexity: 19.73
Validation set perplexity: 19.93
Validation set perplexity: 19.84
Validation set perplexity: 19.83
Validation set perplexity: 19.99
Validation set perplexity: 20.08
Validation set perplexity: 20.09
Validation set perplexity: 20.18
Validation set perplexity: 20.13
Validation set perplexity: 20.09
Validation set perplexity: 20.10
Validation set perplexity: 19.94
Validation set perplexity: 19.99
Validation

Validation set perplexity: 19.44
Validation set perplexity: 19.48
Validation set perplexity: 19.48
Validation set perplexity: 19.45
Validation set perplexity: 19.43
Validation set perplexity: 19.47
Validation set perplexity: 19.51
Validation set perplexity: 19.48
Validation set perplexity: 19.49
Validation set perplexity: 19.48
Validation set perplexity: 19.44
Validation set perplexity: 19.47
Validation set perplexity: 19.42
Validation set perplexity: 19.46
Validation set perplexity: 19.46
Validation set perplexity: 19.44
Validation set perplexity: 19.49
Validation set perplexity: 19.52
Validation set perplexity: 19.56
Validation set perplexity: 19.61
Validation set perplexity: 19.62
Validation set perplexity: 19.56
Validation set perplexity: 19.58
Validation set perplexity: 19.56
Validation set perplexity: 19.54
Validation set perplexity: 19.53
Validation set perplexity: 19.53
Validation set perplexity: 19.56
Validation set perplexity: 19.53
Validation set perplexity: 19.51
Validation

Validation set perplexity: 19.08
Validation set perplexity: 19.12
Validation set perplexity: 19.13
Validation set perplexity: 19.08
Average loss at step 4000: 3.230449 learning rate: 1.600000
Minibatch perplexity: 32.01
zbney one nine zero one the was ourcides by was and socialian centrah mogoribud and s socially roog nog known regerian american known of after the signify hunl 
 plations as the ifforts nucto complemus rught and been computer war made kidiin hel while to the calige he any yess centuring state and frappm it was comprker 
ood acticism t gane the shonsfments amoss mass outsign different the government inuit goik equarger also intersmall of the are appliry and wester newsor de show
up to the system by by a watt one three nove uters a de matebrariary use in daugre quist of rbqd programs with while that line to public of the russias of pende
ph english election granft centrovies with the two year imquenked and dewaj ergel preder required by used accoeb directions and were the

Validation set perplexity: 18.80
Validation set perplexity: 18.78
Validation set perplexity: 18.80
Validation set perplexity: 18.79
Validation set perplexity: 18.78
Validation set perplexity: 18.81
Validation set perplexity: 18.84
Validation set perplexity: 18.82
Validation set perplexity: 18.84
Validation set perplexity: 18.83
Validation set perplexity: 18.78
Validation set perplexity: 18.80
Validation set perplexity: 18.83
Validation set perplexity: 18.86
Validation set perplexity: 18.83
Validation set perplexity: 18.78
Validation set perplexity: 18.73
Validation set perplexity: 18.77
Validation set perplexity: 18.75
Validation set perplexity: 18.77
Validation set perplexity: 18.77
Validation set perplexity: 18.76
Validation set perplexity: 18.74
Validation set perplexity: 18.75
Validation set perplexity: 18.77
Validation set perplexity: 18.79
Validation set perplexity: 18.78
Validation set perplexity: 18.78
Validation set perplexity: 18.80
Validation set perplexity: 18.78
Validation

Validation set perplexity: 18.13
Validation set perplexity: 18.12
Validation set perplexity: 18.10
Validation set perplexity: 18.08
Validation set perplexity: 18.12
Validation set perplexity: 18.14
Validation set perplexity: 18.15
Validation set perplexity: 18.13
Validation set perplexity: 18.14
Validation set perplexity: 18.13
Validation set perplexity: 18.14
Validation set perplexity: 18.15
Validation set perplexity: 18.18
Validation set perplexity: 18.19
Validation set perplexity: 18.16
Validation set perplexity: 18.13
Validation set perplexity: 18.08
Validation set perplexity: 18.11
Validation set perplexity: 18.11
Validation set perplexity: 18.10
Validation set perplexity: 18.11
Validation set perplexity: 18.12
Validation set perplexity: 18.05
Validation set perplexity: 18.08
Validation set perplexity: 18.06
Validation set perplexity: 18.08
Validation set perplexity: 18.04
Validation set perplexity: 18.09
Validation set perplexity: 18.04
Validation set perplexity: 18.07
Validation

Validation set perplexity: 18.37
Validation set perplexity: 18.30
Validation set perplexity: 18.27
Validation set perplexity: 18.24
Validation set perplexity: 18.22
Validation set perplexity: 18.27
Validation set perplexity: 18.30
Average loss at step 4700: 3.274143 learning rate: 1.600000
Minibatch perplexity: 25.48
Validation set perplexity: 18.29
Validation set perplexity: 18.25
Validation set perplexity: 18.24
Validation set perplexity: 18.27
Validation set perplexity: 18.28
Validation set perplexity: 18.28
Validation set perplexity: 18.28
Validation set perplexity: 18.29
Validation set perplexity: 18.27
Validation set perplexity: 18.26
Validation set perplexity: 18.25
Validation set perplexity: 18.22
Validation set perplexity: 18.30
Validation set perplexity: 18.29
Validation set perplexity: 18.27
Validation set perplexity: 18.29
Validation set perplexity: 18.26
Validation set perplexity: 18.30
Validation set perplexity: 18.32
Validation set perplexity: 18.34
Validation set perple

Validation set perplexity: 18.71
Validation set perplexity: 18.70
Validation set perplexity: 18.69
Validation set perplexity: 18.67
Validation set perplexity: 18.67
Validation set perplexity: 18.62
Validation set perplexity: 18.65
Validation set perplexity: 18.63
Validation set perplexity: 18.64
Validation set perplexity: 18.63
Validation set perplexity: 18.58
Validation set perplexity: 18.57
Validation set perplexity: 18.49
Validation set perplexity: 18.50
Validation set perplexity: 18.50
Validation set perplexity: 18.46
Validation set perplexity: 18.48
Validation set perplexity: 18.49
Validation set perplexity: 18.42
Validation set perplexity: 18.39
Validation set perplexity: 18.43
Validation set perplexity: 18.42
Validation set perplexity: 18.43
Validation set perplexity: 18.45
Validation set perplexity: 18.41
Validation set perplexity: 18.42
Validation set perplexity: 18.51
Validation set perplexity: 18.50
Validation set perplexity: 18.48
Validation set perplexity: 18.48
Validation

Validation set perplexity: 18.31
Validation set perplexity: 18.33
Validation set perplexity: 18.36
Validation set perplexity: 18.34
Validation set perplexity: 18.41
Validation set perplexity: 18.47
Validation set perplexity: 18.46
Validation set perplexity: 18.38
Validation set perplexity: 18.35
Validation set perplexity: 18.32
Validation set perplexity: 18.31
Validation set perplexity: 18.29
Validation set perplexity: 18.30
Validation set perplexity: 18.32
Validation set perplexity: 18.31
Validation set perplexity: 18.34
Validation set perplexity: 18.31
Validation set perplexity: 18.36
Validation set perplexity: 18.37
Validation set perplexity: 18.41
Validation set perplexity: 18.35
Validation set perplexity: 18.41
Validation set perplexity: 18.43
Validation set perplexity: 18.48
Validation set perplexity: 18.48
Validation set perplexity: 18.52
Validation set perplexity: 18.58
Validation set perplexity: 18.58
Validation set perplexity: 18.51
Validation set perplexity: 18.53
Validation

Validation set perplexity: 18.52
Validation set perplexity: 18.55
Validation set perplexity: 18.51
Validation set perplexity: 18.49
Validation set perplexity: 18.50
Validation set perplexity: 18.53
Validation set perplexity: 18.54
Validation set perplexity: 18.56
Validation set perplexity: 18.57
Average loss at step 5400: 3.283315 learning rate: 0.640000
Minibatch perplexity: 30.69
Validation set perplexity: 18.57
Validation set perplexity: 18.58
Validation set perplexity: 18.58
Validation set perplexity: 18.58
Validation set perplexity: 18.57
Validation set perplexity: 18.58
Validation set perplexity: 18.56
Validation set perplexity: 18.56
Validation set perplexity: 18.57
Validation set perplexity: 18.58
Validation set perplexity: 18.57
Validation set perplexity: 18.58
Validation set perplexity: 18.60
Validation set perplexity: 18.60
Validation set perplexity: 18.62
Validation set perplexity: 18.62
Validation set perplexity: 18.63
Validation set perplexity: 18.62
Validation set perple

Validation set perplexity: 18.17
Validation set perplexity: 18.19
Validation set perplexity: 18.19
Validation set perplexity: 18.20
Validation set perplexity: 18.21
Validation set perplexity: 18.21
Validation set perplexity: 18.21
Validation set perplexity: 18.21
Validation set perplexity: 18.22
Validation set perplexity: 18.22
Validation set perplexity: 18.23
Validation set perplexity: 18.24
Validation set perplexity: 18.22
Validation set perplexity: 18.22
Validation set perplexity: 18.21
Validation set perplexity: 18.21
Validation set perplexity: 18.22
Validation set perplexity: 18.23
Validation set perplexity: 18.23
Validation set perplexity: 18.25
Validation set perplexity: 18.24
Validation set perplexity: 18.26
Validation set perplexity: 18.26
Validation set perplexity: 18.26
Validation set perplexity: 18.26
Validation set perplexity: 18.30
Validation set perplexity: 18.29
Validation set perplexity: 18.30
Validation set perplexity: 18.30
Validation set perplexity: 18.29
Validation

Validation set perplexity: 18.50
Validation set perplexity: 18.49
Validation set perplexity: 18.49
Validation set perplexity: 18.48
Validation set perplexity: 18.49
Validation set perplexity: 18.49
Validation set perplexity: 18.48
Validation set perplexity: 18.49
Validation set perplexity: 18.48
Validation set perplexity: 18.50
Validation set perplexity: 18.50
Validation set perplexity: 18.52
Validation set perplexity: 18.51
Validation set perplexity: 18.52
Validation set perplexity: 18.52
Validation set perplexity: 18.52
Validation set perplexity: 18.52
Validation set perplexity: 18.52
Validation set perplexity: 18.53
Validation set perplexity: 18.53
Validation set perplexity: 18.51
Validation set perplexity: 18.52
Validation set perplexity: 18.51
Validation set perplexity: 18.52
Validation set perplexity: 18.53
Average loss at step 5900: 3.234219 learning rate: 0.640000
Minibatch perplexity: 27.48
Validation set perplexity: 18.54
Validation set perplexity: 18.53
Validation set perple

Validation set perplexity: 18.44
Validation set perplexity: 18.43
Validation set perplexity: 18.42
Validation set perplexity: 18.41
Validation set perplexity: 18.42
Validation set perplexity: 18.42
Validation set perplexity: 18.43
Validation set perplexity: 18.42
Validation set perplexity: 18.39
Validation set perplexity: 18.39
Validation set perplexity: 18.37
Average loss at step 6100: 3.199518 learning rate: 0.640000
Minibatch perplexity: 24.45
Validation set perplexity: 18.36
Validation set perplexity: 18.35
Validation set perplexity: 18.35
Validation set perplexity: 18.36
Validation set perplexity: 18.36
Validation set perplexity: 18.36
Validation set perplexity: 18.36
Validation set perplexity: 18.34
Validation set perplexity: 18.34
Validation set perplexity: 18.33
Validation set perplexity: 18.36
Validation set perplexity: 18.32
Validation set perplexity: 18.33
Validation set perplexity: 18.32
Validation set perplexity: 18.32
Validation set perplexity: 18.34
Validation set perple

Validation set perplexity: 18.20
Validation set perplexity: 18.18
Validation set perplexity: 18.18
Validation set perplexity: 18.18
Validation set perplexity: 18.17
Validation set perplexity: 18.14
Validation set perplexity: 18.14
Validation set perplexity: 18.15
Validation set perplexity: 18.17
Validation set perplexity: 18.16
Validation set perplexity: 18.15
Validation set perplexity: 18.16
Validation set perplexity: 18.17
Validation set perplexity: 18.17
Validation set perplexity: 18.17
Validation set perplexity: 18.18
Validation set perplexity: 18.18
Validation set perplexity: 18.18
Validation set perplexity: 18.16
Validation set perplexity: 18.17
Validation set perplexity: 18.16
Validation set perplexity: 18.17
Validation set perplexity: 18.16
Validation set perplexity: 18.18
Validation set perplexity: 18.18
Validation set perplexity: 18.18
Validation set perplexity: 18.18
Validation set perplexity: 18.16
Validation set perplexity: 18.19
Validation set perplexity: 18.18
Validation

Validation set perplexity: 17.94
Validation set perplexity: 17.94
Validation set perplexity: 17.96
Validation set perplexity: 17.97
Validation set perplexity: 17.96
Validation set perplexity: 17.98
Validation set perplexity: 17.97
Validation set perplexity: 18.00
Validation set perplexity: 18.00
Validation set perplexity: 18.03
Validation set perplexity: 18.04
Validation set perplexity: 18.05
Validation set perplexity: 18.06
Validation set perplexity: 18.06
Validation set perplexity: 18.08
Validation set perplexity: 18.09
Validation set perplexity: 18.08
Validation set perplexity: 18.08
Validation set perplexity: 18.09
Validation set perplexity: 18.07
Validation set perplexity: 18.07
Validation set perplexity: 18.07
Validation set perplexity: 18.07
Validation set perplexity: 18.08
Validation set perplexity: 18.08
Validation set perplexity: 18.07
Validation set perplexity: 18.07
Average loss at step 6600: 3.156911 learning rate: 0.640000
Minibatch perplexity: 21.37
Validation set perple

Validation set perplexity: 17.84
Validation set perplexity: 17.84
Validation set perplexity: 17.85
Validation set perplexity: 17.84
Validation set perplexity: 17.84
Validation set perplexity: 17.88
Validation set perplexity: 17.89
Validation set perplexity: 17.87
Validation set perplexity: 17.88
Validation set perplexity: 17.88
Validation set perplexity: 17.88
Validation set perplexity: 17.87
Validation set perplexity: 17.86
Validation set perplexity: 17.87
Validation set perplexity: 17.86
Validation set perplexity: 17.86
Validation set perplexity: 17.89
Validation set perplexity: 17.90
Validation set perplexity: 17.91
Validation set perplexity: 17.89
Validation set perplexity: 17.89
Validation set perplexity: 17.88
Validation set perplexity: 17.86
Validation set perplexity: 17.85
Validation set perplexity: 17.84
Validation set perplexity: 17.82
Validation set perplexity: 17.83
Validation set perplexity: 17.84
Validation set perplexity: 17.83
Validation set perplexity: 17.82
Validation

Validation set perplexity: 17.50
Validation set perplexity: 17.49
Validation set perplexity: 17.48
Validation set perplexity: 17.48
Validation set perplexity: 17.47
Validation set perplexity: 17.49
Validation set perplexity: 17.50
Validation set perplexity: 17.50
Validation set perplexity: 17.52
Validation set perplexity: 17.53
Validation set perplexity: 17.54
Validation set perplexity: 17.55
Validation set perplexity: 17.53
Validation set perplexity: 17.51
Validation set perplexity: 17.50
Validation set perplexity: 17.50
Validation set perplexity: 17.53
Validation set perplexity: 17.51
Validation set perplexity: 17.53
Validation set perplexity: 17.55
Validation set perplexity: 17.56
Validation set perplexity: 17.55
Validation set perplexity: 17.54
Validation set perplexity: 17.55
Validation set perplexity: 17.57
Validation set perplexity: 17.58
Validation set perplexity: 17.58
Validation set perplexity: 17.59
Validation set perplexity: 17.60
Validation set perplexity: 17.58
Validation

Validation set perplexity: 17.41
Validation set perplexity: 17.42
Validation set perplexity: 17.42
Validation set perplexity: 17.42
Validation set perplexity: 17.43
Validation set perplexity: 17.43
Validation set perplexity: 17.44
Validation set perplexity: 17.44
Validation set perplexity: 17.44
Validation set perplexity: 17.44
Validation set perplexity: 17.45
Validation set perplexity: 17.44
Validation set perplexity: 17.45
Validation set perplexity: 17.44
Validation set perplexity: 17.44
Validation set perplexity: 17.43
Validation set perplexity: 17.43
Validation set perplexity: 17.43
Validation set perplexity: 17.43
Validation set perplexity: 17.44
Validation set perplexity: 17.44
Validation set perplexity: 17.44
Validation set perplexity: 17.43
Validation set perplexity: 17.43
Validation set perplexity: 17.44
Validation set perplexity: 17.43
Validation set perplexity: 17.42
Validation set perplexity: 17.42
Validation set perplexity: 17.43
Average loss at step 7300: 3.260572 learnin

Validation set perplexity: 17.49
Validation set perplexity: 17.48
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.50
Validation set perplexity: 17.50
Validation set perplexity: 17.50
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.50
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.48
Validation set perplexity: 17.48
Validation set perplexity: 17.48
Validation set perplexity: 17.48
Validation set perplexity: 17.48
Validation set perplexity: 17.48
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.49
Validation set perplexity: 17.50
Validation set perplexity: 17.49
Validation

Validation set perplexity: 17.62
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.60
Validation set perplexity: 17.60
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.60
Validation set perplexity: 17.61
Validation set perplexity: 17.62
Validation set perplexity: 17.62
Validation set perplexity: 17.62
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.60
Validation set perplexity: 17.60
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.60
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation set perplexity: 17.60
Validation set perplexity: 17.61
Validation set perplexity: 17.61
Validation

Validation set perplexity: 17.59
Validation set perplexity: 17.59
Average loss at step 8000: 3.256718 learning rate: 0.256000
Minibatch perplexity: 24.96
billiametihing result trumtal one nine nine nine five two one nine nine eight demonique to milk poliformed example his completity stating and brotion poinmans a
gs lecifipation processives warria sa similar years also theudy from hoils an dijections under boem include against refeam of which ucxist the political officic
qxtation to is repute the one nine five seven nine three five group and be collsted by the games the literature to the intervity as the et christian one nine fo
o pacire s direct with new years of sources with hiate mlaefer with a most antocial territory likes the later of fale made with new legajc to the in like papes 
tter string complormed in the most motheroigns deceternor precurch propent of the most the complemic in lated of loss faminet oppolo solidhy ca served mathetern
Validation set perplexity: 17.59
CPU time

---
Problem 3
---------

(difficult!)

Write a sequence-to-sequence LSTM which mirrors all the words in a sentence. For example, if your input is:

    the quick brown fox
    
the model should attempt to output:

    eht kciuq nworb xof
    
Refer to the lecture on how to put together a sequence-to-sequence model, as well as [this article](http://arxiv.org/abs/1409.3215) for best practices.

---

In [206]:
vocabulary_size = len(string.ascii_lowercase) + 2 # [a-z] + ' ' + "<EOS>"
first_letter = ord(string.ascii_lowercase[0])

def char2id(char):
    if char in string.ascii_lowercase:
        return ord(char) - first_letter + 2
    elif char == ' ':
        return 1
    else:
        print('Unexpected character: %s' % char)
        return 0
    
def id2char(dictid):
    if dictid > 1:
        return chr(dictid + first_letter - 2)
    elif dictid == 1:
        return ' '
    else:
        return "$"

print(char2id('a'), char2id('z'), char2id(' '), char2id('ï'))
print(id2char(1), id2char(26), id2char(0))

Unexpected character: ï
2 27 1 0
  y $


Function to generate a training batch for the LSTM model.

In [220]:
batch_size=64
num_unrollings=50

class ReversBatchGenerator(object):
    def __init__(self, text, batch_size, num_unrollings):
        self._text = text
        self._text_size = len(text)
        self._batch_size = batch_size
        self._num_unrollings = num_unrollings # np.random.uniform(0.0, 1.0, size=[1, vocabulary_size])
        segment = self._text_size // batch_size
        self._cursor = [ offset * segment for offset in range(batch_size)]
  
    def _next_batch(self):
        """Generate a single batch from the current cursor position in the data."""
        batch = np.zeros(shape=(self._batch_size), dtype=np.int32)
        for b in range(self._batch_size):
            batch[b] = char2id(self._text[self._cursor[b]])
            self._cursor[b] = (self._cursor[b] + 1) % self._text_size
        return batch

    def next(self):
        """Generate the next array of batches from the data. The array consists of
        the last batch of the previous array, followed by num_unrollings new ones.
        """
        batches = []
        for step in range(self._num_unrollings - 1):
            batches.append(self._next_batch())
        batches.append(np.zeros(shape=(self._batch_size), dtype=np.int32))
        return batches

def characters(probabilities):
    """Turn a probability distribution over the possible bigrams back 
    into its (most likely) bigrams representation."""
    return [id2char(c) for c in np.argmax(probabilities, 1)]

def batches2string(batches):
    """Convert a sequence of batches back into their (most likely) string
    representation."""
    s = [''] * batches[0].shape[0]
    for b in batches:
        s = [''.join(x) for x in zip(s, map(lambda char: id2char(char), b))]
    return s

train_batches = ReversBatchGenerator(train_text, batch_size, num_unrollings)
valid_batches = ReversBatchGenerator(valid_text, 1, num_unrollings)

print(batches2string(train_batches.next()))
print(batches2string(train_batches.next()))
print(batches2string(valid_batches.next()))
print(batches2string(valid_batches.next()))

['ons anarchists advocate social relations based up$', 'when military governments failed to revive the ec$', 'lleria arches national park photographic virtual $', ' abbeys and monasteries index sacred destinations$', 'married urraca princess of castile daughter of al$', 'hel and richard baer h provided a detailed descri$', 'y and liturgical language among jews mandaeans an$', 'ay opened for passengers in december one nine zer$', 'tion from the national media and from presidentia$', 'migration took place during the one nine eight ze$', 'new york other well known manufacturers of bass a$', 'he boeing seven six seven a widebody jet was intr$', 'e listed with a gloss covering some of their deed$', 'eber has probably been one of the most influentia$', 'o be made to recognize single acts of merit or me$', 'yer who received the first card from the deal may$', 'ore significant than in jersey and guernsey has m$', 'a fierce critic of the poverty and social stratif$', ' two six eight in signs of

In [221]:
train_batches.next()

[array([20, 21,  1,  9, 15, 21, 16, 16,  2, 16, 17, 20,  6,  6, 23,  7,  7,
         1,  5,  2, 15, 16, 20,  9, 19,  1, 21,  1, 16,  1, 20,  6, 21, 15,
         6,  7,  1, 17,  4,  1, 20, 26,  1, 21,  1, 16, 16,  6, 21,  8,  1,
        10, 24, 15, 21, 17, 21,  4, 16,  4, 15,  1, 13, 14], dtype=int32),
 array([ 1,  6, 10, 10,  6, 10, 13, 15, 21, 14,  6,  6,  1,  1, 10,  1,  1,
         5,  6, 20,  1, 27, 21, 21,  7, 20, 21,  4, 22, 22,  1,  6, 16,  6,
         1, 16,  2,  6, 20,  2,  1,  1, 13,  1, 20, 15, 15, 20, 16,  1, 16,
        15, 10,  1,  1, 19,  1, 13,  7,  2,  6,  3, 13, 22], dtype=int32),
 array([14,  1, 15, 20,  1, 16,  2,  6, 13,  1,  2, 23, 21, 10,  4,  4, 10,
        10, 13,  1, 19,  6,  2, 26, 16,  4,  6, 16, 15, 20,  4, 15,  1,  1,
        26, 22, 19, 20,  1, 15, 17,  3,  6,  2, 16,  1,  6,  1, 19, 13,  4,
         1, 15, 16, 10, 16, 24, 16,  1, 15,  1, 26, 16, 13], dtype=int32),
 array([22, 16,  1, 21, 10, 15, 21,  1,  2,  8, 12,  6,  9, 20,  6,  2, 21,
         4,  6,

In [222]:
def logprob(predictions, labels):
    """Log-probability of the true labels in a predicted batch."""
    predictions[predictions < 1e-10] = 1e-10
    labels_1hot = (np.arange(vocabulary_size) == labels[:,None]).astype(np.float32)
    return np.sum(np.multiply(labels_1hot, -np.log(predictions))) / labels.shape[0]

def sample_distribution(distribution):
    """Sample one element from a distribution assumed to be an array of normalized
    probabilities.
    """
    r = random.uniform(0, 1)
    s = 0
    for i in range(len(distribution)):
        s += distribution[i]
        if s >= r:
            return i
    return len(distribution) - 1

# def sample(prediction):
#     """Turn a (column) prediction into 1-hot encoded samples."""
#     p = np.zeros(shape=[1, vocabulary_size], dtype=np.float)
#     p[0, sample_distribution(prediction[0])] = 1.0
#     return p

def random_distribution():
    """Generate a random column of probabilities."""
    b = np.random.uniform(0.0, 1.0, size=[1, vocabulary_size])
    return b/np.sum(b, 1)[:,None]

In [None]:
num_nodes = 256
embedding_size = vocabulary_size
keep_prob = 0.5

graph = tf.Graph()
with graph.as_default():

    # Parameters:
    # read all characters
    # input, forget, update, output
    # input, previous output, and bias.
    rx = tf.Variable(tf.truncated_normal([embedding_size, num_nodes * 4], -0.1, 0.1))
    rm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes * 4], -0.1, 0.1))
    rb = tf.Variable(tf.zeros([1, num_nodes * 4]))
    # write characters in reverse order
    # input, forget, update, output
    # input, previous output, and bias.
    wx = tf.Variable(tf.truncated_normal([num_nodes, num_nodes * 4], -0.1, 0.1))
    wm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes * 4], -0.1, 0.1))
    wb = tf.Variable(tf.zeros([1, num_nodes * 4]))
#     # Variables saving state across unrollings.
#     saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
#     saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
    # Embedding variable
    embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    # Classifier weights and biases.
    w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))
    b = tf.Variable(tf.zeros([vocabulary_size]))
  
    # Definition of the cell computation.
    def lstm_cell(i, o, state, variables, train_mode=False):
        """Create a LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
        Note that in this formulation, we omit the various connections between the
        previous state and the gates."""
        gx, gm, gb = variables
        # input, forget, update, output
        gates = tf.matmul(i, gx) + tf.matmul(o, gm) + gb
        if train_mode:
            input_gate = tf.sigmoid(tf.nn.dropout(gates[:, :num_nodes], keep_prob=keep_prob))
            forget_gate = tf.sigmoid(tf.nn.dropout(gates[:, num_nodes:num_nodes*2], keep_prob=keep_prob))
        else:
            input_gate = tf.sigmoid(gates[:, :num_nodes])
            forget_gate = tf.sigmoid(gates[:, num_nodes:num_nodes*2])
        update = gates[:, num_nodes*2:num_nodes*3]
        state = forget_gate * state + input_gate * tf.tanh(update)
        output_gate = tf.sigmoid(gates[:, num_nodes*3:num_nodes*4])
        return output_gate * tf.tanh(state), tf.tanh(state), state

    # Input data.
    train_data = list()
    for _ in range(num_unrollings):
        train_data.append(tf.placeholder(tf.int32, shape=[batch_size]))
    train_inputs = train_data
    train_labels = train_data[-2::-1]
    train_labels.append(train_data[-1])  # labels are reversed inputs
    
    # Unrolled LSTM loop.
    # read all characters
    th_state = tf.zeros([batch_size, num_nodes])
    state = tf.zeros([batch_size, num_nodes])
    for i in train_inputs:
        # Look up embeddings for inputs.
        embed = tf.nn.embedding_lookup(embeddings, i)
        variables = [rx, rm, rb]
        output, th_state, state = lstm_cell(embed, th_state, state, variables, train_mode=True)

    # write characters in reverse order
    w_outputs = list()
    w_input = output
#     th_state = tf.zeros([batch_size, num_nodes])
#     state = tf.zeros([batch_size, num_nodes])
#     # МОЖНО ПОПРОБОВАТЬ НЕ ИНИЦИАЛИЗИРОВАТЬ state ДЛЯ НОВОЙ lstm НУЛЯМИ
    for _ in range(num_unrollings):            # НАДО ПЕРЕПИСАТЬ
        variables = [wx, wm, wb]
        output, th_state, state = lstm_cell(w_input, th_state, state, variables, train_mode=True)
        w_input = output
        w_outputs.append(output)

#     # State saving across unrollings.
#     with tf.control_dependencies([saved_output.assign(output), saved_state.assign(state)]):
        # Classifier.
    logits = tf.nn.xw_plus_b(tf.nn.dropout(tf.concat(w_outputs, 0), keep_prob=keep_prob), w, b)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.concat(train_labels, 0), logits=logits))

    # Optimizer.
    global_step = tf.Variable(0)
    learning_rate = tf.train.exponential_decay(.001, global_step, 600, 0.4, staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    gradients, v = zip(*optimizer.compute_gradients(loss))
    gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
    optimizer = optimizer.apply_gradients(zip(gradients, v), global_step=global_step)

    # Predictions.
    train_prediction = tf.nn.softmax(logits)

    # Sampling and validation eval: batch 1, with unrolling.
    val_inputs = list()
    for _ in range(num_unrollings):
        val_inputs.append(tf.placeholder(tf.int32, shape=[1]))
#     sample_input = tf.placeholder(tf.int32, shape=[1])

#     saved_sample_output = tf.Variable(tf.zeros([1, num_nodes]))
#     saved_sample_state = tf.Variable(tf.zeros([1, num_nodes]))
#     reset_sample_state = tf.group(
#         saved_sample_output.assign(tf.zeros([1, num_nodes])),
#         saved_sample_state.assign(tf.zeros([1, num_nodes])))
    
    th_state = tf.zeros([1, num_nodes])
    state = tf.zeros([1, num_nodes])
    for i in val_inputs:
        # Look up embeddings for inputs.
        embed = tf.nn.embedding_lookup(embeddings, i)
        variables = [rx, rm, rb]
        output, th_state, state = lstm_cell(embed, th_state, state, variables, train_mode=False)

    # write characters in reverse order
    w_outputs = list()
    w_input = output
#     th_state = tf.zeros([1, num_nodes])
#     state = tf.zeros([1, num_nodes])
#     # МОЖНО ПОПРОБОВАТЬ НЕ ИНИЦИАЛИЗИРОВАТЬ state ДЛЯ НОВОЙ lstm НУЛЯМИ
    for _ in range(num_unrollings):            # НАДО ПЕРЕПИСАТЬ
        variables = [wx, wm, wb]
        output, th_state, state = lstm_cell(w_input, th_state, state, variables, train_mode=False)
        w_input = output
        w_outputs.append(output)





#     # Look up embeddings for inputs.
#     embed = tf.nn.embedding_lookup(embeddings, sample_input)
#     sample_output, sample_state = lstm_cell(embed, saved_sample_output, saved_sample_state, train_mode=False)
#     with tf.control_dependencies([saved_sample_output.assign(sample_output),
#                                 saved_sample_state.assign(sample_state)]):
    sample_predictions = tf.nn.softmax(tf.nn.xw_plus_b(tf.concat(w_outputs, 0), w, b))

In [None]:
%%time
num_steps = 2001
summary_frequency = 25

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    mean_loss = 0
    for step in range(num_steps):
        batches = train_batches.next()
        feed_dict = dict()
        for i in range(num_unrollings):
            feed_dict[train_data[i]] = batches[i]
        _, l, predictions, lr = session.run(
            [optimizer, loss, train_prediction, learning_rate], feed_dict=feed_dict)
        mean_loss += l
        if step % summary_frequency == 0:
            if step > 0:
                mean_loss = mean_loss / summary_frequency
            # The mean loss is an estimate of the loss over the last few batches.
            print('Average loss at step %d: %f learning rate: %f' % (step, mean_loss, lr))
            mean_loss = 0
            labels = np.concatenate(list(batches)[-2::-1])
            labels = np.concatenate((labels, list(batches)[-1]))
            print('Minibatch perplexity: %.2f' % float(np.exp(logprob(predictions, labels))))
            if step % (summary_frequency * 5) == 0:
                # Generate some samples.
                print('=' * 80)
                for _ in range(5):
                    batches = valid_batches.next()
                    feed_dict = dict()
                    for i in range(num_unrollings):
                        feed_dict[val_inputs[i]] = batches[i]
                    sentence = batches2string(batches)
                    predictions = sample_predictions.eval(feed_dict=feed_dict)
                    reverse_sentence = ''
                    for pred in predictions:
                        reverse_sentence += id2char(sample_distribution(pred))
                    print("Sentence: {}".format(sentence[0]))
                    print("True reverse: {}{}".format(sentence[0][-2::-1], sentence[0][-1]))
                    print("Pred reverse: {}\n".format(reverse_sentence))
                print('=' * 80)
        # Measure validation set perplexity.
        valid_logprob = 0
        for _ in range(valid_size):
            batches = valid_batches.next()
            feed_dict = dict()
            for i in range(num_unrollings):
                feed_dict[val_inputs[i]] = batches[i]
            predictions = sample_predictions.eval(feed_dict=feed_dict)
            labels = np.concatenate(list(batches)[-2::-1])
            labels = np.concatenate((labels, list(batches)[-1]))
            valid_logprob = valid_logprob + logprob(predictions, labels)
        print('Validation set perplexity: %.2f' % float(np.exp(valid_logprob / valid_size)))

Initialized
Average loss at step 0: 3.471902 learning rate: 0.001000
(3200,)
Minibatch perplexity: 32.20
Sentence:  the diggers of the english revolution and the sa$
True reverse: as eht dna noitulover hsilgne eht fo sreggid eht $
Pred reverse: yifoyunsutljxkmjcquqrqckubwcaq$yicmcjudvsqzijwzpll

Sentence: ns culottes of the french revolution whilst the t$
True reverse: t eht tslihw noitulover hcnerf eht fo settoluc sn$
Pred reverse: gvrefusn nwbehz oiobzrppllavblbctbs zdouze$dovnlxm

Sentence: erm is still used in a pejorative way to describe$
True reverse: ebircsed ot yaw evitarojep a ni desu llits si mre$
Pred reverse: xxiiruhxm zjht$thiymv$nflpkyrjyrkp  bpvuccptsggvp 

Sentence:  any act that used violent means to destroy the o$
True reverse: o eht yortsed ot snaem tneloiv desu taht tca yna $
Pred reverse: yqipxsevanaxhdvcybwmyr qxozm hmiy epggbdqpuxwemwqy

Sentence: rganization of society it has also been taken up $
True reverse:  pu nekat neeb osla sah ti yteicos fo noitazinagr$
P

Validation set perplexity: 23.34
Validation set perplexity: 23.33
Validation set perplexity: 23.33
Validation set perplexity: 23.33
Validation set perplexity: 23.32
Validation set perplexity: 23.32
Validation set perplexity: 23.32
Validation set perplexity: 23.32
Validation set perplexity: 23.32
Validation set perplexity: 23.31
Validation set perplexity: 23.30
Validation set perplexity: 23.29
Validation set perplexity: 23.28
Validation set perplexity: 23.26
Validation set perplexity: 23.24
Average loss at step 175: 3.120690 learning rate: 0.001000
(3200,)
Minibatch perplexity: 22.58
Validation set perplexity: 23.22
Validation set perplexity: 23.21
Validation set perplexity: 23.19
Validation set perplexity: 23.19
Validation set perplexity: 23.18
Validation set perplexity: 23.17
Validation set perplexity: 23.16
Validation set perplexity: 23.16
Validation set perplexity: 23.15
Validation set perplexity: 23.15
Validation set perplexity: 23.15
Validation set perplexity: 23.15
Validation set

Validation set perplexity: 22.34
Validation set perplexity: 22.37
Validation set perplexity: 22.39
Validation set perplexity: 22.39
Validation set perplexity: 22.41
Validation set perplexity: 22.40
Validation set perplexity: 22.41
Validation set perplexity: 22.39
Validation set perplexity: 22.41
Validation set perplexity: 22.41
Validation set perplexity: 22.44
Validation set perplexity: 22.44
Validation set perplexity: 22.45
Validation set perplexity: 22.44
Validation set perplexity: 22.41
Validation set perplexity: 22.38
Validation set perplexity: 22.36
Validation set perplexity: 22.33
Validation set perplexity: 22.31
Validation set perplexity: 22.29
Validation set perplexity: 22.30
Validation set perplexity: 22.27
Validation set perplexity: 22.26
Average loss at step 375: 3.029112 learning rate: 0.001000
(3200,)
Minibatch perplexity: 21.08
Sentence: hilism or anomie but rather a harmonious anti aut$
True reverse: tua itna suoinomrah a rehtar tub eimona ro msilih$
Pred reverse: esxomo

Validation set perplexity: 19.41
Validation set perplexity: 19.41
Validation set perplexity: 19.41
Validation set perplexity: 19.42
Validation set perplexity: 19.41
Validation set perplexity: 19.40
Validation set perplexity: 19.40
Validation set perplexity: 19.41
Validation set perplexity: 19.39
Validation set perplexity: 19.38
Average loss at step 525: 2.931221 learning rate: 0.001000
(3200,)
Minibatch perplexity: 18.62
Validation set perplexity: 19.38
Validation set perplexity: 19.39
Validation set perplexity: 19.39
Validation set perplexity: 19.38
Validation set perplexity: 19.38
Validation set perplexity: 19.36
Validation set perplexity: 19.37
Validation set perplexity: 19.38
Validation set perplexity: 19.38
Validation set perplexity: 19.35
Validation set perplexity: 19.37
Validation set perplexity: 19.36
Validation set perplexity: 19.37
Validation set perplexity: 19.37
Validation set perplexity: 19.36
Validation set perplexity: 19.37
Validation set perplexity: 19.35
Validation set

Validation set perplexity: 19.27
Validation set perplexity: 19.26
Validation set perplexity: 19.24
Validation set perplexity: 19.24
Validation set perplexity: 19.24
Validation set perplexity: 19.25
Validation set perplexity: 19.25
Validation set perplexity: 19.25
Validation set perplexity: 19.24
Validation set perplexity: 19.26
Validation set perplexity: 19.25
Validation set perplexity: 19.26
Validation set perplexity: 19.26
Validation set perplexity: 19.27
Validation set perplexity: 19.24
Validation set perplexity: 19.23
Validation set perplexity: 19.26
Validation set perplexity: 19.26
Average loss at step 725: 2.908281 learning rate: 0.000400
(3200,)
Minibatch perplexity: 18.51
Validation set perplexity: 19.27
Validation set perplexity: 19.26
Validation set perplexity: 19.28
Validation set perplexity: 19.28
Validation set perplexity: 19.30
Validation set perplexity: 19.26
Validation set perplexity: 19.27
Validation set perplexity: 19.27
Validation set perplexity: 19.28
Validation set

Validation set perplexity: 19.27
Validation set perplexity: 19.28
Validation set perplexity: 19.27
Validation set perplexity: 19.30
Validation set perplexity: 19.28
Validation set perplexity: 19.29
Validation set perplexity: 19.29
Validation set perplexity: 19.28
Validation set perplexity: 19.31
Validation set perplexity: 19.27
Validation set perplexity: 19.29
Validation set perplexity: 19.26
Validation set perplexity: 19.30
Validation set perplexity: 19.28
Validation set perplexity: 19.30
Validation set perplexity: 19.30
Validation set perplexity: 19.30
Validation set perplexity: 19.29
Validation set perplexity: 19.32
Validation set perplexity: 19.30
Validation set perplexity: 19.30
Validation set perplexity: 19.31
Validation set perplexity: 19.30
Validation set perplexity: 19.30
Validation set perplexity: 19.31
Average loss at step 900: 2.906221 learning rate: 0.000400
(3200,)
Minibatch perplexity: 18.03
Validation set perplexity: 19.30
Validation set perplexity: 19.33
Validation set

Validation set perplexity: 19.41
Validation set perplexity: 19.38
Validation set perplexity: 19.37
Validation set perplexity: 19.39
Validation set perplexity: 19.40
Average loss at step 1075: 2.901595 learning rate: 0.000400
(3200,)
Minibatch perplexity: 18.55
Validation set perplexity: 19.40
Validation set perplexity: 19.40
Validation set perplexity: 19.39
Validation set perplexity: 19.40
Validation set perplexity: 19.41
Validation set perplexity: 19.43
Validation set perplexity: 19.42
Validation set perplexity: 19.46
Validation set perplexity: 19.43
Validation set perplexity: 19.41
Validation set perplexity: 19.42
Validation set perplexity: 19.38
Validation set perplexity: 19.38
Validation set perplexity: 19.42
Validation set perplexity: 19.40
Validation set perplexity: 19.42
Validation set perplexity: 19.40
Validation set perplexity: 19.38
Validation set perplexity: 19.38
Validation set perplexity: 19.37
Validation set perplexity: 19.37
Validation set perplexity: 19.37
Validation se

Validation set perplexity: 19.68
Validation set perplexity: 19.65
Validation set perplexity: 19.67
Validation set perplexity: 19.67
Validation set perplexity: 19.65
Validation set perplexity: 19.65
Validation set perplexity: 19.63
Validation set perplexity: 19.68
Validation set perplexity: 19.69
Validation set perplexity: 19.69
Validation set perplexity: 19.71
Validation set perplexity: 19.66
Validation set perplexity: 19.68
Validation set perplexity: 19.71
Validation set perplexity: 19.69
Validation set perplexity: 19.65
Validation set perplexity: 19.66
Validation set perplexity: 19.66
Validation set perplexity: 19.68
Validation set perplexity: 19.70
Validation set perplexity: 19.67
Validation set perplexity: 19.69
Validation set perplexity: 19.73
Validation set perplexity: 19.73
Validation set perplexity: 19.76
Average loss at step 1275: 2.895736 learning rate: 0.000160
(3200,)
Minibatch perplexity: 18.19
Validation set perplexity: 19.73
Validation set perplexity: 19.72
Validation se

Validation set perplexity: 20.18
Validation set perplexity: 20.14
Validation set perplexity: 20.18
Validation set perplexity: 20.21
Validation set perplexity: 20.16
Validation set perplexity: 20.09
Average loss at step 1450: 2.891424 learning rate: 0.000160
(3200,)
Minibatch perplexity: 17.92
Validation set perplexity: 20.08
Validation set perplexity: 20.11
Validation set perplexity: 20.09
Validation set perplexity: 20.17
Validation set perplexity: 20.09
Validation set perplexity: 20.08
Validation set perplexity: 20.12
Validation set perplexity: 20.11
Validation set perplexity: 20.11
Validation set perplexity: 20.14
Validation set perplexity: 20.15
Validation set perplexity: 20.09
Validation set perplexity: 20.15
Validation set perplexity: 20.13
Validation set perplexity: 20.17
Validation set perplexity: 20.12
Validation set perplexity: 20.15
Validation set perplexity: 20.11
Validation set perplexity: 20.12
Validation set perplexity: 20.03
Validation set perplexity: 20.09
Validation se