# MI-MVI tutorial 3#

Today we try to use recurrent neural networks (RNN) to predict words in english text.

 - Based on TF RNN tutorial: https://www.tensorflow.org/tutorials/recurrent
 - Pretty images an some thory come from: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:

![RNN unroled](images/RNN-unrolled.png "Structure of unrolled RNN.")

In the above diagram, a chunk of neural network, *A*, looks at some input *x_t* and outputs a value *h_t*. A loop allows information to be passed from one step of the network to the next. The *A* labelled boxes are elementary modules which vary with particular type of RNN. In standard (vanilla) RNN they look like:

![RNN cenll](images/SimpleRNNcell.png)
in comparison to LSTM cell:
![LSTM cell](images/LSTMcell.png)

We will focus on standard RNN cell. The yellow box with *tanh* is neural network layer. It combinates input *x_t* and previous state of the preceeding cell. We are going to build such cell from scratch, but you can implement the **tf.contrib.rnn.RNNCell** abstract class.

**Import** all packages that will be used.

In [1]:
import os, sys, tarfile
import collections
from six.moves.urllib.request import urlretrieve
import numpy as np
import tensorflow as tf

Download a [Penn Tree Bank (PTB)](https://catalog.ldc.upenn.edu/ldc99t42) dataset. We use an identical approach as in the last tutorial.

In [2]:
url = 'http://www.fit.vutbr.cz/~imikolov/rnnlm/'
data_root = 'data/rnn'
last_percent_reported = None

# make sure the dataset directory exists
if not os.path.isdir(data_root):
  os.makedirs(data_root)

def download_progress_hook(count, blockSize, totalSize):
  """A hook to report the progress of a download. This is mostly intended for users with
  slow internet connections. Reports every 5% change in download progress.
  """
  global last_percent_reported
  percent = int(count * blockSize * 100 / totalSize)

  if last_percent_reported != percent:
    if percent % 5 == 0:
      sys.stdout.write("%s%%" % percent)
      sys.stdout.flush()
    else:
      sys.stdout.write(".")
      sys.stdout.flush()
      
    last_percent_reported = percent
    
def maybe_download(filename, expected_bytes, force=False):
  """Download a file if not present, and make sure it's the right size."""
  dest_filename = os.path.join(data_root, filename)
  if force or not os.path.exists(dest_filename):
    print('Attempting to download:', filename) 
    filename, _ = urlretrieve(url + filename, dest_filename, reporthook=download_progress_hook)
    print('\nDownload Complete!')
  statinfo = os.stat(dest_filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', dest_filename)
  else:
    raise Exception(
      'Failed to verify ' + dest_filename + '. Can you get to it with a browser?')
  return dest_filename

train_filename = maybe_download('simple-examples.tgz', 34869662)

Found and verified data/rnn/simple-examples.tgz


We need to unpack the downloaded data. The desired content is in **data** subdirectory.

In [3]:
def maybe_extract(filename, force=False):
  root = os.path.splitext(os.path.splitext(filename)[0])[0]  # remove .tar.gz
  if os.path.isdir(root) and not force:
    # You may override by setting force=True.
    print('%s already present - Skipping extraction of %s.' % (root, filename))
  else:
    print('Extracting data for %s. This may take a while. Please wait.' % root)
    tar = tarfile.open(filename)
    sys.stdout.flush()
    tar.extractall(data_root)
    tar.close()
  data_folders = [
    os.path.join(root, d) for d in sorted(os.listdir(root))
    if os.path.isdir(os.path.join(root, d))]
  
  print(data_folders)
  return data_folders
  
train_folders = maybe_extract(train_filename)

data/rnn/simple-examples already present - Skipping extraction of data/rnn/simple-examples.tgz.
['data/rnn/simple-examples/1-train', 'data/rnn/simple-examples/2-nbest-rescore', 'data/rnn/simple-examples/3-combination', 'data/rnn/simple-examples/4-data-generation', 'data/rnn/simple-examples/5-one-iter', 'data/rnn/simple-examples/6-recovery-during-training', 'data/rnn/simple-examples/7-dynamic-evaluation', 'data/rnn/simple-examples/8-direct', 'data/rnn/simple-examples/9-char-based-lm', 'data/rnn/simple-examples/data', 'data/rnn/simple-examples/models', 'data/rnn/simple-examples/rnnlm-0.2b', 'data/rnn/simple-examples/temp']


Now we define a few helpers to manipulate the data. Because a TF needs tensors of numbers we need to represent words by numbers. For this requirement we build a vocabulary from a given file. The vocabulary will contain a key-value pairs of following meaning 'word':ID.

A **ptb_raw_data** method loads all necessary files, creates vocabularies and transform content of train, validation and test datafiles to number sequences.

In [4]:
def _read_words(filename):
  with tf.gfile.GFile(filename, "r") as f:
    return f.read().replace("\n", "<eos>").split()

def _build_vocab(filename, wordsLimit=None):
  data = _read_words(filename)
  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
  if (wordsLimit!=None):
        count_pairs = count_pairs[0:wordsLimit]
  words, _ = list(zip(*count_pairs))
  word_to_id = dict(zip(words, range(len(words))))
  return word_to_id

def _file_to_word_ids(filename, word_to_id):
  data = _read_words(filename)
  return [word_to_id[word] for word in data if word in word_to_id]

def ptb_raw_data(data_path=None, wordsLimit=None):
  """Load PTB raw data from data directory "data_path".
  Reads PTB text files, converts strings to integer ids, and performs mini-batching of the inputs.
  Args:
    data_path: string path to the directory where simple-examples.tgz has been extracted.
  Returns:
    tuple (train_data, valid_data, test_data, vocabulary)
    where each of the data objects can be passed to PTBIterator.
  """

  train_path = os.path.join(data_path, "ptb.train.txt")
  valid_path = os.path.join(data_path, "ptb.valid.txt")
  test_path = os.path.join(data_path, "ptb.test.txt")

  word_to_id = _build_vocab(train_path, wordsLimit)
  train_data = _file_to_word_ids(train_path, word_to_id)
  valid_data = _file_to_word_ids(valid_path, word_to_id)
  test_data = _file_to_word_ids(test_path, word_to_id)
  vocabulary = len(word_to_id)
  return train_data, valid_data, test_data, vocabulary

Let us inspect the data:

In [5]:
# !!! TODO: pokud se ukaze, ze 10000 slov ve slovniku je moc na one hot encoding, profiltrujem slovnik.
wordsLimit=10000

# !!! TODO: posefit one hot encoding

train_data, valid_data, test_data, vocabulary = ptb_raw_data(os.path.join(data_root, 'simple-examples/data'), wordsLimit)
vocab = _build_vocab(os.path.join(data_root, 'simple-examples','data','ptb.test.txt'), wordsLimit)
firstitems = {k: vocab[k] for k in sorted(vocab.keys())[:30]}

print('train data len:', len(train_data))
print('validation data len:', len(valid_data))
print('test data len:', len(test_data))
print('vocabulary item count:', vocabulary)
print('the first 30 vocabulary items:', firstitems)

train data len: 929589
validation data len: 73760
test data len: 82430
vocabulary item count: 10000
the first 30 vocabulary items: {'<unk>': 0, '30-year': 2495, "'ll": 963, '1930s': 4168, '1960s': 2058, "'d": 1182, '12-year': 3108, '52-week': 2059, '190.58-point': 2057, '20th': 4169, '1970s': 1748, '190-point': 1747, "'m": 964, '13th': 1059, '10-year': 2056, '#': 1181, '&': 72, '$': 14, '<eos>': 2, 'a': 6, '12-month': 4166, '1990s': 2494, "'ve": 670, '500-stock': 2496, "'": 131, '1980s': 1327, "'re": 234, '1920s': 4167, "'s": 9, 'N': 3}


In [6]:
def ptb_producer(raw_data, batch_size, num_steps, name=None):
  """Iterate on the raw PTB data.
  This chunks up raw_data into batches of examples and returns Tensors that are drawn from these batches.
  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.
    name: the name of this operation (optional).
  Returns:
    A pair of Tensors, each shaped [batch_size, num_steps]. The second element
    of the tuple is the same data time-shifted to the right by one.
  Raises:
    tf.errors.InvalidArgumentError: if batch_size or num_steps are too high.
  """
  with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
    raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

    data_len = tf.size(raw_data)
    batch_len = data_len // batch_size
    data = tf.reshape(raw_data[0 : batch_size * batch_len], [batch_size, batch_len])

    epoch_size = (batch_len - 1) // num_steps
    assertion = tf.assert_positive(epoch_size, message="epoch_size == 0, decrease batch_size or num_steps")
    with tf.control_dependencies([assertion]):
      epoch_size = tf.identity(epoch_size, name="epoch_size")

    i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
    x = tf.strided_slice(data, [0, i * num_steps], [batch_size, (i + 1) * num_steps])
    x.set_shape([batch_size, num_steps])
    y = tf.strided_slice(data, [0, i * num_steps + 1],[batch_size, (i + 1) * num_steps + 1])
    y.set_shape([batch_size, num_steps])
    return x, y

It is a time to put all together... 

## Vanilla RNN and LSTM ##

Preparing the building blocks for both types of RNN.

In [7]:
state_size = 200
init_scale = 0.1

In [8]:
def RNN_logits(states, output_size):
        
    # RNN parameters
    V = tf.get_variable('V', shape=[state_size, output_size], 
                        initializer=tf.random_uniform_initializer(minval=-init_scale, maxval=init_scale))
    bo = tf.get_variable('bl', shape=[output_size], initializer=tf.constant_initializer(0.))
    
    # calculate logits
    return tf.matmul(states, V) + bo

![RNN cenll](images/SimpleRNNcell.png)

In [9]:
def RNN_step(previous_hidden_state, input_tensor):
    
    # RNN parameters
    W = tf.get_variable("W", shape=[state_size, state_size], 
                        initializer=tf.random_uniform_initializer(minval=-init_scale, maxval=init_scale))
    U = tf.get_variable("U", shape=[state_size, state_size], 
                        initializer=tf.random_uniform_initializer(minval=-init_scale, maxval=init_scale))
    b = tf.get_variable("b", shape=[state_size], initializer=tf.constant_initializer(0.))
    
    # calculate new hidden state
    hidden_state = tf.tanh(tf.matmul(previous_hidden_state, W) + tf.matmul(input_tensor,U) + b)
    
    return hidden_state

![LSTM cell](images/LSTMcell.png)

In [10]:
def LSTM_step(previous_hidden_state, input_tensor):
    
    # weights for input
    W = tf.get_variable('W', shape=[4, state_size, state_size], 
                        initializer=tf.random_uniform_initializer(minval=-init_scale, maxval=init_scale))
    # weights for previous hidden state
    U = tf.get_variable('U', shape=[4, state_size, state_size], 
                        initializer=tf.random_uniform_initializer(minval=-init_scale, maxval=init_scale))
    
    bi = tf.get_variable("bi", shape=[state_size], initializer=tf.constant_initializer(0.))
    bf = tf.get_variable("bf", shape=[state_size], initializer=tf.constant_initializer(0.))
    bo = tf.get_variable("bo", shape=[state_size], initializer=tf.constant_initializer(0.))
    bc = tf.get_variable("bc", shape=[state_size], initializer=tf.constant_initializer(0.))
    
    # gather previous internal state and output state
    state, cell = tf.unstack(previous_hidden_state)
    
    # gates
    input_gate = tf.sigmoid(tf.matmul(input_tensor, U[0]) + tf.matmul(state, W[0]) + bi)
    forget_gate = tf.sigmoid(tf.matmul(input_tensor, U[1]) + tf.matmul(state, W[1]) + bf)
    output_gate = tf.sigmoid(tf.matmul(input_tensor, U[2]) + tf.matmul(state, W[2]) + bo)
    gate_weights = tf.tanh(tf.matmul(input_tensor, U[3]) + tf.matmul(state, W[3]) + bc)
    
    # new internal cell state
    cell = cell * forget_gate + gate_weights * input_gate
    
    # output state
    state = tf.tanh(cell) * output_gate
    return tf.stack([state, cell])

Lets try to understand **tf.transpose** by playing with it for a while:

In [11]:
c = tf.constant([[[ 1,  2,  3],
                  [ 4,  5,  6]],
                 [[ 7,  8,  9],
                  [10, 11, 12]]])

ctr = tf.transpose(c, perm=[1, 0, 2])

with tf.Session() as session:
  session.run(ctr)
  print(ctr.eval())

[[[ 1  2  3]
  [ 7  8  9]]

 [[ 4  5  6]
  [10 11 12]]]


## Vanilla RNN #

In [13]:
num_classes = vocabulary
num_steps = 20
batch_size = 20
max_gradient_norm = 5
learning_rate = 1.0

# basic RNN  cell
rnn_type = "vanilla"
tf.reset_default_graph()

# take a subset of data
input_tensor, labels_tensor = ptb_producer(train_data, batch_size=batch_size, num_steps=num_steps)

# TODO: kde se naplni to embeddings?
embeddings = tf.get_variable("embeddings", [num_classes, state_size])
# kdyz se tady z nej maji vybrat hodnoty (sloupce nebo radky?) podle idcek v input_tensor
rnn_inputs = tf.nn.embedding_lookup(embeddings, input_tensor)

# an initial hidden state zero-filled tensor
init_hidden_state = tf.cast(np.zeros([batch_size, state_size]), tf.float32)

# TODO: proc se to transponuje tam a zase zpatky?
states = tf.scan(RNN_step, tf.transpose(rnn_inputs, [1,0,2]), initializer=init_hidden_state) 
states = tf.transpose(states, [1,0,2])

# TODO: proc se otaci poradi? a hned dvakrat?
states_reshaped = tf.reshape(states, [-1, state_size])
logits = RNN_logits(states_reshaped, num_classes)
logits = tf.reshape(logits, [batch_size, num_steps, -1])

predictions = tf.nn.softmax(logits)

# calculate a difference between predited and correct labels
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels_tensor)
loss = tf.reduce_mean(losses)

trainable_vars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, trainable_vars), max_gradient_norm)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(grads, trainable_vars), 
                                     global_step=tf.contrib.framework.get_or_create_global_step())

## LSTM ##

In [14]:
num_classes = vocabulary
num_steps = 20
batch_size = 20
max_gradient_norm = 5
learning_rate = 1.0

#LSTM cell
rnn_type = "LSTM"
tf.reset_default_graph()

# take a subset of data
input_tensor, labels_tensor = ptb_producer(train_data, batch_size=batch_size, num_steps=num_steps)

# TODO: kde se naplni to embeddings?
embeddings = tf.get_variable("embeddings", [num_classes, state_size])
# kdyz se tady z nej maji vybrat hodnoty (sloupce nebo radky?) podle idcek v input_tensor
rnn_inputs = tf.nn.embedding_lookup(embeddings, input_tensor)

# an initial hidden state zero-filled tensor
init_hidden_state = tf.cast(np.zeros([2, batch_size, state_size]), dtype=tf.float32)

# TODO: proc se to transponuje tam a zase zpatky?
states = tf.scan(LSTM_step, tf.transpose(rnn_inputs, [1,0,2]), initializer=init_hidden_state) 
states = tf.transpose(states, [1,2,0,3])[0]

# TODO: proc se otaci poradi? a hned dvakrat?
states_reshaped = tf.reshape(states, [-1, state_size])
logits = RNN_logits(states_reshaped, num_classes)
logits = tf.reshape(logits, [batch_size, num_steps, -1])

predictions = tf.nn.softmax(logits)

# calculate a difference between predited and correct labels
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels_tensor)
loss = tf.reduce_mean(losses)

trainable_vars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, trainable_vars), max_gradient_norm)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(grads, trainable_vars), 
                                     global_step=tf.contrib.framework.get_or_create_global_step())

In [15]:
num_training_steps = 100

num_classes = vocabulary
num_steps = 20
batch_size = 20
num_layers = 2

# TODO: v poslednim kroku ten ptb_producer neco dela spatne:
# ERROR:tensorflow:Exception in QueueRunner: Enqueue operation was cancelled
# hrabe s konstantni delkou kroku uz mimo rozsah dat? 
# tedy posledni vyber by mel byt neco v duchu:
# x = tf.strided_slice(data, [0, i * num_steps], [batch_size, max(size(data)[spravna_dim], (i + 1) * num_steps]))

# navic mam podle vypisu input a labels pocit, ze ptb_prodecer nevraci labely jako data 
# shiftnuta o 1 vpravo, jak se tvrdi v popisku; kdyz jsem si je nechal vypsat, nevypadaji 
# vubec, ze by se prekryvala :-(

with tf.Session() as session:
  print("RNN type: ", rnn_type)
  session.run(tf.global_variables_initializer())
    
  #
  input_coord = tf.train.Coordinator() 
  input_threads = tf.train.start_queue_runners(session, coord=input_coord)
  #
    
  for step in range(num_training_steps):
    
    loss_val = session.run([loss, train_op], feed_dict={})
    
    
    input_vals, labels_vals = session.run([input_tensor, labels_tensor])
    
    #print()
    #print(input_vals)
    #print(labels_vals)
    #print()
    
    
    #print("input:", input_tensor.eval())
    #print("labels:", labels_tensor.eval())
    #print("embeddings:", embeddings.eval())
    #print("rnn_inputs:", rnn_inputs.eval())
    print("step:", step)
    print("loss:", loss_val)
    print()
    
  #  
  input_coord.request_stop()
  input_coord.join(input_threads)  
  #

RNN type:  LSTM
step: 0
loss: [9.2103758, None]

step: 1
loss: [9.1972437, None]

step: 2
loss: [9.1801109, None]

step: 3
loss: [9.1664448, None]

step: 4
loss: [9.1478958, None]

step: 5
loss: [9.1334858, None]

step: 6
loss: [9.120285, None]

step: 7
loss: [9.1069813, None]

step: 8
loss: [9.0864716, None]

step: 9
loss: [9.0761538, None]

step: 10
loss: [9.0587387, None]

step: 11
loss: [9.0128689, None]

step: 12
loss: [9.0143213, None]

step: 13
loss: [9.0182877, None]

step: 14
loss: [8.9917889, None]

step: 15
loss: [8.9834356, None]

step: 16
loss: [8.9227142, None]

step: 17
loss: [8.9316635, None]

step: 18
loss: [8.879118, None]

step: 19
loss: [8.9076252, None]

step: 20
loss: [8.8004942, None]

step: 21
loss: [8.7474966, None]

step: 22
loss: [8.6039553, None]

step: 23
loss: [8.5174246, None]

step: 24
loss: [8.3974323, None]

step: 25
loss: [8.3970432, None]

step: 26
loss: [7.9700203, None]

step: 27
loss: [7.8034649, None]

step: 28
loss: [7.8849301, None]

step: 29
l

### TODO:
something like: what you trained above is basically the whole model, but we'll need to make changes that are beyond the scope of this tutorial to actually make it work

## Multi-layer LSTM ##

In [16]:
# configuration
num_classes = vocabulary
max_gradient_norm = 5
hidden_size = 200
num_steps = 20
batch_size = 20
num_layers = 2

learning_rate = 1.0
learning_rate_decay = 0.5
epoch_end_decay = 4

num_epochs = 13

# LSTM cell
rnn_type = "LSTM"
tf.reset_default_graph()

# take a subset of data
input_tensor, labels_tensor = ptb_producer(train_data, batch_size=batch_size, num_steps=num_steps)

# TODO: kde se naplni to embeddings?
embeddings = tf.get_variable("embeddings", [num_classes, state_size])
# kdyz se tady z nej maji vybrat hodnoty (sloupce nebo radky?) podle idcek v input_tensor
rnn_inputs = tf.nn.embedding_lookup(embeddings, input_tensor)

def build_layer(rnn_inputs, layer_idx):
    
    with tf.variable_scope("layer{}".format(layer_idx)):
    
        # truncated backprop
        hidden_state = tf.placeholder(tf.float32, shape=[2, batch_size, state_size])
        
        # TODO: proc se to transponuje tam a zase zpatky?
        states = tf.scan(LSTM_step, tf.transpose(rnn_inputs, [1,0,2]), initializer=hidden_state) 
        states = tf.transpose(states, [1,2,0,3])
        
        return states, hidden_state
    
sequence = rnn_inputs

final_states = []
hidden_states = []

for layer_idx in range(num_layers):
    states, hidden_state = build_layer(sequence, layer_idx)
    final_states.append(states[:, :, -1, :])
    hidden_states.append(hidden_state)
    
    sequence = states[0]
    
states_reshaped = tf.reshape(sequence, [-1, state_size])
logits = RNN_logits(states_reshaped, num_classes)
logits = tf.reshape(logits, [batch_size, num_steps, -1])

predictions = tf.nn.softmax(logits)

# calculate a difference between predited and correct labels
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels_tensor)
loss = tf.reduce_sum(losses) / batch_size

learning_rate_tensor = tf.Variable(learning_rate, name="learning_rate")
learning_rate_pl = tf.placeholder(tf.float32, name="learning_rate_pl")
assign_learning_rate = tf.assign(learning_rate_tensor, learning_rate_pl)

trainable_vars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, trainable_vars), max_gradient_norm)
optimizer = tf.train.GradientDescentOptimizer(learning_rate_tensor)
train_op = optimizer.apply_gradients(zip(grads, trainable_vars), 
                                     global_step=tf.contrib.framework.get_or_create_global_step())

In [None]:
epoch_size = ((len(train_data) // batch_size) - 1) // num_steps

with tf.Session() as session:
    
  print("RNN type: ", rnn_type)
  print()
    
  saver = tf.train.Saver()

  session.run(tf.global_variables_initializer())
    
  #
  input_coord = tf.train.Coordinator() 
  input_threads = tf.train.start_queue_runners(session, coord=input_coord)
  #
    
  for epoch in range(num_epochs):
      
      learning_rate_decay = learning_rate_decay ** max(epoch + 1 - epoch_end_decay, 0.0)
      session.run(assign_learning_rate, feed_dict={
          learning_rate_pl: learning_rate * learning_rate_decay
      })
        
      total_loss = 0
      total_time_steps = 0
   
      epoch_hidden_states = []
      for state_pl in hidden_states:
            epoch_hidden_states.append(np.zeros((2, batch_size, state_size)))

      for step in range(epoch_size):

        # build feed dict
        feed_dict = {}
        
        for state_pl, state_val in zip(hidden_states, epoch_hidden_states):
            feed_dict[state_pl] = state_val
            
        loss_val, _, epoch_hidden_states = session.run([loss, train_op, final_states], feed_dict=feed_dict)

        total_loss += loss_val
        total_time_steps += num_steps
            
      epoch_perplexity = np.exp(total_loss / total_time_steps)
    
      print("epoch {} - perplexity: {:.3f}".format(epoch + 1, epoch_perplexity))
    
  saver.save(session, "language-rnn", global_step=0)
    
  #  
  input_coord.request_stop()
  input_coord.join(input_threads)  
  #

with Xavier init:
epoch 1 - perplexity: 323.895
epoch 2 - perplexity: 159.690
epoch 3 - perplexity: 124.899
epoch 4 - perplexity: 106.407
epoch 5 - perplexity: 94.154
epoch 6 - perplexity: 85.495
epoch 7 - perplexity: 79.216
epoch 8 - perplexity: 74.287
epoch 9 - perplexity: 70.246
epoch 10 - perplexity: 67.009
epoch 11 - perplexity: 64.202
epoch 12 - perplexity: 61.849
epoch 13 - perplexity: 59.895

with uniform init:
epoch 1 - perplexity: 297.000
epoch 2 - perplexity: 151.629
epoch 3 - perplexity: 119.196
epoch 4 - perplexity: 101.152
epoch 5 - perplexity: 89.644
epoch 6 - perplexity: 81.587
epoch 7 - perplexity: 75.741
epoch 8 - perplexity: 71.162
epoch 9 - perplexity: 67.542
epoch 10 - perplexity: 64.574
epoch 11 - perplexity: 61.933
epoch 12 - perplexity: 59.742
epoch 13 - perplexity: 57.869

with truncated backprop 1:
epoch 1 - perplexity: 300.770
epoch 2 - perplexity: 153.738
epoch 3 - perplexity: 121.265
epoch 4 - perplexity: 103.269
epoch 5 - perplexity: 91.628
epoch 6 - perplexity: 83.777
epoch 7 - perplexity: 77.884
epoch 8 - perplexity: 73.411
epoch 9 - perplexity: 69.867
epoch 10 - perplexity: 66.884
epoch 11 - perplexity: 64.256
epoch 12 - perplexity: 61.849
epoch 13 - perplexity: 60.092

with truncated backprop 2:
epoch 1 - perplexity: 276.147
epoch 2 - perplexity: 134.308
epoch 3 - perplexity: 102.976
epoch 4 - perplexity: 86.773
epoch 5 - perplexity: 76.711
epoch 6 - perplexity: 69.913
epoch 7 - perplexity: 65.111
epoch 8 - perplexity: 61.424
epoch 9 - perplexity: 58.371
epoch 10 - perplexity: 55.948
epoch 11 - perplexity: 53.728
epoch 12 - perplexity: 51.879
epoch 13 - perplexity: 50.305

## Validation ##

In [None]:
batch_size = 20

rnn_type = "LSTM"
tf.reset_default_graph()

# take a subset of data
input_tensor, labels_tensor = ptb_producer(train_data, batch_size=batch_size, num_steps=num_steps)

# TODO: kde se naplni to embeddings?
embeddings = tf.get_variable("embeddings", [num_classes, state_size])
# kdyz se tady z nej maji vybrat hodnoty (sloupce nebo radky?) podle idcek v input_tensor
rnn_inputs = tf.nn.embedding_lookup(embeddings, input_tensor)

def build_layer(rnn_inputs, layer_idx):
    
    with tf.variable_scope("layer{}".format(layer_idx)):
    
        # truncated backprop
        hidden_state = tf.placeholder(tf.float32, shape=[2, batch_size, state_size])
        
        # TODO: proc se to transponuje tam a zase zpatky?
        states = tf.scan(LSTM_step, tf.transpose(rnn_inputs, [1,0,2]), initializer=hidden_state) 
        states = tf.transpose(states, [1,2,0,3])
        
        return states, hidden_state
    
sequence = rnn_inputs

final_states = []
hidden_states = []

for layer_idx in range(num_layers):
    states, hidden_state = build_layer(sequence, layer_idx)
    final_states.append(states[:, :, -1, :])
    hidden_states.append(hidden_state)
    
    sequence = states[0]
    
states_reshaped = tf.reshape(sequence, [-1, state_size])
logits = RNN_logits(states_reshaped, num_classes)
logits = tf.reshape(logits, [batch_size, num_steps, -1])

predictions = tf.nn.softmax(logits)

# calculate a difference between predited and correct labels
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels_tensor)
loss = tf.reduce_sum(losses) / batch_size

In [None]:
epoch_size = ((len(train_data) // batch_size) - 1) // num_steps

with tf.Session() as session:
    
  print("RNN type:", rnn_type)
  print()
    
  saver = tf.train.Saver()
  saver.restore(session, "language-rnn-0")

  session.run(tf.global_variables_initializer())
    
  #
  input_coord = tf.train.Coordinator() 
  input_threads = tf.train.start_queue_runners(session, coord=input_coord)
  #
    
  for epoch in range(1):
        
      total_loss = 0
      total_time_steps = 0
   
      epoch_hidden_states = []
      for state_pl in hidden_states:
            epoch_hidden_states.append(np.zeros((2, batch_size, state_size)))

      for step in range(epoch_size):

        # build feed dict
        feed_dict = {}
        
        for state_pl, state_val in zip(hidden_states, epoch_hidden_states):
            feed_dict[state_pl] = state_val
            
        loss_val, epoch_hidden_states = session.run([loss, final_states], feed_dict=feed_dict)

        total_loss += loss_val
        total_time_steps += num_steps
            
      epoch_perplexity = np.exp(total_loss / total_time_steps)
    
      print("epoch {} - perplexity: {:.3f}".format(epoch + 1, epoch_perplexity))
    
  #saver.save(session, "language-rnn", global_step=0)
    
  #  
  input_coord.request_stop()
  input_coord.join(input_threads)  
  #

## Inference ##

In [None]:
batch_size = 1

rnn_type = "LSTM"
tf.reset_default_graph()

words_pl = tf.placeholder(tf.int32, shape=[batch_size, None])
num_steps = tf.shape(words_pl)[1]

embeddings = tf.get_variable("embeddings", [num_classes, state_size])
rnn_inputs = tf.nn.embedding_lookup(embeddings, words_pl)

def build_layer(rnn_inputs, layer_idx):
    
    with tf.variable_scope("layer{}".format(layer_idx)):
    
        # an initial hidden state zero-filled tensor
        init_hidden_state = tf.random_uniform(minval=-1, maxval=1, shape=[2, batch_size, state_size])

        # TODO: proc se to transponuje tam a zase zpatky?
        states = tf.scan(LSTM_step, tf.transpose(rnn_inputs, [1,0,2]), initializer=init_hidden_state) 
        states = tf.transpose(states, [1,2,0,3])[0]

        return states
    
sequence = rnn_inputs
for layer_idx in range(num_layers):
    sequence = build_layer(sequence, layer_idx)
    
states_reshaped = tf.reshape(sequence, [-1, state_size])
logits = RNN_logits(states_reshaped, num_classes)
logits = tf.reshape(logits, [batch_size, num_steps, -1])

predictions = tf.argmax(logits, -1)

In [None]:
def word_by_index(index):
    for word, idx in vocab.items():
        if idx == index:
            return word

In [None]:
word_by_index(1)

In [None]:
with tf.Session() as session:
    
  saver = tf.train.Saver()
  saver.restore(session, "language-rnn-0")
    
  inputs = [[0]]
  for i in range(100):
      val = session.run(predictions, feed_dict={words_pl: inputs})
      inputs[0].append(val[0][-1])
        
  sentence = []
  for word_idx in inputs[0]:
    sentence.append(word_by_index(word_idx))
    
  for word in sentence:
    print(word, end=" ")

**Further reading**

  * Well explained LSTM: https://colah.github.io/posts/2015-08-Understanding-LSTMs/