Copyright (C) 2019 Software Platform Lab, Seoul National University

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

# 1. Recap on TF basics



## 1) Graph and Session
* **Graph**: It contains a set of Operations (units of computation) and Tensors (units of data that flow between operations).

* **Session**: It encapsulates the environment in which Operations are executed and Tensors are evaluated, including the current values of variables.

Let's learn Graph and Session using examples presented below.

## 2) Constant Op

Let's create a constant in TensorFlow.

**```tf.constant(value, dtype = None, shape = None, name = 'Const', verify_shape = False)```**

In [1]:
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
  #constant of 1d tensor, or a vector
  a = tf.constant([2,2], name = 'vector')

  #constant of 2x2 tensor, or a matrix
  b = tf.constant([[0,2], [1,3]], name = 'matrix')

print(a)
print(b)

Tensor("vector:0", shape=(2,), dtype=int32)
Tensor("matrix:0", shape=(2, 2), dtype=int32)


In [2]:
# Get values of a and b
with tf.Session(graph=graph) as sess:
  print('a')
  print(sess.run(a))
  print('b')
  print(sess.run(b))

a
[2 2]
b
[[0 2]
 [1 3]]


## 3) Math Ops
TensorFlow math ops are pretty standard. The following example shows an element-wise division op: `b` has shape (2, 2) and `a` has shape (2,), so `a` is broadcast across the rows of `b`.

In [3]:
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
  # Create constant a and b
  a = tf.constant([2,2], name = 'a', dtype = tf.float32)
  b = tf.constant([[0,1], [2,3]], name = 'b', dtype = tf.float32)
  
  # Create divide operation using b and a
  div = tf.div(b, a)

print('Print information of div op')
print(div.op)

print('\nPrint div')
print(div)

Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Print information of div op
name: "div"
op: "RealDiv"
input: "b"
input: "a"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}


Print div
Tensor("div:0", shape=(2, 2), dtype=float32)


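Run the div operation. Fetching the Operation itself (`div.op`) returns `None`, while fetching its output Tensor (`div`) returns the computed values.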

In [4]:
with tf.Session(graph=graph) as sess:
  print('Print div.op')
  print(sess.run(div.op))
  
  print('\nPrint div')
  print(sess.run(div))

Print div.op
None

Print div
[[0.  0.5]
 [1.  1.5]]
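To make the broadcasting behavior concrete, here is a minimal NumPy sketch of the same computation. It is only an illustration outside TensorFlow, on the assumption that NumPy's broadcasting rules match what `tf.div` did above for these shapes.

```python
import numpy as np

# Same values as the TensorFlow constants above.
a = np.array([2, 2], dtype=np.float32)
b = np.array([[0, 1], [2, 3]], dtype=np.float32)

# a has shape (2,), so it is broadcast across each row of b (shape (2, 2)).
div = b / a
print(div)
```

Each element of `b` is divided by the element of `a` in the same column, which is why this is an element-wise division rather than a matrix operation.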


## 4) Variables

A variable is a TensorFlow object that stores mutable data (e.g., model parameters).

We can create variables using **`tf.get_variable`**, which allows us to provide the variable's internal name, shape, type, and initializer to give the variable its initial value.

```
tf.get_variable(
    name,
    shape=None,
    dtype=None,
    initializer=None,
    regularizer=None,
    trainable=True,
    collections=None,
    caching_device=None,
    partitioner=None,
    validate_shape=True,
    use_resource=None,
    custom_getter=None,
    constraint=None
)
```

Create three variables using `tf.get_variable`

In [5]:
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
  s = tf.get_variable('scalar', initializer=tf.constant(2))
  m = tf.get_variable('matrix', initializer=tf.constant([[0,1], [2,3]]))
  B = tf.get_variable('big_matrix', shape=(784, 10), initializer=tf.zeros_initializer())

Instructions for updating:
Colocations handled automatically by placer.


In [6]:
with tf.Session(graph=graph) as sess:
  print(sess.run(s)) # [SPOILER] Don't be scared even if you encounter an error

FailedPreconditionError: ignored

### Initialize variables

Before using a variable, you must initialize it, or else you'll run into an error.

To initialize them all at once, use **`tf.global_variables_initializer()`**.

In [7]:
with tf.Session(graph=graph) as sess:
  # Initialize variables
  sess.run(tf.global_variables_initializer())
  print('print s')
  print(sess.run(s))
  print('\nprint m')
  print(sess.run(m))
  print('\nprint B')
  print(sess.run(B))

print s
2

print m
[[0 1]
 [2 3]]

print B
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


### Evaluate values of variables

To get the value of a variable, we need to fetch it within a session.

This example shows two ways to evaluate the value: `sess.run(v)` and `v.eval()`.

In [8]:
graph = tf.Graph()
with graph.as_default():
  # v is a 784 x 10 variable of random values
  v = tf.get_variable('normal_matrix', shape=(784,10), initializer=tf.truncated_normal_initializer())

with tf.Session(graph=graph) as sess:
  # Initialize variables
  sess.run(tf.global_variables_initializer())
  
  # Get value with sess.run()
  v_sess = sess.run(v)
  print('v value with sess.run')
  print(v_sess)

  # Get value with v.eval()
  v_eval = v.eval()
  print('\nv value with v.eval()')
  print(v_eval)

v value with sess.run
[[-1.1046802   1.3352509   0.60328496 ... -0.68056965  0.52782935
  -0.20416825]
 [-0.43662754 -0.24992189 -0.28197387 ...  0.6746541   0.11988721
  -0.8294989 ]
 [ 0.6119669   0.6092383  -1.0912067  ...  0.39604026 -0.46768284
   1.3149389 ]
 ...
 [-0.08623898  0.71242857  0.8068247  ...  0.8159966  -0.427081
   0.38499048]
 [ 1.1006298  -0.71694905 -0.11410499 ...  1.5929395  -0.4554197
  -1.1927482 ]
 [ 1.394731   -0.07344379 -1.0126097  ...  1.251117   -0.8205014
   0.5391605 ]]

v value with v.eval()
[[-1.1046802   1.3352509   0.60328496 ... -0.68056965  0.52782935
  -0.20416825]
 [-0.43662754 -0.24992189 -0.28197387 ...  0.6746541   0.11988721
  -0.8294989 ]
 [ 0.6119669   0.6092383  -1.0912067  ...  0.39604026 -0.46768284
   1.3149389 ]
 ...
 [-0.08623898  0.71242857  0.8068247  ...  0.8159966  -0.427081
   0.38499048]
 [ 1.1006298  -0.71694905 -0.11410499 ...  1.5929395  -0.4554197
  -1.1927482 ]
 [ 1.394731   -0.07344379 -1.0126097  ...  1.251117   -0.820

To change the values of variables, use `tf.assign`.

You can see that the variable `v` changes after each *assign* operation is executed.

In [9]:
graph = tf.Graph()
with graph.as_default():
  # Create variable v.
  v = tf.get_variable("a", shape=(2,), initializer=tf.ones_initializer())
  
  # Create two assign operations.
  assign_2 = tf.assign(v, [2, 2])
  assign_5 = tf.assign(v, [5, 5])
  
with tf.Session(graph=graph) as sess:
  sess.run(tf.global_variables_initializer())
  
  # Before applying assign op.
  v_val = sess.run(v)
  print('Initial value: %s' % (v_val))
  
  # Run assign_2
  sess.run(assign_2)
  print('After assign_2: %s' % sess.run(v))
  
  # Run assign_5
  sess.run(assign_5)
  print('After assign_5: %s' % sess.run(v))

Initial value: [1. 1.]
After assign_2: [2. 2.]
After assign_5: [5. 5.]


## 5) Let's train a simple model using the MNIST dataset

### A. Prepare MNIST dataset

In [10]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
                                  
graph = tf.Graph()
with graph.as_default():
  x = tf.placeholder(tf.float32, [None, 784])
  y = tf.placeholder(tf.float32, [None, 10])

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py fr

### B. Create weights and bias
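`W` is initialized with random values drawn from a normal distribution with mean 0 and standard deviation 0.01. `b` is initialized to zeros. The shape of `W` depends on the dimensions of `x` and `y` so that the logits can be computed as `tf.matmul(x, W) + b`: `x` is [batch, 784] and `y` is [batch, 10], so `W` is [784, 10]. The shape of `b` matches the last dimension of `y`, i.e. [10].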

In [0]:
with graph.as_default():
  W = tf.get_variable(name='weights', shape=[784, 10], initializer=tf.random_normal_initializer(0, 0.01))
  b = tf.get_variable(name='bias', shape=[10], initializer=tf.zeros_initializer())

### C. Define a loss function

Our loss function is the cross entropy between the one-hot labels and the softmax of the logits.

In [0]:
with graph.as_default():
  logits = tf.matmul(x, W) + b
  entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y, name='entropy')
  loss = tf.reduce_mean(entropy, name='loss') # average over all the examples in the batch
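As a rough illustration of what this loss computes, here is a NumPy sketch of softmax cross entropy. The helper `softmax_cross_entropy`, the fake input batch, and the all-zero weights are assumptions made for this example; it is not the TensorFlow implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

rng = np.random.default_rng(0)
x_batch = rng.random((4, 784)).astype(np.float32)    # 4 fake "images"
W_np = np.zeros((784, 10), dtype=np.float32)         # assumed all-zero weights
b_np = np.zeros(10, dtype=np.float32)
labels = np.eye(10, dtype=np.float32)[[3, 1, 4, 1]]  # one-hot labels

logits_np = x_batch @ W_np + b_np                    # shape [4, 10]
loss_np = softmax_cross_entropy(logits_np, labels).mean()
print(logits_np.shape, loss_np)
```

With all-zero weights every class gets probability 1/10, so the loss equals ln(10), about 2.30. A model whose weights start near zero should therefore begin training with a loss close to this value.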

### D. Define a training op

We'll use an Adam optimizer with a learning rate of 0.01 to minimize loss.

In [0]:
with graph.as_default():
  optimizer = tf.train.AdamOptimizer(0.01)
  train_op = optimizer.minimize(loss)

### E. Train the model

Finally, we train the model and see how the training loss decreases.

In [14]:
with tf.Session(graph=graph) as sess:
  init = tf.global_variables_initializer()
  sess.run(init)

  for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    _, loss_ = sess.run([train_op, loss], feed_dict={x: batch_xs, y: batch_ys})
    if i % 50 == 0:
      print('Step:', i, '  Loss:', loss_)

Step: 0   Loss: 2.2827573
Step: 50   Loss: 0.451783
Step: 100   Loss: 0.27187717
Step: 150   Loss: 0.33831757
Step: 200   Loss: 0.24838547
Step: 250   Loss: 0.57618994
Step: 300   Loss: 0.2964528
Step: 350   Loss: 0.2665732
Step: 400   Loss: 0.29460594
Step: 450   Loss: 0.23399359
Step: 500   Loss: 0.2070493
Step: 550   Loss: 0.31179148
Step: 600   Loss: 0.1148803
Step: 650   Loss: 0.33535203
Step: 700   Loss: 0.43505493
Step: 750   Loss: 0.1802402
Step: 800   Loss: 0.24920142
Step: 850   Loss: 0.37071893
Step: 900   Loss: 0.18913549
Step: 950   Loss: 0.33932912


## 6) Recurrent neural network


*   Learns from sequential data
*   E.g., predicting the next word after a partial sentence, or understanding the current scene in a video based on previous scenes

![RNN cell](https://drive.google.com/uc?id=1hvEtbzjuT8hxBtNNTrBRMxthCcfFN5FG) 
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs



## 7) LSTM


* Vanishing gradient problem: during backpropagation, the gradient is computed by the chain rule as a product of many factors, so over long sequences the final gradient can become almost zero
* Long short-term memory (LSTM): addresses the vanishing gradient problem and handles long-term dependencies
![LSTM](https://drive.google.com/uc?id=1W9JKubIgJoyvzQy4U8KGNQWcVziu_6fT)
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs
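To get a feel for why a long chain-rule product shrinks, here is a toy sketch. It assumes, purely for illustration, that every time step contributes the same local derivative of 0.5; real networks have different factors at every step.

```python
# Toy assumption: every time step contributes the same local derivative.
local_derivative = 0.5

for num_steps in (1, 10, 50):
    # The chain rule multiplies one factor per time step.
    gradient = local_derivative ** num_steps
    print(num_steps, gradient)
```

After 50 steps the product is already below 1e-15, so early time steps receive almost no learning signal.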




Let's build a simple LSTM model for language modeling. The code comes from the [TF-RNN tutorial](https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb).

# 2. Prepare PTB dataset



PTB (Penn Treebank) is a dataset widely used in natural language processing (NLP). It annotates sentences with syntactic or semantic labels in a tree structure, where each leaf node corresponds to a word.

In [19]:
#@title Click to download PTB
!rm -rf data*
!rm -rf simple-examples*
!wget http://www.fit.vutbr.cz/%7Eimikolov/rnnlm/simple-examples.tgz
!tar -xzf simple-examples.tgz
!mv simple-examples/data ./

--2019-05-31 05:30:54--  http://www.fit.vutbr.cz/%7Eimikolov/rnnlm/simple-examples.tgz
Resolving www.fit.vutbr.cz (www.fit.vutbr.cz)... 147.229.9.23, 2001:67c:1220:809::93e5:917
Connecting to www.fit.vutbr.cz (www.fit.vutbr.cz)|147.229.9.23|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34869662 (33M) [application/x-gtar]
Saving to: ‘simple-examples.tgz’


2019-05-31 05:31:03 (3.63 MB/s) - ‘simple-examples.tgz’ saved [34869662/34869662]



Let's list the files in the downloaded directory (`data`). Notice that a line beginning with `!` (exclamation mark) is executed as a shell command; in this case we run `ls`, which lists the contents of a directory.

In [20]:
!ls data

ptb.char.test.txt   ptb.char.valid.txt	ptb.train.txt  README
ptb.char.train.txt  ptb.test.txt	ptb.valid.txt


## Define input preprocessing functions
We provide utility functions to process the input files.
* `_read_words(filename)`: creates a list of all words in the file (`filename`).
* `_build_vocab(filename)`: assigns a unique ID to each word in the file (`filename`), giving smaller IDs to more frequent words.
* `_file_to_word_ids(filename, word_to_id)`: reads a file and converts each word to its *ID*.

In [0]:
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================


"""Utilities for parsing PTB text files."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import os
import sys

import tensorflow as tf

Py3 = sys.version_info[0] == 3

def _read_words(filename):
  with tf.gfile.GFile(filename, "r") as f:
    if Py3:
      return f.read().replace("\n", "<eos>").split()
    else:
      return f.read().decode("utf-8").replace("\n", "<eos>").split()


def _build_vocab(filename):
  data = _read_words(filename)

  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

  words, _ = list(zip(*count_pairs))
  word_to_id = dict(zip(words, range(len(words))))

  return word_to_id


def _file_to_word_ids(filename, word_to_id):
  data = _read_words(filename)
  return [word_to_id[word] for word in data if word in word_to_id]
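The ID assignment is easier to follow on a tiny example. The sketch below repeats the same counting and sorting logic on an in-memory toy corpus; the helper `build_vocab_from_words` and the toy text are hypothetical and exist only for this illustration.

```python
import collections

def build_vocab_from_words(words):
    """Map each word to an ID; more frequent words get smaller IDs."""
    counter = collections.Counter(words)
    # Sort by descending count, then alphabetically to break ties.
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    return {word: idx for idx, (word, _) in enumerate(count_pairs)}

# Hypothetical toy corpus; each newline becomes an "<eos>" token.
text = "the cat sat\nthe dog sat\nthe end\n"
words = text.replace("\n", " <eos> ").split()

toy_vocab = build_vocab_from_words(words)
print(toy_vocab)
print([toy_vocab[w] for w in "the dog sat".split()])
```

Here `<eos>` and `the` both occur three times, so the alphabetical tie-break decides which one gets ID 0.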

Let's see what the files look like. We will use the `_read_words()` function followed by `join` to format the results.

In [18]:
train_words = _read_words('data/ptb.train.txt')
print('Train set:', ', '.join(train_words[:100]))

test_words = _read_words('data/ptb.test.txt')
print('Test set:', ', '.join(test_words[:100]))

valid_words = _read_words('data/ptb.valid.txt')
print('Validate set:', ', '.join(valid_words[:100]))

Train set: aer, banknote, berlitz, calloway, centrust, cluett, fromstein, gitano, guterman, hydro-quebec, ipo, kia, memotec, mlx, nahb, punts, rake, regatta, rubens, sim, snack-food, ssangyong, swapo, wachter, <eos>, pierre, <unk>, N, years, old, will, join, the, board, as, a, nonexecutive, director, nov., N, <eos>, mr., <unk>, is, chairman, of, <unk>, n.v., the, dutch, publishing, group, <eos>, rudolph, <unk>, N, years, old, and, former, chairman, of, consolidated, gold, fields, plc, was, named, a, nonexecutive, director, of, this, british, industrial, conglomerate, <eos>, a, form, of, asbestos, once, used, to, make, kent, cigarette, filters, has, caused, a, high, percentage, of, cancer, deaths, among, a, group, of
Test set: no, it, was, n't, black, monday, <eos>, but, while, the, new, york, stock, exchange, did, n't, fall, apart, friday, as, the, dow, jones, industrial, average, plunged, N, points, most, of, it, in, the, final, hour, it, barely, managed, to, stay, this, side, of, cha

### Quiz 1
Find the identifier of the word "*market*" using the functions defined above.

**Hint**: use `_build_vocab()`

In [26]:
train_file = 'data/ptb.train.txt'
word_to_id = _build_vocab(train_file)
print(word_to_id['market'])

47


## More input preprocessing functions

* `ptb_raw_data(data_path)`: Returns `(train_data, valid_data, test_data, vocabulary)` by combining the utility functions above.
* `ptb_producer(raw_data, batch_size, num_steps, name)`: For computational reasons, we process data in mini-batches of size `batch_size`. Each row of a batch holds `num_steps` consecutive words, and each word corresponds to a time step t. TensorFlow automatically sums the gradients over each batch for you.



In [0]:
def ptb_raw_data(data_path=None):
  """Load PTB raw data from data directory "data_path".

  Reads PTB text files, converts strings to integer ids,
  and performs mini-batching of the inputs.

  The PTB dataset comes from Tomas Mikolov's webpage:

  http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz

  Args:
    data_path: string path to the directory where simple-examples.tgz has
      been extracted.

  Returns:
    tuple (train_data, valid_data, test_data, vocabulary)
    where each of the data objects can be passed to PTBIterator.
  """

  train_path = os.path.join(data_path, "ptb.train.txt")
  valid_path = os.path.join(data_path, "ptb.valid.txt")
  test_path = os.path.join(data_path, "ptb.test.txt")

  word_to_id = _build_vocab(train_path)
  train_data = _file_to_word_ids(train_path, word_to_id)
  valid_data = _file_to_word_ids(valid_path, word_to_id)
  test_data = _file_to_word_ids(test_path, word_to_id)
  vocabulary = len(word_to_id)
  return train_data, valid_data, test_data, vocabulary


def ptb_producer(raw_data, batch_size, num_steps, name=None):
  """Iterate on the raw PTB data.

  This chunks up raw_data into batches of examples and returns Tensors that
  are drawn from these batches.

  Args:
    raw_data: one of the raw data outputs from ptb_raw_data.
    batch_size: int, the batch size.
    num_steps: int, the number of unrolls.
    name: the name of this operation (optional).

  Returns:
    A pair of Tensors, each shaped [batch_size, num_steps]. The second element
    of the tuple is the same data time-shifted to the right by one.

  Raises:
    tf.errors.InvalidArgumentError: if batch_size or num_steps are too high.
  """
  with tf.name_scope(name, "PTBProducer", [raw_data, batch_size, num_steps]):
    raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

    data_len = tf.size(raw_data)
    batch_len = data_len // batch_size
    data = tf.reshape(raw_data[0 : batch_size * batch_len],
                      [batch_size, batch_len])

    epoch_size = (batch_len - 1) // num_steps
    assertion = tf.assert_positive(
        epoch_size,
        message="epoch_size == 0, decrease batch_size or num_steps")
    with tf.control_dependencies([assertion]):
      epoch_size = tf.identity(epoch_size, name="epoch_size")

    i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()
    x = tf.strided_slice(data, [0, i * num_steps],
                         [batch_size, (i + 1) * num_steps])
    x.set_shape([batch_size, num_steps])
    y = tf.strided_slice(data, [0, i * num_steps + 1],
                         [batch_size, (i + 1) * num_steps + 1])
    y.set_shape([batch_size, num_steps])
    return x, y
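The slicing in `ptb_producer` can be sketched without queues in plain NumPy. The generator `ptb_batches` below is a hypothetical stand-in written for this illustration, and the input is a toy list of integers rather than real word IDs.

```python
import numpy as np

def ptb_batches(raw_data, batch_size, num_steps):
    """Yield (x, y) pairs shaped [batch_size, num_steps]; y is x shifted by one."""
    raw_data = np.asarray(raw_data, dtype=np.int32)
    batch_len = len(raw_data) // batch_size
    # Drop the tail that does not fit, then lay the data out in batch_size rows.
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]
        yield x, y

toy_ids = list(range(24))  # stand-in for word IDs
batches = list(ptb_batches(toy_ids, batch_size=2, num_steps=5))
first_x, first_y = batches[0]
print(first_x)
print(first_y)
```

Because the toy IDs are consecutive integers, every entry of `first_y` is exactly one larger than the matching entry of `first_x`, which makes the one-step shift between inputs and targets easy to see.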

### Quiz 2
Guess how many words are in `x` or `y` of `ptb_producer()` when 
**batch_size=2 and num_steps=5.**

Then check it yourself.
**HINT**
* `x, y = ptb_producer(...)`
* `x_, y_ = sess.run(...)`


In [32]:
data_path='data'
train_data, _, _, _ = ptb_raw_data(data_path)
batch_size = 2
num_steps = 5
    
with tf.Graph().as_default():    
    x, y = ptb_producer(train_data, batch_size, num_steps, name=None)
  
    print('x', x)
    print('y', y)
    
    sess = tf.Session()
    tf.train.start_queue_runners(sess = sess)
    
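    # Note: two separate sess.run calls each dequeue a new index, so x_ and y_
    # come from different batches here. Use sess.run([x, y]) for an aligned pair.
    x_, y_ = sess.run(x), sess.run(y)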

    print('x_', x_)
    print('y_', y_)
    

x Tensor("PTBProducer/StridedSlice:0", shape=(2, 5), dtype=int32)
y Tensor("PTBProducer/StridedSlice_1:0", shape=(2, 5), dtype=int32)
x_ [[9970 9971 9972 9974 9975]
 [1969    0   98   89 2254]]
y_ [[9980 9981 9982 9983 9984]
 [ 312 1641    4 1063    8]]


# 3. Build RNN model


### Setting hyperparameters
We provide a pre-defined configuration at `medium` scale; you can find other configurations at the [TF RNN tutorial](https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py#L320).

In [0]:
class MediumConfig(object):
  """Medium config."""
  init_scale = 0.05   # the initial scale of the weights
  learning_rate = 1.0 # the initial value of the learning rate
  max_grad_norm = 5   # the maximum permissible norm of the gradient

  num_layers = 2      # the number of LSTM layers
  num_steps = 35      # the number of unrolled steps of LSTM
  hidden_size = 650   # the number of LSTM units

  max_epoch = 6       # the number of epochs trained with the initial learning rate
  max_max_epoch = 39  # the total number of epochs for training
  keep_prob = 0.5     # the probability of keeping each output unit in the dropout layer
  lr_decay = 0.8      # the decay of the learning rate for each epoch after "max_epoch"
  batch_size = 20     # the batch size 
  vocab_size = 10000  # the vocabulary size
   
config = MediumConfig()
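The training code in this notebook uses a fixed learning rate, so `max_epoch` and `lr_decay` are not exercised here. As a sketch of how the comments above describe their interaction, the hypothetical helper below assumes a 0-based epoch index and a decay that starts once `max_epoch` epochs have been trained.

```python
def learning_rate_for_epoch(epoch, learning_rate=1.0, lr_decay=0.8, max_epoch=6):
    """Learning rate for a 0-based epoch index (assumed schedule)."""
    # No decay for the first max_epoch epochs, then multiply by lr_decay each epoch.
    return learning_rate * lr_decay ** max(epoch + 1 - max_epoch, 0)

for epoch in (0, 5, 6, 7):
    print(epoch, learning_rate_for_epoch(epoch))
```

The default arguments mirror the values in `MediumConfig`.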

### Word embeddings 
Convert word ids to vector representations by using `tf.nn.embedding_lookup`.

In [35]:
class PTBInput(object):
    """The input data."""
    def __init__(self, config, data, name=None):
      self.batch_size = batch_size = config.batch_size
      self.num_steps = num_steps = config.num_steps
      self.epoch_size = ((len(data) // batch_size) - 1) // num_steps
      self.input_data, self.targets = ptb_producer(
          data, batch_size, num_steps, name=name)
      
def inputs(input):
    embedding_size = config.hidden_size # same as the LSTM hidden size
    vocab_size = config.vocab_size
    with tf.device("/cpu:0"):
      embedding = tf.get_variable(
          "embedding", [vocab_size, embedding_size])
      inputs = tf.nn.embedding_lookup(embedding, input.input_data)
      return inputs
  
with tf.Graph().as_default():
    data_path = 'data'
    raw_data = ptb_raw_data(data_path)
    train_data, valid_data, test_data, _ = raw_data
    train_input = PTBInput(config=config, data=train_data, name="TrainInput")
    print(inputs(train_input))

Tensor("embedding_lookup/Identity:0", shape=(20, 35, 650), dtype=float32, device=/device:CPU:0)


### Quiz 3
Guess the tensor shape of the output of the embedding lookup when reading embeddings for wordIDs=[3, 9, 20] (vocab_size=100, embedding_size=4).

Then check it yourself. 
What happens if wordIDs=[3, 9, 200] and the other conditions are the same?

In [50]:
vocab_size = 100
embedding_size = 4
with tf.Graph().as_default():
  wordIDs=tf.placeholder(dtype=tf.int32)
   
  embedding = tf.get_variable(
      "embedding", [vocab_size, embedding_size])
  
  lookup = tf.nn.embedding_lookup(embedding, wordIDs)
  
  init = tf.global_variables_initializer()
  sess = tf.Session()
  sess.run(init)
  
  # Fill below
  print(sess.run(lookup, feed_dict = {wordIDs:[3,9,20]}))
  print(sess.run(lookup, feed_dict = {wordIDs:[3,9,200]}))

[[ 0.14736399  0.01971614 -0.18641701  0.02163848]
 [ 0.166616    0.01221365  0.11983153  0.03308383]
 [-0.0332275  -0.155791    0.21573529  0.14656037]]
[[ 0.14736399  0.01971614 -0.18641701  0.02163848]
 [ 0.166616    0.01221365  0.11983153  0.03308383]
 [ 0.          0.          0.          0.        ]]
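Conceptually, an embedding lookup selects rows of the embedding table by word ID. The NumPy sketch below illustrates that idea with a random stand-in table; the names `embedding_np` and `lookup_np` are made up for this example.

```python
import numpy as np

vocab_size, embedding_size = 100, 4
rng = np.random.default_rng(0)
# Stand-in embedding table: one row of size embedding_size per word ID.
embedding_np = rng.standard_normal((vocab_size, embedding_size))

word_ids = np.array([3, 9, 20])
lookup_np = embedding_np[word_ids]   # picks rows 3, 9 and 20
print(lookup_np.shape)
```

The result has one row per word ID, so its shape is (3, 4). Note that NumPy raises an `IndexError` for an out-of-range ID such as 200, whereas the TensorFlow run above returned a row of zeros. As far as we know, the TensorFlow behavior for out-of-range IDs depends on the device the lookup runs on, so it should not be relied upon.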


### Define RNN graph

In [51]:
def build_rnn_graph_lstm(inputs, config, is_training=True):
    """Build the inference graph using canonical LSTM cells."""
    def make_cell():
      cell = tf.contrib.rnn.BasicLSTMCell(
          config.hidden_size, forget_bias=0.0, state_is_tuple=True,
          reuse=not is_training)
      if is_training and config.keep_prob < 1:
        cell = tf.contrib.rnn.DropoutWrapper(
            cell, output_keep_prob=config.keep_prob)
      return cell

    # Stacking multiple LSTMs
    cell = tf.contrib.rnn.MultiRNNCell(
        [make_cell() for _ in range(config.num_layers)], state_is_tuple=True)

    initial_state = cell.zero_state(config.batch_size, tf.float32)
    state = initial_state
    
    # Simplified version of tf.nn.static_rnn().
    # This builds an unrolled LSTM for tutorial purposes only.
    outputs = []
    with tf.variable_scope("RNN"):
      for time_step in range(config.num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)
    output = tf.reshape(tf.concat(outputs, 1), [-1, config.hidden_size])
    return output, state, initial_state
  
  
with tf.Graph().as_default():
    data_path = 'data'
    raw_data = ptb_raw_data(data_path)
    train_data, valid_data, test_data, _ = raw_data
    train_input = PTBInput(config=config, data=train_data, name="TrainInput")
    inputs_ = inputs(train_input)
    
    output, final_state, initial_state = build_rnn_graph_lstm(inputs_, config)
    print(output)

Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Tensor("Reshape:0", shape=(700, 650), dtype=float32)
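The output shape (700, 650) comes from the `concat` followed by `reshape`: 20 batch rows times 35 time steps, each with 650 hidden units. The NumPy sketch below replays those two steps with tiny stand-in sizes, on the assumption that `np.concatenate` and `reshape` order elements the same way as their TensorFlow counterparts.

```python
import numpy as np

batch_size, num_steps, hidden_size = 2, 3, 4  # tiny stand-ins for 20, 35, 650

# One [batch_size, hidden_size] output per time step, filled with the step index.
outputs = [np.full((batch_size, hidden_size), t) for t in range(num_steps)]

concatenated = np.concatenate(outputs, axis=1)     # [batch_size, num_steps * hidden_size]
output_np = concatenated.reshape(-1, hidden_size)  # [batch_size * num_steps, hidden_size]
print(concatenated.shape, output_np.shape)
print(output_np[:, 0])  # time-step index of each row
```

The rows are grouped by batch item first and by time step second, which is what allows the logits to be reshaped back to [batch_size, num_steps, vocab_size] later.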


### Define loss 
`tf.contrib.seq2seq.sequence_loss`: Weighted cross-entropy loss for a sequence of logits.

In [0]:
def loss(config, input_, output):
  softmax_w = tf.get_variable(
        "softmax_w", [config.hidden_size, config.vocab_size], tf.float32)
  softmax_b = tf.get_variable("softmax_b", [config.vocab_size], tf.float32)
  logits = tf.nn.xw_plus_b(output, softmax_w, softmax_b)
  # Reshape logits to be a 3-D tensor for sequence loss
  logits = tf.reshape(logits, [config.batch_size, config.num_steps, config.vocab_size])

  # Use the contrib sequence loss and average over the batches
  loss = tf.contrib.seq2seq.sequence_loss(
        logits,
        input_.targets,
        tf.ones([config.batch_size, config.num_steps], tf.float32),
        average_across_timesteps=False,
        average_across_batch=True)
   
  cost = tf.reduce_sum(loss)
  return cost
    

### Define optimizer (train_op)

In [0]:
def optimizer(cost):
  tvars = tf.trainable_variables()
  grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                    config.max_grad_norm)
  optimizer = tf.train.GradientDescentOptimizer(1.0)
  train_op = optimizer.apply_gradients(
        zip(grads, tvars),
        global_step=tf.train.get_or_create_global_step())
  return train_op
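Clipping by global norm rescales all gradients by one shared factor, so their directions are preserved. The NumPy function below is a sketch of that idea written for this illustration, not TensorFlow's implementation.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by one factor so their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # The factor is 1 when the norm is already small enough.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # global norm = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
clipped_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm, clipped_norm)
```

In this example the joint norm of 13 is scaled down to the limit of 5, the same value as `max_grad_norm` in `MediumConfig`.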

# 4. Run RNN model


In [54]:
import numpy as np
import time

def define_graph(config):
  data_path = 'data'
  raw_data = ptb_raw_data(data_path)
  train_data, valid_data, test_data, _ = raw_data
  train_input = PTBInput(config=config, data=train_data, name="TrainInput")
  initializer = tf.random_uniform_initializer(-config.init_scale,
                                              config.init_scale)
  with tf.variable_scope("Model", reuse=None, initializer=initializer):
      output, final_state, initial_state = build_rnn_graph_lstm(inputs(train_input), config)
    
      cost = loss(config, train_input, output)
      train_op = optimizer(cost)
  return cost, initial_state, final_state, train_op
      
def run(sess, cost, initial_state, final_state, train_op):
  init = tf.global_variables_initializer()
  tf.train.start_queue_runners(sess=sess)
  sess.run(init)
    
  state = sess.run(initial_state)
    
  costs = 0.0
  iters = 0
  start_time = time.time()
  for step in range(500):
      feed_dict = {}
      for i, (c, h) in enumerate(initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
      
      cost_, state, _ = sess.run((cost, final_state, train_op), feed_dict=feed_dict)
      costs += cost_
      iters += config.num_steps
      
      if step % 50 == 0:
         print("[step:%d] perplexity: %.3f   speed: %.0f word/sec" %
            (step, np.exp(costs / iters),
             iters * config.batch_size /
             (time.time() - start_time)))


config =  MediumConfig()
with tf.Graph().as_default():
    cost, initial_state, final_state, train_op = define_graph(config)
   
    sess = tf.Session()
    run(sess, cost, initial_state, final_state, train_op)

Instructions for updating:
Use tf.cast instead.
[step:0] perplexity: 10016.832   speed: 1376 word/sec
[step:50] perplexity: 1967.532   speed: 4732 word/sec
[step:100] perplexity: 1412.284   speed: 4881 word/sec
[step:150] perplexity: 1153.928   speed: 4927 word/sec
[step:200] perplexity: 999.808   speed: 4953 word/sec
[step:250] perplexity: 893.311   speed: 4970 word/sec
[step:300] perplexity: 814.279   speed: 4984 word/sec
[step:350] perplexity: 751.620   speed: 4995 word/sec
[step:400] perplexity: 697.457   speed: 4998 word/sec
[step:450] perplexity: 654.936   speed: 5003 word/sec
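The perplexity printed above is `np.exp(costs / iters)`, which we read as the exponential of the average per-word cross entropy. Under that reading, the sketch below computes the perplexity of a model that guesses uniformly over the vocabulary.

```python
import numpy as np

vocab_size = 10000

# Cross entropy (in nats) of a uniform prediction over the vocabulary.
uniform_cross_entropy = -np.log(1.0 / vocab_size)
perplexity = np.exp(uniform_cross_entropy)
print(perplexity)
```

A uniform guess gives a perplexity equal to the vocabulary size, which is in line with the value of roughly 10000 reported at step 0, before the model has learned anything.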


# 5. Visualize using TensorBoard

### Preparation: Setting Up TensorBoard
**NOTE**: This setup is only needed on Colab (you do not need to understand the details).

TensorBoard opens an HTTP endpoint through which users can access the UI. However, Colab runs on cloud servers that do not expose the machine's public IP address. `ngrok` is a service that opens a proxy server and makes the machine accessible via a URL that it provides.

If you use a standalone server (instead of colab), you can just use **'tensorboard --logdir=<logdir> --port=<port>'**.

In [55]:
#download and unzip ngrok
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

--2019-05-31 06:08:18--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 52.4.95.48, 34.196.237.103, 52.72.145.109, ...
Connecting to bin.equinox.io (bin.equinox.io)|52.4.95.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16648024 (16M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip’


2019-05-31 06:08:18 (70.2 MB/s) - ‘ngrok-stable-linux-amd64.zip’ saved [16648024/16648024]

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   


In [0]:
#run tensorboard
LOG_DIR = './log'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)
#run ngrok
get_ipython().system_raw('./ngrok http 6006 &')

### Visualize the graph

TensorBoard provides a UI for rendering the NN graph that we are going to run. To make the graph visible, we need to write it to the log directory using `tf.summary.FileWriter`.

In [0]:
config =  MediumConfig()
with tf.Graph().as_default():
    define_graph(config)
    writer = tf.summary.FileWriter('./log', tf.get_default_graph())

Let's get the public URL and open it.

In [58]:
# Get the publicly accessible URL for TensorBoard
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

https://d3a9555e.ngrok.io


### Visualize the perplexity

In [0]:
def run_with_summary(sess, cost, initial_state, final_state, train_op):
  writer = tf.summary.FileWriter('./log', tf.get_default_graph())
  init = tf.global_variables_initializer()
  tf.train.start_queue_runners(sess=sess)
  sess.run(init)
    
  state = sess.run(initial_state)
    
  costs = 0.0
  iters = 0
  start_time = time.time()
  for step in range(1500):
      feed_dict = {}
      for i, (c, h) in enumerate(initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
      
      cost_, state, _ = sess.run((cost, final_state, train_op), feed_dict=feed_dict)
      costs += cost_
      iters += config.num_steps
      
      #add summary
      perplexity_summ = tf.Summary()
      perplexity_summ.value.add(
        tag='perplexity', simple_value=np.exp(costs/iters))
      
      writer.add_summary(perplexity_summ, step)
      if step % 100 == 0:
         print("[step:%d] perplexity: %.3f   speed: %.0f word/sec" %
            (step, np.exp(costs / iters),
             iters * config.batch_size /
             (time.time() - start_time)))

In [60]:
config =  MediumConfig()
with tf.Graph().as_default():
    cost, initial_state, final_state, train_op = define_graph(config)
   
    sess = tf.Session()
    run_with_summary(sess, cost, initial_state, final_state, train_op)

[step:0] perplexity: 9972.362   speed: 1386 word/sec
[step:100] perplexity: 1451.912   speed: 4903 word/sec
[step:200] perplexity: 1028.920   speed: 4989 word/sec
[step:300] perplexity: 833.964   speed: 5018 word/sec
[step:400] perplexity: 709.987   speed: 5038 word/sec
[step:500] perplexity: 626.509   speed: 5048 word/sec
[step:600] perplexity: 567.043   speed: 5053 word/sec
[step:700] perplexity: 519.573   speed: 5060 word/sec
[step:800] perplexity: 478.507   speed: 5064 word/sec
[step:900] perplexity: 446.803   speed: 5068 word/sec
[step:1000] perplexity: 422.496   speed: 5072 word/sec
[step:1100] perplexity: 398.944   speed: 5074 word/sec
[step:1200] perplexity: 380.023   speed: 5076 word/sec
[step:1300] perplexity: 362.717   speed: 5079 word/sec
[step:1400] perplexity: 349.679   speed: 5080 word/sec


Go back to TensorBoard and check the `SCALARS` tab.