# RNN Practice - Using LSTM and GRU in TensorFlow

1. RNN model and how to build it
1. RNN 'Cell' types supported by TensorFlow
1. Code setup for testing the characteristics of each cell type
1. Test #1 - Vanilla RNN
1. Test #2 - Basic LSTM
1. Test #3 - GRU
1. Test #4 - LSTMCell + forget_bias
1. Test #5 - LayerNormBasicLSTMCell
1. Test #6 - LayerNormBasicLSTMCell - what's wrong?
1. Summary

In [1]:
!rm -fr logdir
!mkdir -p logdir

In [2]:
%load_ext do_not_print_href
%matplotlib inline
from __future__ import print_function, division
import sys
import time
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

## RNN Model (Review)


<img  src="rnn.jpg" style="width:55.5rem"/>

<center>Image source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/</center>

\begin{align}
s_t & = \tanh(Ux_t + Ws_{t-1}) \\
o_t & = \mathrm{softmax}(Vs_t)
\end{align}
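
The two equations above can be sketched in plain Python to make the data flow concrete. This is only an illustration: the matrix sizes and weight values below are made up, and the helper names (`matvec`, `softmax`, `rnn_step`) are hypothetical, not part of TensorFlow.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_step(U, W, V, x_t, s_prev):
    """One time step: s_t = tanh(U x_t + W s_{t-1}), o_t = softmax(V s_t)."""
    pre = [a + b for a, b in zip(matvec(U, x_t), matvec(W, s_prev))]
    s_t = [math.tanh(p) for p in pre]
    o_t = softmax(matvec(V, s_t))
    return s_t, o_t

# Toy sizes (assumed for illustration): 2 inputs, 3 hidden units, 2 classes.
U = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
W = [[0.2, 0.0, -0.1], [0.1, 0.1, 0.0], [0.0, -0.3, 0.2]]
V = [[0.3, -0.1, 0.2], [-0.2, 0.4, 0.1]]

s = [0.0, 0.0, 0.0]                     # initial state
for x in ([1.0, 0.0], [0.0, 1.0]):      # a two-step input sequence
    s, o = rnn_step(U, W, V, x, s)
print(s, o)
```

The state `s` is the only thing carried from one step to the next; `o` is computed from it and never fed back.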


### However, a TensorFlow RNN...

<img  src="Selection_20170914_161757_e791.png"/>


- There is no structure corresponding to **_V_**

- There is no **_softmax()_** either

- **_V_** and **_softmax()_** are built and attached only when needed ( `tf.layers.dense`, `tf.nn.softmax` )



## Building an RNN Model (Review)

- [`tf.contrib.rnn.BasicRNNCell`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/basicrnncell)

<code>
  `__init__`(
    <span style="color:red">num_units,</span>
    activation=None,
    reuse=None
  )
</code>


- [`tf.nn.dynamic_rnn`](http://devdocs.io/tensorflow~python/tf/nn/dynamic_rnn)

<code>
  dynamic_rnn(
    <span style="color:red">cell,</span>
    <span style="color:red">inputs,</span>
    <span style="color:red">sequence_length=None,</span>
    initial_state=None,
    dtype=None,
    parallel_iterations=None,
    swap_memory=False,
    time_major=False,
    scope=None
  )
</code>


- Building the RNN


<code>
    cell            = tf.contrib.rnn.<b style="color:red">BasicRNNCell</b>(
                        num_hidden_units)
    last, states    = tf.nn.dynamic_rnn(
                        cell, 
                        inputs, 
                        <i style="color:red">sequence_length=sequence_length,</i>
                        dtype=tf.float32)
</code>
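
The role of `sequence_length` can be sketched with a simplified unrolling loop. As far as the `tf.nn.dynamic_rnn` documentation describes it, outputs past a sequence's length are zero and the state is carried through unchanged. The one-unit `toy_cell` and the `run_rnn` helper below are hypothetical stand-ins, not TensorFlow code.

```python
import math

def toy_cell(x_t, state):
    """Hypothetical one-unit cell: output and new state are the same value."""
    new_state = math.tanh(0.5 * x_t + 0.8 * state)
    return new_state, new_state

def run_rnn(cell, sequence, length, initial_state=0.0):
    """Unroll `cell` over one sequence, honoring `length` like sequence_length."""
    outputs, state = [], initial_state
    for t, x_t in enumerate(sequence):
        if t < length:
            out, state = cell(x_t, state)
        else:
            out = 0.0          # past the valid length: zero output, state kept
        outputs.append(out)
    return outputs, state

outputs, final_state = run_rnn(toy_cell, [1.0, -1.0, 2.0, 3.0], length=2)
print(outputs, final_state)
```

Under this behavior, reading the output at the last time step is only meaningful when every sequence has the full length, which is the case in this notebook (all 28 rows of each image are used).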



## RNN Cell Types

- [`tf.contrib.rnn.BasicRNNCell`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/basicrnncell)

<code>
    ...
</code>

- [`tf.contrib.rnn.BasicLSTMCell`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/basiclstmcell)

<code>
    `__init__`(
        <span style="color:red">num_units,</span>
        forget_bias=1.0,
        state_is_tuple=True,
        activation=None,
        reuse=None
    )
</code>

- [`tf.contrib.rnn.LSTMCell`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/lstmcell)

<code>
    `__init__`(
        <span style="color:red">num_units,</span>
        use_peepholes=False,
        cell_clip=None,
        initializer=None,
        num_proj=None,
        proj_clip=None,
        num_unit_shards=None,
        num_proj_shards=None,
        forget_bias=1.0,
        state_is_tuple=True,
        activation=None,
        reuse=None
    )
</code>

- [`tf.contrib.rnn.GRUCell`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/grucell)

<code>
    `__init__`(
        <span style="color:red">num_units,</span>
        activation=None,
        reuse=None,
        kernel_initializer=None,
        bias_initializer=None
    )
</code>

- [`tf.contrib.rnn.LayerNormBasicLSTMCell`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/layernormbasiclstmcell)

> LSTM unit with **layer normalization** and **recurrent dropout.**

<code>
    `__init__`(
        <span style="color:red">num_units,</span>
        forget_bias=1.0,
        input_size=None,
        activation=tf.tanh,
        layer_norm=True,
        norm_gain=1.0,
        norm_shift=0.0,
        dropout_keep_prob=1.0,
        dropout_prob_seed=None,
        reuse=None
    )
</code>

- Various other wrappers are also supported - https://www.tensorflow.org/api_guides/python/contrib.rnn

In [3]:
from tensorflow.examples.tutorials.mnist.input_data \
    import read_data_sets

mnist = read_data_sets('./mnist', one_hot=False)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting ./mnist/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting ./mnist/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./mnist/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./mnist/t10k-labels-idx1-ubyte.gz


In [4]:
INPUT_UNITS = 28
NUM_HIDDEN_UNITS = 31

BATCH_SIZE = 128
MAX_SEQ_LEN = 28

In [5]:
class MnistRnn:
    def __init__(self, 
                 inputs, 
                 labels, 
                 input_units, 
                 num_hidden_units, 
                 batch_size, 
                 max_seq_len,
                 rnn_cell_class = tf.contrib.rnn.BasicRNNCell,
                 # please ignore the arguments below this point for now
                 add_check = False,
                 lr = 0.001,
                 use_grad_clip = False):
        '''
        inputs: in shape [batch_size, max_seq_len, input_size]
        labels: in shape [batch_size]
        '''

        cell            = rnn_cell_class(num_hidden_units)
        sequence_length = [max_seq_len] * batch_size
        last, states    = tf.nn.dynamic_rnn(
                            cell, 
                            inputs, 
                            sequence_length=sequence_length, 
                            dtype=tf.float32)

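        # `last` holds the outputs of every time step, in shape
        # [batch_size, max_seq_len, num_hidden_units].
        # Of time steps 0..27 along the max_seq_len axis,
        # the outputs at steps 0..26 are not used.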
        rnn_output = last[:,max_seq_len-1,:]
        # outputs shape will be: [batch_size, 10]
        outputs    = tf.layers.dense(rnn_output, 10)
        loss       = tf.losses.sparse_softmax_cross_entropy(
                        labels, outputs)
        
        if use_grad_clip:
            tvars_     = tf.trainable_variables()
            grads_, _  = tf.clip_by_global_norm(
                            tf.gradients(
                                loss,
                                tvars_),
                            5.0)
            optimize   = \
                tf.train.AdamOptimizer(learning_rate=lr). \
                            apply_gradients(zip(grads_, tvars_))
        else:
            optimize   = \
                tf.train.AdamOptimizer(learning_rate=lr). \
                            minimize(loss)

        # accuracy
        preds    = tf.argmax(outputs, axis=1)
        errors   = tf.count_nonzero(labels - preds)
        accuracy = 1.0 - tf.cast(errors,tf.float32) / \
                         tf.cast(tf.size(preds),tf.float32)

        # store as attributes so they can be referenced from outside the class instance
        self.outputs        = outputs
        self.loss           = loss
        self.optimize       = optimize
        self.accuracy       = accuracy
        
        # check_numerics
        self.check = [tf.check_numerics(
                        t,
                        'check_numerics: {}'.format(t.name)) \
                      for t in tf.gradients(
                                  loss,
                                  tf.trainable_variables()) \
                      if t is not None] \
                     if add_check \
                     else tf.constant(1.0)
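
The accuracy computation in the class above (count the non-zero entries of `labels - preds`) can be checked on a tiny hand-made batch. The sketch below mirrors that logic in plain Python; the function name and the sample values are made up for illustration.

```python
def batch_accuracy(labels, logits):
    """Fraction of rows whose arg-max logit equals the label."""
    preds = [row.index(max(row)) for row in logits]
    errors = sum(1 for y, p in zip(labels, preds) if y - p != 0)
    return 1.0 - errors / len(preds)

labels = [2, 0, 1, 1]
logits = [[0.1, 0.2, 3.0],   # predicts 2 (correct)
          [1.5, 0.3, 0.2],   # predicts 0 (correct)
          [0.9, 0.1, 0.0],   # predicts 0 (wrong)
          [0.0, 2.0, 1.0]]   # predicts 1 (correct)
print(batch_accuracy(labels, logits))   # 0.75
```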

In [6]:
train_loop_count = mnist.train.num_examples // BATCH_SIZE
test_loop_count  = mnist.test.num_examples // BATCH_SIZE

train_loop_count, test_loop_count

(429, 78)
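
The pair `(429, 78)` follows from integer division by the batch size. The sketch below reproduces the arithmetic, assuming the default split of `read_data_sets` (55,000 training and 10,000 test examples); the leftover examples that do not fill a whole batch are simply never visited by the loops.

```python
TRAIN_EXAMPLES = 55000   # assumed: default training split of read_data_sets
TEST_EXAMPLES = 10000    # assumed: MNIST test set size
BATCH_SIZE = 128

train_batches, train_left = divmod(TRAIN_EXAMPLES, BATCH_SIZE)
test_batches, test_left = divmod(TEST_EXAMPLES, BATCH_SIZE)
print(train_batches, train_left)   # 429 88
print(test_batches, test_left)     # 78 16
```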

In [7]:
def train(inputs, labels, max_epochs, train_writer, test_writer):
    step = 0
    for ep in range(max_epochs):

        train_elapsed = []
        train_losses = []
        train_accuracy = []
        for i in range(train_loop_count):
            t_start     = time.time()
            offs        = i * BATCH_SIZE
            batch_input = \
                    mnist.train.images[offs:offs+BATCH_SIZE,:]
            batch_input = \
                    batch_input.reshape(
                            [BATCH_SIZE,
                               MAX_SEQ_LEN,
                               INPUT_UNITS])
            batch_label = \
                    mnist.train.labels[offs:offs+BATCH_SIZE]
            optimize, loss, accuracy, _ = \
                sess.run([model.optimize,
                          model.loss,
                          model.accuracy,
                          model.check],
                         feed_dict = {
                          inputs: batch_input,
                          labels: batch_label })
            train_losses.append(loss)
            train_accuracy.append(accuracy)
            t_elapsed   = time.time() - t_start
            train_elapsed.append(t_elapsed)

            step += 1
            summary = tf.Summary(
                value=[
                    tf.Summary.Value(
                        tag='train_accuracy',
                        simple_value=accuracy
                    ),
                    tf.Summary.Value(
                        tag='loss',
                        simple_value=loss
                    ),
                ]
            )
            train_writer.add_summary(summary,global_step=step)

            if step % 250 == 0:
                print(('[trn] ep {:d}, step {:d}, ' + 
                       'loss {:f}, accu {:f}, ' + 
                       'sec/iter {:f}').format(
                    ep + 1,
                    step,
                    np.mean(train_losses),
                    # 'accu' is the minimum per-batch accuracy in this window
                    np.amin(train_accuracy),
                    np.mean(train_elapsed)))
                train_losses = []
                train_accuracy = []
                train_elapsed = []
                
        train_writer.flush()

        test_elapsed  = []
        test_accuracy = []
        
        for i in range(test_loop_count):
            t_start     = time.time()
            offs        = i * BATCH_SIZE
            batch_input = mnist.test.images[offs:offs+BATCH_SIZE,:]
            batch_input = batch_input.reshape(
                            [BATCH_SIZE,
                               MAX_SEQ_LEN,
                               INPUT_UNITS])
            batch_label = mnist.test.labels[offs:offs+BATCH_SIZE]
            accuracy, = \
                sess.run([model.accuracy],
                         feed_dict = {
                          inputs: batch_input,
                          labels: batch_label })
            test_accuracy.append(accuracy)
            t_elapsed   = time.time() - t_start
            test_elapsed.append(t_elapsed)

            step += 1
            
        if len(test_accuracy) > 0:
            print(('[tst] ep {:d}, step {:d}, ' +
                   'accu {:f}, sec/iter {:f}').format(
                ep + 1,
                step,
                # 'accu' is the minimum per-batch accuracy over the test set
                np.amin(test_accuracy),
                np.mean(test_elapsed)))

            summary = tf.Summary(
                value=[
                    tf.Summary.Value(
                        tag='test_accuracy',
                        simple_value=np.amin(test_accuracy)
                    ),
                ]
            )
            test_writer.add_summary(summary,global_step=step)
            test_writer.flush()
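
The `reshape` calls in `train` turn each flattened 784-pixel image into a sequence of 28 time steps with 28 features each, so the RNN reads the image one row at a time. A minimal sketch of that reshaping for a single image, using a hypothetical helper and a stand-in pixel list:

```python
def to_sequence(flat_pixels, seq_len=28, input_units=28):
    """Split a flat row-major image into `seq_len` rows of `input_units` pixels."""
    assert len(flat_pixels) == seq_len * input_units
    return [flat_pixels[t * input_units:(t + 1) * input_units]
            for t in range(seq_len)]

flat = list(range(784))          # stand-in for one flattened 28x28 image
seq = to_sequence(flat)
print(len(seq), len(seq[0]))     # 28 28
print(seq[1][:3])                # [28, 29, 30]
```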


## Test #1 - Vanilla RNN

<div  style="border:1px solid black;border-radius:5px">
<code>
    cell = tf.contrib.rnn.BasicRNNCell(num_hidden_units)

</code>
</div>

In [8]:
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32,
                         [BATCH_SIZE, MAX_SEQ_LEN, INPUT_UNITS],
                         name='inputs')
labels_ = tf.placeholder(tf.int64,
                         [BATCH_SIZE],
                         name='labels')


model = MnistRnn(inputs_,
                 labels_,
                 INPUT_UNITS,
                 NUM_HIDDEN_UNITS,
                 BATCH_SIZE,
                 MAX_SEQ_LEN,
                 tf.contrib.rnn.BasicRNNCell)

In [9]:
config = tf.ConfigProto(gpu_options={'allow_growth':True})
sess = tf.InteractiveSession(config=config)

tf.global_variables_initializer().run()

train_writer = tf.summary.FileWriter('logdir/train_basic_rnn',
                                     graph=tf.get_default_graph())
test_writer  = tf.summary.FileWriter('logdir/test_basic_rnn',
                                     graph=tf.get_default_graph())

train(inputs_, labels_, 10, train_writer, test_writer)

[trn] ep 1, step 250, loss 1.651489, accu 0.125000, sec/iter 0.006007
[tst] ep 1, step 507, accu 0.492188, sec/iter 0.002145
[trn] ep 2, step 750, loss 0.938822, accu 0.562500, sec/iter 0.005452
[tst] ep 2, step 1014, accu 0.609375, sec/iter 0.002061
[trn] ep 3, step 1250, loss 0.674857, accu 0.656250, sec/iter 0.005516
[tst] ep 3, step 1521, accu 0.695312, sec/iter 0.002071
[trn] ep 4, step 1750, loss 0.540000, accu 0.679688, sec/iter 0.005494
[tst] ep 4, step 2028, accu 0.765625, sec/iter 0.002070
[trn] ep 5, step 2250, loss 0.464122, accu 0.734375, sec/iter 0.005492
[tst] ep 5, step 2535, accu 0.781250, sec/iter 0.002068
[trn] ep 6, step 2750, loss 0.412271, accu 0.765625, sec/iter 0.005480
[tst] ep 6, step 3042, accu 0.789062, sec/iter 0.002070
[trn] ep 7, step 3250, loss 0.369663, accu 0.781250, sec/iter 0.005501
[tst] ep 7, step 3549, accu 0.812500, sec/iter 0.002609
[trn] ep 8, step 3750, loss 0.340434, accu 0.804688, sec/iter 0.005532
[tst] ep 8, step 4056, accu 0.828125, sec/i

In [10]:
# !tensorboard --ip 0.0.0.0 --logdir logdir

## Test #2 - Basic LSTM

<div  style="border:1px solid black;border-radius:5px">
<code>
    cell = tf.contrib.rnn.BasicLSTMCell(num_hidden_units)

</code>
</div>
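
For reference, one step of a basic LSTM can be sketched for a single hidden unit as below. This follows the standard LSTM formulation with an additive `forget_bias` on the forget gate; the weights are illustrative values, and the helper names are hypothetical rather than anything defined by `BasicLSTMCell`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w, forget_bias=1.0):
    """One step of a single-unit LSTM.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))                   # how much new content to write
    f = sigmoid(pre("forget") + forget_bias)    # how much old cell state to keep
    o = sigmoid(pre("output"))                  # how much of the cell to expose
    g = math.tanh(pre("candidate"))             # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Illustrative weights (not taken from the notebook).
w = {"input": (0.5, 0.1, 0.0), "forget": (0.3, 0.2, 0.0),
     "output": (0.4, -0.1, 0.0), "candidate": (0.9, 0.3, 0.0)}
h, c = 0.0, 0.0
for x in (1.0, 0.5, -0.5):
    h, c = lstm_step(x, h, c, w)
print(h, c)
```

Compared with the vanilla RNN, the cell state `c` is updated additively, which is the usual explanation for why LSTMs keep information over more time steps.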

In [10]:
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32,
                         [BATCH_SIZE, MAX_SEQ_LEN, INPUT_UNITS],
                         name='inputs')
labels_ = tf.placeholder(tf.int64,
                         [BATCH_SIZE],
                         name='labels')

model = MnistRnn(inputs_,
                 labels_,
                 INPUT_UNITS,
                 NUM_HIDDEN_UNITS,
                 BATCH_SIZE,
                 MAX_SEQ_LEN,
                 tf.contrib.rnn.BasicLSTMCell)

In [11]:
config = tf.ConfigProto(gpu_options={'allow_growth':True})
sess = tf.InteractiveSession(config=config)

tf.global_variables_initializer().run()

train_writer = tf.summary.FileWriter('logdir/train_basic_lstm',
                                     graph=tf.get_default_graph())
test_writer  = tf.summary.FileWriter('logdir/test_basic_lstm',
                                     graph=tf.get_default_graph())

train(inputs_, labels_, 10, train_writer, test_writer)

[trn] ep 1, step 250, loss 1.473593, accu 0.085938, sec/iter 0.010070
[tst] ep 1, step 507, accu 0.585938, sec/iter 0.003632
[trn] ep 2, step 750, loss 0.545243, accu 0.609375, sec/iter 0.009967
[tst] ep 2, step 1014, accu 0.734375, sec/iter 0.003484
[trn] ep 3, step 1250, loss 0.347986, accu 0.773438, sec/iter 0.009925
[tst] ep 3, step 1521, accu 0.812500, sec/iter 0.003484
[trn] ep 4, step 1750, loss 0.263314, accu 0.843750, sec/iter 0.009952
[tst] ep 4, step 2028, accu 0.828125, sec/iter 0.003484
[trn] ep 5, step 2250, loss 0.217739, accu 0.859375, sec/iter 0.009974
[tst] ep 5, step 2535, accu 0.867188, sec/iter 0.003489
[trn] ep 6, step 2750, loss 0.188490, accu 0.875000, sec/iter 0.009954
[tst] ep 6, step 3042, accu 0.859375, sec/iter 0.003511
[trn] ep 7, step 3250, loss 0.166515, accu 0.867188, sec/iter 0.010235
[tst] ep 7, step 3549, accu 0.875000, sec/iter 0.003678
[trn] ep 8, step 3750, loss 0.151056, accu 0.875000, sec/iter 0.010457
[tst] ep 8, step 4056, accu 0.875000, sec/i

In [13]:
# !tensorboard --ip 0.0.0.0 --logdir logdir

## Test #3 - GRU

<div  style="border:1px solid black;border-radius:5px">
<code>
    cell = tf.contrib.rnn.GRUCell(num_hidden_units)

</code>
</div>
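
A GRU merges the cell and hidden state and uses two gates instead of three. The sketch below shows one step for a single hidden unit, using the convention in which the update gate `u` keeps the old state (some papers write it the other way around). Weights and helper names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, w):
    """One step of a single-unit GRU; `w` maps names to (wx, wh, bias)."""
    rx, rh, rb = w["reset"]
    ux, uh, ub = w["update"]
    cx, ch, cb = w["candidate"]
    r = sigmoid(rx * x + rh * h_prev + rb)            # reset gate
    u = sigmoid(ux * x + uh * h_prev + ub)            # update gate
    c = math.tanh(cx * x + ch * (r * h_prev) + cb)    # candidate state
    return u * h_prev + (1.0 - u) * c                 # blend old and new

# Illustrative weights (not taken from the notebook).
w = {"reset": (0.4, 0.2, 1.0), "update": (0.3, -0.1, 1.0),
     "candidate": (0.8, 0.5, 0.0)}
h = 0.0
for x in (1.0, 0.5, -0.5):
    h = gru_step(x, h, w)
print(h)
```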

In [12]:
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32,
                         [BATCH_SIZE, MAX_SEQ_LEN, INPUT_UNITS],
                         name='inputs')
labels_ = tf.placeholder(tf.int64,
                         [BATCH_SIZE],
                         name='labels')

model = MnistRnn(inputs_,
                 labels_,
                 INPUT_UNITS,
                 NUM_HIDDEN_UNITS,
                 BATCH_SIZE,
                 MAX_SEQ_LEN,
                 tf.contrib.rnn.GRUCell)

In [13]:
config = tf.ConfigProto(gpu_options={'allow_growth':True})
sess = tf.InteractiveSession(config=config)

tf.global_variables_initializer().run()

train_writer = tf.summary.FileWriter(
                'logdir/train_gru',
                graph=tf.get_default_graph())
test_writer  = tf.summary.FileWriter(
                'logdir/test_gru',
                graph=tf.get_default_graph())

train(inputs_, labels_, 10, train_writer, test_writer)

[trn] ep 1, step 250, loss 1.479578, accu 0.062500, sec/iter 0.010345
[tst] ep 1, step 507, accu 0.742188, sec/iter 0.003404
[trn] ep 2, step 750, loss 0.440528, accu 0.726562, sec/iter 0.009981
[tst] ep 2, step 1014, accu 0.820312, sec/iter 0.003289
[trn] ep 3, step 1250, loss 0.280535, accu 0.796875, sec/iter 0.010241
[tst] ep 3, step 1521, accu 0.835938, sec/iter 0.003347
[trn] ep 4, step 1750, loss 0.218369, accu 0.851562, sec/iter 0.010361
[tst] ep 4, step 2028, accu 0.859375, sec/iter 0.003305
[trn] ep 5, step 2250, loss 0.185422, accu 0.867188, sec/iter 0.010057
[tst] ep 5, step 2535, accu 0.859375, sec/iter 0.003301
[trn] ep 6, step 2750, loss 0.164036, accu 0.882812, sec/iter 0.010039
[tst] ep 6, step 3042, accu 0.875000, sec/iter 0.003315
[trn] ep 7, step 3250, loss 0.146128, accu 0.890625, sec/iter 0.010076
[tst] ep 7, step 3549, accu 0.890625, sec/iter 0.003299
[trn] ep 8, step 3750, loss 0.131588, accu 0.906250, sec/iter 0.010091
[tst] ep 8, step 4056, accu 0.890625, sec/i

In [16]:
# !tensorboard --ip 0.0.0.0 --logdir logdir

## Test #4 - LSTMCell + forget_bias

<div  style="border:1px solid black;border-radius:5px">
<code>
    cell = tf.contrib.rnn.BasicLSTMCell(
            num_hidden_units,
            forget_bias=lstm_forget_bias)
 
</code>
</div>
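
To get a feel for what a larger `forget_bias` does, consider how much of the initial cell state survives 28 time steps if the forget gate stays at a constant value. This is a deliberately simplified model (the gate pre-activation is assumed to be fixed at 0 apart from the bias), and `retained_after` is a hypothetical helper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def retained_after(steps, forget_bias, gate_preactivation=0.0):
    """Fraction of the initial cell state left after `steps` steps,
    assuming the forget-gate pre-activation stays constant."""
    f = sigmoid(gate_preactivation + forget_bias)
    return f ** steps

for bias in (0.0, 1.0, 5.0):
    print(bias, retained_after(28, bias))
```

With a bias of 5.0 the gate starts out almost fully open, so early rows of the image are forgotten much more slowly at the beginning of training than with the default of 1.0.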

In [14]:
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32,
                         [BATCH_SIZE, MAX_SEQ_LEN, INPUT_UNITS],
                         name='inputs')
labels_ = tf.placeholder(tf.int64,
                         [BATCH_SIZE],
                         name='labels')

lstm_with_forget_bias = lambda num_hidden_units: \
    tf.contrib.rnn.BasicLSTMCell(
            num_hidden_units,
            forget_bias=5.0)

model = MnistRnn(inputs_,
                 labels_,
                 INPUT_UNITS,
                 NUM_HIDDEN_UNITS,
                 BATCH_SIZE,
                 MAX_SEQ_LEN,
                 lstm_with_forget_bias)

In [15]:
config = tf.ConfigProto(gpu_options={'allow_growth':True})
sess = tf.InteractiveSession(config=config)

tf.global_variables_initializer().run()

train_writer = tf.summary.FileWriter('logdir/train_lstm_forget_bias',
                                     graph=tf.get_default_graph())
test_writer  = tf.summary.FileWriter('logdir/test_lstm_forget_bias',
                                     graph=tf.get_default_graph())

In [16]:
train(inputs_, labels_, 10, train_writer, test_writer)

[trn] ep 1, step 250, loss 1.542441, accu 0.046875, sec/iter 0.010403
[tst] ep 1, step 507, accu 0.609375, sec/iter 0.003613
[trn] ep 2, step 750, loss 0.606840, accu 0.640625, sec/iter 0.010188
[tst] ep 2, step 1014, accu 0.679688, sec/iter 0.003494
[trn] ep 3, step 1250, loss 0.381709, accu 0.773438, sec/iter 0.010632
[tst] ep 3, step 1521, accu 0.789062, sec/iter 0.003493
[trn] ep 4, step 1750, loss 0.263739, accu 0.812500, sec/iter 0.010303
[tst] ep 4, step 2028, accu 0.828125, sec/iter 0.003933
[trn] ep 5, step 2250, loss 0.207675, accu 0.820312, sec/iter 0.011054
[tst] ep 5, step 2535, accu 0.851562, sec/iter 0.003492
[trn] ep 6, step 2750, loss 0.178904, accu 0.851562, sec/iter 0.010285
[tst] ep 6, step 3042, accu 0.875000, sec/iter 0.003489
[trn] ep 7, step 3250, loss 0.154697, accu 0.898438, sec/iter 0.010122
[tst] ep 7, step 3549, accu 0.882812, sec/iter 0.003511
[trn] ep 8, step 3750, loss 0.138137, accu 0.914062, sec/iter 0.010210
[tst] ep 8, step 4056, accu 0.890625, sec/i

In [20]:
# !tensorboard --ip 0.0.0.0 --logdir logdir

## Test #5 - LayerNormBasicLSTMCell

<div  style="border:1px solid black;border-radius:5px">
<code>
    cell = tf.contrib.rnn.LayerNormBasicLSTMCell(
                        num_hidden_units)
 
</code>
</div>
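
Layer normalization rescales each pre-activation vector to zero mean and unit variance across its units, then applies a gain and a shift. The sketch below shows the operation on one vector; to my understanding `norm_gain` and `norm_shift` in `LayerNormBasicLSTMCell` set the initial values of that gain and shift, and the `eps` value here is an assumption.

```python
import math

def layer_norm(values, gain=1.0, shift=0.0, eps=1e-12):
    """Normalize one vector to zero mean / unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gain * (v - mean) / math.sqrt(var + eps) + shift for v in values]

pre_activations = [2.0, 4.0, 6.0, 8.0]
normed = layer_norm(pre_activations)
print(normed)
print(layer_norm(pre_activations, gain=0.85, shift=0.15))
```

Note the division by the standard deviation: when the variance of a vector is close to zero, this term and its gradient can become very large.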

In [17]:
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32,
                         [BATCH_SIZE, MAX_SEQ_LEN, INPUT_UNITS],
                         name='inputs')
labels_ = tf.placeholder(tf.int64,
                         [BATCH_SIZE],
                         name='labels')

model = MnistRnn(inputs_,
                 labels_,
                 INPUT_UNITS,
                 NUM_HIDDEN_UNITS,
                 BATCH_SIZE,
                 MAX_SEQ_LEN,
                 tf.contrib.rnn.LayerNormBasicLSTMCell)

In [18]:
config = tf.ConfigProto(gpu_options={'allow_growth':True})
sess = tf.InteractiveSession(config=config)

tf.global_variables_initializer().run()

train_writer = tf.summary.FileWriter('logdir/train_ln_basic_lstm',
                                     graph=tf.get_default_graph())
test_writer  = tf.summary.FileWriter('logdir/test_ln_basic_lstm',
                                     graph=tf.get_default_graph())

train(inputs_, labels_, 1, train_writer, test_writer)

[trn] ep 1, step 250, loss nan, accu 0.039062, sec/iter 0.050739
[tst] ep 1, step 507, accu 0.054688, sec/iter 0.008982


In [23]:
# !tensorboard --ip 0.0.0.0 --logdir logdir

## Test #6 - LayerNormBasicLSTMCell - what's wrong?

- [`tf.check_numerics()`](http://devdocs.io/tensorflow~python/tf/check_numerics)
> When run, reports an `InvalidArgument` error if tensor has any values that are not a number (NaN) or infinity (Inf). Otherwise, passes tensor as-is.

```
    check_numerics(
        tensor,
        message,
        name=None
    )
```


- [`tf.gradients()`](http://devdocs.io/tensorflow~python/tf/gradients)
> Constructs symbolic partial derivatives of sum of `ys` w.r.t. x in `xs`

```
    gradients(
        ys,
        xs,
        grad_ys=None,
        name='gradients',
        colocate_gradients_with_ops=False,
        gate_gradients=False,
        aggregation_method=None
    )
```


- Usage example:
```
    # check_numerics
    self.check = [tf.check_numerics(t,
                    'check_numerics: {}'.format(t.name)) \
                  for t in tf.gradients(
                              loss,
                              tf.trainable_variables()) \
                  if t is not None]
    ...
    optimize, loss, accuracy, _ = \
        sess.run([model.optimize,
                  model.loss,
                  model.accuracy,
                  model.check],
                  ...
```
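
The idea behind `tf.check_numerics()` can be illustrated without TensorFlow: pass the values through unchanged, but fail loudly with a message naming the offending tensor as soon as a NaN or Inf shows up. The function below is a plain-Python analogue written for illustration, and the gradient names are made up.

```python
import math

def check_numerics(values, message):
    """Return `values` unchanged, or raise if any entry is NaN or infinite."""
    for v in values:
        if math.isnan(v) or math.isinf(v):
            raise ValueError("{} : Tensor had NaN or Inf values".format(message))
    return values

# Hypothetical gradients keyed by variable name.
grads = {"dense/kernel": [0.1, -0.3], "rnn/bias": [0.2, float("nan")]}
for name, grad in grads.items():
    try:
        check_numerics(grad, "check_numerics: " + name)
    except ValueError as err:
        print(err)
```

Because the message carries the tensor name, the error points at the component whose gradient went bad rather than at the loss, where the NaN is usually first noticed.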


In [19]:
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32,
                         [BATCH_SIZE, MAX_SEQ_LEN, INPUT_UNITS],
                         name='inputs')
labels_ = tf.placeholder(tf.int64,
                         [BATCH_SIZE],
                         name='labels')

model = MnistRnn(inputs_,
                 labels_,
                 INPUT_UNITS,
                 NUM_HIDDEN_UNITS,
                 BATCH_SIZE,
                 MAX_SEQ_LEN,
                 tf.contrib.rnn.LayerNormBasicLSTMCell,
                 add_check = True)

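# Once you have confirmed that the NaN problem occurs here, try the following
#  1. Reduce the learning rate
#       e.g.: lr = 0.0001
#  2. Apply gradient clipping with a suitable threshold
#       e.g.: use_grad_clip = True
#  3. Check whether the component you use has parameters that can smooth it, and apply them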
#     rnn_cell_class = lambda num_hidden_units: \
#        tf.contrib.rnn.LayerNormBasicLSTMCell(
#                             num_hidden_units,
#                             norm_gain=0.85,
#                             norm_shift=0.15)

In [20]:
config = tf.ConfigProto(gpu_options={'allow_growth':True})
sess = tf.InteractiveSession(config=config)

tf.global_variables_initializer().run()

train_writer = tf.summary.FileWriter(
                'logdir/train_ln_basic_lstm_2',
                graph=tf.get_default_graph())
test_writer  = tf.summary.FileWriter(
                'logdir/test_ln_basic_lstm_2',
                graph=tf.get_default_graph())

train(inputs_, labels_, 10, train_writer, test_writer)

InvalidArgumentError: check_numerics: gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/output_1/batchnorm/sub/Enter_grad/b_acc_3:0 : Tensor had NaN values
	 [[Node: CheckNumerics_8 = CheckNumerics[T=DT_FLOAT, message="check_numerics: gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/output_1/batchnorm/sub/Enter_grad/b_acc_3:0", _device="/job:localhost/replica:0/task:0/gpu:0"](gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/output_1/batchnorm/sub/Enter_grad/b_acc_3)]]
	 [[Node: CheckNumerics_6/_259 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_4660_CheckNumerics_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'CheckNumerics_8', defined at:
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-19-176ee0618262>", line 17, in <module>
    add_check = True)
  File "<ipython-input-5-76c82b1f46b3>", line 65, in __init__
    if add_check \
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 413, in check_numerics
    message=message, name=name)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/user01/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): check_numerics: gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/output_1/batchnorm/sub/Enter_grad/b_acc_3:0 : Tensor had NaN values
	 [[Node: CheckNumerics_8 = CheckNumerics[T=DT_FLOAT, message="check_numerics: gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/output_1/batchnorm/sub/Enter_grad/b_acc_3:0", _device="/job:localhost/replica:0/task:0/gpu:0"](gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/output_1/batchnorm/sub/Enter_grad/b_acc_3)]]
	 [[Node: CheckNumerics_6/_259 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_4660_CheckNumerics_6", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]


<div style="padding:5px;border:1px solid black;border-radius:5px">
```
InvalidArgumentError: check_numerics: gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/state_1/batchnorm/sub/Enter_grad/b_acc_3:0 : Tensor had NaN values
	 [[Node: CheckNumerics_10 = CheckNumerics[T=DT_FLOAT, message="check_numerics: gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/state_1/batchnorm/sub/Enter_grad/b_acc_3:0", _device="/job:localhost/replica:0/task:0/cpu:0"](gradients_1/rnn/while/rnn/layer_norm_basic_lstm_cell/state_1/batchnorm/sub/Enter_grad/b_acc_3)]]


```
</div>

In [26]:
# !tensorboard --ip 0.0.0.0 --logdir logdir

In [21]:
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32,
                         [BATCH_SIZE, MAX_SEQ_LEN, INPUT_UNITS],
                         name='inputs')
labels_ = tf.placeholder(tf.int64,
                         [BATCH_SIZE],
                         name='labels')

model = MnistRnn(inputs_,
                 labels_,
                 INPUT_UNITS,
                 NUM_HIDDEN_UNITS,
                 BATCH_SIZE,
                 MAX_SEQ_LEN,
                 lambda num_hidden_units: \
                 tf.contrib.rnn.LayerNormBasicLSTMCell(
                             num_hidden_units,
                             norm_gain=0.85,
                             norm_shift=0.15),
                 add_check = True,
                 lr = 0.0001,
                 use_grad_clip = True)

In [22]:
config = tf.ConfigProto(gpu_options={'allow_growth':True})
sess = tf.InteractiveSession(config=config)

tf.global_variables_initializer().run()

train_writer = tf.summary.FileWriter(
                'logdir/train_ln_basic_lstm_2',
                graph=tf.get_default_graph())
test_writer  = tf.summary.FileWriter(
                'logdir/test_ln_basic_lstm_2',
                graph=tf.get_default_graph())

train(inputs_, labels_, 10, train_writer, test_writer)

[trn] ep 1, step 250, loss 2.072074, accu 0.078125, sec/iter 0.084061


KeyboardInterrupt: 

## Let's Summarize

### RNN cell types supported by TensorFlow

- [`tf.contrib.rnn.BasicRNNCell()`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/basicrnncell)

- [`tf.contrib.rnn.BasicLSTMCell()`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/basiclstmcell)

- [`tf.contrib.rnn.LSTMCell()`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/lstmcell)

- [`tf.contrib.rnn.GRUCell()`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/grucell)

- [`tf.contrib.rnn.LayerNormBasicLSTMCell()`](http://devdocs.io/tensorflow~python/tf/contrib/rnn/layernormbasiclstmcell)


### Building an RNN

- [`tf.nn.dynamic_rnn()`](http://devdocs.io/tensorflow~python/tf/nn/dynamic_rnn)
```
    cell         = tf.contrib.rnn.BasicRNNCell(
                    num_hidden_units)
    last, states = tf.nn.dynamic_rnn(
                        cell, 
                        inputs, 
                        sequence_length=sequence_length,
                        dtype=tf.float32)
```


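### What if NaN or Inf appears during execution?

1. Use [`tf.check_numerics()`](http://devdocs.io/tensorflow~python/tf/check_numerics) to identify the problematic component
1. Try reducing the learning rate
1. Try gradient clipping with a suitable threshold
1. Check whether the component you use has parameters that can smooth it, and try applying them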
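
Why a smaller learning rate can help is easiest to see on a toy problem. The sketch below runs plain gradient descent on f(x) = x², which is of course far simpler than the RNN in this notebook; it only illustrates that too large a step makes the iterates grow instead of shrink, which in floating point eventually ends in Inf or NaN.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent; f'(x) = 2x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * x
    return x

print(abs(gradient_descent(lr=0.1)))   # shrinks toward 0
print(abs(gradient_descent(lr=1.1)))   # grows with every step
```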

### Gradient clipping?

- [`tf.clip_by_global_norm`](http://devdocs.io/tensorflow~python/tf/clip_by_global_norm)
```
    clip_by_global_norm(
        t_list,
        clip_norm,
        use_norm=None,
        name=None
    )
```

- [`tf.gradients`](http://devdocs.io/tensorflow~python/tf/gradients)
```
    gradients(
        ys,
        xs,
        grad_ys=None,
        name='gradients',
        colocate_gradients_with_ops=False,
        gate_gradients=False,
        aggregation_method=None
    )
```

- [`optimizer.apply_gradients`](http://devdocs.io/tensorflow~python/tf/train/optimizer#apply_gradients)
```
    apply_gradients(
        grads_and_vars,
        global_step=None,
        name=None
    )
```

- Usage example:
```
    tvars_     = tf.trainable_variables()
    grads_, _  = tf.clip_by_global_norm(
                    tf.gradients(loss,tvars_),
                    5.0)
    optimize   = tf.train.AdamOptimizer(
                    learning_rate=learning_rate) \
                    .apply_gradients(zip(grads_, tvars_))
```
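
What `tf.clip_by_global_norm` computes can be written out in a few lines: the norm is taken over all gradients together, and every gradient is scaled by `clip_norm / max(global_norm, clip_norm)`. The plain-Python version below is a sketch of that formula on nested lists, not the TensorFlow implementation.

```python
import math

def clip_by_global_norm(t_list, clip_norm):
    """Scale all gradients by clip_norm / max(global_norm, clip_norm)."""
    global_norm = math.sqrt(sum(v * v for t in t_list for v in t))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[v * scale for v in t] for t in t_list], global_norm

grads = [[3.0, 4.0], [12.0]]          # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print(norm)       # 13.0
print(clipped)
```

Since all gradients share one scale factor, their direction is preserved; only the overall step length is capped. Gradients whose global norm is already below `clip_norm` pass through unchanged.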
