### Machine Learning/Deep Learning for Everyone
This is the lab code from Prof. Sung Kim's "Machine Learning/Deep Learning for Everyone" lecture series.
## Lab12_1 RNN basics
This material does not cover training. It covers the input and output shapes of TensorFlow's RNN-family functions, and which functions to call for the forward pass.  

Reference : http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf  
*(For the RNN cases one to one, one to many, many to one, many to many, etc., see the PDF above.)*  


In [1]:
# http://www.wildml.com/2016/08/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/
# http://learningtensorflow.com/index.html
# http://suriyadeepan.github.io/2016-12-31-practical-seq2seq/
import tensorflow as tf
import numpy as np 
import pprint
from tensorflow.contrib import rnn
tf.set_random_seed(777)
sess = tf.InteractiveSession()

### One to One : RNN

The code below covers the case shown in the figure. Inputs and outputs are handled as rank-3 arrays (or lists) of shape [batch_size, sequence_length, dim].
![Alt text](http://i.imgur.com/PiVFGpy.png)

In [2]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [3]:
with tf.variable_scope('basic_rnn_one_cell'):
    # A plain RNN does not usually call its state a "cell state"; the name is used here to pair with the LSTM's cell state / hidden state
    # One-cell RNN: input_dim (4) -> output_dim (2); for an RNN this is also the dimension of the state

    hidden_size = 2
    cell = tf.contrib.rnn.BasicRNNCell(num_units = hidden_size)
    print(cell.state_size, cell.output_size)
    
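    # The cell has both a state and an output: the state is passed on to the next step, and the output is the vector used to compute the actual prediction y
    # For a plain RNN, state and output are identical (both are the hidden node values after one forward step)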
    
    x_data = np.array([[h]], dtype = np.float32) # x_data = [[[1,0,0,0]]]
    print(x_data, x_data.shape)
    
    # forward
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_data, dtype = tf.float32)
    sess.run(tf.global_variables_initializer())
    print('cell states', states.eval())
    print('outputs', outputs.eval())

2 2
[[[ 1.  0.  0.  0.]]] (1, 1, 4)
cell states [[-0.54783535  0.16807944]]
outputs [[[-0.54783535  0.16807944]]]
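To see why `cell states` and `outputs` print the same numbers, here is a minimal NumPy sketch of one BasicRNNCell step. It assumes the cell computes h_t = tanh([x_t, h_{t-1}] W + b) with a tanh activation; `W` and `b` are random stand-ins, so the printed values will not match the TensorFlow run above.

```python
import numpy as np

# Sketch of one plain-RNN step, assuming h_t = tanh([x_t, h_{t-1}] W + b).
# W and b are random stand-ins, not the variables TensorFlow initialised above.
rng = np.random.default_rng(777)
input_dim, hidden_size = 4, 2

W = rng.normal(scale=0.5, size=(input_dim + hidden_size, hidden_size)).astype(np.float32)
b = np.zeros(hidden_size, dtype=np.float32)

def rnn_step(x_t, h_prev):
    """One step: returns (output, new_state); for a plain RNN they are the same array."""
    h_t = np.tanh(np.concatenate([x_t, h_prev], axis=1) @ W + b)
    return h_t, h_t

x_data = np.array([[[1, 0, 0, 0]]], dtype=np.float32)   # (batch=1, seq=1, input_dim=4)
h0 = np.zeros((1, hidden_size), dtype=np.float32)         # zero initial state
output, state = rnn_step(x_data[:, 0, :], h0)
outputs = output[:, np.newaxis, :]                        # (1, 1, 2), the same layout dynamic_rnn returns

print('cell state', state)
print('outputs', outputs)
```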


### One to One : LSTM
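Unlike the plain RNN, an LSTM has a hidden state in addition to the cell state. The cell state, as the name suggests, is the state carried on to the next step. The hidden state is concatenated with the input vector at the start of the cell's computation (so it is also carried forward), and it is the vector used to predict the actual output y. The overall structure is shown in the figure below.  
  
Reference : http://colah.github.io/posts/2015-08-Understanding-LSTMs/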
![Alt text](http://i.imgur.com/ddP1mwL.png)

In [4]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [5]:
with tf.variable_scope('basic_lstm_one_cell'):
    # One-cell LSTM: input_dim (4) -> output_dim (2)

    hidden_size = 2
    cell = tf.contrib.rnn.BasicLSTMCell(num_units = hidden_size, forget_bias = 0.9, state_is_tuple = True)
    print(cell.state_size, cell.output_size, cell.zero_state) 
    # An LSTM cell produces both a cell state and a hidden state (output)
    
    x_data = np.array([[h]], dtype = np.float32) # x_data = [[[1,0,0,0]]]
    print(x_data, x_data.shape)
    
    # forward
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_data, dtype = tf.float32)
    sess.run(tf.global_variables_initializer())
    cell_hidden_states = sess.run(states)
    
    print('states', cell_hidden_states) 
    # c is the cell state, h is the hidden state (output)
    # c (cell state): the value passed on from the last step of the sequence to whatever comes next
    # h (hidden state, output): the value produced at the last step of the sequence
    print('outputs', outputs.eval())  

LSTMStateTuple(c=2, h=2) 2 <bound method _RNNCell.zero_state of <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x0000016933F48A58>>
[[[ 1.  0.  0.  0.]]] (1, 1, 4)
states LSTMStateTuple(c=array([[ 0.11595624,  0.11369593]], dtype=float32), h=array([[ 0.05820915,  0.05193908]], dtype=float32))
outputs [[[ 0.05820915  0.05193908]]]
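The LSTM step can be sketched the same way. This NumPy version assumes the gate layout of TF 1.x's BasicLSTMCell: one matrix produces the input gate `i`, candidate `j`, forget gate `f` and output gate `o`, and `forget_bias` is added to `f` before the sigmoid. Weights are random stand-ins. It shows that the output is exactly `h`, while `c` is a separate value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random stand-in weights; one matrix produces all four gates at once.
rng = np.random.default_rng(777)
input_dim, hidden_size, forget_bias = 4, 2, 0.9
W = rng.normal(scale=0.5, size=(input_dim + hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

def lstm_step(x_t, c_prev, h_prev):
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i, j, f, o = np.split(z, 4, axis=1)      # input gate, candidate, forget gate, output gate (assumed order)
    c_t = c_prev * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    h_t = np.tanh(c_t) * sigmoid(o)
    return h_t, (c_t, h_t)                   # output, (cell state, hidden state)

x_t = np.array([[1.0, 0.0, 0.0, 0.0]])       # the one-hot 'h'
zeros = np.zeros((1, hidden_size))
output, (c, h) = lstm_step(x_t, zeros, zeros)
print('c', c, 'h', h, 'output', output)
```

Because h = tanh(c) * sigmoid(o), every component of `h` is smaller in magnitude than the matching component of `c`, which is also visible in the TensorFlow printout above.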


### Unfolding n sequence : RNN
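The code below covers the case shown in the figure. Given a sequence with sequence_length = 5, `outputs` contains one hidden-state vector per time step (shape (1, 5, 2)), while `states` holds only the state after the last step.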
![Alt text](http://i.imgur.com/YSQFFUo.png)

In [6]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [7]:
with tf.variable_scope('sequence_case_RNN'):
    # One cell RNN input_dim (4) -> output_dim (2). sequence: 5
    
    hidden_size = 2
    cell = tf.contrib.rnn.BasicRNNCell(num_units = hidden_size)
    print(cell.state_size, cell.output_size)
    
    x_sequence = np.array([[h, e, l, l ,o]], dtype = np.float32)
    print(x_sequence, x_sequence.shape)
    
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_sequence, dtype = tf.float32)
    sess.run(tf.global_variables_initializer())
    print('cell states', states.eval())
    # states: the value passed on from the last step of the sequence to whatever comes next
    print('outputs', outputs.eval())

2 2
[[[ 1.  0.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  0.  1.]]] (1, 5, 4)
cell states [[ 0.07925642 -0.03518274]]
outputs [[[-0.6945461  -0.62356317]
  [-0.09338739 -0.13748398]
  [ 0.4579778   0.25762054]
  [ 0.47230816  0.38231286]
  [ 0.07925642 -0.03518274]]]
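Unfolding is just applying the same cell repeatedly. The NumPy sketch below (random stand-in weights, hypothetical helper `unroll_rnn`) collects one hidden state per step into `outputs`, so the last row of `outputs` equals the final state, as in the printout above.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_size = 4, 2
W = rng.normal(scale=0.5, size=(input_dim + hidden_size, hidden_size))
b = np.zeros(hidden_size)

def unroll_rnn(x, h0):
    """x: (batch, seq, input_dim). Returns outputs (batch, seq, hidden) and the final state."""
    h = h0
    outputs = []
    for t in range(x.shape[1]):
        # The same cell (same W, b) is reused at every time step.
        h = np.tanh(np.concatenate([x[:, t, :], h], axis=1) @ W + b)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

h, e, l, o = np.eye(4)                      # one-hot rows for 'h', 'e', 'l', 'o'
x_sequence = np.array([[h, e, l, l, o]])    # (1, 5, 4)
outputs, state = unroll_rnn(x_sequence, np.zeros((1, hidden_size)))
print(outputs.shape, state.shape)           # (1, 5, 2) (1, 2)
```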


### Unfolding n sequence : LSTM

In [8]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [9]:
with tf.variable_scope('sequence_case_LSTM'):

    hidden_size = 2
    cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, forget_bias = 0.9, state_is_tuple = True)
    print(cell.state_size, cell.output_size, cell.zero_state)
    
    x_sequence = np.array([[h, e, l, l ,o]], dtype = np.float32)
    print(x_sequence, x_sequence.shape)
    
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_sequence, dtype = tf.float32)
    sess.run(tf.global_variables_initializer())
    cell_hidden_states = sess.run(states)
    
    print('states', cell_hidden_states)
    # c (cell state): the value passed on from the last step of the sequence to whatever comes next
    # h (hidden state, output): the value produced at the last step of the sequence
    print('outputs', outputs.eval())

LSTMStateTuple(c=2, h=2) 2 <bound method _RNNCell.zero_state of <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x0000016934667AC8>>
[[[ 1.  0.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  0.  1.]]] (1, 5, 4)
states LSTMStateTuple(c=array([[-0.21913829,  0.46065116]], dtype=float32), h=array([[-0.0914715 ,  0.14865206]], dtype=float32))
outputs [[[ 0.00469851 -0.04158599]
  [-0.03217997 -0.1067911 ]
  [-0.04807003  0.05999256]
  [-0.03346811  0.1466434 ]
  [-0.0914715   0.14865206]]]
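For the LSTM, only `h` is collected into `outputs`; `c` stays inside the loop and appears only in the final state. A NumPy sketch, assuming TF 1.x BasicLSTMCell's gate order (i, j, f, o) with `forget_bias` = 0.9, random stand-in weights, and a hypothetical helper `unroll_lstm`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_size, forget_bias = 4, 2, 0.9
W = rng.normal(scale=0.5, size=(input_dim + hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

def unroll_lstm(x):
    batch = x.shape[0]
    c = np.zeros((batch, hidden_size))
    h = np.zeros((batch, hidden_size))
    outputs = []
    for t in range(x.shape[1]):
        z = np.concatenate([x[:, t, :], h], axis=1) @ W + b
        i, j, f, o = np.split(z, 4, axis=1)
        c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
        h = np.tanh(c) * sigmoid(o)
        outputs.append(h)            # only h goes into outputs; c is carried internally
    return np.stack(outputs, axis=1), (c, h)

h_, e_, l_, o_ = np.eye(4)
x_sequence = np.array([[h_, e_, l_, l_, o_]])   # (1, 5, 4)
outputs, (c_final, h_final) = unroll_lstm(x_sequence)
print(outputs.shape, c_final, h_final)
```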


### Unfolding n sequence with batch input  : RNN
![Alt text](http://i.imgur.com/WJX8EAU.png)

In [10]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [11]:
with tf.variable_scope('3_batches_RNN'):
    # One cell RNN input_dim (4) -> output_dim (2). sequence: 5, batch 3
    # 3 batches 'hello', 'eolll', 'lleel'
    
    x_batch = np.array([[h, e, l, l, o], [e, o ,l, l, l],[l, l, e, e, l]], dtype = np.float32)
    print(x_batch, x_batch.shape)
    
    hidden_size = 2
    cell = tf.contrib.rnn.BasicRNNCell(num_units = hidden_size)
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_batch, dtype = tf.float32)
    sess.run(tf.global_variables_initializer())
    
    print('cell states', states.eval())
    # With 3 examples in the batch, states contain 3 vectors (one per example)
    print('outputs', outputs.eval())

[[[ 1.  0.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  0.  1.]]

 [[ 0.  1.  0.  0.]
  [ 0.  0.  0.  1.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]]

 [[ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]]] (3, 5, 4)
cell states [[-0.54370898  0.51788414]
 [ 0.19359617  0.46263412]
 [ 0.10291118  0.60138834]]
outputs [[[ 0.63310111  0.44483227]
  [ 0.34729737  0.81523305]
  [ 0.1166632   0.5816775 ]
  [ 0.18876496  0.47139043]
  [-0.54370898  0.51788414]]

 [[ 0.5116123   0.6163646 ]
  [-0.61230969  0.65871525]
  [ 0.40001139  0.01559334]
  [ 0.10085381  0.61242771]
  [ 0.19359617  0.46263412]]

 [[ 0.22498091  0.41476288]
  [ 0.15538515  0.52840447]
  [ 0.47365046  0.67328405]
  [ 0.39103985  0.77419388]
  [ 0.10291118  0.60138834]]]
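The batch dimension only stacks independent sequences that share one set of weights. The NumPy sketch below (random stand-in weights, bias omitted for brevity, hypothetical helper `unroll_rnn`) checks that unrolling each example on its own gives the same final states as unrolling the whole batch.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_size = 4, 2
W = rng.normal(scale=0.5, size=(input_dim + hidden_size, hidden_size))

def unroll_rnn(x):
    """Plain-RNN unroll from a zero state; x: (batch, seq, input_dim)."""
    h = np.zeros((x.shape[0], hidden_size))
    outputs = []
    for t in range(x.shape[1]):
        h = np.tanh(np.concatenate([x[:, t, :], h], axis=1) @ W)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

h, e, l, o = np.eye(4)
x_batch = np.array([[h, e, l, l, o], [e, o, l, l, l], [l, l, e, e, l]])   # (3, 5, 4)
outputs, states = unroll_rnn(x_batch)

# Unroll each example separately (batch of 1) and compare final states.
per_example = [unroll_rnn(x_batch[k:k + 1])[1] for k in range(3)]
print(states)
print(np.concatenate(per_example, axis=0))
```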


### Unfolding n sequence with batch input  : LSTM

In [12]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [13]:
with tf.variable_scope('3_batches_LSTM'):
    # One cell LSTM input_dim (4) -> output_dim (2). sequence: 5, batch 3
    # 3 batches 'hello', 'eolll', 'lleel'
    x_batch = np.array([[h, e, l, l, o], [e, o ,l, l, l],[l, l, e, e, l]], dtype = np.float32)
    print(x_batch, x_batch.shape)
    
    hidden_size = 2
    cell = tf.contrib.rnn.BasicLSTMCell(num_units = hidden_size, forget_bias = 0.9, state_is_tuple = True)
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_batch, dtype = tf.float32)
 
    sess.run(tf.global_variables_initializer())
    cell_hidden_states = sess.run(states)
    
    print('states', cell_hidden_states)
    print('outputs', outputs.eval())
    # c (cell state): the value passed on from the last step of each sequence to whatever comes next
    # h (hidden state, output): the value produced at the last step of each sequence
    # The batch holds 3 example sequences, so c and h each contain 3 vectors

[[[ 1.  0.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  0.  1.]]

 [[ 0.  1.  0.  0.]
  [ 0.  0.  0.  1.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]]

 [[ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]]] (3, 5, 4)
states LSTMStateTuple(c=array([[-0.15954085,  0.04039727],
       [-0.11730883, -0.14731237],
       [-0.04427465, -0.24958855]], dtype=float32), h=array([[-0.07963151,  0.02576681],
       [-0.04642324, -0.05068645],
       [-0.01768298, -0.08971673]], dtype=float32))
outputs [[[ 0.09004864  0.0528918 ]
  [ 0.13017456 -0.05151128]
  [ 0.04223709 -0.06684009]
  [ 0.01344103 -0.07762972]
  [-0.07963151  0.02576681]]

 [[ 0.03238416 -0.07657412]
  [-0.07404277  0.05995721]
  [-0.04630205 -0.0083233 ]
  [-0.04481458 -0.03480339]
  [-0.04642324 -0.05068645]]

 [[-0.01021079 -0.03488366]
  [-0.02118519 -0.05446798]
  [ 0.00350233 -0.13476616]
  [ 0.02098042 -0.16105732]
  [-0.0176

In [14]:
with tf.variable_scope('3_batches_dynamic_length_LSTM') as scope:
    # One cell LSTM input_dim (4) -> output_dim (2). sequence: 5, batch 3
    # 3 batches 'hello', 'eolll', 'lleel'
    x_data = np.array([[h, e, l, l, o],
                       [e, o, l, l, l],
                       [l, l, e, e, l]], dtype=np.float32)
    print(x_data, x_data.shape)
    
    hidden_size = 2
    cell = rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_data, sequence_length=[5,3,4], dtype=tf.float32)
    # sequence_length gives the valid length of each sequence (counted from the start); outputs past that length are zeros.
    # states then hold the value passed on from the last valid step of each sequence,
    # not from step 5.
    
    sess.run(tf.global_variables_initializer())
    cell_hidden_states = sess.run(states)

    print('states', cell_hidden_states)
    print('outputs', outputs.eval())
    # c (cell state): the value passed on from the last valid step of each sequence to whatever comes next
    # h (hidden state, output): the value produced at the last valid step of each sequence
    # The batch holds 3 example sequences, so c and h each contain 3 vectors

[[[ 1.  0.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  0.  1.]]

 [[ 0.  1.  0.  0.]
  [ 0.  0.  0.  1.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]]

 [[ 0.  0.  1.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]]] (3, 5, 4)
states LSTMStateTuple(c=array([[-0.11567314,  0.15889417],
       [ 0.02387191,  0.1483482 ],
       [ 0.28869146,  0.42212284]], dtype=float32), h=array([[-0.06705084,  0.05426852],
       [ 0.00918842,  0.06765675],
       [ 0.10600475,  0.21914782]], dtype=float32))
outputs [[[-0.00191582 -0.08889396]
  [ 0.0662165  -0.02565779]
  [ 0.07728618  0.08835547]
  [ 0.07620183  0.15945849]
  [-0.06705084  0.05426852]]

 [[ 0.06170494  0.04657923]
  [-0.07225833 -0.01762054]
  [ 0.00918842  0.06765675]
  [ 0.          0.        ]
  [ 0.          0.        ]]

 [[ 0.03495911  0.09325722]
  [ 0.05101828  0.15713318]
  [ 0.08525225  0.19951533]
  [ 0.10600475  0.21914782]
  [ 0.    
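The zero rows in `outputs` above come from how `dynamic_rnn` treats `sequence_length`: past a sequence's end the output is zeroed and the state is copied through unchanged. Below is a NumPy sketch of that rule using a plain RNN for brevity (the cell above is an LSTM, but the masking logic is the same); weights are random and `unroll_rnn_masked` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_size = 4, 2
W = rng.normal(scale=0.5, size=(input_dim + hidden_size, hidden_size))

def unroll_rnn_masked(x, lengths):
    """Plain-RNN unroll that mimics how dynamic_rnn treats sequence_length."""
    batch, seq_len, _ = x.shape
    lengths = np.asarray(lengths)
    h = np.zeros((batch, hidden_size))
    outputs = np.zeros((batch, seq_len, hidden_size))
    for t in range(seq_len):
        new_h = np.tanh(np.concatenate([x[:, t, :], h], axis=1) @ W)
        valid = (t < lengths)[:, np.newaxis]             # (batch, 1): is step t inside each sequence?
        outputs[:, t, :] = np.where(valid, new_h, 0.0)   # zero output past the end
        h = np.where(valid, new_h, h)                    # state frozen at the last valid step
    return outputs, h

h_, e_, l_, o_ = np.eye(4)
x_data = np.array([[h_, e_, l_, l_, o_],
                   [e_, o_, l_, l_, l_],
                   [l_, l_, e_, e_, l_]])
outputs, states = unroll_rnn_masked(x_data, lengths=[5, 3, 4])
print(outputs[1])   # rows 3 and 4 are zero
```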

### Declare the initial state (RNN, LSTM)
#### RNN

In [15]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [16]:
with tf.variable_scope('initial_state_RNN'):
    batch_size = 3
    hidden_size = 2
    x_batch = np.array([[h, e, l, l, o],
                      [e, o, l, l, l],
                      [l, l, e, e, l]], dtype=np.float32)
    
    cell = tf.contrib.rnn.BasicRNNCell(num_units = hidden_size)
    print(cell.state_size, cell.output_size)
    initial_state = cell.zero_state(batch_size = batch_size, dtype = tf.float32)
    print('initial_state\n', initial_state.eval())
    
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_batch, initial_state = initial_state, dtype = tf.float32)
    sess.run(tf.global_variables_initializer())

    print('cell states', states.eval())
    print('outputs', outputs.eval())

2 2
initial_state
 [[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
cell states [[ 0.59695184 -0.79401386]
 [-0.40585628  0.20832153]
 [-0.4432646   0.24303824]]
outputs [[[ 0.11470354 -0.511262  ]
  [-0.56179035  0.36061054]
  [-0.57698447  0.04971569]
  [-0.47768033  0.18816011]
  [ 0.59695184 -0.79401386]]

 [[-0.68042958  0.09627182]
  [ 0.64991659 -0.80446416]
  [-0.40964082  0.77316105]
  [-0.70218945 -0.08972695]
  [-0.40585628  0.20832153]]

 [[-0.55963206  0.39608955]
  [-0.58745575  0.03384132]
  [-0.60960853 -0.12756902]
  [-0.56036544 -0.06023395]
  [-0.4432646   0.24303824]]]
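`initial_state` matters when a sequence is processed in pieces: feeding the final state of one chunk as the initial state of the next reproduces the run over the full sequence. NumPy sketch with random weights and a hypothetical helper `unroll_rnn`; `zero_state` here is simply a zero array of the shape `cell.zero_state` returns for a BasicRNNCell, (batch_size, hidden_size).

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_size = 4, 2
W = rng.normal(scale=0.5, size=(input_dim + hidden_size, hidden_size))

def unroll_rnn(x, initial_state):
    h = initial_state
    outputs = []
    for t in range(x.shape[1]):
        h = np.tanh(np.concatenate([x[:, t, :], h], axis=1) @ W)
        outputs.append(h)
    return np.stack(outputs, axis=1), h

h_, e_, l_, o_ = np.eye(4)
x_batch = np.array([[h_, e_, l_, l_, o_], [e_, o_, l_, l_, l_], [l_, l_, e_, e_, l_]])
batch_size = x_batch.shape[0]
zero_state = np.zeros((batch_size, hidden_size))

full_out, full_state = unroll_rnn(x_batch, zero_state)

# Feed the first 3 steps, then pass the resulting state as the initial state for the last 2.
out_a, state_a = unroll_rnn(x_batch[:, :3], zero_state)
out_b, state_b = unroll_rnn(x_batch[:, 3:], state_a)
print(np.allclose(state_b, full_state))
```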


#### LSTM 

In [17]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [18]:
with tf.variable_scope('initial_state_LSTM'):
    batch_size = 3
    hidden_size = 2
    x_batch = np.array([[h, e, l, l, o],
                      [e, o, l, l, l],
                      [l, l, e, e, l]], dtype=np.float32)
    
    cell = tf.contrib.rnn.BasicLSTMCell(num_units = hidden_size, forget_bias = 0.9, state_is_tuple = True)
    print(cell.state_size, cell.output_size)
    initial_state = cell.zero_state(batch_size = batch_size, dtype = tf.float32)
    print('initial_state\n', sess.run(initial_state))
    
    outputs, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_batch, initial_state = initial_state, dtype = tf.float32)
    sess.run(tf.global_variables_initializer())
    cell_hidden_states = sess.run(states)
    
    print('states', cell_hidden_states)
    print('outputs', outputs.eval())

LSTMStateTuple(c=2, h=2) 2
initial_state
 LSTMStateTuple(c=array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]], dtype=float32), h=array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]], dtype=float32))
states LSTMStateTuple(c=array([[ 0.08994563,  0.4443832 ],
       [ 0.38668162,  0.60645157],
       [ 0.32665431,  0.26637641]], dtype=float32), h=array([[ 0.06036732,  0.15712595],
       [ 0.20308964,  0.31945127],
       [ 0.16781646,  0.16902837]], dtype=float32))
outputs [[[ 0.08085226  0.00354403]
  [ 0.02588757 -0.11362526]
  [ 0.13710369  0.16714595]
  [ 0.21389508  0.25678951]
  [ 0.06036732  0.15712595]]

 [[-0.05625843 -0.10844176]
  [-0.1963553   0.03598393]
  [ 0.00137143  0.25870115]
  [ 0.12408894  0.30490223]
  [ 0.20308964  0.31945127]]

 [[ 0.12128529  0.21983363]
  [ 0.20290466  0.2785309 ]
  [ 0.14494732  0.03862848]
  [ 0.0746925  -0.09567902]
  [ 0.16781646  0.16902837]]]


### Deep RNN & LSTM
We examine the structure with the one-to-one case. To stack cells in an RNN-family model, use tf.contrib.rnn.MultiRNNCell, passing it a list of cells created beforehand.
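Stacking means the output of one layer becomes the input of the next, while each layer keeps its own state and its own weights. A NumPy sketch with two plain-RNN layers (random weights; `multi_rnn_step` is a hypothetical helper). The two weight matrices even differ in shape, because layer 0 reads the 4-dim input and layer 1 reads the 2-dim output of layer 0.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_size, num_layers = 4, 2, 2

# One weight matrix per layer: (layer_input_dim + hidden_size, hidden_size).
weights = [rng.normal(scale=0.5, size=(in_dim + hidden_size, hidden_size))
           for in_dim in [input_dim] + [hidden_size] * (num_layers - 1)]

def multi_rnn_step(x_t, states):
    new_states = []
    layer_input = x_t
    for W, h_prev in zip(weights, states):
        h = np.tanh(np.concatenate([layer_input, h_prev], axis=1) @ W)
        new_states.append(h)
        layer_input = h            # the next layer consumes this layer's output
    return layer_input, tuple(new_states)   # top-layer output, one state per layer

x_t = np.array([[1.0, 0.0, 0.0, 0.0]])
zero_states = tuple(np.zeros((1, hidden_size)) for _ in range(num_layers))
output, states = multi_rnn_step(x_t, zero_states)
print('state', states)
print('output', output)
```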

#### RNN 

In [19]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [20]:
with tf.variable_scope('2_layer_RNN_MultiRNNCell'):
    
    x_data = np.array([[h]], dtype = np.float32)
    print(x_data, x_data.shape)

    # Make rnn
    hidden_size = 2
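    # Create a separate cell object per layer; reusing a single instance ([cell] * 2) can raise an error in TF 1.1 and later
    cells = [tf.contrib.rnn.BasicRNNCell(num_units = hidden_size) for _ in range(2)]
    cell = tf.contrib.rnn.MultiRNNCell(cells = cells, state_is_tuple = True) # 2 layers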
    
    # RNN in/out
    output, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_data, dtype = tf.float32)
    print(output)

    sess.run(tf.global_variables_initializer())
    cell_states = sess.run(states)
    
    print('state', cell_states) # two layers are stacked, so there are two state vectors (one per layer)
    print('output', output.eval()) 
    # The input is a single token rather than a sequence, so there is a single output; for an RNN it equals the last layer's state.

[[[ 1.  0.  0.  0.]]] (1, 1, 4)
Tensor("2_layer_RNN_MultiRNNCell/rnn/transpose:0", shape=(1, 1, 2), dtype=float32)
state (array([[-0.62157214,  0.43190563]], dtype=float32), array([[-0.05544733, -0.17107217]], dtype=float32))
output [[[-0.05544733 -0.17107217]]]


#### LSTM

In [21]:
# One hot encoding for each character in 'hello'
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]

In [22]:
with tf.variable_scope('2_layer_LSTM_MultiRNNCell'):
    
    x_data = np.array([[h]], dtype = np.float32)
    print(x_data, x_data.shape)

    # Make rnn
    hidden_size = 2
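    # Create a separate cell object per layer; reusing a single instance ([cell] * 2) can raise an error in TF 1.1 and later
    cells = [tf.contrib.rnn.BasicLSTMCell(num_units = hidden_size, forget_bias = 0.9, state_is_tuple = True) for _ in range(2)]
    cell = tf.contrib.rnn.MultiRNNCell(cells = cells, state_is_tuple = True) # 2 layers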
    
    # RNN in/out
    output, states = tf.nn.dynamic_rnn(cell = cell, inputs = x_data, dtype = tf.float32)
    print(output)
    
    sess.run(tf.global_variables_initializer())
    cell_states = sess.run(states)
    
    print('state', cell_states) # two layers are stacked, so there are two LSTMStateTuples (one per layer)
    print('output', output.eval()) 
    # The input is a single token rather than a sequence, so there is a single output; for an LSTM it equals the last layer's hidden state h.

[[[ 1.  0.  0.  0.]]] (1, 1, 4)
Tensor("2_layer_LSTM_MultiRNNCell/rnn/transpose:0", shape=(1, 1, 2), dtype=float32)
state (LSTMStateTuple(c=array([[-0.25200501, -0.02415623]], dtype=float32), h=array([[-0.14680254, -0.01400189]], dtype=float32)), LSTMStateTuple(c=array([[ 0.0541461 ,  0.00878605]], dtype=float32), h=array([[ 0.02738931,  0.00416418]], dtype=float32)))
output [[[ 0.02738931  0.00416418]]]


### Simple bi-directional LSTM

#### Generate bi-directional LSTM

In [23]:
# Create input data
batch_size=3
sequence_length=5
input_dim=3

x_data = np.arange(45, dtype=np.float32).reshape(batch_size, sequence_length, input_dim)
print(x_data, x_data.shape)  # batch, sequence_length, input_dim

[[[  0.   1.   2.]
  [  3.   4.   5.]
  [  6.   7.   8.]
  [  9.  10.  11.]
  [ 12.  13.  14.]]

 [[ 15.  16.  17.]
  [ 18.  19.  20.]
  [ 21.  22.  23.]
  [ 24.  25.  26.]
  [ 27.  28.  29.]]

 [[ 30.  31.  32.]
  [ 33.  34.  35.]
  [ 36.  37.  38.]
  [ 39.  40.  41.]
  [ 42.  43.  44.]]] (3, 5, 3)


In [24]:
with tf.variable_scope('simple_bi_directional_LSTM'):
    
    # bi-directional LSTM
    hidden_size = 5
    cell_fw = tf.contrib.rnn.BasicLSTMCell(num_units = hidden_size, forget_bias = 0.9, state_is_tuple = True)
    cell_bw = tf.contrib.rnn.BasicLSTMCell(num_units = hidden_size, forget_bias = 0.9, state_is_tuple = True)

    # sequence_length gives the valid length of each sequence; it defaults to None (all sequences full length), and [5,5,5] is passed explicitly here
    outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw = cell_fw, cell_bw = cell_bw, inputs = x_data, dtype = tf.float32,
                                                       sequence_length = [5,5,5])

    sess.run(tf.global_variables_initializer())

    # outputs and states are pairs (forward, backward): outputs = (output_fw, output_bw), each [batch, seq, hidden]; states = (state_fw, state_bw), each an LSTMStateTuple
    print('states', sess.run(states))
    print('outputs', sess.run(outputs)) 
    

states (LSTMStateTuple(c=array([[ -2.00803924e+00,   2.85866690e+00,  -8.47373784e-01,
          6.91460371e-01,  -2.66604352e+00],
       [ -1.50609565e+00,   4.98581409e+00,  -3.13502932e+00,
          5.04362509e-02,  -4.71399879e+00],
       [ -6.31192327e-01,   4.99980068e+00,  -4.64451981e+00,
          1.29466981e-03,  -4.95562410e+00]], dtype=float32), h=array([[ -7.07124591e-01,   3.51942599e-01,  -6.85873032e-01,
          2.02799379e-03,  -5.56596089e-03],
       [ -8.53566408e-01,   2.84076363e-01,  -9.96212840e-01,
          3.62268167e-07,  -2.68183230e-05],
       [ -5.52456558e-01,   2.61948526e-01,  -9.99815166e-01,
          1.73134840e-11,  -1.67255720e-07]], dtype=float32)), LSTMStateTuple(c=array([[  5.55543676e-02,   4.79968309e-01,   3.62226814e-01,
         -2.68499196e-01,   1.58703709e+00],
       [  1.20917186e-01,   5.06061554e-01,   2.34250474e-04,
         -4.67401318e-04,   2.35634232e+00],
       [  2.71322392e-02,   5.51785767e-01,   7.30238057e-08,
   
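A bidirectional RNN runs one cell forward in time and another over the time-reversed input, then flips the backward outputs back into the original order. NumPy sketch using plain-RNN cells instead of LSTMs for brevity, random weights, and hypothetical helpers `unroll` and `bidirectional`. Since all three sequences here have full length 5, a plain reversal is equivalent to the length-aware reversal TensorFlow applies when `sequence_length` is given.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_size = 3, 5
W_fw = rng.normal(scale=0.1, size=(input_dim + hidden_size, hidden_size))
W_bw = rng.normal(scale=0.1, size=(input_dim + hidden_size, hidden_size))

def unroll(x, W):
    h = np.zeros((x.shape[0], hidden_size))
    outs = []
    for t in range(x.shape[1]):
        h = np.tanh(np.concatenate([x[:, t, :], h], axis=1) @ W)
        outs.append(h)
    return np.stack(outs, axis=1), h

def bidirectional(x):
    out_fw, state_fw = unroll(x, W_fw)
    # Backward direction: read the sequence reversed in time, then flip the outputs
    # back so that out_bw[:, t] lines up with input step t.
    out_bw_rev, state_bw = unroll(x[:, ::-1, :], W_bw)
    out_bw = out_bw_rev[:, ::-1, :]
    return (out_fw, out_bw), (state_fw, state_bw)

x_data = np.arange(45, dtype=np.float64).reshape(3, 5, 3)
(out_fw, out_bw), (state_fw, state_bw) = bidirectional(x_data)
combined = np.concatenate([out_fw, out_bw], axis=-1)   # a common way to merge: (3, 5, 10)
print(combined.shape)
```

Note that the backward final state corresponds to the first input step, so `out_bw[:, 0]` equals `state_bw`.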

In [25]:
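# Flatten-based softmax layer (logits only): [batch, seq, hidden] -> [batch*seq, hidden] @ W -> [batch, seq, num_classes]; x_data stands in for RNN outputs with hidden_size = 3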
hidden_size=3
sequence_length=5
batch_size=3
num_classes=5

x_data = x_data.reshape(-1, hidden_size)
print(x_data)

softmax_w = np.arange(15, dtype=np.float32).reshape(hidden_size, num_classes)
outputs = np.matmul(x_data, softmax_w)
outputs = outputs.reshape(-1, sequence_length, num_classes) # batch, seq, class
print(outputs)

[[  0.   1.   2.]
 [  3.   4.   5.]
 [  6.   7.   8.]
 [  9.  10.  11.]
 [ 12.  13.  14.]
 [ 15.  16.  17.]
 [ 18.  19.  20.]
 [ 21.  22.  23.]
 [ 24.  25.  26.]
 [ 27.  28.  29.]
 [ 30.  31.  32.]
 [ 33.  34.  35.]
 [ 36.  37.  38.]
 [ 39.  40.  41.]
 [ 42.  43.  44.]]
[[[   25.    28.    31.    34.    37.]
  [   70.    82.    94.   106.   118.]
  [  115.   136.   157.   178.   199.]
  [  160.   190.   220.   250.   280.]
  [  205.   244.   283.   322.   361.]]

 [[  250.   298.   346.   394.   442.]
  [  295.   352.   409.   466.   523.]
  [  340.   406.   472.   538.   604.]
  [  385.   460.   535.   610.   685.]
  [  430.   514.   598.   682.   766.]]

 [[  475.   568.   661.   754.   847.]
  [  520.   622.   724.   826.   928.]
  [  565.   676.   787.   898.  1009.]
  [  610.   730.   850.   970.  1090.]
  [  655.   784.   913.  1042.  1171.]]]
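The flatten-then-reshape trick is equivalent to contracting the hidden axis of the 3-D tensor directly. A NumPy check using the same `np.arange` values as above, plus a numerically stable softmax that turns the logits into per-step class probabilities (the cell above stops at the logits).

```python
import numpy as np

batch_size, sequence_length, hidden_size, num_classes = 3, 5, 3, 5
rnn_outputs = np.arange(45, dtype=np.float64).reshape(batch_size, sequence_length, hidden_size)
softmax_w = np.arange(15, dtype=np.float64).reshape(hidden_size, num_classes)

# Flatten-based: (batch*seq, hidden) @ (hidden, classes), then restore (batch, seq, classes).
flat_logits = rnn_outputs.reshape(-1, hidden_size) @ softmax_w
logits = flat_logits.reshape(batch_size, sequence_length, num_classes)

# Same result without reshaping: contract the hidden axis directly.
logits_einsum = np.einsum('bsh,hc->bsc', rnn_outputs, softmax_w)

# Numerically stable softmax over the class axis.
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
print(logits[0, 0])   # [25. 28. 31. 34. 37.]
```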


### Calculate seq2seq loss: simple example

In [26]:
# [batch_size, sequence_length]
y_data = tf.constant([[1, 1, 1]])

# logits: [batch_size, sequence_length, num_classes]
prediction = tf.constant([[[0.2, 0.7], [0.6, 0.2], [0.2, 0.9]]], dtype=tf.float32)

# weights: [batch_size, sequence_length]
weights = tf.constant([[1, 1, 1]], dtype=tf.float32)

sequence_loss = tf.contrib.seq2seq.sequence_loss(logits=prediction, targets=y_data, weights=weights)
sess.run(tf.global_variables_initializer())
print("Loss: ", sequence_loss.eval())

Loss:  0.596759
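The printed loss can be reproduced by hand: `sequence_loss` averages the per-step softmax cross-entropy, weighted by `weights`. NumPy sketch, assuming the default averaging (across time steps and across the batch), which in TF 1.x computes sum(w * ce) / sum(w) up to a tiny epsilon; `sequence_loss_np` is a hypothetical helper.

```python
import numpy as np

def sequence_loss_np(logits, targets, weights):
    """Weighted mean of per-step softmax cross-entropy: sum(w * ce) / sum(w).

    logits: (batch, seq, num_classes); targets: (batch, seq) ints; weights: (batch, seq).
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the target class at every (batch, step).
    ce = -np.take_along_axis(log_probs, targets[..., np.newaxis], axis=-1)[..., 0]
    return (weights * ce).sum() / weights.sum()

y_data = np.array([[1, 1, 1]])
prediction = np.array([[[0.2, 0.7], [0.6, 0.2], [0.2, 0.9]]])
weights = np.array([[1.0, 1.0, 1.0]])
loss = sequence_loss_np(prediction, y_data, weights)
print("Loss: ", loss)
```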


In [27]:
# [batch_size, sequence_length]
y_data = tf.constant([[1, 1, 1]])

# logits: [batch_size, sequence_length, num_classes]
prediction1 = tf.constant([[[0.3, 0.7], [0.3, 0.7], [0.3, 0.7]]], dtype=tf.float32)
print(prediction1.shape)
prediction2 = tf.constant([[[0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]], dtype=tf.float32)

prediction3 = tf.constant([[[0, 1], [0, 1], [0, 1]]], dtype=tf.float32)
prediction4 = tf.constant([[[0, 1], [1, 0], [0, 1]]], dtype=tf.float32)

# weights: [batch_size, sequence_length]
weights = tf.constant([[1, 1, 1]], dtype=tf.float32)  # weights: per-step multipliers applied to each step's loss

sequence_loss1 = tf.contrib.seq2seq.sequence_loss(logits = prediction1, targets = y_data, weights = weights)
sequence_loss2 = tf.contrib.seq2seq.sequence_loss(logits = prediction2, targets = y_data, weights = weights)
sequence_loss3 = tf.contrib.seq2seq.sequence_loss(logits = prediction3, targets = y_data, weights = weights)
sequence_loss4 = tf.contrib.seq2seq.sequence_loss(logits = prediction4, targets = y_data, weights = weights)

sess.run(tf.global_variables_initializer())
print("Loss1: ", sequence_loss1.eval(),
      "Loss2: ", sequence_loss2.eval(),
      "Loss3: ", sequence_loss3.eval(),
      "Loss4: ", sequence_loss4.eval())

(1, 3, 2)
Loss1:  0.513015 Loss2:  0.371101 Loss3:  0.313262 Loss4:  0.646595
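Loss4 is higher than Loss3 only because of its second step, where the logits favour class 0 over the target class 1. Giving that step weight 0, as one would for a padding position, removes it from the average entirely, and the loss drops back to Loss3. NumPy sketch with a hypothetical `sequence_loss_np` helper computing sum(w * ce) / sum(w):

```python
import numpy as np

def sequence_loss_np(logits, targets, weights):
    """Weighted mean of per-step softmax cross-entropy: sum(w * ce) / sum(w)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    ce = -np.take_along_axis(log_probs, targets[..., np.newaxis], axis=-1)[..., 0]
    return (weights * ce).sum() / weights.sum()

y_data = np.array([[1, 1, 1]])
prediction3 = np.array([[[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]])
prediction4 = np.array([[[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]])

loss3 = sequence_loss_np(prediction3, y_data, np.array([[1.0, 1.0, 1.0]]))
loss4 = sequence_loss_np(prediction4, y_data, np.array([[1.0, 1.0, 1.0]]))
# Weight 0 on the second step excludes it from both the numerator and the denominator.
loss4_masked = sequence_loss_np(prediction4, y_data, np.array([[1.0, 0.0, 1.0]]))
print(loss3, loss4, loss4_masked)
```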
