We will take a look at how TensorFlow's basic RNN works!

In [1]:
import numpy as np
from IPython.display import Image

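As the example below shows, the POS (part-of-speech) tag of a word can change depending on the order of the words in a sentence: "work" is a verb in the first sentence and a noun in the second, while "google" is the reverse.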

I work at google => (pronoun) (verb) (preposition) (noun)  
I google at work => (pronoun) (verb) (preposition) (noun)

An RNN is a neural network that takes the previous state and the current input and produces the current state. That makes an RNN a great candidate for our example.

The diagram below shows how an RNN outputs POS tags for "I work at google".

In [2]:
Image(url= "https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/iworkatgoogle.png", width=500, height=250)

The diagram below shows how an RNN outputs POS tags for "I google at work".

In [3]:
Image(url= "https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/igoogleatwork.png", width=500, height=250)

The vanilla RNN architecture looks like the diagram below. We will look at TensorFlow's BasicRNNCell to understand this diagram clearly.

In [4]:
Image(url= "https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/rnn_simple_diagram.png", width=500, height=250)

## What is the difference between output and state?

In our diagram, y is the output, and the arrow going to the next RNN cell is the state (a.k.a. the hidden state). If you print the state, you get the last hidden state value, which is the rightmost arrow in the diagram.  

If you have just one cell (one time step) in your RNN, the output and the state have the same value. Why? Because two lines leave the tanh in each cell, and these two lines carry the same value.  

In [5]:
Image(url= "https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/rnn_single.png", width=500, height=250)

The diagram above also shows how many weights and biases exist when the input shape is [1,2] and the RNN cell's output shape is [1,3].  
To get [1,3] from multiplying the input by **W**xh, since the input is [1,2], **W**xh must be [2,3].  
To get [1,3] from multiplying the previous state by **W**hh, since the previous state is [1,3], **W**hh must be [3,3].
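
The shape reasoning above can be checked with a minimal NumPy sketch. This is not TensorFlow's implementation; the names `W_xh`, `W_hh`, `b`, and `rnn_step` are illustrative, and the weights are random placeholders chosen only to have the right shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([[1.0, 2.0]])          # input, shape [1, 2]
h_prev = np.zeros((1, 3))           # previous state, shape [1, 3] (assumed zeros at the first step)

# Illustrative random weights; only the shapes matter here.
W_xh = rng.standard_normal((2, 3))  # [2, 3] so that x @ W_xh is [1, 3]
W_hh = rng.standard_normal((3, 3))  # [3, 3] so that h_prev @ W_hh is [1, 3]
b = np.zeros(3)                     # one bias value per unit


def rnn_step(x, h_prev):
    """One vanilla RNN step: the tanh result is both the output and the new state."""
    h = np.tanh(x @ W_xh + h_prev @ W_hh + b)
    return h, h  # (output, state) carry the same values


output, state = rnn_step(x, h_prev)
print(output.shape, state.shape)
```

With a single step, `output` and `state` are the same array, which is the "two lines leaving tanh" in the diagram.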

In [6]:
import tensorflow as tf


In [7]:
inputs = np.array([
    [ [1,2] ]
])

In [8]:
tf.reset_default_graph()
tf.set_random_seed(777)  # fix the seed so the initial weights are reproducible
tf_inputs = tf.constant(inputs, dtype=tf.float32)
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=3)
# outputs: hidden state at every time step; state: hidden state after the last step
outputs, state = tf.nn.dynamic_rnn(cell=rnn_cell, dtype=tf.float32, inputs=tf_inputs)
variables_names = [v.name for v in tf.trainable_variables()]

# Inspect the graph tensors and trainable variables (shapes only, no values yet)
print(outputs)
print(state)
print("weights")
for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES):
    print(v)

# Run the graph to get the actual values
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output_run, state_run = sess.run([outputs, state])
    print("output values")
    print(output_run)
    print("\nstate value")
    print(state_run)
    print("weights")
    values = sess.run(variables_names)
    for k, v in zip(variables_names, values):
        print(k, v)

Tensor("rnn/transpose_1:0", shape=(1, 1, 3), dtype=float32)
Tensor("rnn/while/Exit_3:0", shape=(1, 3), dtype=float32)
weights
<tf.Variable 'rnn/basic_rnn_cell/kernel:0' shape=(5, 3) dtype=float32_ref>
<tf.Variable 'rnn/basic_rnn_cell/bias:0' shape=(3,) dtype=float32_ref>
output values
[[[-0.9314169   0.75578666 -0.6819246 ]]]

state value
[[-0.9314169   0.75578666 -0.6819246 ]]
weights
rnn/basic_rnn_cell/kernel:0 [[-0.62831575  0.38538355  0.79733914]
 [-0.5203329   0.30046564 -0.8150209 ]
 [ 0.39399797  0.16670114  0.4062907 ]
 [-0.6391754   0.8460203   0.5266966 ]
 [ 0.41124135  0.66347724 -0.0210759 ]]
rnn/basic_rnn_cell/bias:0 [0. 0. 0.]
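
The printed kernel has shape (5, 3) because BasicRNNCell keeps **W**xh and **W**hh stacked in one matrix: the first 2 rows multiply the input and the remaining 3 rows multiply the previous state. As a sanity check, the NumPy sketch below recomputes the output from the weights printed above. It assumes the initial state is all zeros, which is what `dynamic_rnn` uses when no `initial_state` is passed. The variable names are illustrative.

```python
import numpy as np

# Kernel and bias copied from the printed output above.
kernel = np.array([
    [-0.62831575,  0.38538355,  0.79733914],
    [-0.5203329,   0.30046564, -0.8150209 ],
    [ 0.39399797,  0.16670114,  0.4062907 ],
    [-0.6391754,   0.8460203,   0.5266966 ],
    [ 0.41124135,  0.66347724, -0.0210759 ],
])
bias = np.zeros(3)

# Split the stacked kernel: rows for the input, then rows for the previous state.
W_xh, W_hh = kernel[:2], kernel[2:]   # shapes [2, 3] and [3, 3]

x = np.array([[1.0, 2.0]])
h_prev = np.zeros((1, 3))             # assumed zero initial state

h = np.tanh(x @ W_xh + h_prev @ W_hh + bias)
print(h)  # should agree with the output values printed above, up to rounding
```

Because the previous state is zero here, only the **W**xh rows contribute to this particular result.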


# Practice RNN cell with sentences
Here we practice with our examples "I work at google" and "I google at work". Each word is represented with a one-hot encoding.

In [9]:
# I      [1,0,0,0]
# work   [0,1,0,0]
# at     [0,0,1,0]
# google [0,0,0,1]
#
# I work at google =  [ [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1] ]
# I google at work =  [ [1,0,0,0], [0,0,0,1], [0,0,1,0], [0,1,0,0] ]

inputs = np.array([
    [ [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1] ],
    [ [1,0,0,0], [0,0,0,1], [0,0,1,0], [0,1,0,0] ]
])
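
The one-hot arrays above were typed by hand. As a sketch, the same array can be built from a word-to-index mapping; `vocab`, `one_hot_sentence`, and `built_inputs` are hypothetical names, not part of the original notebook.

```python
import numpy as np

# Word-to-index mapping matching the comments above.
vocab = {"I": 0, "work": 1, "at": 2, "google": 3}


def one_hot_sentence(sentence):
    """Return a [num_words, vocab_size] one-hot matrix for a space-separated sentence."""
    indices = [vocab[word] for word in sentence.split()]
    # Row i of the identity matrix is the one-hot vector for index i.
    return np.eye(len(vocab), dtype=int)[indices]


built_inputs = np.array([
    one_hot_sentence("I work at google"),
    one_hot_sentence("I google at work"),
])
print(built_inputs.shape)  # 2 sentences, 4 words each, 4-dimensional one-hot vectors
```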

# RNN
By running the code block below, you can see that every word's output differs between the two sentences except for the first word. This is because the current output is generated not only from the current input but also from the previous state. This is why an RNN can assign different POS tags to the same word depending on the word order in a sentence.

In [10]:
tf.reset_default_graph()
tf.set_random_seed(777)
tf_inputs = tf.constant(inputs, dtype=tf.float32)
rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=3)
outputs, state = tf.nn.dynamic_rnn(cell=rnn_cell, dtype=tf.float32, inputs=tf_inputs)
variables_names = [v.name for v in tf.trainable_variables()]

print(outputs)
print(state)
print("weights")
for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES):
    print(v)
        
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output_run, state_run = sess.run([outputs, state])
    print("output values")
    print(output_run)
    print("\nstate value")
    print(state_run)
    print("weights")
    values = sess.run(variables_names)
    for k,v in zip(variables_names, values):
        print(k, v)

Tensor("rnn/transpose_1:0", shape=(2, 4, 3), dtype=float32)
Tensor("rnn/while/Exit_3:0", shape=(2, 3), dtype=float32)
weights
<tf.Variable 'rnn/basic_rnn_cell/kernel:0' shape=(7, 3) dtype=float32_ref>
<tf.Variable 'rnn/basic_rnn_cell/bias:0' shape=(3,) dtype=float32_ref>
output values
[[[-0.50944704  0.33166462  0.6126557 ]
  [-0.20793891  0.24406303 -0.75278705]
  [-0.06346128 -0.52844936  0.68356085]
  [-0.36491966  0.8857268  -0.02324398]]

 [[-0.50944704  0.33166462  0.6126557 ]
  [-0.30707452  0.62735885  0.21719742]
  [ 0.5043804  -0.14038289  0.3744523 ]
  [-0.11641283  0.70696247 -0.7512605 ]]]

state value
[[-0.36491966  0.8857268  -0.02324398]
 [-0.11641283  0.70696247 -0.7512605 ]]
weights
rnn/basic_rnn_cell/kernel:0 [[-0.56198275  0.34469748  0.7131618 ]
 [-0.4653999   0.2687447  -0.7289769 ]
 [ 0.35240245  0.14910203  0.36339748]
 [-0.57169586  0.7567036   0.47109187]
 [ 0.3678255   0.5934322  -0.01885086]
 [ 0.31208777 -0.40880746  0.22867584]
 [ 0.5521256   0.682691   -0

When the input was "I work at google", the output for "work" was [-0.20793891  0.24406303 -0.75278705]  
When the input was "I google at work", the output for "work" was [-0.11641283  0.70696247 -0.7512605 ]  
You can also see that the state is exactly the same as the last output value.
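
To see why only the first word gets the same output in both sentences, here is a minimal NumPy sketch of the unrolled recurrence. It uses random placeholder weights (`W_xh`, `W_hh`, `b`, and `run_rnn` are illustrative names), so the numbers will not match the TensorFlow run; only the pattern matters.

```python
import numpy as np

rng = np.random.default_rng(777)

inputs = np.array([
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],  # I work at google
    [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]],  # I google at work
], dtype=float)

num_units = 3
# Illustrative random weights (assumption: not the values TensorFlow initialised).
W_xh = 0.5 * rng.standard_normal((4, num_units))          # input-to-hidden
W_hh = 0.5 * rng.standard_normal((num_units, num_units))  # hidden-to-hidden
b = np.zeros(num_units)


def run_rnn(batch):
    """Unroll a vanilla RNN over time; return all outputs and the final state."""
    h = np.zeros((batch.shape[0], num_units))  # zero initial state
    outputs = []
    for t in range(batch.shape[1]):
        h = np.tanh(batch[:, t] @ W_xh + h @ W_hh + b)
        outputs.append(h)
    return np.stack(outputs, axis=1), h


outputs, state = run_rnn(inputs)

# Step 0: same word "I" and same zero state in both sentences.
print(np.allclose(outputs[0, 0], outputs[1, 0]))
# "work" is word 1 in sentence 0 and word 3 in sentence 1, so its history differs.
print(np.allclose(outputs[0, 1], outputs[1, 3]))
# The final state is the last output of each sentence.
print(np.allclose(state, outputs[:, -1]))
```

The first and third checks hold by construction; the second compares the two "work" outputs, which depend on different previous states.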