## Comparison of English character recognition performance among RNN(Vanilla), RNN(LSTM), and RNN(GRU).
                                                                                        Hyungwon Yang
                                                                                             04.19.17
                                                                                            EMCS Labs
### Task
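This report compares the performance of three models: the basic RNN provided by TensorFlow, an RNN with LSTM cells, and an RNN with GRU cells.
- Each model is trained on an English character-level dataset and then evaluated on a test set that was not used for training, and the results of the three models are compared.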

### Training Corpus
- Project Gutenberg's The Divine Comedy, Complete, by Dante Alighieri
This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org
Part of the corpus was extracted for training.

### Experimental Setting.
- Python 3.5.3
- TensorFlow 1.0.0
- macOS Sierra 10.12.4

### Data Preprocessing.
- The preprocessing is the same as described in the previous report and is not repeated here.
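
As a rough illustration of what such preprocessing produces, the sketch below turns raw text into one-hot arrays with the (examples, time steps, features) layout used in the following sections. The vocabulary, the helper name `make_windows`, and the use of next-character targets are assumptions made for this example; they are not taken from the author's pipeline.

```python
import numpy as np

def make_windows(text, vocab, time_steps):
    """One-hot encode `text` and cut it into non-overlapping windows.

    Returns inputs and next-character targets, both shaped
    (num_examples, time_steps, len(vocab)).
    """
    index = {ch: i for i, ch in enumerate(vocab)}
    # Characters outside the vocabulary are simply dropped.
    ids = np.array([index[ch] for ch in text if ch in index], dtype=np.int64)
    one_hot = np.eye(len(vocab), dtype=np.float32)[ids]
    # Every window needs one extra character to serve as the last target.
    num_examples = (len(ids) - 1) // time_steps
    usable = num_examples * time_steps
    inputs = one_hot[:usable].reshape(num_examples, time_steps, len(vocab))
    targets = one_hot[1:usable + 1].reshape(num_examples, time_steps, len(vocab))
    return inputs, targets

# Hypothetical vocabulary; the 38 symbols used in the report are not listed.
vocab = "abcdefghijklmnopqrstuvwxyz ,."
sample = "midway upon the journey of our life, i found myself within a forest dark."
x, y = make_windows(sample, vocab, time_steps=20)
print(x.shape, y.shape)
```

With the report's data the same layout gives 8,500 training windows of 20 time steps and 38 features.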

### RNN(Vanilla) Training
- The experiment was run three times, once for each hidden layer size (50, 100, and 200 units); the detailed settings are listed below.
- Settings
 1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
 2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
 3. 20% of the training data (1,700 examples) was held out as a validation set. The validation accuracy (the output character predicted for each input character) shows how performance changes as the epochs progress.
 4. Parameters: epochs: 200, 1 hidden layer of size [50, 100, 200], learning rate: 0.001, optimizer: Adam (`costFunction = 'adam'` in the code)
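
The 20% validation hold-out can be illustrated with a few lines of NumPy. This is a sketch under the assumption that the last 20% of the training examples are held out and that the targets have the same shape as the inputs; the report does not state how the library selects the validation examples, and `split_validation` is a hypothetical helper.

```python
import numpy as np

def split_validation(inputs, targets, ratio=0.2):
    """Hold out the last `ratio` of the examples for validation."""
    num_val = int(round(len(inputs) * ratio))
    split = len(inputs) - num_val
    return (inputs[:split], targets[:split]), (inputs[split:], targets[split:])

# Dummy arrays with the shape listed above: (examples, time steps, features).
# The target shape is an assumption.
train_input = np.zeros((8500, 20, 38), dtype=np.float32)
train_output = np.zeros((8500, 20, 38), dtype=np.float32)

(fit_x, fit_y), (val_x, val_y) = split_validation(train_input, train_output)
print(fit_x.shape, val_x.shape)
```

With 8,500 training examples this leaves 6,800 for fitting and 1,700 for validation.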

In [1]:
import sys
# HY_python_NN absolute directory.
my_absdir = "/Users/hyungwonyang/Google_Drive/Python/HY_python_NN"
sys.path.append(my_absdir)

import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
rnn_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = rnn_data['train_input']
train_output = rnn_data['train_output']
test_input = rnn_data['test_input']
test_output = rnn_data['test_output']

In [2]:
# parameters
problem = 'classification' # classification, regression
rnnCell = 'rnn' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

rnn_values = set.simpleRNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

In [3]:
# Setting hidden layers: weightMatrix and biasMatrix
rnn_weightMatrix = rnn_values.genWeight()
rnn_biasMatrix = rnn_values.genBias()
rnn_input_x,rnn_input_y = rnn_values.genSymbol()


In [4]:
rnn_net = net.simpleRNNModel(inputSymbol=rnn_input_x,
                               outputSymbol=rnn_input_y,
                               rnnCell=rnnCell,
                               problem=problem,
                               trainEpoch=trainEpoch,
                               learningRate=learningRate,
                               timeStep=timeStep,
                               batchSize=batchSize,
                               validationCheck=validationCheck,
                               weightMatrix=rnn_weightMatrix,
                               biasMatrix=rnn_biasMatrix)

# Generate an RNN(vanilla) network.
rnn_net.genRNN()

RNN cell type is rnn


In [5]:
# Train the RNN(vanilla) network.
# In this tutorial, we will run only 20 epochs.
rnn_net.trainRNN(train_input,train_output)

Epoch:   1 /  20, Cost : 2.976553, Validation Accuracy: 34.87%
Epoch:   2 /  20, Cost : 2.278241, Validation Accuracy: 36.48%
Epoch:   3 /  20, Cost : 2.201451, Validation Accuracy: 36.69%
Epoch:   4 /  20, Cost : 2.170949, Validation Accuracy: 36.81%
Epoch:   5 /  20, Cost : 2.154979, Validation Accuracy: 36.87%
Epoch:   6 /  20, Cost : 2.144995, Validation Accuracy: 36.76%
Epoch:   7 /  20, Cost : 2.138060, Validation Accuracy: 36.83%
Epoch:   8 /  20, Cost : 2.132922, Validation Accuracy: 36.82%
Epoch:   9 /  20, Cost : 2.128956, Validation Accuracy: 36.80%
Epoch:  10 /  20, Cost : 2.125770, Validation Accuracy: 36.81%
Epoch:  11 /  20, Cost : 2.123116, Validation Accuracy: 36.83%
Epoch:  12 /  20, Cost : 2.120850, Validation Accuracy: 36.80%
Epoch:  13 /  20, Cost : 2.118882, Validation Accuracy: 36.80%
Epoch:  14 /  20, Cost : 2.117151, Validation Accuracy: 36.81%
Epoch:  15 /  20, Cost : 2.115612, Validation Accuracy: 36.86%
Epoch:  16 /  20, Cost : 2.114227, Validation Accuracy:

In [6]:
# Test the trained RNN(vanilla) network.
rnn_net.testRNN(test_input,test_output)

Tested with 1650 datasets.
Test Accuracy: 37.52 %
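
Character-level accuracy of this kind can be computed by comparing the most probable output class with the target class at every time step. The sketch below assumes one-hot targets and per-step class scores of shape (examples, time steps, classes); `char_accuracy` is a hypothetical helper and not the library's `testRNN`, whose exact definition of accuracy is not shown in this report.

```python
import numpy as np

def char_accuracy(scores, targets):
    """Percentage of time steps whose arg-max class matches the one-hot target."""
    predicted = scores.argmax(axis=-1)
    expected = targets.argmax(axis=-1)
    return 100.0 * np.mean(predicted == expected)

# Toy example: 2 sequences, 3 time steps, 4 classes.
labels = np.array([[0, 1, 2], [3, 0, 1]])
targets = np.eye(4)[labels]
scores = targets.copy()
scores[1, 2] = [0.7, 0.1, 0.1, 0.1]  # one wrong prediction out of six

print("Accuracy: %.2f %%" % char_accuracy(scores, targets))
```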


In [7]:
# Retrieve the trained variables (renamed to avoid shadowing the built-in vars()).
rnn_vars = rnn_net.getVariables()
# Terminate the session.
rnn_net.closeRNN()

Variable list as a dictionary format.
>> weight, bias, y_hat, optimizer, cost

Simple RNN training session is terminated.


### RNN(LSTM) Training
- The experiment was run three times, once for each hidden layer size (50, 100, and 200 units); the detailed settings are listed below.
- Settings
 1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
 2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
 3. 20% of the training data (1,700 examples) was held out as a validation set. The validation accuracy (the output character predicted for each input character) shows how performance changes as the epochs progress.
 4. Parameters: epochs: 200, 1 hidden layer of size [50, 100, 200], learning rate: 0.001, optimizer: Adam (`costFunction = 'adam'` in the code)
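
For reference, the difference between the vanilla cell of the previous section and the LSTM cell used here can be sketched in NumPy. This is the standard textbook formulation with randomly initialized weights, written for illustration only; it is not the code path used by `simpleRNNModel`, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def vanilla_step(x, h, W, b):
    """h_t = tanh([x_t, h_{t-1}] W + b)"""
    return np.tanh(np.concatenate([x, h], axis=1) @ W + b)

def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps [x_t, h_{t-1}] to the four gate pre-activations."""
    z = np.concatenate([x, h], axis=1) @ W + b
    i, f, o, g = np.split(z, 4, axis=1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # gated cell-state update
    h_new = sigmoid(o) * np.tanh(c_new)               # gated output
    return h_new, c_new

rng = np.random.default_rng(0)
batch, features, hidden = 100, 38, 200  # sizes taken from the settings above
x = rng.standard_normal((batch, features))
h = np.zeros((batch, hidden))
c = np.zeros((batch, hidden))

W_rnn = rng.standard_normal((features + hidden, hidden)) * 0.1
W_lstm = rng.standard_normal((features + hidden, 4 * hidden)) * 0.1

h_rnn = vanilla_step(x, h, W_rnn, np.zeros(hidden))
h_lstm, c_lstm = lstm_step(x, h, c, W_lstm, np.zeros(4 * hidden))
print(h_rnn.shape, h_lstm.shape, W_rnn.size, W_lstm.size)
```

The separate cell state `c` is updated additively through the forget and input gates, which is the usual explanation for why LSTM training tends to be more stable than that of a vanilla RNN.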

In [8]:
import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
lstm_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = lstm_data['train_input']
train_output = lstm_data['train_output']
test_input = lstm_data['test_input']
test_output = lstm_data['test_output']

In [9]:
# parameters
problem = 'classification' # classification, regression
rnnCell = 'lstm' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

lstm_values = set.simpleRNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

In [10]:
# Setting hidden layers: weightMatrix and biasMatrix
lstm_weightMatrix = lstm_values.genWeight()
lstm_biasMatrix = lstm_values.genBias()
lstm_input_x,lstm_input_y = lstm_values.genSymbol()

In [11]:
lstm_net = net.simpleRNNModel(inputSymbol=lstm_input_x,
                               outputSymbol=lstm_input_y,
                               rnnCell=rnnCell,
                               problem=problem,
                               trainEpoch=trainEpoch,
                               learningRate=learningRate,
                               timeStep=timeStep,
                               batchSize=batchSize,
                               validationCheck=validationCheck,
                               weightMatrix=lstm_weightMatrix,
                               biasMatrix=lstm_biasMatrix)

# Generate an RNN(lstm) network.
lstm_net.genRNN()

RNN cell type is lstm


In [12]:
# Train the RNN(lstm) network.
# In this tutorial, we will run only 20 epochs.
lstm_net.trainRNN(train_input,train_output)

Epoch:   1 /  20, Cost : 3.005780, Validation Accuracy: 27.94%
Epoch:   2 /  20, Cost : 2.485927, Validation Accuracy: 32.40%
Epoch:   3 /  20, Cost : 2.321439, Validation Accuracy: 34.59%
Epoch:   4 /  20, Cost : 2.244965, Validation Accuracy: 35.78%
Epoch:   5 /  20, Cost : 2.196952, Validation Accuracy: 36.44%
Epoch:   6 /  20, Cost : 2.157184, Validation Accuracy: 37.47%
Epoch:   7 /  20, Cost : 2.120310, Validation Accuracy: 38.21%
Epoch:   8 /  20, Cost : 2.086483, Validation Accuracy: 38.94%
Epoch:   9 /  20, Cost : 2.054665, Validation Accuracy: 39.56%
Epoch:  10 /  20, Cost : 2.025842, Validation Accuracy: 40.10%
Epoch:  11 /  20, Cost : 1.999812, Validation Accuracy: 40.67%
Epoch:  12 /  20, Cost : 1.975952, Validation Accuracy: 41.18%
Epoch:  13 /  20, Cost : 1.953950, Validation Accuracy: 41.75%
Epoch:  14 /  20, Cost : 1.933704, Validation Accuracy: 42.25%
Epoch:  15 /  20, Cost : 1.915146, Validation Accuracy: 42.62%
Epoch:  16 /  20, Cost : 1.897975, Validation Accuracy:

In [13]:
# Test the trained RNN(lstm) network.
lstm_net.testRNN(test_input,test_output)

Tested with 1650 datasets.
Test Accuracy: 45.88 %


In [14]:
# Retrieve the trained variables (renamed to avoid shadowing the built-in vars()).
lstm_vars = lstm_net.getVariables()
# Terminate the session.
lstm_net.closeRNN()

Variable list as a dictionary format.
>> weight, bias, y_hat, optimizer, cost

Simple RNN training session is terminated.


### RNN(GRU) Training
- The experiment was run three times, once for each hidden layer size (50, 100, and 200 units); the detailed settings are listed below.
- Settings
 1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
 2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
 3. 20% of the training data (1,700 examples) was held out as a validation set. The validation accuracy (the output character predicted for each input character) shows how performance changes as the epochs progress.
 4. Parameters: epochs: 200, 1 hidden layer of size [50, 100, 200], learning rate: 0.001, optimizer: Adam (`costFunction = 'adam'` in the code)
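
The GRU cell uses an update gate and a reset gate and keeps no separate cell state. A minimal NumPy sketch of the standard formulation follows, together with a recurrent parameter count for the three cell types at 38 input features and 200 hidden units. The function names are hypothetical and gate conventions vary slightly between implementations, so this is illustrative rather than a description of the library's internals; the output layer is not counted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W_gates, b_gates, W_cand, b_cand):
    """One GRU step with reset gate r and update gate u."""
    gates = sigmoid(np.concatenate([x, h], axis=1) @ W_gates + b_gates)
    r, u = np.split(gates, 2, axis=1)
    candidate = np.tanh(np.concatenate([x, r * h], axis=1) @ W_cand + b_cand)
    return u * h + (1.0 - u) * candidate  # interpolate old state and candidate

def recurrent_params(features, hidden, num_blocks):
    """Weights plus biases for `num_blocks` blocks of shape [features + hidden, hidden]."""
    return num_blocks * ((features + hidden) * hidden + hidden)

features, hidden = 38, 200
rng = np.random.default_rng(0)
x = rng.standard_normal((4, features))
h = np.zeros((4, hidden))
h_new = gru_step(
    x, h,
    rng.standard_normal((features + hidden, 2 * hidden)) * 0.1, np.zeros(2 * hidden),
    rng.standard_normal((features + hidden, hidden)) * 0.1, np.zeros(hidden),
)

# Vanilla uses 1 block, GRU 3 (two gates + candidate), LSTM 4 (three gates + candidate).
for name, blocks in [("vanilla", 1), ("gru", 3), ("lstm", 4)]:
    print(name, recurrent_params(features, hidden, blocks))
```

At the same hidden size the gated cells carry three to four times as many recurrent parameters as the vanilla cell, which is worth keeping in mind when comparing accuracies.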

In [2]:
import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
gru_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = gru_data['train_input']
train_output = gru_data['train_output']
test_input = gru_data['test_input']
test_output = gru_data['test_output']

In [3]:
# parameters
problem = 'classification' # classification, regression
rnnCell = 'gru' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

gru_values = set.simpleRNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

In [9]:
# Setting hidden layers: weightMatrix and biasMatrix
gru_weightMatrix = gru_values.genWeight()
gru_biasMatrix = gru_values.genBias()
gru_input_x,gru_input_y = gru_values.genSymbol()

In [11]:
gru_net = net.simpleRNNModel(inputSymbol=gru_input_x,
                               outputSymbol=gru_input_y,
                               rnnCell=rnnCell,
                               problem=problem,
                               trainEpoch=trainEpoch,
                               learningRate=learningRate,
                               timeStep=timeStep,
                               batchSize=batchSize,
                               validationCheck=validationCheck,
                               weightMatrix=gru_weightMatrix,
                               biasMatrix=gru_biasMatrix)

# Generate an RNN(gru) network.
gru_net.genRNN()

RNN cell type is gru


In [12]:
# Train the RNN(gru) network.
# In this tutorial, we will run only 20 epochs.
gru_net.trainRNN(train_input,train_output)

Epoch:   1 /  20, Cost : 3.025232, Validation Accuracy: 28.73%
Epoch:   2 /  20, Cost : 2.452374, Validation Accuracy: 33.44%
Epoch:   3 /  20, Cost : 2.276861, Validation Accuracy: 35.94%
Epoch:   4 /  20, Cost : 2.192620, Validation Accuracy: 37.06%
Epoch:   5 /  20, Cost : 2.135865, Validation Accuracy: 38.27%
Epoch:   6 /  20, Cost : 2.089845, Validation Accuracy: 39.01%
Epoch:   7 /  20, Cost : 2.049159, Validation Accuracy: 39.80%
Epoch:   8 /  20, Cost : 2.012706, Validation Accuracy: 40.52%
Epoch:   9 /  20, Cost : 1.979966, Validation Accuracy: 41.10%
Epoch:  10 /  20, Cost : 1.950720, Validation Accuracy: 41.75%
Epoch:  11 /  20, Cost : 1.924765, Validation Accuracy: 42.33%
Epoch:  12 /  20, Cost : 1.901514, Validation Accuracy: 42.74%
Epoch:  13 /  20, Cost : 1.880410, Validation Accuracy: 43.10%
Epoch:  14 /  20, Cost : 1.861050, Validation Accuracy: 43.48%
Epoch:  15 /  20, Cost : 1.843196, Validation Accuracy: 43.82%
Epoch:  16 /  20, Cost : 1.826601, Validation Accuracy:

In [13]:
# Test the trained RNN(gru) network.
gru_net.testRNN(test_input,test_output)

Tested with 1650 datasets.
Test Accuracy: 47.35 %


In [14]:
# Retrieve the trained variables (renamed to avoid shadowing the built-in vars()).
gru_vars = gru_net.getVariables()
# Terminate the session.
gru_net.closeRNN()

Variable list as a dictionary format.
>> weight, bias, y_hat, optimizer, cost

Simple RNN training session is terminated.


### Comments
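- The code above covers only the case of 200 hidden units, but the actual experiments were run with 50, 100, and 200 hidden units; the results are shown in the table below.
- To show how accuracy changes early in training, each model is trained for only 20 epochs in the code above, whereas each actual experiment ran for 200 epochs.
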
### Result
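1. Compared with the earlier ANN results, where training failed regardless of the number of hidden layers, RNN(Vanilla), RNN(LSTM), and RNN(GRU) all train stably and improve in accuracy as training proceeds.
2. As the table below shows, RNN(LSTM) and RNN(GRU) perform similarly (72.59% and 70.89% with 200 hidden units, a gap of about 2 percentage points), and both outperform RNN(Vanilla) by a wide margin of roughly 21 to 22 percentage points.
3. Comparing RNN(LSTM) and RNN(GRU) directly, RNN(LSTM) scores about 2 percentage points higher than RNN(GRU) in this experiment. However, since some recent papers report that GRU gives better results than LSTM, it is worth examining how the two compare on other tasks.
4. In terms of accuracy during training, RNN(Vanilla) fluctuates, alternately falling and rising, whereas RNN(LSTM) and RNN(GRU) show a comparatively steady increase.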
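
The stability observation can be checked against the first 15 epochs logged in the training cells above by counting how often validation accuracy fell from one epoch to the next. The values are copied from those logs; the 200-epoch runs behind the table are not reproduced here.

```python
# Validation accuracy (%) for epochs 1-15, copied from the training logs above.
val_acc = {
    "rnn":  [34.87, 36.48, 36.69, 36.81, 36.87, 36.76, 36.83, 36.82,
             36.80, 36.81, 36.83, 36.80, 36.80, 36.81, 36.86],
    "lstm": [27.94, 32.40, 34.59, 35.78, 36.44, 37.47, 38.21, 38.94,
             39.56, 40.10, 40.67, 41.18, 41.75, 42.25, 42.62],
    "gru":  [28.73, 33.44, 35.94, 37.06, 38.27, 39.01, 39.80, 40.52,
             41.10, 41.75, 42.33, 42.74, 43.10, 43.48, 43.82],
}

def count_drops(values):
    """Number of epochs whose accuracy is lower than the previous epoch's."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)

drops = {cell: count_drops(acc) for cell, acc in val_acc.items()}
print(drops)
```

In this excerpt the vanilla RNN's validation accuracy falls four times and stays near 36.8% from epoch 4 onward, while the LSTM and GRU curves rise at every epoch.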




|       Model    | Hidden Units  | Accuracy     |
| :------------: | :-----------: | -----------: |
| RNN(Vanilla)   |       50      |     44.42%   |
| RNN(Vanilla)   |       100     |     47.86%   |
| RNN(Vanilla)   |       200     |     **50.23%**   |
| RNN(LSTM)      |       50      |     49.76%   |
| RNN(LSTM)      |       100     |     56.54%   |
| RNN(LSTM)      |       200     |     **72.59%**   |
| RNN(GRU)      |       50      |     49.68%   |
| RNN(GRU)      |       100     |     55.75%   |
| RNN(GRU)      |       200     |     **70.89%**   |
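
The gaps quoted in the results follow directly from the table. The short script below recomputes them in percentage points from the 200-unit rows.

```python
# Accuracy (%) from the table, keyed by (model, hidden units).
accuracy = {
    ("vanilla", 50): 44.42, ("vanilla", 100): 47.86, ("vanilla", 200): 50.23,
    ("lstm", 50): 49.76,    ("lstm", 100): 56.54,    ("lstm", 200): 72.59,
    ("gru", 50): 49.68,     ("gru", 100): 55.75,     ("gru", 200): 70.89,
}

lstm_vs_gru = accuracy[("lstm", 200)] - accuracy[("gru", 200)]
lstm_vs_vanilla = accuracy[("lstm", 200)] - accuracy[("vanilla", 200)]
gru_vs_vanilla = accuracy[("gru", 200)] - accuracy[("vanilla", 200)]

print("LSTM - GRU:     %.2f points" % lstm_vs_gru)
print("LSTM - Vanilla: %.2f points" % lstm_vs_vanilla)
print("GRU  - Vanilla: %.2f points" % gru_vs_vanilla)
```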

### Github Code
The experiment can be reproduced by downloading the following GitHub code.
- https://github.com/hyung8758/HY_python_NN.git

