## Comparison of English character recognition performance between RNN(Vanilla) and RNN(LSTM).
                                                                                        Hyungwon Yang
                                                                                             04.20.17
                                                                                            EMCS Labs
### Task
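This report compares the performance of two models: the basic RNN provided by TensorFlow and an RNN that uses LSTM cells.
- Both models are trained on an English character-level dataset and then evaluated on a test set that was not used for training.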

### Training Corpus
- Project Gutenberg's The Divine Comedy, Complete, by Dante Alighieri
This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org
Only part of the corpus was extracted for training.

### Experimental Setting.
- Python 3.5.3
- TensorFlow 1.0.0
- macOS Sierra 10.12.4

### Data Preprocessing.
- Preprocessing is the same as described in the previous report (report1) and is not repeated here.

### RNN(Vanilla) Training
- Training was run three times, with 50, 100, and 200 epochs. The detailed settings are listed below.
- Settings
 1. Training data: 8,500 * 20 * 38 (# of examples, # of time steps, # of input features)
 2. Test data: 1,650 * 20 * 38 (# of examples, # of time steps, # of input features)
 3. 20% of the training data (1,700 examples) is held out as a validation set. The validation accuracy (of the output character produced for each input character) is reported as the epochs progress.
 4. Parameters: epochs: 200, 1 hidden layer of size 200, learning rate: 0.001, optimizer: Adam (the `costFunction` setting)
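As a rough illustration of the validation setting above, the sketch below holds out 20% of an array shaped like the training data. The function name `split_train_validation`, the choice to take the last 20% of the examples, and the shape of the target array are assumptions made for illustration; HY_python_NN may select its validation examples differently.

```python
import numpy as np

def split_train_validation(inputs, targets, validation_ratio=0.2):
    """Hold out the last `validation_ratio` of the examples for validation."""
    n_examples = inputs.shape[0]
    n_validation = int(round(n_examples * validation_ratio))
    n_train = n_examples - n_validation
    train = (inputs[:n_train], targets[:n_train])
    validation = (inputs[n_train:], targets[n_train:])
    return train, validation

# Dummy arrays with the shape listed above: (examples, time steps, features).
# The target shape is assumed to match the input shape.
dummy_input = np.zeros((8500, 20, 38), dtype=np.float32)
dummy_target = np.zeros((8500, 20, 38), dtype=np.float32)

(train_x, train_y), (valid_x, valid_y) = split_train_validation(dummy_input, dummy_target)
print(train_x.shape, valid_x.shape)  # (6800, 20, 38) (1700, 20, 38)
```

With 8,500 examples this leaves 6,800 for the weight updates and 1,700 for the per-epoch validation accuracy.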

In [2]:
import sys
# HY_python_NN absolute directory.
my_absdir = "/Users/hyungwonyang/Google_Drive/Python/HY_python_NN"
sys.path.append(my_absdir)

import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
rnn_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = rnn_data['train_input']
train_output = rnn_data['train_output']
test_input = rnn_data['test_input']
test_output = rnn_data['test_output']

In [2]:
# parameters
problem = 'classification' # classification, regression
rnnCell = 'rnn' # rnn, lstm
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

rnn_values = set.simpleRNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

In [3]:
# Setting hidden layers: weightMatrix and biasMatrix
rnn_weightMatrix = rnn_values.genWeight()
rnn_biasMatrix = rnn_values.genBias()
rnn_input_x,rnn_input_y = rnn_values.genSymbol()


In [4]:
rnn_net = net.simpleRNNModel(inputSymbol=rnn_input_x,
                               outputSymbol=rnn_input_y,
                               rnnCell=rnnCell,
                               problem=problem,
                               trainEpoch=trainEpoch,
                               learningRate=learningRate,
                               timeStep=timeStep,
                               batchSize=batchSize,
                               validationCheck=validationCheck,
                               weightMatrix=rnn_weightMatrix,
                               biasMatrix=rnn_biasMatrix)

# Generate the RNN (vanilla) network.
rnn_net.genRNN()

RNN cell type is rnn


In [5]:
# Train the RNN (vanilla) network.
# In this tutorial, we will run only 20 epochs.
rnn_net.trainRNN(train_input,train_output)

Epoch : 1 / 20 , Cost : 3.004327
Validation Accuracy: 34.67 %
Epoch : 2 / 20 , Cost : 2.299680
Validation Accuracy: 36.10 %
Epoch : 3 / 20 , Cost : 2.213341
Validation Accuracy: 36.44 %
Epoch : 4 / 20 , Cost : 2.179098
Validation Accuracy: 36.74 %
Epoch : 5 / 20 , Cost : 2.160582
Validation Accuracy: 36.87 %
Epoch : 6 / 20 , Cost : 2.149142
Validation Accuracy: 36.90 %
Epoch : 7 / 20 , Cost : 2.141430
Validation Accuracy: 36.97 %
Epoch : 8 / 20 , Cost : 2.135674
Validation Accuracy: 36.95 %
Epoch : 9 / 20 , Cost : 2.131341
Validation Accuracy: 36.89 %
Epoch : 10 / 20 , Cost : 2.127995
Validation Accuracy: 36.90 %
Epoch : 11 / 20 , Cost : 2.125090
Validation Accuracy: 36.89 %
Epoch : 12 / 20 , Cost : 2.122569
Validation Accuracy: 36.96 %
Epoch : 13 / 20 , Cost : 2.120447
Validation Accuracy: 36.89 %
Epoch : 14 / 20 , Cost : 2.118592
Validation Accuracy: 36.81 %
Epoch : 15 / 20 , Cost : 2.116935
Validation Accuracy: 36.85 %
Epoch : 16 / 20 , Cost : 2.115445
Validation Accuracy: 36.84 %
E

In [6]:
# Test the trained RNN (vanilla) network.
rnn_net.testRNN(test_input,test_output)

Tested with 1650 datasets.
Test Accuracy: 37.31 %


In [7]:
# Save the trained parameters.
vars = rnn_net.getVariables()
# Terminate the session.
rnn_net.closeRNN()

Variable list as a dictionary format.
>> weight, bias, y_hat, optimizer, cost

Simple RNN training session is terminated.
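The validation and test accuracies printed in this section are character-level accuracies. A minimal sketch of such a metric follows, assuming the network emits one score vector per time step and the targets are one-hot encoded. The helper `character_accuracy` is hypothetical and is not taken from HY_python_NN, which may compute its accuracy differently (for example, on the last time step only).

```python
import numpy as np

def character_accuracy(predicted_scores, one_hot_targets):
    """Percentage of positions where the arg-max prediction matches the target."""
    predicted_ids = np.argmax(predicted_scores, axis=-1)
    target_ids = np.argmax(one_hot_targets, axis=-1)
    return 100.0 * np.mean(predicted_ids == target_ids)

# Toy example: 2 sequences, 3 time steps, 4 character classes.
target_ids = np.array([[0, 1, 2], [3, 0, 1]])
guess_ids = np.array([[0, 1, 3], [3, 0, 2]])  # 4 of the 6 positions are correct
targets = np.eye(4)[target_ids]               # one-hot, shape (2, 3, 4)
scores = np.eye(4)[guess_ids]                 # stand-in for network outputs

print("Accuracy: %.2f %%" % character_accuracy(scores, targets))  # Accuracy: 66.67 %
```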


### RNN(LSTM) Training
- Training was run three times, with 50, 100, and 200 epochs. The detailed settings are listed below.
- Settings
 1. Training data: 8,500 * 20 * 38 (# of examples, # of time steps, # of input features)
 2. Test data: 1,650 * 20 * 38 (# of examples, # of time steps, # of input features)
 3. 20% of the training data (1,700 examples) is held out as a validation set. The validation accuracy (of the output character produced for each input character) is reported as the epochs progress.
 4. Parameters: epochs: 200, 1 hidden layer of size 200, learning rate: 0.001, optimizer: Adam (the `costFunction` setting)
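The settings are identical to the vanilla run; only the cell type changes. For reference, the sketch below writes out one time step of each cell in plain NumPy, using the textbook formulations. The function names, weight layout, and initialisation are assumptions for illustration and do not reproduce the TensorFlow or HY_python_NN implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vanilla_rnn_step(x, h_prev, W, U, b):
    # h_t = tanh(x_t W + h_{t-1} U + b)
    return np.tanh(x.dot(W) + h_prev.dot(U) + b)

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U and b hold the input, forget, output and candidate blocks side by side.
    gates = x.dot(W) + h_prev.dot(U) + b
    i, f, o, g = np.split(gates, 4, axis=-1)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # gated cell-state update
    h = sigmoid(o) * np.tanh(c)                        # gated hidden output
    return h, c

# Sizes taken from the settings: 38 input features, 200 hidden units, batch of 100.
n_features, n_hidden, batch = 38, 200, 100
rng = np.random.RandomState(0)
x = rng.standard_normal((batch, n_features))
h0 = np.zeros((batch, n_hidden))
c0 = np.zeros((batch, n_hidden))

W_rnn = 0.1 * rng.standard_normal((n_features, n_hidden))
U_rnn = 0.1 * rng.standard_normal((n_hidden, n_hidden))
b_rnn = np.zeros(n_hidden)
h_rnn = vanilla_rnn_step(x, h0, W_rnn, U_rnn, b_rnn)

W_lstm = 0.1 * rng.standard_normal((n_features, 4 * n_hidden))
U_lstm = 0.1 * rng.standard_normal((n_hidden, 4 * n_hidden))
b_lstm = np.zeros(4 * n_hidden)
h_lstm, c_lstm = lstm_step(x, h0, c0, W_lstm, U_lstm, b_lstm)

rnn_params = W_rnn.size + U_rnn.size + b_rnn.size
lstm_params = W_lstm.size + U_lstm.size + b_lstm.size
print(h_rnn.shape, h_lstm.shape)  # (100, 200) (100, 200)
print(rnn_params, lstm_params)    # 47800 191200
```

In this formulation the LSTM cell has four times as many recurrent-layer parameters as the vanilla cell at the same hidden size, which is worth keeping in mind when comparing the two models at equal hidden-unit counts.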

In [3]:
import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
ann_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = ann_data['train_input']
train_output = ann_data['train_output']
test_input = ann_data['test_input']
test_output = ann_data['test_output']

In [4]:
# parameters
problem = 'classification' # classification, regression
rnnCell = 'lstm' # rnn, lstm
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

lstm_values = set.simpleRNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

In [5]:
# Setting hidden layers: weightMatrix and biasMatrix
lstm_weightMatrix = lstm_values.genWeight()
lstm_biasMatrix = lstm_values.genBias()
lstm_input_x,lstm_input_y = lstm_values.genSymbol()

In [6]:
lstm_net = net.simpleRNNModel(inputSymbol=lstm_input_x,
                               outputSymbol=lstm_input_y,
                               rnnCell=rnnCell,
                               problem=problem,
                               trainEpoch=trainEpoch,
                               learningRate=learningRate,
                               timeStep=timeStep,
                               batchSize=batchSize,
                               validationCheck=validationCheck,
                               weightMatrix=lstm_weightMatrix,
                               biasMatrix=lstm_biasMatrix)

# Generate a RNN(lstm) network.
lstm_net.genRNN()

RNN cell type is lstm


In [7]:
# Train the RNN(lstm) network.
# In this tutorial, we will run only 20 epochs.
lstm_net.trainRNN(train_input,train_output)

Epoch : 1 / 20 , Cost : 2.945145
Validation Accuracy: 29.73 %
Epoch : 2 / 20 , Cost : 2.457450
Validation Accuracy: 33.45 %
Epoch : 3 / 20 , Cost : 2.300332
Validation Accuracy: 35.29 %
Epoch : 4 / 20 , Cost : 2.228929
Validation Accuracy: 36.03 %
Epoch : 5 / 20 , Cost : 2.182124
Validation Accuracy: 37.01 %
Epoch : 6 / 20 , Cost : 2.143274
Validation Accuracy: 37.69 %
Epoch : 7 / 20 , Cost : 2.106803
Validation Accuracy: 38.43 %
Epoch : 8 / 20 , Cost : 2.072101
Validation Accuracy: 39.16 %
Epoch : 9 / 20 , Cost : 2.040219
Validation Accuracy: 39.87 %
Epoch : 10 / 20 , Cost : 2.011420
Validation Accuracy: 40.53 %
Epoch : 11 / 20 , Cost : 1.985421
Validation Accuracy: 41.08 %
Epoch : 12 / 20 , Cost : 1.961733
Validation Accuracy: 41.47 %
Epoch : 13 / 20 , Cost : 1.939799
Validation Accuracy: 41.84 %
Epoch : 14 / 20 , Cost : 1.919247
Validation Accuracy: 42.21 %
Epoch : 15 / 20 , Cost : 1.899938
Validation Accuracy: 42.59 %
Epoch : 16 / 20 , Cost : 1.881887
Validation Accuracy: 42.89 %
E

In [8]:
# Test the trained RNN(lstm) network.
lstm_net.testRNN(test_input,test_output)

Tested with 1650 datasets.
Test Accuracy: 45.98 %


In [9]:
# Save the trained parameters.
vars = lstm_net.getVariables()
# Terminate the session.
lstm_net.closeRNN()

Variable list as a dictionary format.
>> weight, bias, y_hat, optimizer, cost

Simple RNN training session is terminated.


### Result
The code above shows only the run with 200 hidden units, but the experiment was actually carried out with 50, 100, and 200 hidden units. The results are as follows.
1. Compared with the ANN, which failed to train regardless of its hidden-layer configuration, both RNN(Vanilla) and RNN(LSTM) show improved performance.
2. Comparing the two RNN models, however, RNN(LSTM) trains better than RNN(Vanilla). With 200 hidden units, RNN(Vanilla) reaches an accuracy of 50.23% and RNN(LSTM) reaches 72.59%, so the latter leads by roughly 22 percentage points.
3. In terms of accuracy over the course of training, RNN(Vanilla) fluctuates unstably up and down, whereas RNN(LSTM) rises relatively steadily.
4. This experiment therefore shows that, on this corpus, RNN(LSTM) performs better than RNN(Vanilla) for character-level training.



|       Model    | Hidden Units  | Accuracy     |
| :------------: | :-----------: | -----------: |
| RNN(Vanilla)   |       50      |     44.42%   |
| RNN(Vanilla)   |       100     |     47.86%   |
| RNN(Vanilla)   |       200     |     **50.23%**   |
| RNN(LSTM)      |       50      |     49.76%   |
| RNN(LSTM)      |       100     |     56.54%   |
| RNN(LSTM)      |       200     |     **72.59%**   |
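The gap between the two models at each hidden-unit count can be read directly off the table. The short script below does that arithmetic; the accuracies are copied from the table, and the variable names are only illustrative.

```python
# Accuracies (%) copied from the table above, keyed by number of hidden units.
vanilla_accuracy = {50: 44.42, 100: 47.86, 200: 50.23}
lstm_accuracy = {50: 49.76, 100: 56.54, 200: 72.59}

# Difference in percentage points, LSTM minus vanilla.
gaps = {
    units: round(lstm_accuracy[units] - vanilla_accuracy[units], 2)
    for units in sorted(vanilla_accuracy)
}

for units in sorted(gaps):
    print("hidden units = %3d: LSTM leads by %.2f percentage points" % (units, gaps[units]))
# hidden units =  50: LSTM leads by 5.34 percentage points
# hidden units = 100: LSTM leads by 8.68 percentage points
# hidden units = 200: LSTM leads by 22.36 percentage points
```

The lead grows with the hidden-unit count, from about 5 points at 50 units to about 22 points at 200 units.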

### Github Code
Download the following GitHub code to reproduce this experiment.
- https://github.com/hyung8758/HY_python_NN.git

