<a href="https://colab.research.google.com/github/sambitdash/EVA-2/blob/master/Phase-2/Session-2/EVA_P2S2_File_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a reproduction of the IRNN experiment with pixel-by-pixel sequential MNIST in "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units" by Quoc V. Le, Navdeep Jaitly, and Geoffrey E. Hinton.

arXiv:1504.00941v2 [cs.NE] 7 Apr 2015 http://arxiv.org/pdf/1504.00941v2.pdf

The optimizer is replaced with RMSprop, which yields more stable and steady improvement.

It reaches 0.93 train/test accuracy after 900 epochs (which roughly corresponds to 1687500 steps in the original paper).

# Phase 2 Session 2 File 1

## Data Input Design

Each $28\times28$ grayscale image is treated as a sequence of 28 scanlines (rows) of 28 pixels, with pixel values scaled to $[0, 1]$. Each scanline is one time step of input that updates the RNN state, and the network predicts the digit only after all 28 scanlines have been read.
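
The scanline-by-scanline reading described above can be sketched in plain NumPy. This is a minimal illustration of a ReLU recurrence, not the Keras implementation; the function name `simple_rnn_forward`, the weight names `W`, `U`, `b`, and the random values are assumptions made for the example.

```python
import numpy as np

def simple_rnn_forward(image, W, U, b):
    """Run a ReLU recurrence over the rows of one image and return the final state."""
    h = np.zeros(U.shape[0])                 # initial hidden state
    for scanline in image:                   # one time step per row of the image
        h = np.maximum(0.0, scanline @ W + h @ U + b)
    return h

rng = np.random.default_rng(0)
image = rng.random((28, 28))                 # stand-in for one normalized MNIST image
W = rng.normal(scale=0.1, size=(28, 100))    # input-to-hidden weights (hypothetical values)
U = rng.normal(scale=0.1, size=(100, 100))   # hidden-to-hidden weights (hypothetical values)
b = np.zeros(100)                            # bias

final_state = simple_rnn_forward(image, W, U, b)
print(final_state.shape)
```

Only the state after the last scanline is returned, which mirrors a recurrent layer that feeds its final output to the classifier.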

The original code flattened each image into a sequence of 784 single-pixel time steps, so each step carried very little information. Keeping the image as $28\times28$ gives the RNN a full scanline per step and shortens the sequence to 28 steps, which leads to a better final prediction in just 10 epochs.
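
The difference between the two input layouts is easiest to see from the array shapes. The sketch below uses a placeholder batch of zeros instead of MNIST data; the variable names are illustrative only.

```python
import numpy as np

batch = np.zeros((4, 28, 28), dtype="float32")   # 4 placeholder images

# Pixel-by-pixel layout: 784 time steps, 1 feature per step
pixel_sequence = batch.reshape(batch.shape[0], -1, 1)

# Scanline layout: 28 time steps, 28 features per step (no reshape needed)
scanline_sequence = batch

print(pixel_sequence.shape)      # (4, 784, 1)
print(scanline_sequence.shape)   # (4, 28, 28)
```

In both cases the axes are (samples, time steps, features), which is the layout a Keras recurrent layer expects.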

Moreover, the batch size is reduced to 64 so that the weights are updated more often per epoch, and the learning rate can be raised to 1e-4 for faster convergence.

In [3]:
import tensorflow.keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.optimizers import RMSprop

batch_size = 64       # samples per gradient update
num_classes = 10      # digits 0-9
epochs = 10
hidden_units = 100    # size of the RNN hidden state

learning_rate = 1e-4
clip_norm = 1.0       # defined but not passed to the optimizer below

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

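# pixel-by-pixel input from the original code (784 steps of 1 pixel), kept for reference:
# x_train = x_train.reshape(x_train.shape[0], -1, 1)
# x_test = x_test.reshape(x_test.shape[0], -1, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# scale pixel values from [0, 255] to [0, 1]
x_train /= 255
x_test /= 255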
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = tensorflow.keras.utils.to_categorical(y_train, num_classes)
y_test = tensorflow.keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28)
60000 train samples
10000 test samples


## Model Design

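The model is a single SimpleRNN layer with 100 ReLU units that reads 28 time steps, one 28-pixel scanline per step, followed by a 10-way softmax classifier. The layer is created with the Keras default initializers rather than the identity recurrent initialization described in the paper.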

The RNN layer therefore has $100 \times (100 + 28 + 1) = 12900$ parameters: 100 recurrent weights, 28 input weights, and 1 bias per unit.

The Dense layer maps 100 units to 10 classes with a bias, giving $(100 + 1) \times 10 = 1010$ parameters.

This leads to $12900 + 1010 = 13910$ parameters in total.
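
The parameter arithmetic can be checked with a few lines of Python. The helper names `simple_rnn_params` and `dense_params` are hypothetical and exist only for this check; the results match the `model.summary()` output.

```python
def simple_rnn_params(units, input_dim):
    """Recurrent weights + input weights + one bias, for each unit."""
    return units * (units + input_dim + 1)

def dense_params(inputs, outputs):
    """One weight per input plus one bias, for each output."""
    return (inputs + 1) * outputs

rnn = simple_rnn_params(units=100, input_dim=28)
dense = dense_params(inputs=100, outputs=10)
total = rnn + dense

print(rnn, dense, total)   # 12900 1010 13910
```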

In [4]:
print('Evaluate IRNN...')
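model = Sequential()
# input_shape is (28, 28): 28 time steps (scanlines) of 28 features (pixels)
model.add(SimpleRNN(hidden_units, activation='relu', input_shape=x_train.shape[1:]))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
# 'learning_rate' replaces the deprecated 'lr' argument name
rmsprop = RMSprop(learning_rate=learning_rate)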
model.compile(loss='categorical_crossentropy',
              optimizer=rmsprop,
              metrics=['accuracy'])
model.summary()


Evaluate IRNN...
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn (SimpleRNN)       (None, 100)               12900     
_________________________________________________________________
dense (Dense)                (None, 10)                1010      
_________________________________________________________________
activation (Activation)      (None, 10)                0         
Total params: 13,910
Trainable params: 13,910
Non-trainable params: 0
_________________________________________________________________
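
The paper cited at the top initializes the recurrent weight matrix to the identity and the biases to zero. The cell above passes no initializers to `SimpleRNN`, so the Keras defaults apply instead. The NumPy sketch below only illustrates the identity idea, under the simplifying assumption that the input contributes nothing: a non-negative hidden state is then carried through the ReLU recurrence unchanged. The names `U`, `b`, `h0`, and `h` are illustrative.

```python
import numpy as np

units = 100
U = np.eye(units)            # identity recurrent weights, as in the paper's IRNN
b = np.zeros(units)          # zero bias

rng = np.random.default_rng(0)
h0 = rng.random(units)       # an arbitrary non-negative starting state
h = h0.copy()

for _ in range(28):          # 28 time steps, input contribution assumed to be zero
    h = np.maximum(0.0, h @ U + b)

state_preserved = np.allclose(h, h0)
print(state_preserved)       # True
```

With this initialization the state neither grows nor decays in the absence of input, which is the property the paper relies on for long sequences.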


## Model Training

Treating each scanline as one time step shortens the sequence from 784 steps to 28, which speeds up convergence, and the batch size can be reduced to 64 for more frequent weight updates.
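
As a rough check on how often the weights are updated, the number of gradient steps follows from the dataset size and batch size. The helper `updates_per_epoch` is a hypothetical name; the calculation assumes the last partial batch counts as one step.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Number of gradient updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)

per_epoch = updates_per_epoch(num_samples=60000, batch_size=64)
total_updates = per_epoch * 10   # 10 epochs, as configured above

print(per_epoch, total_updates)  # 938 9380
```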

In [5]:
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

scores = model.evaluate(x_test, y_test, verbose=0)
print('IRNN test score:', scores[0])
print('IRNN test accuracy:', scores[1])

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
IRNN test score: 0.19227740594670176
IRNN test accuracy: 0.9413
