# FastRNN and FastGRNN in TensorFlow

This notebook illustrates how to use the TensorFlow implementation of FastRNN and FastGRNN on the USPS dataset. Run `fetch_usps.py` first to download and clean up the dataset.

In [1]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import helpermethods
import tensorflow as tf
import numpy as np
import sys
import os
sys.path.insert(0, '../../')

# Provide the GPU number to be used; an empty string hides all GPUs, so TensorFlow runs on the CPU
os.environ['CUDA_VISIBLE_DEVICES'] = ''

#FastRNN and FastGRNN imports
from edgeml.trainer.fastTrainer import FastTrainer
from edgeml.graph.rnn import FastGRNNCell
from edgeml.graph.rnn import FastRNNCell

# Fixing seeds for reproducibility
tf.set_random_seed(42)
np.random.seed(42)

# USPS Data

It is assumed that the USPS data has already been downloaded and set up with the help of [fetch_usps.py](fetch_usps.py) and is present in the `./usps10` subdirectory.

Note: Even though usps10 is not a time-series dataset, each 16x16 image can be treated as a time series in which one row arrives per timestep.
So the number of timesteps = 16 and inputDims = 16.
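The row-per-timestep view can be sketched with plain NumPy. This is only an illustration: `flat_image` is a hypothetical stand-in for one 256-feature USPS row, not data loaded from `usps10/`.

```python
import numpy as np

# Hypothetical stand-in for one flattened USPS image (256 features).
flat_image = np.arange(256, dtype=np.float32)

input_dims = 16                             # features fed to the RNN per timestep
timesteps = flat_image.size // input_dims   # 256 / 16 = 16

# Each row of the 16x16 image becomes the input at one timestep.
sequence = flat_image.reshape(timesteps, input_dims)

print(sequence.shape)  # (16, 16)
```

With this layout, `sequence[t]` is the 16-feature input the cell sees at timestep `t`.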

In [2]:
#Loading and Pre-processing dataset for FastCells
dataDir = "usps10/"
(dataDimension, numClasses, Xtrain, Ytrain, Xtest, Ytest) = helpermethods.preProcessData(dataDir)
print("Feature Dimension: ", dataDimension)
print("Num classes: ", numClasses)

Feature Dimension:  256
Num classes:  10


# Model Parameters

FastRNN and FastGRNN work with most of the hyper-parameters that achieve decent accuracies on LSTM/GRU. On top of that, you can use low-rank parameterisation, sparsity and quantization to reduce the model size by up to 45x compared to LSTM/GRU.

In [3]:
cell = "FastGRNN" # Choose between FastGRNN & FastRNN

inputDims = 16 #features taken in by RNN in one timestep
hiddenDims = 32 #hidden state of RNN

totalEpochs = 300
batchSize = 100

learningRate = 0.01
decayStep = 200
decayRate = 0.1

outFile = None #provide your file, if you need all the logging info in a file

#low-rank parameterisation for weight matrices. None => Full Rank
wRank = None 
uRank = None 

#Sparsity of the weight matrices. x => 100*x % are non-zeros
sW = 1.0 
sU = 1.0

#Non-linearities for the RNN architecture. Can choose from "tanh, sigmoid, relu, quantTanh, quantSigm"
update_non_linearity = "tanh"
gate_non_linearity = "sigmoid"

assert dataDimension % inputDims == 0, "Infeasible per step input, Timesteps have to be integer"
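To get a feel for what `wRank`, `uRank`, `sW` and `sU` buy, the rough count below may help. It is a back-of-the-envelope sketch that assumes a rank-r matrix is stored as two factors with r * (rows + cols) values and that sparsity simply scales the number of kept values; `stored_values` is a hypothetical helper, not part of edgeml, and the compressed setting is illustrative only.

```python
def stored_values(rows, cols, rank=None, sparsity=1.0):
    """Approximate number of non-zero values kept for a rows x cols weight matrix."""
    dense = rows * cols if rank is None else rank * (rows + cols)
    return int(round(sparsity * dense))

input_dims, hidden_dims = 16, 32   # values used in this notebook

# Full-rank, dense setting used above (wRank = uRank = None, sW = sU = 1.0).
full = stored_values(hidden_dims, input_dims) + stored_values(hidden_dims, hidden_dims)

# Hypothetical compressed setting: rank 8 for both W and U, 50% non-zeros.
small = (stored_values(hidden_dims, input_dims, rank=8, sparsity=0.5)
         + stored_values(hidden_dims, hidden_dims, rank=8, sparsity=0.5))

print(full, small)  # 1536 448
```

The count ignores biases and the final classifier layer, so it only indicates the trend for the recurrent weights.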

Placeholders for feeding data during training and inference

In [4]:
X = tf.placeholder("float", [None, int(dataDimension / inputDims), inputDims])
Y = tf.placeholder("float", [None, numClasses])
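The placeholder shapes imply that inputs arrive as `[batch, timesteps, inputDims]` and labels as one row per example with `numClasses` columns. The NumPy sketch below shows arrays of that form, assuming one-hot labels; the mini-batch is hypothetical, and `helpermethods.preProcessData` may already return data in this layout.

```python
import numpy as np

num_classes = 10
timesteps, input_dims = 16, 16

# Hypothetical mini-batch: three flattened 256-feature rows with integer labels.
flat_batch = np.zeros((3, timesteps * input_dims), dtype=np.float32)
labels = np.array([3, 0, 9])

# X expects [batch, timesteps, inputDims].
x_batch = flat_batch.reshape(-1, timesteps, input_dims)

# Y expects [batch, numClasses]; one-hot encode the integer labels.
y_batch = np.zeros((labels.size, num_classes), dtype=np.float32)
y_batch[np.arange(labels.size), labels] = 1.0

print(x_batch.shape, y_batch.shape)  # (3, 16, 16) (3, 10)
```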

Creating a timestamped directory for the current model inside the data directory

In [5]:
currDir = helpermethods.createTimeStampDir(dataDir, cell)
helpermethods.dumpCommand(sys.argv, currDir)

# FastCell Graph Object

Instantiating the FastCell Graph using modular RNN Cells which will be used for training and inference.

Note: RNN cells in `edgeml.graph.rnn` can be used anywhere in place of LSTM/GRU in a plug & play fashion.

In [6]:
#Create appropriate RNN cell object based on choice
if cell == "FastGRNN":
    FastCell = FastGRNNCell(hiddenDims, gate_non_linearity=gate_non_linearity,
                            update_non_linearity=update_non_linearity,
                            wRank=wRank, uRank=uRank)
elif cell == "FastRNN":
    FastCell = FastRNNCell(hiddenDims, update_non_linearity=update_non_linearity,
                           wRank=wRank, uRank=uRank)
else:
    sys.exit('Exiting: No Such Cell as ' + cell)
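For intuition about what a FastGRNN cell computes, here is a minimal NumPy sketch of one update step following the published FastGRNN equations: the gate and the candidate state share `W` and `U` and differ only in their biases. It is not the edgeml implementation; the random weights, the `fastgrnn_step` helper and the fixed `zeta` and `nu` values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fastgrnn_step(x, h_prev, W, U, b_gate, b_update, zeta, nu):
    """One FastGRNN update; W and U are shared by the gate and the candidate state."""
    pre = W @ x + U @ h_prev
    z = sigmoid(pre + b_gate)           # gate (gate_non_linearity = sigmoid)
    h_cand = np.tanh(pre + b_update)    # candidate state (update_non_linearity = tanh)
    return (zeta * (1.0 - z) + nu) * h_cand + z * h_prev

rng = np.random.default_rng(0)
input_dims, hidden_dims = 16, 32
W = rng.normal(scale=0.1, size=(hidden_dims, input_dims))
U = rng.normal(scale=0.1, size=(hidden_dims, hidden_dims))
b_gate = np.ones(hidden_dims)
b_update = np.zeros(hidden_dims)

h = np.zeros(hidden_dims)
sequence = rng.normal(size=(16, input_dims))   # hypothetical 16-timestep input
for x_t in sequence:
    h = fastgrnn_step(x_t, h, W, U, b_gate, b_update, zeta=1.0, nu=1e-4)

print(h.shape)  # (32,)
```

FastRNN is the simpler variant: it drops the gate and mixes the candidate and previous state with two trainable scalars instead.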

# FastCell Trainer Object

Instantiating the FastCell Trainer, which will be used for three-phase training

In [7]:
FastCellTrainer = FastTrainer(FastCell, X, Y, sW=sW, sU=sU, learningRate=learningRate, outFile=outFile)
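The sparsity targets `sW` and `sU` say what fraction of each weight matrix should stay non-zero. One standard way to enforce such a target is hard thresholding: keep the largest-magnitude entries and zero the rest. The sketch below assumes that technique; `hard_threshold` is a hypothetical helper and the matrix is made up, so treat it as an illustration rather than the trainer's actual code.

```python
import numpy as np

def hard_threshold(matrix, sparsity):
    """Keep the largest-magnitude `sparsity` fraction of entries and zero the rest."""
    flat = np.abs(matrix).ravel()
    keep = int(np.ceil(sparsity * flat.size))
    if keep >= flat.size:
        return matrix.copy()
    if keep <= 0:
        return np.zeros_like(matrix)
    # Magnitude of the keep-th largest entry.
    cut = flat.size - keep
    threshold = np.partition(flat, cut)[cut]
    return np.where(np.abs(matrix) >= threshold, matrix, 0.0)

W = np.array([[0.9, -0.1, 0.05],
              [-0.7, 0.2, 0.0]])
sparse_W = hard_threshold(W, sparsity=0.5)   # keep 3 of the 6 entries

print(np.count_nonzero(sparse_W))  # 3
```

With `sW = sU = 1.0`, as in this notebook, every entry is kept and the matrices stay dense. Ties at the threshold can keep slightly more entries than requested.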

Session declaration and variable initialization. With its default config, `tf.InteractiveSession` lets GPU memory grow as needed instead of reserving the entire GPU.

In [8]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

# FastCell Training Routine

This method runs the three-phase training, reports the best early-stopping model and its accuracy, and saves the parameters.

In [9]:
FastCellTrainer.train(batchSize, totalEpochs, sess, Xtrain, Xtest,
                      Ytrain, Ytest, decayStep, decayRate, dataDir, currDir)


Epoch Number: 0

******************** Dense Training Phase Started ********************

Train Loss: 1.3599376968211598 Train Accuracy: 0.5645833328987161
Test Loss: 0.8439169 Test Accuracy: 0.73193824

Epoch Number: 1
Train Loss: 0.5302805780536599 Train Accuracy: 0.8190277765194575
Test Loss: 0.541681 Test Accuracy: 0.82909817

Epoch Number: 2
Train Loss: 0.32044093538489604 Train Accuracy: 0.8973611096541086
Test Loss: 0.43556738 Test Accuracy: 0.8704534

Epoch Number: 3
Train Loss: 0.22522424844404063 Train Accuracy: 0.9291666688190566
Test Loss: 0.37830535 Test Accuracy: 0.8908819

Epoch Number: 4
Train Loss: 0.1732552474261158 Train Accuracy: 0.9440277814865112
Test Loss: 0.35569373 Test Accuracy: 0.900847

Epoch Number: 5
Train Loss: 0.15992410496498147 Train Accuracy: 0.9479166691501936
Test Loss: 0.37784928 Test Accuracy: 0.8933732

Epoch Number: 6
Train Loss: 0.13975419962985647 Train Accuracy: 0.9552777806917826
Test Loss: 0.37031704 Test Accuracy: 0.90632784

...


Epoch Number: 62
Train Loss: 0.020008320616196014 Train Accuracy: 0.9936111172040304
Test Loss: 0.4036343 Test Accuracy: 0.9317389

Epoch Number: 63
Train Loss: 0.03814307231287886 Train Accuracy: 0.9858333418766657
Test Loss: 0.40347323 Test Accuracy: 0.9237668

Epoch Number: 64
Train Loss: 0.02620117269624542 Train Accuracy: 0.992083340883255
Test Loss: 0.416209 Test Accuracy: 0.93024415

Epoch Number: 65
Train Loss: 0.024507912253385358 Train Accuracy: 0.9919444512989786
Test Loss: 0.43610597 Test Accuracy: 0.92526156

Epoch Number: 66
Train Loss: 0.019725336282539904 Train Accuracy: 0.9944444489147928
Test Loss: 0.4169061 Test Accuracy: 0.9292476

Epoch Number: 67
Train Loss: 0.015033517196166536 Train Accuracy: 0.995416671037674
Test Loss: 0.4207301 Test Accuracy: 0.93472844

Epoch Number: 68
Train Loss: 0.03248270463276034 Train Accuracy: 0.9897222303681903
Test Loss: 0.45667982 Test Accuracy: 0.92526156

Epoch Number: 69
Train Loss: 0.026923538450824305 Train Accuracy: 0.990277...


Epoch Number: 124
Train Loss: 0.0015416611581208094 Train Accuracy: 0.9995833337306976
Test Loss: 0.47261178 Test Accuracy: 0.9332337

Epoch Number: 125
Train Loss: 0.0017297351865232082 Train Accuracy: 0.9997222224871317
Test Loss: 0.47400546 Test Accuracy: 0.9362232

Epoch Number: 126
Train Loss: 0.0016402734902132782 Train Accuracy: 0.9997222224871317
Test Loss: 0.48180088 Test Accuracy: 0.93472844

Epoch Number: 127
Train Loss: 0.0013821528517736523 Train Accuracy: 0.9998611112435659
Test Loss: 0.48116747 Test Accuracy: 0.9332337

Epoch Number: 128
Train Loss: 0.001449145969773882 Train Accuracy: 0.9998611112435659
Test Loss: 0.48738334 Test Accuracy: 0.9352267

Epoch Number: 129
Train Loss: 0.0015336634696723195 Train Accuracy: 0.9995833337306976
Test Loss: 0.49717796 Test Accuracy: 0.9312407

Epoch Number: 130
Train Loss: 0.001357548176504982 Train Accuracy: 0.9998611112435659
Test Loss: 0.49855435 Test Accuracy: 0.93472844

Epoch Number: 131
Train Loss: 0.0012615662327435631 ...

Train Loss: 0.0015056908515463066 Train Accuracy: 0.9997222224871317
Test Loss: 0.45554337 Test Accuracy: 0.937718

Epoch Number: 186
Train Loss: 0.0015354208547554056 Train Accuracy: 0.9998611112435659
Test Loss: 0.45303786 Test Accuracy: 0.93721974

Epoch Number: 187
Train Loss: 0.001567479421889099 Train Accuracy: 0.9998611112435659
Test Loss: 0.47168335 Test Accuracy: 0.9367215

Epoch Number: 188
Train Loss: 0.001717535155168864 Train Accuracy: 0.9998611112435659
Test Loss: 0.46497926 Test Accuracy: 0.9352267

Epoch Number: 189
Train Loss: 0.0015925323949785605 Train Accuracy: 0.9998611112435659
Test Loss: 0.4687307 Test Accuracy: 0.937718

Epoch Number: 190
Train Loss: 0.0017255359450953417 Train Accuracy: 0.9997222224871317
Test Loss: 0.4762092 Test Accuracy: 0.9352267

Epoch Number: 191
Train Loss: 0.001808238027782257 Train Accuracy: 0.9998611112435659
Test Loss: 0.4705828 Test Accuracy: 0.9367215

Epoch Number: 192
Train Loss: 0.001607122014320339 Train Accuracy: 0.99986111124...


Epoch Number: 247
Train Loss: 0.0009671178119485072 Train Accuracy: 0.9998611112435659
Test Loss: 0.46134597 Test Accuracy: 0.935725

Epoch Number: 248
Train Loss: 0.0009414523363173228 Train Accuracy: 0.9998611112435659
Test Loss: 0.46288398 Test Accuracy: 0.935725

Epoch Number: 249
Train Loss: 0.0009164249787419168 Train Accuracy: 0.9998611112435659
Test Loss: 0.46443734 Test Accuracy: 0.9362232

Epoch Number: 250
Train Loss: 0.0008920004451687823 Train Accuracy: 0.9998611112435659
Test Loss: 0.46600705 Test Accuracy: 0.9362232

Epoch Number: 251
Train Loss: 0.0008681359632747546 Train Accuracy: 0.9998611112435659
Test Loss: 0.4675939 Test Accuracy: 0.9362232

Epoch Number: 252
Train Loss: 0.0008447880918538431 Train Accuracy: 0.9998611112435659
Test Loss: 0.46919972 Test Accuracy: 0.9352267

Epoch Number: 253
Train Loss: 0.0008219110804930096 Train Accuracy: 0.9998611112435659
Test Loss: 0.47082672 Test Accuracy: 0.9352267

Epoch Number: 254
Train Loss: 0.0007994574875232906 ...
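
In the log above, the train loss keeps falling while the test loss drifts upward, which is why an early-stopping choice matters. As a rough illustration, the snippet below picks the best epoch by test accuracy from a few (epoch, test accuracy) pairs copied from that log. The trainer's own selection logic may differ; this is only a sketch of the idea.

```python
# (epoch, test accuracy) pairs copied from the log above.
history = [
    (62, 0.9317389),
    (63, 0.9237668),
    (64, 0.93024415),
    (65, 0.92526156),
    (66, 0.9292476),
    (67, 0.93472844),
    (68, 0.92526156),
]

# Early stopping keeps the epoch with the highest held-out accuracy.
best_epoch, best_acc = max(history, key=lambda pair: pair[1])

print(best_epoch, best_acc)  # 67 0.93472844
```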

# Model Quantization

Byte quantization of the trained FastModels reduces the model size by 4x. If you use piecewise-linear approximations of the non-linearities, such as quantTanh for tanh and quantSigm for sigmoid, prediction can benefit greatly from pure integer arithmetic after model quantization.
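
A piecewise-linear approximation replaces the smooth non-linearity with clipping, which needs no exponentials. The sketch below uses one common choice, clipping to [-1, 1] for tanh and clipping (x + 1) / 2 to [0, 1] for sigmoid; the function names `quant_tanh` and `quant_sigm` are illustrative, and the exact definitions used by the library are assumed rather than quoted.

```python
import numpy as np

def quant_tanh(x):
    """Piecewise-linear stand-in for tanh: clip to [-1, 1]."""
    return np.clip(x, -1.0, 1.0)

def quant_sigm(x):
    """Piecewise-linear stand-in for sigmoid: clip (x + 1) / 2 to [0, 1]."""
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(quant_tanh(x))   # values: -1, -0.5, 0, 0.5, 1
print(quant_sigm(x))   # values: 0, 0.25, 0.5, 0.75, 1
```

Both functions saturate exactly like their smooth counterparts at the extremes and are linear in between, so they map cleanly onto integer arithmetic.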

In [10]:
#Model quantization
model_dir = currDir #you will see the model dir printed at the end of training; use that here or use currDir

import quantizeFastModels
quantizeFastModels.quantizeFastModels(model_dir)

Bg.npy has max: 6.105938 min: -0.47308242
Bh.npy has max: 3.0693681 min: -0.10286169
FC.npy has max: 4.952105 min: -6.216086
FCbias.npy has max: 2.014953 min: -1.2401508
U.npy has max: 2.4682086 min: -2.4029086
W.npy has max: 1.6722481 min: -1.6684073


Quantized Model Dir: usps10//FastGRNNResults/17_41_05_15_08_18/QuantizedFastModel
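
The 4x figure follows from storing each 4-byte float32 weight in a single byte. The sketch below shows a generic symmetric int8 quantization to make that concrete. It is not necessarily what `quantizeFastModels` does internally; `quantize_int8` is a hypothetical helper, and the sample array only borrows the max and min reported for `W.npy` above, with the other two values made up.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric linear quantization of a float array to int8 plus a scale factor."""
    limit = float(np.max(np.abs(weights)))
    scale = 127.0 / limit if limit > 0 else 1.0
    q = np.clip(np.round(weights * scale), -127, 127).astype(np.int8)
    return q, scale

# Max and min taken from the W.npy line above; 0.25 and 0.0 are made up.
weights = np.array([1.6722481, -1.6684073, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(weights)

# Dequantize to see how much precision the byte representation loses.
restored = q.astype(np.float32) / scale

print(weights.nbytes, q.nbytes)  # 16 4
```

The reconstruction error per weight is at most half a quantization step, that is `0.5 / scale`.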
