# Bonsai in TensorFlow

This notebook illustrates how to use the TensorFlow implementation of Bonsai on the USPS dataset. Run `fetch_usps.py` first to download and clean up the dataset.

In [1]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT license.

import helpermethods
import tensorflow as tf
import numpy as np
import sys
import os
sys.path.insert(0, '../../')

# Provide the GPU number to be used; an empty string hides all GPUs, so TensorFlow runs on the CPU
os.environ['CUDA_VISIBLE_DEVICES'] = ''

#Bonsai imports
from edgeml.trainer.bonsaiTrainer import BonsaiTrainer
from edgeml.graph.bonsai import Bonsai

# Fixing seeds for reproducibility
tf.set_random_seed(42)
np.random.seed(42)

# USPS Data

It is assumed that the USPS data has already been downloaded and set up with the help of [fetch_usps.py](fetch_usps.py) and is present in the `./usps10` subdirectory.

In [None]:
#Loading and Pre-processing dataset for Bonsai
dataDir = "usps10/"
(dataDimension, numClasses, Xtrain, Ytrain, Xtest, Ytest) = helpermethods.preProcessData(dataDir)
print("Feature Dimension: ", dataDimension)
print("Num classes: ", numClasses)
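The internals of `helpermethods.preProcessData` are not shown in this notebook. As a rough sketch of what such a step typically involves, the snippet below assumes the label sits in the first column of each array, standardizes the features with training-set statistics, appends a constant bias feature, and one-hot encodes the labels. The function name `preprocess_sketch`, the column layout, and the synthetic arrays are all assumptions made for illustration.

```python
import numpy as np

def preprocess_sketch(train, test):
    """Hypothetical stand-in for a preprocessing step; not the EdgeML helper."""
    # Assumption: column 0 holds integer class labels, the rest are features.
    Xtrain, Xtest = train[:, 1:].astype(float), test[:, 1:].astype(float)
    ytrain, ytest = train[:, 0].astype(int), test[:, 0].astype(int)

    # Standardize with training statistics only, guarding constant features.
    mean = Xtrain.mean(axis=0)
    std = Xtrain.std(axis=0)
    std[std < 1e-6] = 1.0
    Xtrain = (Xtrain - mean) / std
    Xtest = (Xtest - mean) / std

    # Append a constant feature so the model can learn a bias term.
    Xtrain = np.hstack([Xtrain, np.ones((Xtrain.shape[0], 1))])
    Xtest = np.hstack([Xtest, np.ones((Xtest.shape[0], 1))])

    # Shift labels so they start at 0, then one-hot encode them.
    lowest = min(ytrain.min(), ytest.min())
    num_classes = int(max(ytrain.max(), ytest.max()) - lowest + 1)
    Ytrain = np.eye(num_classes)[ytrain - lowest]
    Ytest = np.eye(num_classes)[ytest - lowest]
    return Xtrain.shape[1], num_classes, Xtrain, Ytrain, Xtest, Ytest

# Synthetic data: labels 1..3 in column 0, five random features.
rng = np.random.default_rng(0)

def make_fake_split(n):
    labels = (np.arange(n) % 3 + 1).reshape(-1, 1)
    return np.hstack([labels, rng.normal(size=(n, 5))])

dim, n_cls, Xtr, Ytr, Xte, Yte = preprocess_sketch(make_fake_split(30), make_fake_split(12))
print(dim, n_cls, Xtr.shape, Ytr.shape)
```

The returned tuple mirrors the order used in the cell above, so the feature dimension includes the appended bias column.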

# Model Parameters

Note that Bonsai is designed for low-memory settings, and the best results are obtained when operating in such a setting. Use the sparsity, projection dimension, and tree depth to vary the model size.

In [4]:
sigma = 1.0 #Scaling parameter applied inside tanh
depth = 3 #Depth of Bonsai Tree
projectionDimension = 28 #Lower Dimensional space for Bonsai to work on

#Regularizers for Bonsai Parameters
regZ = 0.0001
regW = 0.001
regV = 0.001
regT = 0.001

totalEpochs = 100

learningRate = 0.01

outFile = None

#Sparsity for Bonsai Parameters. x => 100*x % are non-zeros
sparZ = 0.2
sparW = 0.3
sparV = 0.3
sparT = 0.62

#Batch size: at least 100, or sqrt(number of training points) when that is larger
batchSize = np.maximum(100, int(np.ceil(np.sqrt(Ytrain.shape[0]))))

useMCHLoss = True #Only for multiclass cases. True: Multiclass Hinge Loss, False: Cross Entropy.

#Bonsai uses one classifier for binary problems, hence this condition
if numClasses == 2:
    numClasses = 1
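The `useMCHLoss` flag switches between a multiclass hinge loss and cross entropy. The NumPy sketch below shows one common form of each: a hinge with margin 1 against the best competing class, and softmax cross entropy. Whether the trainer uses exactly these variants is an assumption, and the function names are illustrative only.

```python
import numpy as np

def multiclass_hinge(scores, Y_onehot):
    """Mean of max(0, 1 + best wrong score - true score) over the batch."""
    true_score = np.sum(scores * Y_onehot, axis=1)
    # Mask out the true class so the max runs over competing classes only.
    wrong_scores = np.where(Y_onehot == 1, -np.inf, scores)
    best_wrong = wrong_scores.max(axis=1)
    return np.mean(np.maximum(0.0, 1.0 + best_wrong - true_score))

def softmax_cross_entropy(scores, Y_onehot):
    """Mean negative log-likelihood under a softmax over the scores."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(Y_onehot * log_probs, axis=1))

# Two toy examples with three classes; true classes are 0 and 1.
scores = np.array([[3.0, 0.5, -1.0],
                   [0.2, 0.1, 0.0]])
labels = np.eye(3)[[0, 1]]

hinge = multiclass_hinge(scores, labels)
xent = softmax_cross_entropy(scores, labels)
print("hinge:", hinge, "cross entropy:", xent)
```

The first example is classified with a margin larger than 1 and contributes no hinge loss, while it still contributes a small cross-entropy term. This is the practical difference between the two settings.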

Placeholders for feeding data during training and inference.

In [5]:
X = tf.placeholder("float32", [None, dataDimension])
Y = tf.placeholder("float32", [None, numClasses])

Create a timestamped directory for the current model inside the data directory.

In [6]:
currDir = helpermethods.createTimeStampDir(dataDir)
helpermethods.dumpCommand(sys.argv, currDir)

# Bonsai Graph Object

Instantiating the Bonsai Graph which will be used for training and inference.

In [7]:
bonsaiObj = Bonsai(numClasses, dataDimension, projectionDimension, depth, sigma)
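To make the roles of `projectionDimension`, `depth`, and `sigma` concrete, here is a NumPy sketch of a Bonsai-style forward pass along the lines of the Bonsai paper. The input is projected to a low-dimensional space by `Z`, every tree node `k` contributes `W_k z * tanh(sigma * V_k z)`, and each contribution is weighted by a soft path indicator driven by the branching parameters `T`. This is an illustration with random weights, not the `edgeml` implementation. The function name, the `sigma_i` parameter, the array layouts, and the choice of which child receives the `+` sign are all assumptions.

```python
import numpy as np

def bonsai_forward_sketch(x, Z, W, V, T, sigma, sigma_i=1.0):
    """Illustrative Bonsai-style prediction for a single input vector x.

    Z: (proj_dim, data_dim)               projection matrix
    W, V: (num_nodes, num_classes, proj_dim)  per-node predictors
    T: (num_internal, proj_dim)           branching hyperplanes
    """
    num_nodes, num_internal = W.shape[0], T.shape[0]
    z = Z @ x  # low-dimensional projection of the input

    # Soft path indicators: the root has weight 1, and each internal node
    # splits its weight between its two children (heap-style indexing).
    indicator = np.zeros(num_nodes)
    indicator[0] = 1.0
    for k in range(num_internal):
        branch = np.tanh(sigma_i * (T[k] @ z))
        indicator[2 * k + 1] = 0.5 * indicator[k] * (1.0 - branch)
        indicator[2 * k + 2] = 0.5 * indicator[k] * (1.0 + branch)

    # Per-node class scores, shape (num_nodes, num_classes).
    node_scores = (W @ z) * np.tanh(sigma * (V @ z))
    return indicator @ node_scores, indicator

# Toy sizes (hypothetical): depth 2 gives 3 internal nodes and 7 nodes in total.
rng = np.random.default_rng(0)
depth, data_dim, proj_dim, num_classes = 2, 8, 4, 3
num_internal, num_nodes = 2 ** depth - 1, 2 ** (depth + 1) - 1

Z = rng.normal(size=(proj_dim, data_dim))
W = rng.normal(size=(num_nodes, num_classes, proj_dim))
V = rng.normal(size=(num_nodes, num_classes, proj_dim))
T = rng.normal(size=(num_internal, proj_dim))

class_scores, path_weights = bonsai_forward_sketch(rng.normal(size=data_dim), Z, W, V, T, sigma=1.0)
print(class_scores.shape, path_weights[num_internal:].sum())
```

In this sketch the path weights at each tree level sum to 1, so every node along the soft path contributes to the final score rather than the leaves alone.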

# Bonsai Trainer Object

Instantiating the Bonsai Trainer, which will be used for the 3-phase training.

In [8]:
bonsaiTrainer = BonsaiTrainer(bonsaiObj, regW, regT, regV, regZ, sparW, sparT, sparV, sparZ,
                              learningRate, X, Y, useMCHLoss, outFile)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
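The sparsity values passed to the trainer (`sparW`, `sparT`, `sparV`, `sparZ`) are fractions of entries allowed to be non-zero. One standard way to enforce such a budget is hard thresholding: after a gradient step, keep only the largest-magnitude entries of a matrix and zero out the rest. The minimal NumPy sketch below illustrates that idea. `hard_threshold` is an illustrative name rather than an `edgeml` function, and details such as tie handling may differ from what the library does.

```python
import numpy as np

def hard_threshold(A, sparsity):
    """Keep the ceil(sparsity * A.size) largest-magnitude entries of A."""
    k = int(np.ceil(sparsity * A.size))
    if k <= 0:
        return np.zeros_like(A)
    if k >= A.size:
        return A.copy()
    flat = A.ravel()
    # argpartition places the k largest magnitudes in the last k positions.
    keep = np.argpartition(np.abs(flat), A.size - k)[A.size - k:]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(A.shape)

# Toy matrix with the shape T would have for depth 3 and projection dimension 28.
rng = np.random.default_rng(0)
T_dense = rng.normal(size=(7, 28))
T_sparse = hard_threshold(T_dense, 0.62)
print(np.count_nonzero(T_sparse), "of", T_dense.size, "entries kept")
```

Because only magnitudes are compared, the surviving entries keep their original values and signs. The sparsity pattern is the only thing that changes.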


Session declaration and variable initialization. 
Interactive Session doesn't clog the entire GPU.

In [9]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

# Bonsai Training Routine

This method runs the 3-phase training, then reports the best early-stopping model and its accuracy, and saves the parameters.

In [None]:
bonsaiTrainer.train(batchSize, totalEpochs, sess,
                    Xtrain, Xtest, Ytrain, Ytest, dataDir, currDir)


Epoch Number: 0

******************** Dense Training Phase Started ********************

Train Loss: 6.389298193984562 Train accuracy: 0.6244444446638227
Test accuracy 0.724464
MarginLoss + RegLoss: 1.4452395 + 3.6491284 = 5.094368


Epoch Number: 1
Train Loss: 3.688335908783807 Train accuracy: 0.8611111144224802
Test accuracy 0.769307
MarginLoss + RegLoss: 0.9957626 + 2.7783642 = 3.7741268


Epoch Number: 2
Train Loss: 2.6678174800342984 Train accuracy: 0.9186111175351672
Test accuracy 0.760339
MarginLoss + RegLoss: 0.8745117 + 2.0955048 = 2.9700165


Epoch Number: 3
Train Loss: 1.9926944921414058 Train accuracy: 0.941527778075801
Test accuracy 0.776283
MarginLoss + RegLoss: 0.7323152 + 1.5899123 = 2.3222275


Epoch Number: 4
Train Loss: 1.5220159557130601 Train accuracy: 0.9556944477889273
Test accuracy 0.809666
MarginLoss + RegLoss: 0.58971727 + 1.2277353 = 1.8174525


Epoch Number: 5
Train Loss: 1.1967213302850723 Train accuracy: 0.9623611138926612
Test accuracy 0.839063
MarginLos


Epoch Number: 49
Train Loss: 0.1523623133285178 Train accuracy: 0.9843055663837327
Test accuracy 0.935725
MarginLoss + RegLoss: 0.16232616 + 0.1009971 = 0.26332325


Epoch Number: 50
Train Loss: 0.15000420074082083 Train accuracy: 0.984166675971614
Test accuracy 0.939711
MarginLoss + RegLoss: 0.15988377 + 0.09935409 = 0.25923786


Epoch Number: 51
Train Loss: 0.14788469382458264 Train accuracy: 0.9847222343087196
Test accuracy 0.938216
MarginLoss + RegLoss: 0.15896915 + 0.09833242 = 0.25730157


Epoch Number: 52
Train Loss: 0.14853439800855187 Train accuracy: 0.9834722346729703
Test accuracy 0.936722
MarginLoss + RegLoss: 0.15929905 + 0.09814263 = 0.25744167


Epoch Number: 53
Train Loss: 0.14974186455623972 Train accuracy: 0.9852777885066138
Test accuracy 0.941206
MarginLoss + RegLoss: 0.16047765 + 0.09778558 = 0.25826323


Epoch Number: 54
Train Loss: 0.15054848376247618 Train accuracy: 0.9829166788193915
Test accuracy 0.937718
MarginLoss + RegLoss: 0.16177258 + 0.097229056 = 0.2590


Epoch Number: 98
Train Loss: 0.1486424860647983 Train accuracy: 0.9831944555044174
Test accuracy 0.923269
MarginLoss + RegLoss: 0.18793702 + 0.096515276 = 0.2844523


Epoch Number: 99
Train Loss: 0.15365824424144295 Train accuracy: 0.9820833421415753
Test accuracy 0.935227
MarginLoss + RegLoss: 0.16577286 + 0.097828515 = 0.26360136


Maximum Test accuracy at compressed model size(including early stopping): 0.94220227 at Epoch: 72
Final Test Accuracy: 0.9352267

Non-Zeros: 4156.0 Model Size: 31.703125 KB hasSparse: True
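The reported non-zero count and model size can be reproduced with a back-of-the-envelope calculation. The sketch below rests on several assumptions that are not stated in this notebook: the input dimension is 257 (256 USPS pixels plus one bias feature), there are 10 classes, a matrix with sparsity below 0.5 is stored sparsely at 8 bytes per non-zero (a 4-byte value plus a 4-byte index), and any other matrix is stored densely at 4 bytes per entry. `estimate_bonsai_size` is a hypothetical helper written for this illustration.

```python
import math

def matrix_cost(num_entries, sparsity, bytes_per_value=4):
    """Return (counted non-zeros, bytes) for one parameter matrix."""
    if sparsity < 0.5:
        # Assumed sparse storage: one value and one index per non-zero.
        nnz = math.ceil(num_entries * sparsity)
        return nnz, nnz * 2 * bytes_per_value
    # Assumed dense storage: every entry is counted and stored.
    return num_entries, num_entries * bytes_per_value

def estimate_bonsai_size(data_dim, num_classes, proj_dim, depth,
                         spar_z, spar_w, spar_v, spar_t):
    internal_nodes = 2 ** depth - 1
    total_nodes = 2 ** (depth + 1) - 1
    matrices = [
        (proj_dim * data_dim, spar_z),                   # Z: projection
        (num_classes * total_nodes * proj_dim, spar_w),  # W: node predictors
        (num_classes * total_nodes * proj_dim, spar_v),  # V: node predictors
        (internal_nodes * proj_dim, spar_t),             # T: branching functions
    ]
    total_nnz, total_bytes = 0, 0
    for entries, sparsity in matrices:
        nnz, size = matrix_cost(entries, sparsity)
        total_nnz += nnz
        total_bytes += size
    return total_nnz, total_bytes / 1024.0

# Hyperparameters from this notebook; data_dim and num_classes are assumed.
nnz, size_kb = estimate_bonsai_size(data_dim=257, num_classes=10, proj_dim=28, depth=3,
                                    spar_z=0.2, spar_w=0.3, spar_v=0.3, spar_t=0.62)
print("Non-zeros:", nnz, "Model size (KB):", size_kb)
```

Under these assumptions the estimate comes to 4156 non-zeros and 31.703125 KB, which agrees with the log line above. Note that `T`, at sparsity 0.62, is treated as dense, so all 196 of its entries are counted.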

The Model Directory: usps10//TFBonsaiResults/10_16_32_15_08_18

