Optimizers are a core part of any deep learning project. No matter how well you have engineered your network
or which activation functions you have chosen, if the optimizer does not turn the gradients into good weight updates, the training time is wasted.

I have noticed a pattern in how optimizers are chosen. Most people jump straight to Adam or RMSprop because the trend says they perform well.

In this blog post, we will see how different optimizers perform on 
- MNIST
and understand how the accuracy improves over time.

For faster experimentation, let's build a class-based framework so that trying a new configuration is quick.

In [1]:
import numpy as np
import tensorflow as tf
from mlp import BO
import os 

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
optimizer = ["sgd", "momentum", "nestrov_momentum", "adagrad", "adadelta", "rmsprop", "adam"]
learning_rate = [0.0001, 0.001, 0.01, 0.1]
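The two lists above define a small grid of experiments. As a rough sketch (not part of the `mlp` module), the snippet below enumerates every optimizer and learning-rate pair and builds a log directory for each, following the same `"/tmp/mnist/" + name + "_" + str(lr)` pattern used for `summary_dir` later in this post. The helper `run_name` and the variable `runs` are hypothetical names introduced only for illustration.

```python
from itertools import product

# Same values as the lists above; the optimizer strings are copied verbatim.
optimizers = ["sgd", "momentum", "nestrov_momentum", "adagrad",
              "adadelta", "rmsprop", "adam"]
learning_rates = [0.0001, 0.001, 0.01, 0.1]


def run_name(optimizer_name, lr):
    """Hypothetical helper: build a unique name for one experiment."""
    return optimizer_name + "_" + str(lr)


# One (optimizer, learning rate, summary directory) tuple per experiment.
runs = [
    (opt, lr, "/tmp/mnist/" + run_name(opt, lr))
    for opt, lr in product(optimizers, learning_rates)
]

for opt, lr, summary_dir in runs[:3]:
    print(opt, lr, summary_dir)
```

Keeping one directory per run means each configuration can be compared side by side once the summaries are written.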

In [3]:
x_train = mnist.train.images
y_train = mnist.train.labels
x_valid = mnist.validation.images
y_valid = mnist.validation.labels
x_test = mnist.test.images
y_test = mnist.test.labels
print(x_test.shape)
print(y_test.shape)

(10000, 784)
(10000, 10)
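The label shape `(10000, 10)` comes from `one_hot=True`: each row holds a single 1 in the column of the true digit. The sketch below shows how accuracy and a cross-entropy cost could be computed from such labels with NumPy. It assumes the model outputs class probabilities and uses softmax cross-entropy as its cost; the source does not show the internals of `mlp.BO`, so the functions `accuracy` and `cross_entropy` here are illustrative only.

```python
import numpy as np


def accuracy(probs, one_hot_labels):
    """Fraction of rows where the most probable class matches the label."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))


def cross_entropy(probs, one_hot_labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    per_example = -np.sum(one_hot_labels * np.log(probs + eps), axis=1)
    return float(np.mean(per_example))


# Five made-up examples with 10 classes, one-hot encoded like the MNIST labels.
labels = np.eye(10)[np.array([3, 1, 4, 1, 5])]

# A model that assigns equal probability to every class.
uniform = np.full((5, 10), 0.1)

print(cross_entropy(uniform, labels))  # equals ln(10) up to rounding
print(accuracy(labels, labels))        # a perfect prediction scores 1.0
```

A useful baseline to keep in mind: a 10-class model that predicts every class with equal probability has a cross-entropy of ln(10), roughly 2.3026, so a cost near that value means the network has not yet learned anything.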


In [4]:
model = BO(x_train, y_train, x_valid, y_valid, x_test, y_test)
print ("[Model Initialized]")

[Model Initialized]


In [5]:
model.build_graph()

<mlp.BO at 0x10ad0c0f0>

In [6]:
model.compile_graph(optimize = optimizer[1], learning_rate = learning_rate[0])

[Using optimizer]: momentum
[Using ftrl]


<mlp.BO at 0x10ad0c0f0>
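For reference, here is a minimal NumPy sketch of how a momentum update differs from plain SGD. It uses the classical formulation with a velocity term; the momentum coefficient `mu=0.9` and the toy loss are assumptions for illustration, since the post does not show which values `mlp.BO` uses internally.

```python
import numpy as np


def sgd_step(w, grad, lr):
    """Plain gradient descent: move against the gradient."""
    return w - lr * grad


def momentum_step(w, velocity, grad, lr, mu=0.9):
    """Classical momentum: accumulate a decaying sum of past gradients."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity


# Toy loss f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w0 = np.array([1.0, -2.0])
lr = 0.01

w_sgd = w0.copy()
w_mom = w0.copy()
velocity = np.zeros_like(w0)

for _ in range(100):
    w_sgd = sgd_step(w_sgd, w_sgd, lr)
    w_mom, velocity = momentum_step(w_mom, velocity, w_mom, lr)

# Distance from the minimum at the origin after 100 steps.
print(np.linalg.norm(w_sgd), np.linalg.norm(w_mom))
```

On this toy problem the momentum run ends closer to the minimum than plain SGD at the same learning rate, because the velocity keeps adding up gradients that point in a consistent direction.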

In [7]:
model.train(summary_dir = "/tmp/mnist/"+optimizer[1]+"_"+str(learning_rate[0]))

epoch:  0 train_cost:  2.30549 Train_Accuracy:  0.102509 valid_cost:  2.30603 Validation_Accuracy:  0.0986
epoch:  0 train_cost:  2.30419 Train_Accuracy:  0.102509 valid_cost:  2.30471 Validation_Accuracy:  0.0986
epoch:  0 train_cost:  2.3031 Train_Accuracy:  0.102509 valid_cost:  2.30352 Validation_Accuracy:  0.0986
epoch:  0 train_cost:  2.30246 Train_Accuracy:  0.102509 valid_cost:  2.30277 Validation_Accuracy:  0.0986
epoch:  1 train_cost:  2.30234 Train_Accuracy:  0.102509 valid_cost:  2.30254 Validation_Accuracy:  0.0986
epoch:  1 train_cost:  2.302 Train_Accuracy:  0.102509 valid_cost:  2.30217 Validation_Accuracy:  0.0986
epoch:  1 train_cost:  2.30157 Train_Accuracy:  0.102509 valid_cost:  2.30174 Validation_Accuracy:  0.0986
epoch:  1 train_cost:  2.30145 Train_Accuracy:  0.102509 valid_cost:  2.3015 Validation_Accuracy:  0.0986
epoch:  2 train_cost:  2.30133 Train_Accuracy:  0.102509 valid_cost:  2.30141 Validation_Accuracy:  0.0986
epoch:  2 train_cost:  2.3013 Train_Accur