Optimizers are at the core of any deep learning project. No matter how well you have engineered your network
and what kind of activation functions you have used, if the optimizer does not turn the gradients into good weight updates, everything will be a waste of time.

I have observed a pattern in the way optimizers are chosen. Most people jump straight to Adam or RMSprop because the trend suggests that they perform well.

In this blog post, we will see how different optimizers perform on
- MNIST
- CIFAR10
and look at how the accuracy improves over time.

For faster experimentation, let's build a class-based framework so that each experiment is quick to set up and run.

In [1]:
import numpy as np
import tensorflow as tf
from mlp import BO
import os 

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
x_train = mnist.train.images
y_train = mnist.train.labels
x_valid = mnist.validation.images
y_valid = mnist.validation.labels
x_test = mnist.test.images
y_test = mnist.test.labels
print(x_test.shape)
print(y_test.shape)

(10000, 784)
(10000, 10)


In [3]:
optimizer = ["adadelta", "adagrad", "adam", "ftrl", "momentum", "rmsprop", "sgd"]
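
To make these names a little more concrete, here is a minimal NumPy sketch of the textbook update rules behind five of them (sgd, momentum, adagrad, rmsprop and adam), run on a toy quadratic loss. It is only an illustration: the function names, the `UPDATE_RULES` table, the `minimize` helper and the hyperparameter defaults are assumptions made for this example, not the code inside `mlp.BO` and not necessarily the exact formulas TensorFlow uses. Adadelta and FTRL are left out to keep the sketch short.

```python
import numpy as np

# Hypothetical, textbook-style update rules; each takes the weights w, the
# gradient g, a per-run state dict and a learning rate, and returns new weights.

def sgd(w, g, state, lr):
    return w - lr * g

def momentum(w, g, state, lr, beta=0.9):
    # Accumulate a decaying sum of past gradients and step along it.
    v = beta * state.get("v", np.zeros_like(w)) + g
    state["v"] = v
    return w - lr * v

def adagrad(w, g, state, lr, eps=1e-8):
    # Scale each coordinate by the root of its summed squared gradients.
    s = state.get("s", np.zeros_like(w)) + g ** 2
    state["s"] = s
    return w - lr * g / (np.sqrt(s) + eps)

def rmsprop(w, g, state, lr, rho=0.9, eps=1e-8):
    # Like adagrad, but with an exponential moving average instead of a sum.
    s = rho * state.get("s", np.zeros_like(w)) + (1 - rho) * g ** 2
    state["s"] = s
    return w - lr * g / (np.sqrt(s) + eps)

def adam(w, g, state, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Moving averages of the gradient and its square, with bias correction.
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", np.zeros_like(w)) + (1 - b1) * g
    v = b2 * state.get("v", np.zeros_like(w)) + (1 - b2) * g ** 2
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

UPDATE_RULES = {
    "sgd": sgd,
    "momentum": momentum,
    "adagrad": adagrad,
    "rmsprop": rmsprop,
    "adam": adam,
}

def minimize(name, steps=200, lr=0.01):
    """Run one rule on f(w) = 0.5 * ||w||^2, whose gradient is simply w."""
    w = np.array([1.0, -2.0])
    state = {}
    for _ in range(steps):
        grad = w.copy()
        w = UPDATE_RULES[name](w, grad, state, lr)
    return 0.5 * float(w @ w)

final_losses = {name: minimize(name) for name in UPDATE_RULES}
for name, loss in final_losses.items():
    print(f"{name:>8}: final loss = {loss:.6f}")
```

Every rule starts from the same point and uses the same learning rate, which mirrors how the experiment in this post is set up. Even on this toy problem, the rules can end up at quite different losses after the same number of steps.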

In [4]:
model = BO(x_train, y_train, x_valid, y_valid, x_test, y_test)

In [5]:
model.build_graph()

<mlp.BO at 0x7fecb465e438>

In [6]:
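# Recompile and train the model once per optimizer; each run writes its summaries to its own directory.
for i in optimizer:
    model.compile_graph(optimize=i, learning_rate=0.01)
    model.train(summary_dir=os.getcwd() + "/optimizers/" + i)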

[Using optimizer]: adadelta
epoch:  0 train_cost:  2.30241 Train_Accuracy:  0.116455 valid_cost:  2.30242 Validation_Accuracy:  0.115
epoch:  0 train_cost:  2.29933 Train_Accuracy:  0.112345 valid_cost:  2.29913 Validation_Accuracy:  0.1126
epoch:  0 train_cost:  2.29913 Train_Accuracy:  0.112345 valid_cost:  2.29862 Validation_Accuracy:  0.1126
epoch:  0 train_cost:  2.2989 Train_Accuracy:  0.112345 valid_cost:  2.29862 Validation_Accuracy:  0.1126
epoch:  1 train_cost:  2.29894 Train_Accuracy:  0.112345 valid_cost:  2.29846 Validation_Accuracy:  0.1126
epoch:  1 train_cost:  2.29876 Train_Accuracy:  0.112345 valid_cost:  2.29846 Validation_Accuracy:  0.1126
epoch:  1 train_cost:  2.29867 Train_Accuracy:  0.112345 valid_cost:  2.29807 Validation_Accuracy:  0.1126
epoch:  1 train_cost:  2.29847 Train_Accuracy:  0.112345 valid_cost:  2.29819 Validation_Accuracy:  0.1126
epoch:  2 train_cost:  2.29851 Train_Accuracy:  0.112345 valid_cost:  2.29802 Validation_Accuracy:  0.1126
epoch:  2 t