In [1]:
import keras
import numpy as np

Using Theano backend.
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


In [18]:
np.random.seed(123)

## Preprocessing Data

In [2]:
def get_mnist_data():
    """Load MNIST, add a channel axis to the images, and one-hot encode the labels."""
    from keras.datasets import mnist
    from keras.utils import np_utils
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    # Reshape images from (n, 28, 28) to (n, 1, 28, 28): one channel, channels-first.
    X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1], X_train.shape[2])
    y_train = y_train.reshape(y_train.shape[0], 1)
    X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1], X_test.shape[2])
    y_test = y_test.reshape(y_test.shape[0], 1)

    # Convert integer class labels (0-9) into one-hot vectors of length 10.
    y_train = np_utils.to_categorical(y_train)
    y_test  = np_utils.to_categorical(y_test)

    return X_train, y_train, X_test, y_test
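
The two transformations in `get_mnist_data` can be tried on a tiny array without downloading MNIST. This is a minimal sketch: `one_hot` is a hypothetical helper written here for illustration (it is not the Keras implementation of `to_categorical`), and the three blank images and their labels are made-up placeholders.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels).ravel()
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Hypothetical stand-in for a batch of MNIST images: 3 blank 28x28 images.
images = np.zeros((3, 28, 28), dtype=np.uint8)
labels = np.array([5, 0, 9])

# Add the channel axis, giving the (n, 1, 28, 28) layout used in this notebook.
images = images.reshape(images.shape[0], 1, images.shape[1], images.shape[2])
targets = one_hot(labels, num_classes=10)

print("images: {}".format(images.shape))
print("targets: {}".format(targets.shape))
print("first target row: {}".format(targets[0]))
```

Each row of `targets` has a 1 in the column of the digit's class and 0 elsewhere, which is the form `categorical_crossentropy` expects.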

In [3]:
X_train, y_train, X_test, y_test = get_mnist_data()

In [4]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((60000, 1, 28, 28), (60000, 10), (10000, 1, 28, 28), (10000, 10))

In [5]:
X_mean = X_train.mean().astype(np.float32)

In [6]:
X_std = X_train.std().astype(np.float32)

In [7]:
def normalizer(x):
    return (x - X_mean) / X_std
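
As a quick sanity check of what `normalizer` does, the sketch below standardizes random pixel-like data with a single global mean and standard deviation. The data and the `standardize` name are assumptions made for illustration; only the formula is taken from the cell above.

```python
import numpy as np

rng = np.random.RandomState(123)
# Hypothetical pixel data in [0, 255], standing in for X_train.
pixels = rng.randint(0, 256, size=(100, 1, 28, 28)).astype(np.float32)

# One scalar mean and one scalar std over the whole array, as in the notebook.
mean = pixels.mean()
std = pixels.std()

def standardize(x):
    """Shift and scale so the data has roughly zero mean and unit variance."""
    return (x - mean) / std

scaled = standardize(pixels)
print("mean after scaling: {:.6f}".format(float(scaled.mean())))
print("std after scaling:  {:.6f}".format(float(scaled.std())))
```

The statistics come from the training set only; the same `mean` and `std` are then applied to the test set, which is what wrapping `normalizer` in a `Lambda` layer achieves.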

## Building the model

In [8]:
from keras.layers import Convolution2D, Dense, Flatten, Lambda, Dropout
from keras.models import Sequential
from keras.optimizers import Adam

### Neural Networks Model

In [20]:
nb_epoch = 5

In [14]:
model_1 = Sequential()
model_1.add(Lambda(normalizer, input_shape=(1, 28, 28)))
model_1.add(Flatten())
model_1.add(Dense(512, activation='softmax'))
model_1.add(Dense(10, activation='softmax'))
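
A side note on the hidden layer: `softmax` forces the 512 hidden activations to sum to 1, so each one is small, whereas an activation such as `relu` keeps the scale of its positive inputs. Whether this explains the slow training seen later is an assumption, not something measured in this notebook. The sketch below uses hypothetical pre-activation values and helper functions written only for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

# Hypothetical pre-activations for a 512-unit hidden layer, all in [-1, 1].
pre_activations = [math.sin(i) for i in range(512)]

soft = softmax(pre_activations)
rect = relu(pre_activations)

print("sum of softmax outputs: {:.6f}".format(sum(soft)))
print("largest softmax output: {:.6f}".format(max(soft)))
print("largest relu output:    {:.6f}".format(max(rect)))
```

Swapping the hidden-layer activation would be a separate experiment; the models in this notebook keep `softmax` throughout.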

In [15]:
model_1.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [25]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=nb_epoch)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f936ac77110>

### Is learning rate optimal?

The default learning rate in Adam is 0.001.

In [19]:
model_1.optimizer.lr = 0.1

In [23]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f936abbf150>

**Interesting**

Batch size also appears to affect performance. Why? Adam, like SGD, updates the weights once per mini-batch, so the batch size changes both the number of updates per epoch and how noisy each gradient estimate is. Worth looking into Adam further.

Effects of batch_size:
* Higher ==> less time to train per epoch
* It seems to affect performance as well, so it can be treated as another hyperparameter.
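
One concrete way batch size enters the picture is the number of weight updates per epoch. The sketch below counts them for the 60000 training samples used here; `updates_per_epoch` is a hypothetical helper, and the batch sizes other than 64 are arbitrary examples.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Number of mini-batches (and therefore weight updates) in one epoch."""
    return int(math.ceil(num_samples / float(batch_size)))

num_train = 60000
for batch_size in (32, 64, 128, 256):
    count = updates_per_epoch(num_train, batch_size)
    print("batch_size={:>3} -> {} updates per epoch".format(batch_size, count))
```

Doubling the batch size roughly halves the number of updates per epoch, so with a fixed epoch budget the model takes fewer, less noisy steps. That is one reason batch size and learning rate are usually tuned together.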

In [26]:
model_1.optimizer.lr = 0.0001

In [28]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=20)

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f936b50ded0>

### Learning Rate and its usage

We could try the following:
* Start with a relatively high learning rate for a few epochs
* Reduce it and continue for a few more epochs
* Reduce it further and continue for more epochs (but stop before the model starts to overfit)
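
The staged plan in the list above can be written as a function from epoch index to learning rate. This is a sketch under stated assumptions: `staged_learning_rate` is a hypothetical helper, and the stage boundaries and rates are placeholder values, not tuned settings from this notebook.

```python
def staged_learning_rate(epoch, stages):
    """Return the learning rate for a 0-based epoch index.

    stages is a list of (first_epoch, learning_rate) pairs sorted by first_epoch;
    the rate of the last stage whose first_epoch has been reached is used.
    """
    current = stages[0][1]
    for first_epoch, rate in stages:
        if epoch >= first_epoch:
            current = rate
    return current

# Hypothetical plan: high rate for 5 epochs, lower for 10, lowest afterwards.
stages = [(0, 0.01), (5, 0.001), (15, 0.0001)]
schedule = [staged_learning_rate(epoch, stages) for epoch in range(20)]

for epoch, rate in enumerate(schedule):
    print("epoch {:>2}: lr = {}".format(epoch, rate))
```

Keras has a `LearningRateScheduler` callback that accepts a function of the epoch index, so a function like this could be adapted to it instead of changing the rate by hand between `fit` calls.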

In [39]:
# Let's put this in action and do it right this time!
model_1 = Sequential()
model_1.add(Lambda(normalizer, input_shape=(1, 28, 28)))
model_1.add(Flatten())
model_1.add(Dense(512, activation='softmax'))
model_1.add(Dense(10, activation='softmax'))
model_1.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [40]:
model_1.optimizer.lr = 0.1

In [41]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368ebd2d0>

Hmmm, let's try more epochs with the same learning rate and see where it gets us.

In [42]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=8)

Train on 60000 samples, validate on 10000 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x7f9368ebda90>

Clearly, the learning rate is too high: the model has ended up in a rut and is not moving out of it.
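
A toy example shows how an oversized step can leave training stuck or make it worse. The sketch uses plain gradient descent on f(w) = w^2, not Adam on the network above, so it only illustrates the general effect; the function name, starting point, and rates are all assumptions.

```python
def descend(learning_rate, start=5.0, steps=50):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w          # derivative of w**2
        w = w - learning_rate * gradient
    return w

small = descend(0.1)    # each step multiplies w by 0.8: converges towards 0
edge = descend(1.0)     # each step flips the sign of w: bounces forever
large = descend(1.1)    # each step multiplies w by -1.2: diverges

print("lr=0.1 -> |w| = {:.6f}".format(abs(small)))
print("lr=1.0 -> |w| = {:.6f}".format(abs(edge)))
print("lr=1.1 -> |w| = {:.2f}".format(abs(large)))
```

With the middle rate the loss never changes even though the weight moves on every step, which resembles a run whose accuracy stays flat across epochs.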

In [43]:
model_1.optimizer.lr = 0.001

In [44]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=8)

Train on 60000 samples, validate on 10000 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x7f9369628fd0>

In [45]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368ebdc10>

In [46]:
model_1.optimizer.lr = 0.0001

In [47]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9368ebd6d0>

In [48]:
model_1.optimizer.lr = 1

In [49]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368ebda10>

In [50]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368ebdf50>

### Nope. We seem to have reached a dead end. A local minimum, perhaps?

Let's start over!

In [51]:
model_1 = Sequential()
model_1.add(Lambda(normalizer, input_shape=(1, 28, 28)))
model_1.add(Flatten())
model_1.add(Dense(512, activation='softmax'))
model_1.add(Dense(10, activation='softmax'))
model_1.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [52]:
# default lr
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f936848ccd0>

Already doing much better.

Moral of the story: **Learning rate matters. If things go wrong right from the start, check the learning rate first.**
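
When checking the learning rate, it is also worth checking that a change to it actually takes effect. The following is an assumption about Keras 1.x, not something verified in this notebook: the training function is built on the first call to `fit` and keeps using the learning-rate variable that existed at that time. If so, plain assignment such as `model_1.optimizer.lr = 0.001` after a `fit` call only rebinds the attribute, while `keras.backend.set_value(model_1.optimizer.lr, 0.001)` updates the existing variable in place. The toy classes below are hypothetical and only imitate that mechanism.

```python
class SharedValue(object):
    """Hypothetical stand-in for a backend variable holding one float."""
    def __init__(self, value):
        self.value = value

    def set_value(self, value):
        self.value = value


class ToyOptimizer(object):
    """Hypothetical optimizer that stores its learning rate in a SharedValue."""
    def __init__(self, lr):
        self.lr = SharedValue(lr)


def build_step(optimizer):
    """Imitate compiling a training function: capture the lr variable once."""
    captured_lr = optimizer.lr

    def step(weight, gradient):
        return weight - captured_lr.value * gradient
    return step


# Case 1: rebinding the attribute leaves the captured variable untouched.
opt = ToyOptimizer(lr=0.1)
step = build_step(opt)
opt.lr = 0.001
after_rebind = step(1.0, 1.0)

# Case 2: updating the captured variable in place is seen by the step function.
opt = ToyOptimizer(lr=0.1)
step = build_step(opt)
opt.lr.set_value(0.001)
after_set_value = step(1.0, 1.0)

print("after rebinding:  {}".format(after_rebind))
print("after set_value: {}".format(after_set_value))
```

In the first case the step still uses the old rate of 0.1; in the second it uses 0.001. If the same holds for the real optimizer, the rate set before the first `fit` on a fresh model would be the one that sticks.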

In [53]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368ebd610>

In [54]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368d09810>

In [55]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f936848ce50>

In [56]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368434210>

In [57]:
# Starting to overfit now. But let's see where it goes
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f93684344d0>

In [58]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9369f32410>

In [59]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9368d38410>

## Seems to be overfitting the data

**Question**

How large a gap between training and validation performance is tolerable? How much is too much?
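
There is no universal threshold, but the trend is often more informative than the size of the gap: training accuracy that keeps rising while validation accuracy falls is the usual warning sign. The sketch below tracks both; the helper names are hypothetical and the accuracy curves are made-up numbers for illustration only, not results from this notebook.

```python
def generalization_gaps(train_acc, val_acc):
    """Per-epoch difference between training and validation accuracy."""
    return [t - v for t, v in zip(train_acc, val_acc)]

def first_overfit_epoch(train_acc, val_acc, tolerance=0.0):
    """Return the first 0-based epoch where training accuracy rose while
    validation accuracy dropped by more than tolerance, or None."""
    for epoch in range(1, len(val_acc)):
        train_up = train_acc[epoch] > train_acc[epoch - 1]
        val_down = val_acc[epoch] < val_acc[epoch - 1] - tolerance
        if train_up and val_down:
            return epoch
    return None

# Made-up curves: validation accuracy peaks at epoch 2 and then declines.
train_acc = [0.90, 0.94, 0.96, 0.98, 0.99]
val_acc = [0.89, 0.93, 0.95, 0.945, 0.94]

gaps = generalization_gaps(train_acc, val_acc)
print("gaps: {}".format(["{:.3f}".format(g) for g in gaps]))
print("first overfit epoch: {}".format(first_overfit_epoch(train_acc, val_acc)))
```

The `History` object returned by `fit` stores per-epoch metrics in its `history` dictionary, so real curves could be fed to these helpers instead of the placeholder lists.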

In [60]:
model_1.optimizer.lr = 0.0001

In [61]:
model_1.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=10)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f936848cf10>

In [62]:
# .... yeah.. I think we are definitely overfitting now

## Different (possibly random) architectures

In [63]:
model_2 = Sequential()
model_2.add(Lambda(normalizer, input_shape=(1, 28, 28)))
model_2.add(Flatten())
model_2.add(Dense(512, activation='softmax'))
model_2.add(Dense(256, activation='softmax'))
model_2.add(Dense(10, activation='softmax'))
model_2.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [64]:
model_2.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=10)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f9367998cd0>

In [66]:
model_2.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9367935610>

In [67]:
# Possible checkpoint to stop optimizing; training further may overfit

In [68]:
# let's reduce lr and continue

In [69]:
model_2.optimizer.lr = 0.0001

In [70]:
model_2.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9367998f50>

In [71]:
model_2.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f93679357d0>

In [72]:
model_2.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9367998e90>

In [73]:
model_2.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9368ebdb10>

In [74]:
model_2.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9367998e10>

In [75]:
# Nope. We should've stopped the first time we hit around 95.5% validation accuracy
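
Stopping at the first good validation score can be automated with a patience rule: keep the best score seen so far and stop once it has not improved for a set number of epochs. This is a sketch with a hypothetical helper and made-up validation scores, not values recorded in this notebook.

```python
def stop_with_patience(val_scores, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch that is `patience` epochs past the best
    validation score; if that never happens, it runs to the last epoch.
    """
    best_epoch = 0
    best_score = val_scores[0]
    for epoch in range(1, len(val_scores)):
        if val_scores[epoch] > best_score:
            best_score = val_scores[epoch]
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_scores) - 1, best_epoch

# Made-up validation accuracies that peak at epoch 2 and then drift down.
val_scores = [0.90, 0.93, 0.955, 0.954, 0.953, 0.951, 0.950]
stop_epoch, best_epoch = stop_with_patience(val_scores, patience=2)

print("stop at epoch {}, best epoch was {}".format(stop_epoch, best_epoch))
```

Keras ships an `EarlyStopping` callback built around the same idea, with `monitor` and `patience` arguments, which could replace the manual sequence of `fit` calls.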

## Next Architecture

In [76]:
model_3 = Sequential()
model_3.add(Lambda(normalizer, input_shape=(1, 28, 28)))
model_3.add(Flatten())
model_3.add(Dense(1024, activation='softmax'))
model_3.add(Dense(512, activation='softmax'))
model_3.add(Dense(256, activation='softmax'))
model_3.add(Dense(10, activation='softmax'))
model_3.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [77]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f9366c2a550>

### The more complex the architecture, the longer it takes to fit?
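
One rough measure of complexity is the number of trainable parameters. The sketch below counts weights and biases for the dense stacks of the three models defined in this notebook; the `Lambda` and `Flatten` layers add none. `dense_param_count` is a hypothetical helper, and the layer sizes are read off the model definitions, with 784 = 1 x 28 x 28 flattened inputs.

```python
def dense_param_count(layer_sizes):
    """Total weights and biases for a stack of fully connected layers.

    layer_sizes lists the input size followed by the size of every Dense layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus bias vector
    return total

architectures = {
    "model_1": [784, 512, 10],
    "model_2": [784, 512, 256, 10],
    "model_3": [784, 1024, 512, 256, 10],
}

for name in sorted(architectures):
    print("{}: {} parameters".format(name, dense_param_count(architectures[name])))
```

`model_3` has roughly 3.6 times as many parameters as `model_1` (1,462,538 versus 407,050), so each epoch does more work. Whether it also needs more epochs to converge is a separate question that the parameter count alone does not answer.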

In [78]:
model_3.optimizer.lr = 0.01

In [79]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f9367943250>

In [80]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9366c2a790>

In [81]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9367998d10>

In [82]:
## Big progress! What happened there?

In [83]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9367935110>

In [84]:
# Took more time, but we got to that ~95.5% mark. Let's go further

In [85]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9367943410>

In [86]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=10)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f9366c2ad10>

In [87]:
model_3.optimizer.lr = 0.001

In [88]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f936848ce90>

In [89]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9366c2ab10>

In [90]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=5)

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f9366c2a5d0>

In [91]:
model_3.fit(X_train, y_train, batch_size=64, validation_data=(X_test, y_test), nb_epoch=15)

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7f9366c2a950>