## Kaggle: Digit Recognizer (MNIST) with Hyperopt + Data Augmentation
Kaggle Digit recognizer: https://www.kaggle.com/c/digit-recognizer  
Hyperopt: https://github.com/hyperopt/hyperopt  

### Score: 0.99671
* Hyperopt search with max_evals=20 (time: 2h 5m)  
* Further training with data augmentation (time: 3m 17s) 

Python 3.6  
NVIDIA GTX1060  

In [51]:
import warnings
warnings.filterwarnings('ignore')

import hyperopt
from hyperopt import hp, fmin, rand, tpe, Trials, space_eval, STATUS_OK

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.utils import np_utils
import keras

import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# fix random seed
import tensorflow as tf
import random as rn
import os
os.environ['PYTHONHASHSEED'] = '0'
seed = 123
rn.seed(seed)
np.random.seed(seed)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
tf.set_random_seed(seed)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

print('tensorflow ver.:',tf.__version__)
print('keras ver.     : ', keras.__version__)
print('hyperopt ver.  :   ', hyperopt.__version__)

tensorflow ver.: 1.11.0
keras ver.     :  2.2.2
hyperopt ver.  :    0.2


## Data preparation: MNIST from Kaggle

In [52]:
train = pd.read_csv('../train.csv')
label = train.label
train = train.drop(['label'], axis=1)
train = train.values.reshape(-1, 28, 28, 1)

x_train, x_val, y_train, y_val = train_test_split(train, label, test_size=0.15, shuffle=True, random_state=seed)
x_train = x_train.astype('float32') / 255.0
x_val = x_val.astype('float32') / 255.0

[x_train.shape, x_val.shape]

[(35700, 28, 28, 1), (6300, 28, 28, 1)]
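The two shapes add up to 42000 labelled images, and the 6300 validation images are exactly 15% of them. A minimal sketch of that arithmetic, assuming the test count is rounded up and the remainder goes to training (`split_sizes` is a hypothetical helper, not part of scikit-learn):

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the remainder goes to
    training, which reproduces the shapes printed above.
    """
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(42000, 0.15)
print(n_train, n_test)
```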

## Hyperparameters for Hyperopt:  
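* The search space is mainly the dropout rate of each of the five Dropout layers, each drawn uniformly from [0.0, 0.5]. The batch size is fixed at 64.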

In [53]:
params = {
    'Dropout_0':        hp.uniform('Dropout_0', 0.0, 0.5),
    'Dropout_1':        hp.uniform('Dropout_1', 0.0, 0.5),
    'Dropout_2':        hp.uniform('Dropout_2', 0.0, 0.5),
    'Dropout_3':        hp.uniform('Dropout_3', 0.0, 0.5),
    'Dropout_4':        hp.uniform('Dropout_4', 0.0, 0.5)
}

batch_size = 64
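To get a feel for what one candidate configuration looks like, the sketch below draws the five rates with Python's standard `random` module. It is only a stand-in for a plain random draw from the space: it does not use Hyperopt, and it does not reproduce the TPE algorithm used in the search. `DROPOUT_KEYS` and `sample_dropout_rates` are hypothetical names introduced for illustration.

```python
import random

DROPOUT_KEYS = ['Dropout_0', 'Dropout_1', 'Dropout_2', 'Dropout_3', 'Dropout_4']

def sample_dropout_rates(rng, low=0.0, high=0.5):
    """Draw one candidate configuration: one uniform rate per Dropout layer."""
    return {key: rng.uniform(low, high) for key in DROPOUT_KEYS}

rng = random.Random(123)
candidate = sample_dropout_rates(rng)
for name, rate in candidate.items():
    print(name, round(rate, 4))
```

A dictionary of this shape is what `cnn_model(params)` receives on each evaluation.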

## CNN Model:

In [54]:
cnt = 0
def cnn_model(params):
    
    initializer = keras.initializers.glorot_uniform(seed=seed)
    
    model = Sequential() 
        
    model.add(Conv2D(32*2, (5,5), padding='same', activation='relu', kernel_initializer=initializer, input_shape=(28,28,1)))
    model.add(Conv2D(32*2, (5,5), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_0'], seed=seed))
    
    model.add(Conv2D(64*2, (3,3), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(Conv2D(64*2, (3,3), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_1'], seed=seed))
    
    model.add(Conv2D(128*2, (3,3), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(Conv2D(128*2, (3,3), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_2'], seed=seed))

    model.add(Flatten())
    model.add(Dense(512, activation="relu", kernel_initializer=initializer))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_3'], seed=seed))
    model.add(Dense(128, activation = "relu", kernel_initializer=initializer))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_4'], seed=seed))
    
    model.add(Dense(10, activation = "softmax", kernel_initializer=initializer))

    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-5,verbose=1, cooldown=1)
    early_stopping = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

    hist = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=50,
                     verbose=1,
                     shuffle=True,
                     validation_data=(x_val, y_val),
                     callbacks=[reduce_lr, early_stopping])
    
    # Metrics of the last epoch that ran (not necessarily the best epoch)
    loss = hist.history['val_loss'][-1]
    acc = hist.history['val_acc'][-1]
    
    global cnt
    print(cnt, ': Val loss:', loss, ': Val acc:', acc, '\n\n')
    cnt += 1
    
    return {'loss': -acc, 'status': STATUS_OK, 'model': model, 'hist': hist}
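As a sanity check on the architecture, the sketch below traces the feature-map shape through the three convolution blocks. It relies on two properties of the layers as configured here: `padding='same'` convolutions keep height and width, and 2x2 max pooling with stride 2 halves them, rounding down. `trace_shapes` is a hypothetical helper, not a Keras function.

```python
def trace_shapes(size=28, channels=1):
    """Follow (height, width, channels) through the three conv blocks."""
    shapes = [(size, size, channels)]
    for filters in (64, 128, 256):  # 32*2, 64*2, 128*2 in the model
        # padding='same' convolutions keep height and width unchanged
        shapes.append((size, size, filters))
        # 2x2 max pooling with stride 2 halves the size, rounding down
        size = size // 2
        shapes.append((size, size, filters))
    return shapes

shapes = trace_shapes()
h, w, c = shapes[-1]
flat = h * w * c
print(shapes)
print('Flatten output:', flat)
```

The spatial size goes 28, 14, 7, 3, so `Flatten` hands 3 * 3 * 256 = 2304 features to the first `Dense` layer.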
    

## Search the Hyperparameters & the Best model:

In [55]:
trials = Trials()
best = fmin(fn=cnn_model, 
            space=params, 
            algo=tpe.suggest, 
            max_evals=20,  # 20 evals: about 2h 5m; 50 evals: about 5h 15m
            trials=trials,
            verbose=1,
            rstate=np.random.RandomState(seed))

best

Train on 35700 samples, validate on 6300 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50

Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 12/50
Epoch 13/50
Epoch 14/50

Epoch 00014: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 15/50
Epoch 16/50

Epoch 00016: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 17/50
Epoch 00017: early stopping
0 : Val loss: 0.01591937809467449 : Val acc: 0.996031746031746 


Train on 35700 samples, validate on 6300 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 6/50
Epoch 7/50
Epoch 8/50

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 9/50
Epoch 10/50
Epoch 11/50
[... output truncated ...]

Epoch 13/50
Epoch 14/50

Epoch 00014: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50

Epoch 00018: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 19/50
Epoch 20/50

Epoch 00020: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 21/50
Epoch 00021: early stopping
2 : Val loss: 0.015297429318057507 : Val acc: 0.996031746031746 


Train on 35700 samples, validate on 6300 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 5/50
Epoch 6/50
Epoch 7/50

Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 8/50
Epoch 9/50
Epoch 10/50

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 11/50
Epoch 12/50

Epoch 00012: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 13/50
Epoch 14/50
Epoch 15/50

[... output truncated ...]

Epoch 14/50
Epoch 15/50

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 16/50
Epoch 17/50

Epoch 00017: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 18/50
Epoch 00018: early stopping
7 : Val loss: 0.01814576182998569 : Val acc: 0.9953968253968254 


Train on 35700 samples, validate on 6300 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50

Epoch 00014: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 15/50
Epoch 16/50
Epoch 17/50

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 18/50
Epoch 19/50

Epoch 00019: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 20/50
Epoch 00020: early stopping
8 : Val loss: 0.017149052320210825 : Val acc: 0.99523
[... output truncated ...]

Epoch 10/50

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 16/50
Epoch 17/50
Epoch 18/50

Epoch 00018: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 19/50
Epoch 20/50

Epoch 00020: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 21/50
Epoch 00021: early stopping
10 : Val loss: 0.01578753410013474 : Val acc: 0.996031746031746 


Train on 35700 samples, validate on 6300 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50

Epoch 00013: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 14/50
Epoch 15/50

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
[... output truncated ...]

Epoch 17/50

Epoch 00017: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 18/50
Epoch 19/50
Epoch 20/50

Epoch 00020: ReduceLROnPlateau reducing learning rate to 1.5625000742147677e-05.
Epoch 21/50
Epoch 22/50

Epoch 00022: ReduceLROnPlateau reducing learning rate to 1e-05.
Epoch 23/50
Epoch 00023: early stopping
12 : Val loss: 0.014300214854625262 : Val acc: 0.996031746031746 


Train on 35700 samples, validate on 6300 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50

Epoch 00013: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 14/50
Epoch 15/50
Epoch 16/50

Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 17/50
Epoch 18/50

Epoch 00018: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 19/50
[... output truncated ...]

Epoch 24/50

Epoch 00024: ReduceLROnPlateau reducing learning rate to 1.5625000742147677e-05.
Epoch 25/50
Epoch 00025: early stopping
14 : Val loss: 0.01518195620736122 : Val acc: 0.9965079365079365 


Train on 35700 samples, validate on 6300 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 10/50
Epoch 11/50
Epoch 12/50

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 13/50
Epoch 14/50
Epoch 15/50

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50

Epoch 00019: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50

Epoch 00023: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 24/50
Epoch 25/50

Epoch 00025: ReduceLROnPlateau reducing learning rate to 1.5625000742147677e-05.
[... output truncated ...]

Epoch 16/50
Epoch 17/50
Epoch 18/50

Epoch 00018: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 19/50
Epoch 20/50

Epoch 00020: ReduceLROnPlateau reducing learning rate to 1.5625000742147677e-05.
Epoch 21/50
Epoch 00021: early stopping
19 : Val loss: 0.014482729101649649 : Val acc: 0.996031746031746 




{'Dropout_0': 0.29949679088221454,
 'Dropout_1': 0.3191838155900335,
 'Dropout_2': 0.20843884641241,
 'Dropout_3': 0.17705224022098692,
 'Dropout_4': 0.28130745896863774}
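The learning rates printed by `ReduceLROnPlateau` in the log above follow a simple rule: each reduction multiplies the current rate by `factor=0.5`, and the result is clipped at `min_lr=1e-5`. A minimal sketch, assuming training starts from Adam's default rate of 1e-3 in Keras (`plateau_schedule` is a hypothetical helper):

```python
def plateau_schedule(initial_lr=1e-3, factor=0.5, min_lr=1e-5):
    """List successive learning rates until the floor min_lr is reached."""
    rates = []
    lr = initial_lr
    while lr > min_lr:
        lr = max(lr * factor, min_lr)  # halve, but never go below the floor
        rates.append(lr)
    return rates

for lr in plateau_schedule():
    print(lr)
```

The log shows the same values in float32 precision (for example 0.0005000000237487257), ending at 1e-05 once the floor is hit.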

## The Best Hyperparameters:

In [56]:
space_eval(params, best)

{'Dropout_0': 0.29949679088221454,
 'Dropout_1': 0.3191838155900335,
 'Dropout_2': 0.20843884641241,
 'Dropout_3': 0.17705224022098692,
 'Dropout_4': 0.28130745896863774}

## The Best Result

In [57]:
trials.best_trial['result']

{'hist': <keras.callbacks.History at 0x7f9983136f98>,
 'loss': -0.9966666666666667,
 'model': <keras.engine.sequential.Sequential at 0x7f9984ed6160>,
 'status': 'ok'}
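The loss is negative because `cnn_model` returns `-acc` and `fmin` minimizes, so the best trial is the successful one with the smallest loss. The sketch below shows that selection rule in plain Python on three accuracies copied from the trial summaries in the log. The excerpt covers only some of the 20 trials, so its winner is not the actual best trial shown above. `pick_best` is a hypothetical helper, not Hyperopt's API.

```python
STATUS_OK = 'ok'  # the status string that appears in the result above

# Validation accuracies copied from trials 0, 7 and 14 in the log.
logged = {0: 0.996031746031746, 7: 0.9953968253968254, 14: 0.9965079365079365}

# Each objective call returns loss = -accuracy, as in cnn_model.
results = [{'trial': t, 'loss': -acc, 'status': STATUS_OK}
           for t, acc in logged.items()]

def pick_best(results):
    """Return the successful result with the smallest loss."""
    ok = [r for r in results if r['status'] == STATUS_OK]
    return min(ok, key=lambda r: r['loss'])

best_result = pick_best(results)
print(best_result['trial'], -best_result['loss'])
```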

## The Best model:

In [58]:
best_model = trials.best_trial['result']['model']
#best_model.save('hyperopt_best_model.hdf5')

## Evaluation before Data Augmentation

In [59]:
score = best_model.evaluate(x_val, y_val, verbose=1)
score



[0.013626170573792593, 0.9966666666666667]

## Data Augmentation:

In [106]:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

datagen.fit(x_train, augment=True, seed=seed)

reduce_lr_aug = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-6, verbose=1)
early_stop_aug = EarlyStopping(monitor='val_loss', patience=5, verbose=1)

hist_gen = best_model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size), 
                                    epochs=50, 
                                    validation_data=(x_val, y_val),
                                    steps_per_epoch=x_train.shape[0]//batch_size,
                                    callbacks=[reduce_lr_aug, early_stop_aug],
                                    verbose=2)

loss_aug = hist_gen.history['val_loss'][-1]
acc_aug = hist_gen.history['val_acc'][-1]

print('Loss_aug: ', loss_aug, ', Acc_aug: ', acc_aug, '\n\n')

Epoch 1/50
 - 18s - loss: 0.0181 - acc: 0.9940 - val_loss: 0.0118 - val_acc: 0.9962
Epoch 2/50
 - 18s - loss: 0.0170 - acc: 0.9948 - val_loss: 0.0117 - val_acc: 0.9960
Epoch 3/50
 - 18s - loss: 0.0161 - acc: 0.9950 - val_loss: 0.0119 - val_acc: 0.9959

Epoch 00003: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-06.
Epoch 4/50
 - 18s - loss: 0.0168 - acc: 0.9944 - val_loss: 0.0118 - val_acc: 0.9963
Epoch 5/50
 - 18s - loss: 0.0156 - acc: 0.9950 - val_loss: 0.0118 - val_acc: 0.9963

Epoch 00005: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-06.
Epoch 6/50
 - 18s - loss: 0.0182 - acc: 0.9943 - val_loss: 0.0118 - val_acc: 0.9962
Epoch 7/50
 - 18s - loss: 0.0175 - acc: 0.9944 - val_loss: 0.0117 - val_acc: 0.9963

Epoch 00007: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-06.
Epoch 8/50
 - 18s - loss: 0.0163 - acc: 0.9949 - val_loss: 0.0116 - val_acc: 0.9963
Epoch 9/50
 - 18s - loss: 0.0176 - acc: 0.9948 - val_loss: 0.0119 - val_acc: 0.9960
[... output truncated ...]
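The `steps_per_epoch` value passed to `fit_generator` is the number of full batches in the training set. The arithmetic, using the sizes from this notebook:

```python
n_train = 35700   # training images after the split
batch_size = 64

steps_per_epoch = n_train // batch_size
seen_per_epoch = steps_per_epoch * batch_size
left_over = n_train - seen_per_epoch

print('steps per epoch :', steps_per_epoch)
print('images per epoch:', seen_per_epoch)
print('images left over:', left_over)
```

Each epoch therefore draws 557 augmented batches. The 52 remaining images are not lost, assuming the generator simply keeps cycling through the data across epoch boundaries.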

## Evaluation after Data Augmentation:

In [107]:
score_aug = best_model.evaluate(x_val, y_val, verbose=1)
print('Before augmentation: ', score)
print('After  augmentation: ', score_aug)

Before augmentation:  [0.013626170573792593, 0.9966666666666667]
After  augmentation:  [0.011852738058484632, 0.9963492063492063]
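With only 6300 validation images, these accuracies correspond to small whole numbers of mistakes. A short sketch that converts them back (`error_count` is a hypothetical helper):

```python
n_val = 6300

def error_count(accuracy, n_samples):
    """Convert an accuracy into a whole number of misclassified samples."""
    return n_samples - round(accuracy * n_samples)

before = error_count(0.9966666666666667, n_val)
after = error_count(0.9963492063492063, n_val)
print('errors before:', before)
print('errors after :', after)
```

The validation loss improved after augmentation, while the accuracy difference amounts to two images, which is small enough to be noise on a set of this size.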


## Prediction for the Submission:

In [108]:
test = pd.read_csv('../test.csv')
test_index = test.index
test = test.values.reshape(-1, 28, 28, 1).astype('float32') / 255.0

pred = best_model.predict(test, verbose=1)
result = pred.argmax(axis=1)
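`pred` holds one row of ten class probabilities per test image, and `argmax(axis=1)` picks the most probable digit in each row. The sketch below reproduces that step and the `ImageId`/`Label` layout of the submission in plain Python. The probabilities are made up for illustration, and `argmax_rows` is a hypothetical stand-in for the NumPy call.

```python
import csv
import io

def argmax_rows(probabilities):
    """Index of the largest value in each row (argmax over axis 1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

# Made-up softmax outputs for three images, for illustration only.
pred = [
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.80, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.03, 0.03],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55],
]
labels = argmax_rows(pred)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['ImageId', 'Label'])
for image_id, label in enumerate(labels, start=1):  # ImageId is 1-based
    writer.writerow([image_id, label])
print(buffer.getvalue())
```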



## Output the Submission csv file:

In [104]:
submission = pd.DataFrame({'ImageId': test_index+1, 'Label': result})
submission.to_csv('hyperopt_augment_submission.csv', index=False)

#### Accuracy estimation: agreement with the previous submission, which scored 0.99671

In [105]:
prev_cnn = pd.read_csv('hyperopt_augment_submission_99671.csv', index_col=0)
res = pd.read_csv('hyperopt_augment_submission.csv', index_col=0)
match_num = np.sum(prev_cnn.Label.values == res.Label.values)
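# Fraction of predictions identical to the previous submission (scaling by 0.99671 is left disabled)
acc = match_num / len(res)  # * 0.99671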
print('Approx. accuracy: {0:.5f}'.format(acc))

Approx. accuracy: 1.00000
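The number above is an agreement rate, not an accuracy: it measures how often the new predictions match the earlier submission. It still constrains the score, because every disagreement can change correctness for at most one image, so the new accuracy lies within the disagreement rate of the reference score. A minimal sketch with made-up labels (`agreement` and `accuracy_bounds` are hypothetical helpers):

```python
def agreement(labels_a, labels_b):
    """Fraction of positions where two equally long label lists match."""
    if len(labels_a) != len(labels_b):
        raise ValueError('label lists must have the same length')
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def accuracy_bounds(reference_accuracy, agree):
    """Range of the new accuracy, given agreement with a scored reference."""
    disagree = 1.0 - agree
    low = max(0.0, reference_accuracy - disagree)
    high = min(1.0, reference_accuracy + disagree)
    return low, high

# Made-up labels for illustration only.
previous = [7, 2, 1, 0, 4]
current = [7, 2, 1, 0, 9]
a = agreement(previous, current)
print(a, accuracy_bounds(0.99671, a))
print(accuracy_bounds(0.99671, 1.0))
```

With an agreement of 1.0, as in this notebook, both bounds collapse to 0.99671: the new submission is identical to the previous one and would receive the same score.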
