## Kaggle: Digit Recognizer (MNIST) by Hyperopt
Kaggle Digit recognizer: https://www.kaggle.com/c/digit-recognizer  
Hyperopt: https://github.com/hyperopt/hyperopt  

### Score:
* max_evals= 10, score: 0.99128 ( 25 mins: NVIDIA GTX1060)
* max_evals= 20, score: 0.99257 ( 49 mins: NVIDIA GTX1060)
* max_evals=100, score: 0.99185 (372 mins: NVIDIA GTX1060)

In [40]:
import warnings
warnings.filterwarnings('ignore')

from hyperopt import hp, fmin, rand, tpe, Trials, space_eval, STATUS_OK

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Reshape, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.utils import np_utils
import keras

from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

# Fix the random seeds and force single-threaded TensorFlow ops for reproducibility (TensorFlow 1.x API)
import tensorflow as tf
import random as rn
import os
os.environ['PYTHONHASHSEED'] = '0'
seed = 123
rn.seed(seed)
np.random.seed(seed)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
tf.set_random_seed(seed)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

## Data preparation: MNIST from Kaggle

In [41]:
train = pd.read_csv('../train.csv')
label = train.label
train = train.drop(['label'], axis=1)

X_train, X_test, Y_train, Y_test = train_test_split(train, label, test_size=0.1, random_state=seed)
X_train = X_train.values.astype('float32') / 255.0
X_test = X_test.values.astype('float32') / 255.0

nb_classes = 10 
Y_train = np_utils.to_categorical(Y_train, nb_classes)
Y_test = np_utils.to_categorical(Y_test, nb_classes)
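
The label preparation above turns each digit label into a one-hot row of length `nb_classes`. As a rough illustration of that encoding, here is a minimal NumPy sketch; the helper `one_hot` is a hypothetical stand-in written for this example and is not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set column `label` to 1 in each row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example.shape)
print(example[0])
```

Taking `argmax` along axis 1 of such a matrix recovers the original integer labels, which is the same operation the prediction step applies to the softmax output later on.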

## Hyperparameters:

In [42]:
params = {
    'Dropout_0':        hp.uniform('Dropout_0', 0.0, 0.5),
    'Dropout_1':        hp.uniform('Dropout_1', 0.0, 0.5),
    'Dropout_2':        hp.uniform('Dropout_2', 0.0, 0.5),
    'Dropout_3':        hp.uniform('Dropout_3', 0.0, 0.5),
    'Dense_0':          hp.choice('Dense_0', [128, 256, 512]),
    'Dense_1':          hp.choice('Dense_1', [64, 128, 256])
    #'validation_split': hp.uniform('validation_split', 0.1, 0.3)
}
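
To make the search space concrete, the sketch below draws one candidate configuration with the standard library only. It assumes a plain uniform draw for each `hp.uniform` entry and a uniform pick for each `hp.choice` entry; `draw_params` is a hypothetical helper, and this is not how TPE proposes candidates once it has observed some trials.

```python
import random

# Option lists copied from the search space above.
DENSE_0_OPTIONS = [128, 256, 512]
DENSE_1_OPTIONS = [64, 128, 256]

def draw_params(rng):
    """Draw one candidate: four dropout rates in [0, 0.5] and two layer widths."""
    candidate = {"Dropout_{}".format(i): rng.uniform(0.0, 0.5) for i in range(4)}
    candidate["Dense_0"] = rng.choice(DENSE_0_OPTIONS)
    candidate["Dense_1"] = rng.choice(DENSE_1_OPTIONS)
    return candidate

rng = random.Random(123)
candidate = draw_params(rng)
print(candidate)
```

A dictionary of this shape is what the objective function receives as its `params` argument on every trial.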

## CNN Model:

In [43]:
cnt = 1
def cnn_model(params):
    
    initializer = keras.initializers.glorot_uniform(seed=seed)
    
    model = Sequential() 
    model.add(Reshape((28,28,1), input_shape=(784,)))
    
    model.add(Conv2D(32, (5,5), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(Conv2D(32, (5,5), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(MaxPool2D(pool_size=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_0'], seed=seed))
    
    model.add(Conv2D(64, (3,3), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(Conv2D(64, (3,3), padding='same', activation='relu', kernel_initializer=initializer))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_1'], seed=seed))

    model.add(Flatten())
    model.add(Dense(params['Dense_0'], activation="relu", kernel_initializer=initializer))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_2'], seed=seed))
    model.add(Dense(params['Dense_1'], activation = "relu", kernel_initializer=initializer))
    model.add(BatchNormalization())
    model.add(Dropout(params['Dropout_3'], seed=seed))
    
    model.add(Dense(10, activation = "softmax", kernel_initializer=initializer))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
    
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-5,verbose=1)
    early_stopping = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

    hist = model.fit(X_train, Y_train,
                     batch_size=32,
                     epochs=50,
                     verbose=1,
                     #validation_split=params['validation_split'],
                     validation_data=(X_test, Y_test),
                     callbacks=[reduce_lr, early_stopping])
    
    #score, acc = model.evaluate(X_test, Y_test, batch_size=params['batch_size'] , verbose=0)
    # Note: these are the metrics of the final epoch, which is not necessarily the best one.
    loss = hist.history['val_loss'][-1]
    acc = hist.history['val_acc'][-1]
    
    global cnt
    print(cnt, ': Val_loss:', loss, ', Val_acc:', acc, '\n\n')
    cnt += 1
    
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}
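
The size of the first dense layer depends on the tensor that reaches `Flatten()`. The arithmetic can be sketched as follows, assuming stride-1 `'same'` convolutions (which keep the spatial size) and 2x2 max pooling (which halves it); `flattened_size` is a hypothetical helper written only for this check.

```python
def flattened_size(side=28, pool_stages=2, last_filters=64):
    """Spatial side after each 2x2 pool, times the filters of the last conv block."""
    for _ in range(pool_stages):
        side //= 2  # 'same' convolutions keep the size; each 2x2 pool halves it
    return side * side * last_filters

flat = flattened_size()  # 28 -> 14 -> 7, so 7 * 7 * 64
# Weights plus biases of the first Dense layer for each candidate width.
dense_weights = {units: flat * units + units for units in (128, 256, 512)}
print(flat)
print(dense_weights)
```

With `Dense_0 = 512` the first dense layer alone holds about 1.6 million parameters, so this choice dominates the model size.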
    

## Search the Best model:

In [44]:
trials = Trials()
best = fmin(fn=cnn_model, 
            space=params, 
            algo=tpe.suggest, 
            max_evals=10, 
            trials=trials,
            verbose=1,
            rstate=np.random.RandomState(seed))

best

Train on 37800 samples, validate on 4200 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50

Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 17/50
Epoch 18/50

Epoch 00018: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 19/50
Epoch 00019: early stopping
1 : Val_loss: 0.017391850375036787 , Val_acc: 0.9952380952380953 


Train on 37800 samples, validate on 4200 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50



Epoch 00014: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 15/50
Epoch 16/50
Epoch 17/50

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 18/50
Epoch 19/50

Epoch 00019: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 20/50
Epoch 00020: early stopping
3 : Val_loss: 0.018098909668094295 , Val_acc: 0.9966666666666667 


Train on 37800 samples, validate on 4200 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 11/50
Epoch 12/50
Epoch 13/50

Epoch 00013: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50

Epoch 00017: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 18/50
Epoch 19/50

Epoch 23/50

Epoch 00023: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50

Epoch 00027: ReduceLROnPlateau reducing learning rate to 1.5625000742147677e-05.
Epoch 28/50
Epoch 29/50

Epoch 00029: ReduceLROnPlateau reducing learning rate to 1e-05.
Epoch 30/50
Epoch 00030: early stopping
5 : Val_loss: 0.01559568798201889 , Val_acc: 0.9964285714285714 


Train on 37800 samples, validate on 4200 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50

Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 16/50
Epoch 17/50

Epoch 00017: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 18/50

Epoch 19/50
Epoch 20/50

Epoch 00020: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 21/50
Epoch 22/50

Epoch 00022: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 23/50
Epoch 00023: early stopping
7 : Val_loss: 0.01604700237555551 , Val_acc: 0.9954761904761905 


Train on 37800 samples, validate on 4200 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50

Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 17/50
Epoch 18/50
Epoch 19/50

Epoch 00019: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 20/50
Epoch 21/50

Epoch 00021: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 22/50
Epoch 00022: early stopping
8 : Val_loss: 0.01561672602811346 , Val_acc: 0.9961904761904762 


Train on 37800 samples, validate on 4200 samples

Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50

Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50

Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 16/50
Epoch 17/50
Epoch 18/50

Epoch 00018: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 19/50
Epoch 20/50

Epoch 00020: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 21/50
Epoch 00021: early stopping
10 : Val_loss: 0.01501453281791501 , Val_acc: 0.9961904761904762 




{'Dense_0': 2,
 'Dense_1': 2,
 'Dropout_0': 0.01183659359352257,
 'Dropout_1': 0.41822117909744994,
 'Dropout_2': 0.38154401500871815,
 'Dropout_3': 0.4962590536997257}
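
The learning rates printed by `ReduceLROnPlateau` in the training logs above follow a simple halving rule. The sketch below reproduces the sequence, assuming Adam's default initial learning rate of 0.001 together with the `factor=0.5` and `min_lr=1e-5` settings from the model cell; `lr_schedule` is a hypothetical helper, not part of Keras.

```python
def lr_schedule(initial_lr=1e-3, factor=0.5, min_lr=1e-5):
    """List the learning rates after each plateau until min_lr is reached."""
    lr = initial_lr
    steps = []
    while lr > min_lr:
        lr = max(lr * factor, min_lr)  # halve, but never go below the floor
        steps.append(lr)
    return steps

steps = lr_schedule()
for lr in steps:
    print(lr)
```

The logged values such as `0.0005000000237487257` are the same numbers stored as 32-bit floats. The last step is clipped to `1e-05`, which matches the final reduction shown in the longest run.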

## The Best Hyperparameters:

In [45]:
space_eval(params, best)

{'Dense_0': 512,
 'Dense_1': 256,
 'Dropout_0': 0.01183659359352257,
 'Dropout_1': 0.41822117909744994,
 'Dropout_2': 0.38154401500871815,
 'Dropout_3': 0.4962590536997257}
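
The dictionary returned by `fmin` stores, for every `hp.choice` entry, the index of the selected option rather than its value, which is why `Dense_0` appears as `2` there and as `512` here. For this flat search space the mapping can be sketched by hand; `resolve_choices` is a hypothetical helper that only mimics the effect of `space_eval` for this particular space.

```python
# Option lists copied from the search space definition.
CHOICES = {"Dense_0": [128, 256, 512], "Dense_1": [64, 128, 256]}

def resolve_choices(best_indices, choices):
    """Replace each hp.choice index with the option it points to."""
    resolved = dict(best_indices)
    for name, options in choices.items():
        resolved[name] = options[best_indices[name]]
    return resolved

best_indices = {
    "Dense_0": 2,
    "Dense_1": 2,
    "Dropout_0": 0.01183659359352257,
    "Dropout_1": 0.41822117909744994,
    "Dropout_2": 0.38154401500871815,
    "Dropout_3": 0.4962590536997257,
}
resolved = resolve_choices(best_indices, CHOICES)
print(resolved)
```

In practice `space_eval` is the safer route, since it also handles nested or conditional search spaces.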

## The Best Result

In [46]:
trials.best_trial['result']

{'loss': -0.9966666666666667,
 'model': <keras.engine.sequential.Sequential at 0x7f754ccb0f60>,
 'status': 'ok'}
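
Because `fmin` minimizes the returned `loss`, the objective reports the negative validation accuracy, and the best trial is the one with the smallest value. The sketch below checks this against the summary lines that are visible in the training output above; trials whose summary lines do not appear in that output are left out, so this is only a consistency check and not a replay of the search.

```python
# Final validation accuracy per trial, copied from the visible summary lines.
visible_val_acc = {
    1: 0.9952380952380953,
    3: 0.9966666666666667,
    5: 0.9964285714285714,
    7: 0.9954761904761905,
    8: 0.9961904761904762,
    10: 0.9961904761904762,
}

# The objective returns -accuracy, so the minimum loss is the maximum accuracy.
losses = {trial: -acc for trial, acc in visible_val_acc.items()}
best_trial = min(losses, key=losses.get)
print(best_trial, losses[best_trial])
```

The smallest loss among these trials equals the `-0.9966666666666667` reported by `trials.best_trial`.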

## The Best model:

In [47]:
best_model = trials.best_trial['result']['model']
best_model

<keras.engine.sequential.Sequential at 0x7f754ccb0f60>

## Prediction for Submission:

In [48]:
test = pd.read_csv('../test.csv')
test_index = test.index
test = test.values.astype('float32') / 255.0

pred = best_model.predict(test)
result = pred.argmax(axis=1)

## Submission csv file output:

In [49]:
submission = pd.DataFrame({'ImageId': test_index+1, 'Label': result})
submission.to_csv('hyperopt_submission.csv', index=False)
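
The prediction and submission steps above reduce each row of class probabilities to a single digit and pair it with a 1-based `ImageId`. A minimal sketch with made-up probabilities for three images and three classes (the values are illustrative only):

```python
import numpy as np

# Hypothetical softmax outputs: one row per image, one column per class.
toy_probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
])

labels = toy_probs.argmax(axis=1)        # most probable class per row
image_ids = np.arange(len(labels)) + 1   # the row index is 0-based, ImageId starts at 1
rows = list(zip(image_ids.tolist(), labels.tolist()))
print(rows)
```

The `+ 1` mirrors `test_index+1` in the notebook cell, since the DataFrame index starts at 0.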

#### Accuracy estimation:

In [50]:
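# Compare against an earlier CNN submission, used here as a reference.
prev_cnn = pd.read_csv('../cnn_submission.csv', index_col=0)
res = pd.read_csv('hyperopt_submission.csv', index_col=0)
# Number of test images on which the two submissions disagree.
diff_num = np.sum(prev_cnn.Label.values != res.Label.values)
# Scale the agreement rate by 0.99852 (presumably the score of the reference submission).
acc = (len(res) - diff_num) / len(res) * 0.99852
print('Approx. accuracy: {0:.5f}'.format(acc))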

Approx. accuracy: 0.99481


Approx. accuracy: 0.99364  
Approx. accuracy: 0.99368  
Approx. accuracy: 0.99442 : validation_data(X_test, Y_test) split:0.2
Approx. accuracy: 0.99481 : validation_data(X_test, Y_test) split:0.1, score:0.99528
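
The estimate above multiplies the agreement rate with a reference submission by the constant `0.99852`. The sketch below works through that arithmetic, assuming a test set of 28,000 rows (the size of the Kaggle Digit Recognizer test set). The figure of 104 disagreements is back-calculated here from the printed `0.99481` and does not come from the notebook output; `approx_accuracy` is a hypothetical helper.

```python
def approx_accuracy(n_rows, n_diff, ref_score):
    """Agreement rate with the reference submission, scaled by the reference score."""
    return (n_rows - n_diff) / n_rows * ref_score

N_ROWS = 28000       # assumed test-set size
REF_SCORE = 0.99852  # constant used in the notebook cell

for n_diff in (103, 104, 105):
    print(n_diff, "{0:.5f}".format(approx_accuracy(N_ROWS, n_diff, REF_SCORE)))

estimate = approx_accuracy(N_ROWS, 104, REF_SCORE)
```

This is a rough heuristic: it counts every disagreement with the reference as an error, so the true score can be higher, as the `score:0.99528` noted above suggests.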

In [52]:
#best_model.save('hyperopt_model_99528.hdf5')
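
Note that the model stored for each trial carries the weights of its final epoch, and the reported accuracy is likewise taken from the last entry of the training history. When early stopping triggers, that final epoch comes several epochs after the lowest validation loss. A minimal sketch of picking the best epoch from a history dictionary instead, using made-up values and a hypothetical helper `best_epoch_metrics`:

```python
def best_epoch_metrics(history):
    """Return (epoch, val_loss, val_acc) at the epoch with the lowest val_loss."""
    val_loss = history["val_loss"]
    best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    return best_idx + 1, val_loss[best_idx], history["val_acc"][best_idx]

# Hypothetical history, shaped like hist.history in the model cell.
toy_history = {
    "val_loss": [0.050, 0.030, 0.021, 0.024, 0.023],
    "val_acc": [0.985, 0.991, 0.994, 0.993, 0.9935],
}
epoch, loss, acc = best_epoch_metrics(toy_history)
print(epoch, loss, acc)
```

Recent Keras versions also offer `restore_best_weights=True` on `EarlyStopping`, which may be worth trying if the best-epoch weights should be the ones that get saved.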