# Machine Learning Engineer Nanodegree

## Capstone project: Exoplanet search (from Kaggle)

Student Name: Graciano Patino

Kaggle reference: https://www.kaggle.com/keplersmachines/kepler-labelled-time-series-data?/

The mission as stated in the Github (https://github.com/winterdelta/KeplerAI) is to build a classification algorithm for identifying if a particular time series input includes an exoplanet or not. It also mentions that a number of methods were tested: 1-D CNN in Torch7, XGBoost in R and PCA in Python. However, none of these methods provided strong results according to the kaggle and Github references. 

For this project, I would evaluate deep learning algorithms. Per paper in the paragraph (below), these algorithms appear to provide better results compared to the ones already tried as mentioned above.

    1)	Initially I would evaluate 1-D CNN using Keras instead of Torch7. 
    2)	Based on reference paper, I would try adding different number of layers and filters in combination with other CNN parameters. Details would be included in project report.
    3)	The output of the CNNs would be the input to one or more dense layers.
    4)	Performance of each model to be measured as per evaluation metrics section.
    5)	Per kaggle source the test set is confirmed to have 5 exoplanets. This will also be useful on checking performance of algorithms. If an algorithm is unable to identify exoplanets on then testing set, then model might not be good. 

Please that the list above of models considered is not meant to be exhaustive for all possible scenarios in deep learning algorithms. It might be the case that other deep learning algorithms might be considered later should the ones proposed (above) fail in identifying any exoplanet as expected.

Reference paper: IDENTIFYING EXOPLANETS WITH DEEP LEARNING: A FIVE PLANET RESONANT CHAIN AROUND KEPLER-80 AND AN EIGHTH PLANET AROUND KEPLER-90 
(https://www.cfa.harvard.edu/~avanderb/kepler90i.pdf)



#### Note

In this Jupyter Notebook: Applying Grid Search to identify parameters for network.


In [1]:
# Import libraries necessary for this project
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import itertools
from IPython.display import display # Allows the use of display() for DataFrames
from get_results import plot_roc_auc, confusion_matrix_com

# Pretty display for notebooks
%matplotlib inline

# Some Sklearn libraries are required
from sklearn.metrics import roc_auc_score
from sklearn.metrics import precision_recall_curve
from sklearn.utils.fixes import signature
from sklearn.metrics import average_precision_score

In [2]:
# Load the training data from Exoplanet dataset
train_data = pd.read_csv('kepler/exoTrain.csv')

In [3]:
# Find dimensions of the train data
train_data.shape

(5087, 3198)

In [4]:
# Testing  if train_data has any null fields
testing = pd.isnull(train_data)
testing *= 1
testing2 = testing.sum()
testing2.sum() # If results is zero, then there a no fields with "null" value

0

In [5]:
# Getting X_train and y_train
# Using iloc to select data using position instead of label and converting to numpy array using values
X_train = train_data.iloc[:,1:].values 
y_train = train_data.iloc[:,0:1].values 

In [6]:
# Find dimensions of the X_train data
X_train.shape

(5087, 3197)

In [7]:
# Find dimensions of the labels (y_train) data
y_train.shape

(5087, 1)

In [8]:
# y_train: Label is 2 for exoplanet and 1 for non-exoplanet
y_train[:5]

array([[2],
       [2],
       [2],
       [2],
       [2]])

In [9]:
y_train -= 1 # Changing labels to: 1 for exoplanet and 0 for non-exoplanet

In [10]:
y_train[:5]

array([[1],
       [1],
       [1],
       [1],
       [1]])

In [11]:
# Load the testing data from Exoplanet dataset
test_data = pd.read_csv('kepler/exoTest.csv')

In [12]:
# Find dimensions of the test data
test_data.shape

(570, 3198)

In [13]:
# Testing  if test_data has any "null" fields
testing = pd.isnull(test_data)
testing *= 1
testing2 = testing.sum()
testing2.sum() # If results is zero, then there a no fields with "null" value

0

In [14]:
# Getting X_test and y_test
# Using iloc to select data using position instead of label and converting to numpy array (using values)
X_test = test_data.iloc[:,1:].values
y_test = test_data.iloc[:,0:1].values

In [15]:
# Find dimensions of the X_test data
X_test.shape

(570, 3197)

In [16]:
# Find dimensions of the labels (y_test) data
y_test.shape

(570, 1)

In [17]:
# y_test: Label is 2 for exoplanet and 1 for non-exoplanet
y_test[:6]

array([[2],
       [2],
       [2],
       [2],
       [2],
       [1]])

In [18]:
y_test -= 1 # Changing labels to: 1 for exoplanet and 0 for non-exoplanet

In [19]:
y_test[:6]

array([[1],
       [1],
       [1],
       [1],
       [1],
       [0]])

In [20]:
# Normalizing the data since it is not normalized according to Kaggle/Github
from sklearn.preprocessing import StandardScaler

In [21]:
# Checking X_train data
X_train

array([[  93.85,   83.81,   20.1 , ...,   61.42,    5.08,  -39.54],
       [ -38.88,  -33.83,  -58.54, ...,    6.46,   16.  ,   19.93],
       [ 532.64,  535.92,  513.73, ...,  -28.91,  -70.02,  -96.67],
       ..., 
       [ 273.39,  278.  ,  261.73, ...,   88.42,   79.07,   79.43],
       [   3.82,    2.09,   -3.29, ...,  -14.55,   -6.41,   -2.55],
       [ 323.28,  306.36,  293.16, ...,  -16.72,  -14.09,   27.82]])

In [22]:
# Tranposing X_train before applying scaling such that mean is zero and variance is one
X_train.transpose()

array([[  93.85,  -38.88,  532.64, ...,  273.39,    3.82,  323.28],
       [  83.81,  -33.83,  535.92, ...,  278.  ,    2.09,  306.36],
       [  20.1 ,  -58.54,  513.73, ...,  261.73,   -3.29,  293.16],
       ..., 
       [  61.42,    6.46,  -28.91, ...,   88.42,  -14.55,  -16.72],
       [   5.08,   16.  ,  -70.02, ...,   79.07,   -6.41,  -14.09],
       [ -39.54,   19.93,  -96.67, ...,   79.43,   -2.55,   27.82]])

In [23]:
# # Checking X_train data
X_test

array([[  1.19880000e+02,   1.00210000e+02,   8.64600000e+01, ...,
          3.57800000e+01,   2.69430000e+02,   5.77200000e+01],
       [  5.73659000e+03,   5.69998000e+03,   5.71716000e+03, ...,
         -2.36619000e+03,  -2.29486000e+03,  -2.03472000e+03],
       [  8.44480000e+02,   8.17490000e+02,   7.70070000e+02, ...,
         -1.62680000e+02,  -3.67900000e+01,   3.06300000e+01],
       ..., 
       [ -5.40100000e+01,  -4.41300000e+01,  -4.12300000e+01, ...,
          5.47000000e+00,   1.44600000e+01,   1.87000000e+01],
       [  9.13600000e+01,   8.56000000e+01,   4.88100000e+01, ...,
         -8.43000000e+00,  -6.48000000e+00,   1.76000000e+01],
       [  3.07119000e+03,   2.78253000e+03,   2.60869000e+03, ...,
         -2.77220000e+02,  -6.96300000e+01,   1.21560000e+02]])

In [24]:
# Tranposing X_test before applying scaling such that mean is zero and variance is one
X_test.transpose()

array([[  1.19880000e+02,   5.73659000e+03,   8.44480000e+02, ...,
         -5.40100000e+01,   9.13600000e+01,   3.07119000e+03],
       [  1.00210000e+02,   5.69998000e+03,   8.17490000e+02, ...,
         -4.41300000e+01,   8.56000000e+01,   2.78253000e+03],
       [  8.64600000e+01,   5.71716000e+03,   7.70070000e+02, ...,
         -4.12300000e+01,   4.88100000e+01,   2.60869000e+03],
       ..., 
       [  3.57800000e+01,  -2.36619000e+03,  -1.62680000e+02, ...,
          5.47000000e+00,  -8.43000000e+00,  -2.77220000e+02],
       [  2.69430000e+02,  -2.29486000e+03,  -3.67900000e+01, ...,
          1.44600000e+01,  -6.48000000e+00,  -6.96300000e+01],
       [  5.77200000e+01,  -2.03472000e+03,   3.06300000e+01, ...,
          1.87000000e+01,   1.76000000e+01,   1.21560000e+02]])

In [25]:
# Standardize features by removing the mean and scaling to unit variance (sklearn.preprocessing)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train) # Output in a numpy.ndarray
X_test = scaler.fit_transform(X_test) # Output in a numpy.ndarray

In [26]:
X_train.transpose() # Transpose back to original dimensions

array([[-0.00235557, -0.00852774,  0.01804893, ...,  0.00599336,
        -0.00654212,  0.00831333],
       [-0.00205404, -0.0074516 ,  0.01868969, ...,  0.00685579,
        -0.00580352,  0.00815701],
       [-0.00579778, -0.00938685,  0.01673115, ...,  0.00523005,
        -0.00686528,  0.00666449],
       ..., 
       [ 0.0341983 ,  0.03109682,  0.02910084, ...,  0.03572195,
         0.0299112 ,  0.02978874],
       [ 0.02736753,  0.02803863,  0.02275218, ...,  0.03191466,
         0.0266614 ,  0.02618942],
       [ 0.01805157,  0.02216476,  0.01410023, ...,  0.02628002,
         0.02060995,  0.02271046]])

In [27]:
X_test.transpose() # Transpose back to original dimensions

array([[-0.03143654,  0.41497541,  0.02615413, ..., -0.04525719,
        -0.03370329,  0.20313139],
       [-0.05057432,  0.39343385,  0.00629912, ..., -0.0620191 ,
        -0.05173276,  0.16210798],
       [-0.03559448,  0.41363759,  0.01894572, ..., -0.04578193,
        -0.0385983 ,  0.16563567],
       ..., 
       [-0.00907555, -0.2624662 , -0.03001166, ..., -0.01227304,
        -0.01373939, -0.04209481],
       [ 0.00470934, -0.26591026, -0.02760726, ..., -0.02219864,
        -0.02440852, -0.03107299],
       [-0.00786554, -0.22375419, -0.01066057, ..., -0.01189145,
        -0.01200494, -0.00127881]])

In [28]:
# Fix random seed for reproducibility
seed = 10
np.random.seed(seed)

In [29]:
# Importing Keras libraries

from keras.models import Sequential, Model
from keras.layers import Conv1D, MaxPool1D, Dense, Dropout, Flatten
from keras.layers import BatchNormalization, Input, concatenate, Activation
from keras.optimizers import Adam, SGD, RMSprop, Adagrad, Adadelta, Adamax, Nadam
from keras.callbacks import ModelCheckpoint 

Using TensorFlow backend.


In [30]:
#Convert data into 3d tensor (Input 0 in Conv1D is incompatible with layer conv1d_1: expected ndim=3, found ndim=2)
X_train = np.reshape(X_train,(X_train.shape[0],X_train.shape[1],1))
X_test = np.reshape(X_test,(X_test.shape[0],X_test.shape[1],1))

In [31]:
# Checking shape of X_test tensor
X_test.shape

(570, 3197, 1)

In [32]:
# Checking shape of X_train_new tensor
X_train.shape

(5087, 3197, 1)

#### Data exploration and preparation ended (above)

# Ending Data Preparation

# GRID SEARCH 

## Grid Search Implementation

Reference on using GridSearch for tuning Hyperparameters for DL models:

https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/

# GRID SEARCH 8cnn, 2dnn

In [33]:
# Selecting a subset of the training set
X_GS1000 = X_train[0:1000,:]
y_GS1000 = y_train[0:1000,:]

In [34]:
# Use scikit-learn to grid search the batch size and epochs
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier

# Function to create model, required for KerasClassifier
def create_model_drop(dropout_rate=0.0):
    # create model
    model = Sequential()
    # Defining network architecture
    model.add(Conv1D(filters=12, kernel_size=6, activation='relu', input_shape=(3197,1)))
    model.add(Conv1D(filters=12, kernel_size=6, activation='relu'))
    model.add(MaxPool1D(strides=4))
    model.add(Dropout(0.3))
    model.add(BatchNormalization())
    model.add(Conv1D(filters=24, kernel_size=6, activation='relu'))
    model.add(Conv1D(filters=24, kernel_size=6, activation='relu'))
    model.add(MaxPool1D(strides=4))
    model.add(Dropout(0.3))
    model.add(BatchNormalization())
    model.add(Conv1D(filters=48, kernel_size=6, activation='relu'))
    model.add(Conv1D(filters=48, kernel_size=6, activation='relu'))
    model.add(MaxPool1D(strides=4))
    model.add(Dropout(0.3))
    model.add(BatchNormalization())
    model.add(Conv1D(filters=96, kernel_size=6, activation='relu'))
    model.add(Conv1D(filters=96, kernel_size=6, activation='relu'))
    model.add(MaxPool1D(strides=4))
    model.add(Dropout(0.3))
    model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(96, activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(optimizer=Nadam(1e-5), loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Fix random seed for reproducibility (done above on previous cells)
# Dataset was created in previous section


In [35]:
# Create model
model = KerasClassifier(build_fn=create_model_drop, epochs=30, batch_size=32, verbose=2)
# define the grid search parameters
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
param_grid = dict(dropout_rate=dropout_rate)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X_GS1000, y_GS1000)


Epoch 1/30
Epoch 1/30
Epoch 1/30
Epoch 1/30
14s - loss: 0.6890 - acc: 0.8829
14s - loss: 0.6891 - acc: 0.8081
Epoch 2/30
14s - loss: 0.6889 - acc: 0.8126
Epoch 2/30
Epoch 2/30
14s - loss: 0.6890 - acc: 0.8829
Epoch 2/30
14s - loss: 0.6776 - acc: 0.9565
Epoch 3/30
14s - loss: 0.6837 - acc: 0.9055
Epoch 3/30
14s - loss: 0.6814 - acc: 0.9100
Epoch 3/30
14s - loss: 0.6777 - acc: 0.9580
Epoch 3/30
14s - loss: 0.6649 - acc: 0.9805
Epoch 4/30
14s - loss: 0.6762 - acc: 0.9265
Epoch 4/30
14s - loss: 0.6649 - acc: 0.9805
Epoch 4/30
14s - loss: 0.6726 - acc: 0.9250
Epoch 4/30
14s - loss: 0.6384 - acc: 0.9910
Epoch 5/30
14s - loss: 0.6560 - acc: 0.9355
Epoch 5/30
14s - loss: 0.6384 - acc: 0.9910
Epoch 5/30
14s - loss: 0.6536 - acc: 0.9340
Epoch 5/30
14s - loss: 0.6025 - acc: 0.9895
Epoch 6/30
14s - loss: 0.6168 - acc: 0.9400
Epoch 6/30
14s - loss: 0.6025 - acc: 0.9895
Epoch 6/30
14s - loss: 0.6312 - acc: 0.9295
Epoch 6/30
14s - loss: 0.5246 - acc: 0.9940
Epoch 7/30
14s - loss: 0.5461 - acc: 0.9265

Epoch 17/30
14s - loss: 0.2534 - acc: 0.9355
Epoch 17/30
14s - loss: 0.0564 - acc: 0.9985
Epoch 18/30
14s - loss: 0.2248 - acc: 0.9430
Epoch 18/30
14s - loss: 0.2696 - acc: 0.9340
Epoch 18/30
14s - loss: 0.2259 - acc: 0.9400
Epoch 18/30
14s - loss: 0.0425 - acc: 0.9970
Epoch 19/30
14s - loss: 0.2217 - acc: 0.9430
Epoch 19/30
14s - loss: 0.2459 - acc: 0.9400
Epoch 19/30
14s - loss: 0.2520 - acc: 0.9415
Epoch 19/30
14s - loss: 0.0356 - acc: 0.9985
Epoch 20/30
14s - loss: 0.2235 - acc: 0.9400
Epoch 20/30
14s - loss: 0.2368 - acc: 0.9400
Epoch 20/30
14s - loss: 0.2482 - acc: 0.9415
Epoch 20/30
14s - loss: 0.0297 - acc: 0.9985
Epoch 21/30
14s - loss: 0.2170 - acc: 0.9400
Epoch 21/30
14s - loss: 0.2378 - acc: 0.9385
Epoch 21/30
14s - loss: 0.2284 - acc: 0.9400
Epoch 21/30
14s - loss: 0.0209 - acc: 0.9970
Epoch 22/30
14s - loss: 0.2439 - acc: 0.9400
Epoch 22/30
14s - loss: 0.2178 - acc: 0.9445
Epoch 22/30
14s - loss: 0.2324 - acc: 0.9400
Epoch 22/30
14s - loss: 0.0203 - acc: 0.9985
Epoch 23/3

17s - loss: 0.6984 - acc: 0.2402
Epoch 2/30
14s - loss: 0.6848 - acc: 0.9505
Epoch 4/30
14s - loss: 0.6699 - acc: 0.9235
Epoch 4/30
14s - loss: 0.6878 - acc: 0.8981
Epoch 4/30
14s - loss: 0.6926 - acc: 0.7192
Epoch 3/30
14s - loss: 0.6796 - acc: 0.9850
Epoch 5/30
14s - loss: 0.6583 - acc: 0.9355
Epoch 5/30
14s - loss: 0.6803 - acc: 0.9190
Epoch 5/30
14s - loss: 0.6847 - acc: 0.9505
Epoch 4/30
14s - loss: 0.6744 - acc: 0.9910
Epoch 6/30
14s - loss: 0.6317 - acc: 0.9295
Epoch 6/30
14s - loss: 0.6764 - acc: 0.9130
Epoch 6/30
14s - loss: 0.6795 - acc: 0.9850
Epoch 5/30
14s - loss: 0.6621 - acc: 0.9970
Epoch 7/30
14s - loss: 0.5879 - acc: 0.9415
Epoch 7/30
14s - loss: 0.6668 - acc: 0.9175
Epoch 7/30
14s - loss: 0.6744 - acc: 0.9910
Epoch 6/30
14s - loss: 0.6443 - acc: 0.9880
Epoch 8/30
14s - loss: 0.5023 - acc: 0.9340
Epoch 8/30
14s - loss: 0.6574 - acc: 0.9205
Epoch 8/30
14s - loss: 0.6621 - acc: 0.9955
Epoch 7/30
14s - loss: 0.6213 - acc: 0.9910
Epoch 9/30
14s - loss: 0.4561 - acc: 0.9280

Epoch 19/30
14s - loss: 0.2803 - acc: 0.9340
Epoch 20/30
14s - loss: 0.2295 - acc: 0.9415
Epoch 20/30
14s - loss: 0.2406 - acc: 0.9400
Epoch 18/30
14s - loss: 0.0407 - acc: 0.9955
Epoch 20/30
14s - loss: 0.2887 - acc: 0.9370
Epoch 21/30
14s - loss: 0.2206 - acc: 0.9415
Epoch 21/30
14s - loss: 0.2794 - acc: 0.9325
Epoch 19/30
14s - loss: 0.0241 - acc: 0.9985
Epoch 21/30
14s - loss: 0.2668 - acc: 0.9370
Epoch 22/30
14s - loss: 0.2284 - acc: 0.9400
Epoch 22/30
14s - loss: 0.0278 - acc: 0.9955
Epoch 22/30
14s - loss: 0.2820 - acc: 0.9325
Epoch 20/30
14s - loss: 0.2754 - acc: 0.9340
Epoch 23/30
14s - loss: 0.2198 - acc: 0.9430
Epoch 23/30
14s - loss: 0.0203 - acc: 0.9985
Epoch 23/30
14s - loss: 0.2925 - acc: 0.9340
Epoch 21/30
14s - loss: 0.2513 - acc: 0.9385
Epoch 24/30
14s - loss: 0.2329 - acc: 0.9430
Epoch 24/30
14s - loss: 0.0200 - acc: 0.9970
Epoch 24/30
14s - loss: 0.2682 - acc: 0.9370
Epoch 22/30
14s - loss: 0.2534 - acc: 0.9340
Epoch 25/30
14s - loss: 0.2167 - acc: 0.9430
Epoch 25/3

14s - loss: 0.6553 - acc: 0.9835
Epoch 6/30
14s - loss: 0.5882 - acc: 0.9325
Epoch 6/30
14s - loss: 0.6873 - acc: 0.7628
Epoch 3/30
14s - loss: 0.6145 - acc: 0.9340
Epoch 6/30
14s - loss: 0.6252 - acc: 0.9940
Epoch 7/30
14s - loss: 0.5307 - acc: 0.9400
Epoch 7/30
14s - loss: 0.6829 - acc: 0.9489
Epoch 4/30
14s - loss: 0.5263 - acc: 0.9340
Epoch 7/30
14s - loss: 0.5620 - acc: 0.9895
Epoch 8/30
14s - loss: 0.4826 - acc: 0.9370
Epoch 8/30
14s - loss: 0.6712 - acc: 0.9880
Epoch 5/30
14s - loss: 0.3851 - acc: 0.9385
Epoch 8/30
14s - loss: 0.5007 - acc: 0.9865
Epoch 9/30
14s - loss: 0.4385 - acc: 0.9325
Epoch 9/30
14s - loss: 0.6555 - acc: 0.9835
Epoch 6/30
14s - loss: 0.2925 - acc: 0.9370
Epoch 9/30
14s - loss: 0.4334 - acc: 0.9760
Epoch 10/30
14s - loss: 0.3619 - acc: 0.9385
Epoch 10/30
14s - loss: 0.6259 - acc: 0.9940
Epoch 7/30
14s - loss: 0.2507 - acc: 0.9355
Epoch 10/30
14s - loss: 0.4004 - acc: 0.9865
Epoch 11/30
14s - loss: 0.3607 - acc: 0.9385
Epoch 11/30
14s - loss: 0.5631 - acc: 0

6s - loss: 0.1645 - acc: 0.9590
Epoch 21/30
6s - loss: 0.1746 - acc: 0.9600
Epoch 22/30
6s - loss: 0.1739 - acc: 0.9620
Epoch 23/30
6s - loss: 0.1713 - acc: 0.9620
Epoch 24/30
6s - loss: 0.1735 - acc: 0.9620
Epoch 25/30
6s - loss: 0.1589 - acc: 0.9620
Epoch 26/30
6s - loss: 0.1773 - acc: 0.9620
Epoch 27/30
6s - loss: 0.1732 - acc: 0.9610
Epoch 28/30
6s - loss: 0.1635 - acc: 0.9620
Epoch 29/30
6s - loss: 0.1585 - acc: 0.9610
Epoch 30/30
6s - loss: 0.1628 - acc: 0.9610


In [36]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.963000 using {'dropout_rate': 0.2}
0.962000 (0.051554) with: {'dropout_rate': 0.0}
0.961000 (0.050890) with: {'dropout_rate': 0.1}
0.963000 (0.052248) with: {'dropout_rate': 0.2}
0.963000 (0.052248) with: {'dropout_rate': 0.3}
0.963000 (0.052248) with: {'dropout_rate': 0.4}
0.962000 (0.051554) with: {'dropout_rate': 0.5}
0.962000 (0.051554) with: {'dropout_rate': 0.6}
0.962000 (0.051554) with: {'dropout_rate': 0.7}
0.963000 (0.052248) with: {'dropout_rate': 0.8}
0.963000 (0.052248) with: {'dropout_rate': 0.9}


### Run 1: Best: 0.963000 using {'dropout_rate': 0.3} (3 cnn, 2 dnn)
    0.960000 (0.050258) with: {'dropout_rate': 0.0}
    0.961000 (0.050890) with: {'dropout_rate': 0.1}
    0.962000 (0.051554) with: {'dropout_rate': 0.2}
    0.963000 (0.052248) with: {'dropout_rate': 0.3}
    0.962000 (0.051554) with: {'dropout_rate': 0.4}
    0.640000 (0.433109) with: {'dropout_rate': 0.5}
    0.959000 (0.049476) with: {'dropout_rate': 0.6}
    0.961000 (0.048726) with: {'dropout_rate': 0.7}
    0.959000 (0.049476) with: {'dropout_rate': 0.8}
    0.962000 (0.051554) with: {'dropout_rate': 0.9}

### Run 2: Best: 0.963000 using {'dropout_rate': 0.2} (8 cnn, 2 dnn)
    0.962000 (0.051554) with: {'dropout_rate': 0.0}
    0.961000 (0.050890) with: {'dropout_rate': 0.1}
    0.963000 (0.052248) with: {'dropout_rate': 0.2}
    0.963000 (0.052248) with: {'dropout_rate': 0.3}
    0.963000 (0.052248) with: {'dropout_rate': 0.4}
    0.962000 (0.051554) with: {'dropout_rate': 0.5}
    0.962000 (0.051554) with: {'dropout_rate': 0.6}
    0.962000 (0.051554) with: {'dropout_rate': 0.7}
    0.963000 (0.052248) with: {'dropout_rate': 0.8}
    0.963000 (0.052248) with: {'dropout_rate': 0.9}