_MNIST + KerasClassifier (sklearn wrapper) + Pipeline + Gridsearch_
---

Below is a procedure for building a neural network to recognize handwritten digits.  The data is from Kaggle, and you will submit your results to Kaggle to test how well you did!

1. Load the training data (`train.csv`) from Kaggle
2. Set up `X` and `y` (feature matrix and target vector).
3. Split X and y into train and test subsets.
4. Preprocess your data

   - When dealing with image data, you need to normalize your `X` by dividing each value by the max value of a pixel (255).
   - Since this is a multiclass classification problem, keras needs `y` to be a one-hot encoded matrix
   
5. Create your network.

   - Remember that for multi-class classification you need a softmax activation function on the output layer.
   - You may want to consider using regularization or dropout to improve performance.
   
6. Train your network.
7. If you are unhappy with your model's performance, try to improve it by adding hidden layers, adding hidden-layer units, changing the activation functions on the hidden layers, etc.
8. Load in Kaggle's `test.csv`
9. Create your predictions (these should be numbers in the range 0-9).
10. Save your predictions and submit them to Kaggle.
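
The normalization in step 4 can be sketched with plain NumPy. This is a minimal illustration, not part of the notebook below: `scale_pixels` is a hypothetical helper name, and it assumes pixel intensities lie in the range 0-255, as in the Kaggle CSV.

```python
import numpy as np

def scale_pixels(X, max_value=255.0):
    """Map raw pixel intensities in [0, max_value] to floats in [0, 1].

    Hypothetical helper for illustration; not defined in the notebook.
    """
    return np.asarray(X, dtype="float32") / max_value

# Three fake "pixels": black, mid-grey, white.
raw = np.array([[0, 128, 255]])
scaled = scale_pixels(raw)
```

A function like this could be applied to `X` before training, or wrapped as a preprocessing step ahead of the classifier.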

---

For this lab, you should complete the above sequence of steps for _at least_ two of the four "configurations":

1. Using a `tensorflow` network
2. Using a `keras` "sequential" network
3. Using a `keras` convolutional network
4. Using a `tensorflow` convolutional network (we did _not_ cover this in class!)

In [1]:
!ls ./datasets

sample_submission.csv  test.csv  train.csv


In [2]:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.optimizers import Adam
from keras.wrappers.scikit_learn import KerasClassifier
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Dense, Flatten
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV

Using TensorFlow backend.


In [3]:
train = pd.read_csv('./datasets/train.csv')
# test = pd.read_csv('./datasets/test.csv') # Kaggle's unlabeled set for submission, not used in this notebook

# Train Test Split

In [4]:
X_train, X_test, y_train, y_test = train_test_split(
    train.drop(labels='label', axis=1), train['label'], test_size=0.33, random_state=42, stratify=train['label'])

In [5]:
# OHE the y_train data, visualized in a dataframe for convenience
# Note that the y target is actually 10 columns, not one
pd.DataFrame(OneHotEncoder().fit_transform(y_train.values.reshape(-1,1)).todense()).head()

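|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |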


In [6]:
# massaged y_train OHE'd numpy array.
# This is what we want our y_train to 
# look like once it hits our model.
# sklearn doesn't support pipeline
# operations on y so we have to do this
# when we feed it into the pipe, not IN the pipe
np.array(OneHotEncoder().fit_transform(y_train.values.reshape(-1,1)).todense())

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.]])
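
The notebook fits a fresh `OneHotEncoder` on each label vector. That works as long as every digit appears in every split, which the stratified split makes likely. A hedged alternative, assuming the labels are always the integers 0-9, is to pin the categories so the column layout never depends on the data. `one_hot_labels` is a hypothetical helper name.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

def one_hot_labels(y, n_classes=10):
    """One-hot encode integer labels into an (n_samples, n_classes) array.

    Hypothetical helper; fixing `categories` keeps the column order stable
    even when a split happens to be missing a digit.
    """
    encoder = OneHotEncoder(categories=[np.arange(n_classes)])
    y_column = np.asarray(y).reshape(-1, 1)
    # The encoder returns a sparse matrix by default; toarray() densifies it.
    return encoder.fit_transform(y_column).toarray()

# Only three digits are present, yet the result still has ten columns.
encoded = one_hot_labels([4, 0, 8])
```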

In [33]:
# massaged X_train numpy array, we will
# do these operations in our pipe
np.array(X_train)

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

# Create Function Transformers (transforms data in your pipe)

In [8]:
def make_np_array(X):
    '''
    Converts the input df to a numpy array
    returns said array
    '''
    return np.array(X)

In [9]:
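def change_shape(X, shape='2d'):
    '''
    Reshapes flattened images for the model.
    '1d' leaves each image as a row of 784 pixels;
    '2d' converts each row to a 28x28x1 matrix.
    '''
    if shape == '1d':
        return X
    elif shape == '2d':
        return X.reshape(X.shape[0], 28, 28, 1).astype('float32')
    # fail loudly instead of silently returning None
    raise ValueError("shape must be '1d' or '2d'")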
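
To see what the `'2d'` branch does, here is a small NumPy sketch on made-up data (two fake images rather than the Kaggle set). The reshape is row-major, so the first 28 values of each flat row become the top row of the image.

```python
import numpy as np

# Two fake flattened "images", each holding the values 0..783.
flat = np.tile(np.arange(784), (2, 1))

# Same reshape as the '2d' branch: (samples, height, width, channels).
images = flat.reshape(flat.shape[0], 28, 28, 1).astype("float32")

# Row-major layout: pixel k of the flat row lands at [k // 28, k % 28].
top_left = images[0, 0, 0, 0]    # flat index 0
second_row = images[0, 1, 0, 0]  # flat index 28
```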

In [10]:
# Show the shape after conversion to a numpy array:
# one row of 784 pixels per image. change_shape then
# 'wraps' each row into a 28x28 2D matrix
make_np_array(X_train).shape

(28140, 784)

# Make the model / network arch

## 2D Conv with input parameters

In [11]:
def instantiate_convolution_model(params=None, output='softmax'):
    '''
    puts together a convolution model with the following structure:
    1. 2D convolution
    2. 2D MAX pooling
    3. 2D convolution
    4. 2D MAX pooling
    5. Dropout
    6. Flatten
    7. Dense
    8. Dropout
    9. Dense (output)
    '''
    
    model = Sequential()

    model.add(Convolution2D(filters = params['filters'][0],
                            kernel_size = params['kernel_size'][0],
                            activation = params['activation'][0],
                            input_shape = params['input_shape']))
    model.add(MaxPooling2D(pool_size = params['pool_size'][0]))
    model.add(Convolution2D(filters = params['filters'][1],
                            kernel_size = params['kernel_size'][1],
                            activation = params['activation'][0]))
    model.add(MaxPooling2D(pool_size = params['pool_size'][0]))
    model.add(Dropout(params['dropout'][0]))
    model.add(Flatten())
    model.add(Dense(units=params['units'][0],activation=params['activation'][0]))
    model.add(Dropout(params['dropout'][1]))
    model.add(Dense(units=params['units'][1],activation=output)) # output layer
    # compile model
    model.compile(loss=params['loss'][0],
                  optimizer=params['optimizer'][0],
                  metrics=params['metrics'][0])
    return model

In [12]:
conv_params = {'input_shape':(28,28,1),
               'filters':[6,16],
               'kernel_size':[3,3],
               'activation':['relu'],
               'pool_size':[(2,2)],
               'dropout':[0.25,0.1],
               'units':[128,10],
               'optimizer':['adam'],
               'loss':['categorical_crossentropy'],
               'metrics':[['accuracy']]
              }

In [37]:
# show that a keras model is returned when fed parameters
# this ISN'T needed to run this model. It will instantiate
# when the pipe is called with the .fit() method
instantiate_convolution_model(conv_params).summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_11 (Conv2D)           (None, 26, 26, 6)         60        
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 13, 13, 6)         0         
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 11, 11, 16)        880       
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 5, 5, 16)          0         
_________________________________________________________________
dropout_14 (Dropout)         (None, 5, 5, 16)          0         
_________________________________________________________________
flatten_6 (Flatten)          (None, 400)               0         
_________________________________________________________________
dense_17 (Dense)             (None, 128)               51328     
__________
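
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults that the model relies on ('valid' padding and stride 1 for convolutions, non-overlapping pooling). The helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width of non-overlapping max pooling (floor division)."""
    return size // pool

def conv_param_count(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_param_count(n_in, n_out):
    """Weights plus one bias per unit."""
    return n_in * n_out + n_out

side = conv_out(28, 3)         # 26
side = pool_out(side, 2)       # 13
side = conv_out(side, 3)       # 11
side = pool_out(side, 2)       # 5
flat_units = side * side * 16  # 400 values after Flatten

counts = [
    conv_param_count(3, 1, 6),           # 60
    conv_param_count(3, 6, 16),          # 880
    dense_param_count(flat_units, 128),  # 51328
]
```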

## Simple 2D conv model without input parameters

In [14]:
# not used in this demonstration
def make_simple2d():
    n_filters = 32
    kernel = (5,5)
    pool = (2,2)
    n_output = 10 # y_train.shape[1]

    learning_rate = 0.01

    model = Sequential()

    model.add(Convolution2D(n_filters, kernel, input_shape=(28,28,1), activation='relu'))
    model.add(MaxPooling2D(pool_size=pool))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(n_output, activation='softmax'))
    
    adam = Adam(learning_rate)
    model.compile(optimizer=adam, metrics=['accuracy'], loss='categorical_crossentropy')
    
    return model

## Simple 1D non-conv model without input parameters

In [38]:
def make_simple1d():
    model = Sequential()
    model.add(Dense(784, input_shape=(784,), activation='relu'))
#     model.add(Dropout(.5))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', metrics=['accuracy'], loss='categorical_crossentropy')
    return model

# Make a pipeline

## ...for our 2D Conv model

In [16]:
# set the pipe architecture
# 'np' changes from dataframe to numpy array
# 'shp' changes from a row of 784 pixels to a 28x28 matrix for each digit
# 'clf' runs the data through the conv2d net
pipe = Pipeline([
    ('np', FunctionTransformer(make_np_array)),  # change from df to a numpy array
    ('shp', FunctionTransformer(change_shape)),  # go from 1D row of 784 to 2D / 28x28 pixels
    ('clf', KerasClassifier(build_fn=instantiate_convolution_model, epochs=2, params=conv_params)) # train model
])

In [17]:
# This gets input to the 'shp' named step in the pipe
# which gets fed to the change_shape function transformer
# note that it must be input to shp__kw_args per the 
# FunctionTransformer() docs as a dictionary and
# must be received by the function as a keyword 
# argument. Positional args are not accepted!
# Note: you can see all the params you can 
# modify with pipe.get_params().keys()
params = [
    {'shp__kw_args': [{'shape': '2d'}]}
]
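
As a sketch of how `shp__kw_args` reaches the function, the snippet below builds a transformer-only pipeline on made-up data and switches the `shape` keyword with `set_params`, which is the mechanism a grid search applies for each grid entry. The function here is a simplified copy of `change_shape`, redefined so the block runs on its own; `demo_pipe` and `X_demo` are hypothetical names.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def change_shape(X, shape="2d"):
    """Simplified copy of the notebook's function so this block is standalone."""
    if shape == "1d":
        return X
    return X.reshape(X.shape[0], 28, 28, 1).astype("float32")

demo_pipe = Pipeline([
    ("np", FunctionTransformer(np.array)),
    ("shp", FunctionTransformer(change_shape)),
])

X_demo = np.zeros((3, 784))

# No kw_args set, so the function's default applies: shape='2d'.
shape_2d = demo_pipe.fit_transform(X_demo).shape

# <step name>__kw_args sets the dict that is unpacked into the function call.
demo_pipe.set_params(shp__kw_args={"shape": "1d"})
shape_1d = demo_pipe.fit_transform(X_demo).shape
```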

In [18]:
# create a gridsearch object with the above params
# that feed down into our estimator, which is our
# pipeline
gs = GridSearchCV(pipe, param_grid=params, cv=2)

In [19]:
# run the data through the gridsearch:
#   DATA (X_train) --> GS --> PIPE --> KERAS
gs.fit(X_train,
         np.array(OneHotEncoder().fit_transform(y_train.values.reshape(-1,1)).todense()))



Epoch 1/2
Epoch 2/2



Epoch 1/2
Epoch 2/2




Epoch 1/2
Epoch 2/2


GridSearchCV(cv=2, error_score='raise',
       estimator=Pipeline(memory=None,
     steps=[('np', FunctionTransformer(accept_sparse=False,
          func=<function make_np_array at 0x7f7e3fcfbea0>,
          inv_kw_args=None, inverse_func=None, kw_args=None,
          pass_y='deprecated', validate=True)), ('shp', FunctionTransformer(accept_sparse=False,
          func=<function cha...   validate=True)), ('clf', <keras.wrappers.scikit_learn.KerasClassifier object at 0x7f7e3fbcecf8>)]),
       fit_params=None, iid=True, n_jobs=1,
       param_grid=[{'shp__kw_args': [{'shape': '2d'}]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

In [20]:
gs.score(X_test,
         np.array(OneHotEncoder().fit_transform(y_test.values.reshape(-1,1)).todense()))



0.96529581529581532
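
The score above is the model's accuracy. For one-hot targets, accuracy can be sketched as comparing the arg-max of each predicted row with the arg-max of the matching target row. The arrays below are made up, and this illustrates the idea rather than the wrapper's internal code path.

```python
import numpy as np

def one_hot_accuracy(y_true, y_prob):
    """Fraction of rows whose highest-probability class matches the one-hot target."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Made-up targets and predicted probabilities: three samples, three classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.3, 0.2]])

accuracy = one_hot_accuracy(y_true, y_prob)  # two of the three rows match
```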

## ...for our 1D model

In [21]:
# Make it so the keras model takes in a 1D
# array instead of a 2D array. This means
# the 28x28 image will be one single row
# that is 784 pixels wide. No convolution
# will be run on it. Note I am changing
# this shape with an argument of my 
# FunctionTransformer, 'change_shape',
# and not manually
params = [
    {'shp__kw_args': [{'shape': '1d'}]}
]

In [22]:
# change the build model from 2dconv to simple 1d
pipe = Pipeline([
    ('np', FunctionTransformer(make_np_array)),
    ('shp', FunctionTransformer(change_shape)),
    ('clf', KerasClassifier(build_fn=make_simple1d, epochs=2))
])

In [23]:
gs = GridSearchCV(pipe, param_grid=params, cv=2)

In [24]:
gs.fit(X_train,
         np.array(OneHotEncoder().fit_transform(y_train.values.reshape(-1,1)).todense()))

Epoch 1/2
Epoch 2/2
Epoch 2/2
Epoch 2/2


GridSearchCV(cv=2, error_score='raise',
       estimator=Pipeline(memory=None,
     steps=[('np', FunctionTransformer(accept_sparse=False,
          func=<function make_np_array at 0x7f7e3fcfbea0>,
          inv_kw_args=None, inverse_func=None, kw_args=None,
          pass_y='deprecated', validate=True)), ('shp', FunctionTransformer(accept_sparse=False,
          func=<function cha...   validate=True)), ('clf', <keras.wrappers.scikit_learn.KerasClassifier object at 0x7f7e3fbcee48>)]),
       fit_params=None, iid=True, n_jobs=1,
       param_grid=[{'shp__kw_args': [{'shape': '1d'}]}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

In [25]:
gs.score(X_test,
         np.array(OneHotEncoder().fit_transform(y_test.values.reshape(-1,1)).todense()))



0.09704184704184704
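
The notebook stops at scoring, so steps 8-10 of the procedure are not shown. A minimal sketch of turning softmax outputs into digit labels and a submission table follows. The probabilities are made up, and the `ImageId`/`Label` column names are an assumption that should be checked against `sample_submission.csv`.

```python
import numpy as np
import pandas as pd

# Made-up softmax outputs for three images (each row sums to 1 over 10 digits).
probabilities = np.full((3, 10), 0.05)
probabilities[0, 7] = 0.55
probabilities[1, 2] = 0.55
probabilities[2, 9] = 0.55

# The predicted digit is the index of the largest probability in each row.
labels = probabilities.argmax(axis=1)

# Assumed submission layout: 1-based image ids next to the predicted digit.
submission = pd.DataFrame({
    "ImageId": np.arange(1, len(labels) + 1),
    "Label": labels,
})
# submission.to_csv("submission.csv", index=False)  # uncomment to write the file
```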