<h2>Neural Network</h2>

In this module, we learn to use neural networks to solve classification and regression problems.

<h3>Classification</h3>

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

<h4> Applied on the Credit Approval Data </h4>

As usual, we try the model on the credit approval data.

In [2]:
crx = pd.read_csv('crx.data', header=None)
crx.head()

Y = np.zeros(crx.shape[0])           # vector of zeros with one entry per row of the data
Y[crx[15]=='+'] = 1                  # set Y to 1 wherever the actual target is '+'
crx[15] = Y                          # replace the target column with the 0/1 labels

from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)

for train_index, test_index in split.split(crx, crx[15]):
    strat_train_set = crx.loc[train_index]
    strat_test_set = crx.loc[test_index]
    
trainX = strat_train_set.loc[:,:14]
trainY = strat_train_set.loc[:,15]
trainX.shape, trainY.shape

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer

num_cols = trainX.columns[(trainX.dtypes == np.int64) | (trainX.dtypes == np.float64)]

num_pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('standardize', StandardScaler())
])

from sklearn.preprocessing import OneHotEncoder

# get the list of categorical (object-typed) columns
cat_cols = trainX.columns[trainX.dtypes==object]

cat_pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='constant',fill_value='missing')),
    ('encode', OneHotEncoder())
])

from sklearn.compose import ColumnTransformer

full_pipeline = ColumnTransformer([
    ('numeric', num_pipeline, num_cols),
    ('class', cat_pipeline, cat_cols)
])

trainX_prc = full_pipeline.fit_transform(trainX)

testX = strat_test_set.loc[:,:14]
testY = strat_test_set.loc[:,15]

testX_prc = full_pipeline.transform(testX)  

For classification, we use MLPClassifier.

The architecture of the NN is determined by the hidden_layer_sizes hyperparameter. In short, this is a list of integers, where each integer is the number of neurons in the corresponding hidden layer.

For example, 

hidden_layer_sizes=[10,20,30] 

represents a NN with three hidden layers: the first hidden layer has 10 neurons, the second has 20, and the last has 30.
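To make the layer layout concrete, here is a minimal sketch that fits an MLPClassifier with this setting and inspects the shapes of the learned weight matrices. The synthetic data from make_classification and the names X_demo, y_demo, and demo_mlp are assumptions for illustration only; they are not part of the credit approval workflow.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data with 8 input features (illustrative only)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Three hidden layers with 10, 20, and 30 neurons
demo_mlp = MLPClassifier(hidden_layer_sizes=[10, 20, 30], max_iter=500, random_state=0)
demo_mlp.fit(X_demo, y_demo)

# coefs_ holds one weight matrix per pair of consecutive layers:
# input -> hidden 1 -> hidden 2 -> hidden 3 -> output
layer_shapes = [w.shape for w in demo_mlp.coefs_]
print(layer_shapes)

# n_layers_ counts the input and output layers as well as the hidden ones
print(demo_mlp.n_layers_)
```

Each shape is (neurons in the previous layer, neurons in the next layer), so the list of shapes traces the architecture from the input features to the output unit.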

A NN is also trained iteratively, so you can set max_iter to a high value to give the training enough iterations to converge.
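One way to check whether max_iter was high enough is to compare the fitted n_iter_ attribute against it and to watch for scikit-learn's ConvergenceWarning. The sketch below does this on synthetic data; the helper fit_and_report and the data are hypothetical and only meant to illustrate the idea.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic data (illustrative only, not the credit approval data)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)


def fit_and_report(max_iter):
    """Fit a small MLP; return the iterations used and whether the limit was hit."""
    clf = MLPClassifier(hidden_layer_sizes=[8, 8], max_iter=max_iter, random_state=0)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf.fit(X_demo, y_demo)
    # scikit-learn emits a ConvergenceWarning when training stops at max_iter
    hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return clf.n_iter_, hit_limit


for max_iter in (5, 2000):
    n_iter, hit_limit = fit_and_report(max_iter)
    print(f"max_iter={max_iter}: ran {n_iter} iterations, hit the limit: {hit_limit}")
```

If the limit is hit, the optimizer stopped before meeting its tolerance, and raising max_iter is worth trying.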

In [6]:
from sklearn.neural_network import MLPClassifier

n_features = trainX_prc.shape[1] #get the number of input features
mlp = MLPClassifier(hidden_layer_sizes=[n_features,n_features,n_features], max_iter=1000)

mlp.fit(trainX_prc, trainY)
print(mlp.score(trainX_prc, trainY))
print(mlp.score(testX_prc, testY))

0.9980657640232108
0.8208092485549133


The training accuracy (0.998) is much higher than the testing accuracy (0.821), so the model seems to be overfitting.

Now let's fine-tune the NN. I'm just going to try a few architectures and regularization strengths.

In [11]:
from sklearn.model_selection import GridSearchCV

param_grid = [{
    'hidden_layer_sizes' : [[n_features,n_features],                       #two hidden layers with n_features neurons each
                            [n_features,n_features,n_features],            #three hidden layers with n_features neurons each
                            [n_features//2,n_features//2],                 #two hidden layers with n_features//2 neurons each
                            [n_features//2,n_features//2,n_features//2],   #three hidden layers with n_features//2 neurons each
                            [n_features*2,n_features*2],                   #two hidden layers with n_features*2 neurons each
                            [n_features*2,n_features*2,n_features*2]],     #three hidden layers with n_features*2 neurons each
    'alpha' : [0.001, 0.01, 0.1, 1, 10]                                    #L2 regularization strength
}]

mlp = MLPClassifier(max_iter=1000)

grid_search = GridSearchCV(mlp, param_grid, cv=3, scoring='accuracy', return_train_score=True)

grid_search.fit(trainX_prc,trainY)

GridSearchCV(cv=3, error_score=nan,
             estimator=MLPClassifier(activation='relu', alpha=0.0001,
                                     batch_size='auto', beta_1=0.9,
                                     beta_2=0.999, early_stopping=False,
                                     epsilon=1e-08, hidden_layer_sizes=(100,),
                                     learning_rate='constant',
                                     learning_rate_init=0.001, max_fun=15000,
                                     max_iter=1000, momentum=0.9,
                                     n_iter_no_change=10,
                                     nesterovs_momentum=True, power_t=0.5,
                                     random_state=None, shuffle=True,
                                     solver='adam', tol=0.0001,
                                     validation_fraction=0.1, verbose=False,
                                     warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid

Best model found by the grid search (the score is the mean cross-validation accuracy):

In [12]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'alpha': 10, 'hidden_layer_sizes': [102, 102]}
0.8510328449164315
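Because the grid search is run with return_train_score=True, cv_results_ also stores the mean training score of every candidate, which makes it easy to see how regularization affects the gap between training and validation accuracy. The following is a minimal sketch on synthetic data that searches over alpha only; the names demo_grid, demo_search, and summary are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic data (illustrative only, not the credit approval data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# A small grid over the regularization strength, with a fixed architecture
demo_grid = {'alpha': [0.001, 0.1, 10]}
demo_mlp = MLPClassifier(hidden_layer_sizes=[10, 10], max_iter=300, random_state=0)

demo_search = GridSearchCV(demo_mlp, demo_grid, cv=3, scoring='accuracy',
                           return_train_score=True)
demo_search.fit(X_demo, y_demo)

# One row per candidate; the gap between train and validation accuracy hints at overfitting
results = pd.DataFrame(demo_search.cv_results_)
results['gap'] = results['mean_train_score'] - results['mean_test_score']
cols = ['param_alpha', 'mean_train_score', 'mean_test_score', 'gap']
summary = results[cols].sort_values('mean_test_score', ascending=False)
print(summary)
```

Note that mean_test_score here is the accuracy on the held-out cross-validation folds, not on the separate test set.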


In [13]:
best_mlp = grid_search.best_estimator_
best_mlp.score(testX_prc, testY)

0.8728323699421965

<h4>Summarize all Results</h4>

Compared to other models so far:

|Model|Training CV Accuracy|Testing Accuracy|
|-----|--------------------|-----------------|
|No Regularization|0.834|0.850|
|L2 Regularization|0.857|0.861|
|L1 Regularization|0.861|0.861|
|ENet Regularization|0.863|0.861|
|L1 Linear SVM|0.851|0.861|
|L2 Linear SVM|0.853|0.873|
|Kernel SVM|0.872|0.867|
|Decision Tree|0.858|0.867|
|NN|0.851|0.873|

For regression, training a NN is essentially the same. The only difference is that we use MLPRegressor instead of MLPClassifier.

In [None]:
from sklearn.neural_network import MLPRegressor
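As a rough sketch of what this could look like, the example below fits an MLPRegressor on synthetic data generated with make_regression. The data, the variable names, and the chosen hyperparameter values are assumptions for illustration; in practice they would come from your own preprocessing pipeline and tuning.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative only)
X_reg, y_reg = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_reg, y_reg, test_size=0.25, random_state=0)

# Standardize the inputs; the scaler is fitted on the training data only
scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)
X_te_s = scaler.transform(X_te)

# Same hyperparameters as for classification: hidden_layer_sizes, alpha, max_iter
n_feat = X_tr_s.shape[1]
reg = MLPRegressor(hidden_layer_sizes=[n_feat, n_feat], alpha=0.01,
                   max_iter=2000, random_state=0)
reg.fit(X_tr_s, y_tr)

# For regressors, score() returns the R^2 coefficient rather than accuracy
train_r2 = reg.score(X_tr_s, y_tr)
test_r2 = reg.score(X_te_s, y_te)
print(train_r2)
print(test_r2)

pred = reg.predict(X_te_s)
```

For a grid search over a regressor, the scoring argument would also need a regression metric (for example 'r2' or 'neg_mean_squared_error') instead of 'accuracy'.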