
Welcome to the MLP classification notebook. In this notebook we will explore two new topics.

* The first is the multilayer perceptron (MLP) algorithm. Once we start adding hidden layers, the model is commonly called a "deep neural network" (DNN) and the approach "deep learning", which is an extremely powerful modeling technique.

* The second, which is extremely relevant for DNNs, is the selection and tuning of hyperparameters. This search can be computationally expensive and time consuming, so we introduce sklearn's GridSearchCV to help us explore the parameter space without having to code the search manually.

* The notebook begins by creating MLP models with varying numbers of layers and neurons per layer, so the models become progressively more complex and computationally expensive. Pay attention to the metrics to see whether this complexity actually improves our predictions. DNNs are notorious for overfitting because they can become arbitrarily complex.

* Once you have grasped the basic idea of the DNN, we will introduce other hyperparameters that are often tuned to improve the model, and finally we will use GridSearchCV to bring the amount of code back down to a reasonable level.

## Setup

In [1]:
import numpy as np
import pandas as pd

from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix,\
 recall_score, precision_score, f1_score, accuracy_score, make_scorer,\
  precision_recall_fscore_support

from sklearn.model_selection import train_test_split, cross_validate

from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

import warnings
warnings.filterwarnings('ignore')


## Data

In [2]:
titanic_cleaned = pd.read_csv('https://raw.githubusercontent.com/matthewpecsok/4482_fall_2022/main/data/titanic_cleaned.csv').drop('Cabin', axis=1) # drop cabin

In [3]:
titanic_cleaned.head()

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0,3,male,22.0,1,0,7.25,S
1,1,1,female,38.0,1,0,71.2833,C
2,1,3,female,26.0,0,0,7.925,S
3,1,1,female,35.0,1,0,53.1,S
4,0,3,male,35.0,0,0,8.05,S


In [4]:
titanic_cleaned['Pclass'] = titanic_cleaned.Pclass.astype(str)

In [5]:
titanic_cleaned.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 714 entries, 0 to 713
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  714 non-null    int64  
 1   Pclass    714 non-null    object 
 2   Sex       714 non-null    object 
 3   Age       714 non-null    float64
 4   SibSp     714 non-null    int64  
 5   Parch     714 non-null    int64  
 6   Fare      714 non-null    float64
 7   Embarked  714 non-null    object 
dtypes: float64(2), int64(3), object(3)
memory usage: 44.8+ KB


In [6]:
y = titanic_cleaned.pop('Survived')

In [7]:
X = pd.get_dummies(titanic_cleaned)
print(X.shape, y.shape)

(714, 13) (714,)


In [8]:
X.head()

Unnamed: 0,Age,SibSp,Parch,Fare,Pclass_1,Pclass_2,Pclass_3,Sex_female,Sex_male,Embarked_C,Embarked_Q,Embarked_S,Embarked_missing
0,22.0,1,0,7.25,0,0,1,0,1,0,0,1,0
1,38.0,1,0,71.2833,1,0,0,1,0,1,0,0,0
2,26.0,0,0,7.925,0,0,1,1,0,0,0,1,0
3,35.0,1,0,53.1,1,0,0,1,0,0,0,1,0
4,35.0,0,0,8.05,0,0,1,0,1,0,0,1,0
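
To make the encoding step concrete, here is a minimal sketch on a made-up three-row frame (the `toy` data is invented for illustration and is not part of the Titanic file). It shows how `pd.get_dummies` leaves numeric columns alone and expands each string column into one indicator column per category. Passing `dtype=int` is an addition here, used only so the indicators print as 0/1 regardless of pandas version.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Pclass": ["3", "1", "3"],          # stored as strings, as after astype(str)
    "Sex": ["male", "female", "female"],
})

# Numeric columns pass through unchanged; each string column becomes
# one 0/1 indicator column per category it contains.
toy_encoded = pd.get_dummies(toy, dtype=int)
print(toy_encoded.columns.tolist())
print(toy_encoded)
```

This is why the eight columns of the cleaned data (minus the target) grow to 13 feature columns in `X`.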


## MLP

# Changing the number of hidden layers on the model

### Model 1 (no hidden layers)

In [9]:
model_1 = MLPClassifier(random_state=2021,hidden_layer_sizes=()).fit(X,y)


In [10]:
model_1

MLPClassifier(hidden_layer_sizes=(), random_state=2021)

In [11]:
model_1.n_layers_

2

In [12]:
model_1.hidden_layer_sizes

()

In [13]:
model_1.classes_

array([0, 1])

In [14]:
len(model_1.coefs_)

1

In [15]:
model_1.coefs_[0].shape

(13, 1)

In [16]:
model_1.coefs_[0]

array([[-0.02392046],
       [ 0.12518862],
       [-0.13776903],
       [ 0.00281199],
       [ 0.86980426],
       [ 0.08472776],
       [-0.74070596],
       [ 0.87249304],
       [-0.33628334],
       [ 0.3924393 ],
       [-0.65300024],
       [-0.37585229],
       [ 0.82755228]])
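
With no hidden layers there is a single weight matrix with one weight per input feature, so for a binary target the network computes a sigmoid of a weighted sum of the inputs, which is the same functional form as logistic regression. The sketch below checks this on synthetic data from `make_classification` (a stand-in, not the Titanic data; `X_demo`, `y_demo` and `demo` are illustrative names) by recomputing the predicted probability by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with 13 features and a binary target.
X_demo, y_demo = make_classification(n_samples=200, n_features=13, random_state=0)

# No hidden layers: the inputs connect directly to the single output unit.
demo = MLPClassifier(hidden_layer_sizes=(), random_state=2021, max_iter=300)
demo.fit(X_demo, y_demo)

# Recompute the positive-class probability by hand: sigmoid(X @ W + b).
z = X_demo @ demo.coefs_[0] + demo.intercepts_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z.ravel()))

print(demo.coefs_[0].shape)  # one weight per input feature
print(np.allclose(manual_proba, demo.predict_proba(X_demo)[:, 1]))
```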

In [46]:
model_1_cv_results = pd.DataFrame(cross_validate(model_1,
               X,
               y,
               cv = 3,
               return_train_score=True,
               scoring=['accuracy','recall','precision','f1']))

model_1_cv_results.mean()

fit_time           0.210368
score_time         0.005510
test_accuracy      0.731092
train_accuracy     0.740896
test_recall        0.523804
train_recall       0.537890
test_precision     0.736622
train_precision    0.753440
test_f1            0.612097
train_f1           0.627644
dtype: float64
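
The numbers above are averages over the three folds. As a rough sketch of what sits underneath, the example below runs the same kind of `cross_validate` call on synthetic data (an assumption made to keep it self-contained) and prints the per-fold table before averaging it.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for X and y.
X_demo, y_demo = make_classification(n_samples=150, n_features=13, random_state=0)

raw = cross_validate(
    MLPClassifier(hidden_layer_sizes=(), random_state=2021, max_iter=200),
    X_demo,
    y_demo,
    cv=3,
    return_train_score=True,
    scoring=["accuracy", "recall", "precision", "f1"],
)

per_fold = pd.DataFrame(raw)  # one row per fold, one column per timing or metric
print(per_fold[["test_accuracy", "train_accuracy"]])
print(per_fold.mean())        # the averaged view used throughout this notebook
```

Looking at the per-fold rows as well as the mean can show whether a difference between two models is larger than the fold-to-fold variation.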

### Model 2 (1 hidden layer with 50 neurons)

In [18]:
# fit first: attributes such as n_layers_ only exist after fitting
model_2 = MLPClassifier(random_state=2021,hidden_layer_sizes=(50,)).fit(X,y)
print("hidden layers sizes",model_2.hidden_layer_sizes)
print("n_layers_",model_2.n_layers_)

hidden layers sizes (50,)
n_layers_ 3
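
The value 3 counts the input layer, the one hidden layer and the output layer. The following sketch, again on synthetic data with illustrative variable names, shows that `n_layers_` is missing before fitting and how the weight matrices in `coefs_` line up with the layers afterwards.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with 13 features, like X in this notebook.
X_demo, y_demo = make_classification(n_samples=150, n_features=13, random_state=0)

mlp_demo = MLPClassifier(hidden_layer_sizes=(50,), random_state=2021, max_iter=200)
has_layers_before_fit = hasattr(mlp_demo, "n_layers_")
print(has_layers_before_fit)  # fitted attributes do not exist yet

mlp_demo.fit(X_demo, y_demo)
weight_shapes = [w.shape for w in mlp_demo.coefs_]
print(mlp_demo.n_layers_)  # hidden layers + input layer + output layer
print(weight_shapes)       # one weight matrix between each pair of adjacent layers
```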


In [54]:
model_2_cv_results = pd.DataFrame(cross_validate(model_2,
               X,
               y,
               cv = 3,
               return_train_score=True,
               scoring=['accuracy','recall','precision','f1']))

model_2_cv_results.mean()

fit_time           1.112288
score_time         0.009019
test_accuracy      0.780112
train_accuracy     0.823529
test_recall        0.693048
train_recall       0.749978
test_precision     0.747515
train_precision    0.803148
test_f1            0.719149
train_f1           0.775552
dtype: float64

### Model 3 (2 hidden layers, the first has 15 neurons and the second has 10 neurons)


In [20]:
# fit first: attributes such as n_layers_ only exist after fitting
model_3 = MLPClassifier(random_state=2021,hidden_layer_sizes=(15,10)).fit(X,y)
print("hidden layers sizes",model_3.hidden_layer_sizes)
print("n_layers_",model_3.n_layers_)

hidden layers sizes (15, 10)
n_layers_ 4


In [55]:
model_3_cv_results = pd.DataFrame(cross_validate(model_3,
               X,
               y,
               cv = 3,
               return_train_score=True,
               scoring=['accuracy','recall','precision','f1']))

model_3_cv_results.mean()

fit_time           0.823231
score_time         0.011280
test_accuracy      0.775910
train_accuracy     0.824230
test_recall        0.668850
train_recall       0.736161
test_precision     0.751963
train_precision    0.813177
test_f1            0.707955
train_f1           0.772719
dtype: float64

### Model 4 (3 hidden layers, the first has 50 neurons, the second has 25 neurons, the third has 10 neurons)

In [22]:
# fit first: attributes such as n_layers_ only exist after fitting
model_4 = MLPClassifier(random_state=2021,hidden_layer_sizes=(50,25,10)).fit(X,y)
print("hidden layers sizes",model_4.hidden_layer_sizes)
print("n_layers_",model_4.n_layers_)

hidden layers sizes (50, 25, 10)
n_layers_ 5


In [56]:
model_4_cv_results = pd.DataFrame(cross_validate(model_4,
               X,
               y,
               cv = 3,
               return_train_score=True,
               scoring=['accuracy','recall','precision','f1']))

model_4_cv_results.mean()

fit_time           0.930040
score_time         0.011297
test_accuracy      0.782913
train_accuracy     0.843137
test_recall        0.710338
train_recall       0.772297
test_precision     0.749714
train_precision    0.833564
test_f1            0.728013
train_f1           0.799583
dtype: float64

### Model 5 (4 hidden layers, the first has 100 neurons, the second has 50 neurons, the third has 25 neurons and the fourth has 10 neurons)

In [24]:
# fit first: attributes such as n_layers_ only exist after fitting
model_5 = MLPClassifier(random_state=2021,hidden_layer_sizes=(100,50,25,10)).fit(X,y)
print("hidden layers sizes",model_5.hidden_layer_sizes)
print("n_layers_",model_5.n_layers_)

hidden layers sizes (100, 50, 25, 10)
n_layers_ 6


In [57]:
model_5_cv_results = pd.DataFrame(cross_validate(model_5,
               X,
               y,
               cv = 3,
               return_train_score=True,
               scoring=['accuracy','recall','precision','f1']))

model_5_cv_results.mean()

fit_time           0.920702
score_time         0.007917
test_accuracy      0.787115
train_accuracy     0.833333
test_recall        0.685961
train_recall       0.748295
test_precision     0.766558
train_precision    0.826809
test_f1            0.723250
train_f1           0.784373
dtype: float64

### Model 6 (5 hidden layers, the first has 100 neurons, the second has 50 neurons, the third has 25 neurons, the fourth has 25 neurons, the fifth has 10 neurons)

In [58]:
model_6_cv_results = pd.DataFrame(cross_validate(MLPClassifier(random_state=2021,hidden_layer_sizes=(100,50,25,25,10)),
               X,
               y,
               cv = 3,
               return_train_score=True,
               scoring=['accuracy','recall','precision','f1']))

model_6_cv_results.mean()

fit_time           1.905425
score_time         0.008299
test_accuracy      0.778711
train_accuracy     0.827731
test_recall        0.717103
train_recall       0.796476
test_precision     0.733250
train_precision    0.782985
test_f1            0.724911
train_f1           0.789277
dtype: float64

### Model 7 (3 hidden layers each with 500 neurons)

Notice how much longer this model took to train compared to the others!

In [59]:
model_7_cv_results = pd.DataFrame(cross_validate(MLPClassifier(random_state=2021,hidden_layer_sizes=(500,500,500)),
               X,
               y,
               cv = 3,
               return_train_score=True,
               scoring=['accuracy','recall','precision','f1']))

model_7_cv_results.mean()

fit_time           10.450978
score_time          0.020187
test_accuracy       0.778711
train_accuracy      0.823529
test_recall         0.662156
train_recall        0.732564
test_precision      0.768261
train_precision     0.815892
test_f1             0.707525
train_f1            0.768784
dtype: float64
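
One way to see why Model 7 is so much slower is to count its trainable parameters. The helper below is a hypothetical function written for this illustration (it is not part of sklearn); it counts the weights and biases of a fully connected network with 13 inputs and one output unit.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each pair of adjacent layers has fan_in * fan_out weights plus fan_out biases.
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    )


for name, layers in [
    ("Model 1", ()),
    ("Model 2", (50,)),
    ("Model 5", (100, 50, 25, 10)),
    ("Model 7", (500, 500, 500)),
]:
    print(name, layers, count_parameters(13, layers))
```

Model 7 has several hundred times more parameters than Model 2, yet its test scores above are no better.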

In [60]:
model_1_cv_results.mean()

fit_time           0.210368
score_time         0.005510
test_accuracy      0.731092
train_accuracy     0.740896
test_recall        0.523804
train_recall       0.537890
test_precision     0.736622
train_precision    0.753440
test_f1            0.612097
train_f1           0.627644
dtype: float64

In [61]:
model_2_cv_results.mean()

fit_time           1.112288
score_time         0.009019
test_accuracy      0.780112
train_accuracy     0.823529
test_recall        0.693048
train_recall       0.749978
test_precision     0.747515
train_precision    0.803148
test_f1            0.719149
train_f1           0.775552
dtype: float64

In [63]:
model_3_cv_results.mean()

fit_time           0.823231
score_time         0.011280
test_accuracy      0.775910
train_accuracy     0.824230
test_recall        0.668850
train_recall       0.736161
test_precision     0.751963
train_precision    0.813177
test_f1            0.707955
train_f1           0.772719
dtype: float64

In [64]:
model_4_cv_results.mean()

fit_time           0.930040
score_time         0.011297
test_accuracy      0.782913
train_accuracy     0.843137
test_recall        0.710338
train_recall       0.772297
test_precision     0.749714
train_precision    0.833564
test_f1            0.728013
train_f1           0.799583
dtype: float64

In [65]:
model_5_cv_results.mean()

fit_time           0.920702
score_time         0.007917
test_accuracy      0.787115
train_accuracy     0.833333
test_recall        0.685961
train_recall       0.748295
test_precision     0.766558
train_precision    0.826809
test_f1            0.723250
train_f1           0.784373
dtype: float64

In [66]:
model_6_cv_results.mean()

fit_time           1.905425
score_time         0.008299
test_accuracy      0.778711
train_accuracy     0.827731
test_recall        0.717103
train_recall       0.796476
test_precision     0.733250
train_precision    0.782985
test_f1            0.724911
train_f1           0.789277
dtype: float64

In [67]:
model_7_cv_results.mean()

fit_time           10.450978
score_time          0.020187
test_accuracy       0.778711
train_accuracy      0.823529
test_recall         0.662156
train_recall        0.732564
test_precision      0.768261
train_precision     0.815892
test_f1             0.707525
train_f1            0.768784
dtype: float64

Looking at the results above, notice that performance does not keep improving as complexity increases. Adding the first hidden layer lifts mean test accuracy from about 0.73 (Model 1) to about 0.78 (Model 2), but Models 3 through 7 all stay between roughly 0.78 and 0.79. Fit time, on the other hand, generally grows with complexity, which is expected: Model 7 takes about 10 seconds per fit versus about 1 second for Model 2. So there is a trade-off between the quality of the model and its complexity.
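
To compare the models side by side, the reported means can be collected into one table. The sketch below copies the fit time and accuracy values printed above into a DataFrame (the `summary` name and the `train_minus_test` column are additions for illustration) and computes the gap between train and test accuracy as a rough overfitting indicator.

```python
import pandas as pd

# Values copied from the cross-validation means reported above.
summary = pd.DataFrame(
    {
        "hidden_layer_sizes": [
            "()", "(50,)", "(15, 10)", "(50, 25, 10)",
            "(100, 50, 25, 10)", "(100, 50, 25, 25, 10)", "(500, 500, 500)",
        ],
        "fit_time": [0.210368, 1.112288, 0.823231, 0.930040, 0.920702, 1.905425, 10.450978],
        "test_accuracy": [0.731092, 0.780112, 0.775910, 0.782913, 0.787115, 0.778711, 0.778711],
        "train_accuracy": [0.740896, 0.823529, 0.824230, 0.843137, 0.833333, 0.827731, 0.823529],
    },
    index=[f"model_{i}" for i in range(1, 8)],
)

# A larger gap means the model fits the training folds better than unseen data.
summary["train_minus_test"] = summary["train_accuracy"] - summary["test_accuracy"]
print(summary.sort_values("test_accuracy", ascending=False))
```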

Also notice how much code we had to write just to get these results. If the code was starting to feel redundant from section to section, you are correct! GridSearchCV does exactly the same thing with far less code, as shown below.

In [90]:
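from sklearn.model_selection import GridSearchCV

# candidate architectures: no hidden layer, one hidden layer, two equal hidden layers
parameters = {'hidden_layer_sizes':[(),
                                    (5),    # the int 5, not a tuple; sklearn treats it as one layer of 5 neurons
                                    (10,),
                                    (20,),
                                    (30,),
                                    (40,),
                                    (50,),
                                    (50,),  # repeated entry, so this candidate is fitted twice
                                    (60,),
                                    (70,),
                                    (80,),
                                    (90,),
                                    (100,),

                                    (5,5),
                                    (10,10),
                                    (20,20),
                                    (30,30),
                                    (40,40),
                                    (50,50),
                                    (60,60),
                                    (70,70),
                                    (80,80),
                                    (90,90),
                                    (100,100),
                                    ],
              }
mlp = MLPClassifier(random_state=2021)
# every candidate is scored with 3-fold cross-validation on accuracy
clf = GridSearchCV(mlp, parameters,scoring='accuracy',return_train_score=True,cv=3)
clf.fit(X, y)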

GridSearchCV(cv=3, estimator=MLPClassifier(random_state=2021),
             param_grid={'hidden_layer_sizes': [(), 5, (10,), (20,), (30,),
                                                (40,), (50,), (50,), (60,),
                                                (70,), (80,), (90,), (100,),
                                                (5, 5), (10, 10), (20, 20),
                                                (30, 30), (40, 40), (50, 50),
                                                (60, 60), (70, 70), (80, 80),
                                                (90, 90), (100, 100)]},
             return_train_score=True, scoring='accuracy')

In [92]:
grid_search_df = pd.DataFrame(clf.cv_results_)
grid_search_df.sort_values('rank_test_score')[['rank_test_score','param_hidden_layer_sizes','mean_test_score','mean_train_score']]

Unnamed: 0,rank_test_score,param_hidden_layer_sizes,mean_test_score,mean_train_score
19,1,"(60, 60)",0.794118,0.840336
11,2,"(90,)",0.791317,0.834034
20,2,"(70, 70)",0.791317,0.843137
18,4,"(50, 50)",0.791317,0.82493
23,5,"(100, 100)",0.789916,0.841036
15,6,"(20, 20)",0.788515,0.831232
22,7,"(90, 90)",0.788515,0.838235
8,8,"(60,)",0.787115,0.826331
9,8,"(70,)",0.787115,0.82563
10,10,"(80,)",0.785714,0.829832
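
Besides the full results table, a fitted GridSearchCV exposes the winning configuration directly. The sketch below uses synthetic data and a deliberately small grid (both assumptions, chosen so it runs quickly) to show `best_params_`, `best_score_`, and prediction with the refitted best model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for X and y.
X_demo, y_demo = make_classification(n_samples=150, n_features=13, random_state=0)

small_grid = {"hidden_layer_sizes": [(), (5,), (10, 10)]}
search = GridSearchCV(
    MLPClassifier(random_state=2021, max_iter=200),
    small_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_demo, y_demo)

print(search.best_params_)           # the top-ranked hyperparameters
print(round(search.best_score_, 4))  # their mean cross-validated accuracy

# With the default refit=True, the best configuration is retrained on all
# of the data, so the search object can be used for prediction directly.
predictions = search.predict(X_demo[:5])
print(predictions)
```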


# Learning Rate Hyperparameter

So far we have only varied the number of hidden layers and the number of neurons in those layers to improve our model. Another commonly tuned hyperparameter is the learning rate, which controls how much the weights are adjusted at each update. It matters for two reasons. First, the learning rate affects how quickly or slowly the model learns. Second, it can help the model stay out of local minima and hopefully find the global minimum of the loss during training.

Let's do a quick exploration of the learning rate and its impact on training using a simple example.
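
As a minimal sketch of the first effect, the code below runs plain gradient descent on the toy loss f(w) = w², whose gradient is 2w. This is an illustration only: the function name and starting point are invented, the loss has a single minimum (so it says nothing about local minima), and MLPClassifier's default solver is more elaborate than this update rule.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w


w_slow = gradient_descent(0.01)     # small steps: still far from 0 after 50 updates
w_fast = gradient_descent(0.1)      # moderate steps: ends very close to 0
w_diverged = gradient_descent(1.1)  # steps too large: overshoots and grows

for rate, w in [(0.01, w_slow), (0.1, w_fast), (1.1, w_diverged)]:
    print(f"learning_rate={rate}: |w| after 50 steps = {abs(w):.6g}")
```

A rate that is too small wastes iterations, while one that is too large never settles, which is why several values of `learning_rate_init` are worth trying.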

After the grid search has explored all the hyperparameter combinations, it selects the best-scoring set of hyperparameters. A model trained with them may improve on our previous efforts. Notice that I have not re-explored every model we created previously; instead I have opted to explore a variety of other hyperparameters to see whether it is possible to achieve a better prediction than we achieved earlier in the notebook.

The points to take away from this are:
1. There is an effectively unlimited number of hyperparameter combinations
1. It's impossible to explore them all by hand
1. For each combination a model must be trained



In [100]:
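from sklearn.model_selection import GridSearchCV

# same architectures as before, now crossed with three other hyperparameters
parameters = {'hidden_layer_sizes':[(),
                                    (5),    # the int 5, not a tuple; sklearn treats it as one layer of 5 neurons
                                    (10,),
                                    (20,),
                                    (30,),
                                    (40,),
                                    (50,),
                                    (50,),  # repeated entry, so this candidate is fitted twice
                                    (60,),
                                    (70,),
                                    (80,),
                                    (90,),
                                    (100,),

                                    (5,5),
                                    (10,10),
                                    (20,20),
                                    (30,30),
                                    (40,40),
                                    (50,50),
                                    (60,60),
                                    (70,70),
                                    (80,80),
                                    (90,90),
                                    (100,100),
                                    ],
              'learning_rate_init':[0.1,0.01,0.001],                  # initial step size for weight updates
              'activation':['identity', 'logistic', 'tanh', 'relu'],  # hidden-layer activation function
              'alpha':[0.0001,0.001,0.01,0.1]                         # L2 regularization strength
              }
mlp = MLPClassifier(random_state=2021)
# every combination is scored with 3-fold cross-validation on accuracy
clf = GridSearchCV(mlp, parameters,scoring='accuracy',return_train_score=True,cv=3)
clf.fit(X, y)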

GridSearchCV(cv=3, estimator=MLPClassifier(random_state=2021),
             param_grid={'activation': ['identity', 'logistic', 'tanh', 'relu'],
                         'alpha': [0.0001, 0.001, 0.01, 0.1],
                         'hidden_layer_sizes': [(), 5, (10,), (20,), (30,),
                                                (40,), (50,), (50,), (60,),
                                                (70,), (80,), (90,), (100,),
                                                (5, 5), (10, 10), (20, 20),
                                                (30, 30), (40, 40), (50, 50),
                                                (60, 60), (70, 70), (80, 80),
                                                (90, 90), (100, 100)],
                         'learning_rate_init': [0.1, 0.01, 0.001]},
             return_train_score=True, scoring='accuracy')
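
To get a feel for the cost of this search, the number of candidates can be counted with sklearn's `ParameterGrid` before anything is fitted. The sketch below rebuilds the same grid (including the repeated `(50,)` entry); the names `layer_options`, `grid`, `n_candidates` and `n_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same candidate values as the grid above, written more compactly.
layer_options = (
    [(), 5, (10,), (20,), (30,), (40,), (50,), (50,), (60,), (70,), (80,), (90,), (100,)]
    + [(n, n) for n in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)]
)
grid = {
    "hidden_layer_sizes": layer_options,
    "learning_rate_init": [0.1, 0.01, 0.001],
    "activation": ["identity", "logistic", "tanh", "relu"],
    "alpha": [0.0001, 0.001, 0.01, 0.1],
}

n_candidates = len(ParameterGrid(grid))  # product of the four list lengths
n_fits = n_candidates * 3                # cv=3 means three fits per candidate
print(n_candidates, n_fits)
```

Every extra value added to any one list multiplies the total, which is why grids grow expensive so quickly.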

In [101]:
grid_search_df = pd.DataFrame(clf.cv_results_)
grid_search_df = round(grid_search_df,4)  # round the numeric columns for display
grid_search_df = grid_search_df.sort_values('rank_test_score')[['rank_test_score','params','mean_test_score','mean_train_score']]
# data_table is only available when running in Google Colab
from google.colab import data_table
data_table.DataTable(grid_search_df)


Unnamed: 0,rank_test_score,params,mean_test_score,mean_train_score
367,1,"{'activation': 'logistic', 'alpha': 0.001, 'hi...",0.8179,0.8515
430,2,"{'activation': 'logistic', 'alpha': 0.001, 'hi...",0.8151,0.8438
463,3,"{'activation': 'logistic', 'alpha': 0.01, 'hid...",0.8137,0.8487
295,4,"{'activation': 'logistic', 'alpha': 0.0001, 'h...",0.8123,0.8515
484,4,"{'activation': 'logistic', 'alpha': 0.01, 'hid...",0.8123,0.8438
...,...,...,...,...
642,1147,"{'activation': 'tanh', 'alpha': 0.0001, 'hidde...",0.6541,0.6443
702,1149,"{'activation': 'tanh', 'alpha': 0.001, 'hidden...",0.6513,0.6548
636,1150,"{'activation': 'tanh', 'alpha': 0.0001, 'hidde...",0.6457,0.6849
783,1151,"{'activation': 'tanh', 'alpha': 0.01, 'hidden_...",0.6443,0.6324
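
The `params` column is truncated in the output above, which makes the winning settings hard to read. One possible alternative, sketched here on synthetic data with a tiny grid (both assumptions made for speed), is to select the individual `param_*` columns that `cv_results_` also provides.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for X and y.
X_demo, y_demo = make_classification(n_samples=150, n_features=13, random_state=0)

search = GridSearchCV(
    MLPClassifier(random_state=2021, max_iter=100),
    {"hidden_layer_sizes": [(5,), (10,)], "alpha": [0.0001, 0.1]},
    scoring="accuracy",
    return_train_score=True,
    cv=3,
).fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)

# cv_results_ stores each hyperparameter in its own "param_<name>" column.
param_columns = [c for c in results.columns if c.startswith("param_")]
readable = results.sort_values("rank_test_score")[
    ["rank_test_score", *param_columns, "mean_test_score", "mean_train_score"]
]
print(readable)
```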
