## Multilayer Perceptron: Fit and evaluate a model

This notebook uses the Titanic dataset from [this](https://www.kaggle.com/c/titanic/overview) Kaggle competition.

In this section, we will fit and evaluate a simple Multilayer Perceptron model.

### Read in Data

In [4]:
import joblib
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)

train_features = pd.read_csv('../Data/train_features.csv')
train_labels = pd.read_csv('../Data/train_labels.csv', header=None)

### Hyperparameter tuning

![hidden layer](img/hidden_layers.png)

In [5]:
def print_results(results):
    """Print the best parameters and the mean CV score for every grid point."""
    print('BEST PARAMS: {}'.format(results.best_params_))

    means = results.cv_results_['mean_test_score']
    stds = results.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, results.cv_results_['params']):
        # Report the mean score with a band of two standard deviations
        print('{} (+- {}) for {}'.format(round(mean, 3), round(std * 2, 3), params))

#### Hyperparameter tuning notes
- #### hidden_layer_sizes
   - The problem is relatively simple, so we use a single hidden layer. Each element of the tuple is the size of one hidden layer, so a one-element tuple means one hidden layer.
   - Here we try one hidden layer with 10, 50, or 100 nodes.
- #### activation
   - `relu`, `tanh`, `logistic`
- #### learning_rate
    - `constant`: keeps the initial learning rate unchanged throughout the entire optimization process.
    - `invscaling`: (inverse scaling) gradually decreases the learning rate at each step, so the optimizer takes large jumps at first and smaller ones as it gets closer to the optimal model.
    - `adaptive`: keeps the learning rate constant as long as the training loss keeps decreasing. If the training loss stops going down, it decreases the learning rate so that the optimizer takes smaller steps.
    - Note: in scikit-learn, `learning_rate` only takes effect when `solver='sgd'`. The grid below leaves `solver` at its default (`adam`), so differences between the three `learning_rate` settings in the results most likely reflect random initialization rather than the schedule itself.
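The three schedules described above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation. The function names and the `patience` argument are hypothetical. The `invscaling` formula (`learning_rate_init / t ** power_t`) and the division by 5 in `adaptive` follow the scikit-learn documentation for `MLPClassifier`.

```python
def constant_lr(lr_init, t):
    """Constant schedule: the step index t is ignored."""
    return lr_init


def invscaling_lr(lr_init, t, power_t=0.5):
    """Inverse scaling: the rate shrinks as the step index t (>= 1) grows."""
    return lr_init / (t ** power_t)


def adaptive_lr(lr_init, losses, tol=1e-4, patience=2, factor=5.0):
    """Simplified adaptive schedule (patience is a hypothetical parameter).

    The rate is divided by `factor` each time the training loss fails to
    improve by at least `tol` for `patience` consecutive epochs.
    Returns the learning rate in effect after each epoch.
    """
    lr = lr_init
    best_loss = float('inf')
    stalled = 0
    history = []
    for loss in losses:
        if loss > best_loss - tol:
            stalled += 1      # no meaningful improvement this epoch
        else:
            stalled = 0       # loss improved, reset the counter
        best_loss = min(best_loss, loss)
        if stalled >= patience:
            lr /= factor      # take smaller steps from now on
            stalled = 0
        history.append(lr)
    return history


# Compare the first few steps of the constant and invscaling schedules
for t in (1, 4, 16):
    print(t, constant_lr(0.001, t), invscaling_lr(0.001, t))

# The loss stalls at 0.5 for two epochs, so the rate drops once
print(adaptive_lr(0.1, [1.0, 0.5, 0.5, 0.5, 0.4]))
```

With `invscaling`, the rate at step 4 is half the initial rate because `4 ** 0.5 == 2`, while the constant schedule returns the initial rate at every step.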

In [9]:
mlp = MLPClassifier(max_iter=1000)

parameters = {
    'hidden_layer_sizes': [(10,), (50,), (100,)], 
    'activation': ['relu', 'tanh', 'logistic'],
    'learning_rate': ['constant', 'invscaling', 'adaptive'],
}

cv = GridSearchCV(mlp, parameters, cv=5)
cv.fit(train_features, train_labels.values.ravel())

print_results(cv)



BEST PARAMS: {'activation': 'tanh', 'hidden_layer_sizes': (10,), 'learning_rate': 'constant'}
0.787 (+- 0.114) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'constant'}
0.792 (+- 0.099) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'invscaling'}
0.79 (+- 0.1) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'adaptive'}
0.774 (+- 0.136) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'constant'}
0.787 (+- 0.082) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'invscaling'}
0.8 (+- 0.102) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'adaptive'}
0.789 (+- 0.123) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rate': 'constant'}
0.783 (+- 0.105) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rate': 'invscaling'}
0.805 (+- 0.109) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rat

### Write out pickled model

In [10]:
cv.best_estimator_

MLPClassifier(activation='tanh', hidden_layer_sizes=(10,), max_iter=1000)

In [12]:
joblib.dump(cv.best_estimator_, '../Pickled_Models/MLP_model.pkl')

['../Pickled_Models/MLP_model.pkl']
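To confirm that a pickled model can be read back, a round-trip check along the following lines may help. It is a minimal sketch under stated assumptions. It uses synthetic data from `make_classification` and a temporary directory rather than the Titanic features and the `../Pickled_Models` path used in this notebook. The hyperparameters mirror the best estimator shown above, and `random_state=0` is added only to make the sketch repeatable.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the training features (assumption, not the Titanic data)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Same hyperparameters as the best estimator reported by the grid search
model = MLPClassifier(hidden_layer_sizes=(10,), activation='tanh',
                      max_iter=1000, random_state=0)
model.fit(X, y)

# Dump to a temporary file, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'MLP_model.pkl')
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print('Predictions match after reload:', same_predictions)
```

In the notebook itself, the equivalent call would be `joblib.load('../Pickled_Models/MLP_model.pkl')`, after which the restored estimator can be scored on held-out data.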