## Summary: Compare model results and final model selection

This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we will do the following:
1. Evaluate all of our saved models on the validation set
2. Select the best model based on performance on the validation set
3. Evaluate that model on the holdout test set

## Comparison of our models on different parameters
![Model Comparison](img/model_comp.png)

### Read in Data

In [2]:
import pandas as pd

val_features = pd.read_csv('tmp/val_features.csv')
val_labels = pd.read_csv('tmp/val_labels.csv')

te_features = pd.read_csv('tmp/test_features.csv')
te_labels = pd.read_csv('tmp/test_labels.csv')

### Read in Models

In [3]:
import joblib
models = {}

for mdl in ['LR', 'SVM', 'MLP', 'RF', 'GB']:
    models[mdl] = joblib.load(f'joblib/{mdl}_Model.pkl')

In [6]:
## Inspect the loaded models and their hyperparameters
for model in models:
    print(models[model], '\n')

LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False) 

SVC(C=0.1, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False) 

MLPClassifier(activation='tanh', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(100,), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver

### Evaluate models on the validation set

![Evaluation Metrics](img/eval_metrics.png)

In [7]:
from sklearn.metrics import accuracy_score, precision_score, recall_score
from time import time

def evaluate_model(name, model, features, labels):
    ## Start timer
    start = time()
    
    ## Perform prediction
    pred = model.predict(features)
    
    ## End timer
    end = time()
    
    ## Calculate Accuracy, Precision and Recall
    accuracy = round(accuracy_score(labels, pred), 3)
    precision = round(precision_score(labels, pred), 3)
    recall = round(recall_score(labels, pred), 3)
    
    ## Display the result for Model
    print(f'{name} -- Accuracy: {accuracy} / Precision: {precision} / Recall: {recall} / Latency: {round((end - start)*1000, 1)}ms')

In [8]:
## Evaluate each model on the validation set, one at a time

for name, model in models.items():
    evaluate_model(name, model, val_features, val_labels)

LR -- Accuracy: 0.775 / Precision: 0.712 / Recall: 0.646 / Latency: 3.9ms
SVM -- Accuracy: 0.747 / Precision: 0.672 / Recall: 0.6 / Latency: 4.0ms
MLP -- Accuracy: 0.764 / Precision: 0.683 / Recall: 0.662 / Latency: 5.1ms
RF -- Accuracy: 0.792 / Precision: 0.85 / Recall: 0.523 / Latency: 4.9ms
GB -- Accuracy: 0.815 / Precision: 0.808 / Recall: 0.646 / Latency: 7.6ms
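
The validation figures above can also be ranked programmatically rather than by eye. The sketch below copies the printed numbers into a dictionary and sorts by accuracy. It also derives an F1 score (the harmonic mean of precision and recall), which is an addition for illustration and not a metric the notebook itself computes.

```python
# Validation metrics copied from the output above: (accuracy, precision, recall)
val_results = {
    'LR':  (0.775, 0.712, 0.646),
    'SVM': (0.747, 0.672, 0.600),
    'MLP': (0.764, 0.683, 0.662),
    'RF':  (0.792, 0.850, 0.523),
    'GB':  (0.815, 0.808, 0.646),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Sort models from highest to lowest validation accuracy
ranked = sorted(val_results.items(), key=lambda kv: kv[1][0], reverse=True)
for name, (acc, prec, rec) in ranked:
    print(f'{name:>3} -- Accuracy: {acc:.3f} / F1: {f1(prec, rec):.3f}')

best_by_accuracy = ranked[0][0]
best_by_f1 = max(val_results, key=lambda n: f1(*val_results[n][1:]))
print(f'Best by accuracy: {best_by_accuracy} / Best by F1: {best_by_f1}')
```

On these numbers Gradient Boosting ranks first on both criteria, with Random Forest second on accuracy.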


### Evaluate best model on test set

In [11]:
## Two candidates stand out on the validation set: Random Forest and Gradient Boosting
## RF has the best precision; GB has the best accuracy and a much higher recall than RF (0.646 vs. 0.523)
## Let's evaluate both on the test set and see which performs better
evaluate_model('Random Forest', models['RF'], te_features, te_labels)

evaluate_model('Gradient Boosting Tree', models['GB'], te_features, te_labels)

Random Forest -- Accuracy: 0.782 / Precision: 0.894 / Recall: 0.553 / Latency: 9.8ms
Gradient Boosting Tree -- Accuracy: 0.816 / Precision: 0.852 / Recall: 0.684 / Latency: 8.7ms
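
The latency figures come from a single `predict` call each, so they can vary between runs (Random Forest is reported at 4.9ms on the validation set and 9.8ms on the test set). A minimal sketch of a steadier measurement follows, assuming it is acceptable to repeat the prediction several times and take the median. The helper `median_latency_ms` and the stand-in `dummy_predict` are hypothetical names introduced here, not part of the notebook.

```python
from statistics import median
from time import perf_counter

def median_latency_ms(predict, features, repeats=20):
    """Return the median wall-clock time of predict(features) in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        predict(features)
        timings.append((perf_counter() - start) * 1000)
    return median(timings)

# Hypothetical stand-in for model.predict so the sketch runs on its own
def dummy_predict(rows):
    return [sum(row) > 1 for row in rows]

latency = median_latency_ms(dummy_predict, [[0, 1, 1]] * 200)
print(f'Median latency: {latency:.3f}ms')
```

With the objects loaded in this notebook, the call would look like `median_latency_ms(models['GB'].predict, te_features)`.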


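`Gradient Boosting gives the better result on the test set (accuracy 0.816 vs. 0.782, recall 0.684 vs. 0.553), although Random Forest has the higher precision (0.894 vs. 0.852), so Gradient Boosting is the model we would most likely use.`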