## Conclusion: Compare model results and final model selection

This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we will do the following:
1. Evaluate all of our saved models on the validation set
2. Select the best model based on performance on the validation set
3. Evaluate that model on the holdout test set

### Read in Data

In [13]:
import joblib
import pandas as pd

from sklearn.metrics import accuracy_score, precision_score, recall_score
from time import time

val_features = pd.read_csv('../Data/val_features.csv')
val_labels = pd.read_csv('../Data/val_labels.csv')

test_features = pd.read_csv('../Data/test_features.csv')
test_labels = pd.read_csv('../Data/test_labels.csv')

### Read in Models

In [14]:
gb_mdl = joblib.load('../Pickled_Models/GB_model.pkl') 
rf_mdl = joblib.load('../Pickled_Models/RF_model.pkl')
stacked_mdl = joblib.load('../Pickled_Models/Stacked_model.pkl')

### Evaluate models on the validation set

In [15]:
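def evaluate_model(model, features, labels):
    """Print accuracy, precision, recall and prediction latency for a fitted model."""
    # Time only the call to predict(); time() returns seconds
    start = time()
    pred = model.predict(features)
    end = time()

    # Score the predictions against the true labels, rounded to 3 decimals
    accuracy = round(accuracy_score(labels, pred), 3)
    precision = round(precision_score(labels, pred), 3)
    recall = round(recall_score(labels, pred), 3)

    # The model name is the text before the first '(' in the model's repr
    name = str(model).split('(')[0]
    latency_ms = round((end - start) * 1000, 1)
    print('{} -- Accuracy: {} / Precision: {} / Recall: {} / Latency: {}ms'.format(
        name, accuracy, precision, recall, latency_ms))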
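
To make the three scores reported by `evaluate_model` concrete, here is a minimal sketch that computes them by hand from true/false positive and negative counts. It assumes binary 0/1 labels with 1 as the positive class; `binary_metrics` and the toy `labels` and `preds` lists are illustrative names and data, not part of the notebook.

```python
def binary_metrics(labels, preds):
    """Accuracy, precision and recall for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)                 # share of all predictions that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of actual positives that are found
    return round(accuracy, 3), round(precision, 3), round(recall, 3)


# Toy data (made up for illustration): 4 positives, 4 negatives
labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 0, 0, 1, 0, 0, 0]
print(binary_metrics(labels, preds))  # (0.625, 0.667, 0.5)
```

In the toy data the model finds 2 of the 4 positives (recall 0.5) and 2 of its 3 positive predictions are correct (precision 0.667).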

In [16]:
for mdl in [gb_mdl, rf_mdl, stacked_mdl]:
    evaluate_model(mdl, val_features, val_labels)

GradientBoostingClassifier -- Accuracy: 0.809 / Precision: 0.804 / Recall: 0.631 / Latency: 6.0ms
RandomForestClassifier -- Accuracy: 0.815 / Precision: 0.82 / Recall: 0.631 / Latency: 40.0ms
StackingClassifier -- Accuracy: 0.809 / Precision: 0.816 / Recall: 0.615 / Latency: 13.0ms


### Evaluate best model on test set

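Based on the validation results above, we now choose the best model for our problem. `Random Forest` has the highest accuracy (0.815) and precision (0.82) by a small margin, and it ties Gradient Boosting on recall (0.631). It is also the slowest of the three (40.0ms), but latency is not a concern for this problem, so we select `Random Forest`.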
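
The selection rule can also be written down explicitly. The following is a minimal sketch, assuming we rank on accuracy first and break ties with precision and then recall, and that latency is ignored. The metric values are copied from the validation output above; `val_results` and `pick_best` are hypothetical names that do not appear elsewhere in the notebook.

```python
# Validation metrics copied from the printed output above
val_results = {
    'GradientBoostingClassifier': {'accuracy': 0.809, 'precision': 0.804, 'recall': 0.631},
    'RandomForestClassifier': {'accuracy': 0.815, 'precision': 0.82, 'recall': 0.631},
    'StackingClassifier': {'accuracy': 0.809, 'precision': 0.816, 'recall': 0.615},
}


def pick_best(results, keys=('accuracy', 'precision', 'recall')):
    """Return the model name with the highest metrics, compared in the order of `keys`."""
    # Tuples compare element by element, so later keys only matter when earlier ones tie
    return max(results, key=lambda name: tuple(results[name][k] for k in keys))


best = pick_best(val_results)
print(best)  # RandomForestClassifier
```

If latency mattered for the problem, a different ordering of `keys` (or a latency threshold applied first) could change the outcome.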

In [17]:
evaluate_model(rf_mdl, test_features, test_labels)

RandomForestClassifier -- Accuracy: 0.793 / Precision: 0.831 / Recall: 0.645 / Latency: 43.0ms
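
To see how the selected model's scores moved between the validation set and the test set, the differences can be computed directly. This is a small sketch with the numbers copied from the two `RandomForestClassifier` output lines above; `val_scores`, `test_scores` and `deltas` are illustrative names.

```python
# Random Forest scores copied from the validation and test outputs above
val_scores = {'accuracy': 0.815, 'precision': 0.82, 'recall': 0.631}
test_scores = {'accuracy': 0.793, 'precision': 0.831, 'recall': 0.645}

# Test minus validation, rounded to 3 decimals to match the reported precision
deltas = {metric: round(test_scores[metric] - val_scores[metric], 3)
          for metric in val_scores}

for metric, delta in deltas.items():
    print('{}: {:+.3f}'.format(metric, delta))
# accuracy: -0.022
# precision: +0.011
# recall: +0.014
```

Accuracy on the test set is about two points lower than on the validation set, while precision and recall are each about one point higher.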
