## Build Models To Compare Features: Compare And Evaluate All Models

In this section, we will do the following:
1. Evaluate all of our saved models on the validation set
2. Select the best model based on performance on the validation set
3. Evaluate that model on the holdout test set

### Read In Data

In [1]:
# Read in data
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from time import time
%matplotlib inline

val_features_raw = pd.read_csv('../Data/Final_Data/val_features_raw.csv') 
val_features_original = pd.read_csv('../Data/Final_Data/val_features_original.csv') 
val_features_all = pd.read_csv('../Data/Final_Data/val_features_all.csv') 
val_features_reduced = pd.read_csv('../Data/Final_Data/val_features_reduced.csv') 

val_labels = pd.read_csv('../Data/Final_Data/val_labels.csv')

val_features_raw.head()

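| Unnamed: 0 | Pclass | Sex | Age_clean | SibSp | Parch | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 29.699118 | 1 | 0 | 89.1042 | 86 | 0 |
| 1 | 1 | 1 | 45.5 | 0 | 0 | 28.5 | 56 | 2 |
| 2 | 3 | 1 | 29.699118 | 0 | 0 | 7.75 | 147 | 1 |
| 3 | 2 | 0 | 24.0 | 1 | 0 | 26.0 | 147 | 2 |
| 4 | 2 | 1 | 36.0 | 0 | 0 | 12.875 | 90 | 0 |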


### Read In Models

In [4]:
# Read in models
models = {}

for mdl in ['raw_original', 'cleaned_original', 'all', 'reduced']:
    models[mdl] = joblib.load('../Pickled_Models/mdl_{}_features.pkl'.format(mdl))

### Evaluate Models On The Validation Set

In [11]:
def evaluate_model(name, model, features, labels):
    """Print accuracy, precision, recall, and prediction latency for one model."""
    # Time only the prediction step
    start = time()
    pred = model.predict(features)
    end = time()

    # Score the predictions against the true labels, rounded for display
    accuracy = round(accuracy_score(labels, pred), 3)
    precision = round(precision_score(labels, pred), 3)
    recall = round(recall_score(labels, pred), 3)
    latency_ms = round((end - start) * 1000, 1)

    print('{} \t-- \tAccuracy: {} / Precision: {} / Recall: {} / Latency: {}ms'.format(
        name, accuracy, precision, recall, latency_ms))
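
For reference, the three scores printed by `evaluate_model` can be computed by hand from the counts of true positives, false positives, and false negatives. The sketch below is a minimal pure-Python illustration, assuming binary labels where 1 is the positive class. The helper name `binary_metrics` and the toy labels are hypothetical and not part of the original notebook.

```python
def binary_metrics(labels, preds):
    """Return (accuracy, precision, recall) for binary 0/1 labels.

    Hypothetical helper for illustration; assumes 1 is the positive class.
    """
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)                   # share of all predictions that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # share of actual positives that were found
    return accuracy, precision, recall


# Toy example with made-up labels (not the Titanic data)
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_preds = [1, 0, 0, 1, 0, 1, 0, 0]
accuracy, precision, recall = binary_metrics(toy_labels, toy_preds)
print(accuracy, precision, recall)
```

In the toy data, 2 of the 3 predicted positives are correct and 2 of the 4 actual positives are found, so precision and recall differ even though both are computed from the same predictions.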

In [13]:
# Evaluate all of our models on the validation set
evaluate_model('Raw Features', models['raw_original'], val_features_raw, val_labels)
evaluate_model('Cleaned Features', models['cleaned_original'], val_features_original, val_labels)
evaluate_model('All Features', models['all'], val_features_all, val_labels)
evaluate_model('Reduced Features', models['reduced'], val_features_reduced, val_labels)

Raw Features 	-- 	Accuracy: 0.809 / Precision: 0.782 / Recall: 0.662 / Latency: 54.0ms
Cleaned Features 	-- 	Accuracy: 0.803 / Precision: 0.778 / Recall: 0.646 / Latency: 82.1ms
All Features 	-- 	Accuracy: 0.831 / Precision: 0.797 / Recall: 0.723 / Latency: 88.1ms
Reduced Features 	-- 	Accuracy: 0.809 / Precision: 0.772 / Recall: 0.677 / Latency: 20.0ms


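As we can see, `All Features` has the best accuracy, precision, and recall. If we had a strict latency requirement, however, we would choose `Reduced Features` instead: it has the lowest latency (20.0ms), ties `Raw Features` for the second-highest accuracy (0.809), and has the second-highest recall (0.677), though its precision (0.772) is the lowest of the four.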
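
The selection reasoning above can be sketched in code. This is a minimal illustration, not part of the original notebook: the `select_model` helper and the `max_latency_ms` parameter are hypothetical names, and the scores are copied from the validation output above. Keep in mind that each latency figure comes from a single `predict` call and can vary between runs.

```python
# Validation results copied from the printed output above
val_results = {
    'Raw Features':     {'accuracy': 0.809, 'precision': 0.782, 'recall': 0.662, 'latency_ms': 54.0},
    'Cleaned Features': {'accuracy': 0.803, 'precision': 0.778, 'recall': 0.646, 'latency_ms': 82.1},
    'All Features':     {'accuracy': 0.831, 'precision': 0.797, 'recall': 0.723, 'latency_ms': 88.1},
    'Reduced Features': {'accuracy': 0.809, 'precision': 0.772, 'recall': 0.677, 'latency_ms': 20.0},
}


def select_model(results, metric='accuracy', max_latency_ms=None):
    """Return the model name with the highest `metric`.

    Hypothetical helper: if `max_latency_ms` is given, only models at or
    under that latency are considered.
    """
    candidates = {
        name: scores for name, scores in results.items()
        if max_latency_ms is None or scores['latency_ms'] <= max_latency_ms
    }
    if not candidates:
        raise ValueError('No model meets the latency budget')
    return max(candidates, key=lambda name: candidates[name][metric])


best_overall = select_model(val_results)                       # no latency constraint
best_under_50ms = select_model(val_results, max_latency_ms=50)  # assumed 50ms budget
print(best_overall)
print(best_under_50ms)
```

With no constraint the sketch picks `All Features`; with the assumed 50ms budget only `Reduced Features` qualifies.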

### Evaluate Best Model On Test Set

In [14]:
# Read in our test features
test_features = pd.read_csv('../Data/Final_Data/test_features_all.csv')
test_labels = pd.read_csv('../Data/Final_Data/test_labels.csv')

In [24]:
# Evaluate our final model on the test set
evaluate_model('All Features', models['all'], test_features, test_labels)

All Features 	-- 	Accuracy: 0.816 / Precision: 0.831 / Recall: 0.711 / Latency: 98.1ms
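
One quick sanity check is to compare these test scores with the validation scores for the same model, since a large drop could indicate that the model choice was tuned too closely to the validation set. The sketch below is a simple illustration using the numbers printed above; the variable names are hypothetical.

```python
# Scores for the 'All Features' model, copied from the outputs above
val_scores = {'accuracy': 0.831, 'precision': 0.797, 'recall': 0.723}
test_scores = {'accuracy': 0.816, 'precision': 0.831, 'recall': 0.711}

# Test minus validation: negative values mean the model did worse on the test set
gaps = {metric: round(test_scores[metric] - val_scores[metric], 3) for metric in val_scores}

for metric, gap in gaps.items():
    print('{}: {:+.3f}'.format(metric, gap))
```

Here accuracy and recall drop by 0.015 and 0.012 respectively, while precision rises by 0.034, so the test scores stay close to the validation scores.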
