## **Compare model results and final model selection**

This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we will do the following:
1. Evaluate all of our saved models on the validation set
2. Select the best model based on performance on the validation set
3. Evaluate that model on the holdout test set

### Read in Data

In [10]:
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from time import time

Xtr = pd.read_csv('Xtr.csv')
ytr = pd.read_csv('ytr.csv')

Xts = pd.read_csv('Xts.csv')
yts = pd.read_csv('yts.csv')

Xval = pd.read_csv('Xval.csv')
yval = pd.read_csv('yval.csv')

In [11]:
print('Training shape: Xtr: {} ytr: {}'.format(Xtr.shape, ytr.shape))
print('Test shape: Xts: {} yts: {}'.format(Xts.shape, yts.shape))
print('Validation shape: Xval: {} yval: {}'.format(Xval.shape, yval.shape))

Training shape: Xtr: (534, 6) ytr: (534, 1)
Test shape: Xts: (179, 6) yts: (179, 1)
Validation shape: Xval: (178, 6) yval: (178, 1)


### Read in Models

In [6]:
models = {}
for mdl in ['LR', 'SVM', 'MLP', 'RF', 'XGB']:
    models[mdl] = joblib.load('{}_model.pkl'.format(mdl))
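
The loop above assumes that each fitted model was previously written to a file named `<name>_model.pkl`. As a minimal sketch of that save-and-load round trip, the example below fits a toy classifier on made-up data (not the Titanic features), writes it with `joblib.dump`, and reads it back with `joblib.load`. The file location and the toy data are illustrative assumptions.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset; not the Titanic features.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'LR_model.pkl')
    joblib.dump(clf, path)        # what a training notebook would do
    restored = joblib.load(path)  # what this notebook does

# The restored object should behave like the original estimator.
same_predictions = list(restored.predict(X)) == list(clf.predict(X))
print(same_predictions)
```

A pickled estimator is generally only safe to load with the same library versions that created it, so it may be worth recording those versions next to the `.pkl` files.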

### Evaluate models on the validation set

![Evaluation Metrics](../../img/eval_metrics.png)

In [8]:
models

{'LR': LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=500,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 'MLP': MLPClassifier(activation='tanh', alpha=0.0001, batch_size='auto', beta_1=0.9,
               beta_2=0.999, early_stopping=False, epsilon=1e-08,
               hidden_layer_sizes=(50, 1), learning_rate='constant',
               learning_rate_init=0.001, max_fun=15000, max_iter=200,
               momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
               power_t=0.5, random_state=None, shuffle=True, solver='adam',
               tol=0.0001, validation_fraction=0.1, verbose=False,
               warm_start=False),
 'RF': RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                        criterion='gini',

### Evaluate best model on test set

### **Performance metrics:**
1. **Accuracy:** # correct predictions / # total examples
2. **Precision:**
 - # passengers correctly predicted to survive / # passengers predicted to survive
 - Of the passengers the model predicted would survive, how many actually survived.
3. **Recall:**
 - # passengers correctly predicted to survive / # passengers who actually survived
 - Of the passengers who actually survived, how many the model correctly identified.
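
To make the three metrics concrete, here is a small sketch on made-up labels. The `y_true` and `y_pred` lists are illustrative and do not come from the Titanic data. It computes each metric by hand from the confusion-matrix counts and compares the result with scikit-learn.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Illustrative labels only (1 = survived, 0 = did not survive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / total
precision = tp / (tp + fp)                  # correct positives / predicted positives
recall = tp / (tp + fn)                     # correct positives / actual positives

print(accuracy, precision, recall)
print(accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred))
```

Here there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all three metrics come out to 0.75.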

In [13]:
def evaluate_model(name, model, features, labels):
    """Print accuracy, precision, recall, and prediction latency for one model."""
    start = time()
    pred = model.predict(features)  # Array of predicted labels
    end = time()
    latency_ms = round((end - start) * 1000, 1)  # Wall-clock time for predict() only
    accuracy = round(accuracy_score(labels, pred), 3)
    precision = round(precision_score(labels, pred), 3)
    recall = round(recall_score(labels, pred), 3)
    print('{} -- Accuracy: {} / Precision: {} / Recall: {} / Latency: {}ms'.format(
        name, accuracy, precision, recall, latency_ms))

In [14]:
for name, model in models.items():
    evaluate_model(name, model, Xval, yval)

LR -- Accuracy: 0.758 / Precision: 0.778 / Recall: 0.675 / Latency: 2.2ms
SVM -- Accuracy: 0.753 / Precision: 0.767 / Recall: 0.675 / Latency: 4.3ms
MLP -- Accuracy: 0.742 / Precision: 0.776 / Recall: 0.627 / Latency: 11.5ms
RF -- Accuracy: 0.787 / Precision: 0.846 / Recall: 0.663 / Latency: 9.5ms
XGB -- Accuracy: 0.798 / Precision: 0.862 / Recall: 0.675 / Latency: 6.9ms
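
One way to make the selection step explicit is to rank the models by a chosen metric rather than reading the winner off the printed lines. The sketch below copies the validation scores from the output above into a dictionary. The `val_scores` and `pick_best` names are hypothetical helpers introduced for illustration, and using accuracy as the default criterion is an assumption.

```python
# Validation scores copied from the printed output above.
val_scores = {
    'LR':  {'accuracy': 0.758, 'precision': 0.778, 'recall': 0.675},
    'SVM': {'accuracy': 0.753, 'precision': 0.767, 'recall': 0.675},
    'MLP': {'accuracy': 0.742, 'precision': 0.776, 'recall': 0.627},
    'RF':  {'accuracy': 0.787, 'precision': 0.846, 'recall': 0.663},
    'XGB': {'accuracy': 0.798, 'precision': 0.862, 'recall': 0.675},
}


def pick_best(scores, metric='accuracy'):
    """Return the name of the model with the highest value of `metric`."""
    return max(scores, key=lambda name: scores[name][metric])


best_name = pick_best(val_scores)
print(best_name)
```

XGB has the highest accuracy and precision on the validation set. Recall is tied between LR, SVM, and XGB, so recall alone would not separate them.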


In [16]:
best_model = models['XGB']
evaluate_model('XGB', best_model, Xts, yts)

XGB -- Accuracy: 0.832 / Precision: 0.792 / Recall: 0.655 / Latency: 4.5ms


### **Predict on test.csv:**

In [17]:
test_data = pd.read_csv('test.csv')

In [20]:
test_data.shape

(418, 11)

In [21]:
test_data.isnull().sum()

PassengerId      0
Pclass           0
Name             0
Sex              0
Age             86
SibSp            0
Parch            0
Ticket           0
Fare             1
Cabin          327
Embarked         0
dtype: int64

In [23]:
test_data.drop(columns=['PassengerId', 'Name', 'Fare', 'Embarked'])

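| | Pclass | Sex | Age | SibSp | Parch | Ticket | Cabin |
|---|---|---|---|---|---|---|---|
| 0 | 3 | male | 34.5 | 0 | 0 | 330911 | NaN |
| 1 | 3 | female | 47.0 | 1 | 0 | 363272 | NaN |
| 2 | 2 | male | 62.0 | 0 | 0 | 240276 | NaN |
| 3 | 3 | male | 27.0 | 0 | 0 | 315154 | NaN |
| 4 | 3 | female | 22.0 | 1 | 1 | 3101298 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 413 | 3 | male | NaN | 0 | 0 | A.5. 3236 | NaN |
| 414 | 1 | female | 39.0 | 0 | 0 | PC 17758 | C105 |
| 415 | 3 | male | 38.5 | 0 | 0 | SOTON/O.Q. 3101262 | NaN |
| 416 | 3 | male | NaN | 0 | 0 | 359309 | NaN |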


In [22]:
Xtr.head()

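| | Pclass | Sex | Age | Fare | Family_cnt | cabin_ind |
|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 62.0 | 10.5 | 0 | 0 |
| 1 | 3 | 0 | 8.0 | 29.125 | 5 | 0 |
| 2 | 3 | 0 | 32.0 | 56.4958 | 0 | 0 |
| 3 | 3 | 1 | 20.0 | 9.825 | 1 | 0 |
| 4 | 2 | 1 | 28.0 | 13.0 | 0 | 0 |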
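
Before any saved model can predict on `test.csv`, the raw columns have to be turned into the same six features that `Xtr` contains (`Pclass`, `Sex`, `Age`, `Fare`, `Family_cnt`, `cabin_ind`). The exact recipe used to build the training features is not shown in this section, so the sketch below is a guess at it. The `prepare_features` helper is hypothetical. The `Sex` encoding, the `Family_cnt = SibSp + Parch` definition, the `cabin_ind` rule, and the fill values are all assumptions that should be checked against the notebook that created `Xtr.csv`.

```python
import numpy as np
import pandas as pd


def prepare_features(raw, age_fill, fare_fill):
    """Map raw Kaggle columns to the six columns seen in Xtr (assumed recipe)."""
    out = pd.DataFrame(index=raw.index)
    out['Pclass'] = raw['Pclass']
    # Assumption: male -> 0, female -> 1.
    out['Sex'] = raw['Sex'].map({'male': 0, 'female': 1})
    # Missing ages and fares are filled with caller-supplied values,
    # ideally statistics computed on the training data.
    out['Age'] = raw['Age'].fillna(age_fill)
    out['Fare'] = raw['Fare'].fillna(fare_fill)
    # Assumption: family count is siblings/spouses plus parents/children.
    out['Family_cnt'] = raw['SibSp'] + raw['Parch']
    # Assumption: cabin_ind is 1 when a cabin is recorded, else 0.
    out['cabin_ind'] = raw['Cabin'].notna().astype(int)
    return out


# Two made-up rows in the raw test.csv layout, for illustration only.
demo = pd.DataFrame({
    'Pclass': [3, 1],
    'Sex': ['male', 'female'],
    'Age': [np.nan, 39.0],
    'SibSp': [1, 0],
    'Parch': [2, 0],
    'Fare': [7.25, np.nan],
    'Cabin': [np.nan, 'C105'],
})
features = prepare_features(demo, age_fill=30.0, fare_fill=14.0)
print(features)
```

With the real data, the call would look something like `prepare_features(test_data, age_fill=Xtr['Age'].mean(), fare_fill=Xtr['Fare'].median())`, followed by `best_model.predict(...)` on the result. Note that `Fare` is one of the training features, so it needs to be kept rather than dropped.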
