## Pipeline: Evaluate results on validation set

This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we use what we learned in the last section to fit the best few models on the full training set and then evaluate them on the validation set.

### Read in data

![Eval on Validation](../img/evaluate_on_validation.png)

In [2]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

train_features = pd.read_csv('../Data/train_features.csv')
train_labels = pd.read_csv('../Data/train_labels.csv')

val_features = pd.read_csv('../Data/val_features.csv')
val_labels = pd.read_csv('../Data/val_labels.csv')

test_features = pd.read_csv('../Data/test_features.csv')
test_labels = pd.read_csv('../Data/test_labels.csv')

### Fit best models on full training set

Results from the last section (arrows mark the three best-scoring parameter combinations):
```
0.76 (+/-0.116) for {'max_depth': 2, 'n_estimators': 5}
0.796 (+/-0.119) for {'max_depth': 2, 'n_estimators': 50}
0.803 (+/-0.117) for {'max_depth': 2, 'n_estimators': 100}
--> 0.828 (+/-0.074) for {'max_depth': 10, 'n_estimators': 5}
0.816 (+/-0.028) for {'max_depth': 10, 'n_estimators': 50}
--> 0.826 (+/-0.046) for {'max_depth': 10, 'n_estimators': 100}
0.785 (+/-0.106) for {'max_depth': 20, 'n_estimators': 5}
0.813 (+/-0.027) for {'max_depth': 20, 'n_estimators': 50}
0.809 (+/-0.029) for {'max_depth': 20, 'n_estimators': 100}
0.794 (+/-0.04) for {'max_depth': None, 'n_estimators': 5}
0.809 (+/-0.037) for {'max_depth': None, 'n_estimators': 50}
--> 0.818 (+/-0.035) for {'max_depth': None, 'n_estimators': 100}
```

#### Why do we need to refit on the full training set?
- During five-fold cross-validation, each model was fit on only 80% of the training data.
- In each of the five iterations, 80% of the data is used for training and the remaining 20% is held out for evaluation.
- Refitting on the full training set lets each model learn from all of the training data before we evaluate it on the validation set.
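
The 80/20 arithmetic behind the points above can be checked with a small plain-Python sketch. The helper `kfold_indices` and the sample size of 100 are hypothetical and used only for illustration; in practice scikit-learn's cross-validation utilities handle the splitting.

```python
def kfold_indices(n_samples, k):
    """Split row positions 0..n_samples-1 into k contiguous folds.

    Hypothetical helper for illustration only. Returns a list of
    (train_indices, held_out_indices) pairs, one pair per fold.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds get one additional row when k does not divide n_samples
        size = base + (1 if i < extra else 0)
        stop = start + size
        held_out = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, held_out))
        start = stop
    return folds


# Assumed sample size, chosen only to make the percentages easy to read
for fold, (train, held_out) in enumerate(kfold_indices(100, 5)):
    print(f"fold {fold}: train on {len(train)} rows, hold out {len(held_out)} rows")
```

Every fold trains on 80 of the 100 rows, so none of the cross-validated models has seen the whole training set. That is the reason for the refit in the next cell.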

In [3]:
# Refit the three best models from the last section (best, 2nd best, and 3rd best)

rf1 = RandomForestClassifier(n_estimators=5, max_depth=10)
rf1.fit(train_features, train_labels.values.ravel())

rf2 = RandomForestClassifier(n_estimators=100, max_depth=10)
rf2.fit(train_features, train_labels.values.ravel())

rf3 = RandomForestClassifier(n_estimators=100, max_depth=None)
rf3.fit(train_features, train_labels.values.ravel())

RandomForestClassifier()

### Evaluate models on validation set

![Evaluation Metrics](../img/eval_metrics.png)
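
As a minimal sketch of what the three metrics measure, the function below computes them by hand from the confusion counts. It assumes binary labels with 1 as the positive class (survived, in the Titanic labels). The name `classification_metrics` and the toy label lists are hypothetical, and returning 0.0 when a denominator is zero is a choice made for this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)                   # share of all predictions that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # of actual positives, how many are found
    return accuracy, precision, recall


# Toy labels made up for illustration: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))  # (0.8, 0.75, 0.75)
```

Here 3 of the 4 predicted positives are correct (precision 0.75) and 3 of the 4 actual positives are found (recall 0.75), while 8 of the 10 predictions overall are correct (accuracy 0.8).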

In [5]:
for mdl in [rf1, rf2, rf3]:
    y_pred = mdl.predict(val_features)
    
    accuracy = round(accuracy_score(val_labels, y_pred), 3)
    precision = round(precision_score(val_labels, y_pred), 3)
    recall = round(recall_score(val_labels, y_pred), 3)
    
    print('MAX DEPTH: {} / # OF EST: {} -- A: {} / P: {} / R: {}'.format(
        mdl.max_depth, mdl.n_estimators, accuracy, precision, recall))

MAX DEPTH: 10 / # OF EST: 5 -- A: 0.816 / P: 0.841 / R: 0.697
MAX DEPTH: 10 / # OF EST: 100 -- A: 0.838 / P: 0.862 / R: 0.737
MAX DEPTH: None / # OF EST: 100 -- A: 0.793 / P: 0.783 / R: 0.711
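
The choice of final model can also be made programmatically. The sketch below copies the three validation results printed above into a list and picks the entry with the highest accuracy. Ranking by accuracy is an assumption of this sketch, and the name `val_results` is hypothetical. Because the forests are fit without a fixed `random_state`, rerunning the notebook may give slightly different numbers.

```python
# Validation results copied from the output above
val_results = [
    {"max_depth": 10, "n_estimators": 5, "accuracy": 0.816, "precision": 0.841, "recall": 0.697},
    {"max_depth": 10, "n_estimators": 100, "accuracy": 0.838, "precision": 0.862, "recall": 0.737},
    {"max_depth": None, "n_estimators": 100, "accuracy": 0.793, "precision": 0.783, "recall": 0.711},
]

# Assumption: rank candidates by validation accuracy
best = max(val_results, key=lambda r: r["accuracy"])

# Check whether the accuracy winner also leads on the other two metrics
leads_everywhere = all(
    best[metric] == max(r[metric] for r in val_results)
    for metric in ("accuracy", "precision", "recall")
)

print("Best by accuracy:", best["max_depth"], best["n_estimators"])
print("Leads on all three metrics:", leads_everywhere)
```

With these numbers the model with `max_depth=10` and `n_estimators=100` wins on every metric, so the choice does not depend on which metric is used for ranking.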


### Evaluate the best model on the test set

![Final Model](../img/final_model_selection.png)

#### Based on the validation results, the second model has the highest accuracy, precision, and recall
As a final step, we evaluate this model (`max_depth=10`, `n_estimators=100`) on the test set.


In [6]:
y_pred = rf2.predict(test_features)

accuracy = round(accuracy_score(test_labels, y_pred), 3)
precision = round(precision_score(test_labels, y_pred), 3)
recall = round(recall_score(test_labels, y_pred), 3) 

print('MAX DEPTH: {} / # OF EST: {} -- A: {} / P: {}, R: {}'.format(
    rf2.max_depth, rf2.n_estimators, accuracy, precision, recall))

MAX DEPTH: 10 / # OF EST: 100 -- A: 0.787 / P: 0.729, R: 0.662
