## Pipeline: Evaluate results on validation set

This section uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we use what we learned in the last section to fit the best few models on the full training set and then evaluate each of them on the validation set.

### Read in data

![Eval on Validation](../../img/evaluate_on_validation.png)

In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

tr_features = pd.read_csv('../../../train_features.csv')
tr_labels = pd.read_csv('../../../train_labels.csv')

val_features = pd.read_csv('../../../val_features.csv')
val_labels = pd.read_csv('../../../val_labels.csv')

te_features = pd.read_csv('../../../test_features.csv')
te_labels = pd.read_csv('../../../test_labels.csv')

### Fit best models on full training set

Results from the last section (`-->` marks the three highest mean scores):
```
0.76 (+/-0.116) for {'max_depth': 2, 'n_estimators': 5}
0.796 (+/-0.119) for {'max_depth': 2, 'n_estimators': 50}
0.803 (+/-0.117) for {'max_depth': 2, 'n_estimators': 100}
--> 0.828 (+/-0.074) for {'max_depth': 10, 'n_estimators': 5}
0.816 (+/-0.028) for {'max_depth': 10, 'n_estimators': 50}
--> 0.826 (+/-0.046) for {'max_depth': 10, 'n_estimators': 100}
0.785 (+/-0.106) for {'max_depth': 20, 'n_estimators': 5}
0.813 (+/-0.027) for {'max_depth': 20, 'n_estimators': 50}
0.809 (+/-0.029) for {'max_depth': 20, 'n_estimators': 100}
0.794 (+/-0.04) for {'max_depth': None, 'n_estimators': 5}
0.809 (+/-0.037) for {'max_depth': None, 'n_estimators': 50}
--> 0.818 (+/-0.035) for {'max_depth': None, 'n_estimators': 100}
```

My results from the last section:
```
BEST PARAMS: {'max_depth': 10, 'n_estimators': 5}

BEST SCORE: 0.8315464644683477

MEAN: 0.766 with STD: (+/-0.124) for PARAMETERS: {'max_depth': 2, 'n_estimators': 5}
MEAN: 0.802 with STD: (+/-0.095) for PARAMETERS: {'max_depth': 2, 'n_estimators': 50}
MEAN: 0.798 with STD: (+/-0.116) for PARAMETERS: {'max_depth': 2, 'n_estimators': 100}
MEAN: 0.832 with STD: (+/-0.066) for PARAMETERS: {'max_depth': 10, 'n_estimators': 5}
MEAN: 0.817 with STD: (+/-0.053) for PARAMETERS: {'max_depth': 10, 'n_estimators': 50}
MEAN: 0.822 with STD: (+/-0.047) for PARAMETERS: {'max_depth': 10, 'n_estimators': 100}
MEAN: 0.809 with STD: (+/-0.061) for PARAMETERS: {'max_depth': 20, 'n_estimators': 5}
MEAN: 0.824 with STD: (+/-0.021) for PARAMETERS: {'max_depth': 20, 'n_estimators': 50}
MEAN: 0.815 with STD: (+/-0.029) for PARAMETERS: {'max_depth': 20, 'n_estimators': 100}
MEAN: 0.792 with STD: (+/-0.033) for PARAMETERS: {'max_depth': None, 'n_estimators': 5}
MEAN: 0.813 with STD: (+/-0.031) for PARAMETERS: {'max_depth': None, 'n_estimators': 50}
MEAN: 0.809 with STD: (+/-0.025) for PARAMETERS: {'max_depth': None, 'n_estimators': 100}
```
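
One way to pull the top few parameter sets out of a listing like the ones above is to sort by mean score. The sketch below is only an illustration: it hand-copies the first listing into a dictionary shaped like the `params` and `mean_test_score` entries of scikit-learn's `GridSearchCV.cv_results_`, and `top_k_params` is a hypothetical helper, not part of any library.

```python
# Scores hand-copied from the first listing above, laid out like the
# `params` and `mean_test_score` entries of GridSearchCV's `cv_results_`.
depths = [2, 10, 20, None]
estimators = [5, 50, 100]
cv_results = {
    "params": [
        {"max_depth": d, "n_estimators": n} for d in depths for n in estimators
    ],
    "mean_test_score": [
        0.760, 0.796, 0.803,   # max_depth=2
        0.828, 0.816, 0.826,   # max_depth=10
        0.785, 0.813, 0.809,   # max_depth=20
        0.794, 0.809, 0.818,   # max_depth=None
    ],
}


def top_k_params(results, k=3):
    """Hypothetical helper: return the k (score, params) pairs with the highest mean score."""
    pairs = zip(results["mean_test_score"], results["params"])
    ranked = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


top3 = top_k_params(cv_results, k=3)
for score, params in top3:
    print(f"{score:.3f} for {params}")
```

With these scores, the three sets returned are the ones marked with `-->` in the first listing, which are the ones fitted as `rf1`, `rf2`, and `rf3` in the next cell.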

In [2]:
# Instantiate a new RandomForestClassifier using our best parameters and fit it
rf1 = RandomForestClassifier(n_estimators=5, max_depth=10)
rf1.fit(tr_features, tr_labels.values.ravel())

# Instantiate a new RandomForestClassifier using our 2nd best parameters and fit it
rf2 = RandomForestClassifier(n_estimators=100, max_depth=10)
rf2.fit(tr_features, tr_labels.values.ravel())

# Instantiate a new RandomForestClassifier using our 3rd best parameters and fit it
rf3 = RandomForestClassifier(n_estimators=100, max_depth=None)
rf3.fit(tr_features, tr_labels.values.ravel())

RandomForestClassifier()

### Evaluate models on validation set

![Evaluation Metrics](../../img/eval_metrics.png)

Up to this point, the models have only seen examples from the training set, so the validation set is the first real test of how well they generalize to unseen data. A model that is overfit or underfit will perform poorly here.
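
A quick way to see a generalization gap is to compare accuracy on the training data with accuracy on held-out data. The following is a minimal sketch on synthetic data from `make_classification`, not on the Titanic files. The sample size, `flip_y`, split ratio, and `random_state` values are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data with some label noise (assumed values)
X, y = make_classification(n_samples=600, n_features=8, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# An unconstrained forest can fit its own training data very closely
model = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
model.fit(X_train, y_train)

# Score the same model on data it has seen and on data it has not
train_acc = accuracy_score(y_train, model.predict(X_train))
val_acc = accuracy_score(y_val, model.predict(X_val))
gap = train_acc - val_acc

print(f"train accuracy:      {train_acc:.3f}")
print(f"validation accuracy: {val_acc:.3f}")
print(f"gap:                 {gap:.3f}")
```

A large gap between the two scores suggests overfitting, while low scores on both sets suggest underfitting.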

We use `Accuracy`, `Precision`, and `Recall` to evaluate these models, and we select the model that generalizes best to the validation set based on these metrics.
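
To make the three metrics concrete, here is a small sketch that computes them by hand from true and predicted labels. The labels are made up for illustration, `binary_metrics` is a hypothetical helper, and returning 0.0 when a denominator is zero is a choice made for this sketch. In the notebook itself the scikit-learn functions do this work.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)                   # share of all predictions that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # share of actual positives that were found
    return accuracy, precision, recall


# Toy labels (made up): 1 = survived, 0 = did not survive
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```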

In [10]:
num = 0
# Iterate through our 3 models
for mdl in [rf1, rf2, rf3]:
    num += 1
    # Get the prediction for each model
    y_pred = mdl.predict(val_features)
    
    # Get the accuracy score for each model
    accuracy = round(accuracy_score(val_labels, y_pred), 3)
    
    # Get the precision metric for each model
    precision = round(precision_score(val_labels, y_pred), 3)
    
    # Get the recall metric for each model
    recall = round(recall_score(val_labels, y_pred), 3)
    
    # Print our results
    print(f"""
    MODEL:                rf{num}
    MAX DEPTH:            {mdl.max_depth} 
    NUMBER OF ESTIMATORS: {mdl.n_estimators}
    ACCURACY SCORE:       {accuracy} 
    PRECISION SCORE:      {precision} 
    RECALL SCORE:         {recall}
    """
    )


    MODEL:                rf1
    MAX DEPTH:            10 
    NUMBER OF ESTIMATORS: 5
    ACCURACY SCORE:       0.838 
    PRECISION SCORE:      0.841 
    RECALL SCORE:         0.763
    

    MODEL:                rf2
    MAX DEPTH:            10 
    NUMBER OF ESTIMATORS: 100
    ACCURACY SCORE:       0.827 
    PRECISION SCORE:      0.857 
    RECALL SCORE:         0.711
    

    MODEL:                rf3
    MAX DEPTH:            None 
    NUMBER OF ESTIMATORS: 100
    ACCURACY SCORE:       0.788 
    PRECISION SCORE:      0.771 
    RECALL SCORE:         0.711
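
The three metrics do not have to agree on a winner. As a rough sketch, the validation scores printed above can be hand-copied into a dictionary and compared one metric at a time; `best_by` is a hypothetical helper introduced only for this illustration.

```python
# Validation scores hand-copied from the output above
val_scores = {
    "rf1": {"accuracy": 0.838, "precision": 0.841, "recall": 0.763},
    "rf2": {"accuracy": 0.827, "precision": 0.857, "recall": 0.711},
    "rf3": {"accuracy": 0.788, "precision": 0.771, "recall": 0.711},
}


def best_by(scores, metric):
    """Hypothetical helper: name of the model with the highest value for `metric`."""
    return max(scores, key=lambda name: scores[name][metric])


# Find the leading model separately for each metric
winners = {m: best_by(val_scores, m) for m in ("accuracy", "precision", "recall")}
for metric, name in winners.items():
    print(f"best {metric}: {name}")
```

On these numbers `rf1` leads on accuracy and recall while `rf2` leads on precision, so the choice depends on which metric matters most for the problem.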
    


### Evaluate the best model on the test set

![Final Model](../../img/final_model_selection.png)

In [11]:
# My best model turned out to be rf1; his was rf2
y_pred = rf1.predict(te_features)

# Get the accuracy score for the best model
accuracy = round(accuracy_score(te_labels, y_pred), 3)

# Get the precision metric for the best model
precision = round(precision_score(te_labels, y_pred), 3)

# Get the recall metric for the best model
recall = round(recall_score(te_labels, y_pred), 3)

# Print our results
print(f"""
MAX DEPTH:            {rf1.max_depth} 
NUMBER OF ESTIMATORS: {rf1.n_estimators}
ACCURACY SCORE:       {accuracy} 
PRECISION SCORE:      {precision} 
RECALL SCORE:         {recall}
"""
)


MAX DEPTH:            10 
NUMBER OF ESTIMATORS: 5
ACCURACY SCORE:       0.787 
PRECISION SCORE:      0.737 
RECALL SCORE:         0.646



In [12]:
# My best model turned out to be rf1; his was rf2
#    Going to try his best model now against the test set
y_pred = rf2.predict(te_features)

# Get the accuracy score for rf2
accuracy = round(accuracy_score(te_labels, y_pred), 3)

# Get the precision metric for rf2
precision = round(precision_score(te_labels, y_pred), 3)

# Get the recall metric for rf2
recall = round(recall_score(te_labels, y_pred), 3)

# Print our results (report rf2's own parameters, not rf1's)
print(f"""
MAX DEPTH:            {rf2.max_depth}
NUMBER OF ESTIMATORS: {rf2.n_estimators}
ACCURACY SCORE:       {accuracy}
PRECISION SCORE:      {precision}
RECALL SCORE:         {recall}
"""
)


MAX DEPTH:            10
NUMBER OF ESTIMATORS: 100
ACCURACY SCORE:       0.798 
PRECISION SCORE:      0.764 
RECALL SCORE:         0.646

