### Evaluation
[Course content](https://ds.codeup.com/classification/evaluation/)

**Objective:** Understand and apply various metrics used to evaluate the performance of a classification model. 

In [1]:
import pandas as pd 
import sklearn.metrics
from sklearn.metrics import confusion_matrix

In [2]:
### A dataframe which contains predicted values and actual values

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee', 'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee', 'coffee', 'coffee', 'no coffee', 'no coffee'],
})


In [3]:
### View the dataframe
df


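| | actual | prediction |
| ----------- | ----------- | ----------- |
| 0 | coffee | no coffee |
| 1 | no coffee | no coffee |
| 2 | no coffee | coffee |
| 3 | coffee | coffee |
| 4 | coffee | coffee |
| 5 | coffee | coffee |
| 6 | no coffee | no coffee |
| 7 | coffee | no coffee |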


In [5]:
### Use a crosstab to count the outcomes

pd.crosstab(df.actual, df.prediction)

| actual (rows) vs. prediction (columns) | coffee | no coffee |
| ----------- | ----------- | ----------- |
| coffee | 3 | 2 |
| no coffee | 1 | 2 |


### Terminology

In binary classification, one outcome is designated **positive** and the other **negative**.

While the designations are arbitrary, they determine how evaluation metrics such as precision and recall are interpreted.


### Evaluation on train, validate, and test splits


| Split |  Purpose |
| ----------- | :----------- |
| Train | Evaluate in-sample performance |
| Validate | Evaluate out-of-sample performance to tune hyperparameters |
| Test | Evaluate the final model's out-of-sample performance on unseen data |
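
One common way to produce these three splits is to call scikit-learn's `train_test_split` twice. The sketch below assumes a hypothetical 100-row dataframe and a 60/20/20 split; the column names, proportions, and `random_state` are illustrative choices, not part of the lesson.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows with a balanced binary target
data = pd.DataFrame({
    'feature': range(100),
    'target': ['coffee', 'no coffee'] * 50,
})

# Hold out 20% of the rows as the test set
train_validate, test = train_test_split(
    data, test_size=0.2, random_state=123, stratify=data.target
)

# Split the remaining 80 rows into train (60) and validate (20)
train, validate = train_test_split(
    train_validate, test_size=0.25, random_state=123, stratify=train_validate.target
)

print(len(train), len(validate), len(test))
```

Stratifying on the target keeps the class proportions similar across the three splits.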

### Confusion Matrix

A table that summarizes a model's predictions against the actual outcomes.



| Designation      | Description |
| ----------- | ----------- |
| True Negative      | Model correctly predicted the negative outcome       |
| False Positive   | Model incorrectly predicted the positive outcome        |
| False Negative   | Model incorrectly predicted the negative outcome        |
| True Positive      | Model correctly predicted the positive outcome       |



### Confusion Matrix with `sklearn`

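In this example, `'coffee'` is the positive outcome and `'no coffee'` is the negative outcome.

For a binary problem, the function `confusion_matrix` returns a 2x2 array. The `labels` argument sets the order of the rows and columns.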

### Components of a confusion matrix
 
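For a confusion matrix $C$ whose rows are the actual values and whose columns are the predicted values, with the negative label listed first:


| Index (row, column) | Count of |
| ----------- | ----------- |
| $C_{0,0}$ | True Negatives |
| $C_{0,1}$ | False Positives |
| $C_{1,0}$ | False Negatives |
| $C_{1,1}$ | True Positives |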

In [8]:
### Return a confusion matrix for the model's predictions


confusion_matrix(df.actual, df.prediction, labels = ('no coffee','coffee'))

array([[2, 1],
       [2, 3]])
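
The four counts can be pulled out of the array by flattening it. The sketch below rebuilds the same actual and predicted values as plain lists (the variable names are illustrative) and relies on the label order `['no coffee', 'coffee']` so that the flattened order is TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# Negative label first, so row 0 / column 0 is the negative class
cm = confusion_matrix(actual, prediction, labels=['no coffee', 'coffee'])

# Flatten row by row: [C00, C01, C10, C11] = [TN, FP, FN, TP]
tn, fp, fn, tp = cm.ravel()
print(f'TN={tn}, FP={fp}, FN={fn}, TP={tp}')
```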

### Evaluation Metrics

### Accuracy 

Accuracy is the proportion of all predictions (both positive and negative) that were correct.


$\texttt{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$
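
As a quick check of the formula, the sketch below plugs in the counts from the coffee example (TP = 3, TN = 2, FP = 1, FN = 2) and compares the result with scikit-learn's `accuracy_score`. The list names are illustrative.

```python
from sklearn.metrics import accuracy_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# Counts taken from the confusion matrix of the coffee example
tp, tn, fp, fn = 3, 2, 1, 2

manual_accuracy = (tp + tn) / (tp + fp + fn + tn)
sklearn_accuracy = accuracy_score(actual, prediction)
print(manual_accuracy, sklearn_accuracy)
```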

### Precision

Precision evaluates how many of the positive predictions were correct.

$\texttt{Precision} = \dfrac{TP}{TP + FP}$
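
A minimal sketch of the same calculation with scikit-learn's `precision_score`, using the coffee example. Because the labels are strings, `pos_label` has to name the positive class explicitly; the list names are illustrative.

```python
from sklearn.metrics import precision_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# TP / (TP + FP) with 'coffee' as the positive class: 3 / (3 + 1)
precision = precision_score(actual, prediction, pos_label='coffee')
print(precision)
```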

### Recall

Recall evaluates how many of the actual positive outcomes the model correctly identified.

$\texttt{Recall} = \dfrac{TP}{TP + FN}$
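
A minimal sketch of the same calculation with scikit-learn's `recall_score`, again treating `'coffee'` as the positive class. The list names are illustrative.

```python
from sklearn.metrics import recall_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# TP / (TP + FN) with 'coffee' as the positive class: 3 / (3 + 2)
recall = recall_score(actual, prediction, pos_label='coffee')
print(recall)
```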


### Misclassification Rate

Misclassification rate concerns how many predictions were incorrect. This accounts for all other outcomes not included in the calculation of accuracy. 

$\texttt{Misclassification Rate} = 1 - \texttt{Accuracy}$
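
The sketch below computes the misclassification rate two ways for the coffee example: as the share of mismatched rows, and as one minus the accuracy. The dataframe is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee',
                   'coffee', 'coffee', 'no coffee', 'no coffee'],
})

# Share of rows where the prediction does not match the actual value
misclassification_rate = (df.prediction != df.actual).mean()

# Equivalent: 1 - accuracy
accuracy = (df.prediction == df.actual).mean()
print(misclassification_rate, 1 - accuracy)
```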

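### Sensitivity (True Positive Rate)

Sensitivity is another name for recall: the proportion of actual positive outcomes that the model correctly identified.

$\texttt{True Positive Rate} = \dfrac{TP}{TP + FN}$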

### Specificity 

Specificity (the true negative rate) evaluates how many of the actual negative outcomes the model correctly identified.


$\texttt{Specificity} = \dfrac{TN}{FP + TN}$
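
scikit-learn has no dedicated specificity function that this lesson relies on, but specificity is the recall of the negative class. The sketch below assumes that shortcut and passes `'no coffee'` as `pos_label` to `recall_score`; the list names are illustrative.

```python
from sklearn.metrics import recall_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# TN / (FP + TN): recall computed for the negative class, 2 / (1 + 2)
specificity = recall_score(actual, prediction, pos_label='no coffee')
print(specificity)
```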

### Negative Predictive Value

Negative predictive value (NPV) evaluates how many of the negative predictions were correct.

$\texttt{NPV} = \dfrac{TN}{TN + FN}$
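
Negative predictive value is the precision of the negative class, so one way to sketch it is `precision_score` with `'no coffee'` as `pos_label`. This is a shortcut assumed for illustration; the list names are also illustrative.

```python
from sklearn.metrics import precision_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# TN / (TN + FN): precision computed for the negative class, 2 / (2 + 2)
npv = precision_score(actual, prediction, pos_label='no coffee')
print(npv)
```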

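### F1 Score

The F1 score is the harmonic mean of precision and recall.

$\texttt{F1 Score} = 2 \cdot \dfrac{\texttt{Precision} \cdot \texttt{Recall}}{\texttt{Precision} + \texttt{Recall}}$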
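
The sketch below applies the formula to the coffee example (precision 0.75, recall 0.6) and compares it with scikit-learn's `f1_score`. The list names are illustrative.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

precision = precision_score(actual, prediction, pos_label='coffee')
recall = recall_score(actual, prediction, pos_label='coffee')

# Harmonic mean of precision and recall
manual_f1 = 2 * (precision * recall) / (precision + recall)
sklearn_f1 = f1_score(actual, prediction, pos_label='coffee')
print(manual_f1, sklearn_f1)
```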

## Baseline

The baseline is a simple model that serves as a reference point for the performance of other models.

For a classification model, a common baseline predicts the most frequent class (the mode) for every observation.
    

In [9]:
### Find the counts of each outcome
df.actual.value_counts()

coffee       5
no coffee    3
Name: actual, dtype: int64

In [44]:
### Set the baseline_prediction to be coffee
df['baseline_prediction'] = 'coffee'
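
Instead of hard-coding `'coffee'`, the baseline value can be derived from the data. The sketch below assumes the same `actual` column as above and uses `mode()` to pick the most frequent class; `baseline_value` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
})

# mode() returns a Series of the most frequent value(s); take the first
baseline_value = df.actual.mode()[0]
df['baseline_prediction'] = baseline_value

# Accuracy of always predicting the most frequent class
baseline_accuracy = (df.baseline_prediction == df.actual).mean()
print(baseline_value, baseline_accuracy)
```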

<div class="alert alert-block alert-info">
    
### Evaluation Examples

## Accuracy 

In [42]:
#compare the model's predictions to actual
model_accuracy = (df.prediction == df.actual).mean()
model_accuracy

0.625

In [43]:
#compare the baseline to actual
baseline_accuracy = (df.baseline_prediction == df.actual).mean()
baseline_accuracy

0.625

In [32]:
print(f'Model Accuracy: {model_accuracy:.2f}')
print(f'Baseline Accuracy: {baseline_accuracy:.2f}')

Model Accuracy: 0.62
Baseline Accuracy: 0.62

## Recall

In [38]:
#restrict to positive values ('coffee') for the actual values

subset = df[df.actual == 'coffee']

In [39]:
# Model Recall

model_recall = (subset.prediction == subset.actual).mean()

In [40]:
# Baseline Recall

baseline_recall = (subset.baseline_prediction == subset.actual).mean()

In [45]:
# Compare model recall with baseline recall

print(f'Model recall: {model_recall:.2%}')
print(f'Baseline recall: {baseline_recall:.2%}')

Model recall: 60.00%
Baseline recall: 100.00%


## Precision

In [47]:
# Restrict to positive values ('coffee') for the PREDICTED values
subset = df[df.prediction == 'coffee']

# Model Precision
model_precision = (subset.prediction == subset.actual).mean()

# Baseline Precision
subset = df[df.baseline_prediction == 'coffee']
baseline_precision = (subset.baseline_prediction == subset.actual).mean()

In [48]:
print(f'Model precision: {model_precision:.2%}')
print(f'Baseline precision: {baseline_precision:.2%}')

Model precision: 75.00%
Baseline precision: 62.50%
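
Several of these metrics can also be produced in one call. The sketch below uses scikit-learn's `classification_report` on the same coffee data; it is an optional shortcut rather than part of the walkthrough above, and the list names are illustrative.

```python
from sklearn.metrics import classification_report

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# Text report with precision, recall, and F1 for each class
print(classification_report(actual, prediction))

# The same numbers as a nested dict, convenient for further calculations
report = classification_report(actual, prediction, output_dict=True)
print(report['coffee']['precision'], report['coffee']['recall'])
```

The report treats each class in turn as the positive class, so the `'no coffee'` row gives the negative predictive value (as precision) and the specificity (as recall).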
