<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Evaluation-Metrics---Lab" data-toc-modified-id="Evaluation-Metrics---Lab-1">Evaluation Metrics - Lab</a></span><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1.1">Introduction</a></span></li><li><span><a href="#Objectives" data-toc-modified-id="Objectives-1.2">Objectives</a></span></li><li><span><a href="#Getting-Started" data-toc-modified-id="Getting-Started-1.3">Getting Started</a></span></li><li><span><a href="#Confusion-Matrix" data-toc-modified-id="Confusion-Matrix-1.4">Confusion Matrix</a></span></li><li><span><a href="#Checking-Our-Work-with-sklearn" data-toc-modified-id="Checking-Our-Work-with-sklearn-1.5">Checking Our Work with sklearn</a></span></li><li><span><a href="#(Optional)-Visualizing-Confusion-Matrices" data-toc-modified-id="(Optional)-Visualizing-Confusion-Matrices-1.6">(Optional) Visualizing Confusion Matrices</a></span></li><li><span><a href="#Calculating-Evaluation-Metrics" data-toc-modified-id="Calculating-Evaluation-Metrics-1.7">Calculating Evaluation Metrics</a></span><ul class="toc-item"><li><span><a href="#Precision" data-toc-modified-id="Precision-1.7.1">Precision</a></span></li><li><span><a href="#Recall" data-toc-modified-id="Recall-1.7.2">Recall</a></span></li><li><span><a href="#Accuracy" data-toc-modified-id="Accuracy-1.7.3">Accuracy</a></span></li><li><span><a href="#F1-Score" data-toc-modified-id="F1-Score-1.7.4">F1-Score</a></span></li></ul></li><li><span><a href="#Calculating-Metrics-with-sklearn" data-toc-modified-id="Calculating-Metrics-with-sklearn-1.8">Calculating Metrics with sklearn</a></span></li><li><span><a href="#Classification-Reports" data-toc-modified-id="Classification-Reports-1.9">Classification Reports</a></span></li><li><span><a href="#Summary" data-toc-modified-id="Summary-1.10">Summary</a></span></li></ul></li></ul></div>

# Evaluation Metrics - Lab

## Introduction

In this lab, we'll calculate various evaluation metrics to evaluate and compare classifier performance!

## Objectives

You will be able to:

* Read and interpret results using a Confusion Matrix
* Calculate and interpret precision and recall as evaluation metrics for classification
* Calculate and interpret accuracy and f1-score as evaluation metrics for classification

## Getting Started

For this lab, you're going to read in a DataFrame containing various predictions from different models, as well as the ground-truth labels for the dataset that each model was making predictions on. You'll also write various functions to help you easily calculate important evaluation metrics such as **_Precision_**, **_Recall_**, **_Accuracy_**, and **_F1-Score_**.

Let's start by reading in our dataset. You'll find the dataset stored in `'model_performance.csv'`. In the cell below, use pandas to read this dataset into a DataFrame, and inspect the head.

In [16]:
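from JMI_MVM import *  # personal helper package used by the notebook author
import pandas as pd  # imported explicitly so `pd` does not depend on the star import above
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score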

In [18]:
df = pd.read_csv('model_performance.csv')
df.head()

Unnamed: 0,Model 1 Predictions,Model 2 Predictions,Model 3 Predictions,Labels
0,1,1,1,1
1,1,1,1,1
2,1,1,1,1
3,0,1,1,0
4,0,0,1,1
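
The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was saved together with its row index. The sketch below reproduces that behavior on a small in-memory CSV built from the five rows shown above (the CSV text is a stand-in, and treating the first column as a saved index is an assumption about `model_performance.csv`), and shows how `index_col=0` avoids the extra column.

```python
import io

import pandas as pd

# Stand-in CSV text mirroring the head() output above; the empty first
# header cell is assumed to be a saved row index.
csv_text = """,Model 1 Predictions,Model 2 Predictions,Model 3 Predictions,Labels
0,1,1,1,1
1,1,1,1,1
2,1,1,1,1
3,0,1,1,0
4,0,0,1,1
"""

# Default read: the saved index shows up as a regular column named 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the DataFrame index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

If the extra column is present, positional selection such as `df.iloc[:, 0]` picks up the index values rather than the first model's predictions, so selecting columns by name is the safer choice.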


The dataset consists of predictions from 3 different models, as well as the corresponding label for each row in the dataset.

In the cell below, store each of the following predictions and labels in separate variables.

In [22]:
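# Select by column name so the 'Unnamed: 0' index column cannot be picked up by mistake
model1_preds = df['Model 1 Predictions']
model2_preds = df['Model 2 Predictions']
model3_preds = df['Model 3 Predictions']
labels = df['Labels']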

Good! Now, let's get started by building a confusion matrix!

## Confusion Matrix

In the cell below, complete the `conf_matrix` function.  This function should:

* Take in 2 arguments: 
    * `y_true`, an array of labels
    * `y_pred`, an array of model predictions
* Return a Confusion Matrix in the form of a dictionary, where the keys are `'TP', 'TN', 'FP', 'FN'`. 

In [56]:
def conf_matrix(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels, where 1 is the positive class."""
    TP = TN = FP = FN = 0
    # Pair each true label with the prediction made for the same row
    for true, pred in zip(y_true, y_pred):
        if pred == 1:
            if true == 1:
                TP += 1
            else:
                FP += 1
        elif pred == 0:
            if true == 0:
                TN += 1
            else:
                FN += 1
    return {'TP': TP, 'TN': TN, 'FP': FP, 'FN': FN}

Great! Now, let's double check that our function was created correctly by creating confusion matrices for each of our 3 models. Expected outputs have been provided for you to check your results against.

In [57]:
labels.shape, model1_preds.shape

((10000,), (10000,))

In [58]:
# Model 1 Expected Output: {'TP': 6168, 'TN': 2654, 'FP': 346, 'FN': 832}
model1_confusion_matrix = conf_matrix(labels, model1_preds)
model1_confusion_matrix

{'TP': 6168, 'TN': 2654, 'FP': 346, 'FN': 832}

In [59]:
# Model 2 Expected Output: {'TP': 3914, 'TN': 1659, 'FP': 1341, 'FN': 3086}
model2_confusion_matrix = conf_matrix(labels, model2_preds)
model2_confusion_matrix

{'TP': 3914, 'TN': 1659, 'FP': 1341, 'FN': 3086}

In [60]:
# Model 3 Expected Output: {'TP': 5505, 'TN': 2319, 'FP': 681, 'FN': 1495}
model3_confusion_matrix = conf_matrix(labels, model3_preds)
model3_confusion_matrix

{'TP': 5505, 'TN': 2319, 'FP': 681, 'FN': 1495}
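
The same four counts can also be obtained without an explicit loop. The following is a minimal sketch using NumPy boolean masks; the function name `conf_matrix_vectorized` and the toy arrays are made up for illustration (the toy values mirror the first five rows of labels and Model 1 predictions shown earlier), and it assumes labels are coded as 0 and 1.

```python
import numpy as np


def conf_matrix_vectorized(y_true, y_pred):
    """Hypothetical loop-free variant: count each cell with boolean masks."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        'TP': int(np.sum((y_pred == 1) & (y_true == 1))),
        'TN': int(np.sum((y_pred == 0) & (y_true == 0))),
        'FP': int(np.sum((y_pred == 1) & (y_true == 0))),
        'FN': int(np.sum((y_pred == 0) & (y_true == 1))),
    }


# Toy data: the first five labels and Model 1 predictions from the head() output
toy_labels = [1, 1, 1, 0, 1]
toy_preds = [1, 1, 1, 0, 0]

toy_cm = conf_matrix_vectorized(toy_labels, toy_preds)
print(toy_cm)
```

A quick sanity check for any confusion matrix is that the four counts add up to the number of observations.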

## Checking Our Work with sklearn

To check our work, let's make use of the `confusion_matrix()` function found in `sklearn.metrics` to create some confusion matrices and make sure that sklearn's results match up with our own.

In the cells below, import the `confusion_matrix()` function, use it to create a confusion matrix for each of our models, and then compare the results with the confusion matrices we created above. 

In [61]:
from sklearn.metrics import confusion_matrix

# For binary 0/1 labels, sklearn lays the result out as [[TN, FP], [FN, TP]]
model1_sk_cm = confusion_matrix(labels, model1_preds)
model1_sk_cm

array([[2654,  346],
       [ 832, 6168]])

In [62]:
model2_sk_cm = confusion_matrix(labels, model2_preds)
model2_sk_cm

array([[1659, 1341],
       [3086, 3914]])

In [63]:
model3_sk_cm = confusion_matrix(labels, model3_preds)
model3_sk_cm

array([[2319,  681],
       [1495, 5505]])
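
When comparing against sklearn, keep in mind that `confusion_matrix()` returns a 2D array rather than a dictionary, with rows for the true labels and columns for the predicted labels. For binary 0/1 labels the layout is `[[TN, FP], [FN, TP]]`. The sketch below, which uses made-up toy data rather than the lab's dataset, unpacks that array into the dictionary format used in this lab.

```python
from sklearn.metrics import confusion_matrix

# Toy data for illustration only
toy_labels = [1, 1, 1, 0, 1]
toy_preds = [1, 1, 1, 0, 0]

cm = confusion_matrix(toy_labels, toy_preds)

# ravel() flattens the 2x2 array row by row: TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
toy_sk_dict = {'TP': int(tp), 'TN': int(tn), 'FP': int(fp), 'FN': int(fn)}

print(cm)
print(toy_sk_dict)
```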

## (Optional) Visualizing Confusion Matrices

In the cells below, use the visualization function shown in the **_Confusion Matrices_** lesson to visualize each of the confusion matrices created above. 

In [70]:

import itertools

import matplotlib.pyplot as plt
import numpy as np


def show_cf(cf, class_names=None, model_name=None):
    """Plot a confusion matrix given as a 2D array (rows: true, columns: predicted)."""
    cf = np.asarray(cf)
    plt.imshow(cf, cmap=plt.cm.Blues)

    if model_name:
        plt.title("Confusion Matrix: {}".format(model_name))
    else:
        plt.title("Confusion Matrix")
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')

    # Fall back to integer class labels when no names are given
    if class_names is None:
        class_names = list(range(cf.shape[0]))
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names)
    plt.yticks(tick_marks, class_names)

    # White text on dark cells, black text on light cells
    thresh = cf.max() / 2.

    for i, j in itertools.product(range(cf.shape[0]), range(cf.shape[1])):
        plt.text(j, i, cf[i, j], horizontalalignment='center',
                 color='white' if cf[i, j] > thresh else 'black')

    plt.colorbar()
    plt.show()

In [73]:
# show_cf expects sklearn's 2D array, not the dictionary returned by conf_matrix
show_cf(confusion_matrix(labels, model1_preds), model_name='Model 1')

## Calculating Evaluation Metrics

Now, we'll use our newly created confusion matrices to calculate some evaluation metrics. 

As a reminder, here are the equations for each evaluation metric we'll be calculating in this lab:

### Precision

$$Precision = \frac{\text{Number of True Positives}}{\text{Number of Predicted Positives}}$$

### Recall

$$Recall = \frac{\text{Number of True Positives}}{\text{Number of Actual Total Positives}}$$

### Accuracy

$$Accuracy = \frac{\text{Number of True Positives + True Negatives}}{\text{Total Observations}}$$

### F1-Score

$$\text{F1-Score} = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$

In each of the cells below, complete the function to calculate the appropriate evaluation metrics. Use the output to fill in the following table: 

|  Model  | Precision | Recall | Accuracy | F1-Score |
|:-------:|:---------:|:------:|:--------:|:--------:|
| Model 1 | 0.94688363524716 | 0.8811428571428571 | 0.8822 | 0.9128311380790292 |
| Model 2 | 0.744814462416746 | 0.5591428571428572 | 0.5573 | 0.6387596899224806 |
| Model 3 | 0.8899127061105723 | 0.7864285714285715 | 0.7824 | 0.8349764902168968 |

**_QUESTION:_** Which model performed the best? How did you arrive at your answer?
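
One point worth weighing when answering: the positive class makes up 7,000 of the 10,000 labels (TP + FN in the expected outputs above), so accuracy on its own can flatter a weak model. The sketch below uses hypothetical toy labels with the same 70/30 balance and a made-up baseline that always predicts 1. It is only an illustration of why several metrics are compared, not part of the lab's dataset.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels with the same 70/30 class balance as the lab's dataset
y_true = [1] * 7 + [0] * 3

# Hypothetical baseline that ignores its input and always predicts the positive class
y_always_positive = [1] * 10

baseline_accuracy = accuracy_score(y_true, y_always_positive)
baseline_precision = precision_score(y_true, y_always_positive)
baseline_recall = recall_score(y_true, y_always_positive)

print('Accuracy: {}'.format(baseline_accuracy))
print('Precision: {}'.format(baseline_precision))
print('Recall: {}'.format(baseline_recall))
```

A model that learns nothing still reaches 70% accuracy and perfect recall here, which is why a model's scores are more informative when read against a baseline like this and across all four metrics.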

In [74]:
def precision(confusion_matrix):
    return confusion_matrix['TP'] / (confusion_matrix['TP'] + confusion_matrix['FP'])
print(precision(model1_confusion_matrix)) # Expected Output: 0.94688363524716
print(precision(model2_confusion_matrix)) # Expected Output: 0.744814462416746
print(precision(model3_confusion_matrix)) # Expected Output: 0.8899127061105723

0.94688363524716
0.744814462416746
0.8899127061105723


In [75]:
def recall(confusion_matrix):
    return confusion_matrix['TP'] / (confusion_matrix['TP'] + confusion_matrix['FN'])


print(recall(model1_confusion_matrix)) # Expected Output: 0.8811428571428571
print(recall(model2_confusion_matrix)) # Expected Output: 0.5591428571428572
print(recall(model3_confusion_matrix)) # Expected Output: 0.7864285714285715

0.8811428571428571
0.5591428571428572
0.7864285714285715


In [76]:
def accuracy(confusion_matrix):
    return (confusion_matrix['TP'] + confusion_matrix['TN']) / sum(confusion_matrix.values())


print(accuracy(model1_confusion_matrix)) # Expected Output: 0.8822
print(accuracy(model2_confusion_matrix)) # Expected Output: 0.5573
print(accuracy(model3_confusion_matrix)) # Expected Output: 0.7824

0.8822
0.5573
0.7824


In [77]:
def f1(confusion_matrix):
    # Local names avoid reusing the names of sklearn's precision_score and recall_score
    precision_value = precision(confusion_matrix)
    recall_value = recall(confusion_matrix)
    numerator = precision_value * recall_value
    denominator = precision_value + recall_value
    return 2 * (numerator / denominator)

print(f1(model1_confusion_matrix)) # Expected Output: 0.9128311380790292
print(f1(model2_confusion_matrix)) # Expected Output: 0.6387596899224806
print(f1(model3_confusion_matrix)) # Expected Output: 0.8349764902168968

0.9128311380790292
0.6387596899224806
0.8349764902168968


Great Job! Let's check our work with sklearn. 

## Calculating Metrics with sklearn

Each of the metrics we calculated above is also available inside the `sklearn.metrics` module.

In the cell below, import the following functions:

* `precision_score`
* `recall_score`
* `accuracy_score`
* `f1_score`

Then, use the `labels` and the predictions from each model (not the confusion matrices) to double check the performance of our functions above. 

In [81]:
# Import everything needed here first!
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score


preds = [model1_preds, model2_preds, model3_preds]

for ind, i in enumerate(preds):
    print('-'*40)
    print('Model {} Metrics:'.format(ind + 1))
    print('Precision: {}'.format(precision_score(labels,i)))
    print('Recall: {}'.format(recall_score(labels,i)))
    print('Accuracy: {}'.format(accuracy_score(labels,i)))
    print('F1-Score: {}'.format(f1_score(labels,i)))

----------------------------------------
Model 1 Metrics:
Precision: 0.94688363524716
Recall: 0.8811428571428571
Accuracy: 0.8822
F1-Score: 0.9128311380790292
----------------------------------------
Model 2 Metrics:
Precision: 0.744814462416746
Recall: 0.5591428571428572
Accuracy: 0.5573
F1-Score: 0.6387596899224806
----------------------------------------
Model 3 Metrics:
Precision: 0.8899127061105723
Recall: 0.7864285714285715
Accuracy: 0.7824
F1-Score: 0.8349764902168968
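
Rather than comparing the printed numbers by eye, the agreement between hand-written metrics and sklearn can be checked programmatically. The following is a minimal sketch on made-up toy data. The toy lists and the `manual` and `sk` dictionaries are illustrative names, and `math.isclose` is used because floating-point results may differ in the last digits.

```python
import math

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy data for illustration only
toy_labels = [1, 1, 1, 0, 1, 0, 0, 1]
toy_preds = [1, 1, 1, 0, 0, 1, 0, 1]

# Count the four confusion-matrix cells by hand
pairs = list(zip(toy_labels, toy_preds))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

manual_precision = tp / (tp + fp)
manual_recall = tp / (tp + fn)
manual = {
    'precision': manual_precision,
    'recall': manual_recall,
    'accuracy': (tp + tn) / len(pairs),
    'f1': 2 * manual_precision * manual_recall / (manual_precision + manual_recall),
}

sk = {
    'precision': precision_score(toy_labels, toy_preds),
    'recall': recall_score(toy_labels, toy_preds),
    'accuracy': accuracy_score(toy_labels, toy_preds),
    'f1': f1_score(toy_labels, toy_preds),
}

# Compare each metric with a floating-point tolerance
all_match = all(math.isclose(manual[name], sk[name]) for name in manual)
print(all_match)
```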


## Classification Reports

Remember that table that you filled out above? It's called a **_Classification Report_**, and it turns out that sklearn can even create one of those for you! This classification report even breaks down performance by individual class predictions for your model. 

In closing, let's create and interpret some classification reports using sklearn. Like everything else we've used in this lab, you can find the `classification_report()` function inside the `sklearn.metrics` module. This function takes in two required arguments: labels and predictions.
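
By default `classification_report()` returns a formatted string meant for printing. When the individual numbers are needed in code, passing `output_dict=True` returns a nested dictionary instead. The sketch below assumes scikit-learn 0.20 or newer (where this parameter is available) and uses made-up toy data. With integer labels, the per-class keys are the strings `'0'` and `'1'`.

```python
from sklearn.metrics import classification_report

# Toy data for illustration only
toy_labels = [1, 1, 1, 0, 1, 0, 0, 1]
toy_preds = [1, 1, 1, 0, 0, 1, 0, 1]

# output_dict=True returns a nested dict instead of a printable string
report = classification_report(toy_labels, toy_preds, output_dict=True)

# Per-class entries are keyed by the class label converted to a string
positive_class = report['1']
negative_class = report['0']

print(positive_class['precision'], positive_class['recall'], positive_class['f1-score'])
print(negative_class['support'])
```

The `support` value is the number of true instances of each class, which is what the `support` column in the printed report shows.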

Complete the code in the cell below to create classification reports for each of our models. 

In [83]:
# Import classification_report below!
from sklearn.metrics import classification_report
for ind, i in enumerate(preds):
    print('-'*40)
    print("Model {} Classification Report:".format(ind + 1))
    print(classification_report(labels,i))

----------------------------------------
Model 1 Classification Report:
              precision    recall  f1-score   support

           0       0.76      0.88      0.82      3000
           1       0.95      0.88      0.91      7000

   micro avg       0.88      0.88      0.88     10000
   macro avg       0.85      0.88      0.87     10000
weighted avg       0.89      0.88      0.88     10000

----------------------------------------
Model 2 Classification Report:
              precision    recall  f1-score   support

           0       0.35      0.55      0.43      3000
           1       0.74      0.56      0.64      7000

   micro avg       0.56      0.56      0.56     10000
   macro avg       0.55      0.56      0.53     10000
weighted avg       0.63      0.56      0.58     10000

----------------------------------------
Model 3 Classification Report:
              precision    recall  f1-score   support

           0       0.61      0.77      0.68      3000
           1       0.

## Summary

In this lab, we manually calculated various evaluation metrics to help us evaluate classifier performance, and we also made use of preexisting tools inside of sklearn for the same purpose. 