In this notebook, we discuss methods for evaluating how a model performs: **evaluation metrics**.

### Confusion matrix

Below is an example of a **confusion matrix**: a table that describes how the model performed at classifying the data points. Instead of only knowing that a point is incorrectly classified, we would like to know on which side of the line it actually falls. In other words, we would like to know what **type** of error we are dealing with.

![Confusion matrix](images/confusion_matrix.png)

Let's think about the graph for a minute. How many True Positives, True Negatives, False Positives, and False Negatives are in the model above? What we have is:
- 6 True Positives (correctly classified)
- 5 True Negatives (correctly classified)
- 2 False Positives
- 1 False Negative
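
As a rough illustration, the four counts can be tallied directly from true labels and predictions. This is a minimal sketch in plain Python: the function name `confusion_counts` and the label lists are hypothetical, arranged only so that they reproduce the counts listed above, and labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # positive point predicted positive
        elif actual == 0 and predicted == 0:
            tn += 1  # negative point predicted negative
        elif actual == 0 and predicted == 1:
            fp += 1  # negative point predicted positive (Type 1 error)
        else:
            fn += 1  # positive point predicted negative (Type 2 error)
    return tp, tn, fp, fn


# Hypothetical labels arranged to reproduce the counts in the list above
y_true = [1] * 6 + [0] * 5 + [0] * 2 + [1] * 1
y_pred = [1] * 6 + [0] * 5 + [1] * 2 + [0] * 1

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 6 5 2 1
```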

In the literature, False Positives and False Negatives are sometimes called Type 1 and Type 2 errors. Here is the correspondence:

![Confusion matrix medical](images/confusion_matrix_medical.png)

- **Type 1 Error** (Error of the first kind, or False Positive): In the medical example, this is when we misdiagnose a healthy patient as sick.
- **Type 2 Error** (Error of the second kind, or False Negative): In the medical example, this is when we misdiagnose a sick patient as healthy.

### Accuracy

This is one of the ways of measuring how good a model is. To visualize how it works, imagine the confusion matrix again, but now with the count for each category in addition to the types. Accuracy is the **ratio** of correctly classified points to total points:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

where: TP = True positive; FP = False positive; TN = True negative; FN = False negative

![Accuracy](images/accuracy.png)
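
To make the formula concrete, here is a small sketch that plugs in the counts from the confusion matrix section (6 TP, 5 TN, 2 FP, 1 FN). The helper name `accuracy` is just an illustrative choice, and returning 0.0 for an empty dataset is an assumed convention.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all points that were classified correctly."""
    total = tp + tn + fp + fn
    # Assumed convention: report 0.0 when there are no points at all
    return (tp + tn) / total if total else 0.0


acc = accuracy(tp=6, tn=5, fp=2, fn=1)
print(round(acc, 3))  # 0.786, i.e. 11 correct out of 14
```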

### When is accuracy not enough?

While accuracy is a simple way of evaluating performance, it works poorly when the data is highly skewed, because it hides errors made on the rare class. For example, if 99% of the points are negative, a model that always predicts negative reaches 99% accuracy without ever identifying a positive point.
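
The effect of skewed data can be sketched with a few lines of arithmetic. The dataset sizes below are made up for illustration, and the "model" is a hypothetical one that predicts negative for every point.

```python
# Hypothetical, highly skewed dataset: 990 negatives and 10 positives
n_negative, n_positive = 990, 10

# A trivial "model" that predicts negative for every point
tp, fn = 0, n_positive   # every positive point is missed
tn, fp = n_negative, 0   # every negative point is trivially correct

acc = (tp + tn) / (tp + tn + fp + fn)
found = tp / (tp + fn)   # fraction of positive points the model detected

print(acc)    # 0.99
print(found)  # 0.0
```

The accuracy looks excellent even though the model never finds a single positive point.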

### Precision 

Precision answers the question: out of all the points the model predicted as positive, how many are actually positive? In the medical example, it is the fraction of patients diagnosed as sick who really are sick. Note that this is different from the everyday measurement sense of precision (how close repeated measurements are to each other). It is computed as $P = \frac{TP}{TP + FP}$

![Precision](images/precision_example.png)
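
As a quick check of the formula, this sketch computes precision for the counts from the confusion matrix section (6 TP, 2 FP). The function name `precision` is illustrative, and returning 0.0 when nothing is predicted positive is an assumed convention rather than a universal rule.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    predicted_positive = tp + fp
    # Assumed convention: 0.0 when the model predicts no positives
    return tp / predicted_positive if predicted_positive else 0.0


p = precision(tp=6, fp=2)
print(p)  # 0.75
```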


### Recall

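Recall complements precision: out of all the points that are actually positive, how many did the model correctly predict as positive? In the medical example, it is the fraction of sick patients that we correctly diagnose as sick. It is computed as $R = \frac{TP}{TP + FN}$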

![Recall](images/recall_example.png)
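
The same kind of check works for recall, again using the counts from the confusion matrix section (6 TP, 1 FN). The function name `recall` is illustrative, and the 0.0 fallback for a dataset with no positive points is an assumed convention.

```python
def recall(tp, fn):
    """Fraction of actual positives that the model predicted as positive."""
    actual_positive = tp + fn
    # Assumed convention: 0.0 when the data contains no positive points
    return tp / actual_positive if actual_positive else 0.0


r = recall(tp=6, fn=1)
print(round(r, 3))  # 0.857, i.e. 6 of the 7 positive points were found
```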

### Receiver operating characteristic (ROC)

From [Wikipedia](https://en.wikipedia.org/wiki/Receiver_operating_characteristic): *An ROC space is defined by FPR and TPR as x and y axes, respectively, which depicts relative trade-offs between true positive (benefits) and false positive (costs)*. 

False positive rate: $FPR = \frac{FP}{FP + TN}$

True positive rate: $TPR = \frac{TP}{TP + FN}$

![Area under ROC](images/area_roc.png)

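An area under the ROC curve (AUC) of 0.5 corresponds to random guessing. The AUC can also go below 0.5, all the way to 0, which means the model tends to score negative points above positive ones. As a rule of thumb, the closer to 1, the better.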
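
One way to see where the curve comes from: if the model outputs a score for each point, every choice of decision threshold gives one (FPR, TPR) pair, and sweeping the threshold traces the curve. The sketch below does this in plain Python and integrates with the trapezoid rule. The function names, labels, and scores are hypothetical, and the code assumes both classes are present in the data.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Rank points from highest to lowest score
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores
        last_of_group = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_group:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and model scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc_good = auc(roc_points(y_true, scores))
# Negating the scores reverses the ranking, which pushes the AUC below 0.5
auc_inverted = auc(roc_points(y_true, [-s for s in scores]))

print(round(auc_good, 3))      # 0.889
print(round(auc_inverted, 3))  # 0.111
```

In this sketch the inverted ranking gives exactly one minus the original AUC, which illustrates why a value well below 0.5 usually signals flipped predictions rather than an uninformative model.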