### What is a Confusion Matrix and why do you need it?

A __confusion matrix__ is a table that is often used to describe the __performance of a classification model (or "classifier")__ on a set of test data for which the true values are known. 

For a binary classifier, it is a 2×2 table covering the 4 possible combinations of predicted and actual values.

![](images\cm_1.PNG)

> It is the basis for computing Recall, Precision, Specificity and Accuracy and, most importantly, for building the AUC-ROC curve.

Let’s understand TP, FP, FN and TN using a pregnancy analogy.

![](images\cm_2.PNG)



> Just remember: Positive and Negative describe the predicted value, while True and False describe whether that prediction matches the actual value.

![](images\cm_3.PNG)


1. __True Positive__: You predicted positive and the prediction is correct.
    You predicted that a woman is pregnant and she actually is.
2. __True Negative__: You predicted negative and the prediction is correct.
    You predicted that a man is not pregnant and he actually is not.
3. __False Positive (Type 1 Error)__: You predicted positive and the prediction is wrong.
    You predicted that a man is pregnant but he actually is not.
4. __False Negative (Type 2 Error)__: You predicted negative and the prediction is wrong.
    You predicted that a woman is not pregnant but she actually is.


![](images\cm_4.PNG)
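The four outcomes can be counted directly from paired lists of actual and predicted labels. The following is a minimal sketch, not a library implementation: the function name `count_outcomes`, the `positive` parameter and the sample labels are all assumptions made for illustration.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        else:
            fn += 1  # predicted negative, actually positive (Type 2 error)
    return tp, tn, fp, fn


# Illustrative labels: 1 = pregnant, 0 = not pregnant
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

tp, tn, fp, fn = count_outcomes(actual, predicted)
print(tp, tn, fp, fn)  # 3 4 2 1
```

The four counts always add up to the number of records, which is a quick sanity check on any confusion matrix.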

### Recall - Out of all the actual positive values, how many we predicted correctly. It should be as high as possible.

$$ Recall = \frac{TP}{TP+FN} $$

### Precision - Out of all the values we predicted as positive, how many are actually positive.

$$ Precision = \frac{TP}{TP+FP} $$

### Accuracy - Out of all the predictions, how many we got right. It should be as high as possible.

$$ Accuracy = \frac{TP+TN}{TP+TN+FP+FN} $$
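As a rough illustration, the three ratios can be written as small functions of the four counts, with accuracy dividing by the total number of predictions. The function names and the example counts below are assumptions chosen for this sketch, and it does not guard against a zero denominator.

```python
def recall(tp, fn):
    # Share of actual positives that were predicted positive
    return tp / (tp + fn)


def precision(tp, fp):
    # Share of predicted positives that are actually positive
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    # Share of all predictions that were correct
    return (tp + tn) / (tp + tn + fp + fn)


# Assumed example counts, for illustration only
tp, tn, fp, fn = 3, 4, 2, 1

print(recall(tp, fn))            # 0.75
print(precision(tp, fp))         # 0.6
print(accuracy(tp, tn, fp, fn))  # 0.7
```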

### F-measure 
It is difficult to compare two models when one has low precision and high recall and the other has the reverse. To make them comparable, we use the F-score, which combines Recall and Precision into a single number. It uses the harmonic mean instead of the arithmetic mean, which punishes extreme values more.

$$ \text{F-measure} = \frac{2 \times Recall \times Precision}{Recall + Precision} $$
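To see why the harmonic mean is used, it helps to compare it with the arithmetic mean on a lopsided model. This is a small sketch under assumed precision and recall values. The name `f_measure` is hypothetical, and returning 0.0 when both inputs are zero is a convention chosen here rather than part of the formula.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0  # assumed convention to avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


# Assumed values: a fairly balanced model and a lopsided one
balanced = f_measure(0.6, 0.75)
lopsided = f_measure(0.1, 1.0)
arithmetic = (0.1 + 1.0) / 2  # arithmetic mean of the lopsided pair

print(round(balanced, 3))    # 0.667
print(round(lopsided, 3))    # 0.182
print(round(arithmetic, 3))  # 0.55
```

The lopsided model looks acceptable under the arithmetic mean (0.55), but the F-measure drops to about 0.18 because the very low precision dominates.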

***

## Importance of the confusion matrix

__Classification accuracy__ is the ratio of correct predictions to total predictions made.

The main problem with classification accuracy is that it hides the detail you need to understand the performance of your classification model. There are two situations in which you are most likely to encounter this problem:

1. When your data has more than 2 classes. With 3 or more classes you may get a classification accuracy of 80%, but you don’t know if that is because all classes are being predicted equally well or whether one or two classes are being neglected by the model.
2. When your classes are imbalanced, that is, they do not have roughly equal numbers of records. You may achieve an accuracy of 90% or more, but this is not a good score if 90 of every 100 records belong to one class, because you can reach it simply by always predicting the most common class value.
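The second situation is easy to reproduce. The sketch below uses made-up data with 90 records of class 0 and 10 of class 1, and a "model" that always predicts the most common class. The variable names are assumptions for illustration.

```python
from collections import Counter

# Made-up imbalanced data: 90 records of class 0, 10 of class 1
actual = [0] * 90 + [1] * 10

# A "model" that always predicts the most common class value
majority = Counter(actual).most_common(1)[0][0]
predicted = [majority] * len(actual)

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Recall on the minority class (class 1)
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
minority_recall = tp / (tp + fn)

print(accuracy)         # 0.9
print(minority_recall)  # 0.0
```

Accuracy reports 90%, yet not a single record of the minority class is found. The confusion matrix exposes this, because its true-positive cell is zero.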

```python
from sklearn.metrics import confusion_matrix

# True labels and the labels the model predicted
expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
results = confusion_matrix(expected, predicted)
print(results)
```

```
[[4 2]
 [1 3]]
```
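In scikit-learn's layout, rows are actual classes and columns are predicted classes, with labels in sorted order. For the 0/1 labels above, the matrix therefore reads `[[TN, FP], [FN, TP]]`. The sketch below hard-codes the printed matrix as a plain nested list (an assumption, so that it runs without scikit-learn) and unpacks the four cells.

```python
# The matrix printed above, copied as a plain nested list
results = [[4, 2],
           [1, 3]]

# Row 0 = actual 0 (negative), row 1 = actual 1 (positive)
(tn, fp), (fn, tp) = results

print(tp, tn, fp, fn)  # 3 4 2 1
```

With the actual NumPy array returned by `confusion_matrix`, `results.ravel()` yields the same four values in the order `tn, fp, fn, tp`.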


### Reference:

1. [Understanding Confusion Matrix](https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62)
2. [What is a Confusion Matrix in Machine Learning](https://machinelearningmastery.com/confusion-matrix-machine-learning/)
3. [Data school confusion matrix](https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/)