### What is a Confusion Matrix and why do you need it?

A __confusion matrix__ is a table that is often used to describe the __performance of a classification model (or "classifier")__ on a set of test data for which the true values are known. 

Note: A confusion matrix applies to __classification models__ and can be computed only when the __true values are known__ (e.g., on the test portion of a train-test split).

The size of the confusion matrix, n x n, depends on the number of labels n. So for n = 3 labels we will have a 3x3 matrix, and for n = 10 we will have a 10x10 matrix.

For binary classification, it is a table with 4 different combinations of predicted and actual values.
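
To make the n x n shape concrete, here is a minimal pure-Python sketch that tallies a confusion matrix by hand. The helper name `build_confusion_matrix` and the colour labels are made up for illustration, and the rows-are-actual, columns-are-predicted orientation is just one possible layout.

```python
def build_confusion_matrix(y_true, y_pred, labels):
    """Return an n x n list of lists: rows are actual labels, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical toy data with n = 3 labels.
labels = ["red", "green", "blue"]
y_true = ["red", "green", "blue", "blue", "red"]
y_pred = ["red", "blue", "blue", "green", "red"]

matrix = build_confusion_matrix(y_true, y_pred, labels)
for row in matrix:
    print(row)
```

With three labels the result has three rows and three columns, and every prediction lands in exactly one cell.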

> The layout of a confusion matrix can differ depending on the software used (for example, which axis holds the actual values).

![](images\cm_1.PNG)

Important points:
1. TP, TN, FP and FN are whole-number counts, not percentages or fractions.
2. The classifier must be binary, and each class must be assigned as +ve or -ve. For labels such as blue and green, treat one as +ve and the other as -ve.
3. For more than 2 classes, avoid using the terms TP, TN, FP and FN on their own, as they become ambiguous.



> It is extremely useful for measuring recall, precision, specificity and accuracy, and, most importantly, for building the AUC-ROC curve.


### True +ve, True -ve, False +ve and False -ve - Only for binary classifiers

Let’s understand TP, FP, FN and TN using a pregnancy analogy. To make the terms easier to read, insert the word _predicted_ between the two words, for example: True _predicted_ positive, True _predicted_ negative, etc.

Our aim for a classifier must be to always maximise TP and TN.

![](images\cm_2.PNG)

> Just remember: Positive and Negative describe the predicted value, while True and False describe whether that prediction matches the actual value.

![](images\cm_3.PNG)


1. __True Positive__: You predicted positive and it’s true.
    You predicted that a woman is pregnant and she actually is.
2. __True Negative__: You predicted negative and it’s true.
    You predicted that a man is not pregnant and he actually is not.
3. __False Positive: (Type 1 Error)__ You predicted positive and it’s false.
    You predicted that a man is pregnant but he actually is not.
4. __False Negative: (Type 2 Error)__ You predicted negative and it’s false.
    You predicted that a woman is not pregnant but she actually is.
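
The four definitions above can be turned into a small counting loop. This is only a sketch on made-up observations: the first letter of each key records whether the prediction was correct, and the second letter records what was predicted.

```python
# Hypothetical observations as (actual, predicted) pairs; True means "pregnant".
observations = [
    (True, True),    # predicted pregnant, actually pregnant
    (False, False),  # predicted not pregnant, actually not pregnant
    (False, True),   # predicted pregnant, actually not pregnant
    (True, False),   # predicted not pregnant, actually pregnant
    (True, True),
    (False, False),
]

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for actual, predicted in observations:
    correctness = "T" if actual == predicted else "F"  # was the prediction right?
    sign = "P" if predicted else "N"                   # what was predicted?
    counts[correctness + sign] += 1

print(counts)
```

The four counts always add up to the number of observations.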


![](images\cm_4.PNG)

### Recall or True +ve rate or Sensitivity
When the actual value is positive, how often does the classifier predict +ve?  
Out of all the actual positive values, how many did we predict correctly? It should be as high as possible.

$$ \text{Recall} = \frac{TP}{TP+FN} $$ or
$$ = \frac{TP}{\text{total actual positives}}$$
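
As a rough illustration, recall can be computed directly from two label lists. The function name `recall` and the choice to return 0.0 when there are no actual positives are assumptions made for this sketch; scikit-learn offers `recall_score` in `sklearn.metrics` for real use.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == positive and p == positive)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == positive and p != positive)
    # Assumed convention: report 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# 3 of the 4 actual positives are found.
print(recall(y_true, y_pred))
```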

### True -ve rate or Specificity
When the actual value is negative, how often does the classifier predict -ve?

$$ \text{Specificity} = \frac{TN}{FP+TN} $$ or
$$ = \frac{TN}{\text{total actual negatives}}$$
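
A short worked example, using assumed counts of TN = 4 and FP = 2, shows the arithmetic. It also shows that the false positive rate, which is plotted on the x-axis of a ROC curve, is simply one minus specificity.

```python
# Assumed counts for illustration: 6 actual negatives in total.
tn, fp = 4, 2

specificity = tn / (tn + fp)          # share of actual negatives predicted negative
false_positive_rate = fp / (tn + fp)  # share of actual negatives predicted positive

print(round(specificity, 3), round(false_positive_rate, 3))
```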

### Precision 
When we predict +ve, how often is it correct?  
Out of all the instances we predicted as positive, how many are actually positive?

$$ \text{Precision} = \frac{TP}{TP+FP} $$
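
Precision and recall can pull in opposite directions. The sketch below, on made-up labels, scores a degenerate model that predicts +ve for everything: it finds every actual positive, yet most of its positive predictions are wrong. The helper name `precision_and_recall` is hypothetical.

```python
def precision_and_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(a == 1 and p == 1 for a, p in zip(y_true, y_pred))
    fp = sum(a == 0 and p == 1 for a, p in zip(y_true, y_pred))
    fn = sum(a == 1 and p == 0 for a, p in zip(y_true, y_pred))
    # Assumed convention: 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
always_positive = [1] * len(y_true)

precision, recall = precision_and_recall(y_true, always_positive)
print(precision, recall)
```

Here only 4 of the 10 positive predictions are correct, so precision is low even though recall is perfect.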

### Classification Accuracy 
Out of all the predictions, how many did we get right? Overall, how often is the classifier correct? It should be as high as possible.

$$ \text{Accuracy} = \frac{TP+TN}{\text{total}} $$

For more than 2 classes, accuracy is the sum of the counts along the diagonal of the confusion matrix divided by the total number of predictions.
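
The diagonal rule for more than 2 classes can be sketched as follows. The 3x3 matrix is invented for illustration, with rows as actual classes and columns as predicted classes.

```python
# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
matrix = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 2, 6],
]

correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
total = sum(sum(row) for row in matrix)                  # every prediction made
accuracy = correct / total

print(correct, total, accuracy)
```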


### Misclassification rate 
Overall, how often is the classifier wrong?

$$ \text{Misclassification rate} = \frac{FP+FN}{\text{total}} $$ or,
$$ = 1 - \text{Accuracy}$$
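
Both forms of the formula give the same number, which the following sketch checks on assumed counts. `math.isclose` is used because the two expressions can differ by floating-point rounding.

```python
import math

# Assumed counts for illustration.
tp, tn, fp, fn = 3, 4, 2, 1
total = tp + tn + fp + fn

accuracy = (tp + tn) / total
misclassification_rate = (fp + fn) / total

print(misclassification_rate)
print(math.isclose(misclassification_rate, 1 - accuracy))
```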




### F-measure / F1-score
It is difficult to compare two models when one has low precision and high recall and the other has the reverse. To make them comparable, we use the F-score, which combines recall and precision into a single number. It uses the harmonic mean instead of the arithmetic mean, which penalises extreme values more heavily.

$$ \text{F-measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} $$

##### Example 1

![](images\f1_1.PNG)

##### Example 2

![](images\f1_2.PNG)
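
To see how the harmonic mean punishes extreme values, the sketch below compares it with the arithmetic mean for a few made-up precision and recall pairs. The function name `harmonic_f1` is hypothetical, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention to avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


# Each pair has the same arithmetic mean (0.5) but a very different F-measure.
pairs = [(0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]
for precision, recall in pairs:
    arithmetic_mean = (precision + recall) / 2
    print(precision, recall, arithmetic_mean, round(harmonic_f1(precision, recall), 2))
```

A balanced model keeps its score of 0.5, while the lopsided ones drop to 0.18 and 0.0.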

***

## Importance of the confusion matrix

__Classification accuracy__ is the ratio of correct predictions to total predictions made.

The main problem with classification accuracy is that it hides the detail you need to better understand the performance of your classification model. There are two examples where you are most likely to encounter this problem:

1. When your data has more than 2 classes. With 3 or more classes you may get a classification accuracy of 80%, but you don’t know if that is because all classes are being predicted equally well or whether one or two classes are being neglected by the model.
2. When your data does not have an even number of records in each class (class imbalance). You may achieve an accuracy of 90% or more, but this is not a good score if 90 of every 100 records belong to one class, because you can achieve it by always predicting the most common class value.
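
The second problem is easy to reproduce. This sketch builds a made-up dataset with a 90/10 class split and scores a "model" that always predicts the most common class.

```python
from collections import Counter

# Hypothetical imbalanced labels: 90 records of class 0, 10 of class 1.
y_true = [0] * 90 + [1] * 10

# A baseline that ignores its input and always predicts the majority class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

accuracy = sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)
minority_recall = sum(a == 1 and p == 1 for a, p in zip(y_true, y_pred)) / y_true.count(1)

print(accuracy, minority_recall)
```

Accuracy looks strong at 0.9, yet not a single record of the minority class is detected, which is exactly the detail a confusion matrix exposes.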

In [1]:
from sklearn.metrics import confusion_matrix

In [2]:
expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

In [3]:
results = confusion_matrix(expected, predicted)
print(results)

[[4 2]
 [1 3]]


In [4]:
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]], dtype=int64)

In scikit-learn's confusion matrix layout, actual values run down the left-hand side (rows) and predicted values run across the top (columns). By default, labels are arranged in sorted order (alphabetical for strings); the `labels` argument sets the order explicitly.

![](images/cm_sk.PNG)

In the binary case, we can extract the true negatives, false positives, false negatives and true positives, in that order, using `confusion_matrix(y_true, y_pred).ravel()`:

In [5]:
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
tn, fp, fn, tp

(0, 2, 1, 1)

##### Confusion matrices and ROC curves are tools that aid in evaluating a classifier; they are not evaluation metrics themselves.

***

## How to calculate metrics like recall and precision with 3 classes (e.g., the iris dataset)?

In [6]:
from sklearn.metrics import classification_report

In [7]:
y_true = [0,1,2,2,2]
y_pred = [0,0,2,2,1]
target_names = ['class 0', 'class 1', 'class 2']

In [9]:
print(classification_report(y_true, y_pred, target_names = target_names))

              precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.67      0.80         3

    accuracy                           0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5



scikit-learn calculates the precision, recall and f1-score for each of the three classes.

- __class 0 recall is 1__ i.e., there is a single 0 in y_true and it has been correctly predicted in y_pred.
- __class 0 precision - When 0 was predicted, how often was it correct? - 50%__ i.e., 0 was predicted twice, but only one of those predictions matches a 0 in the true values.
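
The per-class numbers in the report can be reproduced by hand by treating each class in turn as the positive class. The sketch below is a pure-Python illustration, not how scikit-learn implements it: precision divides the diagonal entry by its column sum, and recall divides it by its row sum. Returning 0.0 for an empty denominator is an assumed convention.

```python
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
classes = [0, 1, 2]

# Rows are actual classes, columns are predicted classes.
matrix = [[0] * len(classes) for _ in classes]
for actual, predicted in zip(y_true, y_pred):
    matrix[actual][predicted] += 1

scores = {}
for c in classes:
    tp = matrix[c][c]
    predicted_as_c = sum(matrix[r][c] for r in classes)  # column sum
    actually_c = sum(matrix[c])                          # row sum
    precision = tp / predicted_as_c if predicted_as_c else 0.0
    recall = tp / actually_c if actually_c else 0.0
    scores[c] = (round(precision, 2), round(recall, 2))

for c, (precision, recall) in scores.items():
    print(f"class {c}: precision={precision:.2f} recall={recall:.2f}")
```

The printed values match the precision and recall columns of the report above.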

![](images/conf_bus.PNG)

### Reference:

1. [Understanding Confusion Matrix](https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62)
2. [What is a Confusion Matrix in Machine Learning](https://machinelearningmastery.com/confusion-matrix-machine-learning/)
3. [Data school confusion matrix](https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/)