# Confusion Matrix
In machine learning, and specifically in statistical classification, a confusion matrix (also known as an error matrix) is a table that describes the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It makes the performance of an algorithm easy to visualize.
It also makes confusion between classes easy to identify, for example when one class is commonly mislabeled as the other. Most performance measures are computed from the confusion matrix.

It gives us insight not only into the errors being made by a classifier but more importantly the types of errors that are being made.

![image1](Assets/Confusion_Matrix1_1.png)

Here,
* Class 1 : Positive
* Class 2 : Negative

### Definition of the Terms:
* Positive (P) : Observation is positive (for example: is an apple).
* Negative (N) : Observation is not positive (for example: is not an apple).
* True Positive (TP) : Observation is positive, and is predicted to be positive.
* False Negative (FN) : Observation is positive, but is predicted negative.
* True Negative (TN) : Observation is negative, and is predicted to be negative.
* False Positive (FP) : Observation is negative, but is predicted positive.
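
The four terms above can be counted directly from a list of true labels and a list of predicted labels. The following is a minimal sketch, not a library API: the function name `count_outcomes` and the apple/pear labels are hypothetical and chosen only for illustration.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count TP, FN, TN and FP for a binary problem.

    `positive` is the label treated as the positive class; every other
    label is treated as negative.
    """
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive observation, predicted positive
        elif a == positive:
            fn += 1  # positive observation, predicted negative
        elif p == positive:
            fp += 1  # negative observation, predicted positive
        else:
            tn += 1  # negative observation, predicted negative
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


# Hypothetical data: is each observation an apple or not?
actual = ["apple", "apple", "pear", "apple", "pear", "pear"]
predicted = ["apple", "pear", "pear", "apple", "apple", "pear"]

counts = count_outcomes(actual, predicted, positive="apple")
print(counts)  # {'TP': 2, 'FN': 1, 'TN': 2, 'FP': 1}
```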

### Classification Rate/Accuracy:
Classification Rate or Accuracy is given by the relation:
![image2](Assets/Confusion_Matrix2_2.png)

However, there are problems with accuracy. It assumes equal costs for both kinds of errors. A 99% accuracy can be excellent, good, mediocre, poor or terrible depending upon the problem.
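
To see why a high accuracy can still be a terrible result, consider a hypothetical, heavily imbalanced test set in which a model never predicts the positive class. The counts below are made up for illustration, and the `accuracy` helper is a sketch of the usual (TP + TN) / (TP + TN + FP + FN) relation.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all observations that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical test set: 990 negatives and 10 positives.
# The model predicts "negative" for every observation.
tp, fn = 0, 10   # all 10 positives are missed
tn, fp = 990, 0  # all 990 negatives are (trivially) correct

acc = accuracy(tp, tn, fp, fn)
print(acc)  # 0.99, although not a single positive was found
```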

### Recall
Recall is the ratio of the number of correctly classified positive examples to the total number of positive examples. High recall indicates that the class is correctly recognized (a small number of FN).

Recall is given by the relation:

![image3](Assets/Confusion_Matrix3_3.png)
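
As a small sketch, recall can be computed from the TP and FN counts. The function name and the example counts are hypothetical, and returning 0.0 when there are no positive examples is a convention assumed here, not something the definition prescribes.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): share of actual positives that were found."""
    positives = tp + fn
    # Assumed convention: report 0.0 when there are no positive examples.
    return tp / positives if positives else 0.0


# Hypothetical counts: 8 positives found, 2 positives missed.
r = recall(tp=8, fn=2)
print(r)  # 0.8
```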

### Precision
To get the value of precision, we divide the number of correctly classified positive examples by the total number of predicted positive examples. High precision indicates that an example labeled as positive is indeed positive (a small number of FP).

Precision is given by the relation:

![image4](Assets/Confusion_Matrix4_4.png)
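
Precision can be sketched the same way from the TP and FP counts. Again, the function name and counts are hypothetical, and returning 0.0 when nothing was predicted positive is an assumed convention.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): share of positive predictions that are right."""
    predicted_positives = tp + fp
    # Assumed convention: report 0.0 when nothing was predicted positive.
    return tp / predicted_positives if predicted_positives else 0.0


# Hypothetical counts: 8 correct positive predictions, 4 false alarms.
p = precision(tp=8, fp=4)
print(round(p, 2))  # 0.67
```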

### High recall, low precision
This means that most of the positive examples are correctly recognized (low FN) but there are a lot of false positives.

### Low recall, high precision
This means that we miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP).
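
One way to see both situations is to imagine a classifier that outputs a score and labels an example positive when the score reaches a threshold. This setup is an assumption made for the sketch (the text above does not describe any particular model), and the scores, labels and function name are invented. A low threshold gives high recall with low precision, and a high threshold gives the reverse.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.40, 0.35, 0.20]
labels = [1, 1, 1, 0, 1, 0, 1, 0]

low = precision_recall_at(0.3, scores, labels)   # high recall, lower precision
high = precision_recall_at(0.8, scores, labels)  # lower recall, high precision
print("threshold 0.3:", low)
print("threshold 0.8:", high)
```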

### F-measure 
Since we have two measures (Precision and Recall), it helps to have a single measurement that represents both of them. The F-measure uses the harmonic mean of Precision and Recall instead of the arithmetic mean, because the harmonic mean punishes extreme values more.
The F-measure will always be nearer to the smaller of Precision and Recall.

![image5](Assets/Confusion_Matrix5_5.png)

Let’s now consider an example in which the test data contains infinitely many elements of class B (in practice, a very large number) and a single element of class A, and the model predicts class A for every instance in the test data. Treating class A as the positive class, we get:

* Precision : 0.0 (only one of the many positive predictions is correct)
* Recall : 1.0 (the single element of class A is found)
* Arithmetic mean : 0.5
* Harmonic mean : 0.0

The arithmetic mean would suggest that the model is 50% correct, despite this being the worst possible outcome. With the harmonic mean, the F-measure is 0.
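
The difference between the two means can be checked numerically. The sketch below replaces the infinite number of class B elements with one million (an assumption made so the numbers are computable); the variable names are illustrative only.

```python
# One element of class A, one million elements of class B,
# and a model that predicts class A for everything.
n_class_b = 1_000_000  # finite stand-in for "infinite" (assumption)

tp = 1          # the single class A element, predicted A
fp = n_class_b  # every class B element, wrongly predicted A
fn = 0          # no class A element is missed

precision = tp / (tp + fp)  # close to 0
recall = tp / (tp + fn)     # exactly 1.0

arithmetic_mean = (precision + recall) / 2
harmonic_mean = 2 * precision * recall / (precision + recall)

print(f"precision       = {precision:.6f}")
print(f"recall          = {recall:.6f}")
print(f"arithmetic mean = {arithmetic_mean:.6f}")  # about 0.5
print(f"harmonic mean   = {harmonic_mean:.6f}")    # about 0.0
```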



### Example: interpreting a confusion matrix

![image6](Assets/Confusion_Matrix6.png)

To simplify the above confusion matrix, the terms TP, FP, etc. and the row and column totals have been added in the following image:

![image7](Assets/Confusion_Matrix7.png)

Now,

* Classification Rate/Accuracy: Accuracy = (TP + TN) / (TP + TN + FP + FN) = (100 + 50) / (100 + 5 + 10 + 50) = 0.91
* Recall: Recall tells us, when the actual value is yes, how often the classifier predicts yes. Recall = TP / (TP + FN) = 100 / (100 + 5) = 0.95
* Precision: Precision tells us, when the classifier predicts yes, how often it is correct. Precision = TP / (TP + FP) = 100 / (100 + 10) = 0.91
* F-measure: F-measure = (2 $\times$ Recall $\times$ Precision) / (Recall + Precision) = (2 $\times$ 0.95 $\times$ 0.91) / (0.95 + 0.91) = 0.93
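
The arithmetic of this example can be reproduced in a few lines of Python. This is only a check of the formulas with the counts read from the example (TP = 100, FN = 5, FP = 10, TN = 50), with results rounded to two decimals.

```python
# Counts taken from the worked example.
tp, fn, fp, tn = 100, 5, 10, 50

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f_measure = 2 * recall * precision / (recall + precision)

print("Accuracy :", round(accuracy, 2))   # 0.91
print("Recall   :", round(recall, 2))     # 0.95
print("Precision:", round(precision, 2))  # 0.91
print("F-measure:", round(f_measure, 2))  # 0.93
```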

### Creating a confusion matrix in Python

Below is an implementation of the confusion matrix using the sklearn library.
<br>
Please note:
The `actual` and `predicted` variables in the code below are used just for this example. After creating a machine learning model, you can replace them with the original data and the results predicted by the model.

```python
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Example labels only; replace with your test labels and model predictions.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Rows correspond to actual classes, columns to predicted classes.
results = confusion_matrix(actual, predicted)
print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(actual, predicted))
print('Report : ')
print(classification_report(actual, predicted))
```

```
Confusion Matrix :
[[4 2]
 [1 3]]
Accuracy Score : 0.7
Report : 
              precision    recall  f1-score   support

           0       0.80      0.67      0.73         6
           1       0.60      0.75      0.67         4

    accuracy                           0.70        10
   macro avg       0.70      0.71      0.70        10
weighted avg       0.72      0.70      0.70        10
```
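
To connect this output back to the terms defined earlier: for binary labels 0 and 1, sklearn puts the actual classes in rows and the predicted classes in columns, sorted by label, so the matrix reads `[[TN, FP], [FN, TP]]` when class 1 is the positive class. This order may differ from tables that list the positive class first. The sketch below copies the printed matrix into a plain list (so it runs without sklearn) and recomputes the row for class 1 of the report.

```python
# Matrix copied from the output above: rows = actual, columns = predicted.
results = [[4, 2],
           [1, 3]]

# With labels sorted as [0, 1] and class 1 as positive:
(tn, fp), (fn, tp) = results

precision_1 = tp / (tp + fp)  # 3 / 5
recall_1 = tp / (tp + fn)     # 3 / 4
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("TN, FP, FN, TP:", tn, fp, fn, tp)   # 4 2 1 3
print("precision:", round(precision_1, 2))  # 0.6
print("recall   :", round(recall_1, 2))     # 0.75
print("f1-score :", round(f1_1, 2))         # 0.67
print("accuracy :", round(accuracy, 2))     # 0.7
```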


