### Confusion Matrix

1) It is a performance-evaluation tool for classification problems.<br>
2) It is a square matrix; for binary classification it has 4 cells - TP, TN, FP, FN<br>
3) If the target variable has n categories, the shape of the confusion matrix will be (n, n)<br>

#### Terminologies

<b>1) TP (True Positive) -</b><br>
Actual value is positive, and the ML model also predicted a positive value.

<b>2) FN (False Negative) -</b><br>
Actual value is positive, but the ML model predicted a negative value.

<b>3) FP (False Positive) -</b><br>
Actual value is negative, but the ML model predicted a positive value.

<b>4) TN (True Negative) -</b><br>
Actual value is negative, and the ML model also predicted a negative value.

#### Note
Number of actual positive cases = TP + FN<br>
Number of actual negative cases = FP + TN<br>
Number of cases predicted positive = TP + FP<br>
Number of cases predicted negative = FN + TN<br>
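
The four identities above can be checked by counting outcomes directly. The following is a minimal sketch on a small made-up sample; the lists `actual` and `predicted` are hypothetical and use `True` for the positive class.

```python
# Hypothetical sample: True = positive class, False = negative class
actual    = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, True, False, False, True, True]

pairs = list(zip(actual, predicted))

# Count each of the four outcomes
tp = sum(a and p for a, p in pairs)              # actual positive, predicted positive
fn = sum(a and not p for a, p in pairs)          # actual positive, predicted negative
fp = sum((not a) and p for a, p in pairs)        # actual negative, predicted positive
tn = sum((not a) and (not p) for a, p in pairs)  # actual negative, predicted negative

# Totals taken straight from the raw labels
actual_pos = sum(actual)
actual_neg = len(actual) - actual_pos
pred_pos = sum(predicted)
pred_neg = len(predicted) - pred_pos

print(tp, fn, fp, tn)
print(actual_pos == tp + fn, actual_neg == fp + tn)
print(pred_pos == tp + fp, pred_neg == fn + tn)
```

Each comparison should print `True`, since every case falls into exactly one of the four outcomes.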

In [1]:
# x = ['Age','Gender','BMI','Body_Weight','Blood_Glucose_Level']
# y = Diabetic (0) or Non-diabetic(1)

In [3]:
# Binary Classification Example
# 0 - Positive (Diabetic), 1 - Negative (Non-diabetic)
y_true = [1,0,1,1,0,0,1,0,0,1,0,1,1,0,1,0,0,1]  # Actual values
y_pred = [0,1,0,1,0,0,1,1,0,1,1,0,0,1,0,1,0,0]  # Predicted values 
print(len(y_true),len(y_pred))

18 18


### Classification Metrics

<b>1) Precision - TP/(TP+FP) for the positive class, TN/(TN+FN) for the negative class</b><br>
Of all the cases predicted positive, what fraction are actually positive<br>
Of all the cases predicted negative, what fraction are actually negative

<b>2) Recall - TP/(TP+FN) for the positive class, TN/(TN+FP) for the negative class</b><br>
Of all the actual positive cases, what fraction did the ML model predict as positive<br>
Of all the actual negative cases, what fraction did the ML model predict as negative (also called specificity)<br>

<b>3) F1-Score - 2 * Precision * Recall / (Precision + Recall)</b><br>
It is the harmonic mean of Precision and Recall<br>

<b>4) Accuracy - (TP+TN)/(TP+FN+FP+TN)</b><br>
Of all the cases, what fraction did the model predict correctly
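
The formulas above translate directly into small helper functions. This is only an illustrative sketch: the function names `precision`, `recall`, `f1` and `accuracy` are hypothetical (they are not the scikit-learn functions), the counts are made up, and returning `0.0` for an empty denominator is an assumption chosen for convenience.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

def accuracy(tp, fn, fp, tn):
    """Fraction of all cases predicted correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

# Hypothetical counts, for illustration only
tp, fn, fp, tn = 8, 2, 4, 6

pos_precision = precision(tp, fp)  # 8 / 12
pos_recall = recall(tp, fn)        # 8 / 10
# Swapping the roles of the two classes gives the negative-class metrics
neg_precision = precision(tn, fn)  # 6 / 8
neg_recall = recall(tn, fp)        # 6 / 10

print('Precision (pos)', pos_precision)
print('Recall (pos)', pos_recall)
print('F1 (pos)', f1(pos_precision, pos_recall))
print('Accuracy', accuracy(tp, fn, fp, tn))
```

Passing `(tn, fn)` and `(tn, fp)` reuses the same two functions for the negative class, which mirrors the paired formulas in the list above.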

In [4]:
print(y_true)
print(y_pred)

[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
[0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0]


In [9]:
from sklearn.metrics import confusion_matrix,classification_report

In [7]:
cm = confusion_matrix(y_true,y_pred)
print(cm)
# Rows = actual values, columns = predicted values, labels in sorted order (0, 1)
# With 0 treated as the positive class:
# Actual value = 0, predicted value is 0 => TP = 4
# Actual value = 0, predicted value is 1 => FN = 5
# Actual value = 1, predicted value is 0 => FP = 6
# Actual value = 1, predicted value is 1 => TN = 3
# [TP=4 FN=5]
# [FP=6 TN=3]

[[4 5]
 [6 3]]
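
To see where each cell comes from, the same matrix can be rebuilt by hand. The sketch below is a plain-Python illustration, not how scikit-learn implements it; the helper name `build_confusion_matrix` is hypothetical, and it assumes rows are actual labels and columns are predicted labels, both in sorted order.

```python
def build_confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) pairs into an n x n nested list."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        # Row = actual label, column = predicted label
        matrix[index[actual]][index[predicted]] += 1
    return matrix

# Same data as the binary example
y_true = [1,0,1,1,0,0,1,0,0,1,0,1,1,0,1,0,0,1]
y_pred = [0,1,0,1,0,0,1,1,0,1,1,0,0,1,0,1,0,0]

manual_cm = build_confusion_matrix(y_true, y_pred)
print(manual_cm)  # [[4, 5], [6, 3]]
```

Because the number of rows and columns follows the number of distinct labels, the same helper yields an (n, n) matrix for an n-class target.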


In [10]:
print(classification_report(y_true,y_pred))

              precision    recall  f1-score   support

           0       0.40      0.44      0.42         9
           1       0.38      0.33      0.35         9

    accuracy                           0.39        18
   macro avg       0.39      0.39      0.39        18
weighted avg       0.39      0.39      0.39        18



In [12]:
# Precision - TP/(TP+FP), TN/(TN+FN)
# Recall - TP/(TP+FN), TN/(TN+FP)
# F1-Score - 2 * Precision * Recall /( Precision + Recall)
# Acc = (TP+TN)/(TP+TN+FP+FN)

### Validation

In [11]:
# [TP=4 FN=5]
# [FP=6 TN=3]
print(cm)

[[4 5]
 [6 3]]


In [14]:
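pre0 = 4/(4+6)  # precision for class 0: TP/(TP+FP)
pre1 = 3/(3+5)  # precision for class 1: TN/(TN+FN)
rec0 = 4/(4+5)  # recall for class 0: TP/(TP+FN)
rec1 = 3/(3+6)  # recall for class 1: TN/(TN+FP)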
print('Pre0',pre0)
print('Pre1',pre1)
print('Rec0',rec0)
print('Rec1',rec1)

Pre0 0.4
Pre1 0.375
Rec0 0.4444444444444444
Rec1 0.3333333333333333


In [15]:
f1s0 = 2*pre0*rec0/(pre0+rec0)
f1s1 = 2*pre1*rec1/(pre1+rec1)
print('F1_Score0',f1s0)
print('F1_Score1',f1s1)

F1_Score0 0.4210526315789474
F1_Score1 0.35294117647058826


In [16]:
acc = (4+3)/(4+5+6+3)
print('Accuracy',acc)

Accuracy 0.3888888888888889
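
The `macro avg` and `weighted avg` rows of the report can be validated the same way. A minimal sketch, assuming the macro average is the unweighted mean of the per-class values and the weighted average uses each class's support (number of actual cases) as its weight; the variable names are illustrative.

```python
cm = [[4, 5], [6, 3]]  # rows = actual, columns = predicted
n = len(cm)

support = [sum(row) for row in cm]  # actual cases per class
precision = [cm[k][k] / sum(cm[r][k] for r in range(n)) for k in range(n)]
recall = [cm[k][k] / support[k] for k in range(n)]

# Macro average: every class counts equally
macro_precision = sum(precision) / n
macro_recall = sum(recall) / n

# Weighted average: classes count in proportion to their support
total = sum(support)
weighted_precision = sum(p * s for p, s in zip(precision, support)) / total
weighted_recall = sum(r * s for r, s in zip(recall, support)) / total

print('Macro precision', macro_precision)
print('Macro recall', macro_recall)
print('Weighted precision', weighted_precision)
print('Weighted recall', weighted_recall)
```

Both classes have a support of 9 here, so the macro and weighted averages coincide; they would differ on an imbalanced dataset.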


#### 3-class classification

In [20]:
y_true = [1,0,1,2,0,2,1,0,2,1,0,1,1,2,1,2,2,1]
y_pred = [0,1,2,1,0,2,1,1,0,1,1,0,2,1,0,1,2,0]
print(len(y_true),len(y_pred))

18 18


#### Exercise
1) Generate the Confusion Matrix<br>
2) Validate the Precision, Recall, Accuracy and F1-score values reported by classification_report

In [5]:
# Pre = TP/(TP+FP)
# Rec = TP/(TP+FN)
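
With more than two classes, the same two formulas apply one class at a time (one-vs-rest). The sketch below uses a made-up 3x3 matrix rather than the exercise data, so `example_cm` is hypothetical; it assumes rows are actual labels and columns are predicted labels.

```python
# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted)
example_cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 2, 4],
]
n = len(example_cm)

per_class = {}
for k in range(n):
    tp = example_cm[k][k]                               # diagonal cell
    fp = sum(example_cm[r][k] for r in range(n)) - tp   # rest of column k
    fn = sum(example_cm[k]) - tp                        # rest of row k
    per_class[k] = {
        'precision': tp / (tp + fp),
        'recall': tp / (tp + fn),
    }

# Accuracy = correctly predicted cases (the diagonal) over all cases
total = sum(sum(row) for row in example_cm)
acc = sum(example_cm[k][k] for k in range(n)) / total

for k, scores in per_class.items():
    print(k, scores)
print('Accuracy', acc)
```

For class k, everything else in column k is a false positive and everything else in row k is a false negative, which is what the two `sum` expressions compute.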
