
## Confusion Matrix

In [0]:
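from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve
import numpy as np

# Ground-truth labels and the classifier's predictions (1 = positive, 0 = negative)
expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]]
results = confusion_matrix(expected, predicted)
print(results)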

[[4 2]
 [1 3]]


<img src=https://cdn-images-1.medium.com/max/800/1*OhEnS-T54Cz0YSTl_c3Dwg.jpeg
alt="Confusion matrix!" />
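
The four cells of the matrix can be unpacked by name. The sketch below assumes scikit-learn's binary layout, where rows are actual classes and columns are predicted classes in label order (0, then 1), so flattening the matrix yields TN, FP, FN, TP. The variable names are only illustrative.

```python
from sklearn.metrics import confusion_matrix

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# For binary labels {0, 1}, the flattened matrix is ordered TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(expected, predicted).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=4, FP=2, FN=1, TP=3
```

For this example, the positive class has TP = 3 and FN = 1, and the negative class has TN = 4 and FP = 2.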


### Accuracy
$$
\text{Accuracy} = \frac{\text{Correct predictions}}{\text{Total}} = \frac{TP+TN}{TP+FP+TN+FN}
$$

In [0]:
accuracy = (3+4)/(4+2+1+3)
accuracy

0.7
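
The same value can be obtained without reading counts off the matrix by hand. This is a minimal sketch using scikit-learn's `accuracy_score` on the same ten labels.

```python
from sklearn.metrics import accuracy_score

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Fraction of samples whose predicted label matches the expected label.
accuracy = accuracy_score(expected, predicted)
print(accuracy)  # 0.7
```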

However, accuracy can be misleading when the classes are imbalanced. Consider this example.

In [0]:
cm = np.array([[990, 0], [10, 0]])
print(cm)

accuracy = (990+0)/(1000)
print('accuracy is', accuracy )

[[990   0]
 [ 10   0]]
accuracy is 0.99


Here, accuracy is 99%, but only because 990 of the 1,000 samples are true negatives. <br>
If the positive class represents patients with a contagious disease, this classifier is of no use: it labels all 10 sick patients as healthy (false negatives), and the cost of such a misclassification is very high.
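
To make the problem concrete, the sketch below rebuilds label lists that match the matrix above (990 actual negatives, 10 actual positives, every sample predicted negative) and compares accuracy with recall, the fraction of actual positives that are detected, which is defined later in this document. The names `y_true` and `y_pred` are illustrative.

```python
from sklearn.metrics import accuracy_score, recall_score

# 990 healthy samples (0) and 10 sick samples (1); the classifier always predicts 0.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)  # TP / (TP + FN) = 0 / 10
print(accuracy, recall)  # 0.99 0.0
```

The classifier scores 99% accuracy while detecting none of the sick patients.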

### Precision
$$
\text{Precision} = \frac{TP}{\text{Predicted Positives}} = \frac{TP}{TP+FP}
$$

- Precision measures how many of the samples predicted positive are actually positive.
- Precision is a good measure when the cost of a false positive is high. <br>
  For example, in email spam detection, if a non-spam email (an actual negative) is classified as spam (a false positive), the recipient
   may lose important information. <br>
   <img src=https://cdn-images-1.medium.com/max/800/1*PULzWEven_XAZjiMNizDCg.png
alt="Confusion matrix!" />
   
   

In [0]:
precision = 3/(3+2)  # TP / (TP + FP), with TP = 3 and FP = 2
precision

0.6
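
As a cross-check, scikit-learn's `precision_score` computes the same ratio directly from the labels. This sketch assumes the ten-sample example from the confusion matrix section, where TP = 3 and FP = 2.

```python
from sklearn.metrics import precision_score

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Of the 5 samples predicted positive, 3 are actually positive.
precision = precision_score(expected, predicted)
print(precision)  # 0.6
```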

### Recall
$$
\text{Recall} = \frac{TP}{\text{Actual Positives}} = \frac{TP}{TP+FN}
$$
 <img src=https://cdn-images-1.medium.com/max/800/1*BBhWQC-m0CLN4sVJ0h5fJQ.jpeg
alt="Confusion matrix!" />
- Recall is the metric to use for model selection when a false negative carries a high cost. <br>
- For example, in disease detection, if a sick patient (an actual positive) is predicted as not sick (a false negative), the associated cost may be very high if the disease is contagious. <br>

In [0]:
recall = 3/(3+1)  # TP / (TP + FN), with TP = 3 and FN = 1
recall

0.75
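
Recall can also be computed from the labels with scikit-learn's `recall_score`. The sketch below assumes the ten-sample example from the confusion matrix section, which contains 4 actual positives, 3 of them predicted positive (TP = 3, FN = 1).

```python
from sklearn.metrics import recall_score

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Of the 4 actual positives, 3 are predicted positive: TP / (TP + FN) = 3 / 4.
recall = recall_score(expected, predicted)
print(recall)  # 0.75
```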

### F1 score
The $F_1$ score is the harmonic mean of precision and recall:
$$
F_1 = \left(\frac{\text{recall}^{-1} + \text{precision}^{-1}}{2}\right)^{-1} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision}+\text{recall}}
$$
- The F1 score is useful when you want a balance between precision and recall.
- As shown earlier, accuracy is not informative when the classes are imbalanced, for example when the number of true negatives is very high. <br>
  In such a case, the F1 score is a better measure.

In [0]:
precision, recall = 3/(3+2), 3/(3+1)  # 0.6 and 0.75, from TP = 3, FP = 2, FN = 1
f_1_score = 2*precision*recall/(precision+recall)
round(f_1_score, 4)

0.6667
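
The harmonic-mean formula can be checked against scikit-learn's `f1_score`. This is a sketch on the ten-sample example from the confusion matrix section, where TP = 3, FP = 2 and FN = 1, so the formula reduces to 2·TP / (2·TP + FP + FN) = 6/9.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

precision = precision_score(expected, predicted)
recall = recall_score(expected, predicted)

# Harmonic mean computed by hand, then by the library.
f1_manual = 2 * precision * recall / (precision + recall)
f1_library = f1_score(expected, predicted)
print(round(f1_manual, 4), round(f1_library, 4))
```

Both values agree up to floating-point rounding.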

### False Acceptance Rate
$$
FAR = \frac{FP}{FP + TN}
$$

- False acceptance rate is the likelihood that a system will classify an actual negative sample as positive (a false positive).
- A high $FAR$ is extremely harmful when the cost of a false positive is very high.
- For example, in biometric systems that grant authorization to users, giving access to an unauthorized person (an actual negative) can be devastating.
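
A minimal sketch of the computation on the ten-sample example from the confusion matrix section. It assumes the usual definition of FAR as the share of actual negatives that are accepted as positive, FP / (FP + TN), and the variable names are illustrative.

```python
from sklearn.metrics import confusion_matrix

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(expected, predicted).ravel()

# Share of actual negatives that were wrongly accepted as positive: 2 / (2 + 4).
far = fp / (fp + tn)
print(round(far, 4))  # 0.3333
```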

### False Rejection Rate
$$
FRR = \frac{FN}{FN + TP}
$$
- False rejection rate is the likelihood that a system will classify an actual positive sample as negative (a false negative).
- A high $FRR$ is harmful when the cost of a false negative is very high.
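
A minimal sketch of the computation on the ten-sample example from the confusion matrix section. It assumes the usual definition of FRR as the share of actual positives that are rejected, FN / (FN + TP), which equals 1 minus recall. The variable names are illustrative.

```python
from sklearn.metrics import confusion_matrix

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(expected, predicted).ravel()

# Share of actual positives that were wrongly rejected as negative: 1 / (1 + 3).
frr = fn / (fn + tp)
print(frr)  # 0.25
```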