# Precision and Recall in Multiclass Classification
Reference: <https://towardsdatascience.com/multi-class-metrics-made-simple-part-i-precision-and-recall-9250280bddc2>

In [2]:
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    classification_report,
)

In [3]:
C = "cat"
F = "fish"
H = "hen"

![boaz_shmueli.png](boaz_shmueli.png)

In [4]:
# 6 cats, 10 fish, 9 hens; predictions are listed in the same order as y_true
y_true = [C] * 6 + [F] * 10 + [H] * 9
y_pred = [C] * 4 + [F, H] + \
         [C] * 6 + [F, F, H, H] + \
         [C] * 3 + [H] * 6

In [5]:
# sklearn puts true labels on the rows and predicted labels on the columns,
# with labels sorted (cat, fish, hen). The figure above arranges them differently.
confusion_matrix(y_true, y_pred)

array([[4, 1, 1],
       [6, 2, 2],
       [3, 0, 6]])
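
To make the row/column convention concrete, here is a minimal sketch that rebuilds the same matrix by hand, without sklearn. The names `labels`, `pair_counts` and `matrix` are introduced here for illustration only; the label order is assumed to be the sorted order that sklearn uses by default.

```python
from collections import Counter

labels = ["cat", "fish", "hen"]  # sorted, assumed to match sklearn's default ordering
y_true = ["cat"] * 6 + ["fish"] * 10 + ["hen"] * 9
y_pred = (["cat"] * 4 + ["fish", "hen"]
          + ["cat"] * 6 + ["fish"] * 2 + ["hen"] * 2
          + ["cat"] * 3 + ["hen"] * 6)

# Count how often each (true label, predicted label) pair occurs.
pair_counts = Counter(zip(y_true, y_pred))

# Row i is the true label, column j is the predicted label.
matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]
for label, row in zip(labels, matrix):
    print(f"{label:>4}", row)
```

Reading across a row shows what happened to one true class; reading down a column shows everything that was predicted as that class.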

In [6]:
# The report is returned as a plain string, so use `print()` to render it properly.
print(classification_report(y_true, y_pred, digits=3))

              precision    recall  f1-score   support

         cat      0.308     0.667     0.421         6
        fish      0.667     0.200     0.308        10
         hen      0.667     0.667     0.667         9

    accuracy                          0.480        25
   macro avg      0.547     0.511     0.465        25
weighted avg      0.581     0.480     0.464        25



## `precision`, `recall`
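Take `fish` as an example. Its column of the confusion matrix holds everything predicted as `fish`, and its row holds everything that truly is `fish`.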

In [7]:
# precision: correct fish predictions / all fish predictions (the `fish` column)
2 / (1+2+0)

0.6666666666666666

In [8]:
# recall: correct fish predictions / all true fish (the `fish` row)
2 / (6+2+2)

0.2
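
The same arithmetic can be done for all three classes at once. The following is a sketch that assumes the confusion matrix printed earlier (rows are true labels, columns are predictions); the variable names are illustrative, not part of sklearn.

```python
import numpy as np

# Confusion matrix from above: rows are true labels, columns are predictions.
cm = np.array([[4, 1, 1],
               [6, 2, 2],
               [3, 0, 6]])

true_positives = np.diag(cm)

# Precision divides by the column sums (how often each class was predicted).
precision_per_class = true_positives / cm.sum(axis=0)
# Recall divides by the row sums (how often each class truly occurs).
recall_per_class = true_positives / cm.sum(axis=1)

print(np.round(precision_per_class, 3))
print(np.round(recall_per_class, 3))
```

Note that this sketch divides by zero if a class is never predicted; that does not happen with this data.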

## `accuracy` Equals `0.480`
Where does this number come from?
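
Accuracy is the fraction of all predictions that are correct, i.e. the diagonal of the confusion matrix divided by the total count. A minimal check, reusing the matrix printed earlier (variable names are illustrative):

```python
import numpy as np

cm = np.array([[4, 1, 1],
               [6, 2, 2],
               [3, 0, 6]])

correct = np.trace(cm)  # diagonal: 4 + 2 + 6 = 12 correct predictions
total = cm.sum()        # 25 samples in total
accuracy = correct / total
print(accuracy)
```

With 12 correct predictions out of 25, this reproduces the `0.480` in the report.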

In [20]:
import numpy as np

In [33]:
EPSILON = 1e-6

In [37]:
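# Per-class scores copied (already rounded) from the report above,
# in the order cat, fish, hen.
precisions = np.array([0.308, 0.667, 0.667])
recalls = np.array([0.667, 0.200, 0.667])
support = np.array([6, 10, 9])
# F1 is the harmonic mean of precision and recall.
# EPSILON guards against division by zero when precision + recall is 0.
f1s = 2 * (precisions * recalls) / (precisions + recalls + EPSILON)
f1s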

array([0.42140675, 0.30772744, 0.6669995 ])
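
These values differ slightly from the report's `0.421`, `0.308`, `0.667` in the later digits because the inputs were rounded and `EPSILON` was added. As a cross-check, here is a sketch that computes F1 directly from the confusion-matrix counts, using the identity 2PR / (P + R) = 2·TP / (predicted count + true count). The variable names are illustrative.

```python
import numpy as np

cm = np.array([[4, 1, 1],
               [6, 2, 2],
               [3, 0, 6]])

tp = np.diag(cm)
predicted_counts = cm.sum(axis=0)  # column sums: times each class was predicted
true_counts = cm.sum(axis=1)       # row sums: times each class truly occurs

# Algebraically the same as 2 * P * R / (P + R), but without rounded inputs.
f1_exact = 2 * tp / (predicted_counts + true_counts)
print(np.round(f1_exact, 3))
```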

## `micro avg`?
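The report above has no `micro avg` row. Micro-averaging pools the true positives, false positives and false negatives of all classes before dividing. In a single-label multiclass problem like this one, every wrong prediction is simultaneously a false positive for one class and a false negative for another, so micro precision, micro recall and micro F1 all collapse to the accuracy. The sketch below checks this on the confusion matrix from earlier; the variable names are illustrative.

```python
import numpy as np

cm = np.array([[4, 1, 1],
               [6, 2, 2],
               [3, 0, 6]])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp  # predicted as the class but actually something else
fn = cm.sum(axis=1) - tp  # actually the class but predicted as something else

# Pool the counts over all classes, then compute each metric once.
micro_precision = tp.sum() / (tp.sum() + fp.sum())
micro_recall = tp.sum() / (tp.sum() + fn.sum())
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(micro_precision, micro_recall, micro_f1)
```

All three come out equal to the `accuracy` row, which is presumably why the report shows `accuracy` in place of `micro avg` here.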

## `macro avg`
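This is simply the unweighted arithmetic mean of the per-class scores: every class counts equally, regardless of how many samples it has.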

In [13]:
# macro avg precision, by hand
(0.308 + 0.667*2) / 3

0.5473333333333333

In [26]:
np.sum(precisions) / len(precisions)

0.5473333333333333

In [14]:
# macro avg recall, by hand
(0.667*2 + 0.2) / 3

0.5113333333333333

In [27]:
np.sum(recalls) / len(recalls)

0.5113333333333333

In [39]:
# macro avg f1-score
np.sum(f1s) / len(f1s)

0.46537789644768224

## `weighted avg`
The weighted average weights each class's score by its `support`, which is the number of occurrences of that class in `y_true`. (In this case: six cats, ten fish and nine hens.)

Let's verify the `weighted avg` values numerically.

In [24]:
# Support-weighted average of the per-class precisions (report: 0.581)
np.dot(precisions, support / np.sum(support))

0.58084

In [25]:
# Support-weighted average of the per-class recalls (report: 0.480)
np.dot(recalls, support / np.sum(support))

0.4802
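
This value matches the accuracy of `0.480` up to the rounding of the hard-coded inputs, and that is not a coincidence. Writing $n_k$ for the support of class $k$, $N$ for the total number of samples and $\mathrm{TP}_k$ for the correct predictions of class $k$ (notation introduced here, not in the referenced article), the supports cancel:

$$
\sum_k \frac{n_k}{N} \cdot \frac{\mathrm{TP}_k}{n_k} = \frac{\sum_k \mathrm{TP}_k}{N} = \text{accuracy}
$$

For this data that is $(4 + 2 + 6) / 25 = 0.48$.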

In [35]:
# Support-weighted average of the per-class F1 scores (report: 0.464)
np.dot(f1s, support / np.sum(support))

0.4643484161731828
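
The checks above start from scores that were already rounded to three digits. As a cross-check, the sketch below recomputes both averages from the raw confusion-matrix counts, using `np.average` with `weights=` for the weighted case. The `averages` dictionary and the other variable names are illustrative.

```python
import numpy as np

cm = np.array([[4, 1, 1],
               [6, 2, 2],
               [3, 0, 6]])

tp = np.diag(cm)
predicted_counts = cm.sum(axis=0)
support = cm.sum(axis=1)  # occurrences of each class in y_true

per_class = {
    "precision": tp / predicted_counts,
    "recall": tp / support,
    "f1-score": 2 * tp / (predicted_counts + support),
}

averages = {}
for name, scores in per_class.items():
    macro = scores.mean()                           # every class counts equally
    weighted = np.average(scores, weights=support)  # classes count by support
    averages[name] = (macro, weighted)
    print(f"{name:>9}  macro={macro:.3f}  weighted={weighted:.3f}")
```

Rounded to three digits, these agree with the `macro avg` and `weighted avg` rows of the report.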