# Evaluation metrics for Multi-class classification problems

We can easily extend what we already learnt about [evaluation metrics for Binary classification problems](1_Binary_Classification_Evaluation_Metrics.ipynb), namely:
+ Accuracy score
+ Precision
+ Recall
+ AUC
+ Log loss

First, what are **multi-class classification** problems?  
In binary classification problems, we classified each sample into one of 2 classes (pneumothorax or non-pneumothorax, etc.). Here, we classify a given sample into one of more than 2 classes. For example, in the Iris dataset, each sample must be classified as one of 3 species: Setosa, Versicolour or Virginica.

**What are the evaluation metrics for this type of problem?**  
The concepts of precision, recall, etc. remain the same: we compute the underlying counts or scores for each class separately and then combine them, for example with a plain or a weighted average. For precision, this gives the following metrics:
+ Macro averaged precision
+ Micro averaged precision
+ Weighted precision

The same averaging schemes apply to recall, F1 score, etc.
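All of these metrics rest on the same one-vs-rest step: for each class $k$, relabel the data as "class $k$" (1) versus "any other class" (0), so that the binary-classification tools apply unchanged. The sketch below assumes the same toy labels as the `sklearn` comparison at the end of this notebook; `one_vs_rest` is a hypothetical helper that does the same job as the list comprehensions inside the functions that follow.

```python
# Toy labels (the same ones used in the sklearn comparison later on)
targets = [0, 1, 2, 0, 1, 2, 0, 2, 2]
preds = [0, 2, 1, 0, 2, 1, 0, 0, 2]


def one_vs_rest(labels, positive_class):
    """Hypothetical helper: 1 where the label equals positive_class, else 0."""
    return [1 if label == positive_class else 0 for label in labels]


# One binary problem per class; binary precision/recall apply to each of them
for k in sorted(set(targets) | set(preds)):
    print(f"class {k}: true={one_vs_rest(targets, k)} pred={one_vs_rest(preds, k)}")
```

For class 0, for example, this yields 3 true positives (positions 0, 3 and 6) and 1 false positive (position 7).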

Let's take a closer look at the precision-based evaluation metrics.

## Macro averaged precision

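Compute the precision of each class individually, treating that class as positive and all other classes as negative, and then take the unweighted mean of these per-class precisions.  
Because it is an unweighted mean, every class counts equally, no matter how many samples it has.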
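In symbols, with $K$ classes and $TP_k$, $FP_k$ the true and false positives of the one-vs-rest problem for class $k$, macro-averaged precision is:

$$
\text{Precision}_{\text{macro}} = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k}
$$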

In [1]:
import numpy as np
from eval_metrics import *  # local module holding the helpers (e.g. true_positive, false_positive) written in the Binary_Classification notebook

In [2]:
def macro_average_precision(y_true, y_pred):
    """
    Compute the Macro average precision for Multi-class classification problem.
    
    :param y_true: Actual target values
    :param y_pred: Predicted values from the model
    :returns: macro-averaged precision for the given values.
    """
    
    # Collect every class that appears in the targets or the predictions
    classes = list(np.union1d(y_true, y_pred))
    
    precision = 0
    
    for class_ in classes:
        temp_true = [1 if yt == class_ else 0 for yt in y_true]
        temp_pred = [1 if yp == class_ else 0 for yp in y_pred]
        
        tp = true_positive(temp_true, temp_pred)
        fp = false_positive(temp_true, temp_pred)
        
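        # A class that is never predicted has tp + fp == 0; treat its precision as 0
        temp_precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0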
        precision += temp_precision
    
    precision /= len(classes)
    return precision

## Micro averaged precision

Count the true positives (TP) and false positives (FP) of every class, sum them over all classes, and compute a single precision from these pooled totals.
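Written out with the same per-class counts $TP_k$ and $FP_k$, micro-averaged precision pools the counts before dividing:

$$
\text{Precision}_{\text{micro}} = \frac{\sum_{k=1}^{K} TP_k}{\sum_{k=1}^{K} \left( TP_k + FP_k \right)}
$$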

In [3]:
def micro_average_precision(y_true, y_pred):
    """
    Compute the Micro average precision for Multi-class classification problem.
    
    :param y_true: Actual target values
    :param y_pred: Predicted values from the model
    :returns: micro-averaged precision for the given values.
    """
    
    # Collect every class that appears in the targets or the predictions
    classes = list(np.union1d(y_true, y_pred))
    
    tp = 0
    fp = 0
    
    for class_ in classes:
        temp_true = [1 if yt == class_ else 0 for yt in y_true]
        temp_pred = [1 if yp == class_ else 0 for yp in y_pred]
        
        temp_tp = true_positive(temp_true, temp_pred)
        temp_fp = false_positive(temp_true, temp_pred)
        
        tp += temp_tp
        fp += temp_fp
    
    return tp / (tp + fp)
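A consequence worth checking: in a single-label multi-class problem, every prediction is either a TP for exactly one class (if correct) or an FP for exactly one class (if wrong). So, as long as every predicted label is among the classes being summed over, the pooled $TP + FP$ equals the number of samples and micro-averaged precision reduces to plain accuracy. A minimal self-contained sketch, assuming the same toy labels as the comparison at the end:

```python
# Toy labels (same as in the sklearn comparison)
targets = [0, 1, 2, 0, 1, 2, 0, 2, 2]
preds = [0, 2, 1, 0, 2, 1, 0, 0, 2]

classes = sorted(set(targets) | set(preds))

# Pool one-vs-rest counts over all classes
tp = sum(1 for k in classes for t, p in zip(targets, preds) if p == k and t == k)
fp = sum(1 for k in classes for t, p in zip(targets, preds) if p == k and t != k)

micro_precision = tp / (tp + fp)
accuracy = sum(t == p for t, p in zip(targets, preds)) / len(targets)

print(tp, fp, micro_precision, accuracy)
```

Here the pooled counts are 4 TP and 5 FP, so both numbers come out as 4/9.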

## Weighted precision

It is defined like macro-averaged precision, except that the per-class precisions are combined in a weighted average, where each class's weight is its number of true samples (its support).
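Written out, with $n_k$ the number of samples whose true label is $k$ and $N$ the total number of samples:

$$
\text{Precision}_{\text{weighted}} = \frac{1}{N} \sum_{k=1}^{K} n_k \, \frac{TP_k}{TP_k + FP_k}, \qquad N = \sum_{k=1}^{K} n_k
$$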

In [4]:
from collections import Counter    

In [5]:
def weighted_average_precision(y_true, y_pred):
    """
    Compute the Weighted average precision for Multi-class classification problem.
    
    :param y_true: Actual target values
    :param y_pred: Predicted values from the model
    :returns: weighted-averaged precision for the given values.
    """
    
    cnt_classes = Counter(y_true)
    
    precision = 0
    
    for class_, cnt in cnt_classes.items():
        temp_true = [1 if yt == class_ else 0 for yt in y_true]
        temp_pred = [1 if yp == class_ else 0 for yp in y_pred]
        
        tp = true_positive(temp_true, temp_pred)
        fp = false_positive(temp_true, temp_pred)
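        # A class that is never predicted has tp + fp == 0; treat its precision as 0
        temp_precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0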
        
        precision += temp_precision * cnt
    
    return precision / len(y_true)
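The functions above rely on `true_positive` and `false_positive` from the `eval_metrics` module. As an alternative sketch that needs only NumPy, all three averages can be read off a single confusion matrix: the diagonal gives $TP_k$, the column sums give $TP_k + FP_k$, and the row sums give the support $n_k$. `precision_averages` is a hypothetical name. Labels are taken from the union of targets and predictions, and a class that is never predicted gets precision 0, which is intended to roughly mirror `sklearn`'s default handling (`sklearn` also emits a warning in that case).

```python
import numpy as np


def precision_averages(y_true, y_pred):
    """Hypothetical helper: (macro, micro, weighted) precision from a confusion matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.union1d(y_true, y_pred)  # sorted union of all labels
    k = len(labels)

    # cm[i, j] = number of samples with true label labels[i] predicted as labels[j]
    true_idx = np.searchsorted(labels, y_true)
    pred_idx = np.searchsorted(labels, y_pred)
    cm = np.zeros((k, k), dtype=np.int64)
    np.add.at(cm, (true_idx, pred_idx), 1)

    tp = np.diag(cm)            # correct predictions per class
    predicted = cm.sum(axis=0)  # TP + FP per class (column sums)
    support = cm.sum(axis=1)    # true samples per class (row sums)

    # Per-class precision; classes that are never predicted get 0.0
    per_class = np.divide(tp, predicted, out=np.zeros(k), where=predicted > 0)

    macro = per_class.mean()
    micro = tp.sum() / predicted.sum()
    weighted = (per_class * support).sum() / support.sum()
    return macro, micro, weighted


targets = [0, 1, 2, 0, 1, 2, 0, 2, 2]
preds = [0, 2, 1, 0, 2, 1, 0, 0, 2]
print(precision_averages(targets, preds))
```

On these toy labels it should reproduce the three values printed by the comparison that follows. Building the matrix once makes it cheap to derive recall as well (divide the diagonal by the row sums instead of the column sums).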

Let's compare our implementations with the results from the `sklearn` package:

In [6]:
targets = [0, 1, 2, 0, 1, 2, 0, 2, 2]
preds   = [0, 2, 1, 0, 2, 1, 0, 0, 2]

print(f'Macro    Precision: {macro_average_precision(targets, preds)}')
print(f'Micro    Precision: {micro_average_precision(targets, preds)}')
print(f'Weighted Precision: {weighted_average_precision(targets, preds)}')

Macro    Precision: 0.3611111111111111
Micro    Precision: 0.4444444444444444
Weighted Precision: 0.39814814814814814


In [7]:
from sklearn.metrics import precision_score

print(f'Macro    Precision: {precision_score(targets, preds, average="macro")}')
print(f'Micro    Precision: {precision_score(targets, preds, average="micro")}')
print(f'Weighted Precision: {precision_score(targets, preds, average="weighted")}')

Macro    Precision: 0.3611111111111111
Micro    Precision: 0.4444444444444444
Weighted Precision: 0.39814814814814814
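As noted at the start, recall and F1 follow the same pattern. A short sketch with the same toy labels: `recall_score` and `f1_score` from `sklearn.metrics` accept the same `average` argument, and `average=None` returns the per-class scores before any averaging.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

targets = [0, 1, 2, 0, 1, 2, 0, 2, 2]
preds = [0, 2, 1, 0, 2, 1, 0, 0, 2]

# average=None returns the per-class values that the averages combine
per_class_precision = precision_score(targets, preds, average=None)
print("Per-class precision:", per_class_precision)

# The same averaging options apply to recall and F1
for average in ("macro", "micro", "weighted"):
    r = recall_score(targets, preds, average=average)
    f1 = f1_score(targets, preds, average=average)
    print(f"{average:>8}  recall={r:.4f}  f1={f1:.4f}")
```

The per-class precisions are 0.75, 0 and 1/3, whose plain mean is the macro value above. Weighted recall comes out at 4/9, the accuracy: weighting each class's recall $TP_k / n_k$ by $n_k$ leaves $\sum_k TP_k / N$, so in single-label problems weighted recall always equals accuracy.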
