## Precision, Recall, F-measure, Support ==> multiclass

[Machine Learning with Imbalanced Data - Course](https://www.trainindata.com/p/machine-learning-with-imbalanced-data)

- **Precision** = tp / (tp + fp)

- **Recall** = tp / (tp + fn)

- **F1** = 2 * (precision * recall) / (precision + recall)

- **Support** = Number of observations in each class
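
To make the definitions above concrete, here is a minimal sketch that computes them from raw counts for a single class. The function name `precision_recall_f1` and the counts are made up for illustration; they do not come from the notebook's data, and the sketch does not handle the case where a denominator is zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# hypothetical counts for a single class (not from the wine dataset)
tp, fp, fn = 8, 2, 4

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
support = tp + fn  # observations that truly belong to the class

print('Precision: ', precision)
print('Recall: ', recall)
print('f score: ', f1)
print('Support: ', support)
```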


By default, sklearn assigns each observation to the class with the highest predicted probability. Because the class is chosen this way, the metrics below do not depend on a specific probability threshold.

In this notebook, we will obtain the values of the metrics:

- per class
- macro averaged
- micro averaged

In [1]:
import numpy as np
import pandas as pd

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from sklearn.metrics import (
    precision_recall_fscore_support,
    accuracy_score,
    balanced_accuracy_score,
)

## Load data (multiclass)

In [2]:
# load data
data = load_wine()

data = pd.concat([
    pd.DataFrame(data.data, columns=data.feature_names),
    pd.DataFrame(data.target, columns=['target']),
    ], axis=1)

data.head()

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 | 0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 | 0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0 |


In [3]:
# target distribution:
# multiclass and (fairly) balanced

data.target.value_counts(normalize=True)

target
1    0.398876
0    0.331461
2    0.269663
Name: proportion, dtype: float64

In [4]:
# separate dataset into train and test

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(labels=['target'], axis=1),  # drop the target
    data['target'],  # just the target
    test_size=0.3,
    random_state=0)

X_train.shape, X_test.shape

((124, 13), (54, 13))

## Train ML models

### Random Forests

Random forests return a probability vector for each observation, with one entry per class.

In [5]:
# set up the model
rf = RandomForestClassifier(n_estimators=10, random_state=39, max_depth=1, n_jobs=4)

# train the model
rf.fit(X_train, y_train)

# produce the predictions (as probabilities)
y_train_rf = rf.predict_proba(X_train)
y_test_rf = rf.predict_proba(X_test)

# note that the predictions are an array with 3 columns, one per class

# first column: the probability that the observation belongs to class 0
# second column: the probability that the observation belongs to class 1
# third column: the probability that the observation belongs to class 2

y_test_rf[0:10, :]

array([[0.59291486, 0.35444264, 0.0526425 ],
       [0.12139867, 0.33577091, 0.54283043],
       [0.30482504, 0.55905479, 0.13612017],
       [0.52711941, 0.38876082, 0.08411977],
       [0.27876443, 0.50875176, 0.21248381],
       [0.34573413, 0.49743863, 0.15682724],
       [0.51144556, 0.3911751 , 0.09737934],
       [0.034061  , 0.34869659, 0.6172424 ],
       [0.22335574, 0.59578725, 0.18085702],
       [0.22335574, 0.59578725, 0.18085702]])

In [6]:
# The final prediction is the class with the highest probability

rf.predict(X_test)[0:10]

array([0, 2, 1, 0, 1, 1, 0, 2, 1, 1])
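
As a quick check of how the class is derived from the probabilities, the sketch below takes the first three rows of `y_test_rf` printed above and selects, for each row, the column with the largest value. The `classes` array is written out by hand here; for a fitted model the column order is given by `rf.classes_`.

```python
import numpy as np

# first three rows of y_test_rf, copied from the output above
proba = np.array([
    [0.59291486, 0.35444264, 0.0526425],
    [0.12139867, 0.33577091, 0.54283043],
    [0.30482504, 0.55905479, 0.13612017],
])

# labels in column order; for a fitted model this is rf.classes_
classes = np.array([0, 1, 2])

# for each row, pick the class whose column holds the largest probability
pred = classes[np.argmax(proba, axis=1)]

print(pred)
```

The result matches the first three entries returned by `rf.predict(X_test)`.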

### Logistic Regression

Logistic regression also supports multiclass targets.

In [7]:
# set up the model
logit = LogisticRegression(
    random_state=0, multi_class='multinomial', max_iter=100,
)

# train
logit.fit(X_train, y_train)

# obtain the probabilities
y_train_logit = logit.predict_proba(X_train)
y_test_logit = logit.predict_proba(X_test)

# note that the predictions are an array with 3 columns, one per class

# first column: the probability that the observation belongs to class 0
# second column: the probability that the observation belongs to class 1
# third column: the probability that the observation belongs to class 2

y_test_logit[0:10, :]

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


array([[9.91180082e-01, 1.25239040e-03, 7.56752742e-03],
       [2.39665961e-06, 9.24182544e-08, 9.99997511e-01],
       [6.72505554e-03, 9.91780725e-01, 1.49421954e-03],
       [9.85740546e-01, 2.01772702e-03, 1.22417272e-02],
       [4.71428271e-03, 9.93594199e-01, 1.69151866e-03],
       [7.83097269e-04, 9.92144233e-01, 7.07266924e-03],
       [9.99369417e-01, 1.30593135e-04, 4.99990091e-04],
       [7.52750289e-04, 5.63891582e-05, 9.99190861e-01],
       [1.56499288e-02, 9.81839514e-01, 2.51055713e-03],
       [4.08763622e-03, 9.94374297e-01, 1.53806686e-03]])

In [8]:
# The final prediction is the class with the highest probability

logit.predict(X_test)[0:10]

array([0, 2, 1, 0, 1, 1, 0, 2, 1, 1])

In [9]:
# For the rest of the notebook, we work with the class predictions

y_rf_pred = rf.predict(X_test)

y_logit_pred = logit.predict(X_test)

## Metrics

### For each class

In [10]:
p, r, f, s = precision_recall_fscore_support(
    y_test,
    y_rf_pred,
    labels=[0,1,2], # the labels for which we want the metrics determined
    average=None, # when None, returns a metric per label
)

print('Precision: ', p)
print('Recall: ', r)
print('f score: ', f)
print('Support: ', s)
print()

Precision:  [1.         0.86956522 0.85714286]
Recall:  [0.89473684 0.90909091 0.92307692]
f score:  [0.94444444 0.88888889 0.88888889]
Support:  [19 22 13]
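
The per-class values can be reproduced by hand from TP, FP and FN counts. The notebook does not print those counts; the ones below are inferred from the precision, recall and support shown above (for example, a recall of 0.89473684 with a support of 19 implies 17 true positives), so treat them as an assumption rather than as notebook output.

```python
# counts for classes [0, 1, 2], inferred from the printed metrics
# (assumption: not produced by the notebook itself)
tp = [17, 20, 12]
fp = [0, 3, 2]
fn = [2, 2, 1]

precision = [t / (t + p) for t, p in zip(tp, fp)]
recall = [t / (t + n) for t, n in zip(tp, fn)]
f_score = [2 * p * r / (p + r) for p, r in zip(precision, recall)]
support = [t + n for t, n in zip(tp, fn)]  # true members of each class

print('Precision: ', precision)
print('Recall: ', recall)
print('f score: ', f_score)
print('Support: ', support)
```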



### Macro

Takes the unweighted mean of the per-class metrics, so every class counts equally regardless of its support.

In [11]:
p, r, f, s = precision_recall_fscore_support(
    y_test,
    y_rf_pred,
    labels=[0,1,2], # the labels for which we want the metrics determined
    average='macro', # take the average of the metrics
)

print('Precision: ', p)
print('Recall: ', r)
print('f score: ', f)
print('Support: ', s)
print()

Precision:  0.9089026915113871
Recall:  0.9089682247576985
f score:  0.9074074074074074
Support:  None



In [12]:
# For precision this is the same as:

(1. + 0.86956522 + 0.85714286) / 3

0.9089026933333333
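
The same applies to the f score: the macro value is the mean of the per-class f scores, not the harmonic mean of the macro precision and macro recall. The sketch below uses the rounded per-class values printed in the "For each class" section, so the results agree with sklearn only up to rounding.

```python
# per-class values copied from the "For each class" output
precision = [1.0, 0.86956522, 0.85714286]
recall = [0.89473684, 0.90909091, 0.92307692]
f_score = [0.94444444, 0.88888889, 0.88888889]

macro_p = sum(precision) / len(precision)
macro_r = sum(recall) / len(recall)
macro_f = sum(f_score) / len(f_score)  # mean of the per-class f scores

# harmonic mean of macro precision and macro recall, for comparison only
f_from_macro = 2 * macro_p * macro_r / (macro_p + macro_r)

print('macro f score: ', macro_f)
print('harmonic mean of macro precision and recall: ', f_from_macro)
```

Only the first value agrees with the macro f score reported above.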

### Weighted average

Takes the average of each metric weighted by the support.

In [13]:
p, r, f, s = precision_recall_fscore_support(
    y_test,
    y_rf_pred,
    labels=[0,1,2], # the labels for which we want the metrics determined
    average='weighted', # average the per-class metrics, weighted by support
)

print('Precision: ', p)
print('Recall: ', r)
print('f score: ', f)
print('Support: ', s)
print()

Precision:  0.9124683689901082
Recall:  0.9074074074074074
f score:  0.9084362139917695
Support:  None



In [14]:
# For precision this is the same as:

(19 * 1. + 22 * 0.86956522 + 13 * 0.85714286) / (19+22+13)

0.9124683707407407
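
For recall, the support-weighted average has a simple interpretation: multiplying each class recall by its support gives back that class's true positives, so the weighted recall is the overall fraction of correctly classified observations. A small sketch with the rounded per-class values printed earlier (results match up to rounding):

```python
# per-class recall and support copied from the "For each class" output
recall = [0.89473684, 0.90909091, 0.92307692]
support = [19, 22, 13]

# support * recall recovers (approximately) the true positives per class
tp_per_class = [s * r for s, r in zip(support, recall)]

weighted_recall = sum(tp_per_class) / sum(support)

print('True positives per class: ', tp_per_class)
print('Weighted recall: ', weighted_recall)
```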

### Micro

Pools the TP, FP and FN counts across all classes and computes each metric from the totals.

In [15]:
# micro averaging: the metrics are computed from the TP, FP and FN
# pooled over all classes, as shown in the intro video

p, r, f, s = precision_recall_fscore_support(
    y_test, y_rf_pred, labels=[0,1,2], average='micro',
)

print('Precision: ', p)
print('Recall: ', r)
print('f score: ', f)
print('Support: ', s)
print()

Precision:  0.9074074074074074
Recall:  0.9074074074074074
f score:  0.9074074074074074
Support:  None
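
The three micro-averaged values are identical because, when every observation receives exactly one label, each misclassification is a false positive for one class and a false negative for another, so the pooled FP and FN totals are equal. The sketch below illustrates this with per-class counts inferred from the per-class metrics printed earlier; the counts are an assumption, not notebook output.

```python
# counts for classes [0, 1, 2], inferred from the per-class metrics
# (assumption: not produced by the notebook itself)
tp = [17, 20, 12]
fp = [0, 3, 2]
fn = [2, 2, 1]

# pool the counts over all classes
TP, FP, FN = sum(tp), sum(fp), sum(fn)

micro_p = TP / (TP + FP)
micro_r = TP / (TP + FN)
micro_f = 2 * micro_p * micro_r / (micro_p + micro_r)

print('Precision: ', micro_p)
print('Recall: ', micro_r)
print('f score: ', micro_f)
```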



## Model comparison

### Recall, Precision and f score

In [16]:
# random forests

p, r, f, s = precision_recall_fscore_support(
    y_test,
    y_rf_pred,
    labels=[0,1,2], # the labels for which we want the metrics determined
    average='macro', # take the average of the metrics
)

print('Precision: ', p)
print('Recall: ', r)
print('f score: ', f)
print('Support: ', s)
print()

Precision:  0.9089026915113871
Recall:  0.9089682247576985
f score:  0.9074074074074074
Support:  None



In [17]:
# logistic regression

p, r, f, s = precision_recall_fscore_support(
    y_test,
    y_logit_pred,
    labels=[0,1,2], # the labels for which we want the metrics determined
    average='macro', # take the average of the metrics
)

print('Precision: ', p)
print('Recall: ', r)
print('f score: ', f)
print('Support: ', s)
print()

Precision:  0.9472049689440993
Recall:  0.9497607655502392
f score:  0.9469135802469135
Support:  None



Logistic regression scores higher than the random forest on all three macro-averaged metrics (about 0.95 versus 0.91).

### Accuracy

In [18]:
print('Accuracy Random Forest test:', accuracy_score(y_test, rf.predict(X_test)))
print('Accuracy Logistic Regression test:', accuracy_score(y_test, logit.predict(X_test)))

Accuracy Random Forest test: 0.9074074074074074
Accuracy Logistic Regression test: 0.9444444444444444


In [19]:
print('Balanced accuracy, Random Forest test:', balanced_accuracy_score(y_test, rf.predict(X_test)))
print('Balanced accuracy, Logistic Regression test:',  balanced_accuracy_score(y_test, logit.predict(X_test)))

Balanced accuracy, Random Forest test: 0.9089682247576985
Balanced accuracy, Logistic Regression test: 0.9497607655502392
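
The balanced accuracy values coincide with the macro-averaged recall reported earlier for each model, because balanced accuracy is defined as the mean of the per-class recall. A minimal check for the random forest, using the rounded per-class recall values printed in the "For each class" section (so the match is only up to rounding):

```python
# per-class recall of the random forest, copied from the earlier output
recall_rf = [0.89473684, 0.90909091, 0.92307692]

# balanced accuracy = unweighted mean of the per-class recall
balanced_acc_rf = sum(recall_rf) / len(recall_rf)

print('Balanced accuracy, Random Forest test:', balanced_acc_rf)
```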
