# Metrics + Intro to training and evaluation

## Metrics in classification

The most common metrics for classification problems are:
- **Accuracy**: the proportion of correct predictions
- **Precision**: the proportion of true positives among all positive predictions (informally: how much a positive prediction can be trusted)
- **Recall**: the proportion of true positives among all actual positive instances (informally: the ability to find all instances of a class)
- **F1 score**: the harmonic mean of precision and recall; it is high only when both precision and recall are high
- **Macro and micro**: two ways of averaging precision and recall in multiclass classification problems. Macro precision is the unweighted mean of the per-class precisions, so every class counts equally. Micro precision pools the true positives and the positive predictions of all classes before dividing, so every sample counts equally.
- **Balanced accuracy**: the same as macro average recall
- **Confusion matrix**: a table showing the number of true positive, true negative, false positive, and false negative predictions
- **Classification report**: a summary of precision, recall, F1 score, and support for each class

Metrics in scikit-learn: https://scikit-learn.org/stable/modules/model_evaluation.html
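
To make the averaging terms concrete, here is a minimal sketch on a small, made-up set of imbalanced labels (the arrays are hypothetical and not taken from the datasets used later). It compares macro and micro precision and checks that balanced accuracy matches macro recall.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    precision_score,
    recall_score,
)

# Hypothetical, deliberately imbalanced labels for three classes (0, 1, 2)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2, 0])

# Macro: compute precision per class, then take the unweighted mean
macro_precision = precision_score(y_true, y_pred, average="macro")
# Micro: pool all classes before dividing
micro_precision = precision_score(y_true, y_pred, average="micro")

macro_recall = recall_score(y_true, y_pred, average="macro")
balanced_acc = balanced_accuracy_score(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)

print("Macro precision:  ", macro_precision)
print("Micro precision:  ", micro_precision)
print("Accuracy:         ", accuracy)
print("Macro recall:     ", macro_recall)
print("Balanced accuracy:", balanced_acc)
```

In this single-label setting micro precision coincides with accuracy, while macro precision differs because the small classes weigh as much as the large one.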

### Concepts

| **Term**               | **Description**                                                                                       |
|------------------------|-------------------------------------------------------------------------------------------------------|
| **True Positive (TP)** | The model correctly predicted the positive class. Example: The model predicted "yes" and the actual label was also "yes".   |
| **False Positive (FP)** | The model incorrectly predicted the positive class. Example: The model predicted "yes" but the actual label was "no". This is also called a **Type I error**.   |
| **True Negative (TN)** | The model correctly predicted the negative class. Example: The model predicted "no" and the actual label was also "no".     |
| **False Negative (FN)** | The model incorrectly predicted the negative class. Example: The model predicted "no" but the actual label was "yes". This is also called a **Type II error**.    |
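
The four terms can be counted directly from a pair of label arrays. The sketch below uses made-up labels, with 1 standing for "yes" and 0 for "no"; this encoding is an assumption for illustration only.

```python
import numpy as np

# Hypothetical labels: 1 = "yes" (positive), 0 = "no" (negative)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

# Each count combines a condition on the prediction with one on the actual label
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted yes, actually yes
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted yes, actually no (Type I error)
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # predicted no, actually no
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted no, actually yes (Type II error)

print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
```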


### Confusion matrix

Binary classification:

The scikit-learn convention, which is also common in classical statistics, places TN in the top-left corner.

|                          | **Predicted Negative** | **Predicted Positive** |
|--------------------------|------------------------|------------------------|
| **Actual Negative**      | True Negative (TN)     | False Positive (FP)    |
| **Actual Positive**      | False Negative (FN)    | True Positive (TP)     |

Presentations that focus on the positive class often place TP in the top-left corner instead:

|                   | **Predicted Positive** | **Predicted Negative** |
|-------------------|------------------------|------------------------|
| **Actual Positive**   | True Positive (TP)      | False Negative (FN)     |
| **Actual Negative**   | False Positive (FP)     | True Negative (TN)      |


Here we have presented the confusion matrix with actual (true) values in rows and predicted values in columns. This is the most common way of presenting the confusion matrix in machine learning. However, quite often you will see the predicted values presented in the rows and the actual values in the columns.
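
The layouts described above can be reproduced with `confusion_matrix`. This is a small sketch on made-up labels: the `labels` argument controls the order of rows and columns, and transposing the result swaps actual and predicted.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

# Default: labels in sorted order, so the layout is [[TN, FP], [FN, TP]]
cm_default = confusion_matrix(y_true, y_pred)

# Positive class first: the layout becomes [[TP, FN], [FP, TN]]
cm_pos_first = confusion_matrix(y_true, y_pred, labels=[1, 0])

# Transposed: predicted values in rows, actual values in columns
cm_pred_rows = cm_default.T

print("scikit-learn default:\n", cm_default)
print("Positive class first:\n", cm_pos_first)
print("Predicted in rows:\n", cm_pred_rows)
```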


Multiclass:

In the multiclass case the correct predictions (the TP for each class) lie on the diagonal. Every off-diagonal cell is an error that counts as a false negative for the actual class (its row) and as a false positive for the predicted class (its column).

|                  | Predicted Class 0 | Predicted Class 1 | Predicted Class 2 |
|------------------|-------------------|-------------------|-------------------|
| **Actual Class 0** | TP (Class 0)       | FN (Class 0), FP (Class 1) | FN (Class 0), FP (Class 2) |
| **Actual Class 1** | FN (Class 1), FP (Class 0) | TP (Class 1)       | FN (Class 1), FP (Class 2) |
| **Actual Class 2** | FN (Class 2), FP (Class 0) | FN (Class 2), FP (Class 1) | TP (Class 2)       |


**Important**: Be aware of the different ways of presenting the confusion matrix.
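
With actual values in rows and predicted values in columns, the per-class counts can be read off a multiclass matrix: the diagonal holds the TP, the rest of a class's row holds its FN, and the rest of its column holds its FP. A minimal sketch, using a made-up 3-class matrix:

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = actual, columns = predicted
cm = np.array(
    [
        [5, 1, 0],
        [2, 6, 1],
        [0, 3, 7],
    ]
)

tp = np.diag(cm)              # correct predictions per class
fn = cm.sum(axis=1) - tp      # rest of each row: actual class, predicted as something else
fp = cm.sum(axis=0) - tp      # rest of each column: predicted class, actually something else
tn = cm.sum() - (tp + fn + fp)  # everything not involving the class

for k in range(cm.shape[0]):
    print(f"Class {k}: TP={tp[k]}, FN={fn[k]}, FP={fp[k]}, TN={tn[k]}")
```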

### Precision
$$
\text{Precision} = \frac{TP}{TP + FP}
$$

### Recall
$$
\text{Recall} = \frac{TP}{TP + FN}
$$

### F1 Score
$$
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

### Accuracy
$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$
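
The four formulas translate directly into code. The helper below is a hypothetical function written for illustration, not part of scikit-learn, and it assumes that no denominator is zero. The counts are made up to show how accuracy can look good while recall is modest.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F1, and accuracy from the four counts.

    Assumes tp + fp > 0, tp + fn > 0, and precision + recall > 0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for an imbalanced problem with few positives
metrics = binary_metrics(tp=8, tn=85, fp=2, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.93 although only 8 of the 13 actual positives are found, which is why accuracy alone can mislead on imbalanced data.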

## Example 1 - a binary classification of breast cancer

The goal is to classify breast tumours as malignant or benign based on features such as the mean radius, mean texture, and mean perimeter of the cell nuclei. The dataset is available in scikit-learn. In its encoding malignant is class 0 (the negative class) and benign is class 1 (the positive class).

Load and split data

In [None]:
# Step 1: Import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Step 2: Load the Breast Cancer dataset
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target  # type: ignore

# NOTE: load_breast_cancer returns NumPy arrays here, not pandas DataFrames

# Step 3: Split the dataset into training and testing sets
# Parameters set in train_test_split:
# stratify=y: keeps the class proportions in the training and testing sets approximately equal to those in the complete set.
# shuffle=True (the default): shuffles the data before splitting, so that any ordering in the original data does not carry over into the training and testing sets.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y, shuffle=True
)
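
To see what stratify=y does, the sketch below repeats the split and compares the share of the positive class in the full data, the training set, and the testing set. The helper `positive_share` is a hypothetical name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same split settings as above; only the targets are needed here
_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)


def positive_share(labels):
    """Fraction of samples labelled 1 (benign)."""
    return float(np.mean(labels == 1))


print("Full data:", round(positive_share(y), 3))
print("Train:    ", round(positive_share(y_train), 3))
print("Test:     ", round(positive_share(y_test), 3))
```

The three shares should agree up to rounding, because the split is made per class.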


Inspect the NumPy format of the data

In [None]:
# Inspect the first 3 rows of the training set
X_train[:3]

In [None]:
# Inspect the first 10 values of the target variable in the training set
y_train[:10]

Count the labels in each class. NumPy arrays do not have the value_counts() method known from pandas, so we use np.unique with return_counts=True instead.

In [None]:
np.unique(y_test, return_counts=True)

In [None]:
np.unique(y_train, return_counts=True)
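
If the pandas style is preferred, one option is to wrap the array in a Series first. A minimal sketch on a made-up label array:

```python
import numpy as np
import pandas as pd

# Hypothetical label array standing in for y_train or y_test
labels = np.array([0, 1, 1, 0, 1, 1, 1])

# NumPy way: unique values and how often each occurs
values, counts = np.unique(labels, return_counts=True)
numpy_counts = dict(zip(values.tolist(), counts.tolist()))

# pandas way: wrap the array in a Series to get value_counts()
pandas_counts = pd.Series(labels).value_counts().sort_index()

print(numpy_counts)
print(pandas_counts)
```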

In [None]:
print(cancer.DESCR)

Train a model

In [None]:
# Step 4: Train the Random Forest model
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Step 5: Make predictions
y_pred = clf.predict(X_test)

In [None]:
# Inspect the first 10 predictions
y_pred[:10]

Evaluate

In [None]:
# Step 6: Compute confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)

# Visualise the confusion matrix in the standard scikit-learn way
disp = ConfusionMatrixDisplay(
    confusion_matrix=conf_matrix, display_labels=cancer.target_names
)
disp.plot(cmap="Blues")
plt.title("Confusion Matrix (Standard scikit-learn Layout)")
plt.show()

print("Confusion Matrix (standard layout in scikit-learn):")
print(
    pd.DataFrame(
        conf_matrix, columns=["Malignant", "Benign"], index=["Malignant", "Benign"]
    )
)

In [None]:
# Step 7: Calculate and display accuracy, precision, and recall
# average="binary" scores only the positive class, which by default is label 1 (benign)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="binary")
recall = recall_score(y_test, y_pred, average="binary")

print("\nAccuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
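
Because malignant is encoded as 0 in this dataset, the default `pos_label=1` reports precision and recall for the benign class. If the malignant class is the one of interest, `pos_label=0` can be passed instead. The sketch below retrains the same model so that it runs on its own, and cross-checks the result against the confusion matrix.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Default: positive class is label 1 (benign)
recall_benign = recall_score(y_test, y_pred)

# Treat label 0 (malignant) as the positive class instead
recall_malignant = recall_score(y_test, y_pred, pos_label=0)
precision_malignant = precision_score(y_test, y_pred, pos_label=0)

# Cross-check: row 0 / column 0 of the matrix belong to the malignant class
cm = confusion_matrix(y_test, y_pred)
recall_from_cm = cm[0, 0] / cm[0, :].sum()
precision_from_cm = cm[0, 0] / cm[:, 0].sum()

print("Recall (benign):      ", recall_benign)
print("Recall (malignant):   ", recall_malignant)
print("Precision (malignant):", precision_malignant)
```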

You can calculate the metrics manually or use the classification_report function in scikit-learn.

In [None]:
# make classification report
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, target_names=cancer.target_names))

Recap - Binary confusion matrix in scikit-learn


|                          | **Predicted Negative** | **Predicted Positive** |
|--------------------------|------------------------|------------------------|
| **Actual Negative**      | True Negative (TN)     | False Positive (FP)    |
| **Actual Positive**      | False Negative (FN)    | True Positive (TP)     |

Manual calculations of the metrics for binary classification, to see what is happening under the hood. The cells below use the counts TP = 102, TN = 58, FP = 6, and FN = 5.

In [None]:
# Accuracy: (TP + TN) / (TP + TN + FP + FN)
(102 + 58) / (102 + 58 + 6 + 5)

In [None]:
# Precision of positive class: TP / (TP + FP)
102 / (102 + 6)

In [None]:
# Recall of positive class: TP / (TP + FN)
102 / (102 + 5)
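
Instead of typing the counts by hand, they can be unpacked from the confusion matrix. The sketch below assumes a matrix in the scikit-learn layout [[TN, FP], [FN, TP]] filled with the same counts as in the manual cells; in the notebook, `conf_matrix` from Step 6 would take its place.

```python
import numpy as np

# Assumed matrix in scikit-learn layout [[TN, FP], [FN, TP]],
# filled with the counts used in the manual calculations
conf_matrix = np.array([[58, 6], [5, 102]])

# ravel() flattens row by row, which gives TN, FP, FN, TP in that order
tn, fp, fn, tp = conf_matrix.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```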

## Example 2 - a multiclass classification of iris flowers

Load the data

In [None]:
# Step 1: Import libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Step 2: Load the Iris dataset: https://archive.ics.uci.edu/dataset/53/iris
iris = load_iris()
X, y = iris.data, iris.target  # type: ignore

# Step 3: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y, shuffle=True
)


In [None]:
iris.feature_names

In [None]:
iris.target_names

In [None]:
X.shape

In [None]:
print(iris.DESCR)

Train the model

In [None]:
# Step 4: Train the model (logistic regression)
# To compare with a Random Forest, use this line instead:
# clf = RandomForestClassifier(random_state=42)
clf = LogisticRegression(random_state=42, max_iter=500)
clf.fit(X_train, y_train)

# Step 5: Make predictions
y_pred = clf.predict(X_test)

Calculate metrics

In [None]:
# Step 6: Compute confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(pd.DataFrame(conf_matrix, columns=iris.target_names, index=iris.target_names))

In [None]:
# Step 7: Calculate and display accuracy, precision, and recall
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")

print("\nAccuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
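
The macro averages above hide the per-class values. Passing `average=None` returns one score per class, and their unweighted mean is the macro score. The sketch below repeats the split and model so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)
clf = LogisticRegression(random_state=42, max_iter=500).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# average=None returns an array with one value per class
per_class_precision = precision_score(y_test, y_pred, average=None)
per_class_recall = recall_score(y_test, y_pred, average=None)

for name, p, r in zip(iris.target_names, per_class_precision, per_class_recall):
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")

# The macro average is the unweighted mean of the per-class values
macro_precision = precision_score(y_test, y_pred, average="macro")
macro_recall = recall_score(y_test, y_pred, average="macro")
print("Macro precision:", macro_precision)
print("Macro recall:", macro_recall)
```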

In [None]:
# classification report for comparison
from sklearn.metrics import ConfusionMatrixDisplay, classification_report

print(classification_report(y_test, y_pred, target_names=iris.target_names))

In [None]:
# Step 8: Visualise the confusion matrix
import matplotlib.pyplot as plt

conf_matrix = confusion_matrix(y_test, y_pred)

# Visualise the confusion matrix in the standard scikit-learn way
disp = ConfusionMatrixDisplay(
    confusion_matrix=conf_matrix, display_labels=iris.target_names
)
disp.plot(cmap="Blues")
plt.title("Confusion Matrix (Standard scikit-learn Layout)")
plt.show()