📝 **Author:** Amirhossein Heydari - 📧 **Email:** <amirhosseinheydari78@gmail.com> - 📍 **Origin:** [mr-pylin/pytorch-workshop](https://github.com/mr-pylin/pytorch-workshop)

---


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [Metric](#toc2_)    
  - [Built-in Metrics](#toc2_1_)    
    - [Classification Tasks](#toc2_1_1_)    
    - [Regression Tasks](#toc2_1_2_)    
  - [Custom Metrics](#toc2_2_)    
    - [Example 1: Custom Accuracy](#toc2_2_1_)    
    - [Example 2: Attack Success Rate](#toc2_2_2_)    


# <a id='toc1_'></a>[Dependencies](#toc0_)


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchmetrics import Metric
from torchmetrics.classification import (
    MulticlassAccuracy,
    MulticlassAUROC,
    MulticlassConfusionMatrix,
    MulticlassF1Score,
    MulticlassPrecision,
    MulticlassRecall,
    MulticlassROC,
)
from torchmetrics.regression import (
    CosineSimilarity,
    MeanAbsoluteError,
    MeanAbsolutePercentageError,
    MeanSquaredError,
)

In [None]:
# set a seed for deterministic results
seed = 42
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# <a id='toc2_'></a>[Metric](#toc0_)

- **Metrics** evaluate model performance by computing key statistics such as **accuracy**, **precision**, **recall**, and **loss**.  
- `torchmetrics` provides a **modular** and **efficient** way to compute metrics for **PyTorch** models.  
- It supports both **batch-wise computation** and **distributed training** (e.g., multi-GPU).  

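🛠 **Using Metrics**:

- Metrics in `torchmetrics` are subclasses of `torch.nn.Module`, so they can be used like any other PyTorch module (e.g., moved to a GPU with `.to(device)`).  
- Call `update()` on every batch to accumulate internal state, `compute()` at the end of an epoch to aggregate over all accumulated batches, and `reset()` before the next epoch.  
- Calling a metric directly (`metric(preds, target)`) also accumulates state, but additionally returns the value for the current batch only.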

📝 **Docs**:

- Welcome to TorchMetrics: [lightning.ai/docs/torchmetrics/stable/](https://lightning.ai/docs/torchmetrics/stable/)


## <a id='toc2_1_'></a>[Built-in Metrics](#toc0_)

- `torchmetrics` provides several built-in metrics for tasks like **classification**, **regression**, **clustering**, **detection**, **segmentation**, and more.  
- These metrics are optimized for **PyTorch** and support **GPU acceleration** and **distributed training**.

🛠 **Common Built-in Metrics**:

<table style="width: 48%; float: left; margin-right: 2%;">
  <thead>
    <tr>
      <th>Task</th>
      <th>Metric</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="7">Classification</td>
      <td style="font-family: monospace;">Accuracy()</td>
      <td>Correct predictions ratio</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">Precision()</td>
      <td>TP / (TP + FP)</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">Recall()</td>
      <td>TP / (TP + FN)</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">F1Score()</td>
      <td>Harmonic mean of precision & recall</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">AUROC()</td>
      <td>Area under the ROC curve</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">ROC()</td>
      <td>Receiver Operating Characteristic curve</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">ConfusionMatrix()</td>
      <td>Tabular summary of predictions</td>
    </tr>
    <tr>
      <td rowspan="2">Detection</td>
      <td style="font-family: monospace;">MeanAveragePrecision()</td>
      <td>mAP for object detection</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">IntersectionOverUnion()</td>
      <td>IoU metric for object detection</td>
    </tr>
  </tbody>
</table>

<table style="width: 48%; float: left;">
  <thead>
    <tr>
      <th>Task</th>
      <th>Metric</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="4">Regression</td>
      <td style="font-family: monospace;">MeanSquaredError()</td>
      <td>Avg. squared error</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">MeanAbsoluteError()</td>
      <td>Avg. absolute error</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">CosineSimilarity()</td>
      <td>Cosine of the angle between vectors</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">MeanAbsolutePercentageError()</td>
      <td>Mean absolute percentage error</td>
    </tr>
    <tr>
      <td rowspan="3">Clustering</td>
      <td style="font-family: monospace;">AdjustedRandScore()</td>
      <td>Rand index adjusted for chance</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">RandScore()</td>
      <td>Fraction of sample pairs on which two clusterings agree</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">FowlkesMallowsIndex()</td>
      <td>Geometric mean of pairwise precision & recall</td>
    </tr>
    <tr>
      <td rowspan="2">Segmentation</td>
      <td style="font-family: monospace;">Dice()</td>
      <td>Dice coefficient (F1 score)</td>
    </tr>
    <tr>
      <td style="font-family: monospace;">MeanIoU()</td>
      <td>Avg. IoU across classes</td>
    </tr>
  </tbody>
</table>

📝 **Docs**:

- All TorchMetrics: [lightning.ai/docs/torchmetrics/stable/all-metrics.html](https://lightning.ai/docs/torchmetrics/stable/all-metrics.html)

### <a id='toc2_1_1_'></a>[Classification Tasks](#toc0_)

In [None]:
# load iris dataset
iris_df = pd.read_csv(
    r"https://raw.githubusercontent.com/mr-pylin/datasets/refs/heads/main/data/tabular-data/iris/dataset.csv",
    encoding="utf-8",
)

# meta-data
classes = iris_df["class"].unique()
class_to_idx = {l: i for i, l in enumerate(classes)}

# split dataset into features & labels
X, y = iris_df.iloc[:, :4].values, iris_df.iloc[:, 4].values
y = np.array([class_to_idx[l] for l in y])

# convert numpy.ndarray to torch.Tensor
X = torch.from_numpy(X.astype(np.float32))
y = torch.from_numpy(y.astype(np.int64))

# create DataLoader
dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=15, shuffle=True)

# log
print(f"X.shape             : {X.shape}")
print(f"X.dtype             : {X.dtype}")
print(f"y.shape             : {y.shape}")
print(f"y.dtype             : {y.dtype}")
print(f"len(dataset)        : {len(dataset)}")
print(f"dataset[0][0].shape : {dataset[0][0].shape}")
print(f"dataset[0][1].shape : {dataset[0][1].shape}")

In [None]:
# create a simple model
model = nn.Linear(X.shape[1], len(classes))

# criterion and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

# log
print(model)

In [52]:
# initialize metrics
accuracy = MulticlassAccuracy(len(classes), top_k=1, average="micro")  # overall accuracy (default "macro" averages per-class accuracy)
recall = MulticlassRecall(len(classes), top_k=1, average=None)
precision = MulticlassPrecision(len(classes), top_k=1, average=None)
f1_score = MulticlassF1Score(len(classes), top_k=1, average=None)
roc = MulticlassROC(len(classes), average=None)
auroc = MulticlassAUROC(len(classes), average=None)
confusion_matrix = MulticlassConfusionMatrix(len(classes))

In [None]:
# training loop
epochs = 1
for epoch in range(epochs):
    for batch_idx, (x, y_true) in enumerate(dataloader):

        y_pred = model(x)

        loss = criterion(y_pred, y_true)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # update metrics on detached predictions (evaluation does not need the autograd graph)
        y_pred = y_pred.detach()
        accuracy.update(y_pred, y_true)
        recall.update(y_pred, y_true)
        precision.update(y_pred, y_true)
        f1_score.update(y_pred, y_true)
        roc.update(y_pred, y_true)
        auroc.update(y_pred, y_true)
        confusion_matrix.update(y_pred, y_true)

    # log
    print(f"epoch {epoch+1}/{epochs}")
    print(f"  -> accuracy  : {accuracy.compute():.4f}")
    print(f"  -> recall    : {recall.compute()}")
    print(f"  -> precision : {precision.compute()}")
    print(f"  -> f1 score  : {f1_score.compute()}")
    print(f"  -> auroc     : {auroc.compute()}")
    print(f"  -> confusion matrix:\n{confusion_matrix.compute()}")

    # plot ROC curve [for each class]
    fpr, tpr, _ = roc.compute()
    fig, axs = plt.subplots(nrows=1, ncols=len(classes), figsize=(len(classes) * 4, len(classes)), layout="compressed")
    fig.suptitle("ROC Curves (One-vs-Rest Approach)")
    for i in range(len(classes)):
        axs[i].plot(fpr[i], tpr[i], color="blue")
        axs[i].set(xlabel="FPR", ylabel="TPR", title=f"Class {i}")
    plt.show()

    # reset metrics for next epoch
    accuracy.reset()
    recall.reset()
    precision.reset()
    f1_score.reset()
    roc.reset()
    auroc.reset()
    confusion_matrix.reset()

### <a id='toc2_1_2_'></a>[Regression Tasks](#toc0_)

In [None]:
# load boston dataset
boston_df = pd.read_csv(
    "https://raw.githubusercontent.com/mr-pylin/datasets/refs/heads/main/data/tabular-data/boston-housing/dataset.csv",
    encoding="utf-8",
)

X = torch.tensor(boston_df.drop(columns=["MEDV"]).values, dtype=torch.float32)
y = torch.tensor(boston_df["MEDV"].values, dtype=torch.float32).view(-1, 1)

# create DataLoader
dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=15, shuffle=True)

# log
print(f"X.shape             : {X.shape}")
print(f"X.dtype             : {X.dtype}")
print(f"y.shape             : {y.shape}")
print(f"y.dtype             : {y.dtype}")
print(f"len(dataset)        : {len(dataset)}")
print(f"dataset[0][0].shape : {dataset[0][0].shape}")
print(f"dataset[0][1].shape : {dataset[0][1].shape}")

In [None]:
# create a simple model
num_output = 1
model = nn.Linear(X.shape[1], num_output)

# criterion and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

# log
print(model)

In [56]:
# initialize metrics
mae = MeanAbsoluteError()
mse = MeanSquaredError()
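cs = CosineSimilarity()  # caution: cosine is taken per row; with one output column each row scores ±1 and the default reduction sums them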
mape = MeanAbsolutePercentageError()

In [None]:
# training loop
epochs = 1
for epoch in range(epochs):
    for batch_idx, (x, y_true) in enumerate(dataloader):

        y_pred = model(x)

        loss = criterion(y_pred, y_true)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # update metrics on detached predictions (evaluation does not need the autograd graph)
        y_pred = y_pred.detach()
        mae.update(y_pred, y_true)
        mse.update(y_pred, y_true)
        cs.update(y_pred, y_true)
        mape.update(y_pred, y_true)

    # log
    print(f"epoch {epoch+1}/{epochs}")
    print(f"  -> mae  : {mae.compute():.4f}")
    print(f"  -> mse  : {mse.compute():.4f}")
    print(f"  -> cs   : {cs.compute():.4f}")
    print(f"  -> mape : {mape.compute():.4f}")

    # reset metrics for next epoch
    mae.reset()
    mse.reset()
    cs.reset()
    mape.reset()

## <a id='toc2_2_'></a>[Custom Metrics](#toc0_)

- `torchmetrics` allows defining **custom metrics** by subclassing `torchmetrics.Metric`.  
- Custom metrics provide flexibility to compute **task-specific** evaluation criteria.  
- They support **automatic accumulation** across batches and **distributed computation**.

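🛠 **Creating a Custom Metric**:

- **Inherit** from `torchmetrics.Metric`.  
- **Define internal states** using `self.add_state()`; its `dist_reduce_fx` argument (e.g., `"sum"`) specifies how each state is combined across processes in distributed training.  
- **Implement `update()`** to accumulate batch-level data into the states.  
- **Implement `compute()`** to derive the final value from the accumulated states.  
- States registered with `add_state()` are restored to their defaults automatically by `reset()`.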


### <a id='toc2_2_1_'></a>[Example 1: Custom Accuracy](#toc0_)

In [None]:
class CustomAccuracy(Metric):
    # optional class attributes recognized by torchmetrics
    higher_is_better = True
    full_state_update = False  # update() does not depend on previous state, so forward() can skip a full recompute

    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # forward options such as dist_sync_on_step
        # add state variables to track the number of correct predictions and total predictions
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        # convert logits/probabilities of shape (N, C) to class indices of shape (N,)
        if preds.ndim == target.ndim + 1:
            preds = preds.argmax(dim=1)

        # update the correct and total counters
        self.correct += (preds == target).sum()
        self.total += target.numel()

    def compute(self) -> torch.Tensor:
        return self.correct.float() / self.total

### <a id='toc2_2_2_'></a>[Example 2: Attack Success Rate](#toc0_)

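- The **attack success rate (ASR)** of a backdoor attack is the fraction of poisoned (trigger-carrying) inputs that the model classifies as the attacker's target class.

ℹ️ **Learn more**: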

- BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain [[ **pdf** ](https://arxiv.org/pdf/1708.06733)]


In [None]:
class AttackSuccessRate(Metric):
    def __init__(self, target_index: int, **kwargs):
        super().__init__(**kwargs)
        self.target_class = target_index  # class the backdoor trigger should force
        self.add_state("success", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, poison_mask: torch.Tensor | None = None) -> None:
        # keep only the samples that carry the trigger (all samples if no mask is given)
        if poison_mask is not None:
            preds = preds[poison_mask]

        # logits/probabilities of shape (N, C) -> predicted class indices
        preds = preds.argmax(dim=-1)

        # count poisoned samples classified as the attacker's target class
        self.success += (preds == self.target_class).sum()
        self.total += preds.numel()

    def compute(self) -> torch.Tensor:
        return self.success.float() / self.total
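`AttackSuccessRate` stores two sums (`success`, `total`) rather than per-batch rates, because the number of poisoned samples can differ from batch to batch. The plain-tensor sketch below uses hypothetical predicted labels (already argmax-ed) and poison masks to show how the accumulated ratio differs from naively averaging per-batch rates.

```python
import torch

target_class = 0  # hypothetical attacker target class

# two hypothetical batches of (predicted labels, poison mask)
batches = [
    (torch.tensor([0, 1, 0, 2]), torch.tensor([True, False, True, False])),  # 2 poisoned, 2 hits
    (torch.tensor([1, 0, 2, 1]), torch.tensor([True, True, True, True])),  # 4 poisoned, 1 hit
]

success, total, per_batch_rates = 0, 0, []
for preds, poison_mask in batches:
    hits = int((preds[poison_mask] == target_class).sum())
    n_poisoned = int(poison_mask.sum())
    success += hits
    total += n_poisoned
    per_batch_rates.append(hits / n_poisoned)

asr_accumulated = success / total  # 3 / 6 = 0.5 (what the sum states yield)
asr_naive = sum(per_batch_rates) / len(per_batch_rates)  # (1.0 + 0.25) / 2 = 0.625
print(asr_accumulated, asr_naive)
```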