# Machine Learning Laboratory
Institute of Imaging and Computer Vision, RWTH Aachen

Version SS2025

# Session 2: Loss and Metric


## Goals of this session

After this session, you will have an understanding of the following losses and/or metrics:

- MSE, MAE
- (Binary) CE

- Cosine Similarity

- Accuracy
- F-1 Score
- ROC
  
- Dice for segmentation
- *Optional: IoU for segmentation or detection

- SSIM
- *Optional: PSNR


**Note:** We will work with `pytorch` and `torchmetrics` in this session. In your own implementations, you may use any torch [`Math operations (url)`](https://pytorch.org/docs/stable/torch.html#math-operations), but nothing from `torch.nn.SomeLoss()` or `torchmetrics.SomeMetric()`. After implementing your version, you can call the losses from `torch.nn.SomeLoss()` or `torchmetrics.SomeMetric()` to verify your numerical results.



**>> Time Management**
  
**>> There are 7 mandatory losses/metrics, so you can plan a 15 min time slot for each one on average (note that their complexity varies considerably). Not as fast as expected? No problem at all: you will have ca. 60 min of buffer time. Still not finished? You can continue with the last piece at home.**


### 1. MSE, MAE

We can calculate the Mean Squared Error (MSE) loss (i.e., the squared L2 norm of the difference) or the Mean Absolute Error (MAE) loss (i.e., the L1 norm of the difference) to measure the difference between our predictions and the ground truths. Here are the per-element formulae:

$ MAELoss_n = |x_n - y_n|$

$ MSELoss_n = (x_n - y_n)^2$

These losses are usually used for regression tasks.
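As a hand-checkable illustration of the two formulae, here is a minimal sketch on a tiny toy example. The tensors `x` and `y` are hypothetical values chosen for easy arithmetic, and averaging with `mean()` is assumed as the reduction.

```python
import torch

# Hypothetical toy values, chosen so the result can be checked by hand.
x = torch.tensor([1.0, 2.0, 3.0, 4.0])   # predictions
y = torch.tensor([1.0, 0.0, 5.0, 4.0])   # ground truths

diff = x - y                              # [0, 2, -2, 0]
mae = diff.abs().mean()                   # (0 + 2 + 2 + 0) / 4 = 1.0
mse = diff.pow(2).mean()                  # (0 + 4 + 4 + 0) / 4 = 2.0
print(f'MAE: {mae.item():.4f}, MSE: {mse.item():.4f}')
```

Note how the squared error weights the two large deviations more heavily than the absolute error does.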

**TASK:** Implement the MSE and MAE losses on your own and check the results against pytorch. If the results are multi-dimensional, average them with `mean()`.

In [None]:
# MAE, MSE
import torch

input = torch.randn(100, )
target = torch.randn(100, )

# Do not use torch.nn.SomeLoss()
# Your Code Here

# MAEloss =
# MSEloss = 
# print(f'MAE: {MAEloss}, MSE: {MSEloss}')


In [None]:
# MAE, MSE
# Use torch.nn.SomeLoss()
# Your Code Here

# MAEloss =
# MSEloss = 
# print(f'MAE: {MAEloss}, MSE: {MSEloss}')


### 2. (Binary) Cross Entropy (CE)



You will implement the CE loss for a classification task. Say there are 10 possible classes; we can represent the ground truth with a so-called one-hot vector, where 0 means "not this class" and 1 means "yes, it's this class". Correspondingly, we can use a vector of probabilities for the prediction, i.e., each element is in [0,1] and the elements sum to 1. In the following example, we have 5 predictions, so the complete result has size 5x10.

$\displaystyle CELoss = -\sum_{c=1}^C \log \frac{\exp (x_c)}{\sum_{i=1}^C \exp (x_i)} y_c$
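The formula can be sketched with plain torch math operations as follows. This is one possible reading, not the reference solution: the names `logits` and `labels` are illustrative, the one-hot sum is replaced by indexing the true class, and averaging over the 5 samples is an assumption that matches the default `reduction='mean'` in pytorch.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 10)                 # N x C raw scores (not probabilities)
labels = torch.tensor([1, 9, 0, 0, 4])      # class indices, 0 <= value < C

# log-softmax: x_c - log(sum_i exp(x_i)), computed in the log domain
log_probs = logits - torch.logsumexp(logits, dim=1, keepdim=True)

# With a one-hot target, the sum over c keeps only the true-class term.
per_sample = -log_probs[torch.arange(logits.shape[0]), labels]
ce = per_sample.mean()                      # assumed reduction: mean over samples

# Reference value from pytorch, used only for comparison.
ce_ref = torch.nn.functional.cross_entropy(logits, labels)
print(f'CE (manual): {ce.item():.6f}, CE (torch): {ce_ref.item():.6f}')
```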

In [None]:
# CE
# input is of size N x C = 5 x 10. Note that they are not probabilities yet!
input = torch.randn(5, 10)  # * 100
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 9, 0, 0, 4])

# Do not use torch.nn.SomeLoss()
# Your Code Here

# CEloss = 
# print(f'CE: {CEloss}')



Now the torch version.

In [None]:
# CE
# Use torch.nn.SomeLoss()
# Your Code Here

# CEloss = 
# print(f'CE: {CEloss}')


**TASK**: In PyTorch, the cross-entropy loss is equivalent to applying `LogSoftmax()` to the input, followed by `NLLLoss()`. This means that a softmax activation is already included in `CrossEntropyLoss()` by default. Please calculate CE with these two functions and check the result.


**Question**: Why is `Softmax`+`CE` better implemented as `LogSoftmax`+`NLL`?

**Answer**:

In [None]:
# CE
# Use torch.nn.SomeLoss()
# Your Code Here

# CEloss = 
# print(f'CE: {CEloss}')



**TASK:** Now multiply the `input` by 100 and compare the loss values of the three implementations. Evaluating the logarithm of the softmax directly in the log domain relies on a useful trick called [LogSumExp](https://w.wiki/9hGY).
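To see why the log-domain formulation matters numerically, the following sketch compares a naive softmax-then-log computation with a LogSumExp-based one. The logits are hypothetical values, deliberately large so that `exp` overflows in float32.

```python
import torch

big = torch.tensor([[100.0, 0.0, -100.0]])   # hypothetical large logits
label = torch.tensor([2])                    # true class index

# Naive route: exponentiate, normalize, then take the log.
# exp(100) overflows float32, so the resulting loss is no longer finite.
exp = torch.exp(big)
naive_probs = exp / exp.sum(dim=1, keepdim=True)
naive_ce = -torch.log(naive_probs[0, label])

# Log-domain route: subtract logsumexp, never forming the probabilities.
log_probs = big - torch.logsumexp(big, dim=1, keepdim=True)
stable_ce = -log_probs[0, label]

print(f'naive: {naive_ce.item()}, stable: {stable_ce.item():.1f}')
```

The log-domain result stays finite (here 200), whereas the naive route loses the value entirely.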

**Note:** As suggested by pytorch, it's better (faster) to use class indices in `target`.

**Optional:** Try out `BCELoss()` and `BCEWithLogitsLoss()`. Note the different activation function.
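A minimal sketch of the relation between the two variants, assuming the usual definition of binary cross entropy with a mean reduction: `BCELoss()` expects probabilities, while `BCEWithLogitsLoss()` applies the sigmoid internally. The tensor names are illustrative, and the functional calls are used only as a reference.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 2)       # raw scores
targets = torch.rand(3, 2)       # soft targets in [0, 1]

# Manual BCE: sigmoid first, then the two log terms, averaged over all elements.
probs = torch.sigmoid(logits)
bce = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()

# References: one takes probabilities, the other takes raw logits.
ref_from_probs = F.binary_cross_entropy(probs, targets)
ref_from_logits = F.binary_cross_entropy_with_logits(logits, targets)
print(f'BCE: {bce.item():.6f}')
```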

In [None]:
# BCE
# Your Code Here
input_bce = torch.randn(3, 2)
target_bce = torch.rand(3, 2)

# BCEloss = 
# print(f'BCE: {BCEloss}')

### 3. Cosine Similarity

$\displaystyle cos_{sim}(x,y) = \frac{x \cdot y}{||x|| \cdot ||y||} =
\frac{\sum_{i=1}^n x_i y_i}{\sqrt{\sum_{i=1}^n x_i^2}\sqrt{\sum_{i=1}^n y_i^2}}$
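The formula can be evaluated row by row with elementwise products and norms. The sketch below uses hypothetical 2D vectors (orthogonal, parallel, and opposite pairs) so that the results are easy to verify by hand.

```python
import torch

# Hypothetical row vectors: orthogonal, parallel, and opposite pairs.
a = torch.tensor([[1.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
b = torch.tensor([[0.0, 2.0], [2.0, 2.0], [-3.0, -4.0]])

dot = (a * b).sum(dim=1)                  # numerator: sum_i x_i * y_i per row
norms = a.norm(dim=1) * b.norm(dim=1)     # denominator: ||x|| * ||y|| per row
cos = dot / norms                         # approximately [0, 1, -1]
print(f'cos_sim: {cos}')
```

Note that the second pair differs only in length, which does not affect the result.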

**Question:** Describe the geometric interpretation of cosine similarity.

**Answer:**



**TASK**: We have 4 row vectors in the `target` and `preds` tensors. Calculate cosine similarity of each pair and show 4 cosine similarities. 

In [None]:
# Cosine Similarity
target = torch.tensor([[1, 2, 3, 4],[1, 2, 3, 4],[1, 2, 3, 4],[1, 2, 3, 4]], dtype=torch.float32)
preds = torch.tensor([[1, 2, 3, 4],[0.1, 0.2, 0.3, 0.4],[-1, -2, -3, -4],[-9, 1, 1, 1]], dtype=torch.float32)

# Do not use torch.nn.SomeLoss() or torchmetrics
# Your Code Here
# Do not use 'reduction'

# cos_sim =     # a tensor of size (4,)
# print(f'cos_sim: {cos_sim}')


Now the torchmetrics version.

In [None]:
# Cosine Similarity
import torchmetrics
# Use torchmetrics
# Your Code Here

# cos_sim = 
# print(f'cos_sim: {cos_sim}')


### 4. Accuracy

Say there are 2 diseases, A and B, that we want to diagnose. This gives a 3-class task: healthy, A, B. Suppose we have 100 samples, of which 80 are healthy, 12 have A and 8 have B.

$\displaystyle \text{Accuracy} = \frac{1}{N}\sum_{i=1}^N \mathbf{1}(y_i = \hat{y}_i)$
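The indicator sum above can be written directly as a comparison followed by a mean. The sketch below also shows a per-class average, which is assumed here to correspond to what torchmetrics calls `average='macro'` (as opposed to `'micro'`). The toy labels are hypothetical, and every class is assumed to occur at least once in the targets.

```python
import torch

# Hypothetical imbalanced toy labels: four samples of class 0, one each of 1 and 2.
t = torch.tensor([0, 0, 0, 0, 1, 2])      # ground truth
p = torch.tensor([0, 0, 0, 0, 0, 2])      # predictions
num_classes = 3

# Overall ("micro") accuracy: fraction of correct samples = 5/6.
micro = (t == p).float().mean()

# Per-class accuracy, then an unweighted mean ("macro") = (1 + 0 + 1) / 3.
per_class = torch.stack([(p[t == c] == c).float().mean() for c in range(num_classes)])
macro = per_class.mean()
print(f'micro: {micro.item():.4f}, macro: {macro.item():.4f}')
```

The single missed sample of class 1 barely changes the overall value but costs a third of the per-class average.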

**TASK:** Calculate accuracy with your own implementation and torchmetrics for this multiclass task. Experiment with different average approaches.

In [None]:
# Accuracy
target = torch.tensor([0]*80 + [1]*12 + [2]*8)
preds = torch.tensor([0]*86 + [1]*10 + [2]*4)

# Do not use torchmetrics
# Your Code Here

# accuracy =
# print(f'accuracy: {accuracy}')


# Use torchmetrics
# Your Code Here

# accuracy =
# print(f'accuracy: {accuracy}')


**Question:** Which reduction (average approach) would you prefer and why?

**Answer:**

### 5. Receiver Operating Characteristic (ROC) Curve

We experiment with the **binary** ROC in the following. We define 11 thresholds from 0 to 1 with a step of 0.1 and calculate the true positive rate (TPR, aka **sensitivity** or **recall**) and false positive rate (FPR) w.r.t. each threshold.
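For a single threshold, TPR and FPR can be sketched as below. The helper name `tpr_fpr`, the toy scores, and the convention that a sample counts as positive when its score is greater than or equal to the threshold are assumptions made for illustration.

```python
import torch

# Hypothetical scores and binary labels (three positives, three negatives).
scores = torch.tensor([0.9, 0.8, 0.35, 0.6, 0.2, 0.1])
labels = torch.tensor([1, 1, 1, 0, 0, 0])

def tpr_fpr(scores, labels, th):
    """Return (TPR, FPR) when predicting positive for scores >= th (assumed convention)."""
    pred_pos = scores >= th
    tp = (pred_pos & (labels == 1)).sum()   # true positives
    fp = (pred_pos & (labels == 0)).sum()   # false positives
    tpr = tp / (labels == 1).sum()          # TP / all actual positives
    fpr = fp / (labels == 0).sum()          # FP / all actual negatives
    return tpr.item(), fpr.item()

tpr, fpr = tpr_fpr(scores, labels, 0.5)     # TPR = 2/3, FPR = 1/3
print(f'tpr: {tpr:.4f}, fpr: {fpr:.4f}')
```

Sweeping the threshold from 1 down to 0 moves the point (FPR, TPR) from (0, 0) to (1, 1), which traces the ROC curve.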

In [None]:
# ROC
preds = torch.rand(100)
target = torch.randint(2, (100,))
thresholds = torch.linspace(1, 0, 11)   # 1., 0.9, 0.8, ..., 0.

# Do not use torch.nn.SomeLoss() or torchmetrics
# Your Code Here

# print(f"{'threshold':^10}{'fpr':^10}{'tpr':^10}")  # print header
# for th in thresholds:
# 
#     tpr = 
#     fpr = 
#     print(f"{th:^10.1f}{fpr:^10.4f}{tpr:^10.4f}")  # print results



Now the torchmetrics version. We can also plot the ROC curve effortlessly.

In [None]:
# ROC
%matplotlib inline

# Use torchmetrics
# Your Code Here
# fpr, tpr, thresh = 

# print(f"{'threshold':^10}{'fpr':^10}{'tpr':^10}") 
# for th,fpr_,tpr_ in zip(thresh, fpr, tpr):
#     print(f"{th:^10.1f}{fpr_:^10.4f}{tpr_:^10.4f}")

# fig_, ax_ = metric.plot(score=True)


Since the predictions are random, the AUROC should be around 0.5 🎰

**Question:** What is AUROC?

**Answer:**

**Optional:** Experiment with multiclass and multilabel versions of ROC.

### 6. Sørensen–Dice Coefficient

Thorvald **Sørensen** (1902-1973), Danish botanist and biologist. 

Lee R. **Dice** (1887-1977), American ecologist and geneticist.

We will use the Dice score for segmentation tasks:
$\displaystyle \text{Dice} = \frac{2|X\cap Y|}{|X| + |Y|}$

Imagine we have a picture of a Yorkshire terrier and a slightly wrong predicted segmentation of it. We want to calculate the Dice score between the predicted segmentation mask and the ground-truth mask. In both masks, the background is labeled 2 and the Yorkshire terrier is labeled 1.
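On a tiny mask, the Dice formula can be checked by hand. The sketch below uses hypothetical 2x3 masks with the same label convention (foreground 1, background 2) and considers only the foreground class, which is one way to ignore the background.

```python
import torch

# Hypothetical 2x3 masks: 1 = foreground, 2 = background.
gt_mask = torch.tensor([[1, 1, 2],
                        [1, 2, 2]])
pred_mask = torch.tensor([[1, 2, 2],
                          [1, 1, 2]])

fg_gt = gt_mask == 1                      # |X| = 3 foreground pixels
fg_pred = pred_mask == 1                  # |Y| = 3 foreground pixels
intersect = (fg_gt & fg_pred).sum()       # |X ∩ Y| = 2 overlapping pixels
dice = 2 * intersect / (fg_gt.sum() + fg_pred.sum())   # 2 * 2 / (3 + 3)
print(f'Dice: {dice.item():.4f}')
```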

In [None]:
# DICE
from skimage import io
import matplotlib.pyplot as plt
import numpy as np

# Load image
yorkshire = io.imread('src/yorkshire_terrier_198.jpg')
gt = io.imread('src/yorkshire_gr.png')
pred = io.imread('src/yorkshire_pred.png')

f, axarr = plt.subplots(1,3, figsize=(15,40))
axarr[0].imshow(yorkshire)
axarr[1].imshow(gt)
axarr[2].imshow(pred)
plt.show()

**TASK:** Calculate Dice with ignored background. Again, first implement your own version and then check with torchmetrics.

In [None]:
# DICE
# Do not use torchmetrics
# Your Code Here

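# def dice_(pred, gt, ignore_index=2):

#     intersect =     # |X ∩ Y|, counted on non-ignored pixels only
#     total =         # |X| + |Y| (sum of both mask sizes, not the set union)
#     return 2 * intersect / total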


print(f'Dice: {dice_(pred, gt, ignore_index=2)}')

# Use torchmetrics
# Your Code Here

# dice =
# print(f'Dice: {dice(torch.tensor(pred), torch.tensor(gt))}')


### 7. Structural Similarity Index Measure (SSIM)




*Warning for [ailurophobia](https://en.wikipedia.org/wiki/Ailurophobia): there will be some cat pictures!*

We first illustrate some differently processed images and calculate SSIM of each of them to the original one. 

**Note** that we will use a Gaussian kernel ($11\times 11$) to calculate means and variances.

In [None]:
# SSIM
from skimage import io, transform
import matplotlib.pyplot as plt
import numpy as np

img = io.imread('src/cat.png') / 255.
h, w = img.shape[0], img.shape[1]
img_lowres = transform.resize(transform.resize(img, (int(h/3), int(w/3))), (h, w)) 
img_rot = transform.rotate(img, 180)
img_fade = img / 2.
img_flip = np.fliplr(img)
img_shift = np.zeros_like(img)
img_shift[20:, 20:, : ] = img[20:, 20:, : ]

f, axarr = plt.subplots(1,6, figsize=(15,40))
axarr[0].imshow(img)
axarr[1].imshow(img_lowres)
axarr[2].imshow(img_rot)
axarr[3].imshow(img_fade)
axarr[4].imshow(img_flip)
axarr[5].imshow(img_shift)

for ax,title in zip(axarr, ['Original', 'Low Resolution', 'Rotation 180', 'Low Intensity', 'Flip', 'Shift']):
    ax.axis('off')
    ax.set_title(title)
plt.show()



$ \displaystyle SSIM(x,y) = \frac{(2\mu_x\mu_y + C_1) (2 \sigma _{xy} + C_2)} 
    {(\mu_x^2 + \mu_y^2+C_1) (\sigma_x^2 + \sigma_y^2+C_2)}$, where $x$ and $y$ are corresponding $11\times 11$ windows from the two images and

${\displaystyle \mu _{x}}$ the pixel sample mean of ${\displaystyle x}$;

${\displaystyle \mu _{y}}$ the pixel sample mean of ${\displaystyle y}$; 

${\displaystyle \sigma _{x}^{2}}$ the variance of ${\displaystyle x}$; 

${\displaystyle \sigma _{y}^{2}}$ the variance of ${\displaystyle y}$; 

${\displaystyle \sigma _{xy}}$ the covariance of  ${\displaystyle x}$ and ${\displaystyle y}$; 

${\displaystyle C_{1}=(k_{1}L)^{2}}$, ${\displaystyle C_{2}=(k_{2}L)^{2}}$ two constants that stabilize the division when the denominator is close to zero; 

${\displaystyle L}$ the dynamic range of the pixel-values; 

${\displaystyle k_{1}=0.01}$ and ${\displaystyle k_{2}=0.03}$ by default.
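To make the individual terms concrete, here is a simplified sketch that evaluates SSIM on a single window with uniform weights, using the product of the luminance and contrast/structure terms. It is not the Gaussian-weighted, sliding-window version asked for in this section; the function name `ssim_single_window` and the random patch are hypothetical, and $L=1$ is assumed for images scaled to [0,1].

```python
import torch

def ssim_single_window(x, y, L=1.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized patches treated as ONE window (uniform weights)."""
    C1, C2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x = (x ** 2).mean() - mu_x ** 2       # Var(X) = E[X^2] - E[X]^2
    var_y = (y ** 2).mean() - mu_y ** 2
    cov_xy = (x * y).mean() - mu_x * mu_y     # Cov(X, Y) = E[XY] - E[X]E[Y]
    num = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return num / den

torch.manual_seed(0)
patch = torch.rand(11, 11, dtype=torch.float64)     # hypothetical 11x11 patch
same = ssim_single_window(patch, patch)             # identical patches
faded = ssim_single_window(patch, patch / 2)        # half intensity
print(f'same: {same.item():.4f}, faded: {faded.item():.4f}')
```

Identical patches give a value of 1, while halving the intensity lowers both the luminance term and the contrast term.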


**Hints:** 
- `cv2.getGaussianKernel(size, sigma)` generates a 1D Gaussian kernel (as a column vector)
- `cv2.filter2D(input, -1, kernel)` filters the input with the kernel (for a symmetric kernel such as a Gaussian, this is the same as a convolution)
- $\mathrm{Var}(X) = \mathrm{E}[X^2] - \mathrm{E}[X]^2$
- Note that we will use a Gaussian kernel ($11\times 11$) to calculate means and variances

In [None]:
# SSIM
import cv2

# Your Code Here
# def SSIM_(img1, img2):
#     C1 = 
#     C2 = 

#     kernel_1D =
#     kernel_2D = kernel_1D @ kernel_1D.T

#     mu1 = 
#     mu2 = 

#     var1 = 
#     var2 = 
#     covar12 = 

#     ssim = ((2 * mu1 * mu2 + C1) * (2 * covar12 + C2)) / ((mu1**2 + mu2**2 + C1) * (var1 + var2 + C2))  # you are welcome ;)
#     return ssim.mean()


print(f'{"Low res:": <10} {SSIM_(img, img_lowres):.4f}')
print(f'{"Rotation:": <10} {SSIM_(img, img_rot):.4f}')
print(f'{"Intensity:": <10} {SSIM_(img, img_fade):.4f}')
print(f'{"Flip:": <10} {SSIM_(img, img_flip):.4f}')
print(f'{"Shift:": <10} {SSIM_(img, img_shift):.4f}')

Now call the SSIM from torchmetrics. Note that you have to convert the numpy arrays [W,H,C] to torch tensors [B,C,W,H].


In [None]:
# SSIM
# Use torchmetrics
# Your Code Here

# def np_whc_to_tensor_bcwh(img_np_whc):
# 
#     return img_tensor_bcwh

# img = np_whc_to_tensor_bcwh(img)
# img_lowres = np_whc_to_tensor_bcwh(img_lowres)
# img_rot = np_whc_to_tensor_bcwh(img_rot)
# img_fade = np_whc_to_tensor_bcwh(img_fade)
# img_flip = np_whc_to_tensor_bcwh(img_flip.copy()) 
# img_shift = np_whc_to_tensor_bcwh(img_shift)

# ssim =    # instantiate the ssim from torchmetrics


print(f'{"Low res:": <10} {ssim(img, img_lowres):.4f}')
print(f'{"Rotation:": <10} {ssim(img, img_rot):.4f}')
print(f'{"Intensity:": <10} {ssim(img, img_fade):.4f}')
print(f'{"Flip:": <10} {ssim(img, img_flip):.4f}')
print(f'{"Shift:": <10} {ssim(img, img_shift):.4f}')


**Question:** Summarize some characteristics of SSIM based on these examples. Are the SSIM values in accordance with your intuition?

**Answer:**

### Feedback Cell:
Let us know how you liked it. Any suggestions or criticism are also welcome! 

Your feedback:

### Optional: PSNR
You can implement PSNR and calculate it with the image pairs.
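As a starting point, here is a minimal sketch assuming the common definition $\text{PSNR} = 10 \log_{10}(L^2 / \text{MSE})$, where $L$ is the dynamic range of the pixel values. The function name `psnr`, the parameter `data_range`, and the constant test images are hypothetical.

```python
import torch

def psnr(img1, img2, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images with the given dynamic range."""
    mse = ((img1 - img2) ** 2).mean()
    return 10 * torch.log10(data_range ** 2 / mse)

# Hypothetical constant images that differ by 0.1 everywhere: MSE = 0.01.
a = torch.zeros(4, 4)
b = torch.full((4, 4), 0.1)
value = psnr(a, b)                 # 10 * log10(1 / 0.01) = 20 dB
print(f'PSNR: {value.item():.2f} dB')
```

Note that identical images give an MSE of 0, so the PSNR becomes infinite in that case.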

In [None]:
# PSNR


### Acknowledgements

The cat picture is licensed under [CC BY-SA 2.0 DEED](https://creativecommons.org/licenses/by-sa/2.0/deed.en) from https://w.wiki/9hFn. The dog picture is licensed under [CC BY-SA 4.0 DEED](https://creativecommons.org/licenses/by-sa/4.0/) from [The Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/). The libraries used are numpy, pytorch, torchmetrics, opencv. 

Contributor(s): Yuli Wu.