```python
import torch
from torch import Tensor
import torch.nn.functional as F
import numpy as np
```

# DICE evaluation metric
In the semantic segmentation lab, you implemented IOU to evaluate model performance. Here, you need to implement a similar evaluation metric called DICE, or the Sørensen–Dice coefficient, formulated as: $$ DICE(X, X_{truth}) = \frac{2|X \cap X_{truth}|}{|X| + |X_{truth}|}$$ \
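DICE is closely related to IOU. Because $|X| + |X_{truth}| = |X \cup X_{truth}| + |X \cap X_{truth}|$, the two metrics satisfy $DICE = \frac{2 \cdot IOU}{1 + IOU}$, so they always rank segmentations in the same order. DICE counts the intersection twice, so it is never smaller than IOU (the two agree only at 0 and 1). It also changes faster than IOU when overlap is low and more slowly when overlap is high.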
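As a quick numerical check of how DICE relates to IOU: since $|X| + |X_{truth}| = |X \cup X_{truth}| + |X \cap X_{truth}|$, it follows that $DICE = 2 \cdot IOU / (1 + IOU)$. The sketch below computes both metrics on two made-up 1-D boolean "masks" (the helper `dice_iou` is hypothetical, not part of the lab code) and confirms the identity.

```python
import torch


def dice_iou(pred: torch.Tensor, truth: torch.Tensor):
    """Return (DICE, IOU) for two boolean masks of the same shape.

    Assumes at least one mask is non-empty, otherwise both ratios divide by zero.
    """
    inter = (pred & truth).sum().item()              # |X ∩ X_truth|
    union = (pred | truth).sum().item()              # |X ∪ X_truth|
    total = pred.sum().item() + truth.sum().item()   # |X| + |X_truth|
    return 2 * inter / total, inter / union


# Two made-up 10-pixel masks that overlap on 2 pixels
pred = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=torch.bool)
truth = torch.tensor([0, 0, 1, 1, 1, 1, 0, 0, 0, 0], dtype=torch.bool)

dice, iou = dice_iou(pred, truth)
print(f"DICE = {dice:.4f}, IOU = {iou:.4f}")          # DICE = 0.5000, IOU = 0.3333
print(f"2*IOU/(1+IOU) = {2 * iou / (1 + iou):.4f}")  # 2*IOU/(1+IOU) = 0.5000
```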

```python
def DICE(inp: Tensor, tgt: Tensor) -> Tensor:
    """
    Arguments:
        inp: Predicted mask (batch_size, num_classes, height, width)
        tgt: Ground-truth mask (batch_size, num_classes, height, width)
    Returns:
        DICE coefficient computed over all classes and pixels of each
        sample, then averaged over the batch
    """
    eps = 1e-5  # small number added to numerator and denominator to avoid division by zero
    # YOUR CODE STARTS HERE
    sum_dim = (-1, -2, -3)  # sum over the class, height and width dimensions of each sample
    # numerator: twice the intersection |inp ∩ tgt|
    inter = 2 * (inp * tgt).sum(dim=sum_dim)

    # denominator: |inp| + |tgt|
    sets_sum = inp.sum(dim=sum_dim) + tgt.sum(dim=sum_dim)
    # if both masks are empty, inter is 0 as well, so the DICE below becomes eps / eps = 1
    sets_sum = torch.where(sets_sum == 0, inter, sets_sum)

    # calculate the DICE coefficient of each sample
    dice = (inter + eps) / (sets_sum + eps)

    # average the DICE coefficient over the batch
    return dice.mean()
```
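The function above pools every class and pixel of a sample into a single ratio. A *classwise* average would instead compute DICE for each class over the spatial dimensions only and then average over classes and batch. The sketch below shows that variant under stated assumptions: the names `dice_per_class` and `to_mask` are hypothetical, and because it generally returns different numbers than the pooled version, it would not pass the tests in this notebook.

```python
import torch
import torch.nn.functional as F
from torch import Tensor


def dice_per_class(inp: Tensor, tgt: Tensor, eps: float = 1e-5) -> Tensor:
    """Hypothetical variant: DICE per (sample, class), averaged over classes and batch.

    inp, tgt: masks of shape (batch_size, num_classes, height, width)
    """
    spatial_dims = (-1, -2)                          # sum over height and width only
    inter = 2 * (inp * tgt).sum(dim=spatial_dims)    # shape (batch_size, num_classes)
    sets_sum = inp.sum(dim=spatial_dims) + tgt.sum(dim=spatial_dims)
    dice = (inter + eps) / (sets_sum + eps)          # a class absent from both masks scores 1
    return dice.mean()


def to_mask(labels: Tensor, num_classes: int) -> Tensor:
    """Turn a (batch_size, height, width) label map into a one-hot float mask."""
    return F.one_hot(labels, num_classes=num_classes).permute(0, 3, 1, 2).float()


pred = torch.tensor([[[0, 0], [0, 1]]])
true = torch.tensor([[[0, 0], [1, 1]]])
score = dice_per_class(to_mask(pred, 2), to_mask(true, 2)).item()
print(f"{score:.4f}")  # 0.7333: class 0 scores 4/5, class 1 scores 2/3
```

On the same pair, the pooled `DICE` gives 2·3/8 = 0.75, because it only counts the 3 matching pixels out of 4 regardless of class. Scoring absent classes as 1 is a design choice; some implementations exclude such classes from the average instead.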

### Tests

```python
prediction1 = Tensor([[0, 7, 5, 7, 2],
                      [2, 4, 5, 9, 9],
                      [2, 8, 5, 1, 8],
                      [3, 6, 5, 2, 6],
                      [3, 2, 9, 1, 1]]).unsqueeze(0).long()
mask1 = Tensor([[4, 2, 5, 0, 2],
                [8, 2, 9, 8, 5],
                [0, 8, 7, 9, 6],
                [8, 6, 5, 9, 1],
                [3, 2, 9, 0, 6]]).unsqueeze(0).long()
prediction2 = Tensor([[5, 7, 3, 3, 0],
                      [0, 2, 8, 2, 7],
                      [1, 7, 0, 9, 9],
                      [7, 5, 2, 3, 4],
                      [6, 0, 9, 0, 1]]).unsqueeze(0).long()
mask2 = Tensor([[4, 6, 8, 3, 0],
                [4, 4, 7, 2, 7],
                [0, 0, 4, 9, 9],
                [5, 2, 3, 3, 4],
                [3, 0, 0, 8, 2]]).unsqueeze(0).long()

# Tests: one-hot encode the (1, 5, 5) label maps into (1, 10, 5, 5) masks;
# num_classes=10 keeps the class dimension fixed even if a label is absent
dice1 = DICE(F.one_hot(prediction1, num_classes=10).permute(0, 3, 1, 2).float(),
             F.one_hot(mask1, num_classes=10).permute(0, 3, 1, 2).float()).item()
dice2 = DICE(F.one_hot(prediction2, num_classes=10).permute(0, 3, 1, 2).float(),
             F.one_hot(mask2, num_classes=10).permute(0, 3, 1, 2).float()).item()

assert np.isclose(0.3200001120567322, dice1), 'incorrect dice 1!'
assert np.isclose(0.3600001037120819, dice2), 'incorrect dice 2!'
print("\033[92m All tests passed!")
```

Output: All tests passed!


## Open questions 

1. Based on their formulation, what are the **differences** among Cross-entropy, DICE, and IOU?  

Cross-entropy, DICE, and IOU measure different aspects of a model's predictions. Cross-entropy is a per-pixel loss on predicted class probabilities, $-\sum_c y_c \log p_c$ averaged over pixels, where $y_c$ is the one-hot ground truth and $p_c$ the predicted probability of class $c$. The logarithm penalizes confident wrong predictions heavily, and because every pixel contributes equally, large classes such as the background dominate the loss when classes are imbalanced.

DICE, $\frac{2|X \cap X_{truth}|}{|X| + |X_{truth}|}$, and IOU, $\frac{|X \cap X_{truth}|}{|X \cup X_{truth}|}$, instead compare the predicted and ground-truth regions as sets. Each is a ratio normalized by the size of the regions involved, so if a small object is missed entirely, its class scores 0 no matter how much correctly classified background surrounds it. DICE counts the intersection twice and divides by the sum of both set sizes, whereas IOU divides the intersection by the union with no extra weight on either term; as a result DICE is never smaller than IOU, although the two rank predictions in the same order. Thus, while cross-entropy assesses per-pixel classification on a probabilistic scale, DICE and IOU focus on spatial overlap, with DICE weighting the intersection more heavily and IOU giving the stricter ratio of intersection to union.
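To make the contrast concrete, here is a minimal sketch with made-up numbers: a hypothetical 10×10 image containing a single 2×2 foreground object, and a hypothetical model that predicts background with probability 0.9 at every pixel. The per-pixel cross-entropy stays low because 96 of the 100 pixels are classified correctly, while the foreground DICE and IOU are both 0 because the object is missed entirely.

```python
import torch
import torch.nn.functional as F

# Made-up ground truth: class 1 is a 2x2 object on a class-0 background
target = torch.zeros(10, 10, dtype=torch.long)
target[4:6, 4:6] = 1

# Made-up logits: softmax gives p(background) = 0.9, p(object) = 0.1 at every pixel
logits = torch.zeros(1, 2, 10, 10)
logits[:, 0] = torch.log(torch.tensor(0.9))
logits[:, 1] = torch.log(torch.tensor(0.1))

# Cross-entropy averaged over all 100 pixels
ce = F.cross_entropy(logits, target.unsqueeze(0))

# Hard prediction is background everywhere, so the object is missed
pred = logits.argmax(dim=1).squeeze(0)
fg_pred, fg_true = pred == 1, target == 1
inter = (fg_pred & fg_true).sum().item()
union = (fg_pred | fg_true).sum().item()
dice_fg = 2 * inter / (fg_pred.sum().item() + fg_true.sum().item())
iou_fg = inter / union

print(f"cross-entropy = {ce.item():.2f}")                      # cross-entropy = 0.19
print(f"foreground DICE = {dice_fg:.2f}, IOU = {iou_fg:.2f}")  # foreground DICE = 0.00, IOU = 0.00
```

For comparison, a model that outputs a uniform 0.5/0.5 everywhere would have a cross-entropy of about 0.69, so 0.19 looks good even though the only object of interest was not found.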

2. How might the choice of architecture, such as U-Net or FFN, affect the performance and application suitability of your semantic segmentation model?

The architecture chosen for a semantic segmentation model, such as U-Net or a feedforward neural network (FFN), strongly affects both its performance and the applications it suits. U-Net is a convolutional encoder-decoder: convolution and pooling layers progressively downsample the input to capture context, and upsampling layers restore the resolution, concatenating the matching encoder features through skip connections. Combining coarse, multi-scale context with fine spatial detail in this way makes U-Net well suited to tasks where precise localization and delineation of object boundaries are critical, and it can segment accurately even with limited training data. Both architectures pass information forward only; the key difference is that a fully connected FFN operating on a flattened image discards the 2D neighborhood structure and shares no weights across positions. It must therefore learn spatial relationships from scratch, its parameter count grows with the image size, and it is tied to a fixed input resolution, which typically makes it less accurate at localization. An FFN may still be adequate for small, low-resolution inputs or problems that do not require precise boundaries. The choice between these architectures should therefore match the complexity of the segmentation task and the level of detail required in the output.
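A minimal PyTorch sketch of the structural difference, assuming a 3-channel 64×64 input and 10 classes. Both sizes are hypothetical, as are the class names `TinyUNet` and `FFNSegmenter`; neither is the model from the lab. The U-Net keeps one skip connection from encoder to decoder, while the FFN flattens the image and predicts every pixel's logits with fully connected layers.

```python
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    """Hypothetical one-level U-Net: encoder, bottleneck, decoder with one skip connection."""

    def __init__(self, in_ch: int = 3, num_classes: int = 10, width: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(width, 2 * width, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(2 * width, width, kernel_size=2, stride=2)
        # after concatenating the skip connection, the decoder sees 2 * width channels
        self.dec = nn.Sequential(nn.Conv2d(2 * width, width, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(width, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = self.enc(x)                          # full-resolution features
        x = self.bottleneck(self.down(skip))        # half resolution, larger receptive field
        x = self.up(x)                              # back to full resolution
        x = self.dec(torch.cat([x, skip], dim=1))   # skip connection restores fine detail
        return self.head(x)                         # (N, num_classes, H, W) logits


class FFNSegmenter(nn.Module):
    """Hypothetical fully connected segmenter: flattens the image, predicts all pixel logits."""

    def __init__(self, in_ch: int = 3, num_classes: int = 10,
                 height: int = 64, width: int = 64, hidden: int = 256):
        super().__init__()
        self.out_shape = (num_classes, height, width)
        self.net = nn.Sequential(
            nn.Flatten(),                                  # (N, in_ch * H * W): 2D layout is lost
            nn.Linear(in_ch * height * width, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes * height * width),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).view(x.shape[0], *self.out_shape)


x = torch.randn(2, 3, 64, 64)
for model in (TinyUNet(), FFNSegmenter()):
    n_params = sum(p.numel() for p in model.parameters())
    print(type(model).__name__, tuple(model(x).shape), f"{n_params:,} parameters")
# TinyUNet (2, 10, 64, 64) 11,946 parameters
# FFNSegmenter (2, 10, 64, 64) 13,672,704 parameters
```

Both produce (batch, classes, height, width) logits, but the FFN needs over a thousand times more parameters, almost all in layers whose size is fixed by the 64×64 resolution. The convolutional U-Net reuses the same small kernels at every position and accepts any input whose height and width are even.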
