In a new Python environment with Python >= 3.10, install TorchUncertainty with the image extras from the `dev` branch:

In [None]:
!pip install "torch_uncertainty[image] @ git+https://github.com/ENSTA-U2IS-AI/torch-uncertainty@dev"

In [None]:
# Training hyperparameters
batch_size = 10
learning_rate = 1e-3
weight_decay = 2e-4
lr_decay_epochs = 20
lr_decay = 0.1
nb_epochs = 50
# Skip training and load the model locally.
# If the model has never been trained, set to False to train and save it first.
skip_training = True  # True to skip

In [None]:
import torch
import numpy as np
from einops import rearrange
from torchvision import tv_tensors
from torchvision.transforms import v2
from torchvision.transforms.v2 import functional as F

from torch_uncertainty.datasets import MUAD


import os

# Personal Hugging Face access token: replace the placeholder with your own
os.environ["HF_TOKEN"] = "HF_TOKEN_PLACEHOLDER"

train_transform = v2.Compose(
    [
        v2.Resize(size=(256, 512), antialias=True),
        v2.RandomHorizontalFlip(),
        v2.ToDtype(
            dtype={
                tv_tensors.Image: torch.float32,
                tv_tensors.Mask: torch.int64,
                "others": None,
            },
            scale=True,
        ),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

val_transform = v2.Compose(
    [
        v2.Resize(size=(256, 512), antialias=True),
        v2.ToDtype(
            dtype={
                tv_tensors.Image: torch.float32,
                tv_tensors.Mask: torch.int64,
                "others": None,
            },
            scale=True,
        ),
        v2.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ]
)

train_set = MUAD(root="./data", target_type="semantic", version="small", split="train", transforms=train_transform, download=True)
val_set = MUAD(root="./data", target_type="semantic", version="small", split="val", transforms=val_transform, download=True)
test_set = MUAD(root="./data", target_type="semantic", version="small", split="test", transforms=val_transform, download=True)

Let us look at the first sample of the training set. The first image is the input and the second image is the target (ground truth).

In [None]:
sample = train_set[0]
img, tgt = sample
img.size(), tgt.size()

Visualize the input sample (an RGB image):

In [None]:
# Undo normalization on the image and convert to uint8.
mean = torch.tensor([0.485, 0.456, 0.406], device=img.device)
std = torch.tensor([0.229, 0.224, 0.225], device=img.device)
img = img * std[:, None, None] + mean[:, None, None]
img = F.to_dtype(img, torch.uint8, scale=True)
F.to_pil_image(img)

Visualize the same image overlaid with its ground-truth segmentation (our goal):

In [None]:
from torchvision.utils import draw_segmentation_masks

tmp_tgt = tgt.masked_fill(tgt == 255, 21)
tgt_masks = tmp_tgt == torch.arange(22, device=tgt.device)[:, None, None]
img_segmented = draw_segmentation_masks(img, tgt_masks, alpha=1, colors=val_set.color_palette)
F.to_pil_image(img_segmented)

Below is the complete list of classes in MUAD, presented as:

1.   Class Name
2.   Train ID
3.   Segmentation color in RGB format [R, G, B].

In [None]:
for muad_class in train_set.classes:
    class_name = muad_class.name
    train_id = muad_class.id
    color = muad_class.color
    print(f"Class: {class_name}, Train ID: {train_id}, Color: {color}")

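Here is a more comprehensive overview of the different classes. (Unlabeled pixels are stored with ID 255: the visualization cells remap them to ID 21 to draw them, while the training code below maps every ID of 19 and above to 255 and ignores it.)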


| **class names**                       | **ID** |
|----------------------------------------|---------|
| road                                   | 0       |
| sidewalk                               | 1       |
| building                               | 2       |
| wall                                   | 3       |
| fence                                  | 4       |
| pole                                   | 5       |
| traffic light                          | 6       |
| traffic sign                           | 7       |
| vegetation                             | 8       |
| terrain                                | 9       |
| sky                                    | 10      |
| person                                 | 11      |
| rider                                  | 12      |
| car                                    | 13      |
| truck                                  | 14      |
| bus                                    | 15      |
| train                                  | 16      |
| motorcycle                             | 17      |
| bicycle                                | 18      |
| bear deer cow                          | 19      |
| garbage_bag stand_food trash_can       | 20      |


We feed the DNN the raw road-view image (the first image above). Its target is the dark label image below, in which each pixel stores a class ID, and not the colored overlay (the second image).

In [None]:
im = F.to_pil_image(F.to_dtype(tgt, torch.uint8))
im

In [None]:
print(im.size)  # (width, height)
print(np.array(im))

**Why is the target image dark, and what is the bright part?** **(hint: print the numpy array)**

A: The target is a label map, not a color image: each pixel stores a class ID (e.g., 2, 8, 0 in the printed array) following the table above. Displayed as an 8-bit grayscale image, where 0 is black and 255 is white, IDs between 0 and 20 are all very close to black, so most of the image looks dark. The bright (white) pixels have value 255, the ID used in the annotation for unlabeled (void) pixels.

**Q3/ Please study the dataset a bit. What is it about?**

A3: MUAD (Multiple Uncertainties for Autonomous Driving) is a benchmark dataset for evaluating uncertainty in several autonomous-driving tasks. Each sample comes with ground truth for semantic segmentation and for depth estimation. Here we use the small version of the dataset with `target_type="semantic"` (instead of depth), so our task is semantic segmentation. Besides the train, val and test splits, the dataset also provides an `ood` split containing out-of-distribution objects, which we use later in this notebook.

In [None]:
import numpy as np
import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(
        train_set,
        batch_size=batch_size,
        shuffle=True,
        num_workers=4)

val_loader = DataLoader(
        val_set,
        batch_size=batch_size,
        shuffle=False,
        num_workers=4)

test_loader = DataLoader(
        test_set,
        batch_size=batch_size,
        shuffle=False,
        num_workers=4)


In [None]:
def enet_weighing(dataloader, num_classes, c=1.02):
    """Computes class weights as described in the ENet paper.

        w_class = 1 / (ln(c + p_class)),

    where c is usually 1.02 and p_class is the propensity score of that
    class:

        propensity_score = freq_class / total_pixels.

    References:
        https://arxiv.org/abs/1606.02147

    Args:
        dataloader (``torch.utils.data.DataLoader``): A data loader to iterate
            over the dataset.
        num_classes (``int``): The number of classes.
        c (``float``, optional): An additional hyper-parameter which restricts
            the interval of values for the weights. Default: 1.02.

    """
    class_count = np.zeros(num_classes, dtype=np.int64)
    total = 0
    for _, label in dataloader:
        label = label.cpu().numpy()
        # Flatten the label and keep only IDs in [0, num_classes): this drops the
        # void label 255 and any ID we do not compute a weight for, so that every
        # bincount has exactly num_classes entries
        flat_label = label.flatten()
        flat_label = flat_label[(flat_label >= 0) & (flat_label < num_classes)]

        # Sum up the number of pixels of each class and the total pixel
        # counts for each label
        class_count += np.bincount(flat_label, minlength=num_classes)
        total += flat_label.size

    # Compute propensity score and then the weights for each class
    propensity_score = class_count / total
    return 1 / (np.log(c + propensity_score))

In [None]:
print("\nComputing class weights...")
print("(this can take a while depending on the dataset size)")
class_weights = enet_weighing(train_loader, 19)
class_weights = torch.from_numpy(class_weights).float().cuda()
print("Class weights:", class_weights)

**Q4/ why do we need to evaluate the class_weights?**

A4: The classes are highly imbalanced in terms of pixel count: some classes cover a large part of every image while others occupy only a few pixels. With an unweighted loss, the gradient is dominated by the majority classes and the model tends to neglect the minority ones. The ENet weighting scheme $w_k = 1/\ln(c + p_k)$ gives larger weights to rarer classes while keeping every weight in a bounded range, between $1/\ln(c + 1)$ ($\approx 1.42$ for $c = 1.02$) and $1/\ln(c)$ ($\approx 50.5$). These weights are passed to the cross-entropy loss so that errors on minority classes cost more during training, which signals to the model that it should pay more attention to them.
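To see how the ENet formula behaves, here is a small NumPy sketch that applies it to made-up pixel counts (the counts and the helper name `enet_class_weights` are illustrative assumptions, not MUAD statistics). It computes the same quantity as `enet_weighing`, but from precomputed counts:

```python
import numpy as np


def enet_class_weights(pixel_counts, c=1.02):
    """ENet weighting w_k = 1 / ln(c + p_k), where p_k is the pixel frequency of class k."""
    counts = np.asarray(pixel_counts, dtype=np.float64)
    p = counts / counts.sum()  # propensity score of each class
    return 1.0 / np.log(c + p)


# Hypothetical pixel counts: one dominant, one medium and one rare class
counts = [900_000, 90_000, 10_000]
w = enet_class_weights(counts)
print(np.round(w, 2))
print("upper bound 1/ln(c):", round(1 / np.log(1.02), 2))
```

The rare class receives the largest weight, yet no weight can exceed $1/\ln(c)$, which keeps the loss from exploding on classes that cover only a handful of pixels.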

## C. Building the DNN

**Q5/ Do we really use Unet? What did I change :)? (that is hard)**

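A5: Yes, it is a U-Net (an encoder-decoder with skip connections), but it differs from the original U-Net (Ronneberger et al., 2015). The main change is that `Dropout2d(0.1)` layers are inserted after every decoder (`Up`) block so that MC dropout can be used for uncertainty estimation; the original U-Net has no dropout in its decoder, only at the end of the contracting path. The network is also lighter (32 channels in the first block instead of 64, and 256 at the bottleneck), uses padded 3x3 convolutions followed by batch normalization (the original uses unpadded convolutions without batch normalization), and upsamples with bilinear interpolation by default instead of learned up-convolutions.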



**Q6/ Do we need a backbone with Unet?**

A6: No, not necessarily. U-Net is a complete encoder-decoder architecture trained end to end: it takes an image as input and outputs a segmentation map, and its encoder (contracting path) already plays the role of a backbone. A pretrained network such as a ResNet can replace the encoder to improve performance or speed up convergence, but this is optional; here the whole UNet is trained from scratch.




In [None]:
from torch import nn


class DoubleConv(nn.Module):
    """(conv => BN => ReLU) * 2."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv(x)


class InConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x):
        return self.conv(x)


class Down(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.mpconv = nn.Sequential(
            nn.MaxPool2d(2),
            DoubleConv(in_ch, out_ch)
        )

    def forward(self, x):
        return self.mpconv(x)


class Up(nn.Module):
    def __init__(self, in_ch, out_ch, bilinear=True):
        super().__init__()
        self.bilinear = bilinear

        self.up = nn.ConvTranspose2d(in_ch // 2, in_ch // 2, 2, stride=2)

        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x1, x2):
        if self.bilinear:
            x1 = F.resize(x1, size=[2*x1.size()[2],2*x1.size()[3]],
                          interpolation=v2.InterpolationMode.BILINEAR)
        else:
            x1 = self.up(x1)

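        # input is NCHW
        diff_y = x2.size()[2] - x1.size()[2]
        diff_x = x2.size()[3] - x1.size()[3]

        # torch.nn.functional.pad expects [left, right, top, bottom]
        # (torchvision's F.pad would read the same list as [left, top, right, bottom])
        x1 = nn.functional.pad(x1, [diff_x // 2, diff_x - diff_x // 2,
                                    diff_y // 2, diff_y - diff_y // 2])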

        # for padding issues, see
        # https://github.com/HaiyongJiang/U-Net-Pytorch-Unstructured-Buggy/commit/0e854509c2cea854e247a9c615f175f76fbb2e3a
        # https://github.com/xiaopeng-liao/Pytorch-UNet/commit/8ebac70e633bac59fc22bb5195e513d5832fb3bd

        x = torch.cat([x2, x1], dim=1)
        return self.conv(x)


class OutConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.conv(x)

# Note: Dropout2d layers have been added to the decoder so that MC dropout can be used

class UNet(nn.Module):
    def __init__(self, classes):
        super().__init__()
        self.inc = InConv(3, 32)
        self.down1 = Down(32, 64)
        self.down2 = Down(64, 128)
        self.down3 = Down(128, 256)
        self.down4 = Down(256, 256)
        self.up1 = Up(512, 128)
        self.up2 = Up(256, 64)
        self.up3 = Up(128, 32)
        self.up4 = Up(64, 32)
        self.dropout = nn.Dropout2d(0.1)
        self.outc = OutConv(32, classes)

    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up1(x5, x4)
        x = self.dropout(x)
        x = self.up2(x, x3)
        x = self.dropout(x)
        x = self.up3(x, x2)
        x = self.dropout(x)
        x = self.up4(x, x1)
        x = self.dropout(x)
        return self.outc(x)
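A quick sanity check on the architecture, as a plain-Python sketch (the helper names are hypothetical): the four `Down` blocks each halve the height and width with `MaxPool2d(2)`, and each `Up` block doubles them again. When the input size is not divisible by $2^4 = 16$, the pooled sizes are floored and the upsampled map ends up smaller than the skip connection, which is what the `diff_y`/`diff_x` padding in `Up.forward` compensates for. With the 256 x 512 inputs used here, no padding is needed:

```python
def unet_feature_sizes(h, w, depth=4):
    """Spatial sizes (H, W) of the encoder feature maps, from input to bottleneck."""
    sizes = [(h, w)]
    for _ in range(depth):
        h, w = h // 2, w // 2  # MaxPool2d(2) floors odd sizes
        sizes.append((h, w))
    return sizes


def decoder_padding(h, w, depth=4):
    """(diff_y, diff_x) padded in each Up block, from the deepest to the shallowest."""
    enc = unet_feature_sizes(h, w, depth)
    cur_h, cur_w = enc[-1]
    pads = []
    for skip_h, skip_w in reversed(enc[:-1]):
        cur_h, cur_w = 2 * cur_h, 2 * cur_w            # x2 upsampling
        pads.append((skip_h - cur_h, skip_w - cur_w))  # diff_y, diff_x
        cur_h, cur_w = skip_h, skip_w                  # size after padding + concat
    return pads


print(unet_feature_sizes(256, 512))  # [(256, 512), (128, 256), (64, 128), (32, 64), (16, 32)]
print(decoder_padding(256, 512))     # [(0, 0), (0, 0), (0, 0), (0, 0)]
print(decoder_padding(250, 500))     # [(1, 0), (0, 1), (1, 0), (0, 0)]
```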

## D. Utility functions

In [None]:
import matplotlib.pyplot as plt

# Colors from Colorbrewer Paired_12
colors = [[31, 120, 180], [51, 160, 44]]
colors = [(r / 255, g / 255, b / 255) for (r, g, b) in colors]

def plot_losses(train_history, val_history):
    x = np.arange(1, len(train_history) + 1)

    plt.figure(figsize=(8, 6))
    plt.plot(x, train_history, color=colors[0], label="Training loss", linewidth=2)
    plt.plot(x, val_history, color=colors[1], label="Validation loss", linewidth=2)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(loc="upper right")
    plt.title("Evolution of the training and validation loss")
    plt.show()

def plot_accu(train_history, val_history):
    x = np.arange(1, len(train_history) + 1)

    plt.figure(figsize=(8, 6))
    plt.plot(x, train_history, color=colors[0], label="Training mIoU", linewidth=2)
    plt.plot(x, val_history, color=colors[1], label="Validation mIoU", linewidth=2)
    plt.xlabel("Epoch")
    plt.ylabel("Mean IoU")
    plt.legend(loc="upper right")
    plt.title("Evolution of the mean IoU")
    plt.show()

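**Q7/ What is the IoU?**

A7: IoU (Intersection over Union) measures the overlap between a predicted region $P$ and a ground-truth region $G$:
$$
IoU = \frac{|P \cap G|}{|P \cup G|} = \frac{\text{area of overlap}}{\text{area of union}}
$$
In semantic segmentation the areas are pixel counts, and the IoU is computed for each class $k$ from its true positives, false positives and false negatives, then averaged over the $K$ classes to obtain the mean IoU (mIoU):
$$
IoU_k = \frac{TP_k}{TP_k + FP_k + FN_k}, \qquad mIoU = \frac{1}{K} \sum_{k=1}^{K} IoU_k
$$
IoU equals 1 for a perfect match and 0 when the regions do not overlap. It is a simple, scale-invariant measure of geometric similarity, but it does not account for the distance between two non-overlapping shapes (the IoU is 0 however far apart they are) or for the similarity of their aspect ratios.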


### Training function

**Q8/ Please complete the training and test functions**

In [None]:
from torchmetrics.utilities.compute import _safe_divide


def train(model, data_loader, optim, criterion, metric, iteration_loss=False):
    model.train()
    epoch_loss = 0.0
    metric.reset()
    for step, batch_data in enumerate(data_loader):
        # Get the inputs and labels
        img = batch_data[0].cuda()
        labels = batch_data[1].cuda()
        labels[labels >= 19] = 255
        # Squeeze the channel dimension if present: [B, 1, H, W] -> [B, H, W]
        if labels.ndim == 4 and labels.shape[1] == 1:
            labels = labels.squeeze(1)
        # Forward propagation
        outputs = model(img)
        
        # Loss computation
        loss = criterion(outputs, labels)

        # Backpropagation
        optim.zero_grad()
        loss.backward()
        optim.step()


        # Flatten the outputs and labels for metric computation
        flatten_logits = rearrange(outputs, "b c h w -> (b h w) c")
        flatten_labels = labels.flatten()
        valid_mask = flatten_labels != 255
        
        # Keep track of loss for current epoch
        epoch_loss += loss.item()

        # Keep track of the evaluation metric
        metric.update(flatten_logits[valid_mask].detach(), flatten_labels[valid_mask].detach())

        if iteration_loss:
            print("[Step: %d] Iteration loss: %.4f" % (step, loss.item()))

    # Compute IoU per class
    tp, fp, _, fn = metric._final_state()
    iou_per_class = _safe_divide(tp, tp + fp + fn, zero_division=float("nan"))

    return epoch_loss / len(data_loader), iou_per_class, metric.compute()

### Validation function

In [None]:
def test(model, data_loader, criterion, metric, iteration_loss=False):
    model.eval()
    epoch_loss = 0.0
    metric.reset()
    for step, batch_data in enumerate(data_loader):
        # Get the inputs and labels
        img = batch_data[0].cuda()
        labels = batch_data[1].cuda()
        labels[labels >= 19] = 255
        # Squeeze the channel dimension if present: [B, 1, H, W] -> [B, H, W]
        if labels.ndim == 4 and labels.shape[1] == 1:
            labels = labels.squeeze(1)
        with torch.no_grad():
            # Forward propagation
            outputs = model(img)
            
            # Flatten the outputs and labels for metric computation
            flatten_logits = rearrange(outputs, "b c h w -> (b h w) c")
            flatten_labels = labels.flatten()
            valid_mask = flatten_labels != 255

            # Loss computation
            loss = criterion(outputs, labels)

        # Keep track of loss for current epoch
        epoch_loss += loss.item()

        # Keep track of the evaluation metric
        metric.update(flatten_logits[valid_mask], flatten_labels[valid_mask])

        if iteration_loss:
            print("[Step: %d] Iteration loss: %.4f" % (step, loss.item()))

    # Compute IoU per class
    tp, fp, _, fn = metric._final_state()
    iou_per_class = _safe_divide(tp, tp + fp + fn, zero_division=float("nan"))

    return epoch_loss / len(data_loader), iou_per_class, metric.compute()

## E. Training Process

**Q9/ Please train your DNN and comment on the results.**



In [None]:
from torch import optim
from torch.optim import lr_scheduler

from torch_uncertainty.metrics.segmentation import MeanIntersectionOverUnion

print("\nTraining...\n")
num_classes = 19
# Initialize the UNet
model = UNet(num_classes)
model = model.cuda()

# Class weights for the 19 training classes: count over 21 IDs and keep the
# first 19 (IDs 19 and 20 are mapped to 255 and ignored during training)
print("\nRe-computing class weights for 19 classes...")
full_weights = enet_weighing(train_loader, 21)
class_weights = torch.from_numpy(full_weights[:19]).float().cuda()  # keep only the first 19

print("Class weights shape:", class_weights.shape)
print("Class weights:", class_weights)

# We use the cross-entropy loss, the most common choice for classification
# problems with multiple classes (applied here per pixel). This criterion
# combines LogSoftmax and NLLLoss. It is built after the weights above so
# that it actually uses the 19-class weights.
criterion = torch.nn.CrossEntropyLoss(weight=class_weights, ignore_index=255)
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=nb_epochs)

In [None]:
# Start Training
# Training loop
train_losses = []
train_IoU = []
test_losses = []
test_IoU = []

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

best_iou = 0.0
# Initialize metric objects globally for the loop (optional, but cleaner if reset inside)
metric_train_obj = MeanIntersectionOverUnion(num_classes=num_classes).to(device)
metric_test_obj = MeanIntersectionOverUnion(num_classes=num_classes).to(device)

for epoch in range(nb_epochs):
    print(f"Epoch {epoch+1}/{nb_epochs}")
    
    # Train and Validate
    # Note: train/test functions return (loss, per_class_iou, mean_iou) based on your implementation
    train_loss, IoU_per_class_train, _ = train(model, train_loader, optimizer, criterion, metric_train_obj)
    test_loss, IoU_per_class_test, _ = test(model, test_loader, criterion, metric_test_obj)
    
    # Scheduler step
    scheduler.step()
    
    # Record history
    train_losses.append(train_loss)
    # Calculate mean of per-class IoU manually or use the 3rd return value from your function
    train_IoU.append(IoU_per_class_train.nanmean().item()) 
    
    test_losses.append(test_loss)
    test_miou = IoU_per_class_test.nanmean().item()
    test_IoU.append(test_miou)
    
    print(f"  Train Loss: {train_loss:.4f} | Train mIoU: {train_IoU[-1]:.4f}")
    print(f"  Test Loss:  {test_loss:.4f} | Test mIoU:  {test_miou:.4f}")

    # Save best model logic
    if test_miou > best_iou:
        best_iou = test_miou
        torch.save(model.state_dict(), 'unet_best.pth')
        print("  -> Best model saved!")

# Save final model as well
torch.save(model.state_dict(), 'unet.pth')

# Plotting ONCE at the end
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss', color=colors[0])
plt.plot(test_losses, label='Test Loss', color=colors[1])
plt.title('Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(train_IoU, label='Train IoU', color=colors[0])
plt.plot(test_IoU, label='Test IoU', color=colors[1])
plt.title('mIoU over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Mean IoU')
plt.legend()

plt.tight_layout()
plt.savefig("training_curves_unet.png", dpi=200, bbox_inches="tight")
plt.show()

Load a model

In [None]:
#Loading a model
model = UNet(19)
model.load_state_dict(torch.load("unet.pth"))
model = model.to("cuda")

# III. Evaluation of the Trained DNN on the Test Set

## A. Classical evaluations

**Q10/ Please plot the loss and mIoU and comment on them.**
Both the training and test loss curves show a similar downward trend, stabilizing around 0.3. However, there is a notable gap in mean Intersection over Union (mIoU): the training mIoU reaches approximately 0.75, whereas the test mIoU plateaus at around 0.65. This gap indicates overfitting to the training set.

A10: Our curves match this description: the gap between training and test mIoU suggests that the model overfits the training data.

In [None]:
plot_losses(train_losses, test_losses)

In [None]:
plot_accu(train_IoU, test_IoU)

**Q11/ What should we have done to avoid overfitting?**

A11: We already use dropout layers in the decoder of the UNet and weight decay ($2 \times 10^{-4}$ with Adam). To further reduce overfitting, we can:

1. Increase the dropout rate (e.g., from 0.1 to 0.2 or 0.3).
2. Use stronger data augmentation (random cropping and rescaling, color jitter, random erasing, and so on), so that the model sees more varied inputs and cannot rely on a few specific local features.
3. Increase the weight decay, or switch to AdamW, which applies decoupled weight decay.
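Early stopping is another standard remedy: monitor the validation mIoU and stop once it no longer improves (the training loop above already tracks the best mIoU to save `unet_best.pth`). Below is a minimal plain-Python sketch; the `EarlyStopping` class and its `patience`/`min_delta` parameters are hypothetical names, not part of torch_uncertainty, and the mIoU history is made up:

```python
class EarlyStopping:
    """Stop when a monitored score (e.g. validation mIoU) stops improving.

    patience: number of epochs without improvement tolerated before stopping.
    min_delta: minimal increase that counts as an improvement.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record this epoch's score and return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation mIoU history: improves, then plateaus
stopper = EarlyStopping(patience=3)
history = [0.40, 0.52, 0.58, 0.60, 0.59, 0.60, 0.58, 0.61]
for epoch, miou in enumerate(history, start=1):
    if stopper.step(miou):
        print(f"Stop at epoch {epoch}, best mIoU {stopper.best:.2f}")
        break
```

In the notebook's loop, `stopper.step(val_miou)` would be called once per epoch, ideally on the validation set rather than on the test set.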

In [None]:
# Now we evaluate the model on the whole test set.
loss, iou, miou = test(model, test_loader, criterion, metric_test_obj)
print(f">>>> [FINAL TEST on the test set] Avg. loss: {loss:.4f} | Mean IoU: {miou:.4f}")
class_names = [c.name for c in test_set.classes if c.id < 19]
# Print the IoU of each class
for key, class_iou in zip(class_names, iou, strict=True):
    print(f"{key}: {class_iou:.4f}")

## B. Uncertainty evaluations with MCP
Here we simply use the maximum class probability (MCP), i.e., the largest softmax probability of each pixel, as the confidence score.
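The MCP computation performed below with `outputs.softmax(dim=1)` followed by `.max(0)` can be illustrated on a toy array. This NumPy sketch uses made-up logits for a 1 x 2 image with 3 classes: a pixel with one dominant logit gets an MCP close to 1, while a pixel with similar logits gets an MCP close to the minimum possible value $1/C$:

```python
import numpy as np


def mcp_confidence(logits):
    """Per-pixel MCP from logits of shape (C, H, W); returns (confidence, prediction)."""
    z = logits - logits.max(axis=0, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    return probs.max(axis=0), probs.argmax(axis=0)


# Hypothetical logits (3 classes, 1 x 2 image):
# the left pixel has one dominant class, the right pixel is ambiguous
logits = np.array([[[4.0, 1.2]],
                   [[0.0, 1.0]],
                   [[0.0, 0.9]]])
conf, pred = mcp_confidence(logits)
print(np.round(conf, 3), pred)
```

Both pixels are predicted as class 0, but only the left one would appear bright in the confidence map.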


In [None]:
sample_idx = 0
img, target = test_set[sample_idx]

batch_img = img.unsqueeze(0).cuda()
batch_target = target.unsqueeze(0).cuda()
model.eval()
with torch.no_grad():
    # Forward propagation
    outputs = model(batch_img)
    outputs_proba = outputs.softmax(dim=1)
    # Remove the batch dimension
    outputs_proba = outputs_proba.squeeze(0)
    confidence, pred = outputs_proba.max(0)

# Move the results to the CPU, where the image used for visualization lives
confidence, pred = confidence.cpu(), pred.cpu()

In [None]:
# Undo normalization on the image and convert to uint8.
mean = torch.tensor([0.485, 0.456, 0.406], device=img.device)
std = torch.tensor([0.229, 0.224, 0.225], device=img.device)
img = img * std[:, None, None] + mean[:, None, None]
img = F.to_dtype(img, torch.uint8, scale=True)

tmp_target = target.masked_fill(target == 255, 21)
target_masks = tmp_target == torch.arange(22, device=target.device)[:, None, None]
img_segmented = draw_segmentation_masks(img, target_masks, alpha=1, colors=test_set.color_palette)

pred_masks = pred == torch.arange(22, device=pred.device)[:, None, None]

pred_img = draw_segmentation_masks(img, pred_masks, alpha=1, colors=test_set.color_palette)

img = F.to_pil_image(img)
img_segmented = F.to_pil_image(img_segmented)
confidence_img = F.to_pil_image(confidence)
pred_img = F.to_pil_image(pred_img)

fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(30, 15))
ax1.imshow(img)
ax2.imshow(img_segmented)
ax3.imshow(pred_img)
ax4.imshow(confidence_img)
plt.show()

**Q12/ The last image is related to the confidence score of the DNN. Can you explain why? What do the bright areas represent, and what do the dark areas represent?**

A12: The last image displays, for each pixel, the maximum class probability, which is a proxy for the model's confidence (drawn with Matplotlib's default colormap, from dark purple for low values to bright yellow for values close to 1). Bright yellow areas are where the model is confident in its prediction; dark areas are where the predicted class has a low probability, i.e., where the model is uncertain.

### Now let's load the OOD test set

In [None]:
test_ood_set = MUAD(root="./data", target_type="semantic", version="small", split="ood", transforms=val_transform, download=True)
test_ood_set

In [None]:
sample_idx = 0
img, target = test_ood_set[sample_idx]

batch_img = img.unsqueeze(0).cuda()
batch_target = target.unsqueeze(0).cuda()
model.eval()
with torch.no_grad():
    # Forward propagation
    outputs = model(batch_img)
    outputs_proba = outputs.softmax(dim=1)
    # Remove the batch dimension
    outputs_proba = outputs_proba.squeeze(0)
    confidence, pred = outputs_proba.max(0)

# Move the results to the CPU, where the image used for visualization lives
confidence, pred = confidence.cpu(), pred.cpu()

In [None]:
# Undo normalization on the image and convert to uint8.
mean = torch.tensor([0.485, 0.456, 0.406], device=img.device)
std = torch.tensor([0.229, 0.224, 0.225], device=img.device)
img = img * std[:, None, None] + mean[:, None, None]
img = F.to_dtype(img, torch.uint8, scale=True)

tmp_target = target.masked_fill(target == 255, 21)
target_masks = tmp_target == torch.arange(22, device=target.device)[:, None, None]
img_segmented = draw_segmentation_masks(img, target_masks, alpha=1, colors=test_set.color_palette)

pred_masks = pred == torch.arange(22, device=pred.device)[:, None, None]

pred_img = draw_segmentation_masks(img, pred_masks, alpha=1, colors=test_set.color_palette)

img_pil = F.to_pil_image(img)
img_segmented = F.to_pil_image(img_segmented)
confidence_img = F.to_pil_image(confidence)
pred_img = F.to_pil_image(pred_img)

fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(30, 15))
ax1.imshow(img_pil)
ax2.imshow(img_segmented)
ax3.imshow(pred_img)
ax4.imshow(confidence_img)
plt.show()

**According to the output, is the model confident when labeling the bear and the goat? What about the bench?**

A: The model is overconfident on the bear, the goat, and the bench. During training, all classes with ID 19 and above are ignored (mapped to 255), so these objects act as out-of-distribution (OOD) objects. In the last image, the areas of the bear, the goat, and the bench are darker than the surrounding bright yellow regions the model is confident about, but not dark enough: they remain fairly bright, i.e., the model is overconfident on these OOD objects even though it has never learned what they are. The model therefore misclassifies them as in-distribution classes. Specifically, in the third image the bear is colored blue, i.e., classified as a car; the goat is classified as a pedestrian; and the bench is split into several parts predicted as sidewalk, fence, and pedestrian.


**Q12 bis/ The last image is related to the confidence score of the DNN. Can you explain why?**
**Are you happy with this image?**

As in Q12, the image shows the maximum class probability of each pixel, a proxy for the model's confidence: bright yellow areas correspond to high confidence and dark areas to low confidence (uncertainty).

We are not happy with this image: the model is overconfident and tends to classify OOD objects as in-distribution ones. Ideally, the confidence map should be dark (blue or purple) over these OOD objects; the fact that they remain bright yellow shows that the model is poorly calibrated and fails to flag these anomalies as unknown.

## C. Uncertainty evaluations with Temperature Scaling
**Q13/ please implement a temperature scaling using torch_uncertainty**
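Temperature scaling divides all logits by a single scalar $T > 0$ fitted on held-out data by minimizing the negative log-likelihood: $T > 1$ softens overconfident predictions, and since dividing by $T$ does not change the argmax, the predicted classes (and thus the mIoU) stay the same. The NumPy sketch below uses made-up logits and a simple grid search over $T$ instead of the LBFGS optimizer used in the cells below; the data are chosen so that the model is 75% accurate but about 99% confident:

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def nll(logits, labels, T):
    """Mean negative log-likelihood of softmax(logits / T)."""
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))


# Hypothetical overconfident logits: large margins, but 3 of 4 predictions correct
logits = np.array([[6.0, 0.0, 0.0],
                   [0.0, 6.0, 0.0],
                   [0.0, 0.0, 6.0],
                   [6.0, 0.0, 0.0]])
labels = np.array([0, 1, 2, 1])  # the last prediction is wrong

temps = np.linspace(0.5, 5.0, 46)  # grid with step 0.1
losses = [nll(logits, labels, T) for T in temps]
best_T = temps[int(np.argmin(losses))]
print(f"best T = {best_T:.1f}")
for T in (1.0, best_T):
    print(f"T = {T:.1f}  mean MCP = {softmax(logits / T).max(axis=1).mean():.3f}")
```

On this toy data the fitted temperature lands near the analytical optimum $T = 6/\ln 6 \approx 3.35$, which brings the mean confidence down to about the accuracy of 0.75 without changing any prediction.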

Before temperature scaling

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch_uncertainty.post_processing import TemperatureScaler

# -------------------------------------------------------------------
# 1. Extract Logits and Targets from Validation Set
# -------------------------------------------------------------------
logits_list = []
targets_list = []
model.eval()

print("Extracting validation logits and targets for calibration...")
with torch.no_grad():
    for images, labels in val_loader:
        images = images.cuda()
        labels = labels.cuda()
        logits = model(images)
        logits_list.append(logits)
        targets_list.append(labels)

# Concatenate all batches
# val_logits: (N, C, H, W)
# val_targets: (N, 1, H, W) or (N, H, W)
val_logits = torch.cat(logits_list)
val_targets = torch.cat(targets_list)

# Remove channel dim from targets if present: (N, 1, H, W) -> (N, H, W)
if val_targets.dim() == 4 and val_targets.shape[1] == 1:
    val_targets = val_targets.squeeze(1)

# -------------------------------------------------------------------
# 2. Filter Ignore Index (255) & Flatten
# -------------------------------------------------------------------
# Move channels to last dim for flattening: (N, H, W, C)
val_logits = val_logits.permute(0, 2, 3, 1)

# Flatten to pixel-wise: (N*H*W, C) and (N*H*W,)
val_logits_flat = val_logits.reshape(-1, val_logits.shape[-1])
val_targets_flat = val_targets.reshape(-1)

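# Create a mask for valid pixels: drop void pixels (255) and IDs 19-20,
# which are ignored during training and have no matching logit
# Note: ensure targets are of type Long
val_targets_flat = val_targets_flat.long()
mask = val_targets_flat < val_logits_flat.shape[-1]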

# Apply mask
valid_logits = val_logits_flat[mask]
valid_targets = val_targets_flat[mask]

print(f"Original pixels: {val_targets_flat.shape[0]}, Valid pixels: {valid_targets.shape[0]}")

# -------------------------------------------------------------------
# 3. Prepare Calibration DataLoader (Pixel-wise)
# -------------------------------------------------------------------
# A large batch size is fine here: only a single scalar (the temperature) is fitted
calibration_dataset = TensorDataset(valid_logits, valid_targets)
calibration_loader = DataLoader(calibration_dataset, batch_size=4096, shuffle=True)

# -------------------------------------------------------------------
# 4. Initialize and Fit TemperatureScaler
# -------------------------------------------------------------------
scaler = TemperatureScaler(init_val=1.0)
scaler = scaler.cuda()

print("Fitting TemperatureScaler...")
scaler.fit(calibration_loader)

# -------------------------------------------------------------------
# 5. Create Calibrated Model Wrapper and Evaluate
# -------------------------------------------------------------------
class ModelWithTemperature(torch.nn.Module):
    def __init__(self, model, scaler):
        super().__init__()
        self.model = model
        self.scaler = scaler
    def forward(self, input):
        return self.scaler(self.model(input))

calibrated_model = ModelWithTemperature(model, scaler)

print("Evaluating calibrated model...")
loss, iou, miou = test(calibrated_model, test_loader, criterion, metric_test_obj)
print(f"After Temperature Scaling - Avg. loss: {loss:.4f} | Mean IoU: {miou:.4f}")

**Looking at the two graphs above, comment on the MCP uncertainty results: is the model overconfident or calibrated?**

After temperature scaling

In [None]:
from torch_uncertainty.post_processing import TemperatureScaler



Now let's see the new confidence score image after scaling

In [None]:
import torch
from torch import nn, optim
from torch_uncertainty.metrics.classification import CalibrationError
import matplotlib.pyplot as plt


print("Extracting validation logits and targets...")
logits_list = []
targets_list = []
model.eval()

with torch.no_grad():
    for images, labels in val_loader:
        images = images.cuda()
        labels = labels.cuda()

        if labels.ndim == 4:
            labels = labels.squeeze(1)
        # IDs 19 and 20 are ignored during training: map them to 255 as well
        labels[labels >= 19] = 255

        logits = model(images)
        logits_list.append(logits)
        targets_list.append(labels)

val_logits = torch.cat(logits_list).detach()
val_targets = torch.cat(targets_list).detach()


temperature = nn.Parameter(torch.ones(1).cuda() * 1.5)

nll_criterion = nn.CrossEntropyLoss(ignore_index=255)

# Separate name so that the training optimizer is not overwritten
temp_optimizer = optim.LBFGS([temperature], lr=0.01, max_iter=50)

def eval_nll():
    temp_optimizer.zero_grad()
    loss = nll_criterion(val_logits / temperature, val_targets)
    loss.backward()
    return loss

print(f"Temperature before: {temperature.item():.4f}")
temp_optimizer.step(eval_nll)
print(f"Temperature after:  {temperature.item():.4f}")


class ModelWithTemperature(nn.Module):
    def __init__(self, model, temperature):
        super().__init__()
        self.model = model
        self.temperature = temperature
    def forward(self, input):
        return self.model(input) / self.temperature

calibrated_model = ModelWithTemperature(model, temperature)

print("\nEvaluating calibrated model...")
loss, iou, miou = test(calibrated_model, test_loader, criterion, metric_test_obj)
print(f"After Scaling - Avg. loss: {loss:.4f} | Mean IoU: {miou:.4f}")


ece_metric = CalibrationError(task="multiclass", num_classes=19, num_bins=15, norm="l1")
calibrated_model.eval()
all_probs = []
all_targets = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.cuda()
        labels = labels.cuda()
        if labels.ndim == 4:
            labels = labels.squeeze(1)  # remove the channel dimension

        logits = calibrated_model(images)
        probs = torch.nn.functional.softmax(logits, dim=1)

        probs = probs.permute(0, 2, 3, 1).reshape(-1, 19)
        labels = labels.reshape(-1)

        # Keep valid pixels only: drop void (255) and the ignored IDs 19-20
        mask = labels < 19
        all_probs.append(probs[mask])
        all_targets.append(labels[mask])

test_probs = torch.cat(all_probs)
test_targets = torch.cat(all_targets)

print(f"Test ECE after Scaling: {ece_metric(test_probs, test_targets).item():.4f}")

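try:
    ece_metric.plot()
    plt.show()
except Exception as err:  # plotting is optional: report why it failed
    print(f"Could not plot the calibration figure: {err}")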

**Did the model get more confident, or is it better calibrated? Comment on the temperature scaling graphs and results.**
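To interpret the ECE printed above, recall its definition: predictions are grouped into equal-width confidence bins $B_b$, and $\mathrm{ECE} = \sum_b \frac{|B_b|}{N} \, |\mathrm{acc}(B_b) - \mathrm{conf}(B_b)|$ over the $N$ valid pixels. The NumPy sketch below is a simplified re-implementation (its bin-edge handling may differ slightly from the `CalibrationError` metric) using 15 bins and the L1 norm as in the cell above; the probabilities are made up:

```python
import numpy as np


def expected_calibration_error(probs, labels, num_bins=15):
    """Top-label ECE with equal-width bins and the L1 norm.

    probs: (N, C) class probabilities; labels: (N,) integer targets.
    """
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight = fraction of samples in the bin
    return ece


y = np.array([0, 0, 0, 1])                 # 3 of 4 predictions (class 0) are correct
overconfident = np.tile([0.95, 0.05], (4, 1))
calibrated = np.tile([0.75, 0.25], (4, 1))
print(expected_calibration_error(overconfident, y))  # about 0.2
print(expected_calibration_error(calibrated, y))     # 0.0
```

An overconfident model (confidence 0.95, accuracy 0.75) has an ECE of 0.2, while the same accuracy with matching confidence gives 0; temperature scaling aims to move the model from the first situation toward the second.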

## D. Uncertainty evaluations with MC Dropout

Let us implement **MC dropout**. This technique, described in [this paper](https://arxiv.org/abs/1506.02142), keeps dropout active at test time and averages the predictions of several stochastic forward passes, which gives a better confidence score than a single deterministic pass.
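A minimal MC dropout sketch in plain PyTorch, meant as an illustration rather than TorchUncertainty's `mc_dropout` wrapper (whose exact arguments are not reproduced here): the model is put in eval mode so that BatchNorm uses its running statistics, only the dropout layers are switched back to train mode, and the softmax outputs of `num_estimators` stochastic passes are averaged. The helper names and the toy network standing in for the UNet are hypothetical:

```python
import torch
from torch import nn


def enable_dropout(model):
    """Put only the dropout layers in train mode so that they keep sampling masks."""
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            m.train()


@torch.no_grad()
def mc_dropout_predict(model, x, num_estimators=10):
    """Average softmax over T stochastic passes; returns (mean_probs, MCP confidence)."""
    model.eval()            # BatchNorm uses its running statistics
    enable_dropout(model)   # but dropout stays stochastic
    probs = torch.stack([model(x).softmax(dim=1) for _ in range(num_estimators)])
    mean_probs = probs.mean(dim=0)          # (B, C, H, W)
    confidence, _ = mean_probs.max(dim=1)   # (B, H, W)
    return mean_probs, confidence


# Toy segmentation network standing in for the UNet (hypothetical)
torch.manual_seed(0)
toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                    nn.Dropout2d(0.1), nn.Conv2d(8, 4, 1))
x = torch.randn(2, 3, 16, 16)
mean_probs, conf = mc_dropout_predict(toy, x, num_estimators=20)
print(mean_probs.shape, conf.shape)
```

With the notebook's model, this would be called as `mc_dropout_predict(model, batch_img, num_estimators=3)` and then with 20; call `model.eval()` afterwards to return to deterministic inference.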



**Q14/ Please implement MC Dropout using torch_uncertainty**

In [None]:
from torch_uncertainty.models.wrappers.mc_dropout import mc_dropout


**Try the MC dropout code with a low number of estimators, e.g., T = 3, and a high number, e.g., T = 20. Explain the difference seen in the confidence image: is the model getting more or less confident?**

## E. Uncertainty evaluations with Deep Ensembles
**Q15/ Please implement [Deep Ensembles](https://papers.nips.cc/paper/2017/file/9ef2ed4b7fd2c810847ffa5fa85bce38-Paper.pdf).**


1.   You need to train 3 DNNs and save them. (Go back to the training cell above, then train and save 3 different models.)
2.   Use TorchUncertainty to get predictions.

You have two options: either train several models using the code above (option 1), or use TU to train the ensemble of models in parallel (option 2).
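For option 1, combining the trained members amounts to averaging their softmax outputs. The sketch below does this in plain PyTorch (not TorchUncertainty's `deep_ensembles` helper); the toy members and the commented checkpoint names `unet_0.pth` to `unet_2.pth` are hypothetical placeholders for your three trained UNets:

```python
import torch
from torch import nn


@torch.no_grad()
def ensemble_predict(members, x):
    """Deep-ensemble prediction: average the members' softmax probabilities."""
    probs = torch.stack([m.eval()(x).softmax(dim=1) for m in members])  # (M, B, C, H, W)
    mean_probs = probs.mean(dim=0)             # (B, C, H, W)
    confidence, pred = mean_probs.max(dim=1)   # MCP of the ensemble, (B, H, W)
    return mean_probs, confidence, pred


def make_member(seed):
    """Toy segmentation network; each member gets its own random initialization."""
    torch.manual_seed(seed)
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 4, 1))


members = [make_member(s) for s in range(3)]
# With the notebook's UNet (hypothetical file names for three independent runs):
# members = []
# for path in ["unet_0.pth", "unet_1.pth", "unet_2.pth"]:
#     m = UNet(19)
#     m.load_state_dict(torch.load(path))
#     members.append(m.cuda())

x = torch.randn(2, 3, 16, 16)
mean_probs, confidence, pred = ensemble_predict(members, x)
print(mean_probs.shape, confidence.shape, pred.shape)
```

Averaging probabilities rather than logits is the usual choice for deep ensembles: disagreement between members then directly lowers the ensemble's MCP.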

In [None]:
from torch_uncertainty.models import deep_ensembles

Test your ensemble obtained either using option 1 or 2.

In [None]:
results = trainer.test(ens_routine, test_loader)

Save the ensemble model

In [None]:
final_model_path = "ensemble.pth"
torch.save(ensemble.state_dict(), final_model_path)
print(f"Model saved to {final_model_path}")

## F. Uncertainty evaluations with Packed-Ensembles
**Q16/ Please read [Packed-Ensembles](https://arxiv.org/pdf/2210.09184). Then implement a Packed-Ensembles UNet, train it, and evaluate its uncertainty.**
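Packed-Ensembles fit M ensemble members into a single network: each layer is widened by a factor alpha and its channels are split into M groups handled by grouped convolutions, so the members share the forward pass but not their weights. The PyTorch sketch below is a simplified illustration under our own assumptions (gamma = 1, no BatchNorm, a 3-layer toy network instead of a UNet; `TinyPackedNet` is a hypothetical name). TorchUncertainty also provides packed layers (for instance `PackedConv2d` in `torch_uncertainty.layers`) that handle these details; check its documentation for the exact arguments:

```python
import torch
from torch import nn


class TinyPackedNet(nn.Module):
    """Simplified Packed-Ensembles sketch (gamma = 1): M members inside one network.

    Hidden widths are multiplied by alpha and split into M channel groups;
    grouped convolutions keep the groups independent, so each group is one member.
    """

    def __init__(self, in_ch=3, hidden=6, num_classes=4, num_estimators=3, alpha=2):
        super().__init__()
        width = hidden * alpha
        assert width % num_estimators == 0, "alpha * hidden must be divisible by M"
        self.num_estimators = num_estimators
        self.body = nn.Sequential(
            # First layer: dense conv, each member's channel group has its own filters
            nn.Conv2d(in_ch, width, 3, padding=1),
            nn.ReLU(),
            # Hidden layer: groups=M, so group m only reads the channels of group m
            nn.Conv2d(width, width, 3, padding=1, groups=num_estimators),
            nn.ReLU(),
            # Last layer: one num_classes-channel head per member
            nn.Conv2d(width, num_classes * num_estimators, 1, groups=num_estimators),
        )

    def forward(self, x):
        out = self.body(x)                                # (B, M * C, H, W)
        b, _, h, w = out.shape
        out = out.view(b, self.num_estimators, -1, h, w)  # (B, M, C, H, W)
        return out.transpose(0, 1)                        # (M, B, C, H, W)


net = TinyPackedNet()
x = torch.randn(2, 3, 8, 8)
logits = net(x)                            # one logit map per member
probs = logits.softmax(dim=2).mean(dim=0)  # ensemble average, (B, C, H, W)
print(logits.shape, probs.shape)
```

The ensemble prediction is then obtained exactly as for deep ensembles, by averaging the members' softmax outputs, but at the cost of a single (wider) network.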


**Please conclude your report**