# Mini Challenge: Deep Learning for Images and Signals
- Name: Nils Fahrni
- Submission Date: t.b.d.

## How does the performance of a U-Net semantic segmentation model differ between scenes of city streets and non-city streets in the BDD100K dataset?

## Package Usage

In [1]:
#%env WANDB_SILENT=True
%env WANDB_NOTEBOOK_NAME=dlbs

import os
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from PIL import Image
import wandb
import random

env: WANDB_NOTEBOOK_NAME=dlbs


In [2]:
RANDOM_SEED = 1337

random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

## Dataset
Berkeley Deep Drive Dataset: https://arxiv.org/abs/1805.04687

In [3]:
import os

BASE_DATA_PATH = os.path.join('data', 'bdd100k', 'images', '10k', 'train')
BASE_LABELS_PATH = os.path.join('data', 'bdd100k', 'labels', 'sem_seg', 'masks', 'train')

### Exploration

[Become one with the data](https://karpathy.github.io/2019/04/25/recipe/#:~:text=1.%20Become%20one%20with%20the%20data)

#### Metrics

#### Looking at some samples

#### Spatial Heatmap

The class IDs below follow the BDD100K semantic segmentation label format: https://doc.bdd100k.com/format.html#semantic-segmentation

In [4]:
class_dict = {
    0: "road",
    1: "sidewalk",
    2: "building",
    3: "wall",
    4: "fence",
    5: "pole",
    6: "traffic light",
    7: "traffic sign",
    8: "vegetation",
    9: "terrain",
    10: "sky",
    11: "person",
    12: "rider",
    13: "car",
    14: "truck",
    15: "bus",
    16: "train",
    17: "motorcycle",
    18: "bicycle"
}

In [5]:
%%script false --no-raise-error

import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import random
from tqdm import tqdm

label_folder = BASE_LABELS_PATH

target_width, target_height = 128, 228
N_SAMPLES = 1000

heatmaps = {class_id: np.zeros((target_height, target_width), dtype=np.float32) for class_id in class_dict.keys()}

for class_id, class_name in tqdm(class_dict.items(), desc="Processing Classes"):
    all_files = [f for f in os.listdir(label_folder) if f.endswith('.png')]
    sampled_files = random.sample(all_files, min(N_SAMPLES, len(all_files)))
    
    for file in tqdm(sampled_files, desc=f"Sampling {class_name}", leave=False):
        label_path = os.path.join(label_folder, file)
        with Image.open(label_path) as img:
            label = np.array(img)
            
            label_resized = np.array(Image.fromarray(label).resize((target_width, target_height), Image.NEAREST))

            mask = (label_resized == class_id)
            heatmaps[class_id] += mask.astype(np.float32)

    heatmaps[class_id] /= len(sampled_files)

fig, axs = plt.subplots(4, 5, figsize=(20, 15))
fig.suptitle("Spatial Heatmaps for all Classes", fontsize=20)

for class_id, class_name in class_dict.items():
    ax = axs[class_id // 5, class_id % 5]
    sns.heatmap(heatmaps[class_id], ax=ax, cmap="viridis", cbar=False)
    ax.set_title(class_name)
    ax.axis('off')

for i in range(len(class_dict), 4 * 5):
    fig.delaxes(axs[i // 5, i % 5])

plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()


Couldn't find program: 'false'


**Observations**
- Train, rider, motorcycle and bicycle appear to be underrepresented: their heatmaps still show the outlines of individual objects rather than a smooth distribution, which suggests that only a few instances overlap.
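
The heatmaps only hint at class rarity, so a direct count can back up the observation. The following is a minimal sketch, assuming masks are 2D arrays of class IDs with 255 as the ignore label; `class_presence_rate` and the toy masks are hypothetical and not part of the notebook's modules.

```python
import numpy as np

def class_presence_rate(masks, num_classes=19):
    """Fraction of masks in which each class ID appears at least once."""
    presence = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        ids = np.unique(mask)
        ids = ids[ids < num_classes]  # drop the ignore label (assumed to be 255)
        presence[ids] += 1
    return presence / max(len(masks), 1)

# Toy masks standing in for BDD100K label PNGs (hypothetical data)
toy_masks = [
    np.array([[0, 0], [10, 13]], dtype=np.uint8),   # road, sky, car
    np.array([[0, 255], [10, 10]], dtype=np.uint8), # road, ignore, sky
]
rates = class_presence_rate(toy_masks)
print(rates[0], rates[13], rates[16])  # road, car, train
```

A low presence rate for a class such as train would confirm that its heatmap is built from only a handful of images.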

#### Co-Occurrence

In [6]:
%%script false --no-raise-error

import os
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from PIL import Image

label_folders = [BASE_LABELS_PATH]
num_classes = len(class_dict)
class_names = list(class_dict.values())

co_occurrence_matrix = np.zeros((num_classes, num_classes), dtype=np.int32)

for label_folder in label_folders:
    all_files = [f for f in os.listdir(label_folder) if f.endswith('.png')]
    
    for file in tqdm(all_files, desc=f"Processing Masks in {label_folder}"):
        label_path = os.path.join(label_folder, file)
        with Image.open(label_path) as img:
            label = np.array(img)

            unique_classes = np.unique(label)

            for i in range(len(unique_classes)):
                for j in range(i, len(unique_classes)):
                    class_i = unique_classes[i]
                    class_j = unique_classes[j]
                    if class_i < num_classes and class_j < num_classes:
                        co_occurrence_matrix[class_i, class_j] += 1
                        if class_i != class_j:
                            co_occurrence_matrix[class_j, class_i] += 1

plt.figure(figsize=(12, 10))
ax = sns.heatmap(co_occurrence_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.title("Class Co-Occurrence Matrix for the Train Set", pad=20)
plt.xlabel("Class")
plt.ylabel("Class")

ax.xaxis.tick_top()
ax.xaxis.set_label_position('top') 

plt.xticks(rotation=45, ha="left")
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()


Couldn't find program: 'false'
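
The nested loop in the cell above counts, for every pair of classes, the number of masks in which both appear. The same count can be written as an outer product of a per-image presence vector. This is a sketch under the assumption that class IDs below 19 are valid and everything else (such as 255) is ignored; `cooccurrence_from_masks` is a hypothetical helper.

```python
import numpy as np

def cooccurrence_from_masks(masks, num_classes=19):
    """Count in how many masks each pair of classes appears together."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    for mask in masks:
        ids = np.unique(mask)
        present = np.zeros(num_classes, dtype=np.int64)
        present[ids[ids < num_classes]] = 1  # 1 if the class occurs in this mask
        # The outer product is 1 exactly where both classes are present
        matrix += np.outer(present, present)
    return matrix

# Toy masks (hypothetical data): road + sky + car, then road + sky + ignore
toy_masks = [
    np.array([[0, 0], [10, 13]], dtype=np.uint8),
    np.array([[0, 255], [10, 10]], dtype=np.uint8),
]
co_matrix = cooccurrence_from_masks(toy_masks)
print(co_matrix[0, 10], co_matrix[0, 13])
```

The diagonal then holds the number of masks that contain each class, and the matrix is symmetric by construction.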


### Training and Evaluation Skeleton

[Set up the end-to-end training/evaluation skeleton + get dumb baselines](https://karpathy.github.io/2019/04/25/recipe/#:~:text=Set%20up%20the%20end%2Dto%2Dend%20training/evaluation%20skeleton%20%2B%20get%20dumb%20baselines)

In [7]:
import torch
import torch.nn as nn
from torcheval.metrics import MulticlassAccuracy

RANDOM_SEED = 1337

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

print(f"Using {device} device")

Using cuda device


#### Data Loading & Splitting

In [8]:
from torch.utils.data import DataLoader
from torchvision import transforms
import numpy as np
import torch

from data import BDD100KDataset, custom_split_dataset_with_det, check_dataset_overlap

DET_TRAIN_PATH = './data/bdd100k/labels/det_20/det_train.json'
DET_VAL_PATH = './data/bdd100k/labels/det_20/det_val.json'

split_data = custom_split_dataset_with_det(base_data_path=BASE_DATA_PATH, 
                                           base_labels_path=BASE_LABELS_PATH, 
                                           det_train_path=DET_TRAIN_PATH, 
                                           det_val_path=DET_VAL_PATH)

check_dataset_overlap(
    split_data['train']['image_filenames'],
    split_data['val']['image_filenames'],
    split_data['test']['image_filenames']
)

image_transform = transforms.Compose([
    transforms.Resize((72, 128)),
    transforms.ToTensor(),
])

label_transform = transforms.Compose([
    transforms.Resize((72, 128), interpolation=transforms.InterpolationMode.NEAREST),
    transforms.Lambda(lambda x: torch.tensor(np.array(x), dtype=torch.long)),
])

train_dataset = BDD100KDataset(
    images_dir=split_data['train']['data_folder'],
    labels_dir=split_data['train']['labels_folder'],
    filenames=split_data['train']['image_filenames'],
    transform=image_transform,
    target_transform=label_transform,
    scene_info=split_data['train']['scene_map']
)

val_dataset = BDD100KDataset(
    images_dir=split_data['val']['data_folder'],
    labels_dir=split_data['val']['labels_folder'],
    filenames=split_data['val']['image_filenames'],
    transform=image_transform,
    target_transform=label_transform,
    scene_info=split_data['val']['scene_map']
)

test_dataset = BDD100KDataset(
    images_dir=split_data['test']['data_folder'],
    labels_dir=split_data['test']['labels_folder'],
    filenames=split_data['test']['image_filenames'],
    transform=image_transform,
    target_transform=label_transform,
    scene_info=split_data['test']['scene_map']
)


--- Split Sizes ---
- Train Images: 2518
- Val Images: 454
- Test Images: 454

--- Overlap Report ---
✔️ No overlap detected between train and validation sets.
✔️ No overlap detected between train and test sets.
✔️ No overlap detected between validation and test sets.
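
`check_dataset_overlap` lives in the local `data` module and is not shown here. As a rough illustration of what such a leakage check involves, the sketch below intersects the filename sets of every pair of splits; `find_split_overlaps` and the filenames are hypothetical.

```python
def find_split_overlaps(splits):
    """Return the filenames shared by each pair of splits.

    `splits` maps a split name to an iterable of image filenames.
    """
    names = list(splits)
    overlaps = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            overlaps[(first, second)] = set(splits[first]) & set(splits[second])
    return overlaps

# Hypothetical filenames with one deliberate leak between train and test
toy_splits = {
    "train": ["a.jpg", "b.jpg", "c.jpg"],
    "val": ["d.jpg"],
    "test": ["e.jpg", "c.jpg"],
}
overlaps = find_split_overlaps(toy_splits)
for pair, shared in overlaps.items():
    print(pair, sorted(shared))
```

An empty set for every pair corresponds to the "No overlap detected" lines in the report above.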



In [9]:
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt
from tqdm import trange

def map_class_names_and_order(class_distribution, class_dict):
    ordered_classes = sorted(class_dict.keys())  # Ensure consistent class order
    class_names = [class_dict[class_id] for class_id in ordered_classes if class_id in class_distribution]
    proportions = [class_distribution[class_id] for class_id in ordered_classes if class_id in class_distribution]
    return class_names, proportions

def plot_class_distribution(class_distribution, title, class_dict):
    class_names, proportions = map_class_names_and_order(class_distribution, class_dict)
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(class_names, proportions, color='skyblue', edgecolor='black')
    
    for bar, proportion in zip(bars, proportions):
        plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height(), 
                 f"{proportion * 100:.2f}%", ha='center', va='bottom', fontsize=9)

    plt.grid(axis='y', linestyle='--', alpha=0.7)

    plt.xlabel('Class')
    plt.ylabel('Proportion of Pixels')
    plt.title(title)
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

def analyze_class_distribution(dataset, num_classes, dataset_name):
    class_counts = Counter()
    
    for idx in trange(len(dataset), desc=f"Analyzing {dataset_name}"):
        try:
            _, mask, _ = dataset[idx]  # Access dataset item
            mask_array = np.array(mask)  # Convert mask to numpy array
            unique, counts = np.unique(mask_array, return_counts=True)
            class_counts.update(dict(zip(unique, counts)))
        except Exception as e:
            print(f"Error processing index {idx}: {e}")
            continue

    # Normalize counts
    total_pixels = sum(class_counts.values())
    class_distribution = {cls: count / total_pixels for cls, count in class_counts.items()}

    return class_counts, class_distribution

train_class_counts, train_class_distribution = analyze_class_distribution(train_dataset, num_classes=19, dataset_name="Train")
val_class_counts, val_class_distribution = analyze_class_distribution(val_dataset, num_classes=19, dataset_name="Validation")
test_class_counts, test_class_distribution = analyze_class_distribution(test_dataset, num_classes=19, dataset_name="Test")

Analyzing Train: 100%|██████████| 2518/2518 [00:29<00:00, 84.87it/s]
Analyzing Validation: 100%|██████████| 454/454 [00:05<00:00, 81.15it/s]
Analyzing Test: 100%|██████████| 454/454 [00:05<00:00, 78.16it/s]


In [10]:
%%script false --no-raise-error

plot_class_distribution(train_class_distribution, "Train Class Distribution", class_dict)
plot_class_distribution(val_class_distribution, "Validation Class Distribution", class_dict)
plot_class_distribution(test_class_distribution, "Test Class Distribution", class_dict)

Couldn't find program: 'false'


#### Training and Evaluation Skeleton

In [11]:
from trainer import Trainer
from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=16, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=16, shuffle=False)

In [12]:
import collections

ordered_class_dists = collections.OrderedDict(sorted(train_class_distribution.items()))
class_weights = torch.tensor(list(ordered_class_dists.values()), device=device).float()[:-1]
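
Note that the cell above passes the pixel proportions themselves as loss weights, so frequent classes such as road and sky receive the largest weights. If the goal is to counteract class imbalance, inverse-frequency weighting is a common alternative. The sketch below is not the notebook's method; `inverse_frequency_weights`, the `eps` value and the toy distribution are assumptions for illustration.

```python
def inverse_frequency_weights(class_distribution, num_classes=19, eps=1e-6):
    """Weights proportional to 1 / pixel frequency, rescaled to a mean of 1.

    Only class IDs 0..num_classes-1 are read, so an entry for the ignore
    label (255) is skipped explicitly instead of being sliced off.
    """
    freqs = [class_distribution.get(c, 0.0) for c in range(num_classes)]
    raw = [1.0 / (f + eps) for f in freqs]  # eps avoids division by zero
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# Toy distribution with three classes plus the ignore label (hypothetical)
toy_distribution = {0: 0.6, 1: 0.3, 2: 0.1, 255: 0.0}
weights = inverse_frequency_weights(toy_distribution, num_classes=3)
print([round(w, 3) for w in weights])
```

The resulting list can be converted with `torch.tensor(weights, dtype=torch.float32, device=device)` and passed as `weight` to `nn.CrossEntropyLoss`. Classes that never occur get a very large weight under this scheme, so clipping or a smoothed variant may be preferable in practice.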

### Baseline: (Tiny-)U-Net

In [13]:
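# Slicing a Dataset with [:8] is not generally supported; Subset selects the first 8 samples instead
overfit_dataloader = DataLoader(torch.utils.data.Subset(train_dataset, range(8)), batch_size=8, shuffle=True)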

In [14]:
import torch.optim as optim
import torch.nn as nn
from core import UNetBaseline

model = UNetBaseline(num_classes=19).to(device)
criterion = nn.CrossEntropyLoss(ignore_index=255, weight=class_weights)
optimizer = optim.Adam(model.parameters(), lr=3e-4)

Trainer(model,
        criterion, 
        optimizer,
        epochs=50,
        seed=RANDOM_SEED, 
        device=device, 
        verbose=True, 
        run_name="unet_baseline").run(train_dataloader, 
                                      val_dataloader)

Model trainer was already initialized. Skipping wandb initialization.
Model unet_baseline already exists! Skipping training.
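
The `Trainer` logs an IoU value, but its implementation lives in the local `trainer` module and is not shown here. For reference, a minimal sketch of mean IoU over classes follows. It assumes that pixels labelled 255 are ignored and that classes absent from both prediction and target are skipped; the actual `Trainer` may average differently.

```python
import numpy as np

def mean_iou(pred, target, num_classes=19, ignore_index=255):
    """Mean intersection-over-union across classes present in pred or target."""
    valid = target != ignore_index  # mask out ignored pixels
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class appears in neither array, so it has no IoU
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy 2x2 example (hypothetical data): one pixel is ignored
toy_target = np.array([[0, 0], [1, 255]])
toy_pred = np.array([[0, 1], [1, 1]])
score = mean_iou(toy_pred, toy_target)
print(score)
```

In the toy example both classes have one overlapping pixel out of a union of two, so each contributes an IoU of 0.5.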


### Overfit

[Overfit](https://karpathy.github.io/2019/04/25/recipe/#:~:text=3.-,Overfit,-At%20this%20stage)

In [15]:
from simple_slurm import Slurm

slurm = Slurm(
    gpus=1,
    partition='performance',
    time='1-0:0:0',
    job_name='job_name',
    out=f'slurm_{Slurm.JOB_ID}.log',
    error=f'slurm_{Slurm.JOB_ID}.err',
    cpus_per_task='16'
)

slurm.run('python main.py')

In [17]:
import torch.optim as optim
import torch.nn as nn
from core import UNet

baseline_encoder_dims = [64, 128, 256, 512]
baseline_decoder_dims = [512, 256, 128, 64]

encoder_dims = baseline_encoder_dims[:]
decoder_dims = baseline_decoder_dims[:]

for iteration in range(2):
    print(f"Iteration {iteration + 1}: Training with encoder_dims={encoder_dims} and decoder_dims={decoder_dims}")
    
    model = UNet(num_classes=19, encoder_dims=encoder_dims, decoder_dims=decoder_dims).to(device)
    criterion = nn.CrossEntropyLoss(ignore_index=255, weight=class_weights)
    optimizer = optim.Adam(model.parameters(), lr=3e-4)
    
    Trainer(model,
            criterion, 
            optimizer,
            epochs=50,
            seed=RANDOM_SEED, 
            device=device, 
            verbose=True, 
            run_name=f"unet_overfit_iteration_{iteration + 1}").run(train_dataloader, 
                                                                    val_dataloader)
    
    next_dim = encoder_dims[-1] * 2
    encoder_dims.append(next_dim)
    decoder_dims.insert(0, next_dim)
    print(decoder_dims)

Iteration 1: Training with encoder_dims=[64, 128, 256, 512] and decoder_dims=[512, 256, 128, 64]


Run history:
epoch       ▁█
train_iou   █▁
train_loss  ▁█
val_iou     █▁
val_loss    ▁█

Run summary:
epoch       2.0
train_iou   0.01552
train_loss  4.01882
val_iou     0.01541
val_loss    4.03323


Epoch 1/50 - Train Loss: 0.8648, Train IoU: 0.1291 - Val Loss: 0.5913, Val IoU: 0.1511
Model saved to models\unet_overfit_iteration_1_x8mef9g2.pth with val_loss 0.5913
Epoch 2/50 - Train Loss: 0.5270, Train IoU: 0.1554 - Val Loss: 0.5592, Val IoU: 0.1544
Model saved to models\unet_overfit_iteration_1_x8mef9g2.pth with val_loss 0.5592
Epoch 3/50 - Train Loss: 0.4710, Train IoU: 0.1635 - Val Loss: 0.5121, Val IoU: 0.1662
Model saved to models\unet_overfit_iteration_1_x8mef9g2.pth with val_loss 0.5121
Epoch 4/50 - Train Loss: 0.4407, Train IoU: 0.1685 - Val Loss: 0.4919, Val IoU: 0.1680
Model saved to models\unet_overfit_iteration_1_x8mef9g2.pth with val_loss 0.4919
Epoch 5/50 - Train Loss: 0.4138, Train IoU: 0.1730 - Val Loss: 0.4583, Val IoU: 0.1743
Model saved to models\unet_overfit_iteration_1_x8mef9g2.pth with val_loss 0.4583
Epoch 6/50 - Train Loss: 0.3936, Train IoU: 0.1766 - Val Loss: 0.4957, Val IoU: 0.1713
Epoch 7/50 - Train Loss: 0.3709, Train IoU: 0.1802 - Val Loss: 0.4282, Va

KeyboardInterrupt: 
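
The loop above grows the network by one level per iteration: the last encoder width is doubled, appended to the encoder and prepended to the decoder. The sketch below lists the configurations this schedule produces without training anything; `widen_unet_dims` is a hypothetical helper that mirrors the loop.

```python
def widen_unet_dims(encoder_dims, decoder_dims, iterations):
    """Return the (encoder_dims, decoder_dims) pair used in each iteration."""
    enc, dec = list(encoder_dims), list(decoder_dims)
    configs = []
    for _ in range(iterations):
        configs.append((enc[:], dec[:]))  # copy so later changes do not leak in
        next_dim = enc[-1] * 2            # double the deepest encoder width
        enc.append(next_dim)
        dec.insert(0, next_dim)
    return configs

configs = widen_unet_dims([64, 128, 256, 512], [512, 256, 128, 64], iterations=2)
for enc, dec in configs:
    print(enc, dec)
```

With two iterations, the second configuration adds a 1024-channel level at the bottom of the U-Net.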

### Regularization

[Regularize](https://karpathy.github.io/2019/04/25/recipe/#:~:text=4.-,Regularize,-Ideally%2C%20we%20are)

### Tuning the model

[Tune](https://karpathy.github.io/2019/04/25/recipe/#:~:text=5.-,Tune,-You%20should%20now)

### Ensembles & Leave it training

[Squeeze out the juice](https://karpathy.github.io/2019/04/25/recipe/#:~:text=Squeeze%20out%20the%20juice)