# Task 2.2: Training the model

In this notebook, I convert the BDD100k labels to COCO format, train a Faster R-CNN ResNet-50 FPN network for 5 epochs, and save the model weights locally.

Note that you will not be able to run this notebook as-is: you first need to install the required packages and adjust the hardcoded paths to match your setup.

Please read along, as I have tried my best to explain via markdown and comments.

### The first step is to convert the annotations into COCO format. I did this by hand.

In [None]:
# Since we already know the classes from analysis time, let's just
# hardcode a dict to avoid counting everything again

# Here I made a mistake that I paid for dearly during evaluation:
# the indices should start at 1 (torchvision detection models treat label 0
# as background), but I am leaving it as-is to show what I actually did
category_ids = {
    "traffic sign": 0,
    "traffic light": 1,
    "car": 2,
    "rider": 3,
    "motor": 4, # Motorcycle, haha
    "person": 5,
    "bus": 6,
    "truck": 7,
    "bike": 8,
    "train": 9
}

# Also, the image size is fixed for the whole dataset
IMAGE_WIDTH = 1280
IMAGE_HEIGHT = 720
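
For reference, a 1-based mapping of the kind the comment above calls for could be built as follows. This is only a sketch: `category_names` and `category_ids_one_based` are hypothetical names that are not used anywhere else in this notebook, and the run recorded here used the 0-based dict above.

```python
# Hypothetical 1-based variant of the mapping (NOT what was used in this run).
# Label 0 is left free for the background class, so real classes start at 1.
category_names = [
    "traffic sign", "traffic light", "car", "rider", "motor",
    "person", "bus", "truck", "bike", "train",
]
category_ids_one_based = {
    name: index for index, name in enumerate(category_names, start=1)
}
print(category_ids_one_based)
```

With this variant, the 11 outputs of the box predictor (10 classes + background) would line up with ids 0 to 10 without any class colliding with background.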


In [2]:
from tqdm import tqdm
from itertools import islice # Needed to create subset
import json

def convert_bdd_to_coco(bdd_json, config):
    split, size = config # Not using size in this run
    print(f"Starting conversion for BDD {split} labels.")
    coco_json = {
        "type": "instances"
    }

    # Setting the categories attribute is straightforward,
    # since we have already assigned ids to each of the
    # 10 classes, and they have no supercategory
    coco_json["categories"] = [
        {
            "id": category_id,
            "name": category_name,
            "supercategory": "none"
        }
        # Named category_id to avoid shadowing the builtin id()
        for category_name, category_id in category_ids.items()
    ]

    # Now, we shall iterate over each of the objects of the
    # bdd_json, and populate the images and annotations keys
    # of our coco_json object
    coco_json["images"] = []
    coco_json["annotations"] = []
    for img_index, obj in enumerate(tqdm(bdd_json, total=len(bdd_json))):
        image_obj = {
            "file_name": obj["name"],
            "height": IMAGE_HEIGHT,
            "width": IMAGE_WIDTH,
            "id": img_index
        }
        image_has_valid_labels = False
        for label in obj["labels"]:
            if label["category"] in category_ids.keys():
                image_has_valid_labels = True
                x1 = label["box2d"]["x1"]
                y1 = label["box2d"]["y1"]
                x2 = label["box2d"]["x2"]
                y2 = label["box2d"]["y2"]
                # Build annotation object from extracted information
                annotation = {
                    "id": label["id"],
                    "image_id": img_index,
                    "category_id": category_ids[label["category"]],
                    "bbox": [x1, y1, x2 - x1, y2 - y1],
                    "area": float((x2 - x1) * (y2 - y1)),
                    "iscrowd": 0,
                    "ignore": 0,
                    "segmentation": [x1, y1, x1, y2, x2, y2, x2, y1]
                }
                coco_json["annotations"].append(annotation)
        if image_has_valid_labels:
            coco_json["images"].append(image_obj)

    # Finally, write the coco_json to a label file
    with open(f"/home/ghosh/content/data/labels_coco/{split}.json", "w") as f:
        json.dump(coco_json, f)
    print(f"Finished conversion for BDD {split} labels.")
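
The box handling inside the loop can be isolated into a small worked example. BDD100k stores corner coordinates (`x1`, `y1`, `x2`, `y2`), while COCO expects `[x, y, width, height]`. The helper name `bdd_box_to_coco` and the sample numbers are made up for illustration.

```python
def bdd_box_to_coco(box2d):
    """Convert a BDD-style corner box to a COCO [x, y, width, height] box and its area."""
    x1, y1 = box2d["x1"], box2d["y1"]
    x2, y2 = box2d["x2"], box2d["y2"]
    width, height = x2 - x1, y2 - y1
    return [x1, y1, width, height], float(width * height)


# Hypothetical sample box, not taken from the dataset
sample_box = {"x1": 100.0, "y1": 50.0, "x2": 300.0, "y2": 200.0}
bbox, area = bdd_box_to_coco(sample_box)
print(bbox, area)  # [100.0, 50.0, 200.0, 150.0] 30000.0
```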

In [3]:
# Element at index 1 of tuple is only required when using a subset
# of the data. In this case, the subset is 10%. But I won't use it.
configs = [("train", 7000), ("val", 1000)]
for config in configs:
    # Read the bdd label json and generate coco labels
    bdd_json_path = f"/home/ghosh/content/data/labels_json/bdd100k_labels_images_{config[0]}.json"
    with open(bdd_json_path, "r") as f:
        bdd_json = json.load(f)
    convert_bdd_to_coco(bdd_json, config)

Starting conversion for BDD train labels.


100%|██████████| 69863/69863 [00:03<00:00, 21088.21it/s]


Finished conversion for BDD train labels.
Starting conversion for BDD val labels.


100%|██████████| 10000/10000 [00:00<00:00, 54703.37it/s]


Finished conversion for BDD val labels.
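
Before training, it can be worth checking that a generated label file is internally consistent. The sketch below was not part of the original run; `check_coco_dict` is a hypothetical helper, and it is shown on a tiny hand-written dict rather than on the real label files.

```python
def check_coco_dict(coco):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    image_ids = {image["id"] for image in coco["images"]}
    known_category_ids = {category["id"] for category in coco["categories"]}
    seen_annotation_ids = set()
    for annotation in coco["annotations"]:
        ann_id = annotation["id"]
        if annotation["image_id"] not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if annotation["category_id"] not in known_category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        if ann_id in seen_annotation_ids:
            problems.append(f"annotation {ann_id}: duplicate id")
        seen_annotation_ids.add(ann_id)
        _, _, width, height = annotation["bbox"]
        if width <= 0 or height <= 0:
            problems.append(f"annotation {ann_id}: empty box")
    return problems


# Tiny hand-written example in the same layout as the converted labels
tiny_coco = {
    "categories": [{"id": 2, "name": "car", "supercategory": "none"}],
    "images": [{"id": 0, "file_name": "example.jpg", "height": 720, "width": 1280}],
    "annotations": [
        {"id": 1, "image_id": 0, "category_id": 2, "bbox": [10.0, 20.0, 30.0, 40.0]},
    ],
}
print(check_coco_dict(tiny_coco))  # []
```

To run it on the real files, the dict returned by `json.load` on a converted label file can be passed in directly.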


### Now that the labels have been converted, let's move on to setting things up for training. First we import the necessary modules and initialize some path variables and constants for convenience.

In [4]:
import torch
import torchvision

from PIL import Image
from torch.utils.data import DataLoader
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from torchvision.datasets import CocoDetection
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor, FasterRCNN_ResNet50_FPN_Weights
from torchvision.transforms import functional as F
from tqdm import tqdm

# Let's initialize our paths
# For training set
train_images_path = "/home/ghosh/content/data/images/train"
train_labels_coco_path = "/home/ghosh/content/data/labels_coco/train.json"
# For validation set
val_images_path = "/home/ghosh/content/data/images/val"
val_labels_coco_path = "/home/ghosh/content/data/labels_coco/val.json"

# And set some hyperparameters
BATCH_SIZE = 4 # Small batch size due to time and memory constraints
LEARNING_RATE = 0.001
WEIGHT_DECAY = 0.0005
MOMENTUM = 0.9
NUM_EPOCHS = 5 # Few epochs due to time and memory constraints

In [5]:
import wandb
import importlib

with open("/home/ghosh/content/wandb_api_key.txt", "r") as f:
    wandb_api_key = f.read().strip()

wandb.login(key=wandb_api_key)
wandb.init(
    project="bosch-assignment",
    name="faster-rcnn-train",
    config={
        "learning_rate": LEARNING_RATE,
        "weight_decay": WEIGHT_DECAY,
        "momentum": MOMENTUM,
        "batch_size": BATCH_SIZE,
        "num_epochs": NUM_EPOCHS
    },
    reinit=True
)

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/ghosh/.netrc
[34m[1mwandb[0m: Currently logged in as: [33manik-ghosh[0m ([33manik-ghosh-rwth-aachen-university[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Now that wandb has been set up, let's define a transform and a function to create data loaders for our training and validation splits.

In [6]:
# Create transform for dataloader
class COCOTransform:
    def __call__(self, image, target):
        # Convert the PIL image to a tensor
        image_tensor = F.to_tensor(image)

        # Extract labels and corresponding bounding boxes, converting each
        # COCO [x, y, w, h] box to the [x1, y1, x2, y2] format the model expects
        labels = []
        bounding_boxes = []
        for obj in target:
            x, y, w, h = obj["bbox"]
            labels.append(obj["category_id"])
            bounding_boxes.append([x, y, x + w, y + h])

        # Return image as a tensor, and dict of labels and bounding boxes
        return image_tensor, {
            "boxes": torch.tensor(bounding_boxes, dtype=torch.float32),
            "labels": torch.tensor(labels, dtype=torch.int64)
        }

In [7]:
# Function with lots of boilerplate for creating
# dataloaders for training and validation set
def get_train_and_val_dataloaders(
        train_images_path,
        train_labels_coco_path,
        val_images_path,
        val_labels_coco_path,
        batch_size=4
    ):
    train_dataset = CocoDetection(
        root=train_images_path,
        annFile=train_labels_coco_path,
        transforms=COCOTransform()
    )

    val_dataset = CocoDetection(
        root=val_images_path,
        annFile=val_labels_coco_path,
        transforms=COCOTransform()
    )

    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=2,
        collate_fn=lambda batch: tuple(zip(*batch))
    )

    val_loader = DataLoader(
        val_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=2,
        collate_fn=lambda batch: tuple(zip(*batch))
    )

    return train_loader, val_loader
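
The `collate_fn` used above turns a list of `(image, target)` pairs into a tuple of images and a tuple of targets. This is needed because each image has a different number of boxes, so the targets cannot be stacked into a single tensor. Here is a minimal sketch with placeholder strings standing in for image tensors (the name `collate_detection_batch` is hypothetical):

```python
def collate_detection_batch(batch):
    """Regroup [(image, target), ...] into (images, targets) without stacking."""
    return tuple(zip(*batch))


# Placeholder data: strings instead of tensors, plain lists instead of label tensors
fake_batch = [
    ("image_0", {"labels": [2]}),
    ("image_1", {"labels": [2, 5, 0]}),
]
images, targets = collate_detection_batch(fake_batch)
print(images)   # ('image_0', 'image_1')
print(targets)  # ({'labels': [2]}, {'labels': [2, 5, 0]})
```

A module-level function like this could be passed as `collate_fn` in place of the lambda. Unlike a lambda, it can be pickled, which matters when the worker processes are started with the spawn method.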

Then we create a function to set up our model, which is Faster R-CNN with a ResNet-50 backbone and a Feature Pyramid Network (FPN).

In [8]:
def setup_model(device):
    # For this assignment, I picked Faster R-CNN with ResNet50 + FPN
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    )
    # Replace the pretrained box predictor head with one sized for our classes
    model.roi_heads.box_predictor = FastRCNNPredictor(
        model.roi_heads.box_predictor.cls_score.in_features,
        11 # Since we have 10 classes in BDD100k + 1 for background
    )
    # Move the model to the device chosen by the caller (GPU or CPU)
    model.to(device)

    return model


def setup_optimizer_and_scheduler(model):
    # References of the model weights that need to be updated during training
    model_weights = [
        param for param in model.parameters()
        if param.requires_grad
    ]

    # Stochastic Gradient Descent with Momentum and Weight Decay
    optimizer = torch.optim.SGD(
        params=model_weights,
        lr=LEARNING_RATE,
        weight_decay=WEIGHT_DECAY,
        momentum=MOMENTUM
    )

    # To dynamically adjust learning rate for more stable training
    lr_scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer,
        step_size=3,
        gamma=0.1
    )

    return optimizer, lr_scheduler
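
With `step_size=3` and `gamma=0.1`, `StepLR` multiplies the learning rate by 0.1 after every third call to `lr_scheduler.step()`. The resulting schedule can be worked out in plain Python, assuming the scheduler is stepped once at the end of each epoch as in the training loop of this notebook. `step_lr` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
LEARNING_RATE = 0.001
NUM_EPOCHS = 5


def step_lr(base_lr, completed_epochs, step_size=3, gamma=0.1):
    """Learning rate in effect after `completed_epochs` scheduler steps."""
    return base_lr * gamma ** (completed_epochs // step_size)


for epoch in range(1, NUM_EPOCHS + 1):
    # Epoch N trains with the rate left by the N - 1 scheduler steps before it
    lr = step_lr(LEARNING_RATE, epoch - 1)
    print(f"Epoch {epoch}: lr = {lr:.6f}")
```

Under that assumption, epochs 1 to 3 train at 0.001 and epochs 4 and 5 at 0.0001.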

In each epoch, the images and labels are moved to the same device as the model. In training mode, the model takes both the images and the ground-truth targets and returns a dictionary of loss terms, which are summed into a single loss.

From this loss, the gradients are computed by backpropagation, and the weights of the model are updated using Stochastic Gradient Descent.


In [None]:
def train_one_epoch(model, optimizer, train_loader, device, epoch):
    model.train()
    epoch_loss = 0

    for images, targets in tqdm(train_loader, desc=f"Epoch {epoch}"):
        images = [image.to(device) for image in images]
        targets = [
            {
                key: value.to(device)
                for key, value in target.items()
            }
            for target in targets
        ]

        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        # Set any accumulated gradients to zero as a just-in-case
        optimizer.zero_grad()
        # Backpropagate
        losses.backward()
        # Update weights with backpropagated gradients
        optimizer.step()

        epoch_loss += losses.item()

    # Print and log to wandb
    print(f"Epoch {epoch} loss: {epoch_loss / len(train_loader)}")
    wandb.log({"train_loss": epoch_loss / len(train_loader)}, step=epoch)

    # Save weights at the end of each epoch to load later for evaluation
    local_weights_path = f"/home/ghosh/content/model_weights/faster_rcnn_weights_ep{epoch}.pth"
    torch.save(model.state_dict(), local_weights_path)

Evaluation function.  

Although this is part of the third task, I just wanted to show that I know how to use a validation set. 

I won't be tuning hyperparameters, but I feel that it is necessary to demonstrate the model's performance at the end of each epoch on an unseen dataset. Using torch.no_grad() disables gradient tracking, which saves memory and time during inference; the weights stay untouched because no optimizer step is taken on the validation set.



In [None]:
def evaluate_on_validation_set(model, val_loader, device, epoch):
    model.eval()
    metric = MeanAveragePrecision()

    with torch.no_grad():
        for images, targets in val_loader:
            images = [image.to(device) for image in images]
            targets = [
                {
                    key: value.to(device)
                    for key, value in target.items()
                }
                for target in targets
            ]
            outputs = model(images)
            metric.update(outputs, targets)

    result = metric.compute()
    output_string = (
        f"Validation results for epoch {epoch}: mAP = {result['map'].item()} | "
        f"mAP@0.5 = {result['map_50'].item()} | "
        f"mAP@0.75 = {result['map_75'].item()} | "
    )

    # Print and log to wandb
    print(output_string)
    wandb.log({
        "val_mAP": result["map"].item(),
        "val_mAP@0.5": result["map_50"].item(),
        "val_mAP@0.75": result["map_75"].item(),
    })
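
The `map_50` and `map_75` entries are the average precision at IoU thresholds of 0.5 and 0.75, while `map` averages over thresholds from 0.5 to 0.95. As a reminder of what those thresholds measure, here is a minimal IoU sketch for two boxes in `[x1, y1, x2, y2]` format. `box_iou_xyxy` and the sample boxes are hypothetical and written only for illustration; this is not what torchmetrics calls internally.

```python
def box_iou_xyxy(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Made-up boxes: the prediction is shifted 20 px to the right of the ground truth
ground_truth = [100.0, 100.0, 200.0, 200.0]
prediction = [120.0, 100.0, 220.0, 200.0]
iou = box_iou_xyxy(ground_truth, prediction)
print(f"IoU = {iou:.3f}")  # IoU = 0.667
```

A prediction like this one would count as a match at the 0.5 threshold but not at 0.75, which is one reason mAP@0.75 is lower than mAP@0.5.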

And now for the final block: the training loop! 

Several variables and constants used here were defined in the imports cell above.


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader = get_train_and_val_dataloaders(
    train_images_path,
    train_labels_coco_path,
    val_images_path,
    val_labels_coco_path,
    BATCH_SIZE
)
model = setup_model(device)
optimizer, lr_scheduler = setup_optimizer_and_scheduler(model)

for epoch in range(NUM_EPOCHS):
    train_one_epoch(model, optimizer, train_loader, device, epoch+1)
    evaluate_on_validation_set(model, val_loader, device, epoch+1)
    lr_scheduler.step()

loading annotations into memory...
Done (t=5.62s)
creating index...
index created!
loading annotations into memory...
Done (t=1.28s)
creating index...
index created!


Epoch 1: 100%|██████████| 17466/17466 [2:06:02<00:00,  2.31it/s]  


Epoch 1 loss: 0.8207084674148861
Validation results for epoch 1: mAP = 0.21971940994262695 | mAP@0.5 = 0.42661821842193604 | mAP@0.75 = 0.19748111069202423 | 


Epoch 2: 100%|██████████| 17466/17466 [2:03:49<00:00,  2.35it/s] 


Epoch 2 loss: 0.7637807072995455
Validation results for epoch 2: mAP = 0.22688241302967072 | mAP@0.5 = 0.43382614850997925 | mAP@0.75 = 0.2058487832546234 | 


Epoch 3: 100%|██████████| 17466/17466 [2:03:50<00:00,  2.35it/s] 


Epoch 3 loss: 0.747370448849475
Validation results for epoch 3: mAP = 0.2351963371038437 | mAP@0.5 = 0.44903817772865295 | mAP@0.75 = 0.210839182138443 | 


Epoch 4: 100%|██████████| 17466/17466 [2:03:52<00:00,  2.35it/s] 


Epoch 4 loss: 0.714674378115738
Validation results for epoch 4: mAP = 0.24171017110347748 | mAP@0.5 = 0.45656388998031616 | mAP@0.75 = 0.22230303287506104 | 


Epoch 5: 100%|██████████| 17466/17466 [2:03:45<00:00,  2.35it/s] 


Epoch 5 loss: 0.7095464302251856
Validation results for epoch 5: mAP = 0.24161143600940704 | mAP@0.5 = 0.4578508734703064 | mAP@0.75 = 0.22035889327526093 | 
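
The 17466 iterations per epoch shown in the progress bars follow from the dataset size and the batch size. A quick check, assuming all 69863 training images from the conversion step ended up in the COCO file (that is, each had at least one valid label):

```python
import math

num_train_images = 69863  # from the conversion progress bar earlier in this notebook
batch_size = 4            # BATCH_SIZE in this notebook

# The DataLoader keeps the last, smaller batch by default, hence the ceiling
steps_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch_size)  # 17466 3
```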
