# Introduction to autonomous vehicles - Task 1 Perception

## Task Description

Select a dataset of road traffic containing annotated objects.
Select a method and use this dataset to train a model to detect and classify **pedestrians, cyclists and vehicles** in a video clip.
The video clip can come from the dataset or from other sources.

**Available Datasets:**

- [KITTI Dataset](https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark)
- [COCO Dataset](https://cocodataset.org/)
- [Waymo Dataset](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files)
  - **Note:** You might have to first register [here](https://waymo.com/open/) to get access to the dataset.

**Rules:**

- Use whatever framework you prefer (PyTorch, TensorFlow, Ultralytics, etc.)
- Using any version of YOLO is recommended
- Use the code from the GitHub repository of one of the previously mentioned frameworks or of other published methods (e.g. https://github.com/ultralytics/yolov3)
- Use pretrained weights
  - **Note:** You can use the pretrained weights, but you have to train and adapt them to your dataset.

## Setup

**Requirements:**

You should set up the environment accordingly. For now, you can use the code below to install the required packages.

In [None]:
!pip install torch torchvision torchaudio
!pip install ipykernel
!pip install jupyterlab
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install scikit-learn
!pip install seaborn
!pip install ultralytics
!pip install wandb

**Wandb settings:**

Add a file named `wandb-api-key.txt` containing your wandb API key to the parent directory of this notebook (the cell below reads `../wandb-api-key.txt`), so that we can log in to wandb.

```bash
echo "your_wandb_api_key" > ../wandb-api-key.txt
```

In [1]:
import wandb

# Read wandb API key from file
with open('../wandb-api-key.txt', 'r') as file:
    wandb.login(key=file.read().strip())

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Appending key for api.wandb.ai to your netrc file: /home/lukas-kurz/.netrc
wandb: Currently logged in as: kurzlukas (kurzlukas-johannes-kepler-universit-t-linz) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


**Ultralytics settings:**

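Ultralytics stores global settings (such as the dataset, weights and runs directories) that might be left over from another run or from the previous task. The following code points these directories at this project and enables wandb logging; `settings.reset()` restores all defaults if needed.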

In [3]:
from pathlib import Path

from ultralytics import YOLO, settings

# Resolve all directories relative to the notebook's working directory
working_dir = Path.cwd()
settings.update({
    "datasets_dir": str((working_dir / ".." / "datasets").resolve()),
    "weights_dir": str((working_dir / "weights").resolve()),
    "runs_dir": str((working_dir / "runs").resolve()),
    "wandb": True,  # enable the Weights & Biases integration
})

settings

{'settings_version': '0.0.6',
 'datasets_dir': '/home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/datasets',
 'weights_dir': '/home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/task1/weights',
 'runs_dir': '/home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/task1/runs',
 'uuid': 'f8376e8d76181ab673a86b3f562cfd88bbe4c72eeaa8b93b13c3c8c81b770686',
 'sync': True,
 'api_key': '',
 'openai_api_key': '',
 'clearml': True,
 'comet': True,
 'dvc': True,
 'hub': True,
 'mlflow': True,
 'neptune': True,
 'raytune': True,
 'tensorboard': True,
 'wandb': True,
 'vscode_msg': True}

## Dataset

For our first project, we chose the provided **COCO dataset**.

It is a large open dataset for object detection, segmentation and captioning that is also hosted as a public challenge. As such, **the test set annotations are not publicly available**.

The COCO dataset has 80 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment. For our task we do not need all of these; Ultralytics lets us restrict training to a subset of the classes via the `classes` training argument.

Concretely, we group the relevant COCO categories as follows:
### Pedestrians
* Person
* Dog
* Cow
* Horse
* Cat

### Vehicles
* Motorcycle
* Bus
* Car
* Truck
* Train

### Cyclists
* Bicycle
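The grouping above can be expressed as a lookup from COCO class IDs to our three target groups. This is a minimal sketch: the IDs are the standard 80-class COCO indices used by Ultralytics (e.g. `0: person`, `1: bicycle`, `2: car`), and `GROUPS`, `COCO_TO_GROUP`, `TARGET_CLASS_IDS` and `to_group` are hypothetical helper names, not part of Ultralytics.

```python
from typing import Dict, Optional

# Target groups with their COCO class IDs (80-class indexing used by Ultralytics)
GROUPS: Dict[str, Dict[int, str]] = {
    "pedestrian": {0: "person", 15: "cat", 16: "dog", 17: "horse", 19: "cow"},
    "vehicle": {2: "car", 3: "motorcycle", 5: "bus", 6: "train", 7: "truck"},
    "cyclist": {1: "bicycle"},
}

# Reverse lookup: COCO class ID -> target group
COCO_TO_GROUP: Dict[int, str] = {
    class_id: group for group, members in GROUPS.items() for class_id in members
}

# All class IDs we keep, e.g. for Ultralytics' `classes` argument
TARGET_CLASS_IDS = sorted(COCO_TO_GROUP)


def to_group(class_id: int) -> Optional[str]:
    """Return the target group of a COCO class ID, or None if the class is ignored."""
    return COCO_TO_GROUP.get(class_id)


print(TARGET_CLASS_IDS)  # [0, 1, 2, 3, 5, 6, 7, 15, 16, 17, 19]
print(to_group(3), to_group(9))  # vehicle None
```

Mapping each predicted class ID through `to_group` lets, for example, `bus` and `truck` detections be reported together as `vehicle`.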



TODO: visualize some of the stats of the dataset, as well as some example images with annotations
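As a starting point for the statistics TODO, a minimal sketch that counts annotated objects per class directly from YOLO-format label files (one `.txt` file per image, one `class_id x_center y_center width height` row per object). `count_instances` is a hypothetical helper, and the example path is assumed from the `coco128` layout shown in the training log.

```python
from collections import Counter
from pathlib import Path


def count_instances(labels_dir: Path) -> Counter:
    """Count annotated objects per class ID in a directory of YOLO label files.

    Assumes one .txt file per image with one row per object:
    `class_id x_center y_center width height` (box values normalized to 0..1).
    Background images (no objects) may lack a label file and are not counted.
    """
    counts: Counter = Counter()
    for label_file in sorted(Path(labels_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            if line.strip():  # skip blank lines
                counts[int(line.split()[0])] += 1
    return counts
```

For example, `count_instances(Path("../datasets/coco128/labels/train2017")).most_common(10)` would list the ten most frequent class IDs, which can then be plotted as a bar chart.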

## Training

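For a first attempt, we use Ultralytics' default training pipeline. Note that this pipeline already applies augmentations by default (e.g. mosaic, HSV color jitter, random scaling and horizontal flips), so a run truly without augmentations requires setting these hyperparameters to zero; we can then compare both variants.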

TODO: actually run and train the model with parameter settings as described in https://docs.ultralytics.com/modes/train/
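One way to get a baseline without augmentations is to zero out the detection augmentation hyperparameters and, at the same time, restrict the labels to our target classes with the `classes` training argument. This is a sketch under assumptions: the epoch count and run name are hypothetical, `train_baseline` is a hypothetical helper, `classes` only filters the labels (the pretrained model keeps its 80-class head), and optional Albumentations transforms, if that package is installed, are not controlled by these arguments.

```python
# COCO class IDs of the categories grouped in the Dataset section
TARGET_CLASS_IDS = [0, 1, 2, 3, 5, 6, 7, 15, 16, 17, 19]

# Augmentation hyperparameters set to 0.0 to disable them for a baseline run
NO_AUGMENTATION = {
    "hsv_h": 0.0, "hsv_s": 0.0, "hsv_v": 0.0,  # color jitter
    "degrees": 0.0, "translate": 0.0, "scale": 0.0,
    "shear": 0.0, "perspective": 0.0,          # geometric transforms
    "flipud": 0.0, "fliplr": 0.0,              # flips
    "mosaic": 0.0, "mixup": 0.0, "copy_paste": 0.0,
}


def train_baseline(weights: str = "yolo11n.pt", epochs: int = 20):
    """Fine-tune on coco128 with augmentations off and only the target classes."""
    # Imported lazily so the configuration above can be inspected without Ultralytics
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(
        data="coco128.yaml",
        epochs=epochs,             # hypothetical value, tune as needed
        classes=TARGET_CLASS_IDS,  # keep only labels of these classes
        name="baseline_no_aug",    # hypothetical run name
        **NO_AUGMENTATION,
    )
```

Calling `train_baseline()` and then a second run with the default augmentations gives the comparison described above.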

In [5]:
# Load a model
model = YOLO("yolo11n.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco128.yaml", epochs=2)

Ultralytics 8.3.86 🚀 Python-3.11.11 torch-2.6.0+cu124 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11143MiB)
engine/trainer: task=detect, mode=train, model=yolo11n.pt, data=coco128.yaml, epochs=2, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train7, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf

100%|██████████| 6.66M/6.66M [00:00<00:00, 73.9MB/s]
Unzipping /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/datasets/coco128.zip to /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/datasets/coco128...: 100%|██████████| 263/263 [00:00<00:00, 5734.12file/s]

Dataset download success ✅ (1.1s), saved to /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/datasets


                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      6640  ultralytics.nn.modules.block.C3k2            [32, 64, 1, False, 0.25]      
  3                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
  4                  -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]     
  5                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
  6                  -1  1     87040  ultralytics.nn.modules.




 16                  -1  1     32096  ultralytics.nn.modules.block.C3k2            [256, 64, 1, False]           
 17                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 19                  -1  1     86720  ultralytics.nn.modules.block.C3k2            [192, 128, 1, False]          
 20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 22                  -1  1    378880  ultralytics.nn.modules.block.C3k2            [384, 256, 1, True]           
 23        [16, 19, 22]  1    464912  ultralytics.nn.modules.head.Detect           [80, [64, 128, 256]]          
YOLO11n summary: 181 layers, 2,624,080 parameters, 2,624,064 gradients, 6.6 GFLOPs

Tran

Freezing layer 'model.23.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed ✅

train: Scanning /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/datasets/coco128/labels/train2017... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<00:00, 2508.91it/s]
train: New cache created: /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/datasets/coco128/labels/train2017.cache
val: Scanning /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/datasets/coco128/labels/train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]

Plotting labels to /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/task1/runs/detect/train7/labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/task1/runs/detect/train7
Starting training for 2 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/2      2.63G      1.219      1.587      1.271        217        640: 100%|██████████| 8/8 [00:03<00:00,  2.60it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 4/4 [00:00<00:00,  4.04it/s]
                   all        128        929      0.681      0.598      0.682      0.514

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        2/2      2.66G      1.203      1.379      1.232        218        640: 100%|██████████| 8/8 [00:02<00:00,  3.25it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 4/4 [00:00<00:00,  4.21it/s]
                   all        128        929      0.696      0.594      0.685      0.519

2 epochs completed in 0.003 hours.
Optimizer stripped from /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/task1/runs/detect/train7/weights/last.pt, 5.5MB
Optimizer stripped from /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/task1/runs/detect/train7/weights/best.pt, 5.5MB

Validating /home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/task1/runs/detect/train7/weights/best.pt...
Ultralytics 8.3.86 🚀 Python-3.11.11 torch-2.6.0+cu124 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11143MiB)
YOLO11n summary (fused): 100 layers, 2,616,248 parameters, 0 gradients, 6.5 GFLOPs

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 4/4 [00:01<00:00,  2.07it/s]
                   all        128        929      0.709      0.592      0.685      0.519
                person         61        254      0.819      0.677      0.782      0.544
               bicycle          3          6       0.47      0.167      0.394      0.252
                   car         12         46      0.733      0.196      0.248      0.169
            motorcycle          4          5      0.723          1      0.995      0.789
              airplane          5          6      0.838      0.833      0.955      0.854
                   bus          5          7      0.779      0.714      0.727      0.659
                 train          3          3      0.735      0.939       0.83      0.714
                 truck          5         12      0.538       0.25      0.391      0.242
                  boat          2          6       0.89        0.5      0.637      0.398
         traffic light          4         14      0.513      0.143      0.242      0.166
             stop sig

wandb run history:
lr/pg0                   ▁█
lr/pg1                   ▁█
lr/pg2                   ▁█
metrics/mAP50(B)         ▁█
metrics/mAP50-95(B)      ▁█
metrics/precision(B)     ▁█
metrics/recall(B)        █▁
model/GFLOPs             ▁
model/parameters         ▁
model/speed_PyTorch(ms)  ▁

wandb run summary:
lr/pg0                   1e-05
lr/pg1                   1e-05
lr/pg2                   1e-05
metrics/mAP50(B)         0.68465
metrics/mAP50-95(B)      0.51857
metrics/precision(B)     0.70869
metrics/recall(B)        0.59221
model/GFLOPs             6.614
model/parameters         2624080.0
model/speed_PyTorch(ms)  2.889


## Evaluation

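TODO: evaluate the model, following https://docs.ultralytics.com/modes/val/

Note that `coco128.yaml` uses the same 128 images (`train2017`) for training and validation, as the `val:` scan in the training log shows, so the metrics above are optimistic and do not measure generalization to unseen images.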

In [None]:
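# Validate the model; no arguments needed, the dataset and settings from training are remembered
metrics = model.val()

# Only the last bare expression of a cell is displayed, so print each metric explicitly
print(f"mAP50-95: {metrics.box.map:.3f}")
print(f"mAP50:    {metrics.box.map50:.3f}")
print(f"mAP75:    {metrics.box.map75:.3f}")
print(metrics.box.maps)  # array containing the mAP50-95 of each class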
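To report results per target group rather than per COCO class, the per-class values in `metrics.box.maps` (indexed by class ID) can be averaged within each group. This is a hypothetical post-processing step, not an Ultralytics feature: `GROUP_IDS` and `group_map` are illustrative names, it uses an unweighted mean over classes, and the optional `present` argument (e.g. `metrics.box.ap_class_index`) restricts the mean to classes that actually occur in the validation set, since absent classes do not have a meaningful AP of their own.

```python
from typing import Dict, Iterable, List, Optional, Sequence

# Target groups as COCO class IDs (same grouping as in the Dataset section)
GROUP_IDS: Dict[str, List[int]] = {
    "pedestrian": [0, 15, 16, 17, 19],
    "vehicle": [2, 3, 5, 6, 7],
    "cyclist": [1],
}


def group_map(per_class_map: Sequence[float],
              present: Optional[Iterable[int]] = None) -> Dict[str, float]:
    """Unweighted mean of per-class mAP50-95 within each target group.

    If `present` is given, only those class IDs are averaged; a group with no
    present class is reported as NaN.
    """
    keep = set(present) if present is not None else set(range(len(per_class_map)))
    result: Dict[str, float] = {}
    for group, ids in GROUP_IDS.items():
        values = [float(per_class_map[i]) for i in ids if i in keep]
        result[group] = sum(values) / len(values) if values else float("nan")
    return result
```

After validation this could be called as `group_map(metrics.box.maps, present=metrics.box.ap_class_index)`.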

## Video Prediction

For our task, we need to produce an annotated result for a video. We use a video of road traffic, annotate it frame by frame with our trained model, and compare the results of different trained models.

In [None]:
from ultralytics import YOLO

# Load a model; replace with e.g. "runs/detect/train7/weights/best.pt" to use our trained weights
model = YOLO("yolo11n.pt")  # pretrained YOLO11n model

# Run batched inference on a list of images (placeholder file names)
results = model(["image1.jpg", "image2.jpg"])  # returns a list of Results objects

# Process results list
for i, result in enumerate(results):
    boxes = result.boxes  # Boxes object with box coordinates, confidences and class IDs
    # result.masks, result.keypoints, result.probs and result.obb are None for a detection model
    result.show()  # display to screen
    result.save(filename=f"result_{i}.jpg")  # save each image to its own file instead of overwriting
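The cell above works on still images; for the video task, a minimal sketch streams the frames of a video through the model and writes the annotated frames to a new file with OpenCV (installed as an Ultralytics dependency). `annotate_video` is a hypothetical helper, the file names in the usage note are placeholders, and writing MP4 with the `mp4v` codec is an assumption that depends on the local OpenCV build.

```python
from typing import List, Optional


def annotate_video(weights: str, source: str, output: str,
                   classes: Optional[List[int]] = None, conf: float = 0.25) -> int:
    """Run detection on every frame of `source` and write the annotated video to `output`.

    Returns the number of frames written.
    """
    # Imported here so the helper can be defined even where these packages are missing
    import cv2
    from ultralytics import YOLO

    # Read FPS and frame size from the input so the output video matches it
    cap = cv2.VideoCapture(source)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back to 30 FPS if unknown
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    cap.release()

    model = YOLO(weights)
    writer = cv2.VideoWriter(output, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    frames = 0
    # stream=True yields one Results object per frame instead of collecting all of them
    for result in model.predict(source=source, stream=True, classes=classes,
                                conf=conf, verbose=False):
        writer.write(result.plot())  # plot() draws boxes and labels, returns a BGR array
        frames += 1
    writer.release()
    return frames
```

To compare models on the same clip, one could run `annotate_video("yolo11n.pt", "traffic.mp4", "traffic_pretrained.mp4")` and `annotate_video("runs/detect/train7/weights/best.pt", "traffic.mp4", "traffic_train7.mp4")`. Passing `classes=[0, 1, 2, 3, 5, 6, 7, 15, 16, 17, 19]` limits the drawn boxes to the categories from the Dataset section.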