# Introduction to autonomous vehicles - Task 1 Perception

## Task Description

Select a road-traffic dataset that contains annotated objects.
Then select a method and use this dataset to train a model that detects and classifies **Pedestrians, Cyclists and Vehicles** in a video clip.
The video clip can come from the dataset or from another source.

**Available Datasets:**

- [KITTI Dataset](https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark)
- [COCO Dataset](https://cocodataset.org/)
- [Waymo Dataset](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files)
  - **Note:** You might have to first register [here](https://waymo.com/open/) to get access to the dataset.

**Rules:**

- Use whatever framework you prefer (PyTorch, TensorFlow, Ultralytics, etc.)
- Using any version of YOLO is recommended
- Use the code from the GitHub repository of the previously mentioned or other published methods (e.g. https://github.com/ultralytics/yolov3)
- Use pretrained weights
  - **Note:** You can use the pretrained weights, but you have to train and adapt them to your dataset.

## Setup

**Requirements:**

Set up your environment accordingly. For now, you can use the code below to install the required packages.

In [2]:
!pip install torch torchvision torchaudio
!pip install ipykernel
!pip install jupyterlab
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install scikit-learn
!pip install seaborn
!pip install ultralytics
!pip install wandb



**Wandb settings:**

Add a file named `wandb-api-key.txt` that contains your wandb API key so that the notebook can log in to wandb.

```bash
echo "your_wandb_api_key" > wandb-api-key.txt
```

In [1]:
import wandb

# Read wandb API key from file
with open('wandb-api-key.txt', 'r') as file:
    wandb.login(key=file.read().strip())

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/lukas-kurz/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mkurzlukas[0m ([33mkurzlukas-johannes-kepler-universit-t-linz[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


**Ultralytics settings:**

Ultralytics might have some leftover settings from another run or from the previous task. You can use the following code to reset the settings.

In [2]:
from pathlib import Path

from ultralytics import YOLO, settings

# Point Ultralytics at directories relative to the current working directory
working_dir = Path.cwd()
settings['datasets_dir'] = str((working_dir / '..' / 'datasets').resolve())
settings['weights_dir'] = str((working_dir / 'weights').resolve())
settings['runs_dir'] = str((working_dir / 'runs').resolve())
settings['wandb'] = True  # enable Weights & Biases logging

settings

{'settings_version': '0.0.6',
 'datasets_dir': '/home/lukas-kurz/Desktop/autonomous_driving/datasets',
 'weights_dir': '/home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/weights',
 'runs_dir': '/home/lukas-kurz/Desktop/autonomous_driving/intro_to_autonomous_vehicles_2025/runs',
 'uuid': 'f8376e8d76181ab673a86b3f562cfd88bbe4c72eeaa8b93b13c3c8c81b770686',
 'sync': True,
 'api_key': '',
 'openai_api_key': '',
 'clearml': True,
 'comet': True,
 'dvc': True,
 'hub': True,
 'mlflow': True,
 'neptune': True,
 'raytune': True,
 'tensorboard': True,
 'wandb': True,
 'vscode_msg': True}

## Dataset

For our first project we chose the provided **COCO Dataset**.

It is an open dataset for object detection, segmentation, and captioning that is hosted as a challenge. As such, **the test set annotations are not available**.

The COCO dataset has 80 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment. We do not need all of them for our task, and the ultralytics implementation lets us select a subset of classes to train on.

Concretely, we will group them as:

### Pedestrians
* Person
* Dog
* Cow
* Horse
* Cat

### Vehicles
* Motorcycle
* Bus
* Car
* Truck
* Train

### Cyclists
* Bicycle
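
The grouping above can be written down as a small lookup table. The sketch below assumes the class indices of the 80-class Ultralytics `coco.yaml` and YOLO-style label lines that start with a class index. The names `GROUPS`, `GROUP_IDS` and `remap_label_line` are hypothetical, and the order of the three group indices is an arbitrary choice.

```python
# Task group -> COCO class names, taken from the lists above
GROUPS = {
    "Pedestrian": ["person", "dog", "cow", "horse", "cat"],
    "Vehicle": ["motorcycle", "bus", "car", "truck", "train"],
    "Cyclist": ["bicycle"],
}

# Class indices as listed in the 80-class Ultralytics coco.yaml
COCO_IDS = {
    "person": 0, "bicycle": 1, "car": 2, "motorcycle": 3, "bus": 5,
    "train": 6, "truck": 7, "cat": 15, "dog": 16, "horse": 17, "cow": 19,
}

# COCO class index -> task group name
ID_TO_GROUP = {
    COCO_IDS[name]: group for group, names in GROUPS.items() for name in names
}

# Sorted COCO class indices that are relevant for this task
SELECTED_IDS = sorted(ID_TO_GROUP)

# Index of each group in a reduced 3-class dataset (arbitrary order)
GROUP_IDS = {"Pedestrian": 0, "Cyclist": 1, "Vehicle": 2}


def remap_label_line(line):
    """Rewrite one label line to the 3-group index, or return None to drop it."""
    parts = line.split()
    if not parts:
        return None
    group = ID_TO_GROUP.get(int(parts[0]))
    if group is None:  # class is not part of the task
        return None
    # Keep every field after the class index unchanged (box or polygon)
    return " ".join([str(GROUP_IDS[group]), *parts[1:]])
```

Applying `remap_label_line` to every line of every label file would yield a three-class copy of the annotations, while `SELECTED_IDS` is the list to use wherever a class filter expects COCO indices.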



TODO: visualize some of the stats of the dataset, as well as some example images with annotations

In [None]:
# Download the COCO dataset, either with the built-in download from ultralytics or by running model.train(), which downloads the dataset if it is missing
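
As a starting point for the dataset statistics, the number of annotated instances per class can be counted directly from the label files. This is a minimal sketch, assuming the labels are stored as YOLO-style `.txt` files (one object per line, class index first), for example in the `labels/train2017` folder of the downloaded dataset. The function name `count_label_classes` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_label_classes(label_dir):
    """Count annotated objects per class index over all .txt files in label_dir."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts
```

The returned `Counter` can be passed to a matplotlib bar chart to show how unbalanced the selected classes are.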

## Training

For a first attempt, we use the ultralytics training implementation without adding any augmentations of our own; these can be added later to compare results.

TODO: actually run and train the model with parameters setting like in: https://docs.ultralytics.com/modes/train/

In [3]:
# Load a model
model = YOLO("yolo11n.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco.yaml", epochs=2, project="intro_to_av", name="yolo11n")

Ultralytics 8.3.86 🚀 Python-3.11.11 torch-2.6.0+cu124 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11143MiB)
[34m[1mengine/trainer: [0mtask=detect, mode=train, model=yolo11n.pt, data=coco.yaml, epochs=2, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=intro_to_av, name=yolo11n, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show

Freezing layer 'model.23.dfl.conv.weight'
[34m[1mAMP: [0mrunning Automatic Mixed Precision (AMP) checks...
[34m[1mAMP: [0mchecks passed ✅


[34m[1mtrain: [0mScanning /home/lukas-kurz/Desktop/autonomous_driving/datasets/coco/la[0m

[34m[1mtrain: [0mNew cache created: /home/lukas-kurz/Desktop/autonomous_driving/datasets/coco/labels/train2017.cache


ValueError: not enough values to unpack (expected 3, got 0)

## Evaluation

TODO: evaluate the model; see the Ultralytics validation documentation:

https://docs.ultralytics.com/modes/val/

In [8]:
# Validate the model
metrics = model.val()  # no arguments needed, dataset and settings remembered
print(metrics.box.map)  # mAP50-95
print(metrics.box.map50)  # mAP50
print(metrics.box.map75)  # mAP75
print(metrics.box.maps)  # per-category mAP50-95

Ultralytics 8.3.86 🚀 Python-3.11.11 torch-2.6.0+cu124 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11143MiB)
YOLO11n summary (fused): 100 layers, 2,616,248 parameters, 31,920 gradients, 6.5 GFLOPs


FileNotFoundError: '/usr/src/ultralytics/ultralytics/cfg/datasets/coco.yaml' does not exist
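
Once validation runs, the per-category values in `metrics.box.maps` can be condensed into one number per task group. The sketch below takes the unweighted mean of the per-class mAP50-95 values inside each group. This is only a rough summary under that assumption, not a mAP computed on merged classes, and the function name `mean_map_per_group` is hypothetical.

```python
def mean_map_per_group(per_class_map, id_to_group):
    """Average per-class mAP values over each group.

    per_class_map: sequence indexed by class index (e.g. metrics.box.maps)
    id_to_group:   dict mapping class index -> group name
    """
    sums, counts = {}, {}
    for cls_id, group in id_to_group.items():
        sums[group] = sums.get(group, 0.0) + float(per_class_map[cls_id])
        counts[group] = counts.get(group, 0) + 1
    # Unweighted mean: every class in a group contributes equally
    return {group: sums[group] / counts[group] for group in sums}
```

A call could look like `mean_map_per_group(metrics.box.maps, {0: "Pedestrian", 1: "Cyclist", 2: "Vehicle"})`, extended with the remaining class indices of each group.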

## Video Prediction

For our task, we need to produce an annotated result from a video. We take a video of road traffic and annotate it frame by frame with each trained model, so that the different models can be compared.

In [None]:
from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n.pt")  # pretrained YOLO11n model

# Run batched inference on a list of images
results = model(["image1.jpg", "image2.jpg"])  # return a list of Results objects

# Process results list
for i, result in enumerate(results):
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
    result.show()  # display to screen
    result.save(filename=f"result_{i}.jpg")  # save to disk, one file per input image
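
The cell above works on still images. For a video, a possible approach is to pass the video file as `source` and iterate over the results frame by frame. The following is a sketch under several assumptions: the COCO class indices follow the Ultralytics `coco.yaml`, the grouping matches the Dataset section, and the names `count_groups` and `annotate_video` are hypothetical. `ultralytics` is imported inside the function so that the counting helper can be used without it.

```python
from collections import Counter

# COCO class index -> task group, following the Dataset section (assumed indices)
ID_TO_GROUP = {
    0: "Pedestrian", 15: "Pedestrian", 16: "Pedestrian",
    17: "Pedestrian", 19: "Pedestrian",
    1: "Cyclist",
    2: "Vehicle", 3: "Vehicle", 5: "Vehicle", 6: "Vehicle", 7: "Vehicle",
}


def count_groups(class_ids, id_to_group=ID_TO_GROUP):
    """Count detections per task group; unknown class indices are ignored."""
    return Counter(
        id_to_group[int(c)] for c in class_ids if int(c) in id_to_group
    )


def annotate_video(weights, source):
    """Annotate a video and return one Counter of group detections per frame."""
    from ultralytics import YOLO  # lazy import, only needed for inference

    model = YOLO(weights)
    per_frame = []
    # stream=True yields one Results object per frame instead of a full list;
    # save=True writes the annotated video into the Ultralytics runs directory;
    # classes restricts the detections to the COCO indices used in this task
    for result in model.predict(
        source=source, classes=sorted(ID_TO_GROUP), stream=True, save=True
    ):
        per_frame.append(count_groups(result.boxes.cls.tolist()))
    return per_frame
```

Running `annotate_video` once per set of trained weights on the same clip gives comparable per-frame counts, in addition to the annotated videos that are written to disk.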