# Introduction to autonomous vehicles - Task 1 Perception

## Task Description

Select a dataset of road traffic that contains annotated objects.
Then choose a method and use the dataset to train a model that detects and classifies **Pedestrians, Cyclists and Vehicles** in a video clip.
The video clip may come from the dataset or from another source.

**Available Datasets:**

- [KITTI Dataset](https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark)
- [COCO Dataset](https://cocodataset.org/)
- [Waymo Dataset](https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_1_2_0_individual_files)
  - **Note:** You might have to first register [here](https://waymo.com/open/) to get access to the dataset.

**Rules:**

- Use whatever framework you prefer (PyTorch, TensorFlow, Ultralytics, etc.).
- Using any version of YOLO is recommended.
- Use the code from the GitHub repository of one of the previously mentioned frameworks or of another published method (e.g. https://github.com/ultralytics/yolov3).
- Use pretrained weights.
  - **Note:** You may start from pretrained weights, but you have to train and adapt them to your dataset.

## Setup

**Requirements:**

Set up the environment accordingly. For now, you can use the code below to install the required packages.

In [1]:
!pip install torch torchvision torchaudio
!pip install ipykernel
!pip install jupyterlab
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install scikit-learn
!pip install seaborn
!pip install ultralytics

Collecting ultralytics
  Downloading ultralytics-8.3.85-py3-none-any.whl.metadata (35 kB)
Collecting numpy<=2.1.1,>=1.23.0 (from ultralytics)
  Downloading numpy-2.1.1-cp311-cp311-macosx_14_0_arm64.whl.metadata (60 kB)
Collecting opencv-python>=4.6.0 (from ultralytics)
  Using cached opencv_python-4.11.0.86-cp37-abi3-macosx_13_0_arm64.whl.metadata (20 kB)
Collecting tqdm>=4.64.0 (from ultralytics)
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting py-cpuinfo (from ultralytics)
  Using cached py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
Collecting ultralytics-thop>=2.0.0 (from ultralytics)
  Downloading ultralytics_thop-2.0.14-py3-none-any.whl.metadata (9.4 kB)
Downloading ultralytics-8.3.85-py3-none-any.whl (922 kB)
Downloading numpy-2.1.1-cp311-cp311-macosx_14_0_arm64.whl (5.4 MB)

**Ultralytics settings:**

Ultralytics may still hold settings left over from another run or from a previous task. The following code points the dataset, weights, and runs directories at this project.

In [22]:
from pathlib import Path

from ultralytics import YOLO, settings

# Point Ultralytics at project-local folders, resolved relative to the current working directory
working_dir = Path.cwd()
settings['datasets_dir'] = str((working_dir / '..' / 'datasets').resolve())
settings['weights_dir'] = str((working_dir / 'weights').resolve())
settings['runs_dir'] = str((working_dir / 'runs').resolve())

settings

{'settings_version': '0.0.6',
 'datasets_dir': '/Users/lukaskurz/University/intro_to_autonomous_vehicles/datasets',
 'weights_dir': '/Users/lukaskurz/University/intro_to_autonomous_vehicles/task1/weights',
 'runs_dir': '/Users/lukaskurz/University/intro_to_autonomous_vehicles/task1/runs',
 'uuid': '5bf14b72f2999298c6afb0e09dcf80db7e70e2569c9e343b0f8bd5cfc28d1d70',
 'sync': True,
 'api_key': '',
 'openai_api_key': '',
 'clearml': True,
 'comet': True,
 'dvc': True,
 'hub': True,
 'mlflow': True,
 'neptune': True,
 'raytune': True,
 'tensorboard': True,
 'wandb': False,
 'vscode_msg': True}

## Dataset

For our first project we chose the provided **COCO Dataset**.

It is an open dataset for object detection, segmentation, and captioning that is hosted as a challenge. As such, **the test set annotations are not available**.

The COCO dataset has 80 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment. We do not need all of them for our task; the Ultralytics implementation lets us select a subset of classes to train on.

Concretely, we will group them as follows:

### Pedestrians
* Person
* Dog
* Cow
* Horse
* Cat

### Vehicles
* Motorcycle
* Bus
* Car
* Truck
* Train

### Cyclists
* Bicycle
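
The grouping above can be expressed as a lookup table. The following is a minimal sketch, assuming the 80-class COCO indexing used by the Ultralytics `coco.yaml` (person = 0, bicycle = 1, and so on). The names `COCO_IDS`, `GROUPS`, `CLASS_TO_GROUP`, and `SELECTED_CLASS_IDS` are our own and not part of any library.

```python
# COCO class indices, assuming the 80-class indexing of the Ultralytics coco.yaml.
COCO_IDS = {
    "person": 0, "bicycle": 1, "car": 2, "motorcycle": 3,
    "bus": 5, "train": 6, "truck": 7,
    "cat": 15, "dog": 16, "horse": 17, "cow": 19,
}

# Our three task groups, as listed above (hypothetical names).
GROUPS = {
    "Pedestrian": ["person", "dog", "cow", "horse", "cat"],
    "Vehicle": ["motorcycle", "bus", "car", "truck", "train"],
    "Cyclist": ["bicycle"],
}


def build_group_lookup(groups, ids):
    """Return a dict that maps each COCO class ID to its task group."""
    return {ids[name]: group for group, names in groups.items() for name in names}


CLASS_TO_GROUP = build_group_lookup(GROUPS, COCO_IDS)
SELECTED_CLASS_IDS = sorted(CLASS_TO_GROUP)  # class IDs we actually care about
```

`SELECTED_CLASS_IDS` is the list we would hand to Ultralytics to restrict the classes, while `CLASS_TO_GROUP` translates a predicted class ID into one of the three task labels.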



TODO: visualize some statistics of the dataset, as well as a few example images with their annotations

In [None]:
# Download the COCO dataset, either with the built-in Ultralytics download or by running model.train(), which downloads the dataset if it is missing

New https://pypi.org/project/ultralytics/8.3.86 available 😃 Update with 'pip install -U ultralytics'
Ultralytics 8.3.85 🚀 Python-3.11.11 torch-2.6.0 CPU (Apple M1 Max)
engine/trainer: task=detect, mode=train, model=yolo11n.pt, data=coco.yaml, epochs=0, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train14, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=Fa

train: Scanning /Users/lukaskurz/University/intro_to_autonomous_vehicles/datasets/coco/labels/train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 [00:00<?, ?it/s]
val: Scanning /Users/lukaskurz/University/intro_to_autonomous_vehicles/datasets/coco/labels/val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:00<?, ?it/s]

Plotting labels to /Users/lukaskurz/University/intro_to_autonomous_vehicles/task1/runs/detect/train14/labels.jpg... 

optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to /Users/lukaskurz/University/intro_to_autonomous_vehicles/task1/runs/detect/train14
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      1/100         0G     0.8654      1.594      1.101        186        640:   0%|          | 1/7393 [00:12<26:32:04, 12.92s/it]


KeyboardInterrupt: 
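
As a starting point for the dataset statistics, we could count how many annotated objects each class has. The sketch below assumes the labels are stored in the YOLO text format that Ultralytics downloads (one object per line, with the class ID as the first token). The function names `count_label_lines` and `count_instances` are hypothetical helpers of our own.

```python
from collections import Counter
from pathlib import Path


def count_label_lines(lines):
    """Count objects per class ID from YOLO-format label lines.

    Each non-empty line is expected to start with an integer class ID,
    followed by normalized box or polygon coordinates.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:  # skip empty lines
            counts[int(parts[0])] += 1
    return counts


def count_instances(label_dir):
    """Sum the per-class counts over every .txt label file in a folder."""
    total = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        total += count_label_lines(label_file.read_text().splitlines())
    return total
```

Calling `count_instances` on the `labels/train2017` folder inside the datasets directory would give a per-class count that can be passed to matplotlib or seaborn for a bar chart.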

## Training

For a first attempt, we use the Ultralytics training routine with its default settings and without any additional augmentations. We can add those later to check and compare the results.

TODO: run the full training with the parameter settings described in: https://docs.ultralytics.com/modes/train/

In [20]:
# Load a model
model = YOLO("yolo11n.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco.yaml", epochs=2, imgsz=160, device="mps")

Ultralytics 8.3.85 🚀 Python-3.11.11 torch-2.6.0 MPS (Apple M1 Max)
engine/trainer: task=detect, mode=train, model=yolo11n.pt, data=coco.yaml, epochs=2, time=None, patience=100, batch=16, imgsz=160, save=True, save_period=-1, cache=False, device=mps, workers=8, project=None, name=train13, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=N

train: Scanning /Users/lukaskurz/University/intro_to_autonomous_vehicles/datasets/coco/labels/train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 [00:00<?, ?it/s]
val: Scanning /Users/lukaskurz/University/intro_to_autonomous_vehicles/datasets/coco/labels/val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:00<?, ?it/s]

Plotting labels to /Users/lukaskurz/University/intro_to_autonomous_vehicles/task1/runs/detect/train13/labels.jpg... 

optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
Image sizes 160 train, 160 val
Using 0 dataloader workers
Logging results to /Users/lukaskurz/University/intro_to_autonomous_vehicles/task1/runs/detect/train13
Starting training for 2 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/2     0.565G      1.982      4.821      1.501        149        160:   2%|▏         | 112/7393 [00:47<51:38,  2.35it/s]


KeyboardInterrupt: 
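
The run above trains on all 80 COCO classes. To restrict training to the classes relevant for our task, one option is the `classes` argument, which already appears in the trainer arguments logged above. The following is only a sketch: it assumes the installed Ultralytics version honors `classes` during training, the class IDs follow the COCO indexing of `coco.yaml`, and `epochs=50` is a placeholder rather than a tuned value. `build_train_args` and `train_subset` are hypothetical helper names.

```python
# COCO class IDs for person, bicycle, car, motorcycle, bus, train, truck,
# cat, dog, horse, cow (assumed to match the Ultralytics coco.yaml indexing).
SELECTED_CLASS_IDS = [0, 1, 2, 3, 5, 6, 7, 15, 16, 17, 19]


def build_train_args(epochs=50, imgsz=640, device="mps"):
    """Collect the keyword arguments for model.train() in one place."""
    return {
        "data": "coco.yaml",
        "epochs": epochs,  # placeholder value, not tuned
        "imgsz": imgsz,
        "device": device,
        "classes": SELECTED_CLASS_IDS,  # keep only the classes of interest
    }


def train_subset(weights="yolo11n.pt", **overrides):
    """Fine-tune pretrained weights on the selected COCO classes."""
    from ultralytics import YOLO  # imported lazily so the helper above stays lightweight

    model = YOLO(weights)
    return model.train(**build_train_args(**overrides))
```

A short smoke test comparable to the cell above would be `train_subset(epochs=2, imgsz=160)`. Note that filtering by class does not merge the classes into our three groups; the model still predicts the original COCO class IDs.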

## Evaluation

TODO: evaluate the model, following the Ultralytics validation documentation:

https://docs.ultralytics.com/modes/val/

In [21]:
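# Validate the model
# Pass the dataset explicitly: without it, val() failed with the FileNotFoundError recorded below,
# likely because the pretrained checkpoint stores the dataset path of its original training environment
metrics = model.val(data="coco.yaml")
metrics.box.map  # mAP50-95
metrics.box.map50  # mAP50
metrics.box.map75  # mAP75
metrics.box.maps  # per-category mAP50-95 values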

Ultralytics 8.3.85 🚀 Python-3.11.11 torch-2.6.0 CPU (Apple M1 Max)
YOLO11n summary (fused): 100 layers, 2,616,248 parameters, 31,920 gradients, 6.5 GFLOPs


FileNotFoundError: '/usr/src/ultralytics/ultralytics/cfg/datasets/coco.yaml' does not exist
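
Once validation runs, the per-class values in `metrics.box.maps` could be summarized per task group. The sketch below assumes a sequence of mAP values indexed by class ID and a class-to-group mapping of our own. `mean_map_per_group` is a hypothetical helper, and the unweighted mean is just one possible way to aggregate.

```python
def mean_map_per_group(per_class_map, class_to_group):
    """Average per-class mAP values within each task group (unweighted)."""
    sums, counts = {}, {}
    for class_id, group in class_to_group.items():
        sums[group] = sums.get(group, 0.0) + float(per_class_map[class_id])
        counts[group] = counts.get(group, 0) + 1
    return {group: sums[group] / counts[group] for group in sums}
```

With a real validation result this would be called as `mean_map_per_group(metrics.box.maps, class_to_group)`, where `class_to_group` maps COCO class IDs to "Pedestrian", "Vehicle", or "Cyclist".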

## Video Prediction

For our task, we need to produce an annotated result for a video. We take a video of road traffic and annotate it with our trained model frame by frame, comparing different trained models.

In [None]:
from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n.pt")  # pretrained YOLO11n model

# Run batched inference on a list of images
results = model(["image1.jpg", "image2.jpg"])  # return a list of Results objects

# Process results list
for result in results:
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
    result.show()  # display to screen
    result.save(filename="result.jpg")  # save to disk
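
The cell above works on single images. For a video, a frame-by-frame loop could look like the following sketch. It assumes OpenCV is available (it is installed as an Ultralytics dependency), a fixed output rate of 30 fps, and a class-to-group mapping based on the COCO indexing of `coco.yaml`. The names `CLASS_TO_GROUP`, `group_label`, and `annotate_video`, as well as the file paths in the usage note, are hypothetical.

```python
# Hypothetical mapping from COCO class IDs to our three task groups.
CLASS_TO_GROUP = {
    0: "Pedestrian", 15: "Pedestrian", 16: "Pedestrian", 17: "Pedestrian", 19: "Pedestrian",
    2: "Vehicle", 3: "Vehicle", 5: "Vehicle", 6: "Vehicle", 7: "Vehicle",
    1: "Cyclist",
}


def group_label(class_id, conf):
    """Return a text label such as 'Vehicle 0.91', or None for unmapped classes."""
    group = CLASS_TO_GROUP.get(int(class_id))
    return None if group is None else f"{group} {conf:.2f}"


def annotate_video(weights, source, output, fps=30):
    """Run the model on every frame of a video and write an annotated copy."""
    import cv2  # imported lazily so group_label can be used on its own
    from ultralytics import YOLO

    model = YOLO(weights)
    writer = None
    # stream=True yields one Results object per frame instead of a full list
    for result in model.predict(source=source, stream=True,
                                classes=sorted(CLASS_TO_GROUP), verbose=False):
        frame = result.orig_img.copy()
        for box in result.boxes:
            label = group_label(int(box.cls[0]), float(box.conf[0]))
            if label is None:
                continue
            x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, max(y1 - 5, 15)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        if writer is None:  # create the writer once the frame size is known
            height, width = frame.shape[:2]
            writer = cv2.VideoWriter(output, cv2.VideoWriter_fourcc(*"mp4v"),
                                     fps, (width, height))
        writer.write(frame)
    if writer is not None:
        writer.release()
```

A call such as `annotate_video("yolo11n.pt", "traffic.mp4", "traffic_annotated.mp4")` would produce one annotated clip per model, which makes it easy to compare different trained weights side by side.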