# Object Detection & Co.


Download data from: https://drive.google.com/file/d/1JNnpaIf8fnlo5vum-D5ZpeRVFLyPIz-z/view?usp=sharing

![](images/tasks.png)

## Basic Idea

![](images/arch.png)


## YOLO (**You Only <s>Live</s> Look Once**)

YOLO is a family of popular **real-time** object detection models: a single forward pass of the network predicts all bounding boxes and class scores for an image.

![](./images/yolo-arch.png)

## (Partial) History of YOLO algorithm

| Version           | Year | Authors / Organization                                       | New Publication or Refinement  | Notes                                                                                  |
| ----------------- | ---- | ------------------------------------------------------------ | ------------------------------ | -------------------------------------------------------------------------------------- |
| YOLOv1            | 2016 | Joseph Redmon et al.                                         | New Publication                | Introduced unified real-time object detection.                                         |
| YOLOv2 (YOLO9000) | 2017 | Joseph Redmon, Ali Farhadi                                   | New Publication                | Improved accuracy and speed; trained jointly on classification + detection.            |
| YOLOv3            | 2018 | Joseph Redmon, Ali Farhadi                                   | New Publication                | Better performance with multi-scale predictions and deeper network.                    |
| YOLOv4            | 2020 | Alexey Bochkovskiy et al.                                    | Refinement                     | Based on Darknet; used bag-of-freebies and bag-of-specials techniques.                 |
| YOLOv5            | 2020 | Ultralytics                                                  | Refinement (Unofficial fork)   | Not published in peer-reviewed form; rewritten in PyTorch.                             |
| YOLOv6            | 2022 | Meituan (open-sourced by them)                               | New Publication                | Focused on industrial applications, speed-accuracy trade-off.                          |
| YOLOv7            | 2022 | Chien-Yao Wang et al. (Wong Kin Yiu team)                    | New Publication                | Introduced new training methods and model architectures.                               |
| YOLOv8            | 2023 | Ultralytics                                                  | Refinement                     | Not a formal publication; introduced improved PyTorch implementation.                  |
| YOLOv9            | 2024 | Chien-Yao Wang et al. (ECCV 2024 paper)                      | New Publication                | Introduced Programmable Gradient Information (PGI) and Generalized ELAN (GELAN).       |
| YOLOv10           | 2024 | Ao Wang et al. (NeurIPS 2024 paper)                          | New Publication                | Proposed NMS-free training and holistic efficiency-accuracy design.                    |
| YOLOv11 (YOLO11)  | 2024 | Ultralytics (overview paper by Rahima Khanam, Muhammad Hussain) | Refinement (no official paper) | Introduced C3k2, SPPF, and C2PSA components; expanded capabilities.                    |
| YOLOv12           | 2025 | Yunjie Tian, Qixiang Ye, David Doermann                      | New Publication                | Attention-centric design with competitive speed and accuracy.                          |




## Ultralytics YOLO11

We will use the popular Ultralytics library because of its ease of use. Note that its AGPL-3.0 licensing is controversial; read [this](https://www.reddit.com/r/computervision/comments/1e3uxro/ultralytics_new_agpl30_license_exploiting/) if interested. Before using it for commercial projects or publications, you should definitely read the license.

Have a look at the [documentation](https://docs.ultralytics.com/usage/python/) to get started.

In [None]:
%pip install -q ultralytics opencv-python

In [None]:
from ultralytics import YOLO

In [None]:
from IPython.display import display
from PIL import Image
import cv2

def show_img(img, convert_to_rgb=True):
    # Ultralytics returns plotted images as OpenCV-style BGR arrays, while PIL expects RGB
    if convert_to_rgb:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    display(Image.fromarray(img))


In [None]:
model = YOLO('models/yolo11n.pt')

results = model('./data/pedestrian.jpg')

# equivalently:
# results = model.predict(source='./data/pedestrian.jpg')

print(f'Found {len(results)} results')

# Process results list

for result in results:
    #print(result.names)
    probs = result.probs  # Probs object for classification outputs (None for a detection model)
    #print(probs)

    # Populated for detection models such as yolo11n
    boxes = result.boxes  # Boxes object for bounding box outputs
    #print('Boxes')
    #print(boxes)

    masks = result.masks  # Masks object for segmentation masks outputs
    #print('Masks')
    #print(masks)

    keypoints = result.keypoints  # Keypoints object for pose outputs
    #print('Keypoints')
    #print(keypoints)

    obb = result.obb  # Oriented boxes object for OBB outputs
    #print('OBB')
    #print(obb)

    img = result.plot() # plot to img object
    show_img(img)

    #result.show()  # display to screen
    #result.save(filename="result.jpg")  # save to disk
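
To work with the detections programmatically rather than only plotting them, the box data can be turned into plain records. The sketch below uses a hypothetical helper, `summarize_detections` (not part of Ultralytics), that operates on plain Python lists with made-up values. With a real result you would typically obtain the inputs via `result.boxes.xyxy.tolist()`, `result.boxes.conf.tolist()`, `result.boxes.cls.tolist()` and `result.names`.

```python
def summarize_detections(xyxy, conf, cls, names, min_conf=0.5):
    """Return one dict per detection whose confidence is at least min_conf."""
    detections = []
    for (x1, y1, x2, y2), score, class_id in zip(xyxy, conf, cls):
        if score < min_conf:
            continue
        detections.append({
            "label": names[int(class_id)],  # class ids arrive as floats
            "confidence": round(score, 3),
            "box_xyxy": (x1, y1, x2, y2),
            "area": (x2 - x1) * (y2 - y1),
        })
    # Most confident detections first
    return sorted(detections, key=lambda d: d["confidence"], reverse=True)


# Made-up values standing in for result.boxes and result.names
names = {0: "person", 2: "car"}
xyxy = [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 80.0], [0.0, 0.0, 5.0, 5.0]]
conf = [0.91, 0.62, 0.30]
cls = [0.0, 2.0, 0.0]

for det in summarize_detections(xyxy, conf, cls, names):
    print(det["label"], det["confidence"], det["box_xyxy"])
```

The `min_conf` threshold here is an illustrative choice; Ultralytics also accepts a `conf` argument at prediction time, which filters detections before they reach the result object.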

In [None]:
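def task2model_name(task):
    # None -> plain detection weights; 'seg', 'pose' or 'obb' selects the task-specific weights
    suffix = f'-{task}' if task else ''
    return f'models/yolo11n{suffix}.pt'


def yolo_img(source, task=None):
    model_name = task2model_name(task)
    print(model_name)
    model = YOLO(model_name)

    res = model(source)
    show_img(res[0].plot())

    # Return the result so the caller can inspect boxes, masks, keypoints, ...
    return res[0]

tasks = [None, 'seg', 'pose', 'obb']

for task in tasks:
    res = yolo_img('./data/pedestrian.jpg', task=task)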
    


# yolo_img('./data/port.png', task='obb')



    

#### Exercise: YOLO Fine-tuning
- Use YOLO-seg to do tumor segmentation on the BrainMRI dataset we used for image segmentation. YOLO-seg also predicts a bounding box for each instance, so you need to derive boxes (and, for the Ultralytics label format, polygon outlines) from the segmentation masks.
- Add more classes to YOLO (for example pencil/pen)

See the Ultralytics documentation on how to organize the dataset and how to train the models.
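
As a starting point for the first exercise, a bounding box can be read off a binary mask by finding the first and last foreground row and column. The sketch below is a minimal pure-Python version with hypothetical helper names (`mask_to_yolo_box`, `yolo_label_line`); it assumes one object per mask and writes the common YOLO detection layout `class x_center y_center width height`, normalized to the image size. Segmentation labels are laid out differently (polygon points), so check the dataset documentation before writing the final files.

```python
def mask_to_yolo_box(mask):
    """Return (x_center, y_center, width, height) normalized to [0, 1], or None for an empty mask.

    `mask` is a 2D sequence of 0/1 values, e.g. a nested list or `array.tolist()`.
    """
    height = len(mask)
    width = len(mask[0])

    # Indices of rows and columns that contain at least one foreground pixel
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows:
        return None

    # Pixel-inclusive extent: the box covers pixels x_min..x_max-1 and y_min..y_max-1
    x_min, x_max = cols[0], cols[-1] + 1
    y_min, y_max = rows[0], rows[-1] + 1

    return (
        (x_min + x_max) / 2 / width,
        (y_min + y_max) / 2 / height,
        (x_max - x_min) / width,
        (y_max - y_min) / height,
    )


def yolo_label_line(class_id, box):
    """Format one label line: `class x_center y_center width height`."""
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in box)


# Tiny 4x8 mask with a 2x3 foreground blob
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
box = mask_to_yolo_box(mask)
print(yolo_label_line(0, box))
```

If a mask contains several separate tumors, this sketch would merge them into one box; splitting the mask into connected components first would give one box per region.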

### Working with videos

`ultralytics` directly supports inference on video data, but we will do it ourselves using OpenCV.
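
For comparison, the built-in route might look like the sketch below: the video path is passed as `source`, and `stream=True` makes `predict` return a generator, so frames are processed one at a time instead of being collected in memory. The helper name `iter_annotated_frames` is hypothetical, and the default weights path is simply the one used earlier in this notebook.

```python
def iter_annotated_frames(video_path, weights="models/yolo11n.pt", conf=0.5):
    """Yield annotated BGR frames, letting Ultralytics read the video itself."""
    # Imported lazily so the function can be defined before the package is installed
    from ultralytics import YOLO

    model = YOLO(weights)
    # stream=True yields one Results object per frame
    for result in model.predict(source=video_path, stream=True, conf=conf, verbose=False):
        yield result.plot()  # annotated frame as a BGR array


# Example usage (requires the data and model files from this notebook):
# for frame in iter_annotated_frames("./data/tennis_1.mp4"):
#     show_img(frame)
```

Doing the frame loop manually with OpenCV gives more control, for example over how frames are displayed and written to disk.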

In [None]:
from IPython.display import Video

Video("./data/dancing_1.mp4", embed=True)

In [None]:
Video("./data/tennis_1.mp4", embed=True)

In [None]:
import cv2
from IPython.display import clear_output

def yolo_video(task, source, output=None, show=True, device='cpu'):

    model_name = task2model_name(task)
    print(model_name)
    model = YOLO(model_name)

    # source is a video file path or a camera index (0 = default webcam)
    cap = cv2.VideoCapture(source)

    width  = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # some cameras report 0; 30 is an assumed fallback

    out = None
    if output:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(output, fourcc, fps, (width, height))

    try:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            # Run YOLO inference on a single frame; the task is determined by the loaded weights
            results = model.predict(source=frame, conf=0.5, device=device, verbose=False)
            annotated = results[0].plot()  # annotated frame as a BGR array

            if show:
                # Replace the previous frame in the notebook output
                clear_output(wait=True)
                show_img(annotated)  # converts BGR to RGB for display

            if out is not None:
                out.write(annotated)
    finally:
        # Release resources even if the loop is interrupted (e.g. when stopping the webcam)
        cap.release()
        if out is not None:
            out.release()

    
    


In [None]:
yolo_video('seg',"./data/tennis_1.mp4")

In [None]:
yolo_video('pose',"./data/tennis_1.mp4","./results/tennis_out.mp4")

In [None]:
yolo_video('pose',"./data/dancing_1.mp4")

In [None]:
yolo_video('pose',0)

#### Copyright Disclaimer

This notebook uses parts of the following videos:
- [The Ten Rules of Techno (performance)](https://www.youtube.com/watch?v=jm-rMP9bvbU) by **Underdog Electronic Music School** 
- [Practicing With 12 UT**R - How Do College Tennis Players Train?](https://www.youtube.com/watch?v=HTpcVloOwbQ) by **MPTennis**