Using a powerful **GPU** is highly recommended. You can get one for free by uploading this notebook to [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb).
<table align="center">
 <td align="center"><a target="_blank" href="https://colab.research.google.com/github/ezponda/intro_deep_learning/blob/main/class/CNN/Object_Detection_YOLO_ultralytics.ipynb">
        <img src="https://colab.research.google.com/img/colab_favicon_256px.png"  width="50" height="50" style="padding-bottom:5px;" />Run in Google Colab</a></td>
  <td align="center"><a target="_blank" href="https://github.com/ezponda/intro_deep_learning/blob/main/class/CNN/Object_Detection_YOLO_ultralytics.ipynb">
        <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png"  width="50" height="50" style="padding-bottom:5px;" />View Source on GitHub</a></td>
</table>

**Table of Contents**

* [Introduction to Ultralytics](#Introduction-to-Ultralytics)
* [Object Detection with Ultralytics using Pretrained YOLO Models](#Object-Detection-with-Ultralytics-using-Pretrained-YOLO-Models)
* [Non-Max Suppression (NMS) and Intersection Over Union (IoU)](#Non-Max-Suppression-(NMS)-and-Intersection-Over-Union-(IoU))
* [Webcam Local](#Web-cam-Local)
* [Different Vision Tasks](#YOLO-for-Different-Vision-Tasks)

# Introduction to [Ultralytics](https://www.ultralytics.com/)

Ultralytics is a company that offers a variety of AI-based solutions, including the popular YOLO (You Only Look Once) object detection models. Their YOLO models are known for their speed and accuracy, making them suitable for real-time object detection tasks.

The Ultralytics framework provides a convenient and powerful platform for training, evaluating, and deploying object detection models. It is built on top of PyTorch and supports various YOLO versions.

For more detailed information and resources on Ultralytics, you can visit the [Ultralytics official documentation](https://docs.ultralytics.com/).



You can install it with:

```python
%pip install torch torchvision torchaudio
%pip install ultralytics
%pip install opencv-python
```

In [None]:
#%pip install torch torchvision torchaudio
#%pip install ultralytics
#%pip install opencv-python
# For using the webcam:
#%pip install lap
#%pip install lapx

# Object Detection with Ultralytics using Pretrained YOLO Models


## Introduction to Object Detection

Object detection is a crucial area in computer vision, aiming to recognize and locate objects within images. It has vast applications in fields like autonomous driving, surveillance, and image retrieval.

## Introduction to YOLO (You Only Look Once)

YOLO (You Only Look Once) is a state-of-the-art, real-time object detection system whose distinguishing feature is how it processes images. Traditional object detection methods apply a detector many times to different regions of the image, whereas YOLO applies a single neural network to the entire image in one pass. The network divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.

The advantages of YOLO include:
- **Speed**: By processing the entire image in one evaluation, YOLO significantly reduces computational load, enabling real-time detection.
- **Accuracy**: Despite its speed, YOLO achieves accuracy competitive with slower multi-stage detectors, although small or densely packed objects have historically been its weak point.
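
To make the grid idea concrete, the sketch below maps an object's center to the grid cell that contains it. It assumes a single 7×7 grid, as in the original YOLO paper. The function name `responsible_cell` and the example coordinates are illustrative, and recent YOLO versions predict on several grids of different resolutions rather than one.

```python
def responsible_cell(cx, cy, img_w, img_h, grid_size=7):
    """Return the (row, col) of the grid cell that contains the point (cx, cy)."""
    # Scale pixel coordinates to grid units, then truncate to a cell index.
    col = int(cx / img_w * grid_size)
    row = int(cy / img_h * grid_size)
    # A center exactly on the right or bottom edge would fall outside the grid, so clamp it.
    col = min(col, grid_size - 1)
    row = min(row, grid_size - 1)
    return row, col


# An object centered at (320, 100) in a 640x480 image (illustrative values).
print(responsible_cell(320, 100, img_w=640, img_h=480))  # (1, 3)
```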



## Choosing the Pretrained YOLO Model

When selecting a pretrained model for your object detection tasks, it's essential to consider the balance between speed and accuracy. The YOLO (You Only Look Once) V8 models come in different sizes to cater to a variety of requirements, from real-time applications to more accuracy-intensive tasks. These models have been trained on the COCO dataset, which is a benchmark in the object detection field.

### [COCO (Common Objects in Context)](https://cocodataset.org/#home)

- **Dataset Overview**: The COCO dataset is a comprehensive collection for object detection, segmentation, and captioning. It features over 200,000 images and 80 object categories, making it one of the most diverse datasets available. This extensive variety allows for the development and testing of robust object detection models like YOLO V8.


### YOLO V8 Model Variants

The following table describes the different variants of the YOLO V8 model, providing a comparison based on size, accuracy, speed, and computational requirements:

| Model  | Size (pixels) | mAP<sub>val</sub> 50-95 | Speed (CPU ONNX, ms) | Speed (A100 TensorRT, ms) | Parameters (M) | FLOPs (B) |
|--------|---------------|------------------------|----------------------|---------------------------|----------------|-----------|
| yolov8n | 640          | 37.3                   | 80.4                 | 0.99                      | 3.2            | 8.7       |
| yolov8s | 640          | 44.9                   | 128.4                | 1.20                      | 11.2           | 28.6      |
| yolov8m | 640          | 50.2                   | 234.7                | 1.83                      | 25.9           | 78.9      |
| yolov8l | 640          | 52.9                   | 375.2                | 2.39                      | 43.7           | 165.2     |
| yolov8x | 640          | 53.9                   | 479.1                | 3.53                      | 68.2           | 257.8     |


### YOLO V9 Model Variants

The following table describes the different variants of the YOLO V9 model, providing a comparison based on size, accuracy, and computational requirements:

| Model   | Size (pixels) | AP<sub>val</sub> 50-95 | AP<sub>val</sub> 50 | AP<sub>val</sub> 75 | Parameters (M) | FLOPs (B) |
|---------|---------------|-----------------------|---------------------|---------------------|----------------|-----------|
| YOLOv9-S | 640          | 46.8                  | 63.4                | 50.7                | 7.2            | 26.7      |
| YOLOv9-M | 640          | 51.4                  | 68.1                | 56.1                | 20.1           | 76.8      |
| YOLOv9-C | 640          | 53.0                  | 70.2                | 57.8                | 25.5           | 102.8     |
| YOLOv9-E | 640          | 55.6                  | 72.8                | 60.6                | 58.1           | 192.5     |







- **Size (pixels)**: The input resolution for the model. All models use the same input resolution but differ in their internal architecture and complexity.
- **mAP<sub>val</sub> 50-95**: The mean Average Precision on the COCO validation set, averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (the YOLOv9 table labels this AP<sub>val</sub> 50-95). Higher values indicate better accuracy.
- **Speed (CPU ONNX, ms/A100 TensorRT, ms)**: Inference speed measured in milliseconds. Lower times indicate faster performance. The speed is provided for both CPU (using ONNX) and NVIDIA A100 GPU (using TensorRT).
- **Parameters (M)**: The number of trainable parameters in millions. More parameters typically mean a more complex model that can capture detailed features but may be slower and more memory-intensive.
- **FLOPs (B)**: The number of floating-point operations, in billions, needed for one forward pass at the given input size (a count, not a per-second rate). This metric gives an idea of the computational demand of the model. Higher values indicate more computational complexity.

When choosing a model, consider the trade-off between speed and accuracy that best fits your application's requirements. Smaller models like YOLOv8n are faster but less accurate, suitable for real-time applications. In contrast, larger models like YOLOv8x provide higher accuracy at the cost of increased inference time, suitable for high-accuracy requirements.
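
As a rough illustration of this trade-off, the sketch below picks the most accurate YOLOv8 variant that fits a latency budget. It uses the CPU ONNX figures from the table above. The helper `pick_model` and the budgets are hypothetical, and latency on your own hardware will differ from these benchmark numbers.

```python
# (name, mAP 50-95, CPU ONNX latency in ms) copied from the YOLOv8 table above.
YOLOV8_VARIANTS = [
    ("yolov8n", 37.3, 80.4),
    ("yolov8s", 44.9, 128.4),
    ("yolov8m", 50.2, 234.7),
    ("yolov8l", 52.9, 375.2),
    ("yolov8x", 53.9, 479.1),
]


def pick_model(max_latency_ms):
    """Return the most accurate variant whose CPU latency fits the budget, or None."""
    candidates = [v for v in YOLOV8_VARIANTS if v[2] <= max_latency_ms]
    if not candidates:
        return None
    # Among the variants that are fast enough, prefer the highest mAP.
    return max(candidates, key=lambda v: v[1])[0]


print(pick_model(250))  # yolov8m
print(pick_model(50))   # None: no variant is this fast on CPU
```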

In [None]:
from ultralytics import YOLO
import numpy as np

model = YOLO('yolov8n.pt')  # load the pretrained YOLOv8n detection model; for YOLOv9 use YOLO('yolov9c.pt')

In [None]:
help(model)

In [None]:
from matplotlib import pyplot as plt
import urllib.request
import numpy as np
import cv2
import time

url = 'https://akm-img-a-in.tosshub.com/indiatoday/images/story/201812/dogs_and_cats.jpeg?TAxD19DTCFE7WiSYLUdTu446cfW4AbuW&size=770:433'
image_path = "dog-cat.jpg"
urllib.request.urlretrieve(url, image_path)

# Read the image in color mode
image = cv2.imread(image_path, cv2.IMREAD_COLOR)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # Transform to RGB


plt.imshow(image_rgb)  # Matplotlib expects RGB, while OpenCV loads images as BGR
plt.axis('off')
plt.show()

In [None]:
results = model(image)
print('Results:\n')
print(results)

In [None]:
results[0].names

In [None]:
print(results[0].boxes)
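
The printed `boxes` object is dense, so a small helper can turn it into plain Python data. The sketch below reads the `xyxy`, `conf`, and `cls` attributes of `boxes` and the `names` dictionary of the result. The function name `summarize_detections` and the rounding are illustrative choices, not part of the Ultralytics API.

```python
def summarize_detections(result):
    """Return one dict per detection with class name, confidence and box corners."""
    boxes = result.boxes
    detections = []
    # xyxy, conf and cls are parallel tensors with one entry per detection.
    for xyxy, conf, cls_id in zip(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()):
        detections.append({
            "label": result.names[int(cls_id)],       # class ids are stored as floats
            "confidence": round(conf, 3),
            "box_xyxy": [round(v, 1) for v in xyxy],  # x1, y1, x2, y2 in pixels
        })
    return detections
```

With the `results` computed earlier, `summarize_detections(results[0])` should return a list such as `[{'label': ..., 'confidence': ..., 'box_xyxy': [...]}, ...]`, one entry per detected object.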

In [None]:
for r in results:
    # Extract the original and annotated images
    original_img = r.orig_img[..., ::-1]
    annotated_image_bgr = r.plot()  # BGR numpy array of predictions
    annotated_image_rgb = annotated_image_bgr[..., ::-1]  # Convert BGR to RGB
    
    # Plot the annotated image
    plt.figure(figsize=(9, 5))
    plt.imshow(annotated_image_rgb)  # RGB NumPy array
    plt.axis('off')
    plt.show()

In [None]:
import urllib.request
import cv2
from matplotlib import pyplot as plt

def download_images(image_urls, plot_images=False):
    """
    Downloads images from the given URLs, converts them to RGB format, and optionally plots them.
    
    Args:
    image_urls (list of tuples): A list where each tuple contains the image URL and the desired local file path.
    plot_images (bool): If True, the images will be plotted. Defaults to False.
    
    Returns:
    list: A list of paths where the images have been saved.
    """
    image_paths = []  # Store the local file paths of the images
    
    for image_url, image_path in image_urls:
        # Download the image from the URL and save it to the local file path
        urllib.request.urlretrieve(image_url, image_path)
        image_paths.append(image_path)  # Add the local file path to the list
        
        if plot_images:
            # Read the image in color mode
            image = cv2.imread(image_path, cv2.IMREAD_COLOR)
            # Convert the image from BGR to RGB format
            image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            
            # Plot the image
            plt.figure(figsize=(8, 4))
            plt.imshow(image_rgb)  # Display the image in RGB format
            plt.axis('off')  # Turn off axis labels and ticks
            plt.title(image_path)  # Set the title of the plot as the image path
            plt.show()  # Display the plot
    
    return image_paths


In [None]:
import ssl

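# Workaround for SSL certificate errors when downloading images: this disables
# HTTPS certificate verification for the whole session, so use it only with trusted URLs.
ssl._create_default_https_context = ssl._create_unverified_context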


In [None]:
image_urls = [
    ('https://i.ibb.co/R7pRTLy/beach-no-axis.png', 'beach.jpg'),
    ('https://i.ibb.co/jL1kZRF/phones.png', 'phones.jpg'),
    ('https://i.ytimg.com/vi/1ZupwFOhjl4/maxresdefault.jpg', 'traffic.jpg')
]

image_paths = download_images(image_urls, plot_images=True)

In [None]:
# Run the model
results = model(image_paths)

# Show results
for i, r in enumerate(results):
    # Extract the original and annotated images
    original_img = r.orig_img[..., ::-1]
    annotated_image_bgr = r.plot()  # BGR numpy array of predictions
    annotated_image_rgb = annotated_image_bgr[..., ::-1]  # Convert BGR to RGB
    
    # Plot the annotated image
    plt.figure(figsize=(10, 6))
    plt.imshow(annotated_image_rgb)  # RGB NumPy array
    plt.axis('off')
    plt.title(image_paths[i])
    plt.show()


### Compare different models

In [None]:
from typing import List
def compare_models(image_path: str, model_names: List[str], conf_threshold: float=0.25):
    """
    Simple function to compare different YOLO models on the same image.
    """
    
    # Read the image (OpenCV loads it in BGR order, which Ultralytics expects for NumPy input)
    img = cv2.imread(image_path)
    
    # Set up plot
    fig, axs = plt.subplots(len(model_names), 1, figsize=(10, 5 * len(model_names)))
    if len(model_names) == 1:
        axs = [axs]  # Make it iterable if only one model
    
    # Print summary header
    print("Model Comparison Summary:")
    print("-" * 60)
    print(f"{'Model':<12} | {'Detections':<10} | {'Inference Time (ms)':<20}")
    print("-" * 60)
    
    # Process each model
    for i, model_name in enumerate(model_names):
        # Load model
        model = YOLO(model_name)
        
        # Measure inference time (a single run, so it includes one-off warm-up cost)
        start_time = time.perf_counter()
        results = model(img, conf=conf_threshold)
        inference_time = (time.perf_counter() - start_time) * 1000  # ms
        
        # Get detection count
        num_detections = len(results[0].boxes)
        
        # Plot results (plot() returns BGR, so reverse the channels for Matplotlib)
        axs[i].imshow(results[0].plot()[..., ::-1])
        axs[i].set_title(f"{model_name}: {num_detections} objects, {inference_time:.1f}ms", fontsize=12)
        axs[i].axis('off')
        
        # Print summary line
        print(f"{model_name:<12} | {num_detections:<10} | {inference_time:.1f}")
    
    plt.tight_layout()
    plt.show()

compare_models('traffic.jpg', ["yolov8n.pt", "yolov10n.pt", "yolo11n.pt", "yolo12l.pt"], conf_threshold=0.3)

### Non-Max Suppression (NMS) and Intersection Over Union (IoU) 

Object detection models often predict several bounding boxes around the same object, which produces redundant detections. Two key concepts are used to resolve this: Intersection Over Union (IoU) and Non-Max Suppression (NMS). Together they prune the boxes surrounding detected objects so that each object is reported once, with its best box.

#### Intersection Over Union (IoU)

IoU is a metric used to quantify the percent overlap between two bounding boxes. It is calculated by dividing the area of overlap between the two boxes by the area of their union:

\begin{equation*}
\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}
\end{equation*}

For object detection, IoU is utilized to determine how close a predicted bounding box is to the ground truth bounding box. During evaluation, a higher IoU represents a better prediction by the model.

At inference time there is no ground truth to compare against. There, IoU is computed between pairs of predicted boxes, which is what Non-Max Suppression relies on.
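
The formula above can be sketched in a few lines of plain Python. The sketch assumes boxes in `(x1, y1, x2, y2)` corner format. The function name `iou` and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at 0 so that disjoint boxes give zero overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # overlap 50, union 150 -> about 0.333
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes -> 0.0
```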

#### Non-Max Suppression (NMS)

Non-Max Suppression is a technique to ensure that only the most probable bounding box for an object is preserved while all other redundant boxes are removed. Here’s how it generally works:

1. Select the box with the highest probability of object detection (confidence score).
2. Compute the IoU of this box with all other boxes. If any box has an IoU greater than a set threshold (typically between 0.5 and 0.7), it is suppressed (i.e., removed).
3. Repeat steps 1 and 2 with the highest-scoring box among those that remain, until every box has been either kept or suppressed.

NMS ensures that when multiple boxes predict the same object, only the most confident one is kept.
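
The steps above can be sketched as a greedy loop in plain Python. This is a minimal, class-agnostic illustration, not the implementation Ultralytics uses. The names `nms` and `iou` and the example boxes are assumptions, and boxes are taken to be in `(x1, y1, x2, y2)` format.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. Returns the indices of the boxes that are kept."""
    # Visit boxes from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))                     # [0, 2]: box 1 overlaps box 0 (IoU about 0.68)
print(nms(boxes, scores, iou_threshold=0.7))  # [0, 1, 2]: 0.68 is below the threshold
```

Raising the threshold keeps more overlapping boxes, which is why a higher `iou` value tends to produce more detections in crowded scenes.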

#### Usage in Ultralytics YOLO Inference

When you run inference using the Ultralytics YOLO model, you can control the behavior of NMS and IoU through the inference arguments:

- `conf`: This is the confidence threshold. Detections with a confidence score below this threshold are disregarded before NMS. By adjusting this value, you can filter out weak detections early. **(default 0.25)**
- `iou`: This is the IoU threshold for NMS. In areas where multiple bounding boxes overlap, if the overlap (IoU) is greater than this threshold, only the box with the highest confidence is kept. **(default 0.7)**

Here is how you can use these parameters in practice:

```python
from ultralytics import YOLO

# Load a pretrained YOLO model
model = YOLO('yolov8n.pt')

# Run inference with custom confidence and IoU thresholds
results = model.predict(image, conf=0.25, iou=0.7)
```

#### Question 1: Change confidence and IoU thresholds for detecting more objects

In [None]:
# Run the model
results = model(image_paths, conf=..., iou=...)

# Show results
for r in results:
    # Extract the original and annotated images
    original_img = r.orig_img[..., ::-1]
    annotated_image_bgr = r.plot()  # BGR numpy array of predictions
    annotated_image_rgb = annotated_image_bgr[..., ::-1]  # Convert BGR to RGB
    
    # Plot the annotated image
    plt.figure(figsize=(10, 6))
    plt.imshow(annotated_image_rgb)  # RGB NumPy array
    plt.axis('off')
    plt.show()

## Web cam Local

### Detection loop

The detection loop consists of three phases:

- Loading the webcam frame

- Running the image through the model

- Updating the output with the resulting predictions

In [None]:
from PIL import Image
from IPython.display import Image as IPyImage

cap = cv2.VideoCapture(0)
time.sleep(1)  ### letting the camera autofocus

axes = None
NUM_FRAMES = 50  # you can change this
processed_imgs = []
for i in range(NUM_FRAMES):
    # Load frame from the camera
    ret, frame = cap.read()
    if not ret:  # stop if the camera did not return a frame
        break
    
    # Run the model
    result = model(frame, verbose=False)
    annotated_image_bgr = result[0].plot()
    annotated_image_rgb = annotated_image_bgr[:,:, ::-1]  # Convert BGR to RGB
    
    img = Image.fromarray(np.uint8(annotated_image_rgb))
    processed_imgs.append(img)
    cv2.imshow("test", annotated_image_bgr)
    cv2.waitKey(1)

cap.release()
cv2.destroyAllWindows()

In [None]:
## create gif
processed_imgs[0].save('web_cam.gif',
                       format='GIF',
                       append_images=processed_imgs[1:],
                       save_all=True,
                       duration=100,
                       loop=0)

In [None]:
# IPyImage('web_cam.gif', format='png', width=15 * 40, height=3 * 40) 

## Question 2: Traffic Scene  Object Detection

In [None]:
## load the scene gif
url_1 = 'https://i.ibb.co/wpHvb58/scene1.gif'
scene_1_path = 'scene_1.gif'
urllib.request.urlretrieve(url_1, scene_1_path)

In [None]:
# IPyImage(scene_1_path, format='png', width=15 * 40, height=3 * 40)
IPyImage(url=url_1)

In [None]:
model = YOLO('yolov8x.pt')  # load the pretrained YOLOv8x detection model; for YOLOv9 use YOLO('yolov9c.pt')

In [None]:
from tqdm import tqdm

gif_object = Image.open(scene_1_path)

# Display individual frames from the loaded animated GIF file
processed_imgs = []
for ind in tqdm(range(gif_object.n_frames)):
    gif_object.seek(ind)
    ## frame as a BGR NumPy array, e.g. shape (512, 512, 3)
    frame = np.array(gif_object.convert('RGB'))[:,:,::-1]
    
    ## Object Detection code
    ...
    
    img = Image.fromarray(np.uint8(annotated_image_rgb))
    processed_imgs.append(img)

In [None]:
## save the processed scene gif
processed_imgs[0].save('scene1_boxes.gif',
                       format='GIF',
                       append_images=processed_imgs[1:],
                       save_all=True,
                       duration=200,
                       loop=0)

In [None]:
IPyImage('scene1_boxes.gif', format='png', width=15 * 40, height=3 * 40) 

## YOLO for Different Vision Tasks

###  Segmentation

Segmentation models, indicated by the `-seg` suffix (e.g., `yolov8n-seg.pt`), are designed for more detailed analysis. These models not only detect objects but also delineate their exact shapes, segmenting each object from the background and other objects.


In [None]:
model = YOLO('yolo11n-seg.pt')

In [None]:
help(model)

In [None]:
results = model('traffic.jpg')
results[0].show()  # Display the segmentation results

In [None]:
res = results[0]
res

In [None]:
res.masks

In [None]:
res.names

In [None]:
from ultralytics.engine.results import Results

def extract_segmentation_mask(
    detection_result: Results,
    object_index: int = 0
) -> np.ndarray:
    """
    Extracts a boolean segmentation mask for a specific detected object.
    
    This function rasterizes the YOLO segmentation contour of one object into a
    mask with the same height and width as the original image, where True marks
    object pixels and False marks the background.
    
    Parameters:
        detection_result: The YOLO Results object containing segmentation data
        object_index: The index of the object to extract (0 for first detection)
    
    Returns:
        A mask as a NumPy array of type bool with shape (height, width)
    """
    # Verify the inputs
    if detection_result.masks is None:
        raise ValueError("No segmentation masks found in detection results")
    
    if object_index >= len(detection_result.masks.xy):
        raise IndexError(f"Object index {object_index} is out of range. "
                         f"Only {len(detection_result.masks.xy)} objects detected.")
    
    # Get original image dimensions
    height, width = detection_result.orig_img.shape[:2]
    
    # Create an empty mask with the original image dimensions
    binary_mask = np.zeros((height, width), dtype=np.uint8)
    
    # Get the contour points from the detection result
    contour_points = detection_result.masks.xy[object_index]
    
    # Format the contour for OpenCV's drawContours function
    formatted_contour = contour_points.astype(np.int32).reshape(-1, 1, 2)
    
    # Draw the filled contour onto the empty mask
    cv2.drawContours(binary_mask, [formatted_contour], -1, 255, cv2.FILLED)

    # Convert the 0/255 mask to a boolean array so it can be used for indexing
    binary_mask = binary_mask.astype(bool)

    return binary_mask

In [None]:
img = np.copy(res.orig_img)

for object_index in range(min(2, len(res.boxes))):  # show at most the first two objects
    # Get class label
    label = res.names[int(res.boxes.cls[object_index])]
    print('_'*50)
    print(label)

    # Create the binary mask for the object
    b_mask = extract_segmentation_mask(res, object_index)

    # Create the isolated image from the binary b_mask
    masked_img = img.copy()
    # Set background to black
    masked_img[~b_mask] = 0

    plt.figure(figsize=(9, 5))
    plt.imshow(masked_img[..., ::-1])  # orig_img is BGR; reverse channels to RGB for Matplotlib
    plt.axis('off')
    plt.show()

In [None]:
annotated_image_bgr = res.plot()  # BGR numpy array of predictions
annotated_image_rgb = annotated_image_bgr[..., ::-1]  # Convert BGR to RGB

# Plot the annotated image
plt.figure(figsize=(9, 5))
plt.imshow(annotated_image_rgb)  # RGB NumPy array
plt.axis('off')
plt.show()

#### Working with Segmentation Masks: Visualization and Manipulation

In [None]:
url = 'https://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg'
image_path = 'einstein.jpg'
urllib.request.urlretrieve(url, image_path)

# Load and display image
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(4, 4))
plt.imshow(image_rgb)
plt.axis('off')
plt.title('Original Image')
plt.show()

In [None]:
# Run segmentation
results = model(image, conf=0.3)  # Confidence threshold slightly above the 0.25 default
result = results[0]

# Display the default visualization
plt.figure(figsize=(10, 10))
plt.imshow(result.plot()[..., ::-1])  # plot() returns BGR; reverse channels to RGB
plt.axis('off')
plt.title('Default YOLO Segmentation Result')
plt.show()

##### Explore the Segmentation Data Structure

In [None]:
print("Available attributes in result:")
print(dir(result))

# Check mask information
if result.masks is not None:
    print("\nMask information:")
    print(f"Number of masks: {len(result.masks)}")
    print(f"Shape of first mask: {result.masks.data[0].shape}")
    print(f"Data type: {result.masks.data[0].dtype}")
    
    # Get class information
    if len(result.boxes) > 0:
        print("\nDetected classes:")
        for i, box in enumerate(result.boxes):
            class_id = int(box.cls)
            class_name = result.names[class_id]
            confidence = float(box.conf)
            print(f"Object {i+1}: {class_name} (Confidence: {confidence:.2f})")

In [None]:
def pixelate(image, blocks=16):
    """Heavily pixelate an image region by downsampling and upsampling"""
    # Get dimensions
    h, w = image.shape[:2]
    
    # Downsample
    temp = cv2.resize(image, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
    
    # Upsample back to original size
    return cv2.resize(temp, (w, h), interpolation=cv2.INTER_NEAREST)

# Step 1: Make sure we have a mask for the detected object
if result.masks is not None and len(result.masks) > 0:
    # Get the mask for the first detected object
    object_mask = extract_segmentation_mask(result, 0)
    
    # Step 2: Create a pixelated version of the entire image
    pixelated_image = pixelate(image_rgb, blocks=16)  # use the RGB copy so colors match image_rgb below
    
    # Step 3: Create two different pixelation effects
    
    # Example 1: Pixelate everything EXCEPT the object
    inverse_pixelation = pixelated_image.copy()
    inverse_pixelation[object_mask] = image_rgb[object_mask]
    
    # Example 2: Pixelate ONLY the object
    object_pixelation = image_rgb.copy()
    object_pixelation[object_mask] = pixelated_image[object_mask]
    
    # Display the results
    plt.figure(figsize=(15, 10))
    
    plt.subplot(2, 2, 1)
    plt.imshow(image_rgb)
    plt.title('Original Image')
    plt.axis('off')
    
    plt.subplot(2, 2, 2)
    plt.imshow(pixelated_image)
    plt.title('Fully Pixelated Image')
    plt.axis('off')
    
    plt.subplot(2, 2, 3)
    plt.imshow(inverse_pixelation)
    plt.title('Pixelated Background, Clear Object')
    plt.axis('off')
    
    plt.subplot(2, 2, 4)
    plt.imshow(object_pixelation)
    plt.title('Clear Background, Pixelated Object')
    plt.axis('off')
    
    plt.tight_layout()
    plt.show()

### Question 3: Real-time Person Pixelation with Webcam

For this final exercise, create a webcam application that automatically detects and pixelates all people in the frame while keeping the rest of the image clear.

In [None]:
from PIL import Image
from IPython.display import Image as IPyImage
import cv2
import numpy as np
import time
from ultralytics import YOLO
from tqdm import tqdm


# Load the YOLO segmentation model
model = YOLO(...)

# Initialize webcam
cap = cv2.VideoCapture(0)
time.sleep(1)  # Letting the camera autofocus

# Parameters
NUM_FRAMES = 200  # Number of frames to capture
processed_imgs = []


# Process webcam feed
for i in tqdm(range(NUM_FRAMES)):
    # Load frame from the camera
    ret, frame = cap.read()
    if not ret:
        break
    
    # Make a copy for visualization
    display_frame = frame.copy()
    
    # Run YOLO segmentation
    results = model(frame, verbose=False)
    result = results[0]
    
    # Find the person class ID
    # Hint: Look through result.names dictionary to find the 'person' class
    person_class_id = ...
    
    # Create pixelated version of the entire frame
    pixelated_frame = pixelate(frame, blocks=10)
    
    # Check if any people were detected
    if result.masks is not None and len(result.boxes) > 0:
        # Create a combined mask for all people
        h, w = frame.shape[:2]
        people_mask = np.zeros((h, w), dtype=bool)
        
        # Process each detected person
        for obj_index, box in enumerate(result.boxes):  # obj_index avoids shadowing the frame counter i
            # Use the correct class ID to identify people
            if ...:
                # Use the extract_segmentation_mask function
                # to get the mask for this person
                current_mask = ...
                
                # Combine with the overall people mask
                people_mask = ...
        
        # Apply pixelation only to people areas
        # Use boolean indexing to replace only the masked areas
        ...
    
    # Display the processed frame
    cv2.putText(display_frame, "People Pixelation", (10, 30), 
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Privacy Filter", display_frame)
    
    # Convert to RGB for PIL and save to list
    display_rgb = cv2.cvtColor(display_frame, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(np.uint8(display_rgb))
    processed_imgs.append(img)
    
    # Exit if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

# Save processed frames as a GIF
# Complete the code to save the frames as a GIF
if processed_imgs:
    ...



In [None]:
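# Display the GIF
from IPython.display import display

try:
    # An expression inside try/except is not auto-displayed, so call display() explicitly
    display(IPyImage('webcam_pixelated_people.gif', format='png', width=15 * 40, height=10 * 40))
except Exception:
    print("GIF display failed. The file should still be saved.")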

### Pose

Pose estimation models, identified by the `-pose` suffix (e.g., `yolov8n-pose.pt`), are specialized in detecting human figures and estimating their postures by identifying key body points.
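
The pretrained pose models predict the 17 COCO keypoints for each person, in a fixed order. As a small sketch, the helper below maps one person's keypoint confidences to the names of the keypoints that pass a threshold. The function name `confident_keypoints`, the 0.5 threshold, and the example values are illustrative assumptions.

```python
# COCO keypoint order used by the pretrained pose models.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def confident_keypoints(conf_row, threshold=0.5):
    """Return the names of the keypoints whose confidence is at least `threshold`."""
    if len(conf_row) != len(COCO_KEYPOINT_NAMES):
        raise ValueError("expected one confidence value per COCO keypoint")
    return [name for name, c in zip(COCO_KEYPOINT_NAMES, conf_row) if c >= threshold]


# Illustrative confidences: face and shoulders visible, everything else occluded.
example = [0.9, 0.8, 0.8, 0.6, 0.6, 0.7, 0.7] + [0.1] * 10
print(confident_keypoints(example))
```

For a pose result `r` with at least one detected person, `r.keypoints.conf[0].tolist()` should give a list in this format for the first person.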

In [None]:
model = YOLO('yolov8l-pose.pt')

In [None]:
image_urls = [
    (
        'https://images.unsplash.com/photo-1561049501-e1f96bdd98fd?q=80&w=2778&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D',
        'yoga_1.jpg'
    ),
    (
        'https://images.unsplash.com/photo-1545205597-3d9d02c29597?q=80&w=2940&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D',
        'yoga_2.jpg'
    ),
]
image_paths = download_images(image_urls, plot_images=True)

In [None]:
# Run the model
results = model(image_paths)

# Show results
for i, r in enumerate(results):
    # Extract the original and annotated images
    original_img = r.orig_img[..., ::-1]
    annotated_image_bgr = r.plot()  # BGR numpy array of predictions
    annotated_image_rgb = annotated_image_bgr[..., ::-1]  # Convert BGR to RGB
    
    # Plot the annotated image
    plt.figure(figsize=(10, 6))
    plt.imshow(annotated_image_rgb)  # RGB NumPy array
    plt.axis('off')
    plt.title(image_paths[i])
    plt.show()

In [None]:
r.keypoints.conf.shape

In [None]:
r.keypoints.conf

In [None]:
help(r)