Raissa Anggia Maharani

2206048581

For this Hands-On Prompt, I chose YOLO.

1. In Google Colab, install the libraries necessary for real-time object detection using YOLO and video processing using OpenCV. Explain the roles of each library
Install the libraries required for YOLO object detection and OpenCV for handling video input and output. Include PyTorch for running the YOLO model, and ensure that GPU acceleration is enabled in Google Colab to enhance performance.

In [None]:
# Install PyTorch with GPU support
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install OpenCV for video processing
!pip install opencv-python-headless

# Install YOLOv5 from the official repository (includes dependencies like PyTorch, PyYAML, etc.)
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt


In [None]:
import torch
print(torch.cuda.is_available())  # Should return True if GPU is enabled

2. Since Google Colab doesn't support direct webcam input, upload a video file and load the pre-trained YOLOv5 model for object detection
Explain how to upload a video file and mount Google Drive in Colab. Load the YOLOv5 model for detecting objects in the video frames. Discuss the architecture of YOLO and how it processes each frame, detecting multiple objects and outputting bounding boxes and class labels.
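
To make the per-frame processing more concrete: before the network sees a frame, the frame is resized to fit a fixed input size (640 pixels by default for the YOLOv5 hub model) while keeping its aspect ratio, and the remainder is padded. The predicted boxes are then mapped back to the original pixel coordinates. The sketch below only illustrates that arithmetic in plain Python. The helper names `letterbox_params` and `to_original` are hypothetical, and this is a simplified version of the idea, not the library's own code.

```python
def letterbox_params(width, height, target=640):
    """Return (scale, new_w, new_h, pad_x, pad_y) for fitting a frame into a target square."""
    scale = min(target / width, target / height)  # keep the aspect ratio
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) / 2  # padding added on the left and on the right
    pad_y = (target - new_h) / 2  # padding added on the top and on the bottom
    return scale, new_w, new_h, pad_x, pad_y


def to_original(x, y, scale, pad_x, pad_y):
    """Map a point from the padded network input back to original frame pixels."""
    return (x - pad_x) / scale, (y - pad_y) / scale


# Example: a 1920x1080 video frame
scale, new_w, new_h, pad_x, pad_y = letterbox_params(1920, 1080)
print(new_w, new_h, pad_x, pad_y)
print(to_original(320, 320, scale, pad_x, pad_y))  # centre of the network input
```

With the hub model used in this notebook, these steps happen inside `model(frame)`, so the coordinates in `results.xyxy` are already in original frame pixels.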

Upload Video

In [None]:
from google.colab import files
uploaded = files.upload()  # Upload the video file (e.g., .mp4)


Load YOLOv5 for Object Detection

YOLOv5 comes with pre-trained models that you can load easily. You can either use a small, fast model like yolov5s or a larger, more accurate model like yolov5x. Here’s how to load the pre-trained model and perform object detection on the video frames:

In [None]:
import torch
import cv2
from google.colab.patches import cv2_imshow  # Import cv2_imshow

# Load the YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Load video
cap = cv2.VideoCapture('/content/race_car.mp4')

# Loop through video frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection on the frame
    # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
    results = model(frame[..., ::-1])

    # Draw bounding boxes and labels; render() returns the annotated RGB images
    annotated = results.render()[0]

    # Display the frame with cv2_imshow
    # (cv2.imshow and cv2.waitKey need a GUI window, which Colab does not provide)
    cv2_imshow(cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR))

cap.release()


3. Detect objects frame by frame using YOLOv5 and extract relevant information (bounding boxes, class labels)
Process each frame using YOLOv5, extracting bounding boxes and object classifications. Discuss how YOLO handles multiple detections in real time, and how to structure the extracted data (bounding boxes, labels) for later use.
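
YOLO predicts many overlapping candidate boxes for the same object in a single pass, and the duplicates are removed with non-maximum suppression (NMS) before the results are returned. The following is a minimal plain-Python sketch of class-aware NMS, written only to illustrate the idea. The function names `iou` and `nms` and the example boxes are hypothetical, and the 0.45 threshold is assumed here as a typical default. In this notebook the hub model already applies NMS internally, so this code is not needed in the pipeline.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Keep the highest-confidence box among overlapping boxes of the same class.

    detections: list of (box, confidence, class_id) tuples.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, cls = det
        # Keep the box unless it overlaps an already kept box of the same class
        if all(k[2] != cls or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


dets = [
    ((100, 100, 200, 200), 0.90, 2),
    ((105, 105, 205, 205), 0.75, 2),  # near-duplicate of the first box
    ((300, 300, 400, 400), 0.60, 2),  # a second object of the same class
    ((100, 100, 200, 200), 0.50, 0),  # same location, different class
]
kept = nms(dets)
print(len(kept), [d[1] for d in kept])
```

The near-duplicate box is dropped because it overlaps a higher-confidence box of the same class, while the box of a different class at the same location is kept.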

In [None]:
import torch
import cv2
import pandas as pd
from google.colab.patches import cv2_imshow  # Colab replacement for cv2.imshow

# Load the YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Load video
video_path = '/content/race_car.mp4'
cap = cv2.VideoCapture(video_path)

# Prepare a list to store the extracted information
extracted_data = []

# Loop through video frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection on the frame
    # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
    results = model(frame[..., ::-1])

    # Number of the frame just read (1-based), cast from float to int
    frame_id = int(cap.get(cv2.CAP_PROP_POS_FRAMES))

    # Extract bounding boxes, class labels, and confidences
    for detection in results.xyxy[0]:  # [x1, y1, x2, y2, confidence, class]
        x1, y1, x2, y2, conf, cls = detection.tolist()
        label = model.names[int(cls)]

        # Store the relevant data for this detection
        extracted_data.append({
            'frame_id': frame_id,
            'x1': x1, 'y1': y1, 'x2': x2, 'y2': y2,
            'confidence': conf,
            'class_id': int(cls),
            'class_label': label
        })

    # (Optional) Display the frame with bounding boxes.
    # render() returns annotated copies of the RGB input, so convert back to BGR.
    annotated = results.render()[0]
    cv2_imshow(cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR))

# Release video capture
cap.release()

# Convert extracted data into a DataFrame for analysis
df = pd.DataFrame(extracted_data)


Saving Extracted Data to CSV

In [None]:
df.to_csv('yolo_detections.csv', index=False)

Normalize the Bounding Box Coordinates

In [None]:
import torch
import cv2

# Load the YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Load the video
video_path = '/content/race_car.mp4'  # Replace with your video path
cap = cv2.VideoCapture(video_path)

# Get frame width and height for normalization
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Prepare a list to store the extracted information
extracted_data = []

# Loop through video frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection on the current frame
    # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
    results = model(frame[..., ::-1])

    # Extract bounding boxes, class labels, and confidence scores from results
    for detection in results.xyxy[0]:  # [x1, y1, x2, y2, confidence, class]
        x1, y1, x2, y2, conf, cls = detection.tolist()

        # Normalize the bounding box coordinates
        x1_norm = x1 / frame_width
        y1_norm = y1 / frame_height
        x2_norm = x2 / frame_width
        y2_norm = y2 / frame_height

        # Store the relevant data for this frame
        extracted_data.append({
            'frame_id': int(cap.get(cv2.CAP_PROP_POS_FRAMES)),  # Current frame number
            'x1_norm': x1_norm, 'y1_norm': y1_norm, 'x2_norm': x2_norm, 'y2_norm': y2_norm,
            'confidence': conf,
            'class_id': int(cls),
            'class_label': model.names[int(cls)]  # Convert class index to label
        })

# Release video capture after processing
cap.release()

# Now, the 'extracted_data' list contains all the detection data for further use.


4. Preprocess the object detection results and store them for further analysis
Explain how to preprocess the YOLO output by normalizing coordinates and converting the data into a structured format suitable for tracking and further analysis. Highlight the importance of maintaining consistent object identifiers across frames to facilitate effective tracking.
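
As a small illustration of the preprocessing described above, the sketch below normalizes one pixel-space box and stores it as a flat record with a reserved `track_id` field, so that an identifier can be attached once tracking is added. This is a minimal sketch in plain Python. The helper names `normalize_box` and `to_record`, the centre/width/height layout, and the `track_id` placeholder are illustrative assumptions, not part of YOLOv5.

```python
def normalize_box(x1, y1, x2, y2, frame_width, frame_height):
    """Scale pixel corner coordinates to the range [0, 1] and clamp them."""
    def clamp(value):
        return min(1.0, max(0.0, value))

    return (clamp(x1 / frame_width), clamp(y1 / frame_height),
            clamp(x2 / frame_width), clamp(y2 / frame_height))


def to_record(frame_id, box, confidence, class_id, class_label,
              frame_width, frame_height):
    """Build one flat, resolution-independent record for a single detection."""
    x1, y1, x2, y2 = normalize_box(*box, frame_width, frame_height)
    return {
        'frame_id': frame_id,
        'x_center': (x1 + x2) / 2,
        'y_center': (y1 + y2) / 2,
        'width': x2 - x1,
        'height': y2 - y1,
        'confidence': confidence,
        'class_id': class_id,
        'class_label': class_label,
        'track_id': None,  # placeholder; filled in later by a tracker
    }


record = to_record(1, (480, 270, 1440, 810), 0.9, 2, 'car', 1920, 1080)
print(record)
```

A list of such records can be passed directly to `pd.DataFrame`, in the same way as `extracted_data` in the cell below.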


In [None]:
import torch
import cv2
import pandas as pd

# Load the YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Load video
video_path = '/content/race_car.mp4'
cap = cv2.VideoCapture(video_path)

# Get frame width and height for normalization
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Prepare a list to store the extracted information
extracted_data = []

# Loop through video frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection on the frame
    # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
    results = model(frame[..., ::-1])

    # Normalize and extract bounding boxes, class labels, and confidences
    for detection in results.xyxy[0]:  # [x1, y1, x2, y2, confidence, class]
        x1, y1, x2, y2, conf, cls = detection.tolist()

        # Normalize the bounding box coordinates
        x1_norm = x1 / frame_width
        y1_norm = y1 / frame_height
        x2_norm = x2 / frame_width
        y2_norm = y2 / frame_height

        # Store the relevant data for this frame
        extracted_data.append({
            'frame_id': int(cap.get(cv2.CAP_PROP_POS_FRAMES)),  # Current frame number
            'x1_norm': x1_norm, 'y1_norm': y1_norm, 'x2_norm': x2_norm, 'y2_norm': y2_norm,
            'confidence': conf,
            'class_id': int(cls),
            'class_label': model.names[int(cls)]
        })

# Release video capture
cap.release()

# Convert the extracted data into a DataFrame for analysis
df = pd.DataFrame(extracted_data)


Saving Preprocessed Data for Tracking

In [None]:
# Save preprocessed data to a CSV file for future use
df.to_csv('preprocessed_yolo_detections.csv', index=False)


5. Implement object tracking using YOLOv5 output and explain the significance of tracking objects over multiple frames
Track objects across frames, focusing on the need to maintain identity consistency. This ensures that objects can be followed and tracked accurately as they move across multiple frames. Discuss various object tracking methods and their importance in real-time video applications.
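
To show what identity consistency means in practice, here is a minimal sketch of the simplest kind of tracker: each new box is matched to the existing track it overlaps most (by IoU), and unmatched boxes start new tracks. The class name `SimpleIouTracker` and its parameters are hypothetical and written only for illustration. DeepSORT, used in the cells below, is considerably more robust because it also uses a motion model and appearance features.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class SimpleIouTracker:
    """Greedy IoU matching: a toy tracker for illustration only."""

    def __init__(self, iou_threshold=0.3, max_age=5):
        self.iou_threshold = iou_threshold
        self.max_age = max_age  # frames a track may stay unmatched
        self.tracks = {}        # track_id -> {'box': box, 'missed': int}
        self.next_id = 1

    def update(self, boxes):
        """Assign a track ID to every box of the current frame."""
        assigned = []
        unmatched = set(self.tracks)
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for tid in unmatched:
                score = iou(self.tracks[tid]['box'], box)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                # No existing track overlaps enough: start a new one
                best_id = self.next_id
                self.next_id += 1
            else:
                unmatched.discard(best_id)
            self.tracks[best_id] = {'box': box, 'missed': 0}
            assigned.append((best_id, box))
        # Age the tracks that were not seen in this frame and drop stale ones
        for tid in unmatched:
            self.tracks[tid]['missed'] += 1
            if self.tracks[tid]['missed'] > self.max_age:
                del self.tracks[tid]
        return assigned


tracker = SimpleIouTracker()
frame1 = tracker.update([(100, 100, 200, 200), (400, 100, 500, 200)])
# Both objects move 10 pixels to the right and are detected in a different order
frame2 = tracker.update([(410, 100, 510, 200), (110, 100, 210, 200)])
print([tid for tid, _ in frame1], [tid for tid, _ in frame2])
```

Even though the detections arrive in a different order in the second frame, each object keeps the ID it was given in the first frame.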

Install DeepSORT and YOLOv5 Integration

In [None]:
!pip install deep-sort-realtime

In [None]:
import torch
import cv2
from google.colab.patches import cv2_imshow  # Colab replacement for cv2.imshow
from deep_sort_realtime.deepsort_tracker import DeepSort

# Load YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Initialize DeepSORT tracker
tracker = DeepSort(max_age=30, nn_budget=70, nms_max_overlap=1.0)

# Load video
video_path = '/content/race_car.mp4'
cap = cv2.VideoCapture(video_path)

# Loop through video frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection with YOLOv5
    # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
    results = model(frame[..., ::-1])

    # Extract detection results (xyxy format)
    dets = []
    for detection in results.xyxy[0]:  # [x1, y1, x2, y2, confidence, class]
        x1, y1, x2, y2, conf, cls = detection.tolist()
        # Convert to format [left, top, width, height] for DeepSORT
        bbox = [x1, y1, x2 - x1, y2 - y1]
        dets.append((bbox, conf, int(cls)))

    # Update tracker with the current frame's detections
    tracks = tracker.update_tracks(dets, frame=frame)

    # Draw bounding boxes and track IDs on frame
    for track in tracks:
        if not track.is_confirmed():
            continue

        track_id = track.track_id
        ltrb = track.to_ltrb()  # Get left, top, right, bottom coordinates
        class_id = track.get_det_class()  # Class index passed in with the detection
        label = model.names[class_id] if class_id is not None else 'unknown'

        # Draw bounding box, track ID, and class label
        cv2.rectangle(frame, (int(ltrb[0]), int(ltrb[1])), (int(ltrb[2]), int(ltrb[3])), (255, 0, 0), 2)
        cv2.putText(frame, f'ID: {track_id} Class: {label}', (int(ltrb[0]), int(ltrb[1]) - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Display the frame (optional); cv2.imshow is not available in Colab
    cv2_imshow(frame)

cap.release()


In [None]:
import cv2
import torch

# Load the YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Open the video file
video_path = '/content/race_car.mp4'  # Replace with the path to your video
cap = cv2.VideoCapture(video_path)

# Get video properties
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Define the codec and create VideoWriter object
# (the output path must differ from the input path, otherwise the input file is overwritten)
output_path = '/content/race_car_detected.mp4'  # Replace with the path for the output video
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for mp4
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform inference on the frame
    # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
    results = model(frame[..., ::-1])

    # Process results
    # results.xyxy[0] contains bounding box coordinates and confidence scores
    for *xyxy, conf, cls in results.xyxy[0]:
        label = f'{model.names[int(cls)]} {conf:.2f}'  # Get label and confidence
        # Draw bounding box on the frame
        cv2.rectangle(frame, (int(xyxy[0]), int(xyxy[1])), (int(xyxy[2]), int(xyxy[3])), (255, 0, 0), 2)
        cv2.putText(frame, label, (int(xyxy[0]), int(xyxy[1] - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # Write the frame with detections to the output video
    out.write(frame)

# Release resources
# (cv2.imshow and cv2.destroyAllWindows are not used because Colab has no GUI window support)
cap.release()
out.release()



In [None]:
import cv2
import torch

# Load the YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Open the video file
video_path = '/content/race_car.mp4'
cap = cv2.VideoCapture(video_path)

# Get video properties
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Define the codec and create VideoWriter object
output_path = '/content/output_video.mp4'
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("No more frames to read.")
        break

    # Perform inference on the frame
    # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
    results = model(frame[..., ::-1])

    # Process results
    for *xyxy, conf, cls in results.xyxy[0]:
        label = f'{model.names[int(cls)]} {conf:.2f}'
        cv2.rectangle(frame, (int(xyxy[0]), int(xyxy[1])), (int(xyxy[2]), int(xyxy[3])), (255, 0, 0), 2)
        cv2.putText(frame, label, (int(xyxy[0]), int(xyxy[1] - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # Write the frame with detections to the output video
    out.write(frame)

# Release resources
cap.release()
out.release()
print("Finished processing video.")


In [None]:
from IPython.display import Video

# Display the saved video
Video('/content/output_video.mp4', embed=True)


6. Optimize YOLO performance in Google Colab by adjusting input parameters and enabling GPU acceleration
Explore strategies to optimize the YOLOv5 pipeline for real-time performance. This can include adjusting the input resolution to balance speed and accuracy, skipping frames to process videos faster, and using model optimizations like quantization to reduce inference time while maintaining detection quality.
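
One variation on frame skipping is to run the detector only on every N-th frame and reuse the most recent detections for the frames in between, so that the output video keeps its original frame rate. The sketch below shows only that control flow in plain Python. The function `detect_with_skipping` and the stand-in `fake_detector` are hypothetical; in the notebook the detector would be a call to the YOLOv5 model.

```python
def detect_with_skipping(frames, detector, skip_frames=3):
    """Run the detector on every `skip_frames`-th frame and reuse the last result otherwise."""
    last_detections = []
    per_frame = []
    for index, frame in enumerate(frames):
        if index % skip_frames == 0:
            last_detections = detector(frame)  # expensive call
        per_frame.append(last_detections)      # cheap reuse on skipped frames
    return per_frame


# Stand-in detector that records which frames it was asked to process
calls = []

def fake_detector(frame):
    calls.append(frame)
    return [f'detections for frame {frame}']


results = detect_with_skipping(range(7), fake_detector, skip_frames=3)
print(calls)
print(results[4])
```

The trade-off is that reused boxes lag behind fast-moving objects, so a larger `skip_frames` saves more time but makes the boxes less accurate on the skipped frames.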


In [None]:
!git clone https://github.com/ultralytics/yolov5  # Clone YOLOv5 repository
%cd yolov5
!pip install -r requirements.txt  # Install dependencies

In [None]:
import torch

# Load the YOLOv5 small model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.eval()  # Set model to evaluation mode

# Note: the input resolution of the hub model is chosen per call,
# e.g. model(frame, size=416). Assigning model.img_size has no effect.

In [None]:
import cv2

# Function to process video with skipping frames
def process_video_with_skipping(input_video_path, output_video_path, skip_frames=3):
    cap = cv2.VideoCapture(input_video_path)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))

    # Define video output
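    # Only every 'skip_frames'-th frame is written, so the output frame rate is reduced to match
    out_fps = max(1, fps // skip_frames)
    out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'mp4v'), out_fps, (frame_width, frame_height))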

    frame_count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

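        # Only process every 'skip_frames'-th frame
        if frame_count % skip_frames == 0:
            # Run YOLOv5 inference at a reduced input size
            # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
            results = model(frame[..., ::-1], size=416)

            # Draw bounding boxes; render() returns the annotated RGB images
            annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)

            # Write processed frame to output video
            out.write(annotated)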

        frame_count += 1

    cap.release()
    out.release()
    print("Finished processing video.")

# Example usage
process_video_with_skipping('/content/race_car.mp4', '/content/output_video_skipped.mp4')


In [None]:
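import torch
from torch.quantization import quantize_dynamic
import cv2

# Apply dynamic quantization to the YOLOv5 model.
# Note: dynamic quantization only converts the listed layer types (here nn.Linear),
# and quantized kernels run on the CPU. YOLOv5 is built mostly from convolutional
# layers, so the speed-up from this step may be small.
quantized_model = quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8  # Quantize linear layers to int8
)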

# Function to run quantized model on each frame of a video
def run_quantized_inference_on_video(input_video_path, output_video_path):
    cap = cv2.VideoCapture(input_video_path)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))

    # Define video output settings
    out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (frame_width, frame_height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

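        # Run inference on the current frame with the quantized model
        # (OpenCV frames are BGR; the YOLOv5 hub model expects RGB arrays)
        results = quantized_model(frame[..., ::-1])

        # Draw bounding boxes and labels; render() returns the annotated RGB images
        annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)

        # Write the processed frame to the output video
        out.write(annotated)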

    # Release resources
    cap.release()
    out.release()
    print("Finished processing video with quantized model.")

# Example usage
run_quantized_inference_on_video('/content/race_car.mp4', '/content/output_video_quantized.mp4')
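
To give an intuition for what int8 quantization does to the weights, the sketch below maps a few floating-point values to 8-bit integers with a scale and a zero point, then maps them back and measures the rounding error. This is only an illustration of the general idea in plain Python. The function names `quantize` and `dequantize` and the example weights are hypothetical, and this is not PyTorch's actual implementation.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using an affine scale and zero point."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0                # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map the integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error, scale)
```

Each value is stored in one byte instead of four, and the reconstruction error stays within about one quantization step (`scale`). This is the accuracy cost that is traded for a smaller and potentially faster model.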
