# Task 1: Set up your development environment and store the test video locally (10 points)

## Environment

I'm using a Docker-based environment that takes advantage of my NVIDIA GPU. The container configuration lives in the `docker-compose.yaml`, `Dockerfile`, and `requirements.txt` files. In short, I start from a TensorFlow base image that supports Jupyter notebooks and GPU usage, install the dependencies, and expose a port so I can reach the notebook inside the container from my host machine's browser. I also mount a volume so that changes made in the container persist. To set this up, I first installed the appropriate GPU drivers, WSL2, and Docker Desktop by following [this tutorial](https://www.youtube.com/watch?v=YozfiLI1ogY&t=717s&ab_channel=KNuggies).

In [8]:
import torch

In [9]:
print(torch.cuda.is_available())

True


## Store Test Video Locally

In [10]:
from pytube import YouTube
import re
import os

In [11]:
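def download(url):
    # Create the output directory if it does not already exist.
    os.makedirs("videos", exist_ok=True)
    # The video ID is the value of the "v" query parameter.
    match = re.search(r"[?&]v=([^&]+)", url)
    if match is None:
        raise ValueError(f"No video ID found in URL: {url}")
    videoId = match.group(1)
    yt = YouTube(url)
    # itag 22 is the 720p progressive MP4 stream; it is not offered for every video.
    stream = yt.streams.filter(progressive=True).get_by_itag(22)
    if stream is None:
        raise RuntimeError(f"itag 22 stream not available for {videoId}")
    stream.download(output_path="videos", filename=videoId + ".mp4")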

In [12]:
urls = ["https://www.youtube.com/watch?v=WeF4wpw7w9k&t=44s&ab_channel=PantelisMonogioudis",
        "https://www.youtube.com/watch?v=2NFwY15tRtA&ab_channel=PantelisMonogioudis",
        "https://www.youtube.com/watch?v=5dRramZVu2Q&ab_channel=R2bEEaton"]

In [13]:
for url in urls:
    download(url)
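
The regular expression in `download` assumes the ID always appears as a `v=` query parameter. As a hedged alternative, the sketch below parses the URL with the standard library instead. The helper name `extract_video_id` is hypothetical and not part of the notebook, and the `youtu.be` short-link handling is an added assumption based on the links used later in this document.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the YouTube video ID from a watch or youtu.be URL, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    # Standard watch links carry the ID in the "v" query parameter.
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


video_id = extract_video_id(
    "https://www.youtube.com/watch?v=WeF4wpw7w9k&t=44s&ab_channel=PantelisMonogioudis"
)
print(video_id)
```

Returning `None` rather than raising lets the caller decide whether a URL without an ID should be skipped or treated as an error.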

# Task 2: Object Detection (40 points)

In [14]:
import ultralytics
import ultralyticsplus

In [15]:
from ultralyticsplus import YOLO, render_result
model = YOLO('mshamrai/yolov8s-visdrone')

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...


100%|██████████| 6.23M/6.23M [00:00<00:00, 71.0MB/s]


config.json:   0%|          | 0.00/161 [00:00<?, ?B/s]

best.pt:   0%|          | 0.00/22.5M [00:00<?, ?B/s]

In [16]:
import cv2
import numpy as np
from PIL import Image

In [17]:
output_directory = "object_detected_videos"
if not os.path.exists(output_directory):
    os.makedirs(output_directory)

In [18]:
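def parse(inFile, outFile):
    inVideo = cv2.VideoCapture(inFile)
    if not inVideo.isOpened():
        raise IOError(f"Could not open video: {inFile}")
    # Reuse the input's frame rate and resolution for the output video.
    fps = inVideo.get(cv2.CAP_PROP_FPS)
    frame_size = (int(inVideo.get(cv2.CAP_PROP_FRAME_WIDTH)), int(inVideo.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    outVideo = cv2.VideoWriter(outFile, cv2.VideoWriter_fourcc(*'mp4v'), fps, frame_size)
    frameCount = 0
    while True:
        success, frame = inVideo.read()
        if not success:
            break
        # Count only frames that were actually read.
        frameCount += 1
        # OpenCV decodes frames as BGR; convert to RGB for rendering.
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Ultralytics treats NumPy inputs as BGR, so the raw frame goes to the detector.
        results = model.predict(frame)
        # Draw the bounding boxes and labels on the frame.
        render = render_result(model=model, image=rgb_frame, result=results[0])
        # Convert back to BGR for the OpenCV writer.
        outFrame = cv2.cvtColor(np.array(render), cv2.COLOR_RGB2BGR)
        outVideo.write(outFrame)
    inVideo.release()
    outVideo.release()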

In [19]:
video_paths = os.listdir("videos")

In [None]:
for path in video_paths:
    videoId = os.path.splitext(path)[0]
    parse(os.path.join("videos", path), os.path.join(output_directory, videoId + ".mp4"))


0: 352x640 (no detections), 97.1ms
Speed: 5.7ms preprocess, 97.1ms inference, 30.2ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 6.5ms
Speed: 1.9ms preprocess, 6.5ms inference, 0.7ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 7.2ms
Speed: 1.6ms preprocess, 7.2ms inference, 0.7ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 7.5ms
Speed: 3.2ms preprocess, 7.5ms inference, 0.9ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 6.9ms
Speed: 1.5ms preprocess, 6.9ms inference, 0.8ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 7.0ms
Speed: 2.8ms preprocess, 7.0ms inference, 0.8ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 8.5ms
Speed: 3.7ms preprocess, 8.5ms inference, 0.6ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 motor, 8.5ms
Speed: 2.0ms preprocess, 8.5ms inference, 69.0ms post
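
Each `Speed:` line above reports the preprocess, inference, and postprocess time for one frame. As a rough sketch, assuming the log text has been captured as a string, the per-stage times could be averaged as follows. The helper name `summarize_speed` is hypothetical, and the sample lines are copied from the first three frames of the output above. Incomplete lines, such as the truncated final one, do not match the pattern and are skipped.

```python
import re
from statistics import mean

# Sample lines copied from the first three frames of the output above.
log_text = """\
Speed: 5.7ms preprocess, 97.1ms inference, 30.2ms postprocess per image at shape (1, 3, 352, 640)
Speed: 1.9ms preprocess, 6.5ms inference, 0.7ms postprocess per image at shape (1, 3, 352, 640)
Speed: 1.6ms preprocess, 7.2ms inference, 0.7ms postprocess per image at shape (1, 3, 352, 640)
"""

SPEED_PATTERN = re.compile(
    r"Speed: ([\d.]+)ms preprocess, ([\d.]+)ms inference, ([\d.]+)ms postprocess"
)


def summarize_speed(text):
    """Return the frame count and mean per-stage times in ms, or None if no line matches."""
    rows = [tuple(float(v) for v in m.groups()) for m in SPEED_PATTERN.finditer(text)]
    if not rows:
        return None
    pre, inf, post = zip(*rows)
    return {
        "frames": len(rows),
        "preprocess_ms": mean(pre),
        "inference_ms": mean(inf),
        "postprocess_ms": mean(post),
    }


summary = summarize_speed(log_text)
print(summary)
```

In the output above, the first frame is much slower than the rest (97.1 ms versus roughly 7 ms of inference), so it may be worth excluding it before averaging.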

This code uses a YOLOv8 model pretrained on the VisDrone drone dataset (`mshamrai/yolov8s-visdrone`) as the object detector. I iterate over each video's frames and run the detector on every frame. From the detector's output, I write new videos with the bounding boxes and labels drawn on each frame to show that object detection works. Because GitHub limits the size of files in a repository, I host the resulting videos on YouTube:
- https://youtu.be/KOMcCO_nfD0
- https://youtu.be/Mt_TAALjJQg
- https://youtu.be/YJ3OSMPVL88