# Assignment Chapter 3 - COMPUTER VISION [Case #3]
Startup Campus, Indonesia - `Artificial Intelligence (AI)` (Batch 7)
* Dataset: Any YouTube videos
* Libraries: PyTorch, NumPy, OpenCV
* Objective: Real-time Object Detection using CNN-based Pre-trained Models

`REQUIREMENTS` All modules (including the matching versions) are installed correctly.
<br>`INSTRUCTIONS` Complete the lines of code marked with **#TODO**.
<br>`PORTFOLIO TARGET` Participants are able to implement a PyTorch model that detects objects in *real time*.

### Install additional library

In [2]:
!pip install cap-from-youtube

Collecting cap-from-youtube
  Downloading cap_from_youtube-0.2.0-py3-none-any.whl.metadata (3.1 kB)
Collecting yt-dlp (from cap-from-youtube)
  Downloading yt_dlp-2024.10.22-py3-none-any.whl.metadata (171 kB)
Collecting brotli (from yt-dlp->cap-from-youtube)
  Downloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.5 kB)
Collecting mutagen (from yt-dlp->cap-from-youtube)
  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)
Collecting pycryptodomex (from yt-dlp->cap-from-youtube)
  Downloading pycryptodomex-3.21.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting websockets>=13.0 (from yt-dlp->cap-from-youtube)
  Downloading websockets-13.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metada

### Import Libraries

In [3]:
import torch
import numpy as np
import cv2

from cap_from_youtube import cap_from_youtube
from time import time
from datetime import datetime as dt

### User-defined Class

In [4]:
class ObjectDetection:
    def __init__(self, url, out_file=None):
        """
        Initializes the class with a YouTube URL and an output file.
        :param url: Must be a YouTube URL, on which prediction is made.
        :param out_file: A valid output file name; defaults to a timestamped .avi name.
        """

        self._URL = url
        self.model = self.load_model()
        self.classes = self.model.names
        # Build the default name per instance; a default argument is evaluated only once,
        # so every instance would otherwise share (and overwrite) the same file.
        self.out_file = out_file or "{}_video.avi".format(dt.now().strftime("%Y%m%d_%H%M%S"))
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

    def get_video_from_url(self):
        """
        Creates a new video streaming object to extract video frame by frame to make prediction on.
        :return: opencv2 video capture object, with lowest quality frame available for video.
        """

        return cap_from_youtube(self._URL)

    def load_model(self):
        """
        Loads the model from pytorch hub.
        :return: Trained Pytorch model.
        """

        # TODO: Load the ultralytics/yolov5 model
        # See how at https://pytorch.org/hub/ultralytics_yolov5/#load-from-pytorch-hub
        model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
        return model

    def score_frame(self, frame):
        """
        Takes a single frame as input, and scores the frame using the YOLOv5 model.
        :param frame: input frame in numpy/list/tuple format.
        :return: Labels and Coordinates of objects detected by model in the frame.
        """

        self.model.to(self.device)

        frame = [frame]
        results = self.model(frame)
        labels, cord = results.xyxyn[0][:, -1].cpu().numpy(), results.xyxyn[0][:, :-1].cpu().numpy()
        return labels, cord

    def class_to_label(self, x):
        """
        For a given label value, return corresponding string label.
        :param x: numeric label
        :return: corresponding string label
        """

        return self.classes[int(x)]

    def plot_boxes(self, results, frame):
        """
        Takes a frame and its results as input, and plots the bounding boxes and label on to the frame.
        :param results: contains labels and coordinates predicted by model on the given frame.
        :param frame: Frame which has been scored.
        :return: Frame with bounding boxes and labels plotted on it.
        """

        labels, cord = results
        n = len(labels)
        x_shape, y_shape = frame.shape[1], frame.shape[0]
        for i in range(n):
            row = cord[i]
            if row[4] >= 0.2:
                x1, y1, x2, y2 = int(row[0]*x_shape), int(row[1]*y_shape), int(row[2]*x_shape), int(row[3]*y_shape)
                bgr = (0, 255, 0)
                cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
                cv2.putText(frame, self.class_to_label(labels[i]), (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.9, bgr, 2)

        return frame

    def __call__(self):
        """
        This function is called when class is executed, it runs the loop to read the video frame by frame,
        and write the output into a new file.
        :return: void
        """

        player = self.get_video_from_url()
        assert player.isOpened()

        x_shape = int(player.get(cv2.CAP_PROP_FRAME_WIDTH))
        y_shape = int(player.get(cv2.CAP_PROP_FRAME_HEIGHT))
        four_cc = cv2.VideoWriter_fourcc(*"MJPG")
        out = cv2.VideoWriter(self.out_file, four_cc, 20, (x_shape, y_shape))

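        for i in range(1, 300):
            start_time = time()
            ret, frame = player.read()
            if not ret:
                # Stop when the stream ends or a frame cannot be read.
                break

            results = self.score_frame(frame)
            frame = self.plot_boxes(results, frame)
            end_time = time()

            # Guard against a zero elapsed time after rounding.
            fps = 1/max(np.round(end_time - start_time, 3), 1e-3)
            print(f"Frames Per Second : {fps}")
            out.write(frame)

        # Release the capture and the writer so the output file is finalized.
        player.release()
        out.release()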
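
The coordinate handling in `plot_boxes` can be checked in isolation, without a model or a video. The sketch below assumes rows shaped like the class's `cord` array (normalized `x1, y1, x2, y2` followed by a confidence). The helper name `to_pixel_boxes` and the sample values are hypothetical and used only for illustration.

```python
def to_pixel_boxes(cord, frame_width, frame_height, conf_threshold=0.2):
    """Convert normalized [x1, y1, x2, y2, conf] rows to integer pixel boxes.

    Rows whose confidence is below conf_threshold are dropped.
    """
    boxes = []
    for row in cord:
        x1, y1, x2, y2, conf = (float(v) for v in row[:5])
        if conf < conf_threshold:
            continue
        boxes.append((
            int(x1 * frame_width),
            int(y1 * frame_height),
            int(x2 * frame_width),
            int(y2 * frame_height),
        ))
    return boxes


# Hypothetical detections for a 1280x720 frame (values invented for illustration).
example_cord = [
    [0.25, 0.5, 0.75, 1.0, 0.9],   # kept: confidence 0.9
    [0.1, 0.1, 0.2, 0.2, 0.05],    # dropped: confidence below 0.2
]
print(to_pixel_boxes(example_cord, 1280, 720))  # [(320, 360, 960, 720)]
```

Because the rows are only indexed and converted with `float`, the same helper should also accept rows of a NumPy array.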

### IMPORTANT: Activate your GPU

- In Google Colab, click **Runtime > Change runtime type**, then select **T4 GPU**.

### Start Object Detection

In [11]:
if __name__ == "__main__":
    import warnings

    # Make sure this prints "CUDA enable: True"
    print("CUDA enable: {}".format(torch.cuda.is_available()))

    # TODO: Fill in the parameter with the available YouTube URLs (one at a time):
    # 1. Crowded place: https://www.youtube.com/watch?v=dwD1n7N7EAg
    # 2. Solar system: https://www.youtube.com/watch?v=g2KmtA97HxY
    # 3. Road traffic: https://www.youtube.com/watch?v=wqctLW0Hb_0
    urls = [
        "https://www.youtube.com/watch?v=dwD1n7N7EAg",
        "https://www.youtube.com/watch?v=g2KmtA97HxY",
        "https://www.youtube.com/watch?v=wqctLW0Hb_0",
    ]
    # Silence FutureWarning messages once, before processing any video.
    warnings.simplefilter("ignore", category=FutureWarning)
    for url in urls:
        run_model = ObjectDetection(url=url)
        run_model()
        print("+++"*50)

CUDA enable: True


Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-10-29 Python-3.10.12 torch-2.5.0+cu121 CUDA:0 (Tesla T4, 15102MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


[youtube] Extracting URL: https://www.youtube.com/watch?v=dwD1n7N7EAg
[youtube] dwD1n7N7EAg: Downloading webpage
[youtube] dwD1n7N7EAg: Downloading ios player API JSON
[youtube] dwD1n7N7EAg: Downloading mweb player API JSON
[youtube] dwD1n7N7EAg: Downloading m3u8 information
Frames Per Second : 2.8901734104046244
Frames Per Second : 45.45454545454546
Frames Per Second : 28.57142857142857
Frames Per Second : 43.47826086956522
Frames Per Second : 45.45454545454546
Frames Per Second : 52.631578947368425
Frames Per Second : 43.47826086956522
Frames Per Second : 50.0
Frames Per Second : 45.45454545454546
Frames Per Second : 50.0
Frames Per Second : 45.45454545454546
Frames Per Second : 50.0
Frames Per Second : 45.45454545454546
Frames Per Second : 50.0
Frames Per Second : 43.47826086956522
Frames Per Second : 50.0
Frames Per Second : 50.0
Frames Per Second : 52.631578947368425
Frames Per Second : 45.45454545454546
Frames Per Second : 50.0
Frames Per Second : 50.0
Frames Per Second : 41.6666

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-10-29 Python-3.10.12 torch-2.5.0+cu121 CUDA:0 (Tesla T4, 15102MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


[youtube] Extracting URL: https://www.youtube.com/watch?v=g2KmtA97HxY
[youtube] g2KmtA97HxY: Downloading webpage
[youtube] g2KmtA97HxY: Downloading ios player API JSON
[youtube] g2KmtA97HxY: Downloading mweb player API JSON
[youtube] g2KmtA97HxY: Downloading m3u8 information
Frames Per Second : 4.201680672268908
Frames Per Second : 47.61904761904761
Frames Per Second : 40.0
Frames Per Second : 55.55555555555556
Frames Per Second : 55.55555555555556
Frames Per Second : 55.55555555555556
Frames Per Second : 38.46153846153846
Frames Per Second : 45.45454545454546
Frames Per Second : 34.48275862068965
Frames Per Second : 45.45454545454546
Frames Per Second : 45.45454545454546
Frames Per Second : 50.0
Frames Per Second : 43.47826086956522
Frames Per Second : 41.666666666666664
Frames Per Second : 52.631578947368425
Frames Per Second : 50.0
Frames Per Second : 50.0
Frames Per Second : 62.5
Frames Per Second : 58.8235294117647
Frames Per Second : 62.5
Frames Per Second : 66.66666666666667
Fra

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-10-29 Python-3.10.12 torch-2.5.0+cu121 CUDA:0 (Tesla T4, 15102MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


[youtube] Extracting URL: https://www.youtube.com/watch?v=wqctLW0Hb_0
[youtube] wqctLW0Hb_0: Downloading webpage
[youtube] wqctLW0Hb_0: Downloading ios player API JSON
[youtube] wqctLW0Hb_0: Downloading mweb player API JSON
[youtube] wqctLW0Hb_0: Downloading m3u8 information
Frames Per Second : 1.6366612111292962
Frames Per Second : 45.45454545454546
Frames Per Second : 16.949152542372882
Frames Per Second : 50.0
Frames Per Second : 43.47826086956522
Frames Per Second : 43.47826086956522
Frames Per Second : 41.666666666666664
Frames Per Second : 43.47826086956522
Frames Per Second : 50.0
Frames Per Second : 58.8235294117647
Frames Per Second : 66.66666666666667
Frames Per Second : 66.66666666666667
Frames Per Second : 66.66666666666667
Frames Per Second : 71.42857142857143
Frames Per Second : 62.5
Frames Per Second : 55.55555555555556
Frames Per Second : 58.8235294117647
Frames Per Second : 66.66666666666667
Frames Per Second : 66.66666666666667
Frames Per Second : 66.66666666666667
Fr
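
In each run above, the first printed value is far lower than the rest (about 2.9, 4.2, and 1.6 FPS), which is probably one-time start-up cost rather than steady-state speed. The sketch below is one way to summarize a run while skipping that first frame. The helper name `summarize_fps` and the `warmup_frames` parameter are hypothetical, and the sample list holds only the first six values printed for the crowded-place video.

```python
from statistics import median


def summarize_fps(fps_values, warmup_frames=1):
    """Return (median, minimum, maximum) FPS after skipping warm-up frames."""
    steady = list(fps_values)[warmup_frames:]
    if not steady:
        raise ValueError("no frames left after skipping warm-up")
    return median(steady), min(steady), max(steady)


# First six values printed for the crowded-place video.
crowded = [
    2.8901734104046244,
    45.45454545454546,
    28.57142857142857,
    43.47826086956522,
    45.45454545454546,
    52.631578947368425,
]
med, low, high = summarize_fps(crowded)
print(f"median={med:.1f} min={low:.1f} max={high:.1f}")  # median=45.5 min=28.6 max=52.6
```

The median is used instead of the mean so that a single slow frame does not dominate the summary.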

In [5]:
# [ QUESTION ]
# TODO: What is the difference between "image classification" and "object detection"?

Image classification assigns the entire image to a single category or label, where each label represents the main content of the image as a whole.

Object detection identifies and separates multiple objects within an image, usually by drawing a bounding box around each one, and then labels every distinct object it detects.
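
The difference shows up in the shape of the output. The sketch below uses mock data rather than a real model: `class_names`, `classify`, and `detect` are hypothetical names, and the scores and boxes are invented. The detection rows follow the layout of `results.xyxyn[0]` used earlier (normalized box, confidence, class index).

```python
# Hypothetical class names, invented for illustration.
class_names = ["person", "car", "dog"]


def classify(scores):
    """Image classification: one label for the whole image (highest score wins)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best]


def detect(rows, conf_threshold=0.2):
    """Object detection: one (label, box) pair per object found in the image.

    Each row is [x1, y1, x2, y2, confidence, class_index].
    """
    return [
        (class_names[int(r[5])], tuple(r[:4]))
        for r in rows
        if r[4] >= conf_threshold
    ]


# One image, one answer.
print(classify([0.7, 0.2, 0.1]))  # person

# One image, one answer per detected object, each with its own box.
print(detect([
    [0.1, 0.2, 0.4, 0.9, 0.8, 0],
    [0.5, 0.5, 0.9, 0.8, 0.6, 1],
]))
```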

In [6]:
# [ QUESTION ]
# TODO: In which video does YOLOv5 have the worst detection accuracy? Why?

The video most likely to have the worst detection accuracy is Solar System. In the output, the Solar System run reports higher detection speed than video #1 (where the model struggles slightly), while the other video is fairly fast but stable. FPS measures speed rather than accuracy, however, so the stronger reason is the training data: YOLOv5 is trained on real-world images (people, cars, etc.), and the objects in the Solar System video are very different from those in the training dataset. Footage of space objects also tends to vary widely in lighting and is visually quite distinctive.
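
The printed FPS values cannot confirm this answer, because they describe speed only. One rough way to compare the videos is to collect the `cord` array returned by `score_frame` for every frame and summarize how many detections pass the threshold and how confident they are. The sketch below assumes that data has been collected into a list. The helper name `summarize_detections` and the sample values are hypothetical, and model confidence is only a proxy, since true accuracy needs labeled ground truth.

```python
def summarize_detections(frames_cord, conf_threshold=0.2):
    """Return (mean detections per frame, mean confidence of kept detections).

    frames_cord holds one list of rows per frame; each row is
    [x1, y1, x2, y2, confidence].
    """
    kept = [
        float(row[4])
        for cord in frames_cord
        for row in cord
        if row[4] >= conf_threshold
    ]
    n_frames = len(frames_cord)
    per_frame = len(kept) / n_frames if n_frames else 0.0
    mean_conf = sum(kept) / len(kept) if kept else 0.0
    return per_frame, mean_conf


# Hypothetical detections for two frames (values invented for illustration).
frames = [
    [[0, 0, 1, 1, 0.5], [0, 0, 1, 1, 0.1]],  # second row falls below the threshold
    [[0, 0, 1, 1, 0.75]],
]
print(summarize_detections(frames))  # (1.0, 0.625)
```

A video with few detections per frame or low mean confidence would be consistent with the training-data explanation, though it would not prove it.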

### Scoring
Total `#TODO` = 4
<br>Checklist:

- [x] Load the ultralytics/yolov5 model
- [x] Fill in the parameter with the available YouTube URLs
- [x] [ QUESTION ] What is the difference between "image classification" and "object detection"?
- [x] [ QUESTION ] In which video does YOLOv5 have the worst detection accuracy? Why?

### Additional readings
* N/A

### Copyright © 2024 Startup Campus, Indonesia
* Prepared by **Nicholas Dominic, M.Kom.** [(profile)](https://linkedin.com/in/nicholas-dominic)
* You may **NOT** use this file without written permission from PT. Kampus Merdeka Belajar (Startup Campus).
* Please address your questions to mentors.