<div align="center">

# YOLOv3 simple object detection on video files
</div>

YOLOv3 (You Only Look Once, version 3) is a widely used single-stage object detection algorithm, known for fast inference and good accuracy when detecting multiple objects in a frame. In this Jupyter notebook, we run YOLOv3 object detection on a set of test videos using Python and the OpenCV library. We use pre-trained weights and configurations for the YOLOv3 and YOLOv3-tiny models to detect objects in each video frame and draw bounding boxes around them with the corresponding labels.


## Downloading pretrained models and test files

In [1]:
!git clone https://github.com/pjreddie/darknet

Cloning into 'darknet'...
remote: Enumerating objects: 5955, done.
remote: Total 5955 (delta 0), reused 0 (delta 0), pack-reused 5955
Receiving objects: 100% (5955/5955), 6.37 MiB | 21.19 MiB/s, done.
Resolving deltas: 100% (3932/3932), done.


In [2]:
!git clone https://github.com/mohamedamine99/YOLOv3-simple-object-detection

Cloning into 'YOLOv3-simple-object-detection'...
remote: Enumerating objects: 82, done.
remote: Counting objects: 100% (82/82), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 82 (delta 22), reused 69 (delta 12), pack-reused 0
Unpacking objects: 100% (82/82), 38.87 MiB | 9.79 MiB/s, done.


In [3]:
!wget https://pjreddie.com/media/files/yolov3-tiny.weights

--2023-03-12 12:52:44--  https://pjreddie.com/media/files/yolov3-tiny.weights
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35434956 (34M) [application/octet-stream]
Saving to: ‘yolov3-tiny.weights’


2023-03-12 12:52:46 (26.3 MB/s) - ‘yolov3-tiny.weights’ saved [35434956/35434956]



In [4]:
!wget https://pjreddie.com/media/files/yolov3.weights

--2023-03-12 12:52:46--  https://pjreddie.com/media/files/yolov3.weights
Resolving pjreddie.com (pjreddie.com)... 128.208.4.108
Connecting to pjreddie.com (pjreddie.com)|128.208.4.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 248007048 (237M) [application/octet-stream]
Saving to: ‘yolov3.weights’


2023-03-12 12:52:53 (36.3 MB/s) - ‘yolov3.weights’ saved [248007048/248007048]



## YOLOv3 implementation

In [5]:
import os
import shutil
import time

import numpy as np
import cv2
import matplotlib.pyplot as plt
import imageio

In [6]:
# Paths for various files and directories
coco_names_file = '/content/YOLOv3-simple-object-detection/coco.names'

yolov3_cfg = '/content/YOLOv3-simple-object-detection/configs/yolov3.cfg'
yolov3_tiny_cfg = '/content/YOLOv3-simple-object-detection/configs/yolov3-tiny.cfg'

yolov3_weights = '/content/yolov3.weights'
yolov3_tiny_weights = '/content/yolov3-tiny.weights'

test_vids_path = '/content/YOLOv3-simple-object-detection/test vids'
results_yolov3 = '/content/results/YOLOv3'
results_yolov3_tiny = '/content/results/YOLOv3_tiny'

In [7]:
# Reading the COCO dataset class names from the coco names file
labels = []
with open(coco_names_file, 'rt') as coco_file:
    labels = coco_file.read().rstrip('\n').rsplit('\n')
    
print(labels)

['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa', 'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
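The class list above is read with a plain split on `'\n'`, which works for this file. As a minimal sketch, assuming a `.names` file might also contain Windows line endings or blank lines, a slightly more defensive loader could look like the following. The function name `load_class_names` is hypothetical and not part of the notebook.

```python
from pathlib import Path


def load_class_names(names_path):
    """Return the non-empty, stripped lines of a Darknet-style .names file.

    Hypothetical helper: one class name per line is assumed.
    """
    text = Path(names_path).read_text(encoding="utf-8")
    # splitlines() copes with "\n" and "\r\n"; blank lines are dropped
    return [line.strip() for line in text.splitlines() if line.strip()]
```

It could be used as `labels = load_class_names(coco_names_file)`. The COCO-trained YOLOv3 models predict 80 classes, so checking `len(labels) == 80` afterwards is a cheap way to catch a truncated or mismatched names file.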


In [8]:
# Creating YOLOv3 DNN model from configuration and pre-trained weights
net = cv2.dnn.readNetFromDarknet(yolov3_cfg, yolov3_weights)

# Creating YOLOv3-tiny DNN model from configuration and pre-trained weights
net_tiny = cv2.dnn.readNetFromDarknet(yolov3_tiny_cfg, yolov3_tiny_weights)

In [9]:
def preprocess_img_for_detection(net, img, size = (320, 320)):
    """
    Preprocesses an input image and runs it through a YOLOv3 or YOLOv3-tiny DNN model.
    The image is resized to the specified size, scaled to the [0, 1] range and converted into a blob.
    The blob is then set as the input for the DNN model and a forward pass is run on its output layers.

    Parameters:
        net: cv2.dnn_Net object
        YOLOv3 or YOLOv3-tiny DNN model.

        img: numpy.ndarray
        Input image (BGR channel order) for object detection.

        size: tuple, optional
        Size to which the input image is resized. Default value is (320, 320).

    Returns:
        outputs: sequence of numpy.ndarray
        One array per output layer of the DNN model, with one row per candidate detection.
    """
    # Convert the input image into a blob: scale pixels by 1/255, resize to `size`,
    # subtract no mean, and swap BGR to RGB
    blob = cv2.dnn.blobFromImage(img, 1 / 255, size, [0, 0, 0], swapRB=True, crop=False)

    # Set the blob as the input for the DNN model
    net.setInput(blob)

    # Get the names of the unconnected output layers (the YOLO detection layers)
    outputNames = net.getUnconnectedOutLayersNames()

    # Perform forward pass through the DNN model
    outputs = net.forward(outputNames)

    # Return the outputs of the DNN model after forward pass
    return outputs
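To make the shape of `outputs` more concrete: each output layer yields one row per candidate box, and for the COCO models each row holds 85 values (4 box coordinates, 1 objectness score, 80 class scores). The sketch below estimates the number of rows per output layer. It assumes the standard configurations, which is an assumption and not something the notebook states: 3 anchors per scale, strides of 32, 16 and 8 for YOLOv3, and strides of 32 and 16 for YOLOv3-tiny. The function name is hypothetical.

```python
def yolo_output_rows(input_size, strides, anchors_per_scale=3):
    """Rows per output layer for a square network input (hypothetical helper)."""
    # Each scale predicts `anchors_per_scale` boxes for every cell of its grid
    return [anchors_per_scale * (input_size // s) ** 2 for s in strides]


# Assumed strides of the standard configs, not read from the .cfg files
YOLOV3_STRIDES = (32, 16, 8)
YOLOV3_TINY_STRIDES = (32, 16)

print(yolo_output_rows(416, YOLOV3_STRIDES))       # [507, 2028, 8112]
print(yolo_output_rows(416, YOLOV3_TINY_STRIDES))  # [507, 2028]
```

Under these assumptions, the full model evaluates 10647 candidate boxes per 416x416 frame and the tiny model 2535, which is one reason the tiny model is faster and tends to miss smaller objects.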


In [10]:
def detectObjects(img, outputs, score_threshold = 0.8, NMS_threshold = 0.5 ):
    """
    Takes an input image and the outputs of a YOLOv3 or YOLOv3-tiny DNN model after a forward pass,
    filters the detections and draws a bounding box around each remaining object. The class label and
    confidence score are written just above each bounding box. The image is modified in place.

    Parameters:
        img: numpy.ndarray
            Input image for object detection.

        outputs: sequence of numpy.ndarray
            Outputs of the YOLOv3 or YOLOv3-tiny DNN model after forward pass.

        score_threshold: float, optional
            Minimum class score required for a detection to be kept. Default value is 0.8.

        NMS_threshold: float, optional
            Non-maximum suppression threshold for eliminating overlapping bounding boxes. Default value is 0.5.

    Returns:
        img: numpy.ndarray
            Input image with bounding boxes and class labels drawn around the detected objects.
    """
    # Get the height and width of the input image
    hT, wT = img.shape[:2]

    # Create empty lists to store the bounding boxes, class IDs and confidence scores for detected objects
    bbox = []
    classIds = []
    confs = []

    # Loop over each output of the DNN model after forward pass
    for output in outputs:
        # Loop over each detection in the output. Each row is
        # [center x, center y, width, height, objectness, class scores...],
        # with the box values normalized to the [0, 1] range
        for det in output:
            # Pick the best-scoring class and use its score as the confidence
            scores = det[5:]
            classId = np.argmax(scores)
            confidence = scores[classId]
            if confidence > score_threshold:
                # Convert the normalized center/size box to a pixel (x, y, w, h) box
                w, h = int(det[2]*wT), int(det[3]*hT)
                x, y = int((det[0]*wT)-w/2), int((det[1]*hT)-h/2)
                bbox.append([x, y, w, h])
                classIds.append(classId)
                confs.append(float(confidence))

    # Perform non-maximum suppression to eliminate overlapping bounding boxes
    indices = cv2.dnn.NMSBoxes(bbox, confs, score_threshold, NMS_threshold)

    # Depending on the OpenCV version, NMSBoxes returns a flat or an Nx1 array; flatten() handles both
    for i in np.array(indices).flatten():
        # Get the bounding box coordinates, class label and confidence score for the current index
        x, y, w, h = bbox[i]
        cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 255), 2)
        cv2.putText(img, f'{labels[classIds[i]].upper()} {int(confs[i]*100)}%',
                    (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 255), 2)

    # Return the input image with bounding boxes and class labels drawn around the detected objects
    return img
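The two steps at the heart of `detectObjects` are decoding a normalized box into pixel coordinates and suppressing overlapping boxes. The following is a pure-Python sketch of both ideas for illustration only. It is not OpenCV's implementation, and `cv2.dnn.NMSBoxes` may differ in details such as tie handling. All function names here are hypothetical.

```python
def decode_box(det, frame_w, frame_h):
    """Convert a normalized (cx, cy, w, h) box to a pixel (x, y, w, h) box, top-left origin."""
    w = int(det[2] * frame_w)
    h = int(det[3] * frame_h)
    x = int(det[0] * frame_w - w / 2)
    y = int(det[1] * frame_h - h / 2)
    return [x, y, w, h]


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    inter_w = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Keep the highest-scoring boxes, dropping any box that overlaps a kept one too much."""
    # Candidates above the score threshold, best score first
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # A box survives only if its IoU with every already-kept box is small enough
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 100, 100], [10, 10, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(decode_box([0.5, 0.5, 0.25, 0.5], 640, 360))  # [240, 90, 160, 180]
print(greedy_nms(boxes, scores, 0.5, 0.4))           # [0, 2]
```

In the example, the first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is dropped. A lower NMS threshold suppresses more aggressively, and a higher one lets more overlapping boxes through.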

 

In [11]:
# Creating directories to store the resulting videos (no error if they already exist)
os.makedirs(results_yolov3, exist_ok=True)
os.makedirs(results_yolov3_tiny, exist_ok=True)

In [12]:
def detect_objects_in_videos_directory(net, videos_path, save_dir, size = (416, 416),
                                       score_threshold = 0.5, NMS_threshold = 0.4):

    """
    Detects objects in videos in the specified directory using the given model and saves the resulting video files
    to the specified directory.

    Args:
    - net: the neural network model to use for object detection
    - videos_path: the path to the directory containing the videos to process
    - save_dir: the path to the directory where the processed videos will be saved
    - size: the size to resize the frames to before passing them to the neural network
    - score_threshold: the confidence threshold below which detected objects will be discarded
    - NMS_threshold: the Non-Maximum Suppression (NMS) threshold for removing overlapping bounding boxes

    Returns:
    - None
    """

    for video_file in os.listdir(videos_path):
        cap = cv2.VideoCapture(os.path.join(videos_path, video_file))

        # Check that the video file was opened correctly; skip it otherwise
        if not cap.isOpened():
            print("Error opening video stream or file")
            cap.release()
            continue

        width  = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))   # frame width in pixels
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height in pixels
        print((width,height))
        save_file = os.path.join(save_dir, os.path.splitext(video_file)[0] + ".avi")

        # Define an output VideoWriter object (MJPG codec, fixed 20 FPS, same frame size as the input)
        out = cv2.VideoWriter(save_file,
                            cv2.VideoWriter_fourcc(*"MJPG"),
                            20,(width,height))

        # Read the video frames
        while cap.isOpened():
            ret, frame = cap.read()

            # cap.read() returns ret == False on a read error and also at the end of the video,
            # so this message is printed once per file even when processing succeeds
            if not ret:
                print("Error reading frame")
                print((width,height))
                break
            beg = time.time()

            # Run the current frame through the network
            outputs = preprocess_img_for_detection(net, frame, size)

            # Draw bounding boxes and labels for the detected objects on the current frame
            frame = detectObjects(frame, outputs, score_threshold , NMS_threshold  )
            #end = time.time()
            #fps = 1/(end - beg)
            #frame = cv2.putText(frame, f"FPS = {fps}", (20,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,255), 2)

            # Append the annotated frame to the output video file
            out.write(frame)

        # After the loop, release the capture and writer objects
        cap.release()
        out.release()
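The commented-out lines in the function above compute an FPS value from a single frame, which tends to jump around from frame to frame. As a hedged alternative, the sketch below averages over the last few frames. `FpsMeter` is a hypothetical helper, not part of the notebook, and the injectable `clock` parameter exists only so the class can be exercised without real delays.

```python
import time
from collections import deque


class FpsMeter:
    """Moving-average frames-per-second estimate over the last `window` frame intervals."""

    def __init__(self, window=30, clock=time.perf_counter):
        self._clock = clock
        # window intervals need window + 1 timestamps
        self._stamps = deque(maxlen=window + 1)

    def tick(self):
        """Record one processed frame and return the current FPS estimate.

        Returns 0.0 until at least two frames have been recorded.
        """
        self._stamps.append(self._clock())
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

One way to use it would be to create `meter = FpsMeter()` before the frame loop, call `fps = meter.tick()` once per processed frame, and pass `f"FPS = {fps:.1f}"` to `cv2.putText`. Note that this measures processing speed, which is independent of the fixed 20 FPS at which the output file is written.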



In [14]:
detect_objects_in_videos_directory(net_tiny, test_vids_path, results_yolov3_tiny, size = (416, 416) , 
                                       score_threshold = 0.3, NMS_threshold = 0.4)

(360, 640)
Error reading frame
(360, 640)
(640, 360)
Error reading frame
(640, 360)
(720, 1280)
Error reading frame
(720, 1280)


In [15]:
detect_objects_in_videos_directory(net, test_vids_path, results_yolov3, size = (416, 416) , 
                                       score_threshold = 0.5, NMS_threshold = 0.4)

(360, 640)
Error reading frame
(360, 640)
(640, 360)
Error reading frame
(640, 360)
(720, 1280)
Error reading frame
(720, 1280)
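Both runs iterate over every entry that `os.listdir` returns, so a stray non-video file or subdirectory in the test folder would also be handed to `cv2.VideoCapture`. A minimal sketch of a filter is shown below. The helper names and the extension set are assumptions chosen for illustration, not something the repository defines.

```python
import os

# Assumed set of extensions; adjust to the files actually present
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}


def list_video_files(directory, extensions=VIDEO_EXTENSIONS):
    """Return the sorted names of regular files in `directory` with a video extension."""
    names = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(full_path) and ext in extensions:
            names.append(name)
    return names


def output_name(video_file, new_ext=".avi"):
    """Swap the extension, whatever its length, for `new_ext`."""
    return os.path.splitext(video_file)[0] + new_ext
```

`os.path.splitext` is used for the output name because a fixed-length slice such as `video_file[:-4]` only works when the extension is exactly three characters long.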


In [19]:
def GIF_from_vid(vid_file, gif_file, fps = 20, skip = 2):

    """
    Creates a GIF file from a video file.

    Parameters:
        vid_file (str): The path to the video file.
        gif_file (str): The path to save the generated GIF file.
        fps (int, optional): The frames per second to be used in the GIF file. Defaults to 20.
        skip (int, optional): Every `skip`-th frame of the video is dropped. Defaults to 2,
            which keeps every other frame. A value of 1 would drop every frame.

    Returns:
        None
    """

    # Initialize the frame counter
    i = 0

    cap = cv2.VideoCapture(vid_file)

    # Check that the video file was opened correctly
    if not cap.isOpened():
        print("Error opening video stream or file")
        cap.release()
        return

    # Create a writer object to write the frames to a GIF file
    writer = imageio.get_writer(gif_file, mode='I',fps=fps)

    # Read the video frames
    while cap.isOpened():
        ret, frame = cap.read()

        # cap.read() returns ret == False on a read error and also at the end of the video
        if not ret:
            print("Error reading frame")
            break

        # Increment the frame counter
        i+=1

        # Drop every `skip`-th frame
        if( i % skip == 0):
            continue

        # Convert the frame from BGR to RGB and add it to the GIF file
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        writer.append_data(frame)

    # Close the writer and release the video capture
    writer.close()
    cap.release()

In [None]:
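# Creating directories to store the generated GIFs (no error if they already exist)
os.makedirs('gifs/YOLOv3', exist_ok=True)
os.makedirs('gifs/YOLOv3_tiny', exist_ok=True)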

In [None]:
for video_file in os.listdir(results_yolov3):
    video_path = os.path.join(results_yolov3, video_file)
    save_file = os.path.join('gifs/YOLOv3',video_file[:-4] + '.gif' )
    GIF_from_vid(video_path, save_file, fps = 20, skip = 2)

In [None]:
for video_file in os.listdir(results_yolov3_tiny):
    video_path = os.path.join(results_yolov3_tiny, video_file)
    save_file = os.path.join('gifs/YOLOv3_tiny',video_file[:-4] + '.gif' )
    GIF_from_vid(video_path, save_file, fps = 20, skip = 2)

## Preparing results for download

In [16]:
!zip -r results.zip /content/results

  adding: content/results/ (stored 0%)
  adding: content/results/YOLOv3/ (stored 0%)
  adding: content/results/YOLOv3/dog video 2.avi (deflated 1%)
  adding: content/results/YOLOv3/cat video 2.avi (deflated 2%)
  adding: content/results/YOLOv3/traffic.avi (deflated 0%)
  adding: content/results/YOLOv3_tiny/ (stored 0%)
  adding: content/results/YOLOv3_tiny/dog video 2.avi (deflated 1%)
  adding: content/results/YOLOv3_tiny/cat video 2.avi (deflated 2%)
  adding: content/results/YOLOv3_tiny/traffic.avi (deflated 0%)


In [23]:
!zip -r gifs.zip /content/gifs

  adding: content/gifs/ (stored 0%)
  adding: content/gifs/YOLOv3/ (stored 0%)
  adding: content/gifs/YOLOv3/dog video 2.gif (deflated 0%)
  adding: content/gifs/YOLOv3/cat video 2.gif (deflated 0%)
  adding: content/gifs/YOLOv3/traffic.gif (deflated 1%)
  adding: content/gifs/YOLOv3_tiny/ (stored 0%)
  adding: content/gifs/YOLOv3_tiny/dog video 2.gif (deflated 0%)
  adding: content/gifs/YOLOv3_tiny/cat video 2.gif (deflated 0%)
  adding: content/gifs/YOLOv3_tiny/traffic.gif (deflated 1%)
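The `zip` shell command works in Colab. For environments without it, the same archives can be produced from Python with `shutil.make_archive` from the standard library. The wrapper below is a hypothetical convenience function, shown as a sketch.

```python
import os
import shutil


def zip_directory(directory, archive_base):
    """Zip `directory` (including its top-level folder name) into `archive_base` + '.zip'.

    Returns the path of the created archive.
    """
    directory = os.path.abspath(directory)
    return shutil.make_archive(
        archive_base,
        "zip",
        root_dir=os.path.dirname(directory),   # paths in the archive are relative to this
        base_dir=os.path.basename(directory),  # only this folder is archived
    )
```

For example, `zip_directory('/content/results', '/content/results')` would write `/content/results.zip` with entries starting at `results/`. Keeping the archive outside the directory being zipped avoids the archive including itself.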
