# A simple overview of using the YOLOv3 model

YOLOv3 is an object detection algorithm designed to operate in real time. The current ImageAI implementation of YOLOv3 uses PyTorch as a backend, so you will need *torch* and *torchvision* installed, as well as *ImageAI*. If you just want to get the model running, feel free to skip ahead; otherwise, I will first cover some background on why and how YOLO works.

YOLO is built from the convolutional layers of a neural network. These layers convolve the input with small kernels to detect features, and the kernel values themselves are learned during training. For example:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

This kernel responds most strongly to features that follow a line from the top left of the image to the bottom right. YOLOv3 is fully convolutional: its Darknet-53 backbone is built from alternating 3x3 and 1x1 convolutions and contains no fully connected layers.
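To make the diagonal kernel above concrete, here is a minimal NumPy sketch that slides it over two tiny 5x5 test images, one with a top-left to bottom-right line and one with the opposite diagonal. The helper `correlate2d_valid` and both test images are made up for illustration; real networks use optimized library routines rather than Python loops. Like most deep learning frameworks, the sketch computes cross-correlation (no kernel flip), which gives the same result as true convolution for this particular kernel because it is unchanged by a 180 degree rotation.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

diag_kernel = np.eye(3)                      # the 3x3 kernel shown above
diagonal_image = np.eye(5)                   # line from top left to bottom right
anti_diagonal_image = np.fliplr(np.eye(5))   # line from top right to bottom left

diag_response = correlate2d_valid(diagonal_image, diag_kernel)
anti_response = correlate2d_valid(anti_diagonal_image, diag_kernel)

# The matching line produces a much stronger peak response than the opposite one.
print(diag_response.max(), anti_response.max())  # 3.0 1.0
```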

Roughly, YOLO detects objects as follows:
The algorithm divides the image into a grid, and each grid cell predicts a handful of bounding boxes. YOLO is not iterative: there is no sliding window or separate region-proposal stage, and all boxes come out of a single forward pass of the network. Each box gets a confidence score along with class scores that represent how well each class describes what is visible inside it. This approach means that classification and localization are performed at the same time. Finally, overlapping and low-scoring boxes are discarded, so only the bounding boxes with the highest confidence scores are displayed.
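The last step, keeping only the highest-confidence boxes, is commonly done in YOLO-style detectors with a confidence threshold followed by non-maximum suppression (NMS). The sketch below is an illustration of that idea in NumPy, not ImageAI's internal code: the function names, the thresholds of 0.5, and the example boxes are all assumptions, and real pipelines usually apply NMS separately per class.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def filter_boxes(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then suppress boxes that overlap a better one."""
    mask = scores >= score_threshold
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(scores)[::-1]          # indices from best to worst score
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(best)
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_threshold]  # keep only boxes that do not overlap much
    kept = np.array(kept, dtype=int)
    return boxes[kept], scores[kept]

# Made-up predictions: two near-duplicate boxes, one separate box, one weak box.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52],
                  [100, 100, 150, 150], [0, 0, 5, 5]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.2])

final_boxes, final_scores = filter_boxes(boxes, scores)
print(final_boxes)
print(final_scores)
```

Of the four candidates, the weak box falls below the score threshold and the near-duplicate is suppressed by its higher-scoring neighbour, leaving two boxes.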

## Retraining your YOLO model

I'm not going to go in depth on how to retrain your YOLO model, but know that you can retrain it to suit your own dataset more precisely. Retraining is computationally intensive, and you will need roughly 1000 images for each class you want your model to recognize. Running data augmentation on these images afterwards, such as rotating, flipping, or zooming them, makes the model more robust and helps it generalize to new images. An in-depth guide is here: https://imageai.readthedocs.io/en/latest/custom/index.html so follow that if you want to know more.
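One detail worth noting for object detection is that augmenting an image also means transforming its bounding-box labels. As a minimal sketch, here is a horizontal flip in NumPy that updates the boxes along with the pixels. The function name, the `(x1, y1, x2, y2)` pixel-edge box format, and the tiny test image are assumptions made for illustration; the annotation format used by a real training pipeline may differ.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Mirror an image left to right and mirror its (x1, y1, x2, y2) boxes to match."""
    width = image.shape[1]
    flipped_image = image[:, ::-1].copy()
    flipped_boxes = boxes.copy()
    flipped_boxes[:, 0] = width - boxes[:, 2]   # new left edge comes from the old right edge
    flipped_boxes[:, 2] = width - boxes[:, 0]   # new right edge comes from the old left edge
    return flipped_image, flipped_boxes

# Hypothetical 4x6 grayscale image with one bright pixel inside one labelled box.
image = np.zeros((4, 6), dtype=np.uint8)
image[0, 1] = 255
boxes = np.array([[1, 0, 3, 2]])

flipped_image, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_boxes)
```

The same idea applies to rotations and zooms: whatever geometric transform is applied to the pixels has to be applied to the box coordinates as well.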



## What is our example?

The first cell holds an example that reads your computer's built-in camera, runs detection on that feed, and displays the annotated result. You will need the pretrained *.pt* model downloaded to your computer, since we don't want to train a model from scratch, only use one. *OpenCV* is also a requirement here. The *.h5* file in this folder is for earlier versions of the YOLOv3 network that ran on a Keras/TensorFlow backend; it is provided for completeness.
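Before running the cells, it can help to confirm that the required packages are importable. The snippet below is a small optional check, not part of the original notebook; the helper name `missing_packages` is made up, and the mapping simply pairs each import name with the name usually used to install it with pip.

```python
from importlib.util import find_spec

# import name -> usual pip package name
REQUIRED = {
    "torch": "torch",
    "torchvision": "torchvision",
    "imageai": "imageai",
    "cv2": "opencv-python",
}

def missing_packages(modules):
    """Return the pip names of any modules that cannot be found in this environment."""
    return [pip_name for import_name, pip_name in modules.items()
            if find_spec(import_name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```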

In [2]:
from imageai.Detection import ObjectDetection
import cv2

obj_detect = ObjectDetection()
obj_detect.setModelTypeAsYOLOv3()
obj_detect.setModelPath('path to .pt pretrained model')  # replace with the location of your yolov3.pt file
obj_detect.loadModel()


cam_feed = cv2.VideoCapture(0) #set input pipeline to be camera feed
cam_feed.set(cv2.CAP_PROP_FRAME_WIDTH, 650) #frame width and height, you can change these
cam_feed.set(cv2.CAP_PROP_FRAME_HEIGHT, 750)

while True:    # runs until you press q or Esc
    ret, img = cam_feed.read()
    if not ret:    # stop if the camera did not return a frame
        break
    annotated_image, preds = obj_detect.detectObjectsFromImage(
        input_image=img,
        output_type="array",
        display_percentage_probability=False,
        display_object_name=True)
    # Each frame is detected individually. With a good GPU the frame rate can be reasonable,
    # but on a laptop with only a CPU I get around 8 fps, so keep this in mind.

    cv2.imshow("", annotated_image)

    key = cv2.waitKey(1) & 0xFF    # poll the keyboard once per frame so no key press is missed
    if key in (ord("q"), 27):      # quit on q or Esc (key code 27)
        break

cam_feed.release()
cv2.destroyAllWindows()
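The loop above only displays the annotated frame, but the second return value, `preds`, holds the detections themselves. In ImageAI this is documented as a list of dictionaries with the keys `name`, `percentage_probability`, and `box_points`; it is worth confirming this against your installed version. The sketch below shows one way to summarize such a list. The `summarize` helper, the 60% cut-off, and the sample detections are made up so that the snippet runs without a camera or model.

```python
from collections import Counter

# Made-up detections in the shape ImageAI documents for its results.
sample_preds = [
    {"name": "person", "percentage_probability": 97.3, "box_points": [34, 20, 310, 470]},
    {"name": "cup", "percentage_probability": 61.8, "box_points": [400, 300, 460, 380]},
    {"name": "person", "percentage_probability": 55.1, "box_points": [500, 40, 640, 470]},
]

def summarize(preds, min_probability=60.0):
    """Count detected objects per class, ignoring detections below min_probability."""
    confident = [p for p in preds if p["percentage_probability"] >= min_probability]
    return Counter(p["name"] for p in confident)

summary = summarize(sample_preds)
print(dict(summary))
```

Inside the webcam loop, the same helper could be called as `summarize(preds)` on each frame.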

## Same but now from a stored file

Just make sure the paths are correct. This cell adds bounding boxes and classifications to the video specified by input_file and stores the output at output_file. The file *input_video_yol.mp4* is provided to use as a test case.

In [None]:
from imageai.Detection import VideoObjectDetection

vid_obj_detect = VideoObjectDetection()

vid_obj_detect.setModelTypeAsYOLOv3()

vid_obj_detect.setModelPath('path to .pt pretrained model')  # use the .pt model; the .h5 file is for the old Keras/TensorFlow backend
vid_obj_detect.loadModel()

input_file='path to input'
output_file='path to output'

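detected_vid_obj = vid_obj_detect.detectObjectsFromVideo(
    input_file_path=input_file,
    output_file_path=output_file,
    frames_per_second=15,    # frame rate of the saved output video
    log_progress=True,       # print progress to the console while processing
    return_detected_frame=True,
)

print(detected_vid_obj)    # path of the saved, annotated video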
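The `return_detected_frame` option is mainly useful together with a per-frame callback. According to ImageAI's video detection documentation, a function passed as `per_frame_function` is called after each frame with the frame number, the list of detections, and a dictionary of per-class counts, plus the annotated frame when `return_detected_frame=True`. The sketch below assumes that signature (check it against your installed version). The callback name `for_frame` and the fake input are made up so the example runs without a model.

```python
frame_counts = {}

def for_frame(frame_number, output_array, output_count, returned_frame=None):
    """Record how many objects of each class were seen in each frame."""
    frame_counts[frame_number] = dict(output_count)

# Simulate what the detector would pass in for two frames (made-up values).
for_frame(1, [{"name": "car"}], {"car": 1})
for_frame(2, [{"name": "car"}, {"name": "person"}], {"car": 1, "person": 1})
print(frame_counts)

# With a loaded model, the callback would be passed to the detector like this:
# vid_obj_detect.detectObjectsFromVideo(
#     input_file_path=input_file,
#     output_file_path=output_file,
#     per_frame_function=for_frame,
#     return_detected_frame=True,
# )
```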