### Where does Yolo come from? Background
To understand "Yolo" you must first know about computer vision and a little bit about it's history
The art of making computer look at things and recognize, detect and diff is called computer vision
it has been one of the most difficult attribute of learning for computers despite the fact that humans excel at it without much effort.
There had been many efforts on the past to teach computers understanding the art of looking and interpreting things since the emergence of A.I in 1960s

### Why Yolo?
There are many algorithms used for computer vision so why do we need YOLO?
In the field of computer vision other algorithms performed very well but problem occured at the real time object detection or recognition as in cases of autonomous vehicle, this is where YOLO shines.
YOLO is a state of the art computer vision algorithm that is faster then the fastest algorithms out there and doesn't fall behind in the accuracy field either

### How does it work?
unlike previous attempts at Object detection algorithms used to process images thousands of time to make a prediction which at the end was quite time taking and also computationally expensive. Instead what yolo does is passes the images through algo just once as it's name says "You Only Look Once".
- What it does is divides the image into grids of small cells
- each cell predicts if some object is in there with a certain confidence level 
- then where is it
- what class does it belong to <br />
and then we set some threshold to drop the predictions with low confidences

#### Prediction validation
Yolo uses intersection over union approach to decide whether the prediction was right or not
i.e the bounding boxes of prediction and actual target value are used to calculate Intersction over union 
IoU = Area of Intersection/ Area of Union <br />
A certain threshold either discards or keeps the prediction

### Non-Maximum Suppression
In CV algorithms have a tendency of detecting an object not just once but many more times creating more than one bounding boxes around a single object, in YOLO nms is used to overcome that
How nms works is :
1. It looks at all the boxes around a single object and selects the one with heighest probability.
2. The boxes which have high IoU with the current box are suppressed.
3. Then it selects the one with next heighest probability/confidence.
4. Recursive

### Anchor boxes
As discussed above if we went and applied that way on a picture that has two objects in a same grid it will only 
return one bounding box which isn't the case in reality hence anchor boxes come into play
We define number of anchor boxes and so we get that many outputs outta each grid
<br />
![no anchor](no_anchor.png)
![2 anchors used ](anchor.png)

Here,
- pc defines whether an object is present in the grid or not (it is the confidence score)
- bx, by, bh, bw specify the bounding box if there is an object( x,y being the center, h and w being height and width)
- c1, c2, c3 represent the classes. incase of three classes

### Flattening 
Flatten means that anything greater than 1 dimension must be convert to 1D.

By now you should have understood what YOLO is and how it works, let's get to the coding now

In [1]:
import cv2 as cv
import numpy as np
import csv

In [2]:
#Write down conf, nms thresholds,inp width/height
confThreshold = 0.25
nmsThreshold = 0.40
inpWidth = 416
inpHeight = 416

#Load names of classes and turn that into a list
classesFile = "yolov3"
classes = None

with open(classesFile,'rt') as f:
    classes = f.read().rstrip('\n').split('\n')

#Model configuration
modelConf = 'yolov3.cfg'
modelWeights = 'yolov3.weights'

#Set up the net
net = cv.dnn.readNetFromDarknet(modelConf, modelWeights)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

#Process inputs
winName = 'DL OD with OpenCV'
cv.namedWindow(winName, cv.WINDOW_NORMAL)

In [3]:
import csv
def writeInCsv(imname, count):
    with open("result.csv",'a') as output:
        output_data = csv.writer(output,delimiter = ',')
        output_data.writerow([imname,count])

In [4]:
def postprocess(frame, outs):
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]

    classIDs = []
    confidences = []
    boxes = []
    count = 0

    for out in outs:
        for detection in out:         
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            if confidence > confThreshold:
                centerX = int(detection[0] * frameWidth)
                centerY = int(detection[1] * frameHeight)
                width = int(detection[2]* frameWidth)
                height = int(detection[3]*frameHeight)
                left = int(centerX - width/2)
                top = int(centerY - height/2)
                if classes[classID] == 'person':
                    classIDs.append(classID)
                    confidences.append(float(confidence))
                    boxes.append([left, top, width, height])

    indices = cv.dnn.NMSBoxes (boxes,confidences, confThreshold, nmsThreshold )

    indices = cv.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
    for i in indices:
        i = i[0]
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        count += 1
        drawPred(classIDs[i], confidences[i], left, top, left + width, top + height,count)
    return count

def drawPred(classId, conf, left, top, right, bottom,count):
    # Draw a bounding box.
    cv.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)
    label = '%.2f' % conf

    # Get the label for the class name and its confidence
    if classes:
        assert (classId < len(classes))
        label = '%s:%s:%s' % (classes[classId], label,count)

    cv.putText(frame, label, (left,top), cv.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 1)

def getOutputsNames(net):
    # Get the names of all the layers in the network
    layersNames = net.getLayerNames()
   
    # Get the names of the output layers, i.e. the layers with unconnected outputs
    return [layersNames[i[0] - 1] for i in net.getUnconnectedOutLayers()]
                
fileToInfer = input("Enter the input type \nV for video \nI for images \n0 for webcam")
frame_num = 0
if fileToInfer == '0':
    cap = cv.VideoCapture(0)
    frame_width = int(cap.get(3))
    frame_height = int(cap.get(4))
    while cv.waitKey(1) < 0:
        ret, frame = cap.read()
        if ret == False:
            print("video not captured")
            break
        else:
            blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop = False)

            net.setInput(blob)
            outs = net.forward(getOutputsNames(net))

            count = postprocess(frame, outs)
            #show the image
            cv.imshow(winName, frame)
            imgName = "web_%s"%frame_num
            frame_num += 1
            writeInCsv(imgName, count)
elif fileToInfer == 'I':
    imgName = input("Image Name: ")
    frame = cv.imread(imgName)
    blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop = False)

    net.setInput(blob)
    outs = net.forward (getOutputsNames(net))

    count = postprocess(frame, outs)
    #show the image
    cv.imshow(winName, frame)
    writeInCsv(imgName, count)
else:
    vidName = input("Video Name: ")
    cap = cv.VideoCapture(vidName)
    frame_width = int(cap.get(3))
    frame_height = int(cap.get(4))
    while cv.waitKey(1) < 0:
        ret, frame = cap.read()
        if ret == False:
            print("video not captured")
            break
        else:
            blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop = False)

            net.setInput(blob)
            outs = net.forward (getOutputsNames(net))

            count = postprocess (frame, outs)
            #show the image
            cv.imshow(winName, frame)
            imgName = "%s:%s"%(vidName,frame_num)
            frame_num += 1
            writeInCsv(imgName, count)
cv.waitKey(0)
cv.destroyAllWindows()

Enter the input type 
V for video 
I for images 
0 for webcamI
Image Name: q


error: OpenCV(4.1.2) C:\projects\opencv-python\opencv\modules\imgproc\src\resize.cpp:3720: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'
