# AI Traffic Counter Using YOLOv3 and OpenCV
This project implements an AI traffic counter that detects and tracks vehicles in a video stream and counts those that cross a defined line on each side of a highway. It uses the following two algorithms:
- YOLO to detect objects in each video frame.
- SORT to track those objects across frames.

The project walks through the implementation of the counter using YOLOv3 and OpenCV to detect and count vehicles in video files in real time. Computer vision is a major part of the data science/AI domain and has advanced substantially over the last few years.


## YOLO algorithm
In recent years, deep learning algorithms have delivered state-of-the-art results for object detection. YOLO is one of the most popular convolutional neural network approaches: a single end-to-end model that performs object detection in real time. YOLO stands for You Only Look Once. The algorithm was developed by Joseph Redmon et al. and first described in the 2015 paper titled “You Only Look Once: Unified, Real-Time Object Detection”. It stemmed from the idea of placing a grid over the image and applying image classification and localization to each grid cell.
This project uses YOLOv3, a refined design that uses predefined anchor boxes to improve bounding-box prediction, to detect objects in new images. Source code and pre-trained models for YOLOv3 are available in the official DarkNet GitHub repository.
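
To make the detector output more concrete, here is a minimal sketch of how a single output row can be decoded, following the layout the detection loop later in this project relies on: four normalized box values (center x, center y, width, height), one objectness value, then one score per class. The function name `decode_detection` and the three-class example row are assumptions made for illustration only; the real COCO model produces 80 class scores per row.

```python
def decode_detection(detection, frame_w, frame_h):
    """Turn one YOLO output row into ([x, y, w, h], class_id, confidence)."""
    # First four values: box center and size, normalized to the range [0, 1].
    center_x = detection[0] * frame_w
    center_y = detection[1] * frame_h
    width = detection[2] * frame_w
    height = detection[3] * frame_h

    # Values from index 5 onwards are the per-class scores.
    scores = list(detection[5:])
    class_id = max(range(len(scores)), key=lambda k: scores[k])
    confidence = float(scores[class_id])

    # Move from the box center to its top-left corner.
    x = int(center_x - width / 2)
    y = int(center_y - height / 2)
    return [x, y, int(width), int(height)], class_id, confidence


# Hypothetical row with three classes instead of the 80 COCO classes.
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.05, 0.8]
box, class_id, confidence = decode_detection(row, frame_w=800, frame_h=400)
print(box, class_id, confidence)
```

The sketch works on plain lists as well as NumPy rows, so it can be tried without loading the network.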


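### Download the Pre-Trained Model
The first step is to download the pre-trained model weights, which were trained with the DarkNet code base on the COCO dataset, and place them in the current working directory under the file name “yolov3-tiny.weights”. Note that the loading code further down reads yolov3.weights and yolov3.cfg, so make sure the file names passed to cv2.dnn.readNet match the model files you actually downloaded.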

- **YOLOv3-tiny Pre-trained Model Weights can be downloaded at** <a href="https://pjreddie.com/media/files/yolov3-tiny.weights" target="_blank">yolov3-tiny.weights(34 MB)</a>.
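
If you prefer to fetch the weights from a script, the following is a minimal sketch that downloads the file only when it is not already present. The helper name `ensure_file` and its `fetch` parameter are assumptions introduced here for illustration; the URL is the one linked above.

```python
import os
import urllib.request

WEIGHTS_URL = "https://pjreddie.com/media/files/yolov3-tiny.weights"


def ensure_file(path, url, fetch=urllib.request.urlretrieve):
    """Download `url` to `path` unless the file already exists.

    Returns True if a download was performed, False otherwise.
    """
    if os.path.exists(path):
        return False
    fetch(url, path)
    return True


# Usage (uncomment to download the weights for real):
# ensure_file("yolov3-tiny.weights", WEIGHTS_URL)
```

Passing the download function as a parameter keeps the helper easy to try out without network access.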


## SORT Algorithm
Simple Online and Realtime Tracking (SORT) is an implementation of the tracking-by-detection framework, in which objects are detected in each frame and information from past and current frames is used to produce object identities on the fly. It is designed for online and real-time tracking applications. SORT was initially described in [this paper](http://arxiv.org/abs/1602.00763) by Alex Bewley et al.
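
SORT decides which new detection belongs to which existing track by comparing bounding boxes with the intersection-over-union (IoU) measure. The sketch below shows only that overlap measure, not the Kalman filter prediction or the assignment step of the full algorithm. The function name `iou` is chosen here for illustration, and boxes are assumed to be in the `[x1, y1, x2, y2]` corner format that this project passes to the tracker.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou([0, 0, 10, 10], [5, 0, 15, 10]))    # partially overlapping boxes
print(iou([0, 0, 10, 10], [20, 20, 30, 30]))  # disjoint boxes
```

A detection that overlaps strongly with the predicted position of a track is likely the same vehicle, which is why a high IoU favors keeping the same identity.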


## OpenCV
OpenCV is an open source library that provides tools for image and video processing in computer vision and machine learning applications. It is particularly popular for real-time operations, which are very important in today’s systems. Its integration with Python and libraries such as NumPy makes it easy to analyze OpenCV array structures and apply mathematical operations to them.

In [7]:
# import the necessary packages
import numpy as np
import imutils
import time
import cv2
import glob

import warnings
warnings.filterwarnings("ignore")

from sort import Sort

**Loading the model:**
OpenCV is one of the most widely used computer vision libraries, and it includes functionality for running deep learning inference. The OpenCV DNN module supports deep learning inference on images and videos. However, it does not support fine-tuning or training.

The OpenCV DNN module is highly optimized for Intel processors and can achieve a high frame rate (FPS) when running inference on real-time videos for object detection and image segmentation applications. Here, the DNN module runs inference with the pre-trained YOLOv3 model.

In [12]:
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

classes = []
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
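# look up the names of the YOLO output layers; flatten() handles both the
# nested index format of older OpenCV releases and the flat format of newer ones
ln = net.getLayerNames()
ln = [ln[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]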

conf_threshold = 0.3
nms_threshold = 0.2

The following functions check whether a line segment, such as a vehicle's path between two frames, intersects one of the lines defined in the frame:

In [13]:
# Return true if line segments AB and CD intersect
def intersect(A,B,C,D):
    return ccw(A,C,D) != ccw(B,C,D) and ccw(A,B,C) != ccw(A,B,D)

def ccw(A,B,C):
    return (C[1]-A[1]) * (B[0]-A[0]) > (B[1]-A[1]) * (C[0]-A[0])
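
As a quick worked example of the intersection test, the sketch below repeats the two functions and applies them to a horizontal line and three short vertical movements. The coordinates are made-up values chosen for illustration.

```python
def ccw(A, B, C):
    # True if the points A, B, C are in counter-clockwise order
    # (in a coordinate system with the y axis pointing up).
    return (C[1] - A[1]) * (B[0] - A[0]) > (B[1] - A[1]) * (C[0] - A[0])


def intersect(A, B, C, D):
    # Segments AB and CD cross when A and B lie on opposite sides of CD
    # and C and D lie on opposite sides of AB.
    return ccw(A, C, D) != ccw(B, C, D) and ccw(A, B, C) != ccw(A, B, D)


line = [(0, 100), (200, 100)]  # hypothetical counting line

crossing = intersect((50, 90), (50, 110), line[0], line[1])      # moves across the line
same_side = intersect((50, 80), (50, 95), line[0], line[1])      # stays above the line
beyond_end = intersect((250, 90), (250, 110), line[0], line[1])  # passes beside the line

print(crossing, same_side, beyond_end)
```

Only the first movement counts as a crossing: the second never reaches the line and the third passes outside its end points.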

A traffic video stream of a highway is read by creating a cv2.VideoCapture object, a class for capturing video from video files, image sequences, or cameras. The total number of frames in the video is then printed.

Processing a video means performing operations on it frame by frame. A frame is the video at a single point in time, so it can be treated as a regular image.

Each iteration starts with vs.read(), which returns a pair: a bool and the frame itself. The bool is True if the frame was read correctly, so you can detect the end of the video by checking this value.
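
The read-and-check pattern can be sketched as a small generator. `iter_frames` and `FakeCapture` are hypothetical names used only for this illustration; `FakeCapture` stands in for cv2.VideoCapture so that the sketch runs without a video file, and it assumes nothing about the capture object beyond a `read()` method returning a `(grabbed, frame)` pair.

```python
def iter_frames(capture):
    """Yield frames from any object whose read() returns (grabbed, frame)."""
    while True:
        grabbed, frame = capture.read()
        if not grabbed:  # end of the stream, or the frame could not be read
            break
        yield frame


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture, for illustration only."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


frames = list(iter_frames(FakeCapture(["frame0", "frame1", "frame2"])))
print(len(frames))
```

With a real video, the same generator could be called as `iter_frames(cv2.VideoCapture("input/highway.mp4"))`.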

Each frame is processed like a still image, and the object detection steps are the same as those described in my previous project **Image Object Detector** (see the following GitHub link):
https://github.com/majid-hosseini/Image-Object-Detector

In [16]:
# initialize the video stream, pointer to output video file, and
# frame dimensions
vs = cv2.VideoCapture("input/highway.mp4")
_,frame= vs.read()
(H, W) = frame.shape[:2]

writer = None
frameIndex = 1

# try to determine the total number of frames in the video file
try:
    prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT if imutils.is_cv2() \
        else cv2.CAP_PROP_FRAME_COUNT
    total = int(vs.get(prop))
    print("[INFO] {} total frames in video".format(total))

# an error occurred while trying to determine the total
# number of frames in the video file
except Exception:
    print("[INFO] could not determine # of frames in video")
    print("[INFO] no approx. completion time can be provided")
    total = -1
    
cross_check = []
tracker = Sort()
memory = {}
time_test = {}
time_for_speed = []

dict_id_speed = {}
counter1 = 0
counter2 = 0

[INFO] 812 total frames in video


Defining the two lines on both directions of the highway for tracking of vehicles:

In [None]:
line1 = [(170, int(0.7*H)), (W//2 - 70, int(0.7*H))]
    
line2 = [(W//2+30, int(0.55*H)), (W - 350, int(0.55*H))]
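
The line definitions mix fractions of the frame height with fixed pixel offsets, so they implicitly assume a reasonably wide frame. The sketch below wraps the same expressions in a function and adds a simple check that both lines fall inside the frame. The names `counting_lines` and `inside_frame`, and the 1280 x 720 frame size, are assumptions made for illustration.

```python
def counting_lines(W, H):
    """Return the two counting lines of this project for a W x H frame."""
    line1 = [(170, int(0.7 * H)), (W // 2 - 70, int(0.7 * H))]
    line2 = [(W // 2 + 30, int(0.55 * H)), (W - 350, int(0.55 * H))]
    return line1, line2


def inside_frame(line, W, H):
    """True if both end points of the line lie inside the frame."""
    return all(0 <= x < W and 0 <= y < H for (x, y) in line)


# Assumed frame size, used only for this example.
line1, line2 = counting_lines(1280, 720)
print(inside_frame(line1, 1280, 720), inside_frame(line2, 1280, 720))

# A much narrower frame pushes the end of line2 outside the image.
small1, small2 = counting_lines(300, 200)
print(inside_frame(small2, 300, 200))
```

For a video with a different resolution or camera angle, the offsets will most likely need to be adjusted by hand.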


## Implementation of AI traffic counter
**This AI traffic counter is composed of three main components: a detector, a tracker, and a counter.**

- The detector processes a real-time video, identifies the vehicles in each frame, and returns a list of bounding boxes around them. It was explained in my previous project titled [Realtime Object Detector](https://github.com/majid-hosseini/Realtime-Object-Detector).
- The tracker uses the bounding boxes to track the vehicles in subsequent frames. The detector is also used to update the trackers periodically to ensure that they are still tracking the vehicles correctly.
- Once the objects are detected and tracked across the frames of a traffic video stream, a geometric test counts the vehicles whose path between their previous and current frame positions crosses a defined line in the frame.
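
The counting step described in the last bullet can be sketched without any video at all. The example below assumes the tracker output has already been collected as one dictionary per frame that maps a track ID to its `[x1, y1, x2, y2]` box; the names `centroid` and `count_crossings` and the sample boxes are made up for illustration, and a set is used here to make sure each ID is counted only once.

```python
def ccw(A, B, C):
    return (C[1] - A[1]) * (B[0] - A[0]) > (B[1] - A[1]) * (C[0] - A[0])


def intersect(A, B, C, D):
    return ccw(A, C, D) != ccw(B, C, D) and ccw(A, B, C) != ccw(A, B, D)


def centroid(box):
    """Center point of a box given as [x1, y1, x2, y2]."""
    return (int(box[0] + (box[2] - box[0]) / 2),
            int(box[1] + (box[3] - box[1]) / 2))


def count_crossings(frames_of_tracks, line):
    """Count track IDs whose center moves across `line` between two frames."""
    counted = set()
    previous = {}
    for current in frames_of_tracks:
        for track_id, box in current.items():
            if track_id in previous and track_id not in counted:
                p0 = centroid(box)                 # position in this frame
                p1 = centroid(previous[track_id])  # position in the last frame
                if intersect(p0, p1, line[0], line[1]):
                    counted.add(track_id)
        previous = current
    return len(counted)


line = [(0, 100), (200, 100)]
tracks_per_frame = [
    {1: [40, 60, 60, 80], 2: [140, 20, 160, 40]},
    {1: [40, 80, 60, 100], 2: [140, 30, 160, 50]},
    {1: [40, 100, 60, 120], 2: [140, 40, 160, 60]},
    {1: [40, 120, 60, 140]},
]
total_crossings = count_crossings(tracks_per_frame, line)
print(total_crossings)
```

In this made-up sequence, vehicle 1 moves down across the line and is counted once, while vehicle 2 never reaches it.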

### Display of Processed Video
The processed frames, with a bounding box and class name drawn around each detected object, are displayed using the cv2.imshow method. The window name is passed as the first argument and the frame to display as the second. The delay between consecutive frames in the display window is set by cv2.waitKey(), in milliseconds. A small value results in fast playback and a large value produces slow-motion playback.
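
As a rough guide for choosing the delay, the sketch below converts a target playback rate into a value in milliseconds for cv2.waitKey(). The helper name `frame_delay_ms` is an assumption made for this example. The result is clamped to at least 1 because a delay of 0 makes cv2.waitKey() wait indefinitely for a key press.

```python
def frame_delay_ms(fps):
    """Delay in milliseconds that gives roughly `fps` displayed frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Never return 0: cv2.waitKey(0) blocks until a key is pressed.
    return max(1, round(1000 / fps))


print(frame_delay_ms(25))
print(frame_delay_ms(30))
```

The actual display rate will be lower than the target whenever detection and tracking take longer than the delay itself.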

### Saving a Video
We have processed the video frame by frame and drawn a bounding box and class name around each detected object; now we want to save the result. For images this is very simple: just use cv2.imwrite(). For video, a little more work is required.

This time we create a VideoWriter object. We specify the output file name (e.g. output.avi), then the FourCC code, then the number of frames per second (fps) and the frame size. The last argument is the isColor flag: if it is True, the encoder expects color frames; otherwise it works with grayscale frames.
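
The FourCC code is simply four characters packed into one integer. The sketch below shows that packing in plain Python, which is my understanding of what cv2.VideoWriter_fourcc computes; the function name `fourcc_code` is an assumption made for this illustration, and in real code you would call the OpenCV function directly.

```python
def fourcc_code(code):
    """Pack a four-character codec identifier into a single integer."""
    if len(code) != 4:
        raise ValueError("a FourCC code has exactly four characters")
    c1, c2, c3, c4 = (ord(ch) & 255 for ch in code)
    # The first character ends up in the lowest byte.
    return c1 | (c2 << 8) | (c3 << 16) | (c4 << 24)


print(hex(fourcc_code("mp4v")))

# With OpenCV, the writer in this project is created along these lines:
# writer = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 30, (W, H), True)
```

Which codecs are actually available depends on the platform and on how OpenCV was built, so a given FourCC code may not work everywhere.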

vs.release() and cv2.destroyAllWindows() close the video file or capturing device and destroy the window created by the imshow method. writer.release() finalizes the output video file.

In [17]:
# loop over frames from the video file stream
while True:
    # read the next frame from the file
    (grabbed, frame) = vs.read()    
    if not grabbed:
        break
    
    # construct a blob from the input frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (256, 256), swapRB=True, crop=False)
    net.setInput(blob)
    start = time.time()
    layerOutputs = net.forward(ln)
    end = time.time()
    
    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    center = []
    confidences = []
    classIDs = []
    
    # loop over each of the layer outputs
    for output in layerOutputs:
        # loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability)
            # of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
    
            # filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability
            if confidence > conf_threshold:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                    
                # use the center (x, y)-coordinates to derive the top
                # left corner of the bounding box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
    
                # update our list of bounding box coordinates,
                # confidences, and class IDs
                center.append(int(centerY))
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
                    
    # apply non-maxima suppression to suppress weak, overlapping bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
        
    dets = []
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            dets.append([x, y, x+w, y+h, confidences[i]])
    np.set_printoptions(formatter={'float': lambda x: "{0:0.3f}".format(x)})
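    # SORT expects an (N, 5) array, including for frames with no detections
    dets = np.asarray(dets) if len(dets) > 0 else np.empty((0, 5))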
    tracks = tracker.update(dets)
        
    boxes = []
    indexIDs = []
    c = []
        
    previous = memory.copy()
    memory = {}
    
    for track in tracks:
        boxes.append([track[0], track[1], track[2], track[3]])
        indexIDs.append(int(track[4]))
        memory[indexIDs[-1]] = boxes[-1]
    
    for i in range(len(boxes)):
        box = boxes[i]
        
        # extract the bounding box coordinates
        (x, y) = (int(box[0]), int(box[1]))
        (w, h) = (int(box[2]), int(box[3]))
            
        # draw a bounding box rectangle and label on the image
        color = (0,100,255)
        cv2.rectangle(frame, (x, y), (w, h), color, 2)
        
        if indexIDs[i] in previous:
            previous_box = previous[indexIDs[i]]
            (x2, y2) = (int(previous_box[0]), int(previous_box[1]))
            (w2, h2) = (int(previous_box[2]), int(previous_box[3]))
            p0 = (int(x + (w-x)/2), int(y + (h-y)/2))
            p1 = (int(x2 + (w2-x2)/2), int(y2 + (h2-y2)/2))
            #cv2.line(frame, p0, p1, color, 3)
            id = indexIDs[i] 
            
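            #########################################################
            # NOTE: classIDs is ordered by raw detection, not by track, so
            # index i is not guaranteed to refer to this tracked vehicle's class
            label = str(classes[classIDs[i]])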
            font = cv2.FONT_HERSHEY_PLAIN
            cv2.putText(frame, label, (x, y - 5), font, 1.0, color, 2)
            if intersect(p0, p1, line1[0], line1[1]) and indexIDs[i] not in cross_check:
                counter1 += 1
                cross_check.append(indexIDs[i])

            if intersect(p0, p1, line2[0], line2[1]) and indexIDs[i] not in cross_check:
                counter2 += 1
                cross_check.append(indexIDs[i])
                
    cv2.line(frame, line1[0], line1[1], (0, 0, 255), 3)
    cv2.line(frame, line2[0], line2[1], (0, 210, 255), 3)
    ##############################################
    # draw counter
    counter_text = "Counter:{}".format(counter1)
    counter_text_2 = "Counter:{}".format(counter2)
    font = cv2.FONT_HERSHEY_TRIPLEX
    cv2.putText(frame, counter_text, (100,100), font, 2, (0, 0, 255), 4)
    cv2.putText(frame, counter_text_2, (H-10,100), font, 2, (0, 210, 255), 4)
    cv2.imshow("Image", frame)
        
    key = cv2.waitKey(1)
    if key == 27:
        break
            
    # check if the video writer is None
    if writer is None:
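        # initialize our video writer; the frame size must match the frames
        # passed to writer.write() below, otherwise frames may be silently dropped
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        writer = cv2.VideoWriter("output.mp4", fourcc, 30, (W, H), True)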

        # some information on processing single frame
        if total > 0:
            elap = (end - start)
            print("[INFO] single frame took {:.4f} seconds".format(elap))
            print("[INFO] estimated total time to finish: {:.4f}".format(
                elap * total))

    # write the output frame to disk
    #new_dim = (1080,720)
    new_dim = (W,H)
    writer.write(cv2.resize(frame,new_dim, interpolation = cv2.INTER_AREA))

    # increase frame index
    frameIndex += 1


print("[INFO] cleaning up...")
if writer is not None:
    writer.release()
vs.release()
cv2.destroyAllWindows()

[INFO] single frame took 3.6262 seconds
[INFO] estimated total time to finish: 2944.4531
[INFO] cleaning up...
