# ECSE415 - Project
Kalvin Khuu (260983539)

## Setting up

In [1]:
mcgill_input_video_path = 'C:/Users/Kalvin/Documents/GitHub/ecse415/project/mcgill_drive.mp4'
st_cat_input_video_path = 'C:/Users/Kalvin/Documents/GitHub/ecse415/project/st-catherines_drive.mp4'

output_video_path = 'C:/Users/Kalvin/Documents/GitHub/ecse415/project/deepsort_output_video.mp4'

## Object Detection & Tracking

### YOLOv8 with DeepSORT

Source: https://github.com/Ikomia-dev/notebooks/blob/main/examples/HOWTO_use_DeepSORT_with_Ikomia_API.ipynb 


**Setting up** 

Using the Ikomia API, the implementation takes the YOLOv8 model for object tracking and DeepSort for object tracking.

- YOLOv8 (You Only Look Once): It utilizes a single convolutional neural network to predict multiple bounding boxes and class probabilities simultaneously across an image divided into a grid. YOLOv8 is expected to incorporate advanced features like Feature Pyramid Networks, sophisticated data augmentation, and efficiency optimizations to enhance accuracy and speed, making it suitable for applications requiring immediate detection responses such as video surveillance and autonomous driving.

- DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric): It integrates a deep learning feature extractor for appearance descriptors, enhancing its ability to track objects across frames despite changes in orientation, scale, and occlusion. DeepSORT employs a Kalman filter for state estimation and a Hungarian algorithm for assignment optimization, significantly boosting tracking reliability and performance in complex dynamic environments.

With both of these models, it would make it possible to keep track of objects even after something blocked the direct view from it. By doing so, it prevents the implementation from repetitive counting.

Parameters:
- "categories": "car, person, bicycle, traffic light, stop sign, parking meter" (Only detecting and tracking certain objects to save computing power)
- "conf_thres": "0.5" (Setting the confidence percentage relatively high to reduce possible double counting)
- "device": "gpu"

For each element, the program needs different confidence values (cars with 70% while person and bicycle with 50%).

**Assumption**

For this implementation, pedestrian and cyclist must be differentiated. However, YOLOv8 can not differentiate itself. Therefore, the assumption is that a cyclist is defined when a person area intersects with a bicycle area.

**Limitations**

As mentionned in the assumption section, a cyclist is defined when a person intersects with a bicycle. This means that a person walking in front or behind a bicycle would be defined as a cyclist. 
Even with the higher threshold of confidence, this implementation might have some occasional double counting.

## Classifying parked and moving cars

### K-means clustering with Farneback method

Source: https://www.geeksforgeeks.org/opencv-the-gunnar-farneback-optical-flow/

To classify parked and moving cars from a dash cam video, the main idea would be to use the optical flow features of each object calculated using Farneback method and cluster them in a K-mean clustering model with K = 2.

**Setting up**
Using the function of OpenCV, calcOpticalFlowFarneback, and the python class of sklearn, KMeans

- Farneback Algorithm: It uses polynomial expansion to approximate neighborhood pixels, enabling the calculation of displacements and motion vectors efficiently. This technique is particularly useful in video processing and computer vision applications for detecting and analyzing motion across frames, such as in surveillance and activity recognition systems.
- K-Means : It works by assigning data points to the nearest cluster center, recalculating cluster centers as the mean of the assigned points, and iteratively refining the process until convergence. K-means is favored for its simplicity and efficiency in processing large datasets, making it suitable for a variety of applications ranging from market segmentation to image compression.

By identifying the magnitude and the angles of an object for multiple frames, the idea would be see two clusters of objects with K = 2, one with immobile objects such as parked cars and traffic lights and one with moving object (cars).
For the sake of computing time/power, this implementation downsize the number of magnitudes and angles for each objects (0.5 for the McGill video and 0.2 for the St-Catherines video)

**Assumption**
As the car mostly moves at a constant speed, it is possible to refer to immobile objects to determine which cars are parked and which ones are moving. 
As the size of each results of magnitude and angle of the optical flow is different between objects, we need to preprocess them similar to words (https://stats.stackexchange.com/questions/127484/cluster-sequences-of-data-with-different-length).

**Limitations**
Pretty high computational power/time required for this step and as the sample pool is relatively small, two videos of 40 seconds, this model can definitly overfit or underfit some features.


In [2]:
!pip install ikomia

Collecting ikomia
  Using cached ikomia-0.10.0-cp310-none-win_amd64.whl (74.7 MB)
Collecting setuptools==59.5.0
  Using cached setuptools-59.5.0-py3-none-any.whl (952 kB)
Collecting pyyaml
  Using cached PyYAML-6.0.1-cp310-cp310-win_amd64.whl (145 kB)
Collecting python-dotenv>=0.18.0
  Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Collecting semver<4.0,>=3.0.1
  Using cached semver-3.0.2-py3-none-any.whl (17 kB)
Collecting mlflow==1.30.0
  Using cached mlflow-1.30.0-py3-none-any.whl (17.0 MB)
Collecting Flask<3
  Downloading flask-2.3.3-py3-none-any.whl (96 kB)
     ---------------------------------------- 0.0/96.1 kB ? eta -:--:--
     ---------------------------------------- 96.1/96.1 kB 2.8 MB/s eta 0:00:00
Collecting databricks-cli<1,>=0.8.7
  Using cached databricks_cli-0.18.0-py2.py3-none-any.whl (150 kB)
Collecting pytz<2023
  Using cached pytz-2022.7.1-py2.py3-none-any.whl (499 kB)
Collecting entrypoints<1
  Using cached entrypoints-0.4-py3-none-any.whl (5.3 kB)
Col

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython-sql 0.5.0 requires sqlalchemy>=2.0, but you have sqlalchemy 1.4.52 which is incompatible.

[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip


In [3]:
# HELPER FUNCTIONS

def intersects(box1, box2):
    # Extract coordinates from boxes
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # Calculate the coordinates of the intersection rectangle
    x_intersection = max(x1, x2)
    y_intersection = max(y1, y2)
    w_intersection = min(x1 + w1, x2 + w2) - x_intersection
    h_intersection = min(y1 + h1, y2 + h2) - y_intersection
    # Calculate the area of the intersection rectangle
    area_intersection = w_intersection * h_intersection
    # Calculate the area of the two boxes
    area_box1 = w1 * h1
    area_box2 = w2 * h2
    # Calculate the area of the union of the two boxes
    area_union = area_box1 + area_box2 - area_intersection
    # Calculate the intersection over union (IoU)
    iou = area_intersection / area_union
    return iou > 0


# Finding cyclists
def is_cyclist(bicycle_box, pedestrian_boxes):
    for pedestrian_box in pedestrian_boxes:
        # Check if the bicycle box intersects with any pedestrian box
        if intersects(bicycle_box, pedestrian_box):
            return True
    return False


### Detection and Tracking of objects for the McGill Video

In [69]:
# To detect unique objects and their Optical Flow
from ikomia.dataprocess.workflow import Workflow
import cv2
import numpy as np

# Init your workflow
wf = Workflow()

# Add object detection algorithm
detector = wf.add_task(name="infer_yolo_v8", auto_connect=True)

# Add ByteTrack tracking algorithm
tracking = wf.add_task(name="infer_deepsort", auto_connect=True)

detector.set_parameters({
    "categories": "all",
    "conf_thres": "0.5",
    "device": "gpu",
})


tracking.set_parameters({
    "categories": "all",
    "conf_thres": "0.5",
    "device": "gpu",
})

# Open the video file
stream = cv2.VideoCapture(mcgill_input_video_path)
if not stream.isOpened():
    print("Error: Could not open video.")
    exit()

# Sets for each unique object
unique_cars = set()
unique_pedestrians = set()

unique_cylists = set()
unique_parked_cars = set()
other_objects = set()

# Optical Flow
ret, prev_frame = stream.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

all_car_mags_mcgill = {}
all_car_angles_mcgill = {}

all_immobile_objects_mags_mcgill = {}
all_immobile_objects_angles_mcgill = {}

while True:
    # Read image from stream
    ret, frame = stream.read()

    # Test if the video has ended or there is an error
    if not ret:
        print("Info: End of video or error.")
        break

    # Run the workflow on current frame
    wf.run_on(array=frame)

    # Get results
    image_out = tracking.get_output(0)
    obj_detect_out = tracking.get_output(1)
    detected_objects = obj_detect_out.get_objects()

    # # Optical Flow
    # gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    # magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # # Used for figuring out cyclists
    # persons_found = []
    # for obj in detected_objects:
    #     id = obj.id
    #     label = obj.label
    #     confidence = obj.confidence
    #     bbox = obj.box

    #     if label == "car" and confidence > 0.7:
    #         unique_cars.add(id)

    #         # Extract the patch of the object
    #         mag_patch = magnitude[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
    #         angle_patch = angle[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
    #         if id not in all_car_mags_mcgill:
    #             all_car_mags_mcgill[id] = []
    #             all_car_angles_mcgill[id] = []
    #         # Downsample the patches to reduce the size of the data
    #         all_car_mags_mcgill[id].append(cv2.resize(mag_patch, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR))
    #         all_car_angles_mcgill[id].append(cv2.resize(angle_patch, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR))
            
    #     elif label == "person" and confidence > 0.5:
    #         unique_pedestrians.add(id)
    #         persons_found.append(bbox)
    #     elif label == "bicycle" and is_cyclist(bbox, persons_found) and confidence > 0.5:
    #         unique_cylists.add(id)
    #     elif label == "traffic light":
    #         other_objects.add(id)

    #         # Extract the patch of the object
    #         mag_patch = magnitude[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
    #         angle_patch = angle[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
    #         if id not in all_immobile_objects_mags_mcgill:
    #             all_immobile_objects_mags_mcgill[id] = []
    #             all_immobile_objects_angles_mcgill[id] = []
    #         all_immobile_objects_mags_mcgill[id].append(cv2.resize(mag_patch, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR))
    #         all_immobile_objects_angles_mcgill[id].append(cv2.resize(angle_patch, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR))
    img_out = image_out.get_image_with_graphics(obj_detect_out)
    img_res = cv2.cvtColor(img_out, cv2.COLOR_RGB2BGR)
    out.write(img_out)

            
    # Press 'q' to quit the video processing
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# After the loop release everything
stream.release()
cv2.destroyAllWindows()


0: 384x640 5 cars, 55.0ms
Speed: 2.0ms preprocess, 55.0ms inference, 5.0ms postprocess per image at shape (1, 3, 384, 640)
Successfully loaded imagenet pretrained weights from "C:\Users\Kalvin\Ikomia/Plugins/Python\infer_deepsort\models\checkpoints\osnet_x1_0_imagenet.pth"
Workflow Untitled run successfully in 1629.8303999999998 ms.

0: 384x640 4 cars, 18.5ms
Speed: 2.5ms preprocess, 18.5ms inference, 2.5ms postprocess per image at shape (1, 3, 384, 640)
Workflow Untitled run successfully in 61.476600000000005 ms.

0: 384x640 5 cars, 19.1ms
Speed: 2.0ms preprocess, 19.1ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)
Workflow Untitled run successfully in 61.0761 ms.

0: 384x640 5 cars, 40.1ms
Speed: 2.0ms preprocess, 40.1ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)
Workflow Untitled run successfully in 121.27250000000001 ms.

0: 384x640 5 cars, 33.1ms
Speed: 3.0ms preprocess, 33.1ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 6

In [70]:
print("=== RESULTS FOR MCGILL_DRIVE.MP4 ===")
print("\nTotal people (pedestrians and cyclists)")
print(len(unique_pedestrians))
print("\nCyclists")
print(len(unique_cylists))
print("\nPedestrians (total number of people - number of cyclists)")
print(len(unique_pedestrians) - len(unique_cylists))
print("\nTotal Cars")
print(len(unique_cars))



=== RESULTS FOR MCGILL_DRIVE.MP4 ===

Total people (pedestrians and cyclists)
42

Cyclists
7

Pedestrians (total number of people - number of cyclists)
35

Total Cars
55


### Detection and Tracking of objects for the St-Catherine Video

In [71]:
# To detect unique objects and their Optical Flow
from ikomia.dataprocess.workflow import Workflow
# from ikomia.utils.displayIO import display

import cv2
import numpy as np

# Init your workflow
wf = Workflow()

# Add object detection algorithm
detector = wf.add_task(name="infer_yolo_v8", auto_connect=True)

# Add ByteTrack tracking algorithm
tracking = wf.add_task(name="infer_deepsort", auto_connect=True)

detector.set_parameters({
    "categories": "all",
    "conf_thres": "0.5",
    "device": "gpu",
})


tracking.set_parameters({
    "categories": "all",
    "conf_thres": "0.5",
    "device": "gpu",
})

# Open the video file
stream = cv2.VideoCapture('C:/Users/Kalvin/Documents/GitHub/ecse415/project/st-catherines_drive.mp4')
if not stream.isOpened():
    print("Error: Could not open video.")
    exit()

# # Get video properties for the output
# frame_width = int(stream.get(cv2.CAP_PROP_FRAME_WIDTH))
# frame_height = int(stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
# frame_rate = stream.get(cv2.CAP_PROP_FPS)
# # Define the codec and create VideoWriter object
# # The 'XVID' codec is widely supported and provides good quality
# fourcc = cv2.VideoWriter_fourcc(*'XVID')
# out = cv2.VideoWriter('deepsort_output_video.avi', fourcc, frame_rate, (frame_width, frame_height))


# Sets for each unique object
unique_cars = set()
unique_pedestrians = set()

unique_cylists = set()
unique_parked_cars = set()
other_objects = set()

# Optical Flow
ret, prev_frame = stream.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

all_car_mags_st_cat = {}
all_car_angles_st_cat = {}

all_immobile_objects_mags_st_cat = {}
all_immobile_objects_angles_st_cat = {}

while True:
    # Read image from stream
    ret, frame = stream.read()

    # Test if the video has ended or there is an error
    if not ret:
        print("Info: End of video or error.")
        break

    # Run the workflow on current frame
    wf.run_on(array=frame)

    # Get results
    image_out = tracking.get_output(0)
    obj_detect_out = tracking.get_output(1)

    detected_objects = obj_detect_out.get_objects()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # all persons
    persons_found = []
    for obj in detected_objects:
        id = obj.id
        label = obj.label
        confidence = obj.confidence
        bbox = obj.box

        if label == "car" and confidence > 0.7:
            unique_cars.add(id)
            mag_patch = magnitude[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
            angle_patch = angle[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
            if id not in all_car_mags_st_cat:
                all_car_mags_st_cat[id] = []
                all_car_angles_st_cat[id] = []
            # Downsample the patches to reduce the size of the data
            all_car_mags_st_cat[id].append(cv2.resize(mag_patch, None, fx=0.2, fy=0.2, interpolation=cv2.INTER_LINEAR))
            all_car_angles_st_cat[id].append(cv2.resize(angle_patch, None, fx=0.2, fy=0.2, interpolation=cv2.INTER_LINEAR))
            
        elif label == "person" and confidence > 0.5:
            unique_pedestrians.add(id)
            persons_found.append(bbox)
        elif label == "bicycle"  and is_cyclist(bbox, persons_found) and confidence > 0.5:
            unique_cylists.add(id)
        elif label == "traffic light":
            other_objects.add(id)
            mag_patch = magnitude[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
            angle_patch = angle[int(bbox[1]):int(bbox[1]+bbox[3]), int(bbox[0]):int(bbox[0]+bbox[2])]
            if id not in all_immobile_objects_mags_st_cat:
                all_immobile_objects_mags_st_cat[id] = []
                all_immobile_objects_angles_st_cat[id] = []
            all_immobile_objects_mags_st_cat[id].append(cv2.resize(mag_patch, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR))
            all_immobile_objects_angles_st_cat[id].append(cv2.resize(angle_patch, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR))
    
    # # Convert the result to BGR color space for displaying
    # img_out = image_out.get_image_with_graphics(obj_detect_out)
    # img_res = cv2.cvtColor(img_out, cv2.COLOR_RGB2BGR)

    # # Save the resulting frame
    # out.write(img_out)

    # # Display
    # display(img_res, title="DeepSORT", viewer="opencv")
            
    # Press 'q' to quit the video processing
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# After the loop release everything
stream.release()
cv2.destroyAllWindows()


0: 384x640 3 persons, 7 cars, 1 truck, 25.1ms
Speed: 3.0ms preprocess, 25.1ms inference, 2.1ms postprocess per image at shape (1, 3, 384, 640)
Successfully loaded imagenet pretrained weights from "C:\Users\Kalvin\Ikomia/Plugins/Python\infer_deepsort\models\checkpoints\osnet_x1_0_imagenet.pth"
Workflow Untitled run successfully in 726.9818 ms.

0: 384x640 3 persons, 5 cars, 1 truck, 22.8ms
Speed: 3.0ms preprocess, 22.8ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)
Workflow Untitled run successfully in 77.8538 ms.

0: 384x640 3 persons, 5 cars, 1 truck, 20.1ms
Speed: 3.0ms preprocess, 20.1ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)
Workflow Untitled run successfully in 78.02799999999999 ms.

0: 384x640 3 persons, 5 cars, 1 truck, 63.7ms
Speed: 4.0ms preprocess, 63.7ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)
Workflow Untitled run successfully in 253.8712 ms.

0: 384x640 4 persons, 5 cars, 1 truck, 23.9ms
Speed: 2.0ms p

In [73]:
print("=== RESULTS FOR ST-CATHERINES.MP4 ===")
print("\nTotal people (pedestrians and cyclists)")
print(len(unique_pedestrians))
print("\nCyclists")
print(len(unique_cylists))
print("\nPedestrians (total number of people - number of cyclists)")
print(len(unique_pedestrians) - len(unique_cylists))
print("\nTotal Cars")
print(len(unique_cars))

=== RESULTS FOR ST-CATHERINES.MP4 ===

Total people (pedestrians and cyclists)
121

Cyclists
2

Pedestrians (total number of people - number of cyclists)
119

Total Cars
80


#### Classifying parked and moving cars for the both videos

In [74]:
# McGill Drive
# Preprocess the data (stacking and reshaping)
# https://stats.stackexchange.com/questions/127484/cluster-sequences-of-data-with-different-length

equal_car_flows_mcgill = list()
for key, _ in all_car_mags_mcgill.items():
    # reshape the list of arrays to a single array
    all_mag_values = np.concatenate([all_car_mags_mcgill[key][i].ravel() for i in range(len(all_car_mags_mcgill[key]))])
    all_angle_values = np.concatenate([all_car_angles_mcgill[key][i].ravel() for i in range(len(all_car_angles_mcgill[key]))])
    equal_car_flows_mcgill.append((key, np.stack((all_mag_values, all_angle_values), axis=1)))

all_equal_car_flows_mcgill = list()
for i in range(len(equal_car_flows_mcgill)):
    array = equal_car_flows_mcgill[i][1]
    all_equal_car_flows_mcgill.append(str(','.join(str(e) for e in ['x' + str(e) for e in array])))

equal_immobile_object_flows_mcgill = list()
for key, _ in all_immobile_objects_mags_mcgill.items():
    all_mag_values = np.concatenate([all_immobile_objects_mags_mcgill[key][i].ravel() for i in range(len(all_immobile_objects_mags_mcgill[key]))])
    all_angle_values = np.concatenate([all_immobile_objects_angles_mcgill[key][i].ravel() for i in range(len(all_immobile_objects_angles_mcgill[key]))])
    equal_immobile_object_flows_mcgill.append((key, np.stack((all_mag_values, all_angle_values), axis=1)))

all_equal_immobile_object_flows_mcgill = list()
for i in range(len(equal_immobile_object_flows_mcgill)):
    array = equal_immobile_object_flows_mcgill[i][1]
    all_equal_immobile_object_flows_mcgill.append(str(','.join(str(e) for e in ['x' + str(e) for e in array])))

In [75]:
# St-Catherines Drive
# Preprocess the data (stacking and reshaping)
# https://stats.stackexchange.com/questions/127484/cluster-sequences-of-data-with-different-length

equal_car_flows_st_cat = list()
for key, _ in all_car_mags_st_cat.items():
    # reshape the list of arrays to a single array
    all_mag_values = np.concatenate([all_car_mags_st_cat[key][i].ravel() for i in range(len(all_car_mags_st_cat[key]))])
    all_angle_values = np.concatenate([all_car_angles_st_cat[key][i].ravel() for i in range(len(all_car_angles_st_cat[key]))])
    equal_car_flows_st_cat.append((key, np.stack((all_mag_values, all_angle_values), axis=1)))

all_equal_car_flows_st_cat = list()
for i in range(len(equal_car_flows_st_cat)):
    array = equal_car_flows_st_cat[i][1]
    all_equal_car_flows_st_cat.append(str(','.join(str(e) for e in ['x' + str(e) for e in array])))

equal_immobile_object_flows_st_cat = list()
for key, _ in all_immobile_objects_mags_st_cat.items():
    all_mag_values = np.concatenate([all_immobile_objects_mags_st_cat[key][i].ravel() for i in range(len(all_immobile_objects_mags_st_cat[key]))])
    all_angle_values = np.concatenate([all_immobile_objects_angles_st_cat[key][i].ravel() for i in range(len(all_immobile_objects_angles_st_cat[key]))])
    equal_immobile_object_flows_st_cat.append((key, np.stack((all_mag_values, all_angle_values), axis=1)))

all_equal_immobile_object_flows_st_cat = list()
for i in range(len(equal_immobile_object_flows_st_cat)):
    array = equal_immobile_object_flows_st_cat[i][1]
    all_equal_immobile_object_flows_st_cat.append(str(','.join(str(e) for e in ['x' + str(e) for e in array])))


In [79]:
# KMeans Clustering for both videos
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
X_car_mcgill = vectorizer.fit_transform(all_equal_car_flows_mcgill)
X_obj_mcgill = vectorizer.fit_transform(all_equal_immobile_object_flows_mcgill)
X_car_st_cat = vectorizer.fit_transform(all_equal_car_flows_st_cat)
X_obj_st_cat = vectorizer.fit_transform(all_equal_immobile_object_flows_st_cat)


In [83]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2, random_state=0)
X_car_mcgill_pred = kmeans.fit_predict(X_car_mcgill)
X_obj_mcgill_pred = kmeans.fit_predict(X_obj_mcgill)
X_car_st_cat_pred = kmeans.fit_predict(X_car_st_cat)
X_obj_st_cat_pred = kmeans.fit_predict(X_obj_st_cat)




In [84]:
print("=== KMeans Clustering for McGill Drive ===")
print("\nCars")
print(X_car_mcgill_pred)
print("\nImmobile Objects")
print(X_obj_mcgill_pred)

print("\n=== KMeans Clustering for St-Catherines Drive ===")
print("\nCars")
print(X_car_st_cat_pred)
print("\nImmobile Objects")
print(X_obj_st_cat_pred)


=== KMeans Clustering for McGill Drive ===

Cars
[0 0 1 1 0 1 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1]

Immobile Objects
[0 0 1 0 1 1 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0]

=== KMeans Clustering for St-Catherines Drive ===

Cars
[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Immobile Objects
[0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 1 1]


As the immobile objects holds mostly a 0 label, we can set the label 0 to be immobile objects and 1 to be moving objects

In [90]:
print("=== RESULTS FOR BOTH VIDEOS ===")
print("\nParked Cars for McGill Drive")
print(np.count_nonzero(X_car_mcgill_pred == 0))
print("\nMoving Cars for McGill Drive")
print(np.count_nonzero(X_car_mcgill_pred == 1))
print("\nParked Cars for St-Catherines Drive")
print(np.count_nonzero(X_car_st_cat_pred == 0))
print("\nMoving Cars for St-Catherines Drive")
print(np.count_nonzero(X_car_st_cat_pred == 1))

=== RESULTS FOR BOTH VIDEOS ===

Parked Cars for McGill Drive
14

Moving Cars for McGill Drive
41

Parked Cars for St-Catherines Drive
79

Moving Cars for St-Catherines Drive
1


## Summary

| Video        | Detected Pedestrians | Counted Pedestrians | Detected Parked Cars | Counted Parked Cars | Detected Moving Cars | Counted Moving Cars |
|--------      |--------              |--------             | --------             |--------             |             -------- | --------            | 
|McGill        | 35                   |34                   |    14                |       16            |       41             |    19               |
|St-Cat        | 82                   |88                   |         79           |58                   |               1      |  5                  |



## Discussion 
### Pedestrians
This implementation managed to find 35 pedestrians and 82 pedestrians in the McGill video and the St-Catherines. From my manual counting, the McGill video has 34 pedestrians and the St-Catherines has 88 pedestrians. Meaning that the application has an average of 5% error rate in terms of accuracy.

### Cars
This implementation had a lot of inaccuracies as you can see on the table. Except for the Parked Cars of the McGill video, the program did not accurately count the number of cars in that video.

### Issues
Since there is not a huge amount of data to work on, the clustering did not have the best outcome, especially in the second video in which cars moved sideways. Furthermore, due to the way it was implemented, the model took a while to measure and run making not ideal to debug. 

## Conclusion
In conclusion, YOLOv8 and DeepSORT detects and tracks pretty well the pedestrians having a relatively low error rate in terms of accuracy. However, it seems that this implementation does not count cars in the same quality as the pedestrians. Furthermore, the K-means clustering model has difficulty differentiating the parked and moving cars. In the future, it would be great to have a larger dataset to better differentiate moving elements from immobile elements. Also, some fine tuning might be needed to prevent the double counting of cars. Resizing the frames before calculating the optical flow would be good to reduce computational power.