**Name**: Sreeja Guduri <br>
**Assignment - 3**<br>
**Roll number**: 2021102007

<h1><u>Question - 1</u></h1>

<h4><u>PART-1</u></h4>

* For this question, we are expected to detect faces in the first 30 seconds of a scene from the movie 'Forrest Gump'.
* 'ffmpeg' is used to extract the frames (assuming 24fps) from the first 30 seconds. This gives us about 720 frames.
<br>

The command used is: 'ffmpeg -i Forrest.mp4 -ss 00:00:00 -t 00:00:30 -vf fps=24 frames/frame_%04d.png'
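
As a quick sanity check on the frame count, the arithmetic and the command can be sketched in Python. This is only an illustration: the helper names `expected_frame_count` and `build_extract_command` are hypothetical, and the command is built as an argument list without being run.

```python
def expected_frame_count(fps, seconds):
    """Approximate number of frames written when resampling to a fixed rate."""
    return fps * seconds


def build_extract_command(video, seconds, fps, pattern):
    """Build (but do not run) the ffmpeg extraction command as an argument list."""
    hh, rem = divmod(seconds, 3600)
    mm, ss = divmod(rem, 60)
    return [
        "ffmpeg", "-i", video,
        "-ss", "00:00:00",                      # start at the beginning
        "-t", f"{hh:02d}:{mm:02d}:{ss:02d}",    # duration to extract
        "-vf", f"fps={fps}",                    # resample to a fixed frame rate
        pattern,                                # numbered output file pattern
    ]


print(expected_frame_count(24, 30))  # 720
print(" ".join(build_extract_command("Forrest.mp4", 30, 24, "frames/frame_%04d.png")))
```

The list form could be passed to `subprocess.run` if the extraction should be launched from the notebook itself.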

In [2]:
import cv2 as cv
import matplotlib.pyplot as plt
import time
import numpy as np
import os

<h4><u>PART-2 & PART-3</u></h4>

* The Viola-Jones Haar cascade face detector (from the OpenCV library) is used to detect faces. This algorithm uses Haar features, an integral image and a cascade classifier to determine whether an image region is a face or not.

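* The average processing time per frame is about 0.03 seconds (0.031230 s was measured; the exact value varies slightly from run to run). At roughly 32 frames per second, this is faster than the 24 fps of the video, so the algorithm is suitable for real-time face detection.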

*  The main parameters that affect the detector's speed and accuracy are:
1. <u>scaleFactor:</u> specifies how much the image size is reduced at each image scale. A large value means fewer scales are searched, so the processing time is less, but faces whose size falls between two scales can be missed, making the detection less accurate. A smaller value means more processing time but higher accuracy.
2. <u>minNeighbors:</u> specifies how many overlapping neighbouring detections a candidate rectangle must have to be kept as a face. A higher value rejects more false positives but can also discard true faces; it mainly affects detection quality rather than processing time.
3. The XML file used is: **haarcascade_frontalface_alt.xml**, which works a lot better here than **haarcascade_frontalface_default.xml** because the default cascade produces a lot of false positives.

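* The annotated frames in 'output_frames' are then assembled back into a video with ffmpeg, for example with a command of the form: 'ffmpeg -framerate 24 -i output_frames/frame_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4' (the output file name is only an example).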
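
To make the Haar-feature and integral-image idea mentioned above more concrete, here is a minimal pure-Python sketch. It is a toy illustration under simplifying assumptions (a tiny hand-made image and a single two-rectangle feature), not OpenCV's implementation; a real cascade evaluates many such features against trained thresholds.

```python
def integral_image(img):
    """Return a (H+1) x (W+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: sum of left half minus sum of right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 4x4 image: dark left half, bright right half
img = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 4))               # 80
print(two_rect_haar_feature(ii, 0, 0, 4, 4))  # 8 - 72 = -64
```

Because every rectangle sum costs only four table lookups regardless of its size, features can be evaluated quickly at many positions and scales.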

<br>

<br>

Video link: https://drive.google.com/file/d/1pY9bOBxm4xIQ2BV7aGYxC_GgN7y4xLGQ/view?usp=sharing

Three cases when the detector works/fails:
1. The Haar Cascade Detector used here works only for frontal faces (faces directly facing the camera). Thus, faces turned to the side in parts of the video are not accurately detected.
2. When the people in the scene are far away and their faces are small, the face detector doesn't work well either. This could be because the detection window is too large to capture the finer details of small faces; with minSize=(30, 30), faces smaller than 30x30 pixels are never reported.
3. The face detector works best when the faces are close to the camera, facing front and well lit.
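
The influence of scaleFactor and minSize on the search can be illustrated with a rough model of a multi-scale sliding-window search. This is a sketch, not OpenCV's actual code: the function `count_scales` is hypothetical, and the 640x360 frame size and 20x20 base window are assumptions made only for the example.

```python
def count_scales(image_size, scale_factor, base_window=(20, 20), min_size=(30, 30)):
    """Count the window sizes searched when the window grows by scale_factor each step.

    Windows smaller than min_size are skipped; the search stops once the
    window no longer fits inside the image.
    """
    img_w, img_h = image_size
    win_w, win_h = base_window
    factor, scales = 1.0, 0
    while win_w * factor <= img_w and win_h * factor <= img_h:
        if win_w * factor >= min_size[0] and win_h * factor >= min_size[1]:
            scales += 1
        factor *= scale_factor
    return scales


frame_size = (640, 360)  # assumed frame size, for illustration only
for sf in (1.05, 1.1, 1.3):
    print(sf, count_scales(frame_size, sf))
```

A smaller scaleFactor gives more scales to search (slower, but less likely to skip a face size), while a larger minSize removes the smallest windows from the search entirely.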

In [3]:
face_cascade = cv.CascadeClassifier(cv.data.haarcascades + 'haarcascade_frontalface_alt.xml')
num_frames = 720
total_time = 0
box_coord = []

# cv.imwrite fails silently if the output folder does not exist
os.makedirs('output_frames', exist_ok=True)

for i in range(1, num_frames + 1):
    frame = cv.imread(f'frames/frame_{i:04d}.png')
    gray_frame = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)

    # time only the detection call, not file I/O or colour conversion
    start_time = time.perf_counter()
    faces = face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=4, minSize=(30, 30))
    end_time = time.perf_counter()
    total_time += end_time - start_time

    # draw a blue (BGR) box around each detected face
    for (x, y, w, h) in faces:
        cv.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)

    # save to a new folder
    cv.imwrite(f'output_frames/frame_{i:04d}.png', frame)
    box_coord.append(faces)

mean = total_time / num_frames
print(f'The mean processing time for each frame is {mean} seconds')

The mean processing time for each frame is 0.030082630780008106 seconds
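
A quick check of the real-time claim, using the mean time printed above and the 24 fps assumed during frame extraction. This is a back-of-the-envelope sketch; it ignores the time needed to read frames and convert them to grayscale, which the timing loop does not measure.

```python
mean_time_per_frame = 0.030082630780008106  # seconds, from the output above
video_fps = 24                               # frame rate assumed during extraction

detector_fps = 1.0 / mean_time_per_frame     # frames the detector can handle per second
frame_budget = 1.0 / video_fps               # time available per frame at 24 fps

print(f"detector throughput: {detector_fps:.1f} fps")
print(f"per-frame budget at {video_fps} fps: {frame_budget * 1000:.1f} ms")
print("keeps up with the video:", mean_time_per_frame < frame_budget)  # True
```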


In [4]:
print(box_coord[30])

[[143 213 122 122]]


<h4><u>PART-4</u></h4>

* A naive face tracking algorithm is implemented to keep track of the various faces that are detected.
*  The idea is to compare the faces detected in two consecutive frames and give two faces the same ID if their Intersection over Union (IoU) is greater than 0.5.
* If a new face is encountered, we give it a new ID and if a face doesn't appear in the next frame, we end its track.
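
The IoU test described above can be worked through on a concrete case. The first box below is the detection printed earlier for frame 30, given as (x, y, w, h); the two shifted boxes are hypothetical and only serve to show which side of the 0.5 threshold they fall on. The helper `iou_xywh` is an illustrative name, not part of the assignment code.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h), the format returned by detectMultiScale."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # width and height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


prev_box = (143, 213, 122, 122)     # detection from frame 30 above
small_shift = (163, 213, 122, 122)  # hypothetical: same face moved 20 px right
large_shift = (203, 213, 122, 122)  # hypothetical: same face moved 60 px right

print(f"{iou_xywh(prev_box, small_shift):.2f}")  # 0.72 -> above 0.5, same ID
print(f"{iou_xywh(prev_box, large_shift):.2f}")  # 0.34 -> below 0.5, new ID
```

For the small shift the intersection is 102 x 122 = 12444 pixels and the union is 2 x 14884 - 12444 = 17324 pixels, giving about 0.72. A face that moves by roughly half its width between frames already falls below the threshold.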

<br> 

A total of 57 unique tracks were created in the first 30 seconds. Some false positives also inflated this number.

<br>

As is evident from the video (https://drive.google.com/file/d/1lEgDH2x8ZV3q289k01wv1Lht-RM24KKB/view?usp=sharing), the tracking is not very reliable. Faces are tracked with the same ID only if the previous frame was similar and the face did not move a lot.
* At 0:01, we see that the girl's face is associated with both IDs 4 and 5 even though it should keep the same ID. This is because the girl's face is not detected in a frame between those two frames (possibly because she was facing sideways). Thus, the algorithm ends the track and starts a new one when the face is detected again.

* At 0:12 there is also a false positive (the boy's ear is wrongly detected as a face), which throws off the face tracking process because it is assigned a unique face ID (even though it is not a face).

* However, different people never get associated with the same track.
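
The track split caused by a single missed detection can be reproduced on synthetic data. The sketch below is a hypothetical variant of the tracker, not the code used for the results above: it picks the best-overlapping track instead of the first one, lets each track be claimed once per frame, and adds an assumed `max_gap` parameter. With `max_gap=1` it follows the consecutive-frame rule described earlier.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(detections_per_frame, iou_threshold=0.5, max_gap=1):
    """Return, for every frame, the list of track IDs assigned to its detections."""
    last_seen = {}  # track ID -> (frame index, box) of its latest detection
    next_id = 0
    ids_per_frame = []
    for frame_idx, boxes in enumerate(detections_per_frame):
        frame_ids, claimed = [], set()
        for box in boxes:
            best_id, best_iou = None, iou_threshold
            for tid, (seen_idx, seen_box) in last_seen.items():
                # skip tracks already used in this frame or unseen for too long
                if tid in claimed or frame_idx - seen_idx > max_gap:
                    continue
                score = iou(seen_box, box)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no sufficiently overlapping track: start a new one
                best_id = next_id
                next_id += 1
            claimed.add(best_id)
            frame_ids.append(best_id)
        for tid, box in zip(frame_ids, boxes):
            last_seen[tid] = (frame_idx, box)
        ids_per_frame.append(frame_ids)
    return ids_per_frame


face = (100, 100, 200, 200)
frames = [[face], [], [face]]    # the detector misses the face in the middle frame
print(track(frames, max_gap=1))  # [[0], [], [1]] -> the track is split
print(track(frames, max_gap=2))  # [[0], [], [0]] -> the track survives the gap
```

Allowing a short gap would likely merge cases such as IDs 4 and 5, at the risk of linking a track to a different nearby face after a longer gap.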

In [8]:
def iou(box1, box2):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) corners."""
    x1_box1, y1_box1, x2_box1, y2_box1 = box1
    x1_box2, y1_box2, x2_box2, y2_box2 = box2

    # Calculate intersection coordinates
    x1_intersection = max(x1_box1, x1_box2)
    y1_intersection = max(y1_box1, y1_box2)
    x2_intersection = min(x2_box1, x2_box2)
    y2_intersection = min(y2_box1, y2_box2)

    # Calculate intersection area (zero if the boxes do not overlap)
    intersection_area = max(0, x2_intersection - x1_intersection) * max(0, y2_intersection - y1_intersection)

    # Calculate union area
    area_box1 = (x2_box1 - x1_box1) * (y2_box1 - y1_box1)
    area_box2 = (x2_box2 - x1_box2) * (y2_box2 - y1_box2)
    union_area = area_box1 + area_box2 - intersection_area

    # Calculate IoU score (guard against degenerate zero-area boxes)
    if union_area > 0:
        return intersection_area / union_area
    return 0

In [6]:
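# box_coord has the face boxes (x, y, w, h) for the first 30 secs
track_id = 0
face_tracks = []   # entries are (frame index, face ID, (x1, y1, x2, y2))
prev_matches = []  # entries of face_tracks that belong to the previous frame

for i in range(len(box_coord)):
    face_match = []

    for (x, y, w, h) in box_coord[i]:
        box = (x, y, x + w, y + h)
        matched = False
        # reuse the ID of the first previous-frame face that overlaps enough
        for _, prev_id, prev_box in prev_matches:
            if iou(prev_box, box) > 0.5:
                face_match.append((i, prev_id, box))
                matched = True
                break
        # otherwise start a new track
        if not matched:
            face_match.append((i, track_id, box))
            track_id += 1

    face_tracks.extend(face_match)
    prev_matches = face_match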

unique_tracks = set()
for track in face_tracks:
    frame_num, face_id, _ = track
    unique_tracks.add(face_id)

num_unique_tracks = len(unique_tracks)
print("Number of unique tracks created:", num_unique_tracks)
        

Number of unique tracks created: 57


In [7]:
# write each face's track ID just above its bounding box
for frame_num, face_id, box in face_tracks:
    # frame files are numbered from 1, track frame indices from 0
    path = f'output_frames/frame_{frame_num+1:04d}.png'
    frame = cv.imread(path)
    cv.putText(frame, f'ID: {face_id}', (box[0], box[1] - 10), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv.imwrite(path, frame)


<h1><u>QUESTION-2</u></h1>

The link to the Google Colab notebook with the second question is here: https://colab.research.google.com/drive/1vDlGA4tDDnAGXs6cRdxAARZugZSWefrM?usp=sharing

