## Edit to fill this

__Name__: Sushant Kothari

__Email id__: sk619kothari@gmail.com


To start, go to the File tab and create a copy of this notebook in your own Drive.

# __Objective:__ Given the input video file, localize the faces of the characters and draw bounding boxes around them.

- The candidate can use any method or platform to tackle this problem. If you are not a fan of Colab, download the video onto your system using [video](https://drive.google.com/file/d/1nyeeqBJyDr2zphBDQ9ruh99JBdYm4nPH/view?usp=sharing) and upload the solution back here with your code attached.

- You are free to use any model or module, either trained by you or state-of-the-art.
- The code should be well-documented. You can also use markdown cells to describe your approach for every step.
- In case of plagiarism, the candidate will be immediately rejected. You can use some helper code available online but must be appropriately referenced.

In [None]:
# Run this to download the video file (it is saved as test_video.m4, the name used in the rest of the notebook)
! gdown --fuzzy https://drive.google.com/file/d/1nyeeqBJyDr2zphBDQ9ruh99JBdYm4nPH/view?usp=sharing --o test_video.m4

Downloading...
From: https://drive.google.com/uc?id=1nyeeqBJyDr2zphBDQ9ruh99JBdYm4nPH
To: /content/test_video.m4
100% 7.54M/7.54M [00:00<00:00, 36.2MB/s]


In [None]:
# Input video path
path_video = "/content/test_video.m4"

### Your approach here

# Step 1: Install Required Libraries

In this step, we install the necessary libraries for face detection and video processing. These libraries are:
- `ultralytics` (for YOLOv8 model)
- `opencv-python` (for video handling and image processing)
- `numpy` (for numerical operations)



In [None]:
# Step 1: Install Required Libraries
!pip install ultralytics opencv-python numpy




### Step 2: Import Libraries

We need to import the required libraries for video processing and face detection. These include:
- `cv2`: For OpenCV functions like reading and writing video files.
- `numpy`: For handling arrays and matrices.
- `YOLO` from `ultralytics`: For utilizing the YOLO model for face detection.
- `imageio`, `matplotlib.pyplot`: For displaying video and processing frames.
- `resize` from `skimage`: For resizing images (if necessary).
- `HTML` from `IPython.display`: For displaying HTML elements like the processed video.

We will now import these libraries:


In [None]:
# Step 2: Import Libraries
import cv2
import numpy as np
from ultralytics import YOLO
import imageio
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from skimage.transform import resize
from IPython.display import HTML
import os


Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.


### Step 3: Download and Load YOLOv8 Face Detection Model

To perform face detection, we will use a YOLOv8 model specifically trained for face detection. This model will offer better accuracy than the general COCO-trained YOLOv8 model for detecting faces.

In this step, we download the YOLOv8 face detection model (`yolov8l-face.pt`) from the following GitHub repository:

- **Repository**: [YOLOv8 Face Detection Model](https://github.com/akanametov/yolov8-face)
- **Model File**: [yolov8l-face.pt](https://github.com/akanametov/yolov8-face/releases/download/v0.0.0/yolov8l-face.pt)

You can download the model using the following command:


In [None]:
"""
Step 3: Download and Load YOLOv8 Face Detection Model
We'll use a YOLOv8 model specifically trained for face detection.
This provides better accuracy for our specific task compared to
the general COCO-trained model.
"""
!wget https://github.com/akanametov/yolov8-face/releases/download/v0.0.0/yolov8l-face.pt

def display_video(video):
    """
    Display the processed video in the notebook
    Args:
        video: List of frames to display
    Returns:
        Animation object for display
    """
    fig = plt.figure(figsize=(3, 3))
    mov = []
    for i in range(len(video)):
        img = plt.imshow(video[i], animated=True)
        plt.axis('off')
        mov.append([img])

    anime = animation.ArtistAnimation(fig, mov, interval=50, repeat_delay=1000)
    plt.close()
    return anime


def process_video_yolo(input_path, output_path, conf_threshold=0.5):
    """
    Process video for face detection using YOLOv8

    Args:
        input_path: Path to input video file
        output_path: Path to save processed video
        conf_threshold: Confidence threshold for face detection (default: 0.5)

    Returns:
        List of processed frames
    """
    # Initialize YOLOv8 face detection model
    model = YOLO('yolov8l-face.pt')

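    # Open video capture and fail early if the file cannot be read
    cap = cv2.VideoCapture(input_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {input_path}")

    # Get video properties (named constants instead of the magic numbers 3 and 4)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Keep a fractional frame rate (e.g. 29.97); fall back to 30 if none is reported
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0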

    # Initialize video writer
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))

    frames = []
    frame_count = 0

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        frame_count += 1

        # Run YOLOv8 inference on the frame
        results = model(frame, conf=conf_threshold)

        # Process detections
        for result in results:
            boxes = result.boxes
            for box in boxes:
                # Get face coordinates
                x1, y1, x2, y2 = box.xyxy[0]
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

                # Draw face bounding box
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

                # Add confidence score
                conf = float(box.conf[0])
                # Keep the label inside the frame when the face touches the top edge
                cv2.putText(frame, f'{conf:.2f}', (x1, max(y1 - 10, 15)),
                          cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        # Add frame counter
        cv2.putText(frame, f'Frame: {frame_count}', (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        frames.append(frame)
        out.write(frame)

    # Clean up
    cap.release()
    out.release()
    cv2.destroyAllWindows()

    return frames



--2025-01-09 11:59:23--  https://github.com/akanametov/yolov8-face/releases/download/v0.0.0/yolov8l-face.pt
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://github.com/akanametov/yolo-face/releases/download/v0.0.0/yolov8l-face.pt [following]
--2025-01-09 11:59:23--  https://github.com/akanametov/yolo-face/releases/download/v0.0.0/yolov8l-face.pt
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/592261808/42fc440b-3870-4808-87d6-dd8d59d28c9e?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20250109%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250109T115923Z&X-Amz-Expires=300&X-Amz-Signature=079ed795ce8dc1ac3399e3a9b191295d968e3c46928bfbda79094fa552475834&X-Amz-SignedHeaders=host&re

# Step 4: Process the Video

In this step, we process the input video to detect faces using the YOLOv8 model. The frames are processed and saved into an output video file.


In [None]:
# Step 4: Process the Video
path_video = "/content/test_video.m4"
output_path = "/content/output_video_face.mp4"

print("Starting video processing...")
frames = process_video_yolo(path_video, output_path)
print("Video processing completed!")

Streaming output truncated to the last 5000 lines.
Speed: 7.1ms preprocess, 3657.4ms inference, 1.1ms postprocess per image at shape (1, 3, 544, 960)

0: 544x960 1 face, 3734.6ms
Speed: 7.3ms preprocess, 3734.6ms inference, 1.5ms postprocess per image at shape (1, 3, 544, 960)

0: 544x960 1 face, 4920.4ms
Speed: 9.6ms preprocess, 4920.4ms inference, 1.1ms postprocess per image at shape (1, 3, 544, 960)

0: 544x960 1 face, 3629.5ms
Speed: 7.2ms preprocess, 3629.5ms inference, 1.1ms postprocess per image at shape (1, 3, 544, 960)

0: 544x960 1 face, 3662.1ms
Speed: 6.8
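
The log above reports roughly 3.6 to 4.9 s of inference per frame, so total runtime is dominated by the detector. One possible way to reduce it is to run the detector only on every Nth frame and reuse the most recent boxes in between. The sketch below illustrates that idea under stated assumptions: `detect_with_stride`, `stride`, and the `detect` callable are hypothetical names standing in for the YOLO call, not part of the notebook or of Ultralytics.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def detect_with_stride(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],
    stride: int = 5,
) -> Iterator[Tuple[object, List[Box]]]:
    """Yield (frame, boxes), calling `detect` only on every `stride`-th frame.

    Frames in between reuse the boxes from the most recent detection, which
    trades some localization accuracy on fast motion for fewer model calls.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    last_boxes: List[Box] = []
    for index, frame in enumerate(frames):
        if index % stride == 0:
            last_boxes = detect(frame)
        yield frame, list(last_boxes)


# Toy usage: integers stand in for frames, and a fake detector
# (hypothetical) records how often it is called.
calls = []


def fake_detect(frame):
    calls.append(frame)
    return [(frame, frame, frame + 10, frame + 10)]


results = list(detect_with_stride(range(10), fake_detect, stride=5))
print(f"{len(results)} frames processed with {len(calls)} detector calls")
```

With `stride=5`, the detector runs on one frame in five. Whether the reused boxes are acceptable depends on how quickly faces move in the video.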

# Step 5: Display the Processed Video

In this step, we resize the frames of the processed video and display the result within the notebook.


In [None]:
# Step 5: Display the Processed Video
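# OpenCV frames are BGR; convert to RGB so Matplotlib shows the correct colors
video = [resize(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), (256, 256)) for frame in frames]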
HTML(display_video(video).to_html5_video())
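
Because the cell above renders every frame into the animation, the preview can become slow for long videos. One possible mitigation is to preview only an evenly spaced subset of frames. The helper below is a minimal sketch of that idea; `preview_indices` and `max_frames` are hypothetical names, not part of the notebook.

```python
from typing import List


def preview_indices(num_frames: int, max_frames: int = 100) -> List[int]:
    """Return up to `max_frames` evenly spaced frame indices in [0, num_frames)."""
    if num_frames <= 0 or max_frames <= 0:
        return []
    if num_frames <= max_frames:
        # Short video: keep every frame.
        return list(range(num_frames))
    step = num_frames / max_frames  # > 1, so the indices are strictly increasing
    return [int(i * step) for i in range(max_frames)]


indices = preview_indices(1000, max_frames=100)
print(indices[:5], "...", indices[-1])
```

The preview list could then be built from `frames[i] for i in preview_indices(len(frames))`. The animation `interval` would need to be scaled up by the same factor to keep the playback speed roughly unchanged.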

# Step 6: Save and Enable Download

In this step, we check if the processed video has been saved successfully, and if so, we enable the user to download the video directly.


In [None]:
# Step 6: Save and Enable Download
if os.path.exists(output_path):
    print(f"Video saved successfully at: {output_path}")
    from google.colab import files
    files.download(output_path)
else:
    print("Error: Output video file not found.")

Video saved successfully at: /content/output_video_face.mp4



#### __Q1:__ What is the video processing library and localization model you used?

__ANS__ : For this face detection project, I used OpenCV (`cv2`) for video processing: reading frames, drawing bounding boxes, and writing the output video. Faces are detected with `yolov8l-face`, a YOLOv8 variant trained specifically for face detection, loaded through the Ultralytics package. Frames are handled as NumPy arrays. For visualization in the notebook, I used Matplotlib's animation module, with scikit-image to resize the frames.


#### __Q2:__ Given enough resources, time, or data, what better approach might you have implemented?

__ANS__ : With additional resources, I would explore creating an ensemble model by combining YOLOv8 with other face detection models such as RetinaFace and MTCNN. This approach would enhance detection accuracy by leveraging the strengths of multiple models. Integrating face tracking would also be a priority, enabling the system to follow specific faces across frames instead of detecting them independently in each frame. Adding advanced features like emotion recognition or age estimation could expand the application's scope. Moreover, optimizing the system for real-time performance through GPU acceleration and techniques like frame skipping would significantly reduce the processing time per frame.
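
To make the face tracking idea concrete, here is a minimal sketch of one simple approach: greedy matching of each new detection to the previous frame's boxes by intersection over union (IoU). This is an illustration under stated assumptions, not what the notebook implements. `iou`, `IouTracker`, and `min_iou` are hypothetical names, and production trackers typically add motion models and tolerate missed detections.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy tracker: a detection inherits the ID of the best-overlapping
    previous box if their IoU is at least `min_iou`, else it gets a new ID."""

    def __init__(self, min_iou: float = 0.3) -> None:
        self.min_iou = min_iou
        self.tracks: Dict[int, Box] = {}  # track ID -> last known box
        self.next_id = 0

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        unmatched = dict(self.tracks)
        current: Dict[int, Box] = {}
        for det in detections:
            best_id, best_iou = None, self.min_iou
            for track_id, box in unmatched.items():
                score = iou(det, box)
                if score >= best_iou:
                    best_id, best_iou = track_id, score
            if best_id is None:
                # No previous box overlaps enough: start a new track.
                best_id = self.next_id
                self.next_id += 1
            else:
                # Each previous track can be claimed by one detection only.
                del unmatched[best_id]
            current[best_id] = det
        self.tracks = current  # tracks without a match are dropped
        return current


tracker = IouTracker()
frame1 = tracker.update([(10, 10, 50, 50), (200, 200, 240, 240)])
# The same two faces, slightly moved and listed in a different order.
frame2 = tracker.update([(205, 202, 245, 242), (12, 11, 52, 51)])
print(frame2)
```

In the second call, each face keeps the ID it received in the first frame even though the detection order changed, which is the property a per-frame detector alone does not provide.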

#### __Q3:__ Explain some real life use cases of Object detection or localization. If you have a project using these also explain that problem statement.
__ANS__ : Object detection has widespread applications across various domains. In retail, it aids in analyzing customer behavior and managing queues. In security, it’s used for monitoring public spaces, identifying threats, and locating missing individuals. Autonomous vehicles rely heavily on object detection for recognizing pedestrians, vehicles, and traffic signs. In healthcare, it plays a crucial role in medical image analysis and social distancing enforcement during pandemics. Additionally, in manufacturing, it ensures quality control by identifying product defects with precision.

I have personally worked on several projects leveraging object detection and localization techniques. For instance, in a lane detection system, I used computer vision to identify lane boundaries, contributing to safer navigation in autonomous driving applications. In a harmful object detection project, I designed a solution capable of identifying hazardous objects like sharp tools or weapons in public or private spaces, enhancing safety and security. My drowsiness detection system utilized facial landmarks to detect early signs of fatigue in drivers, promoting road safety. Additionally, I created an intrusion detection system that leveraged object detection to identify unauthorized entries in restricted areas. These experiences have broadened my expertise in applying object detection to solve diverse and impactful real-world problems across various industries.

#### __Q4:__ Briefly explain the model architecture of ResNet.
__ANS__ : ResNet, short for Residual Network, made very deep neural networks practical to train. Plain deep networks suffer from degradation: beyond a certain depth, adding more layers increases the training error. ResNet introduced "skip connections" (identity shortcuts) that add a block's input directly to its output, so each block computes y = F(x) + x. The stacked layers therefore learn the residual F(x) = H(x) - x, the difference between the desired mapping H(x) and the block's input, rather than H(x) itself. The shortcuts also give gradients a direct path backward through the network. The standard variants range from ResNet-18 to ResNet-152 (18, 34, 50, 101, and 152 layers). The two shallower ones use basic blocks of two 3x3 convolutions, while the deeper ones use 1x1, 3x3, 1x1 bottleneck blocks. Its ability to train deep networks efficiently has made ResNet a cornerstone of modern computer vision applications.
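
As a toy illustration of the skip connection, the sketch below computes a residual block in plain Python. It is a simplification under stated assumptions: small fully connected layers stand in for the convolutions and batch normalization that real ResNet blocks use, and `residual_block`, `matvec`, and the weight names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product w @ v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    # Skip connection: add the block's input to the output of its layers.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With all-zero weights F(x) = 0, so the block passes x through unchanged.
identity_out = residual_block(x, zeros, zeros)
# With identity weights F(x) = x, so the block outputs 2x.
doubled_out = residual_block(x, eye, eye)
print(identity_out, doubled_out)
```

The zero-weight case shows why extra residual blocks need not hurt: a block can fall back to the identity simply by driving its weights toward zero, which is easier to learn than an identity mapping through stacked nonlinear layers.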