# Driver Alert System Prototype for Smart India Hackathon

This Jupyter Notebook contains a prototype version of our Driver Safety system for the Smart India Hackathon.

## Drowsiness Detection using MediaPipe in Python

This is the prototype version of the final Driver Safety System. It contains the drowsiness detection system built with MediaPipe in Python. The final version will contain the drowsiness detection system, the distraction detection system, and the alert system, along with all other functionalities.

As far as the prototype is concerned, we need three components to create a drowsiness detection system:

1. `Face Detection` - To detect the face in the frame. Any camera can be used to capture the frames.
2. `Facial Landmark Detection` - To detect the facial landmarks on the detected face. We will use MediaPipe for this purpose, specifically the pre-built ``MediaPipe Face Detection`` and ``MediaPipe Face Mesh`` models.
3. `Drowsiness Detection` - To detect the drowsiness of the driver. We will use the EAR (Eye Aspect Ratio) for this purpose. This approach is detailed in the simple yet robust research paper Real-Time Eye Blink Detection using Facial Landmarks (https://vision.fe.uni-lj.si/cvww2016/proceedings/papers/05.pdf) by Soukupová and Čech (2016). In that paper, the authors describe their approach to blink detection: since an eye blink is a rapid closing and reopening of the eye, they use an SVM classifier to detect a blink as a pattern of EAR values within a short temporal window.



In order to detect drowsiness, all we have to keep in mind is that **our eyes close when we feel drowsy** and hence we can use the EAR to detect drowsiness. 

The EAR is calculated as follows: 

<img src="images\04-driver-drowsiness-detection-EAR-equation-1-768x167.png"> 

The EAR formula returns a single scalar quantity that reflects the level of eye-opening.
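To make the formula concrete, here is a minimal sketch that evaluates it on six hypothetical eye points P1 to P6 (made-up pixel coordinates, not MediaPipe output). The helper name `eye_aspect_ratio` is chosen for illustration only.

```python
import math


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|P2-P6| + |P3-P5|) / (2 * |P1-P4|) for (x, y) points."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)  # two eye heights
    horizontal = math.dist(p1, p4)                     # eye width
    return vertical / (2.0 * horizontal)


# Hypothetical open eye: 30 px wide, 10 px tall at both vertical pairs
open_eye = [(0, 0), (10, -5), (20, -5), (30, 0), (20, 5), (10, 5)]
# The same eye nearly closed: the vertical gaps shrink to 2 px
closed_eye = [(0, 0), (10, -1), (20, -1), (30, 0), (20, 1), (10, 1)]

print(round(eye_aspect_ratio(*open_eye), 3))    # 0.333
print(round(eye_aspect_ratio(*closed_eye), 3))  # 0.067
```

The drop from about 0.33 to about 0.07 is the signal the drowsiness check relies on.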


<img src="images\03-driver-drowsiness-detection-EAR-points-768x297.png">

### Our Approach

1. First, we declare two threshold values and a counter.
    1. EAR_THRESH: A threshold used to decide whether the current EAR value indicates closed eyes.
    2. D_TIME: A counter that accumulates the time that has passed while the current EAR < EAR_THRESH.
    3. WAIT_TIME: The permissible limit; once D_TIME reaches it, the driver is considered drowsy.
2. When the application starts, we record the current time (in seconds) in a variable t1 and read the incoming frame.
3. Next, we preprocess the frame and pass it through MediaPipe's Face Mesh solution pipeline.
4. If any landmark detections are available, we retrieve the relevant eye landmarks (P1–P6). Otherwise, we reset t1 and D_TIME (D_TIME is reset here as well to keep the algorithm consistent).
5. If detections are available, we calculate the average EAR value of both eyes using the retrieved eye landmarks.
6. If the current EAR < EAR_THRESH, we add the difference between the current time t2 and t1 to D_TIME, then set t1 to t2 for the next frame. Otherwise (the eyes are open), we set t1 to t2 and reset D_TIME to 0.
7. If D_TIME >= WAIT_TIME, we sound the alarm; otherwise, we move on to the next frame.
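The steps above can be sketched as a small, camera-independent state machine. This is a minimal sketch, not the project's implementation: the class name `DrowsinessTimer`, its default thresholds, and the injectable `clock` argument are hypothetical, and it assumes D_TIME is reset whenever the eyes are open so that ordinary blinks do not add up.

```python
import time


class DrowsinessTimer:
    """Minimal sketch of the EAR_THRESH / D_TIME / WAIT_TIME logic.

    Assumption: D_TIME is reset whenever the eyes are open
    (EAR >= EAR_THRESH), so ordinary blinks do not add up over time.
    """

    def __init__(self, ear_thresh=0.18, wait_time=1.0, clock=time.perf_counter):
        self.ear_thresh = ear_thresh  # EAR_THRESH (hypothetical default)
        self.wait_time = wait_time    # WAIT_TIME in seconds (hypothetical default)
        self.clock = clock            # injectable clock, handy for testing
        self.t1 = clock()             # step 2: record the start time
        self.d_time = 0.0             # D_TIME

    def reset(self):
        """Restart the timer (step 4: no face detected)."""
        self.t1 = self.clock()
        self.d_time = 0.0

    def update(self, ear):
        """Process one frame. `ear` is the average EAR, or None if no face
        was detected. Returns True when the alarm should sound."""
        if ear is None:
            self.reset()
            return False
        t2 = self.clock()
        if ear < self.ear_thresh:
            # Step 6: eyes closed, accumulate the elapsed time
            self.d_time += t2 - self.t1
        else:
            # Eyes open: start counting from zero again (assumption)
            self.d_time = 0.0
        self.t1 = t2                  # t1 becomes t2 for the next frame
        return self.d_time >= self.wait_time  # step 7


# Usage with a fake clock so the example is deterministic
fake_now = [0.0]
timer = DrowsinessTimer(ear_thresh=0.2, wait_time=1.0, clock=lambda: fake_now[0])
for t, ear in [(0.5, 0.1), (1.0, 0.1), (1.5, 0.3), (2.0, 0.1), (2.6, 0.1)]:
    fake_now[0] = t
    print(f"t={t:.1f}s EAR={ear} alarm={timer.update(ear)}")
```

Injecting the clock keeps the timing logic testable without a webcam; in the real loop, the default `time.perf_counter` would be used and `update` called once per frame.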

### Landmark Detection Using MediaPipe Face Mesh In Python

We will use the pre-built ``MediaPipe Face Detection`` and ``MediaPipe Face Mesh`` models for this purpose.

Since we are focusing on driver drowsiness detection, out of the 468 points, we only need landmark points belonging to the eye regions. The eye regions have 32 landmark points (16 points each). For calculating the EAR, we require only 12 points (6 for each eye).

The Face Mesh pipeline first runs face detection and then a facial landmark detection model. For face detection, the pipeline uses the BlazeFace model, which has a very high inference speed. BlazeFace is a lightweight model optimized for mobile devices. It is based on the Single Shot Detector (SSD) framework with a custom lightweight backbone network. The model is trained using the Quantization Aware Training (QAT) technique to reduce the model size and inference latency, and it is trained on the WIDERFACE dataset, which contains 32,203 images and 393,703 faces with a high degree of variability in scale, pose, and occlusion.

### The Eye Aspect Ratio (EAR) Technique

1. We will use MediaPipe's Face Mesh solution to detect and retrieve the relevant landmarks in the eye region (points P1 – P6 in the eye-landmark image above).
2. After retrieving the relevant points, we compute the Eye Aspect Ratio (EAR), the ratio of the eye's average height to its width.

The EAR is mostly constant when an eye is open and approaches zero as the eye closes. It is partially person- and head-pose-insensitive: the aspect ratio of an open eye has a small variance among individuals, and the EAR is fully invariant to uniform scaling of the image and to in-plane rotation of the face. Since both eyes blink synchronously, the EAR values of the two eyes are averaged.
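The invariance claim can be checked numerically. In this minimal sketch, the eye points and the `transform` helper are hypothetical: `transform` applies a uniform scale, an in-plane rotation, and a translation (a similarity transform), which multiplies every distance by the same factor, so the ratio should not change.

```python
import math


def ear(points):
    """EAR for six (x, y) points ordered P1..P6."""
    p1, p2, p3, p4, p5, p6 = points
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * math.dist(p1, p4))


def transform(points, scale=1.0, angle_deg=0.0, shift=(0.0, 0.0)):
    """Uniformly scale, rotate in-plane, and translate a list of points."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]


eye = [(0, 0), (10, -5), (20, -5), (30, 0), (20, 5), (10, 5)]  # hypothetical
moved = transform(eye, scale=2.5, angle_deg=30, shift=(100, 40))
print(ear(eye), ear(moved))  # the two values agree up to float rounding
```

Out-of-plane head rotation (turning or tilting towards the camera) is not a similarity transform, which is why the EAR is only partially head-pose insensitive.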

First, we calculate the Eye Aspect Ratio for each eye:

<img src="images\11-driver-drowsiness-detection-EAR-equation-left-right-740x340.webp">

For calculating the final EAR value, the authors suggest averaging the two EAR values.

<img src="images\12-driver-drowsiness-detection-AVG_EAR-equation-768x193.webp">
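For reference, the per-eye and averaged EAR equations can also be written in LaTeX, where $p_1^{L}, \dots, p_6^{L}$ and $p_1^{R}, \dots, p_6^{R}$ are the six chosen points of the left and right eye (the exact notation in the images above may differ):

$$
\mathrm{EAR}_{\text{left}} = \frac{\lVert p_2^{L} - p_6^{L} \rVert + \lVert p_3^{L} - p_5^{L} \rVert}{2\,\lVert p_1^{L} - p_4^{L} \rVert}
\qquad
\mathrm{EAR}_{\text{right}} = \frac{\lVert p_2^{R} - p_6^{R} \rVert + \lVert p_3^{R} - p_5^{R} \rVert}{2\,\lVert p_1^{R} - p_4^{R} \rVert}
$$

$$
\mathrm{EAR}_{\text{avg}} = \frac{\mathrm{EAR}_{\text{left}} + \mathrm{EAR}_{\text{right}}}{2}
$$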



Now that we have a decent idea of how our drowsiness detection works, let’s move on to the implementation part.

### Implementation

The first thing we need to do is to import the necessary libraries. 

1. `cv2` - OpenCV is a library of programming functions mainly aimed at real-time computer vision. We will use OpenCV to read the video stream from the webcam.
2. `numpy` - NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. We will use NumPy to perform numerical computations.
3. `matplotlib` - Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. We will use Matplotlib to display the images and the landmark visualizations.
4. `mediapipe` - MediaPipe offers ready-to-use yet customizable Python solutions as a prebuilt Python package. We will use MediaPipe to detect the facial landmarks.

In order to install the above libraries, run the following command in your terminal:

```pip install -r requirements.txt```   

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import mediapipe as mp

mp_facemesh = mp.solutions.face_mesh
mp_drawing  = mp.solutions.drawing_utils
denormalize_coordinates = mp_drawing._normalized_to_pixel_coordinates

%matplotlib inline

The next step is to determine the landmark indices for the eyes.


In the following code, we use MediaPipe's FaceMesh to extract the eye landmarks for the left and right eyes. The code also combines the landmark points of both eyes so that we can plot them on the face.

1. ``mp_facemesh.FACEMESH_LEFT_EYE`` - The set of connections (pairs of landmark indices) that outline the left eye.
2. ``mp_facemesh.FACEMESH_RIGHT_EYE`` - The set of connections (pairs of landmark indices) that outline the right eye.
3. ``all_left_eye_idxs`` and ``all_right_eye_idxs`` are created by converting these connection sets into Python lists of index pairs and then flattening them with ``np.ravel``. (Flattening converts the nested pairs into a single one-dimensional array; it does not remove duplicates.)
4. The ``set()`` function then removes the duplicate indices, since a landmark can appear in more than one connection. This ensures that each landmark point is unique within its respective eye.
5. Finally, ``all_idxs`` is created by taking the union of ``all_left_eye_idxs`` and ``all_right_eye_idxs``. This combines all the unique landmark points from both eyes into a single set for plotting purposes.

After running this code, ``all_left_eye_idxs`` and ``all_right_eye_idxs`` contain the unique landmark indices for the left and right eyes, respectively, and ``all_idxs`` contains the unique landmark indices of both eyes combined, which you can use for visualization or further processing.
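To see why both `np.ravel` and `set()` are needed, here is a small sketch on a hypothetical connection set shaped like `FACEMESH_LEFT_EYE` (a frozenset of index pairs). The four indices form a made-up closed loop, not the real eye contour.

```python
import numpy as np

# Hypothetical stand-in for a MediaPipe connection set such as
# FACEMESH_LEFT_EYE: a frozenset of (start_idx, end_idx) pairs.
connections = frozenset({(263, 249), (249, 390), (390, 373), (373, 263)})

pairs = list(connections)          # list of 2-tuples
flat = np.ravel(pairs)             # 1-D array; each index appears twice here
unique_idxs = set(flat.tolist())   # duplicates removed by set()

print(len(flat), sorted(unique_idxs))  # 8 [249, 263, 373, 390]
```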

In [None]:
# Landmark points corresponding to left eye
all_left_eye_idxs = list(mp_facemesh.FACEMESH_LEFT_EYE)
all_left_eye_idxs = set(np.ravel(all_left_eye_idxs)) # flatten and remove duplicates
print("Left eye landmarks:", all_left_eye_idxs)

# Landmark points corresponding to right eye
all_right_eye_idxs = list(mp_facemesh.FACEMESH_RIGHT_EYE)
all_right_eye_idxs = set(np.ravel(all_right_eye_idxs)) # flatten and remove duplicates
print("Right eye landmarks:", all_right_eye_idxs)

# Combined for plotting use - Landmark points for both eye
all_idxs = all_left_eye_idxs.union(all_right_eye_idxs)

In [None]:
# The chosen 12 points:   P1,  P2,  P3,  P4,  P5,  P6
chosen_left_eye_idxs  = [362, 385, 387, 263, 373, 380]
chosen_right_eye_idxs = [33,  160, 158, 133, 153, 144]

all_chosen_idxs = chosen_left_eye_idxs + chosen_right_eye_idxs

Our next step is to load the test image. We will use the OpenCV library for this purpose.

1. `cv2.imread()` - This function loads an image from the specified file in BGR channel order. It returns `None` if the file cannot be read.
2. `cv2.cvtColor()` - This function converts an image from one color space to another. We use it to convert the image from BGR to RGB, the channel order MediaPipe expects.
3. `np.ascontiguousarray()` - This function returns an array whose data is stored in a single contiguous block of memory, copying the input only if it is not already contiguous. We use it to make sure the image is a contiguous array.
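The following sketch, on a tiny hypothetical array, shows what `np.ascontiguousarray` changes. Reversing the channel axis with `[:, :, ::-1]` (as a later cell does) yields a view with a negative stride that is not C-contiguous; OpenCV drawing functions such as `cv2.circle` and `cv2.putText` can reject such arrays, which is the usual reason for the call. After `cv2.cvtColor` the result is already contiguous, so the call there simply returns the array without copying.

```python
import numpy as np

# Hypothetical 2 x 3 "BGR" image with 3 channels
bgr = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

rgb_view = bgr[:, :, ::-1]              # channel flip: a view with a negative stride
print(rgb_view.flags["C_CONTIGUOUS"])   # False

rgb = np.ascontiguousarray(rgb_view)    # copies into one contiguous block
print(rgb.flags["C_CONTIGUOUS"])        # True
print(np.array_equal(rgb, rgb_view))    # True: same values, new memory layout
```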

In [None]:
# Load an image. 

image = cv2.imread("test-open-eyes.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # convert to RGB

image = np.ascontiguousarray(image)

imgH, imgW, _ = image.shape

plt.imshow(image);

The `mp_facemesh` module (assigned from `mp.solutions.face_mesh` above) contains the MediaPipe Face Mesh solution. To use it, we first create a `FaceMesh` object, which we then use to process the incoming frames.

The `FaceMesh` object can be created as follows:

In [None]:
# Running inference using static_image_mode 

with mp_facemesh.FaceMesh(
    static_image_mode=True,        # Default=False
    max_num_faces=1,               # Default=1
    refine_landmarks=False,        # Default=False
    min_detection_confidence=0.5,  # Default=0.5
    min_tracking_confidence=0.5,   # Default=0.5 (ignored in static_image_mode)
) as face_mesh:
    
    results = face_mesh.process(image)

print(bool(results.multi_face_landmarks)) # Indicates whether any detections are available or not.

1. ``landmark_0`` is assigned the first facial landmark (index 0) of the first detected face; it holds the normalized ``(x, y, z)`` coordinates. This assumes that ``results.multi_face_landmarks`` is a non-empty list of face landmarks and that you are working with the first face in the list.

2. ``landmark_0_x`` is the X-coordinate of ``landmark_0`` scaled by the image width ``(imgW)``.

3. ``landmark_0_y`` is the Y-coordinate of ``landmark_0`` scaled by the image height ``(imgH)``.

4. ``landmark_0_z`` is the Z-coordinate of ``landmark_0`` scaled by the image width ``(imgW)``. According to the MediaPipe documentation, Z uses roughly the same scale as X, which is why it is scaled by the width.

5. The X, Y, and Z coordinates of ``landmark_0`` are printed.

6. An empty line is printed for formatting.

7. The total number of landmarks in ``results.multi_face_landmarks[0].landmark`` is printed.

In short, the code extracts the coordinates of the first facial landmark (``landmark_0``) and scales them to the dimensions of the original image: X by the width, Y by the height, and Z by the width, as per the documentation.

The last print statement shows the total number of landmarks available for the first detected face: 468 with ``refine_landmarks=False`` (as in the cell above), or 478 when ``refine_landmarks=True`` adds the iris landmarks.
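The scaling described above can be sketched without running MediaPipe by using a stand-in object. Here `SimpleNamespace` mimics a landmark with `x`, `y`, and `z` attributes, `landmark_to_pixels` is a hypothetical helper name, and the values are made up.

```python
from types import SimpleNamespace


def landmark_to_pixels(landmark, img_w, img_h):
    """Scale a normalized landmark to image units.

    x is scaled by the width, y by the height, and z, which uses roughly
    the same scale as x, by the width as well.
    """
    return landmark.x * img_w, landmark.y * img_h, landmark.z * img_w


# Hypothetical landmark standing in for results.multi_face_landmarks[0].landmark[0]
lm = SimpleNamespace(x=0.5, y=0.25, z=-0.03125)
print(landmark_to_pixels(lm, img_w=640, img_h=480))  # (320.0, 120.0, -20.0)
```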

In [None]:
# Get the first landmark point on the first detected face

landmark_0 = results.multi_face_landmarks[0].landmark[0]
print(landmark_0)

landmark_0_x = landmark_0.x * imgW
landmark_0_y = landmark_0.y * imgH
landmark_0_z = landmark_0.z * imgW # according to documentation

print("X:", landmark_0_x)
print("Y:", landmark_0_y)
print("Z:", landmark_0_z)

print()
print("Total Length of '.landmark':", len(results.multi_face_landmarks[0].landmark))

The code below defines a function named `plot` that visualizes the facial landmarks detected by MediaPipe's FaceMesh. Here's a breakdown of what it does:

1. **Function Arguments**: All arguments are keyword-only (note the leading `*`). `img_dt`, the image on which the tessellation is drawn, has no default and is required, and `face_landmarks` must also be supplied in practice because the function reads `face_landmarks.landmark`. The remaining arguments are optional: the images for all eye landmarks and for the chosen landmarks (`img_eye_lmks`, `img_eye_lmks_chosen`, which default to copies of `img_dt`), and the drawing parameters (`ts_thickness`, `ts_circle_radius`, `lmk_circle_radius`, `name`).

2. **Drawing Utilities Initialization**: The function initializes drawing utilities for plotting the face mesh tessellation.

3. **Landmarks Plotting**: It iterates over all landmarks and draws a filled circle at every eye landmark (indices in `all_idxs`) on one image and at every chosen landmark (indices in `all_chosen_idxs`) on another. The pixel coordinates are computed with the global `imgW` and `imgH`, so these must match the image being plotted.

4. **Image Plotting**: It plots three images: one with the face mesh tessellation, one with all eye landmarks, and one with chosen landmarks. These are displayed as subplots in a matplotlib figure.



In [None]:
def plot(
    *,
    img_dt,
    img_eye_lmks=None,
    img_eye_lmks_chosen=None,
    face_landmarks=None,
    ts_thickness=1,
    ts_circle_radius=2,
    lmk_circle_radius=3,
    name="1",
):
    # For plotting Face Tessellation
    image_drawing_tool = img_dt 
    
    # For plotting all eye landmarks
    image_eye_lmks = img_dt.copy() if img_eye_lmks is None else img_eye_lmks
    
    # For plotting chosen eye landmarks
    img_eye_lmks_chosen = img_dt.copy() if img_eye_lmks_chosen is None else img_eye_lmks_chosen

    # Initializing drawing utilities for plotting face mesh tessellation
    connections_drawing_spec = mp_drawing.DrawingSpec(
        thickness=ts_thickness, circle_radius=ts_circle_radius, color=(255, 255, 255)
    )

    # Initialize a matplotlib figure.
    fig = plt.figure(figsize=(20, 15))
    fig.set_facecolor("white")

    # Draw landmarks on face using the drawing utilities.
    mp_drawing.draw_landmarks(
        image=image_drawing_tool,
        landmark_list=face_landmarks,
        connections=mp_facemesh.FACEMESH_TESSELATION,
        landmark_drawing_spec=None,
        connection_drawing_spec=connections_drawing_spec,
    )

    # Get the object which holds the x, y and z coordinates for each landmark
    landmarks = face_landmarks.landmark

    # Iterate over all landmarks.
    # If the landmark_idx is present in either all_idxs or all_chosen_idxs,
    # get the denormalized coordinates and plot circles at those coordinates.

    for landmark_idx, landmark in enumerate(landmarks):
        if landmark_idx in all_idxs:
            pred_cord = denormalize_coordinates(landmark.x, landmark.y, imgW, imgH)
            cv2.circle(image_eye_lmks, pred_cord, lmk_circle_radius, (255, 255, 255), -1)

        if landmark_idx in all_chosen_idxs:
            pred_cord = denormalize_coordinates(landmark.x, landmark.y, imgW, imgH)
            cv2.circle(img_eye_lmks_chosen, pred_cord, lmk_circle_radius, (255, 255, 255), -1)


    # Plot post-processed images
    plt.subplot(1, 3, 1)
    plt.title("Face Mesh Tessellation", fontsize=18)
    plt.imshow(image_drawing_tool)
    plt.axis("off")

    plt.subplot(1, 3, 2)
    plt.title("All eye landmarks", fontsize=18)
    plt.imshow(image_eye_lmks)
    plt.axis("off")

    plt.subplot(1, 3, 3)
    plt.imshow(img_eye_lmks_chosen)
    plt.title("Chosen landmarks", fontsize=18)
    plt.axis("off")
#     plt.subplots_adjust(left=0.02, right=0.98, top=None, bottom=0.4, hspace=1.0)
#     plt.savefig(f'image_{name}.png', dpi=200.0, bbox_inches="tight")
    plt.show()
    plt.close()
    return

The code below checks whether any face landmarks were detected in the image. If so (`if results.multi_face_landmarks:`), it iterates over each detected face and calls the `plot` function to visualize the facial landmarks on a copy of the original image.

Here's a breakdown:

1. `if results.multi_face_landmarks:`: This line checks whether any face landmarks have been detected in the image.

2. `for face_id, face_landmarks in enumerate(results.multi_face_landmarks):`: This line starts a loop over each detected face, providing an ID (`face_id`) and the facial landmarks (`face_landmarks`) for each one.

3. `_ = plot(img_dt=image.copy(), face_landmarks=face_landmarks)`: This line calls the `plot` function to visualize the facial landmarks on a copy of the original image. `plot` returns `None`; assigning the result to `_`, the conventional throwaway name, signals that we only care about the side effect (the visualization).

Please note that this code assumes that `results` is a variable containing the output of a facial landmark detection model, and `image` is the image on which detections were performed.

In [None]:
# If detections are available.
if results.multi_face_landmarks:
    
    # Iterate over detections of each face. Here, we have max_num_faces=1, 
    # so there will be at most 1 element in the 'results.multi_face_landmarks' list            
    # Only one iteration is performed.

    for face_id, face_landmarks in enumerate(results.multi_face_landmarks):    
        _ = plot(img_dt=image.copy(), face_landmarks=face_landmarks)

In [None]:
def distance(point_1, point_2):
    """Calculate l2-norm between two points"""
    dist = sum([(i - j) ** 2 for i, j in zip(point_1, point_2)]) ** 0.5
    return dist

The function `get_ear` calculates the Eye Aspect Ratio (EAR) for one eye based on facial landmarks. Here's a breakdown of what it does:

1. **Function Arguments**: The function takes several arguments:
   - `landmarks`: A list of detected landmarks.
   - `refer_idxs`: Index positions of the chosen landmarks in order P1, P2, P3, P4, P5, P6.
   - `frame_width` and `frame_height`: The dimensions of the captured frame.

2. **Coordinate Calculation**: For each landmark index in `refer_idxs`, it denormalizes the coordinates and appends them to `coords_points`.

3. **Distance Calculation**: It calculates the Euclidean distance between pairs of points (P2-P6, P3-P5, and P1-P4).

4. **EAR Calculation**: It computes the Eye Aspect Ratio (EAR) using the formula `(P2_P6 + P3_P5) / (2.0 * P1_P4)`.

5. **Error Handling**: If an error occurs during these calculations (for example, if a chosen landmark lies outside the frame so that its denormalized coordinates are `None`, or if the eye width is zero), it sets `ear` to 0.0 and `coords_points` to None.

6. **Return Values**: It returns the calculated EAR and the list of coordinates.

This function can be used in applications like drowsiness detection, where a decrease in EAR can indicate that a person is starting to fall asleep.
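One design point worth noting: `get_ear` returns `0.0` on failure, which the EAR < EAR_THRESH check would read as fully closed eyes. The minimal alternative sketch below returns `None` instead, so the caller can treat the frame like "no detection" (step 4 of our approach). `to_pixel` is a hypothetical stand-in that behaves like `denormalize_coordinates` as we understand it (returning `None` for points outside the frame), `safe_ear` is a hypothetical name, and the landmark values are made up.

```python
import math
from types import SimpleNamespace


def to_pixel(x, y, width, height):
    """Hypothetical stand-in for denormalize_coordinates: returns None for
    points outside the [0, 1] range, otherwise clamped integer pixels."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return None
    return min(math.floor(x * width), width - 1), min(math.floor(y * height), height - 1)


def safe_ear(landmarks, refer_idxs, width, height):
    """EAR for one eye, or None when it cannot be computed."""
    pts = [to_pixel(landmarks[i].x, landmarks[i].y, width, height) for i in refer_idxs]
    if any(p is None for p in pts):
        return None                      # a point fell outside the frame
    p1, p2, p3, p4, p5, p6 = pts
    eye_width = math.dist(p1, p4)
    if eye_width == 0:
        return None                      # degenerate eye, EAR undefined
    return (math.dist(p2, p6) + math.dist(p3, p5)) / (2.0 * eye_width)


# Hypothetical normalized landmarks P1..P6 on a 200 x 160 frame
norm = [(0.125, 0.5), (0.25, 0.4375), (0.375, 0.4375),
        (0.5, 0.5), (0.375, 0.5625), (0.25, 0.5625)]
landmarks = [SimpleNamespace(x=x, y=y) for x, y in norm]
print(safe_ear(landmarks, [0, 1, 2, 3, 4, 5], 200, 160))   # 40 / 150

landmarks[0] = SimpleNamespace(x=-0.01, y=0.5)             # P1 leaves the frame
print(safe_ear(landmarks, [0, 1, 2, 3, 4, 5], 200, 160))   # None
```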

In [None]:
def get_ear(landmarks, refer_idxs, frame_width, frame_height):
    """
    Calculate Eye Aspect Ratio for one eye.

    Args:
        landmarks: (list) Detected landmarks list
        refer_idxs: (list) Index positions of the chosen landmarks
                            in order P1, P2, P3, P4, P5, P6
        frame_width: (int) Width of captured frame
        frame_height: (int) Height of captured frame

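    Returns:
        ear: (float) Eye aspect ratio (0.0 if it cannot be computed)
        coords_points: (list) Pixel coordinates of P1..P6, or None on failure
    """
    try:
        # Convert the normalized landmarks P1..P6 to pixel coordinates
        coords_points = []
        for i in refer_idxs:
            lm = landmarks[i]
            coord = denormalize_coordinates(lm.x, lm.y, frame_width, frame_height)
            coords_points.append(coord)

        # Euclidean distances between the vertical pairs and the horizontal pair
        P2_P6 = distance(coords_points[1], coords_points[5])
        P3_P5 = distance(coords_points[2], coords_points[4])
        P1_P4 = distance(coords_points[0], coords_points[3])

        # Compute the eye aspect ratio
        ear = (P2_P6 + P3_P5) / (2.0 * P1_P4)

    except Exception:
        # e.g. a point outside the frame (denormalize_coordinates returns None)
        # or a zero eye width; avoid a bare except, which also traps Ctrl+C
        ear = 0.0
        coords_points = None

    return ear, coords_points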

Calculate `EAR` for the previously detected landmarks.

In [None]:
landmarks = face_landmarks.landmark

In [None]:
left_ear, _  = get_ear(landmarks, chosen_left_eye_idxs, imgW, imgH)
right_ear, _ = get_ear(landmarks, chosen_right_eye_idxs, imgW, imgH)

EAR = (left_ear + right_ear) / 2

print("left_ear: ", left_ear)
print("right_ear:", right_ear)
print("Avg. EAR: ", EAR)

In [None]:
def calculate_avg_ear(landmarks, left_eye_idxs, right_eye_idxs, image_w, image_h):
    # Calculate the Eye Aspect Ratio of each eye using the frame size passed in
    left_ear,  _ = get_ear(landmarks, left_eye_idxs,  image_w, image_h)
    right_ear, _ = get_ear(landmarks, right_eye_idxs, image_w, image_h)

    # Average the two eyes, as suggested by the paper
    Avg_EAR = (left_ear + right_ear) / 2
    return Avg_EAR

The code below calculates the Eye Aspect Ratio (EAR) for two images: one with the eyes open (`test-open-eyes.jpg`) and one with the eyes closed (`test-close-eyes.jpg`). Here's a breakdown of what it does:

1. **Image Loading**: It loads the two images using OpenCV's `imread` function and reverses the color channels from BGR to RGB.

2. **Image Processing**: For each image, it ensures that the image data is stored in a contiguous block of memory and retrieves the image dimensions.

3. **FaceMesh Model**: It initializes a MediaPipe FaceMesh model with `refine_landmarks=True`, which refines the landmarks around the eyes and lips and adds iris landmarks.

4. **Landmark Detection**: It processes each image with the FaceMesh model to detect facial landmarks.

5. **EAR Calculation**: If landmarks are detected, it calculates the average EAR of the left and right eyes using the `calculate_avg_ear` function defined earlier.

6. **EAR Display**: It overlays the calculated EAR onto a copy of the original image.

7. **Visualization**: It calls the `plot` function to visualize the facial landmarks on a copy of the original image.

This code could be used in applications like drowsiness detection, where a decrease in EAR can indicate that a person is starting to fall asleep. Note that it relies on `calculate_avg_ear`, `plot`, `chosen_left_eye_idxs`, and `chosen_right_eye_idxs` from the earlier cells, so those cells must be run first.

In [None]:
image_eyes_open  = cv2.imread("test-open-eyes.jpg")[:, :, ::-1]
image_eyes_close = cv2.imread("test-close-eyes.jpg")[:, :, ::-1]

for idx, image in enumerate([image_eyes_open, image_eyes_close]):
    
    image = np.ascontiguousarray(image)
    imgH, imgW, _ = image.shape

    # Creating a copy of the original image for plotting the EAR value
    custom_chosen_lmk_image = image.copy()

    # Running inference in static_image_mode (each image is processed independently)
    with mp_facemesh.FaceMesh(static_image_mode=True, refine_landmarks=True) as face_mesh:
        results = face_mesh.process(image)

        # If detections are available.
        if results.multi_face_landmarks:

            # Iterate over detections of each face. Here, we have max_num_faces=1, so only one iteration is performed.
            for face_id, face_landmarks in enumerate(results.multi_face_landmarks):

                landmarks = face_landmarks.landmark
                EAR = calculate_avg_ear(landmarks, chosen_left_eye_idxs, chosen_right_eye_idxs, imgW, imgH)

                # Print the EAR value on the custom_chosen_lmk_image.
                cv2.putText(custom_chosen_lmk_image, f"EAR: {round(EAR, 2)}", (1, 24), 
                            cv2.FONT_HERSHEY_COMPLEX, 0.9, (255, 255, 255), 2
                )

                plot(img_dt=image.copy(), img_eye_lmks_chosen=custom_chosen_lmk_image, face_landmarks=face_landmarks,
                     ts_thickness=1, ts_circle_radius=3, lmk_circle_radius=3
                )