# Landmarks extraction using OpenCV and Mediapipe Holistic

This notebook reads the videos with OpenCV and extracts the landmarks from every frame with MediaPipe Holistic.
The MediaPipe Holistic model extracts the keypoints from the following models:
- Pose Landmark.
- Hand Landmark (for both hands).
- Face Landmark.

Created by: Marcus Vinicius da Silva Fernandes. 2023-06-11.

#### References:
- https://mediapipe-studio.webapps.google.com/home
- https://www.geeksforgeeks.org/face-and-hand-landmarks-detection-using-python-mediapipe-opencv/
- https://www.youtube.com/watch?v=pG4sUNDOZFg
- https://www.youtube.com/watch?v=0JU3kpYytuQ
- https://arrow.apache.org/docs/python/index.html

### Importing necessary libraries

In [1]:
import cv2
import mediapipe as mp
import os
import csv
import numpy as np
import pandas as pd

### Setting up the MediaPipe Holistic model

It runs the following models, whose results are exposed under these attribute names:
- pose_landmarks
- face_landmarks
- left_hand_landmarks
- right_hand_landmarks

In [2]:
mp_holistic = mp.solutions.holistic  # for landmarks detection.

### Accessing the videos

Set the path of the folder that contains the videos and the path of the folder where the extracted landmarks will be saved.

In [3]:
# Path of the folder that contains the videos
videos_path = '/Users/marcus/Library/CloudStorage/OneDrive-Personal/Documentos/Loyalist_College/AISC2006/predictions_wlasl_for_gd/videos/'

# Path of the folder where the extracted landmarks are saved
landmarks_path = '/Users/marcus/Library/CloudStorage/OneDrive-Personal/Documentos/Loyalist_College/AISC2006/predictions_wlasl_for_gd/landmarks/'
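As an illustration of how the two folders relate, the sketch below pairs each `.mp4` file in a videos folder with the parquet file its landmarks would be written to. It uses `pathlib` instead of string concatenation, which is a choice of this sketch and not what the notebook does. The helper name `pair_videos_with_outputs` and the temporary folders are hypothetical.

```python
import tempfile
from pathlib import Path


def pair_videos_with_outputs(videos_dir, landmarks_dir):
    """Map every .mp4 file in videos_dir to its output .parquet path (hypothetical helper)."""
    videos_dir, landmarks_dir = Path(videos_dir), Path(landmarks_dir)
    pairs = {}
    for video in sorted(videos_dir.iterdir()):
        if video.suffix == '.mp4':  # working with video files only
            # 'book.mp4' -> '<landmarks_dir>/book.parquet'
            pairs[video] = landmarks_dir / (video.stem + '.parquet')
    return pairs


# Demonstration on throwaway folders, so that no real path is assumed
with tempfile.TemporaryDirectory() as tmp:
    demo_videos = Path(tmp) / 'videos'
    demo_landmarks = Path(tmp) / 'landmarks'
    demo_videos.mkdir()
    demo_landmarks.mkdir()
    for name in ('book.mp4', 'drink.mp4', 'notes.txt'):
        (demo_videos / name).touch()

    demo_pairs = pair_videos_with_outputs(demo_videos, demo_landmarks)
    demo_names = [(video.name, output.name) for video, output in demo_pairs.items()]
    print(demo_names)
```

Files that are not videos (here `notes.txt`) are skipped, so only two pairs are printed.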

### Landmarks detection function

In [4]:
# Function to detect the landmarks in a single frame or image
def landmark_detection(frame, model):
    # MediaPipe's landmark detection model expects RGB frames, while OpenCV reads frames as BGR.
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # color conversion BGR to RGB.
    frame.flags.writeable = False  # mark the frame read-only while it is being processed.
    results = model.process(frame)  # landmarks detection.
    frame.flags.writeable = True  # make the frame writeable again.
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)  # color conversion RGB back to BGR.
    return frame, results
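
To make the two steps inside the function concrete, the sketch below reproduces them on a tiny synthetic image with NumPy only. It assumes that, for a 3-channel image, converting BGR to RGB amounts to reversing the channel axis, and it shows what the read-only flag does to an array. The names `bgr` and `rgb` are made up for the example.

```python
import numpy as np

# A 1x2 image in OpenCV's BGR channel order: one pure-blue and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels (BGR -> RGB)
rgb = bgr[..., ::-1].copy()

# Marking the array read-only: any in-place write now raises ValueError
rgb.flags.writeable = False
try:
    rgb[0, 0, 0] = 1
    write_blocked = False
except ValueError:
    write_blocked = True

# Restoring the flag makes the array editable again
rgb.flags.writeable = True
rgb[0, 0, 0] = 1

print(rgb.tolist(), write_blocked)
```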

### Landmarks coordinates extraction function

The function will:
- Extract the coordinates from the parameter 'results'.
- Save the x, y and z coordinates of every landmark.
- Store them in a NumPy array with one row per landmark and the columns frame, row_id, type, landmark_index, x, y and z.
    - The number of rows per frame will be:
        - Face: 468 landmarks.
        - Left hand: 21 landmarks.
        - Pose: 33 landmarks.
        - Right hand: 21 landmarks.
        - Each frame will have a total of 543 rows after concatenation.
    - Zeros are stored for a model when the parameter 'results' has no value for it (e.g. when a hand was not visible and therefore was not detected).
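
The sizes involved follow directly from the number of landmarks per model. The sketch below tallies them, giving both the number of landmarks per frame and the length of a flattened x, y vector, and builds the placeholder rows for a model that was not detected. The row layout (frame, row id, type, landmark index, x, y, z) mirrors the extraction function in the next cell. The constant `LANDMARK_COUNTS` and the helper `placeholder_rows` are illustrative names, not part of the notebook.

```python
# Number of landmarks returned by each model (face count as used in this notebook)
LANDMARK_COUNTS = {'face': 468, 'left_hand': 21, 'pose': 33, 'right_hand': 21}

landmarks_per_frame = sum(LANDMARK_COUNTS.values())  # one row per landmark
xy_values_per_frame = 2 * landmarks_per_frame  # length of a flattened x, y vector


def placeholder_rows(frame_idx, kind):
    """Rows of zeros for a model with no detection in this frame (hypothetical helper)."""
    return [
        [frame_idx, f'{frame_idx}-{kind}-{idx}', kind, idx, 0, 0, 0]
        for idx in range(LANDMARK_COUNTS[kind])
    ]


missing_hand = placeholder_rows(7, 'left_hand')
print(landmarks_per_frame, xy_values_per_frame, len(missing_hand), missing_hand[0])
```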

In [5]:
# Function to extract the coordinates of the detected landmarks
def landmark_extraction(results, frame_idx):
    
    if results.face_landmarks:
        face = np.array([[frame_idx, str(frame_idx) + '-face-' + str(idx), 'face', idx, coordinate.x, coordinate.y, coordinate.z] for idx, coordinate in enumerate(results.face_landmarks.landmark)])
    else:
        face = np.array([[frame_idx, str(frame_idx) + '-face-' + str(idx), 'face', idx, 0, 0, 0] for idx in range(468)])

    if results.left_hand_landmarks:
        left_hand = np.array([[frame_idx, str(frame_idx) + '-left_hand-' + str(idx), 'left_hand', idx, coordinate.x, coordinate.y, coordinate.z] for idx, coordinate in enumerate(results.left_hand_landmarks.landmark)])
    else:
        left_hand = np.array([[frame_idx, str(frame_idx) + '-left_hand-' + str(idx), 'left_hand', idx, 0, 0, 0] for idx in range(21)])

    if results.pose_landmarks:
        pose = np.array([[frame_idx, str(frame_idx) + '-pose-' + str(idx), 'pose', idx, coordinate.x, coordinate.y, coordinate.z] for idx, coordinate in enumerate(results.pose_landmarks.landmark)])
    else:
        pose = np.array([[frame_idx, str(frame_idx) + '-pose-' + str(idx), 'pose', idx, 0, 0, 0] for idx in range(33)])
    
    if results.right_hand_landmarks:
        right_hand = np.array([[frame_idx, str(frame_idx) + '-right_hand-' + str(idx), 'right_hand', idx, coordinate.x, coordinate.y, coordinate.z] for idx, coordinate in enumerate(results.right_hand_landmarks.landmark)])
    else:
        right_hand = np.array([[frame_idx, str(frame_idx) + '-right_hand-' + str(idx), 'right_hand', idx, 0, 0, 0] for idx in range(21)])
    
    return np.concatenate([face, left_hand, pose, right_hand])


### Main code for detection and extraction
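- Loading the videos and reading them frame by frame with OpenCV.
- For each frame, the function landmark_detection is called to make the detections and the function landmark_extraction to collect the coordinates.
- The landmarks of each video are saved to a parquet file.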

In [6]:
# Capturing the video frames from the files in the video path
for item in os.listdir(videos_path):
    if item.endswith('.mp4'):  # working with video files only
        cap = cv2.VideoCapture(videos_path + item)

        # List that will receive the landmark's coordinates for each video
        landmarks_list = []
        frame_idx = 1
        
        # Set mediapipe model
        with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
                
            # Looping through all the frames
            while cap.isOpened():  # making sure it is reading frames

                # Reading the frames
                ret, frame = cap.read()
                if not ret:  # in case a frame wasn't successfully read or the last frame was already worked on
                    break

                # Resizing every frame to a common size
                frame = cv2.resize(frame, (256, 256))

                # Making detections
                image, results = landmark_detection(frame, holistic)
                
                # Extracting landmarks
                # Each frame adds 543 rows (one per landmark) with 7 columns: frame, row_id, type, landmark_index, x, y, z
                landmarks_list.append(landmark_extraction(results, frame_idx))
                frame_idx += 1

            cap.release()

        # Skipping files from which no frame could be read
        if not landmarks_list:
            continue

        # Stacking the rows of all frames once, after the loop
        landmarks_array = np.concatenate(landmarks_list, axis=0)

        index_col = np.arange(543 * (frame_idx - 1)).reshape(-1, 1)
        landmarks_array = np.hstack((index_col, landmarks_array))

        # Saving the NumPy array
        # np.save(landmarks_path + '/' + item.split(".mp4")[0], landmarks_array)
        
        # Storing the array into parquet file
        column_names = ['index', 'frame', 'row_id', 'type', 'landmark_index', 'x', 'y', 'z']
        df = pd.DataFrame(landmarks_array, columns=column_names)  # creating a dataframe
        df.to_parquet(landmarks_path + '/' + item.split(".mp4")[0] + '.parquet', index=False)  # saving the dataframe in a parquet file

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
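
One detail worth checking when reading these files back: a NumPy array built from rows that mix numbers and strings is stored with a string dtype, so a DataFrame created from it holds text in every column. The sketch below shows this on two made-up rows and casts the numeric columns back with `astype`. Whether to cast before or after saving is left open here.

```python
import numpy as np
import pandas as pd

# Two made-up rows with the same layout as the saved table
rows = np.array([
    [0, 1, '1-face-0', 'face', 0, 0.51, 0.42, -0.03],
    [1, 1, '1-face-1', 'face', 1, 0.52, 0.40, -0.02],
])
column_names = ['index', 'frame', 'row_id', 'type', 'landmark_index', 'x', 'y', 'z']

df = pd.DataFrame(rows, columns=column_names)
all_text = rows.dtype.kind == 'U'  # mixing strings with numbers yields a unicode array

# Casting the numeric columns back to numbers
typed = df.astype({
    'index': 'int64', 'frame': 'int64', 'landmark_index': 'int64',
    'x': 'float64', 'y': 'float64', 'z': 'float64',
})
print(all_text, typed.dtypes.to_dict())
```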
