# Landmarks extraction using OpenCV and MediaPipe Holistic

This script reads the videos using OpenCV and extracts the landmarks using MediaPipe Holistic.
The MediaPipe Holistic pipeline combines the following models to extract the keypoints:
- Pose Landmark.
- Hand Landmark (for both hands).
- Face Landmark.

Created by: Marcus Vinicius da Silva Fernandes. 2023-06-11.

#### References:
- https://mediapipe-studio.webapps.google.com/home
- https://www.geeksforgeeks.org/face-and-hand-landmarks-detection-using-python-mediapipe-opencv/
- https://www.youtube.com/watch?v=pG4sUNDOZFg
- https://www.youtube.com/watch?v=0JU3kpYytuQ
- https://arrow.apache.org/docs/python/index.html

### Importing necessary libraries

```python
import cv2
import mediapipe as mp
import os
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
```

### Setting up the MediaPipe Holistic model

It runs the pose, face and hand models and returns their detections in the following fields of the results object:
- pose_landmarks
- face_landmarks
- left_hand_landmarks
- right_hand_landmarks

```python
mp_holistic = mp.solutions.holistic  # MediaPipe Holistic solution, used for landmark detection
```

### Accessing the videos

Set up the path of the folder that contains the videos and the path of the folder where the extracted landmarks will be saved.

```python
# Folder containing the input videos
videos_path = '/Users/marcus/Library/CloudStorage/OneDrive-Personal/Documentos/Loyalist_College/AISC2006/WLASL_videos_clean_1000v/'

# Folder where the extracted landmarks will be saved
landmarks_path = '/Users/marcus/Library/CloudStorage/OneDrive-Personal/Documentos/Loyalist_College/AISC2006/resized_extracted_landmarks_xy/'
```
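A minimal sketch of building the input and output paths with `pathlib` instead of string concatenation, so folder separators are handled automatically. The folder names in the usage lines are placeholders, and both helpers (`list_videos`, `output_path_for`) are hypothetical, not part of the original notebook.

```python
from pathlib import Path


def list_videos(video_dir: Path) -> list:
    """Return the .mp4 files in video_dir, sorted for a reproducible processing order."""
    return sorted(p for p in video_dir.iterdir() if p.suffix == ".mp4")


def output_path_for(video_file: Path, out_dir: Path, suffix: str = ".npy") -> Path:
    """Hypothetical helper: map '<videos>/00335.mp4' to '<out_dir>/00335.npy'."""
    # Path.stem drops the parent folders and the final extension ('.mp4').
    return out_dir / (video_file.stem + suffix)


# Usage with placeholder folders (replace with the real paths above).
videos_dir = Path("WLASL_videos_clean_1000v")
landmarks_dir = Path("resized_extracted_landmarks_xy")
print(output_path_for(videos_dir / "00335.mp4", landmarks_dir))
```

Sorting is a design choice: `os.listdir` returns entries in arbitrary order, so a sorted list makes reruns process the videos in the same sequence.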

### Landmarks detection function

```python
# Function to detect the landmarks in each frame or image
def landmark_detection(frame, model):
    # OpenCV reads frames as BGR, but MediaPipe's landmark detection model expects RGB frames as input.
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # color conversion BGR to RGB
    frame.flags.writeable = False  # mark the frame as not writeable so MediaPipe can pass it by reference
    results = model.process(frame)  # landmark detection
    frame.flags.writeable = True  # frame is writeable again
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)  # color conversion RGB back to BGR
    return frame, results
```
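For a 3-channel frame, the `cv2.COLOR_BGR2RGB` conversion amounts to reversing the order of the channel axis. The NumPy-only sketch below illustrates this on a tiny synthetic frame; it explains what the conversion does and is not meant to replace `cv2.cvtColor` in the function above. The helper name `bgr_to_rgb` is hypothetical.

```python
import numpy as np


def bgr_to_rgb(frame_bgr: np.ndarray) -> np.ndarray:
    """Swap the B and R channels of an (H, W, 3) array by reversing the channel axis."""
    # [..., ::-1] returns a strided view; make a contiguous copy before handing it on.
    return np.ascontiguousarray(frame_bgr[..., ::-1])


# A 1x2 "image": one pure-blue and one pure-red pixel in OpenCV's BGR order.
frame = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(frame)
# The blue pixel becomes (0, 0, 255) and the red pixel becomes (255, 0, 0).
print(rgb[0, 0], rgb[0, 1])
```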

### Landmarks coordinates extraction function

It will:
- Extract the coordinates from the parameter `results`.
- Keep only the x and y coordinates.
- Store them in a NumPy array:
    - The `flatten` method writes the coordinates of each part into a single one-dimensional array, so the lengths are:
        - Pose: 2 coordinates x 33 landmarks = 66 values.
        - Left hand: 2 coordinates x 21 landmarks = 42 values.
        - Right hand: 2 coordinates x 21 landmarks = 42 values.
        - Face: 2 coordinates x 468 landmarks = 936 values.
        - After concatenation, each row (one frame) has a total of 1086 values.
    - Zeros are stored when `results` has no value for a model (e.g., when a hand is not visible and therefore is not detected).
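The arithmetic above also fixes where each part sits inside a 1086-value row. A small sketch that derives the column ranges from the landmark counts (33 pose, 21 per hand, 468 face, 2 values each), in the same order in which `landmark_extraction` concatenates them. The names `LANDMARK_COUNTS` and `row_slices` are illustrative, not part of the original notebook.

```python
# Landmark counts per part, in the order the parts are concatenated.
LANDMARK_COUNTS = {"pose": 33, "left_hand": 21, "right_hand": 21, "face": 468}
VALUES_PER_LANDMARK = 2  # only x and y are kept


def row_slices(counts=LANDMARK_COUNTS, per_landmark=VALUES_PER_LANDMARK):
    """Return {part: slice} giving each part's position inside one frame row."""
    slices, start = {}, 0
    for part, n in counts.items():
        end = start + n * per_landmark
        slices[part] = slice(start, end)
        start = end
    return slices


layout = row_slices()
total = sum(n * VALUES_PER_LANDMARK for n in LANDMARK_COUNTS.values())
print(total)  # 1086
for part, s in layout.items():
    print(f"{part:>10}: columns {s.start} to {s.stop - 1}")
```

With these slices, `row[layout["left_hand"]]` returns the 42 left-hand values of a frame; an all-zero slice usually means that part was not detected in that frame.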

```python
# Function to extract the coordinates of the detected landmarks
def landmark_extraction(results):
    pose = np.array([[coordinate.x, coordinate.y] for coordinate in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(33 * 2)
    left_hand = np.array([[coordinate.x, coordinate.y] for coordinate in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21 * 2)
    right_hand = np.array([[coordinate.x, coordinate.y] for coordinate in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21 * 2)
    face = np.array([[coordinate.x, coordinate.y] for coordinate in results.face_landmarks.landmark]).flatten() if results.face_landmarks else np.zeros(468 * 2)
    return np.concatenate([pose, left_hand, right_hand, face])
```
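To see the zero-filling behaviour without running MediaPipe, the sketch below feeds a compact restatement of `landmark_extraction` (same per-part logic) a stand-in `results` object built with `types.SimpleNamespace`. The stand-in only mimics the attributes the function reads (`pose_landmarks`, `left_hand_landmarks`, `right_hand_landmarks`, `face_landmarks`, each with a `.landmark` sequence of points having `.x` and `.y`); `fake_part` and `part_xy` are hypothetical helpers, not MediaPipe types or functions.

```python
from types import SimpleNamespace

import numpy as np


def part_xy(landmarks, n_points):
    """Flatten the x, y values of a landmark list, or return zeros if the part was not detected."""
    if not landmarks:  # same truthiness test as the original function
        return np.zeros(n_points * 2)
    return np.array([[p.x, p.y] for p in landmarks.landmark]).flatten()


def landmark_extraction(results):
    return np.concatenate([
        part_xy(results.pose_landmarks, 33),
        part_xy(results.left_hand_landmarks, 21),
        part_xy(results.right_hand_landmarks, 21),
        part_xy(results.face_landmarks, 468),
    ])


def fake_part(n_points, value):
    """Hypothetical stand-in for a landmark list: n_points points at (value, value)."""
    return SimpleNamespace(landmark=[SimpleNamespace(x=value, y=value) for _ in range(n_points)])


# Simulate a frame in which the left hand is not visible (the field is None).
results = SimpleNamespace(
    pose_landmarks=fake_part(33, 0.5),
    left_hand_landmarks=None,
    right_hand_landmarks=fake_part(21, 0.25),
    face_landmarks=fake_part(468, 0.75),
)
row = landmark_extraction(results)
print(row.shape)          # (1086,)
print(row[66:108].any())  # False: the left-hand block is all zeros
```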

### NumPy array to Parquet file conversion function

```python
# Function to convert the NumPy array into a PyArrow table and store it as a Parquet file
def parquet_writer(np_array, video_id):
    np_array_flat = np_array.flatten()  # (n_frames, 1086) array -> one long 1-D array
    pa_array = pa.array(np_array_flat)  # convert the NumPy array into a PyArrow array
    table = pa.Table.from_arrays([pa_array], names=[video_id])  # single-column table named after the video
    writer = pq.ParquetWriter(landmarks_path + video_id + '.parquet', table.schema)  # create a Parquet file writer
    writer.write_table(table)  # write the table to the Parquet file
    writer.close()  # close the writer to finalize the file
```
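Because `parquet_writer` flattens the whole `(n_frames, 1086)` array before writing, the Parquet file holds one long column and the frame structure is not stored in it. Assuming the reader knows the row width (1086 values per frame, as computed above), the original matrix can be recovered with a reshape. The NumPy-only sketch below checks this round trip in memory; it does not read an actual Parquet file (with PyArrow, the column would first have to be loaded, e.g. with `pq.read_table`, and converted to NumPy). `ROW_WIDTH` and `unflatten` are hypothetical names.

```python
import numpy as np

ROW_WIDTH = 1086  # values per frame: (33 + 21 + 21 + 468) landmarks x 2 coordinates


def unflatten(values: np.ndarray, row_width: int = ROW_WIDTH) -> np.ndarray:
    """Restore a (n_frames, row_width) matrix from a flat column of landmark values."""
    if values.size % row_width != 0:
        raise ValueError(f"{values.size} values is not a multiple of {row_width}")
    return values.reshape(-1, row_width)


# Round trip on synthetic data: 3 frames of 1086 values each.
frames = np.arange(3 * ROW_WIDTH, dtype=np.float64).reshape(3, ROW_WIDTH)
flat = frames.flatten()  # what parquet_writer stores as a single column
restored = unflatten(flat)
print(restored.shape)                    # (3, 1086)
print(np.array_equal(restored, frames))  # True
```

An alternative design would be to write one Parquet column per value, so the row width is kept in the schema instead of being known by convention.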

### Main code for detection and extraction
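- Load the videos and read them frame by frame with OpenCV.
- For each frame, call `landmark_detection` to make the detections and `landmark_extraction` to collect the coordinates.
- Save the coordinates of each video as a NumPy `.npy` file (the Parquet export is commented out).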

```python
# Process every video file in the videos path
for item in os.listdir(videos_path):
    if item.endswith('.mp4'):  # work with video files only
        video_id = os.path.splitext(item)[0]  # file name without the '.mp4' extension
        cap = cv2.VideoCapture(os.path.join(videos_path, item))

        # List that will receive the landmark coordinates of each frame of this video
        landmarks_list = []

        # A new Holistic instance per video, so tracking state does not carry over between videos
        with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:

            # Loop through all the frames
            while cap.isOpened():  # make sure the capture is still open

                # Read the next frame
                ret, frame = cap.read()
                if not ret:  # the frame could not be read, or the last frame was already processed
                    break

                # Resize every frame to a common size
                frame = cv2.resize(frame, (256, 256))

                # Make detections (the returned image is not needed here)
                _, results = landmark_detection(frame, holistic)

                # Extract landmarks
                # Each video yields 1086 columns (landmark coordinates) and one row per frame
                landmarks_list.append(landmark_extraction(results))

        # No OpenCV windows are opened, so waitKey/destroyAllWindows are not needed
        cap.release()

        # Save the NumPy array as <video_id>.npy (np.save appends the extension)
        np.save(os.path.join(landmarks_path, video_id), np.array(landmarks_list))

        # Converting and storing the array into a Parquet file (disabled)
        # parquet_writer(np.array(landmarks_list), video_id)
```

```
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[NULL @ 0x299abcf40] Invalid NAL unit size (71678 > 10776).
[NULL @ 0x299abcf40] missing picture in access unit with size 10780
[h264 @ 0x299a76870] Invalid NAL unit size (71678 > 10776).
[h264 @ 0x299a76870] Error splitting the input into NAL units.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x299ab2da0] stream 1, offset 0x2a27a7: partial file
[h264 @ 0x114fd6be0] Invalid NAL unit size (745 > 472).
[h264 @ 0x114fd6be0] Error splitting the input into NAL units.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15fd65b40] stream 1, offset 0x3b468: partial file
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x15fd65b40] stream 1, offset 0x3b7d3: partial file
```
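Each video is saved as a `(n_frames, 1086)` array with `np.save`, which appends `.npy` when the file name has no extension. One edge case: if no frame of a video can be read (a possibility for damaged files such as those reported in the log above), `np.array([])` has shape `(0,)` rather than `(0, 1086)`. The sketch below, using synthetic rows and a temporary folder instead of the real paths, shows a hypothetical `stack_rows` helper that keeps the column count explicit and then reloads the saved file; the file name `00335` is a placeholder.

```python
import tempfile
from pathlib import Path

import numpy as np

ROW_WIDTH = 1086


def stack_rows(rows, row_width=ROW_WIDTH):
    """Stack per-frame rows into (n_frames, row_width); an empty list gives (0, row_width)."""
    # np.asarray([]) has shape (0,), so reshape keeps the number of columns explicit.
    return np.asarray(rows, dtype=np.float64).reshape(-1, row_width)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    rows = [np.random.rand(ROW_WIDTH) for _ in range(4)]  # 4 synthetic frames
    np.save(out_dir / "00335", stack_rows(rows))          # written as 00335.npy
    loaded = np.load(out_dir / "00335.npy")
    print(loaded.shape)          # (4, 1086)
    print(stack_rows([]).shape)  # (0, 1086)
```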
