# MediaPipe pilot code

The sections below contain pilot code for setting up MediaPipe landmark detection models. All models use the PC camera as their input source.
The sections are independent of each other: to execute one, run the Starter code section first, then the section for the task you want to perform.

Features implemented:
- Hand static poses classification
- Face landmarks detection
- Body landmarks detection
- Holistic landmarks detection
  
*Useful links*
- [MediaPipe API reference](https://developers.google.com/mediapipe/api/solutions)
- [MediaPipe source code](https://github.com/google/mediapipe/blob/master/mediapipe/python)

## Starter code

### Imports

In [3]:
import cv2
import numpy as np
import mediapipe as mp
# TensorFlow/Keras is only needed by the hand static poses classification section
from tensorflow.keras.models import load_model
from mediapipe.python.solutions.drawing_utils import DrawingSpec

### Utility functions and global parameters

In [4]:
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

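# Drawing styles for landmarks and connections (colors are BGR, since drawing is done on BGR frames)
landmarks_style = DrawingSpec(color=(49, 209, 255))
connections_style = DrawingSpec(color=(255, 0, 0), thickness=1)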

# Pose model complexities
model_complexities = {'light': 0, 'medium': 1, 'heavy': 2}

## Hand static poses classification

This section uses the hand detection model included in MediaPipe.   
The detected landmarks are then classified to detect the hand pose among 10 pre-recorded poses.  
This code is a good starting point for the classification of static poses.

### Initialize MediaPipe model

We specify the maximum number of hands to detect and the minimum confidence score above which a hand detection is accepted.

- `mpHands` is the MediaPipe module containing the models and utility classes related to hand detection
- `mp_drawing` (defined in the starter code) is the MediaPipe drawing utility used to draw the detected landmarks directly on the displayed frame

In [13]:
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1, min_detection_confidence=0.7)

### Load model used to classify the gestures

In [14]:
# Load the gesture recognizer model
model = load_model('mp_hand_gesture')
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_2 (Flatten)         (None, 42)                0         
                                                                 
 dense_11 (Dense)            (None, 64)                2752      
                                                                 
 dense_12 (Dense)            (None, 128)               8320      
                                                                 
 dense_13 (Dense)            (None, 512)               66048     
                                                                 
 dense_14 (Dense)            (None, 64)                32832     
                                                                 
 dense_15 (Dense)            (None, 32)                2080      
                                                                 
 dense_16 (Dense)            (None, 10)               

In [15]:
# Load class names
with open('gesture.names', 'r') as f:
    class_names = f.read().split('\n')
class_names

['okay',
 'peace',
 'thumbs up',
 'thumbs down',
 'call me',
 'stop',
 'rock',
 'live long',
 'fist',
 'smile']
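
The classifier returns one probability per class, in the same order as `class_names`. As a minimal sketch in plain Python (no TensorFlow needed), the label lookup performed later in the main loop could look like the code below. The helper name `label_from_probabilities` and the `min_confidence` threshold are assumptions made for illustration; the notebook itself always takes the arg-max without a threshold.

```python
from typing import Sequence

CLASS_NAMES = ['okay', 'peace', 'thumbs up', 'thumbs down', 'call me',
               'stop', 'rock', 'live long', 'fist', 'smile']


def label_from_probabilities(probabilities: Sequence[float],
                             class_names: Sequence[str] = CLASS_NAMES,
                             min_confidence: float = 0.0) -> str:
    """Return the most probable class name, or '' if it is below min_confidence."""
    if len(probabilities) != len(class_names):
        raise ValueError('expected one probability per class name')
    # Index of the largest probability (what np.argmax does on a 1-D array)
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    # min_confidence is a hypothetical threshold, not used in the notebook
    if probabilities[best] < min_confidence:
        return ''
    return class_names[best]


example = [0.01, 0.90, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
print(label_from_probabilities(example))  # peace
```

Returning an empty string for low-confidence frames matches the loop's default of `class_name = ''`, so the overlay text would simply show no class.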

### Main video loop

This loop contains the instructions executed for each frame:
1. the frame is captured from the camera
2. the frame is flipped horizontally so that it is displayed as a mirror image
3. the frame is converted to RGB (OpenCV captures frames in BGR)
4. the MediaPipe hands model predicts the landmark positions
5. the landmarks are drawn on the frame and passed to the classifier, which predicts the pose
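
To make steps 2 and 3 concrete, here is a small sketch of what they do to the pixel data. Plain nested lists stand in for the NumPy array that OpenCV returns, and the helper names `mirror` and `bgr_to_rgb` are made up for this illustration; the notebook itself relies on `cv2.flip(frame, 1)` and `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of pixels, indexed as image[row][column]


def mirror(image: Image) -> Image:
    """Reverse the column order of every row (the effect of cv2.flip with flipCode=1)."""
    return [row[::-1] for row in image]


def bgr_to_rgb(image: Image) -> Image:
    """Swap the first and third channel of every pixel."""
    return [[(r, g, b) for (b, g, r) in row] for row in image]


# A 1x2 BGR image: a pure blue pixel on the left, a pure green pixel on the right
frame = [[(255, 0, 0), (0, 255, 0)]]

mirrored = mirror(frame)     # the green pixel is now on the left
rgb = bgr_to_rgb(mirrored)   # same pixels, channels reordered to RGB
print(rgb)  # [[(0, 255, 0), (0, 0, 255)]]
```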

In [17]:
# Initialize the webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    # Read each frame from the webcam
    success, frame = cap.read()

    # jump to the next loop if a frame is not captured
    if not success:
        continue

    x, y, c = frame.shape

    # Flip the frame horizontally (mirror view)
    frame = cv2.flip(frame, 1)
    # Convert the frame to RGB
    frame.flags.writeable = False
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Get hand landmark prediction
    result = hands.process(frame)

    frame.flags.writeable = True
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    
    class_name = ''

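    # post-process the result
    if result.multi_hand_landmarks:
        landmarks = []
        # Only one hand is detected here (max_num_hands=1), but the loop can handle several hands
        for hands_lms in result.multi_hand_landmarks:
            for lm in hands_lms.landmark:
                
                # Denormalize coordinates. Note: frame.shape is (height, width, channels),
                # so lm.x is scaled by the height and lm.y by the width here;
                # changing this scaling changes the input given to the classifier
                lmx = int(lm.x * x)
                lmy = int(lm.y * y)
                landmarks.append([lmx, lmy])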

            # Drawing landmarks on frames
            mp_drawing.draw_landmarks(
                image=frame, 
                landmark_list=hands_lms, 
                connections=mpHands.HAND_CONNECTIONS,
                connection_drawing_spec=mp_drawing_styles.get_default_hand_connections_style())

            # Predict gesture (probability distribution over the classes)
            prediction = model.predict(np.array([landmarks]), verbose=0)
            classID = np.argmax(prediction)
            class_name = class_names[classID]

    # show the prediction on the frame
    cv2.putText(frame, f'Predicted class: {class_name}', (10, 80), cv2.FONT_HERSHEY_SIMPLEX, 
                   1, (49, 209, 255), 2)
    
    cv2.putText(frame, 'Hand pose estimation', (120, 30), cv2.FONT_HERSHEY_DUPLEX, 1, (49, 209, 255), 2)
    cv2.putText(frame, 'Press Q to exit', (120, 50), cv2.FONT_HERSHEY_PLAIN, 1, (0,0,0), 1)

    # Show the final output
    cv2.imshow("Output", frame) 

    if cv2.waitKey(1) == ord('q'):
        break

# release the webcam and destroy all active windows
cap.release()
cv2.destroyAllWindows()
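
The denormalization step in the loop depends on the axis order of `frame.shape`, which is `(height, width, channels)`. The sketch below shows the conventional mapping, where `x` is scaled by the frame width and `y` by the frame height. `Landmark` is a stand-in dataclass for MediaPipe's normalized landmark and `to_pixel_coords` is a hypothetical helper. It does not reproduce the scaling used in the loop above, so its output should not be fed to the pre-trained classifier without first checking how that model was trained.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Landmark:
    """Stand-in for a MediaPipe landmark: x and y are normalized to [0, 1]."""
    x: float
    y: float


def to_pixel_coords(landmarks: Sequence[Landmark],
                    frame_shape: Tuple[int, int, int]) -> List[List[int]]:
    """Convert normalized landmarks to [column, row] pixel coordinates."""
    height, width, _ = frame_shape  # NumPy image shape is (rows, columns, channels)
    return [[int(lm.x * width), int(lm.y * height)] for lm in landmarks]


# A 640x480 frame has shape (480, 640, 3)
points = to_pixel_coords([Landmark(0.5, 0.5), Landmark(0.25, 1.0)], (480, 640, 3))
print(points)  # [[320, 240], [160, 480]]
```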

## Face landmarks detection

### Initialize MediaPipe model

In [12]:
mp_face_mesh = mp.solutions.face_mesh
faces = mp_face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)

### Main video loop

In [13]:
# Initialize the webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    # Read each frame from the webcam
    success, frame = cap.read()

    # jump to the next loop if a frame is not captured
    if not success:
        continue

    # Flip the frame horizontally (mirror view)
    frame = cv2.flip(frame, 1)
    # Convert the frame to RGB
    frame.flags.writeable = False
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Get face landmark prediction
    result = faces.process(frame)

    frame.flags.writeable = True
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)

    if result.multi_face_landmarks:
        for face_lms in result.multi_face_landmarks:
            
            # Draw face mesh
            # mp_drawing.draw_landmarks(
            #     image=frame,
            #     landmark_list=face_lms,
            #     connections=mp_face_mesh.FACEMESH_TESSELATION,
            #     landmark_drawing_spec=None,
            #     connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_tesselation_style())

            # Draw face contours
            # mp_drawing.draw_landmarks(
            #     image=frame,
            #     landmark_list=face_lms,
            #     connections=mp_face_mesh.FACEMESH_CONTOURS,
            #     landmark_drawing_spec=None,
            #     connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_contours_style())

            # Draw irises
            mp_drawing.draw_landmarks(
                image=frame,
                landmark_list=face_lms,
                connections=mp_face_mesh.FACEMESH_IRISES,
                landmark_drawing_spec=None,
                connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_iris_connections_style())
            


    cv2.putText(frame, 'Face landmarks detection', (120, 30), cv2.FONT_HERSHEY_DUPLEX, 1, (49, 209, 255), 2)
    cv2.putText(frame, 'Press Q to exit', (120, 50), cv2.FONT_HERSHEY_PLAIN, 1, (0,0,0), 1)

    # Show the final output
    cv2.imshow("Output", frame) 

    if cv2.waitKey(1) == ord('q'):
        break

# release the webcam and destroy all active windows
cap.release()
cv2.destroyAllWindows()

## Body landmarks detection

In [18]:
mp_body_mesh = mp.solutions.pose
poses = mp_body_mesh.Pose(smooth_segmentation=True, model_complexity=model_complexities['light'])

In [19]:
# Initialize the webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    # Read each frame from the webcam
    success, frame = cap.read()

    # jump to the next loop if a frame is not captured
    if not success:
        continue

    # Flip the frame horizontally (mirror view)
    frame = cv2.flip(frame, 1)
    # Convert the frame to RGB
    frame.flags.writeable = False
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Get pose landmark prediction
    result = poses.process(frame)

    frame.flags.writeable = True
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
            
    # Draw pose landmarks and connections
    mp_drawing.draw_landmarks(
        image=frame,
        landmark_list=result.pose_landmarks,
        connections=mp_body_mesh.POSE_CONNECTIONS,
        landmark_drawing_spec=landmarks_style,
        connection_drawing_spec=connections_style
        )
            


    cv2.putText(frame, 'Pose landmarks detection', (120, 30), cv2.FONT_HERSHEY_DUPLEX, 1, (49, 209, 255), 2)
    cv2.putText(frame, 'Press Q to exit', (120, 50), cv2.FONT_HERSHEY_PLAIN, 1, (0,0,0), 1)

    # Show the final output
    cv2.imshow("Output", frame) 

    if cv2.waitKey(1) == ord('q'):
        break

# release the webcam and destroy all active windows
cap.release()
cv2.destroyAllWindows()

## Holistic model landmarks

[holistic.py source code](https://github.com/google/mediapipe/blob/master/mediapipe/python/solutions/holistic.py)

In [8]:
mp_holistic_mesh = mp.solutions.holistic
holistics = mp_holistic_mesh.Holistic(static_image_mode=False, model_complexity=model_complexities['light'])

### Main loop

In [17]:
# Initialize the webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    # Read each frame from the webcam
    success, frame = cap.read()

    # jump to the next loop if a frame is not captured
    if not success:
        continue

    # Flip the frame horizontally (mirror view)
    frame = cv2.flip(frame, 1)
    # Convert the frame to RGB
    frame.flags.writeable = False
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Get holistic landmark prediction (face, pose and hands)
    result = holistics.process(frame)

    frame.flags.writeable = True
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
            
    # Draw face mesh
    mp_drawing.draw_landmarks(
        image=frame,
        landmark_list=result.face_landmarks,
        connections=mp_holistic_mesh.FACEMESH_TESSELATION,
        landmark_drawing_spec=None,
        connection_drawing_spec=connections_style
        )
            


    cv2.putText(frame, 'Holistic landmarks detection', (120, 30), cv2.FONT_HERSHEY_DUPLEX, 1, (49, 209, 255), 2)
    cv2.putText(frame, 'Press Q to exit', (120, 50), cv2.FONT_HERSHEY_PLAIN, 1, (0,0,0), 1)

    # Show the final output
    cv2.imshow("Output", frame) 

    if cv2.waitKey(1) == ord('q'):
        break

# release the webcam and destroy all active windows
cap.release()
cv2.destroyAllWindows()
