# Pose Estimation with MoveNet

This Jupyter notebook demonstrates real-time pose estimation with the MoveNet model running under TensorFlow Lite. It is based on the tutorial by Nicholas Renotte, available [here](https://www.youtube.com/watch?v=SSW9LzOJSus).

## Workflow Overview

1. **Import Libraries**: Essential libraries such as TensorFlow, NumPy, Matplotlib, and OpenCV are imported.
2. **Load MoveNet Model**: The MoveNet model is loaded from a local TensorFlow Lite file (`3.tflite`) into a `tf.lite.Interpreter`.
3. **Define Helper Functions**: Functions to draw keypoints and connections on the frames are defined.
4. **Real-time Pose Estimation**: Capture video from the webcam, process each frame to detect keypoints, and render the keypoints and connections on the frame.

## Variables

- **EDGES**: A dictionary mapping each pair of connected keypoint indices to a single-letter color code.
- **cap**: An OpenCV VideoCapture object for capturing video from the webcam.
- **frame**: A NumPy array representing the current frame captured from the webcam.
- **img**: A TensorFlow tensor representing the resized and padded image.
- **input_details**: A list containing details about the input tensor for the MoveNet model.
- **input_image**: A TensorFlow tensor representing the input image for the MoveNet model.
- **interpreter**: A TensorFlow Lite interpreter object for running the MoveNet model.
- **keypoints_with_scores**: A NumPy array of shape `(1, 1, 17, 3)` holding, for each of the 17 detected keypoints, its normalized `y` and `x` coordinates and a confidence score.
- **output_details**: A list containing details about the output tensor from the MoveNet model.
- **ret**: A boolean indicating whether the frame was successfully captured from the webcam.

The cells below walk through each of these steps to run real-time pose estimation with MoveNet on a webcam feed.

In [2]:
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
import cv2

In [3]:
interpreter = tf.lite.Interpreter(model_path="3.tflite")
interpreter.allocate_tensors()

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


In [4]:
interpreter.get_input_details()

[{'name': 'serving_default_input:0',
  'index': 0,
  'shape': array([  1, 192, 192,   3], dtype=int32),
  'shape_signature': array([  1, 192, 192,   3], dtype=int32),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]
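
The input details above report a `shape` of `[1, 192, 192, 3]` and a `float32` dtype, which is where the 192x192 resize used later comes from. As a minimal sketch, the expected size could be read from the details list instead of being hardcoded. The helper name `input_size` is hypothetical, and `sample_details` is a stand-in that copies only a few fields from the output above so the example runs without a model file.

```python
import numpy as np

def input_size(input_details):
    """Return (height, width) expected by the model's first input tensor."""
    # 'shape' is laid out as [batch, height, width, channels]
    _, height, width, _ = input_details[0]['shape']
    return int(height), int(width)

# Stand-in for interpreter.get_input_details(), copied from the output above
sample_details = [{
    'name': 'serving_default_input:0',
    'index': 0,
    'shape': np.array([1, 192, 192, 3], dtype=np.int32),
    'dtype': np.float32,
}]

height, width = input_size(sample_details)
print(height, width)
```

With a real interpreter, the same helper would presumably be called as `input_size(interpreter.get_input_details())`.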

In [14]:
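def draw_keypoints(frame, keypoints, confidence_threshold):
    # Scale normalized (y, x) keypoints to pixel coordinates of the frame
    y, x, c = frame.shape
    shaped = np.squeeze(np.multiply(keypoints, [y, x, 1]))

    for kp in shaped:
        ky, kx, kp_conf = kp
        # Draw only keypoints whose confidence exceeds the threshold
        if kp_conf > confidence_threshold:
            cv2.circle(frame, (int(kx), int(ky)), 10, (255, 0, 0), -1)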
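
The scaling step in the helper above can be tried in isolation, without OpenCV or a camera. The sketch below uses a hypothetical function `keypoints_to_pixels` and two made-up keypoints; it assumes the `(y, x, score)` layout that the drawing helpers rely on.

```python
import numpy as np

def keypoints_to_pixels(keypoints_with_scores, frame_height, frame_width,
                        confidence_threshold):
    """Return (x_px, y_px) tuples for keypoints scoring above the threshold."""
    # Flatten the leading batch dimensions into rows of (y, x, score)
    kps = np.asarray(keypoints_with_scores).reshape(-1, 3)
    # Scale normalized y and x to pixels; leave the score column unchanged
    scaled = kps * np.array([frame_height, frame_width, 1])
    return [(int(kx), int(ky))
            for ky, kx, score in scaled
            if score > confidence_threshold]

# Made-up data: one confident keypoint and one low-confidence keypoint
demo = np.array([[[[0.5, 0.25, 0.9],
                   [0.1, 0.1, 0.1]]]])

print(keypoints_to_pixels(demo, 480, 640, 0.4))  # -> [(160, 240)]
```

Note that the returned tuples are in `(x, y)` order, which is the order `cv2.circle` expects for its center argument.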

In [15]:
def draw_connections(frame, keypoints, edges, confidence_threshold):
    y, x, c = frame.shape
    shaped = np.squeeze(np.multiply(keypoints, [y,x,1]))
    
    for edge, color in edges.items():
        p1, p2 = edge
        y1, x1, c1 = shaped[p1]
        y2, x2, c2 = shaped[p2]
        
        if c1 > confidence_threshold and c2 > confidence_threshold:
            cv2.line(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0,0,255), 5)
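
In `draw_connections`, the `color` value unpacked from `edges.items()` is not used; every line is drawn with the fixed BGR tuple `(0, 0, 255)`. If per-edge colors were wanted, one possible approach is a lookup from the single-letter codes to BGR tuples. The table `COLOR_BGR` and the helper `edge_color` below are hypothetical names, not part of the original notebook.

```python
# Hypothetical lookup from single-letter color codes to OpenCV BGR tuples
COLOR_BGR = {
    'm': (255, 0, 255),  # magenta
    'c': (255, 255, 0),  # cyan
    'y': (0, 255, 255),  # yellow
}

def edge_color(code, default=(0, 0, 255)):
    """Return the BGR tuple for a color code, falling back to red."""
    return COLOR_BGR.get(code, default)

print(edge_color('m'))  # -> (255, 0, 255)
print(edge_color('x'))  # -> (0, 0, 255)
```

The result of `edge_color(color)` could then be passed to `cv2.line` in place of the fixed tuple.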

In [16]:
EDGES = {
    (0, 1): 'm',
    (0, 2): 'c',
    (1, 3): 'm',
    (2, 4): 'c',
    (0, 5): 'm',
    (0, 6): 'c',
    (5, 7): 'm',
    (7, 9): 'm',
    (6, 8): 'c',
    (8, 10): 'c',
    (5, 6): 'y',
    (5, 11): 'm',
    (6, 12): 'c',
    (11, 12): 'y',
    (11, 13): 'm',
    (13, 15): 'm',
    (12, 14): 'c',
    (14, 16): 'c'
}
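
The integer pairs in `EDGES` are indices into the 17 keypoints returned by the model. The sketch below assumes the model file follows the standard 17-keypoint ordering used by MoveNet (nose first, ankles last); the list `KEYPOINT_NAMES` and the helper `describe_edge` are illustrative names added here, not part of the original notebook.

```python
# Assumed keypoint ordering (index 0 to 16)
KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

def describe_edge(edge):
    """Turn an (index, index) pair into a readable connection label."""
    p1, p2 = edge
    return f"{KEYPOINT_NAMES[p1]} -> {KEYPOINT_NAMES[p2]}"

# A few of the pairs defined in EDGES
for edge in [(0, 1), (5, 7), (11, 12)]:
    print(edge, describe_edge(edge))
```

Under this ordering, the `'m'` edges trace the left side of the body, the `'c'` edges the right side, and the `'y'` edges connect the two sides at the shoulders and hips.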

The following cell captures video from the webcam and performs real-time pose estimation using the MoveNet model. The steps are as follows:

1. **Capture Video**: Initialize the webcam using `cv2.VideoCapture(0)` and continuously capture frames while the webcam is open.
2. **Reshape Image**: Copy the captured frame and resize it to the required dimensions (192x192) using TensorFlow's `resize_with_pad` function. Convert the resized image to a TensorFlow tensor of type `float32`.
3. **Setup Input and Output**: Retrieve the input and output details of the MoveNet model interpreter.
4. **Make Predictions**: Set the input tensor for the interpreter with the reshaped image and invoke the interpreter to make predictions. Extract the keypoints and their confidence scores from the output tensor.
5. **Rendering**: Draw the keypoints and connections on the frame using the `draw_connections` and `draw_keypoints` functions.
6. **Display Frame**: Display the processed frame with keypoints and connections using OpenCV's `imshow` function.
7. **Exit Condition**: Break out of the loop when the 'q' key is pressed.
8. **Cleanup**: Release the webcam and close all OpenCV windows.
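
One detail of the Reshape Image and Rendering steps is worth a closer look. `resize_with_pad` keeps the aspect ratio and pads the shorter side, so the model sees a letterboxed square and its normalized coordinates refer to that padded square. For a non-square frame, multiplying those coordinates directly by the frame height and width can shift points along the padded axis. The following is a minimal sketch of a correction, assuming the padding is split evenly between both sides; the function name `unpad_keypoints` and the `target_size` parameter are hypothetical.

```python
import numpy as np

def unpad_keypoints(keypoints_with_scores, frame_height, frame_width,
                    target_size=192):
    """Map keypoints normalized to a padded square back to frame pixels."""
    # Flatten the leading batch dimensions into rows of (y, x, score)
    kps = np.asarray(keypoints_with_scores, dtype=np.float64).reshape(-1, 3)
    # Scale factor applied when fitting the frame into the square
    scale = min(target_size / frame_height, target_size / frame_width)
    # Padding on each side, assuming it is split evenly (an assumption)
    pad_y = (target_size - frame_height * scale) / 2
    pad_x = (target_size - frame_width * scale) / 2
    out = kps.copy()
    # Convert to padded-square pixels, remove the padding, undo the scaling
    out[:, 0] = (kps[:, 0] * target_size - pad_y) / scale
    out[:, 1] = (kps[:, 1] * target_size - pad_x) / scale
    return out

# Made-up keypoint at the center of the padded square, for a 480x640 frame
demo = np.array([[[[0.5, 0.5, 0.9]]]])
print(unpad_keypoints(demo, 480, 640))
```

For a 480x640 frame, the center of the padded square maps to roughly the center of the frame, `(y, x) = (240, 320)`, and the score column is passed through unchanged. The returned rows are already in pixel units, so they would be drawn without the additional multiplication by frame size.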

In [17]:
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        # Stop if no frame could be read from the webcam
        break
    
    # Reshape image
    img = frame.copy()
    img = tf.image.resize_with_pad(np.expand_dims(img, axis=0), 192,192)
    input_image = tf.cast(img, dtype=tf.float32)
    
    # Setup input and output 
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    
    # Make predictions 
    interpreter.set_tensor(input_details[0]['index'], np.array(input_image))
    interpreter.invoke()
    keypoints_with_scores = interpreter.get_tensor(output_details[0]['index'])
    
    # Rendering 
    draw_connections(frame, keypoints_with_scores, EDGES, 0.4)
    draw_keypoints(frame, keypoints_with_scores, 0.4)
    
    cv2.imshow('MoveNet Lightning', frame)
    
    if cv2.waitKey(10) & 0xFF==ord('q'):
        break
        
cap.release()
cv2.destroyAllWindows()