# Live 3D Human Pose Estimation with OpenVINO

This notebook demonstrates live 3D Human Pose Estimation with OpenVINO. We use the model [human-pose-estimation-3d-0001](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/human-pose-estimation-3d-0001) from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/). At the bottom of this notebook, you will see live inference results from your webcam. You can also upload a video file.

> NOTE: _To use the webcam, you must run this Jupyter notebook on a computer with a webcam. If you run it on a server, the webcam will not work. However, you can still run inference on a video file in the final step. This demo uses WebGL and is intended to be viewed in a browser for better presentation._

## Imports

In [None]:
import collections
import sys
import time
import warnings
from pathlib import Path

import cv2
import ipywidgets as widgets
import numpy as np
from IPython.display import clear_output, display
from openvino.runtime import Core, PartialShape

warnings.filterwarnings("ignore", category=DeprecationWarning)

sys.path.append("../utils")
import notebook_utils as utils

sys.path.append("./modules")
import modules.engine3js as engine
from modules.parse_poses import parse_poses

In [None]:
# use visualization library
engine3D = engine.Engine3js(grid=True, axis=True)

## The model

### Download the model

We use `omz_downloader`, which is a command line tool from the `openvino-dev` package. `omz_downloader` automatically creates a directory structure and downloads the selected model.

If you want to download another model, change the model name and precision. *Note: A different model will require a different pose extractor.*

In [None]:
# directory where model will be downloaded
base_model_dir = "model"

# model name as named in Open Model Zoo
model_name = "human-pose-estimation-3d-0001"
# selected precision (FP32, FP16)
precision = "FP32"

model_dir = Path(base_model_dir).expanduser()

BASE_MODEL_NAME = f"{base_model_dir}/public/{model_name}/{model_name}"
model_path = Path(BASE_MODEL_NAME).with_suffix(".pth")
onnx_path = Path(BASE_MODEL_NAME).with_suffix(".onnx")

ir_model_path = f"{base_model_dir}/public/{model_name}/{precision}/{model_name}.xml"
model_weights_path = f"{base_model_dir}/public/{model_name}/{precision}/{model_name}.bin"
video_path = "data/face-demographics-walking.mp4"

if not model_path.exists():
    download_command = (
        f"omz_downloader "
        f"--name {model_name} "
        #  f"--precision {precision} "
        f"--output_dir {model_dir}"
    )
    ! $download_command
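
Outside a notebook the `!` shell escape is not available, so the same call can be assembled as an argument list instead. The sketch below is an illustration only: `build_omz_command` is a hypothetical helper, not part of Open Model Zoo, and it uses only the `--name` and `--output_dir` flags shown above.

```python
from pathlib import Path


def build_omz_command(tool, model_name, output_dir):
    """Hypothetical helper: assemble an Open Model Zoo CLI call as an argument list."""
    return [tool, "--name", model_name, "--output_dir", str(Path(output_dir))]


cmd = build_omz_command("omz_downloader", "human-pose-estimation-3d-0001", "model")
print(" ".join(cmd))
# To execute it: subprocess.run(cmd, check=True)  (requires openvino-dev to be installed)
```

Passing a list to `subprocess.run` avoids shell quoting issues when the output directory contains spaces.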

### Convert Model to OpenVINO IR format
We use `omz_converter` to convert the downloaded PyTorch model, via ONNX, to OpenVINO IR format. The conversion runs only if the intermediate ONNX file does not exist yet.

In [None]:
if not onnx_path.exists():
    convert_command = (
        f"omz_converter "
        f"--name {model_name} "
        # f"--precisions {precision} "
        f"--download_dir {model_dir} "
        f"--output_dir {model_dir}"
    )
    ! $convert_command

### Load the model

Converted models are located in a fixed structure, which indicates vendor, model name and precision.
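
As a rough sketch of that layout, the helper below builds the expected `.xml` and `.bin` paths from the vendor, model name and precision. The function name `ir_paths` is made up for illustration. The `public` vendor folder matches the paths used earlier in this notebook.

```python
from pathlib import Path


def ir_paths(base_dir, model_name, precision, vendor="public"):
    """Return the expected (.xml, .bin) paths of a converted model."""
    folder = Path(base_dir) / vendor / model_name / precision
    return folder / f"{model_name}.xml", folder / f"{model_name}.bin"


xml_path, bin_path = ir_paths("model", "human-pose-estimation-3d-0001", "FP32")
print(xml_path.as_posix())
print(bin_path.as_posix())
```

Checking `xml_path.exists()` before loading gives a clearer error than a failed `read_model` call when the conversion step was skipped.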

Only a few lines of code are required to run the model. First, we initialize the Inference Engine. Then we read the network architecture and model weights from the `.xml` and `.bin` files and compile the model for the desired device. Finally, we create an inference request object, which is used to run inference on the compiled model. The created request has its input and output tensors already allocated.

In [None]:
# initialize inference engine
ie_core = Core()
# read the network and corresponding weights from file
model = ie_core.read_model(model=ir_model_path, weights=model_weights_path)
# load the model on the CPU (you can use GPU or MYRIAD as well)
compiled_model = ie_core.compile_model(model=model, device_name="CPU")
infer_request = compiled_model.create_infer_request()
input_tensor_name = model.inputs[0].get_any_name()

# get input and output names of nodes
input_layer = compiled_model.input(0)
output_layers = list(compiled_model.outputs)

# get input size
height, width = list(input_layer.shape)[2:]

## Processing
### Model Inference
The frame is passed to the model as input, and the model returns three outputs: features, heatmaps and part affinity fields (PAFs).
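
Before inference, the function below trims each frame so that both sides are multiples of the network stride, then reorders the axes from HWC to NCHW. Here is a minimal sketch of those two steps on a dummy array. The helper name `crop_to_stride` and the 256x455 frame size are assumptions chosen for illustration.

```python
import numpy as np


def crop_to_stride(img, stride):
    """Trim the bottom and right edges so height and width are multiples of stride."""
    h, w = img.shape[:2]
    return img[: h - h % stride, : w - w % stride]


frame = np.zeros((256, 455, 3), dtype=np.uint8)  # dummy HWC image
cropped = crop_to_stride(frame, stride=8)
print(cropped.shape)  # (256, 448, 3)

# HWC -> CHW, then add a leading batch axis to get NCHW
blob = np.transpose(cropped, (2, 0, 1))[None]
print(blob.shape)  # (1, 3, 256, 448)
```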

In [None]:
def model_infer(scaled_img, stride):
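    """
    Run inference on a single frame.

    Parameters:
        scaled_img: resized image in HWC layout
        stride: int, the network stride; height and width are cropped to a multiple of it
    Returns:
        tuple of (features, heatmaps, pafs) for the frame
    """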
    global infer_request, input_tensor_name, model

    # number, channel, height, width
    n, c, h, w = model.inputs[0].shape
    # Remove excess space from the picture
    img = scaled_img[
        0 : scaled_img.shape[0] - (scaled_img.shape[0] % stride),
        0 : scaled_img.shape[1] - (scaled_img.shape[1] % stride),
    ]
    # Adapt the input of the model to the image size
    if h != img.shape[0] or w != img.shape[1]:
        model.reshape(
            {input_tensor_name: PartialShape([n, c, img.shape[0], img.shape[1]])}
        )
        compiled_model = ie_core.compile_model(model, "CPU")
        infer_request = compiled_model.create_infer_request()

    img = np.transpose(img, (2, 0, 1))[
        None,
    ]
    infer_request.infer({input_tensor_name: img})
    # The set of three inference results is obtained
    results = {
        name: infer_request.get_tensor(name).data[:]
        for name in {"features", "heatmaps", "pafs"}
    }
    # Get the results
    results = (results["features"][0], results["heatmaps"][0], results["pafs"][0])

    return results


def rotate_poses(poses_3d, R, t):
    """
    Transform 3D poses by applying the inverse rotation to the translated points.

    Parameters:
        poses_3d: array of shape (num_poses, num_joints * 4), four values per joint
        R: 3x3 rotation matrix
        t: 3x1 translation vector
    """
    R_inv = np.linalg.inv(R)
    for pose_id in range(poses_3d.shape[0]):
        pose_3d = poses_3d[pose_id].reshape((-1, 4)).transpose()
        pose_3d[0:3] = np.dot(R_inv, pose_3d[0:3] - t)
        poses_3d[pose_id] = pose_3d.transpose().reshape(-1)

    return poses_3d
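
`rotate_poses` is not called elsewhere in this notebook, so here is a small usage sketch. It repeats the function so the block runs on its own. The rotation (90 degrees about the z axis), the zero translation and the two-joint pose are made-up values. In practice `R` and `t` would come from your own camera calibration, which is an assumption not covered by this notebook.

```python
import numpy as np


def rotate_poses(poses_3d, R, t):
    """Same transform as above: p' = R^-1 (p - t), applied to every joint."""
    R_inv = np.linalg.inv(R)
    for pose_id in range(poses_3d.shape[0]):
        pose_3d = poses_3d[pose_id].reshape((-1, 4)).transpose()
        pose_3d[0:3] = np.dot(R_inv, pose_3d[0:3] - t)
        poses_3d[pose_id] = pose_3d.transpose().reshape(-1)
    return poses_3d


# 90 degree rotation about the z axis, no translation (illustrative values)
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t = np.zeros((3, 1))

# One pose with two joints, four values per joint (x, y, z plus a fourth value left untouched)
poses = np.array([[1.0, 0.0, 0.0, 0.9, 0.0, 2.0, 0.0, 0.8]])
rotated = rotate_poses(poses.copy(), R, t)  # copy, because the function modifies its input
print(np.round(rotated, 3))
```

Note that the function writes into the array it receives, so pass a copy if the original coordinates are still needed.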

### Draw 2D Pose Overlays
We need to define some connections between joints in advance, so that we can draw the structure of the human body in the image when we get the results.
Joints are drawn as circles and limbs are drawn as lines. The code is based on the [3D Human Pose Estimation Demo](https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/human_pose_estimation_3d_demo/python) from Open Model Zoo.

In [None]:
# 3D edge index array
body_edges = np.array(
    [
        [11, 10],
        [10, 9],
        [9, 0],
        [0, 3],
        [3, 4],
        [4, 5],
        [0, 6],
        [6, 7],
        [7, 8],
        [0, 12],
        [12, 13],
        [13, 14],
        [0, 1],
        [1, 15],
        [15, 16],
        [1, 17],
        [17, 18],
    ]
)

body_edges_2d = np.array(
    [
        [0, 1],  # neck - nose
        [1, 16],
        [16, 18],  # nose - l_eye - l_ear
        [1, 15],
        [15, 17],  # nose - r_eye - r_ear
        [0, 3],
        [3, 4],
        [4, 5],  # neck - l_shoulder - l_elbow - l_wrist
        [0, 9],
        [9, 10],
        [10, 11],  # neck - r_shoulder - r_elbow - r_wrist
        [0, 6],
        [6, 7],
        [7, 8],  # neck - l_hip - l_knee - l_ankle
        [0, 12],
        [12, 13],
        [13, 14],  # neck - r_hip - r_knee - r_ankle
    ]
)


def draw_poses(img, poses_2d):
    """Draw 2D skeletons on img in place: limbs as lines, joints as circles."""
    for pose in poses_2d:
        pose = np.array(pose[0:-1]).reshape((-1, 3)).transpose()

        was_found = pose[2] > 0

        # Draw limbs.
        for edge in body_edges_2d:
            if was_found[edge[0]] and was_found[edge[1]]:
                cv2.line(
                    img,
                    tuple(pose[0:2, edge[0]].astype(np.int32)),
                    tuple(pose[0:2, edge[1]].astype(np.int32)),
                    (255, 255, 0),
                    4,
                    cv2.LINE_AA,
                )
        # Draw joints.
        for kpt_id in range(pose.shape[1]):
            if pose[2, kpt_id] != -1:
                cv2.circle(
                    img,
                    tuple(pose[0:2, kpt_id].astype(np.int32)),
                    3,
                    (0, 255, 255),
                    -1,
                    cv2.LINE_AA,
                )
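
To make the indexing in `draw_poses` easier to follow, the sketch below unpacks one flat 2D pose the same way. The values are made up and only two joints are used for brevity. The trailing element is the one that `draw_poses` discards with `pose[0:-1]`, assumed here to be an overall pose score.

```python
import numpy as np

# Two joints: (x, y, score) per joint, followed by one trailing value
flat_pose = np.array([120.0, 80.0, 0.9, -1.0, -1.0, -1.0, 0.7])

pose = flat_pose[:-1].reshape((-1, 3)).transpose()  # rows: x, y, score
was_found = pose[2] > 0  # a joint counts as detected when its score is positive

print(pose.shape)  # (3, 2)
print(was_found)   # [ True False]
```

A limb is drawn only when both of its end joints pass this check, which is why partially visible people appear with incomplete skeletons.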

### Main Processing Function

Run 3D pose estimation on the specified source, which can be either a webcam or a video file.
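
The function below also overlays a smoothed inference time on every frame, averaged over the most recent frames. A minimal sketch of that rolling average follows, with made-up timings and a window of 3 instead of 200. It uses `deque(maxlen=...)` as a compact alternative to the manual `popleft()` in the notebook code.

```python
import collections

# Window of 3 for the demo; the notebook keeps the last 200 frames
processing_times = collections.deque(maxlen=3)

for seconds in (0.030, 0.040, 0.050, 0.060):  # made-up per-frame timings
    processing_times.append(seconds)  # the oldest entry is dropped automatically

mean_ms = sum(processing_times) / len(processing_times) * 1000
fps = 1000 / mean_ms
print(f"Inference time: {mean_ms:.1f}ms ({fps:.1f} FPS)")
```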

In [None]:
def run_pose_estimation(source=0, flip=False, engine3D=engine3D, use_popup=False):

    base_height = 256  # default
    fx = -1  # default
    stride = 8
    if use_popup:
        # display the 3D human pose in this notebook and the original frame in a popup window
        display(engine3D.renderer)
        title = "Press ESC to Exit"
        cv2.namedWindow(
            title, cv2.WINDOW_FREERATIO
        )  # cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)
        cv2.resizeWindow(title, (1000, 1000))
    else:
        # set the 2D image box, show both human pose and image in the notebook
        imgbox = widgets.Image(format="jpg", height=300, width=400)
        display(widgets.HBox([engine3D.renderer, imgbox]))

    player = None
    line_tmp = None
    skeleton = engine.Skeleton(body_edges=body_edges)

    try:
        # create a video player that reads frames from the source (camera or video file) at the target fps
        # to fast-forward a video, change 'skip_first_frames' to skip the first N frames
        player = utils.VideoPlayer(source, flip=flip, fps=30, skip_first_frames=10)
        # start capturing
        player.start()

        processing_times = collections.deque()

        while True:
            # grab the frame
            frame = player.next()
            if frame is None:
                print("Source ended")
                break
            input_scale = base_height / frame.shape[0]

            # resize image and change dims to fit neural network input
            # (see https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/human-pose-estimation-3d-0001)
            scaled_img = cv2.resize(frame, dsize=None, fx=input_scale, fy=input_scale)

            if fx < 0:  # Focal length is unknown
                fx = np.float32(0.8 * frame.shape[1])

            # inference start
            start_time = time.time()
            # get results
            inference_result = model_infer(scaled_img, stride)
            poses_3d, poses_2d = parse_poses(
                inference_result, input_scale, stride, fx, True
            )
            # inference stop
            stop_time = time.time()
            processing_times.append(stop_time - start_time)
            # use processing times from last 200 frames
            if len(processing_times) > 200:
                processing_times.popleft()

            processing_time = np.mean(processing_times) * 1000
            fps = 1000 / processing_time
            cv2.putText(
                frame,
                f"Inference time: {processing_time:.1f}ms ({fps:.1f} FPS)",
                # f"Lib size: {sys.getsizeof(engine3D) / 1024 / 1024:.1f}MB",
                (20, 40),
                cv2.FONT_HERSHEY_COMPLEX,
                1,
                (0, 0, 255),
                1,
                cv2.LINE_AA,
            )

            if len(poses_3d) > 0:
                # From here, you can rotate the 3D point positions using the function "rotate_poses",
                # or you can directly apply the mapping below to display the pose correctly on the screen
                poses_3d_copy = poses_3d.copy()
                x = poses_3d_copy[:, 0::4]
                y = poses_3d_copy[:, 1::4]
                z = poses_3d_copy[:, 2::4]
                poses_3d[:, 0::4], poses_3d[:, 1::4], poses_3d[:, 2::4] = (
                    -z + np.ones(poses_3d[:, 2::4].shape) * 200,
                    -y + np.ones(poses_3d[:, 2::4].shape) * 100,
                    -x,
                )

                poses_3d = poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, 0:3]
                people = skeleton(poses_3d=poses_3d)

                try:
                    engine3D.scene.remove(line_tmp)
                except Exception:
                    pass

                engine3D.scene.add(people)
                line_tmp = people

                # draw 2D
                draw_poses(frame, poses_2d)

            else:
                try:
                    engine3D.scene.remove(line_tmp)
                except Exception:
                    pass

            if use_popup:
                cv2.imshow(title, frame)
                key = cv2.waitKey(1)
                # escape = 27, press ESC to exit
                if key == 27:
                    break
            else:
                # encode numpy array to jpg
                imgbox.value = cv2.imencode(
                    ".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90]
                )[1].tobytes()

            engine3D.renderer.render(engine3D.scene, engine3D.cam)

            # # set frames limit, if you need
            # frames_processed += 1
            # if video_writer.isOpened() and (frames_processed <= 0 or frames_processed <= 2000):
            #     video_writer.write(frame)

    except KeyboardInterrupt:
        print("Interrupted")
    except RuntimeError as e:
        print(e)
    finally:
        if player is not None:
            # stop capturing
            player.stop()
        if use_popup:
            cv2.destroyAllWindows()
        if line_tmp:
            engine3D.scene.remove(line_tmp)
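
The axis remapping inside the loop is the least obvious step, so here it is in isolation on a tiny made-up pose with two joints (the notebook reshapes to 19 joints). The offsets 200 and 100 are the ones used in the function above.

```python
import numpy as np

# One pose with two joints, stored flat with four values per joint
poses_3d = np.array([[10.0, 20.0, 30.0, 0.9, 1.0, 2.0, 3.0, 0.8]])

# Copy the strided slices first, because the assignments below overwrite them
x = poses_3d[:, 0::4].copy()
y = poses_3d[:, 1::4].copy()
z = poses_3d[:, 2::4].copy()

# Model axes -> scene axes, with the same offsets as in run_pose_estimation
poses_3d[:, 0::4] = -z + 200
poses_3d[:, 1::4] = -y + 100
poses_3d[:, 2::4] = -x

# Drop the fourth value of each joint: (num_poses, num_joints, 3)
points = poses_3d.reshape(poses_3d.shape[0], -1, 4)[:, :, 0:3]
print(points)
```

The copies matter: without them, `x`, `y` and `z` would be views into `poses_3d` and would change as soon as the first assignment runs.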

## Run

### Run Live Pose Estimation

Run using a webcam as the video input. By default, the primary webcam is set with `source=0`. If you have multiple webcams, each one will be assigned a consecutive number starting at 0. Set `flip=True` when using a front-facing camera. Some web browsers, especially Mozilla Firefox, may cause flickering. If you experience flickering, set `use_popup=True`.

*Note: To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a server (e.g. Binder), the webcam will not work.*

*Note: Popup mode may not work if you run this notebook on a remote computer (e.g. Binder).*

In [None]:
run_pose_estimation(source=0, flip=True, engine3D=engine3D, use_popup=False)
clear_output()

### Run Pose Estimation on a Video File

If you don't have a webcam, you can still run this demo with a video file. Any [format supported by OpenCV](https://docs.opencv.org/4.5.1/dd/d43/tutorial_py_video_display.html) will work.
You can click and drag with your mouse over the 3D view on the left to interact with it.

In [None]:
run_pose_estimation(source=video_path, flip=False, engine3D=engine3D, use_popup=False)