![Degirum banner](https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/degirum_banner.png)
## Hand Tracking and Palm Keypoint Detection Sample

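This notebook shows how to detect and track hands in a video and then detect palm keypoints (landmarks) on the tracked hand.
The combined hand-detection and palm-keypoint results are used to emulate mouse input: the index fingertip serves as the pointer position, and pinching the thumb against the index and middle fingertips registers a click.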
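The click rule applied in the inference loop can be isolated as a pure function. The sketch below mirrors that rule: a click is reported when the mean distance from the thumb tip to the index and middle fingertips falls below 0.2 times the sum of the index-MCP-to-wrist and middle-MCP-to-wrist distances, so the threshold scales with the apparent hand size. The function name `is_click`, the plain-dict input, and the sample coordinates are hypothetical; only the 0.2 factor and the landmark labels come from the notebook code.

```python
import numpy as np


def is_click(coords, ratio=0.2):
    """Hypothetical helper: True when the thumb tip is pinched against
    the index and middle fingertips.

    coords maps landmark labels to 2-D or 3-D pixel coordinates.
    """
    p = {name: np.asarray(xy, dtype=float) for name, xy in coords.items()}
    # distances from the thumb tip to the two other fingertips
    d1 = np.linalg.norm(p["ThumbTip"] - p["IndexFingerTip"])
    d2 = np.linalg.norm(p["ThumbTip"] - p["MiddleFingerTip"])
    # hand-size reference: sum of the two MCP-to-wrist distances
    scale = np.linalg.norm(p["IndexFingerMCP"] - p["Wrist"]) + np.linalg.norm(
        p["MiddleFingerMCP"] - p["Wrist"]
    )
    return bool((d1 + d2) * 0.5 < ratio * scale)


# made-up coordinates of an open hand
open_hand = {
    "Wrist": (0, 0),
    "IndexFingerMCP": (0, 100),
    "MiddleFingerMCP": (20, 100),
    "ThumbTip": (-60, 60),
    "IndexFingerTip": (0, 200),
    "MiddleFingerTip": (20, 210),
}
# same hand with the thumb tip brought next to the index and middle fingertips
pinched = dict(
    open_hand, ThumbTip=(5, 195), IndexFingerTip=(0, 200), MiddleFingerTip=(20, 200)
)
print(is_click(open_hand))  # False
print(is_click(pinched))    # True
```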

This notebook works with the following inference options:

1. Run inference on the DeGirum Cloud Platform;
2. Run inference on a DeGirum AI Server deployed on localhost or on another computer in your LAN or VPN;
3. Run inference on a DeGirum ORCA accelerator installed directly in your computer.

To switch between these options, set the `hw_location` variable in the configuration cell below to the corresponding value.
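The configuration comments further down list three kinds of `hw_location` values: `"@cloud"`, an AI server address, and `"@local"`. The hypothetical helper below only classifies such a string to make that mapping explicit; it does not contact any server. Treating every value other than `"@cloud"` and `"@local"` as an AI server host (for example `"localhost"` or the made-up LAN address `"192.168.0.10"`) is an assumption.

```python
def inference_option(hw_location: str) -> str:
    """Hypothetical helper: map an hw_location string to one of the three
    inference options (pure string check, no network access)."""
    if hw_location == "@cloud":
        return "DeGirum Cloud Platform"
    if hw_location == "@local":
        return "local ORCA accelerator"
    if not hw_location:
        raise ValueError("hw_location must not be empty")
    # assumption: anything else is an AI server host name or IP address
    return f"AI server at {hw_location}"


for value in ["@cloud", "localhost", "192.168.0.10", "@local"]:
    print(f"{value!r:16} -> {inference_option(value)}")
```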

When running this notebook locally, specify your cloud API access token in the [env.ini](../../env.ini) file.

When running this notebook in Google Colab, the cloud API access token should be stored in a user secret named `DEGIRUM_CLOUD_TOKEN`.
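In the notebook itself, `degirum_tools.get_token()` takes care of locating the token. To illustrate the idea only, here is a hypothetical sketch that reads a `DEGIRUM_CLOUD_TOKEN = <value>` line from an `env.ini`-style file and falls back to an environment variable of the same name. The file layout, the fallback order, and the function name `read_cloud_token` are assumptions, not a description of how `get_token()` is implemented; the Colab user-secret path is not covered.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_cloud_token(env_file="env.ini", key="DEGIRUM_CLOUD_TOKEN") -> Optional[str]:
    """Hypothetical sketch: look up `key` in a simple KEY = value file,
    then fall back to the process environment."""
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # skip blank lines, comments, and INI section headers
            if not line or line.startswith(("#", ";", "[")):
                continue
            name, sep, value = line.partition("=")
            if sep and name.strip() == key:
                # strip optional surrounding quotes; empty value -> None
                return value.strip().strip('"').strip("'") or None
    return os.environ.get(key)


# usage: write a throwaway env.ini with a placeholder token and read it back
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "env.ini"
    demo.write_text('# cloud API access token\nDEGIRUM_CLOUD_TOKEN = "my-token"\n')
    demo_token = read_cloud_token(demo)
print(demo_token)  # my-token
```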

In [None]:
# make sure degirum-tools package and other dependencies are installed
!pip show degirum-tools || pip install degirum-tools

#### Specify where you want to run your inferences, the model zoo URL, and the video source

In [None]:
# hw_location: where you want to run inference
#     "@cloud" to use DeGirum cloud
#     "@local" to run on local machine
#     IP address for AI server inference
# model_zoo_url: URL or path of the model zoo
#     cloud zoo URL: valid for @cloud, @local, and AI server inference options
#     '': AI server serving models from its local folder
#     path to JSON file: single-model zoo in case of @local inference
# video_source: video source for inference
#     camera index for local camera
#     URL of RTSP stream
#     URL of YouTube video
#     path to video file (mp4, etc.)
hw_location = "@cloud"
model_zoo_url = "degirum/public"
video_source = "https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/HandPalm.mp4"

#### The rest of the cells below should run without any modifications
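The main cell below keeps only the largest tracked hand (`ObjectSelector` with `top_k=1` and `LARGEST_AREA`) and lets `CroppingAndDetectingCompoundModel` crop around it with `crop_extent=30.0` before running the palm landmark model. To make the geometry concrete, here is a plain-NumPy sketch of both steps. It assumes `[x1, y1, x2, y2]` pixel boxes and reads `crop_extent` as the percentage by which the box width and height are enlarged around the box center, clipped to the image; those conventions, the sample boxes, and the function names are assumptions for illustration, not the `degirum_tools` internals.

```python
import numpy as np


def select_largest(boxes, top_k=1):
    """Hypothetical: return the top_k boxes with the largest area.
    boxes: N x 4 array-like of [x1, y1, x2, y2]."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(areas)[::-1]  # indices sorted by descending area
    return boxes[order[:top_k]]


def expand_box(box, extent_percent, img_w, img_h):
    """Hypothetical: grow a box by extent_percent of its width and height
    around its center, then clip it to the image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    scale = 1 + extent_percent / 100
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return np.array([
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    ])


hands = [[10, 10, 60, 60], [100, 50, 300, 250]]  # made-up detections
largest = select_largest(hands)[0]
print(largest)                              # values 100, 50, 300, 250
print(expand_box(largest, 30.0, 640, 480))  # values 70, 20, 330, 280
```

Cropping a margin around the hand gives the landmark model some context beyond the tight detection box, which helps when fingertips extend slightly past it.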

In [None]:
import degirum as dg, degirum_tools
import cv2, numpy as np

# load hand bbox detection model
hand_model = dg.load_model(
    model_name="yolo_v5s_hand_det--512x512_quant_n2x_orca1_1",
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token=degirum_tools.get_token(),
    overlay_show_probabilities=False,
    overlay_show_labels=False,
    overlay_line_width=1,
)

# load palm landmarks detection model
palm_model = dg.load_model(
    model_name="mobilenet_v2_hand_landmarks--224x224_float_n2x_orca1_1",
    inference_host_address=hw_location,
    zoo_url=model_zoo_url,
    token=degirum_tools.get_token(),
    overlay_show_probabilities=False,
    overlay_show_labels=False,
    overlay_line_width=1,
)

# create object tracker
tracker = degirum_tools.ObjectTracker(
    track_thresh=0.35,
    track_buffer=100,
    match_thresh=0.9999,
    anchor_point=degirum_tools.AnchorPoint.CENTER,
)

# create object selector to track only one hand
selector = degirum_tools.ObjectSelector(
    top_k=1,
    use_tracking=True,
    selection_strategy=degirum_tools.ObjectSelectionStrategies.LARGEST_AREA,
)

# attach object tracker and object selector analyzers to hand model
degirum_tools.attach_analyzers(hand_model, [tracker, selector])

# create compound model for hand detection and palm landmarks detection
model = degirum_tools.CroppingAndDetectingCompoundModel(
    hand_model,
    palm_model,
    crop_extent=30.0,
)

# define palm landmarks to be filtered
palm_landmarks = [
    "ThumbTip",
    "IndexFingerTip",
    "IndexFingerMCP",
    "MiddleFingerTip",
    "MiddleFingerMCP",
    "Wrist",
]

# create low-pass filters for palm landmarks
lpf = {
    pt: degirum_tools.FIRFilterLP(normalized_cutoff=0.1, taps_cnt=7, dimension=3)
    for pt in palm_landmarks
}

# return the coordinates of the landmark with the given label, or None if absent
def landmark(landmarks, label):
    return next(
        (lm.get("landmark") for lm in landmarks if lm.get("label", "") == label),
        None,
    )


# inference loop
with degirum_tools.Display("AI Camera") as display:
    for inference_result in degirum_tools.predict_stream(model, video_source):
        image = inference_result.image_overlay
        for r in inference_result.results:
            if "landmarks" not in r:
                continue

            # look up the landmarks of interest; skip this hand if any is missing
            raw = {pt: landmark(r["landmarks"], pt) for pt in palm_landmarks}
            if any(v is None for v in raw.values()):
                continue

            # low-pass filter landmark coordinates to suppress frame-to-frame jitter
            coords = {pt: lpf[pt](raw[pt]).astype(int) for pt in palm_landmarks}

            # detect proximity of thumb tip to index and middle finger tips;
            # the threshold scales with hand size (MCP-to-wrist distances)
            d1 = np.linalg.norm(coords["ThumbTip"] - coords["IndexFingerTip"])
            d2 = np.linalg.norm(coords["ThumbTip"] - coords["MiddleFingerTip"])
            thr = 0.2 * (
                np.linalg.norm(coords["IndexFingerMCP"] - coords["Wrist"])
                + np.linalg.norm(coords["MiddleFingerMCP"] - coords["Wrist"])
            )
            clicked = (d1 + d2) * 0.5 < thr

            # draw position of index finger tip (as a tuple of Python ints for OpenCV)
            # and click status
            pos = tuple(int(v) for v in coords["IndexFingerTip"][:2])
            cv2.circle(image, pos, 7, (0, 0, 255))
            if clicked:
                degirum_tools.put_text(
                    image,
                    "CLICK!",
                    (pos[0] + 10, pos[1] + 10),
                    font_color=(0, 0, 255),
                )

        display.show(image)
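The `FIRFilterLP` objects smooth each landmark over time so that the pointer position and the click decision do not jitter from frame to frame. The sketch below illustrates the general technique with a windowed-sinc FIR low-pass filter (Hamming window) applied sample by sample to 3-D points, using the same parameter values as the notebook (`normalized_cutoff=0.1`, `taps_cnt=7`, `dimension=3`). The class name `FIRLowPass`, the design method, the window choice, treating the cutoff as relative to the Nyquist frequency, and pre-filling the history with the first sample are all assumptions; `degirum_tools.FIRFilterLP` may be implemented differently.

```python
from collections import deque

import numpy as np


class FIRLowPass:
    """Hypothetical streaming windowed-sinc FIR low-pass filter for D-dimensional samples."""

    def __init__(self, normalized_cutoff=0.1, taps_cnt=7, dimension=3):
        n = np.arange(taps_cnt) - (taps_cnt - 1) / 2
        # ideal low-pass impulse response (cutoff relative to Nyquist), Hamming-windowed
        taps = normalized_cutoff * np.sinc(normalized_cutoff * n) * np.hamming(taps_cnt)
        self.taps = taps / taps.sum()  # unity DC gain: a constant input passes unchanged
        self.dimension = dimension
        self.history = deque(maxlen=taps_cnt)

    def __call__(self, sample):
        sample = np.asarray(sample, dtype=float).reshape(self.dimension)
        if not self.history:
            # assumption: pre-fill with the first sample to avoid a ramp up from zero
            self.history.extend([sample] * self.history.maxlen)
        else:
            self.history.append(sample)
        # weighted sum of the most recent taps_cnt samples (taps are symmetric)
        return np.tensordot(self.taps, np.stack(list(self.history)), axes=1)


# a steady point passes through unchanged
demo_filter = FIRLowPass()
steady = [demo_filter([100, 200, 0]) for _ in range(10)]
print(steady[-1])

# a noisy point (made-up Gaussian jitter) comes out with a smaller spread
noisy_filter = FIRLowPass()
rng = np.random.default_rng(0)
noisy = np.array([100.0, 200.0, 0.0]) + rng.normal(0, 5, size=(50, 3))
smoothed = np.array([noisy_filter(p) for p in noisy])
print(noisy.std(axis=0), smoothed[10:].std(axis=0))
```

Because the taps are symmetric, this filter has linear phase with a delay of (7 - 1) / 2 = 3 frames; a lower cutoff or more taps would smooth more at the cost of a laggier pointer.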