# Demonstrating the Pose Evaluation Repo

This notebook shows how to use the _pose evaluation_ toolkit (https://github.com/sign-language-processing/pose-evaluation).

It demonstrates:
* How to reconstruct the metrics from our paper.
* How to use them to score poses, with signatures.
* How to score poses with different lengths, missing/undetected keypoints, or different keypoint formats.


```
@misc{pose-evaluation2025,
    title={Meaningful Pose-Based Sign Language Evaluation},
    author={Zifan Jiang, Colin Leong, Amit Moryossef, Anne Göhring, Annette Rios, Oliver Cory, Maksym Ivashechkin, Neha Tarigopula, Biao Zhang, Rico Sennrich, Sarah Ebling},
    howpublished={\url{https://github.com/sign-language-processing/pose-evaluation}},
    year={2025}
}
```

## Install from source

The install step will likely ask you to restart the kernel. Do so, then skip to the imports.

In [None]:
!git clone https://github.com/sign-language-processing/pose-evaluation.git

fatal: destination path 'pose-evaluation' already exists and is not an empty directory.


In [None]:
%cd pose-evaluation

/content/pose-evaluation


In [None]:
!pip install -e .

Obtaining file:///content/pose-evaluation
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting sign_language_segmentation@ git+https://github.com/sign-language-processing/segmentation (from pose-evaluation==0.0.1)
  Cloning https://github.com/sign-language-processing/segmentation to /tmp/pip-install-1f5eu698/sign-language-segmentation_f9733d4a7bc448bdb9a89568f43eec91
  Running command git clone --filter=blob:none --quiet https://github.com/sign-language-processing/segmentation /tmp/pip-install-1f5eu698/sign-language-segmentation_f9733d4a7bc448bdb9a89568f43eec91
  Resolved https://github.com/sign-language-processing/segmentation to commit 4ac7b10b9878b6c60bbc14ba8ebe09af386f0cfe
  Installing build dependencies ... done
  Getting requirements to build wheel ...

## Imports

In [None]:
from pathlib import Path

from pose_format import Pose

from pose_evaluation.metrics.distance_measure import AggregatedPowerDistance
from pose_evaluation.metrics.distance_metric import DistanceMetric
from pose_evaluation.metrics.dtw_metric import DTWDTAIImplementationDistanceMeasure
from pose_evaluation.metrics.embedding_distance_metric import EmbeddingDistanceMetric
from pose_evaluation.metrics.pose_processors import (
    HideLegsPosesProcessor,
    NormalizePosesProcessor,
    ReduceHolisticPoseProcessor,
    ZeroPadShorterPosesProcessor,
    get_standard_pose_processors,
)
from pose_evaluation.evaluation.create_metrics import construct_metric

# DTW$p$
DTW$p$ is one of two top pose-similarity metrics from the paper.

DTWp=DTW+Trim+Default0.0+Hands-Only


(The masked fill value of 10.0 is the default, so it is not mentioned in the short name.)
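For readers new to dynamic time warping (DTW), here is a minimal pure-Python sketch of the classic algorithm on 1-D sequences. It is only an illustration of the idea, not the implementation the toolkit wraps; `dtw_distance` and the sample sequences are made up for this example.

```python
import math


def dtw_distance(a, b, dist):
    """Classic O(len(a) * len(b)) DTW: total cost of the cheapest alignment path."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def abs_diff(x, y):
    return abs(x - y)


short = [0, 1, 2, 3]
stretched = [0, 0, 1, 2, 2, 3]  # same shape, different length
shifted = [1, 2, 3, 4]

same_shape_cost = dtw_distance(short, stretched, abs_diff)
shifted_cost = dtw_distance(short, shifted, abs_diff)
print(same_shape_cost, shifted_cost)
```

Because frames may be repeated during alignment, the stretched sequence matches the short one at zero cost, which is why DTW-based metrics can compare poses of different lengths.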

## Construct metric via convenience function

In [None]:


###############################################
# Construct DTWp via convenience function, aka
# startendtrimmed_unnormalized_hands_defaultdist0.0_nointerp_dtw_fillmasked10.0_dtaiDTWAggregatedDistanceMetricFast
DTWp = construct_metric(
    distance_measure=DTWDTAIImplementationDistanceMeasure(name="dtaiDTWAggregatedDistanceMeasureFast", use_fast=True),
    default_distance=0.0,
    trim_meaningless_frames=True,
    normalize=False,
    sequence_alignment="dtw",
    keypoint_selection="hands", # keep only hand keypoints for all poses
    masked_fill_value=10.0, # fill masked values with 10.0
    fps=None, # don't interpolate fps
    name = None, # autogenerate name
    )
DTWp.name


'startendtrimmed_unnormalized_hands_defaultdist0.0_nointerp_dtw_fillmasked10.0_dtaiDTWAggregatedDistanceMetricFast'

### Print metric signature

In [None]:
DTWp.get_signature()

startendtrimmed_unnormalized_hands_defaultdist0.0_nointerp_dtw_fillmasked10.0_dtaiDTWAggregatedDistanceMetricFast|higher_is_better:no|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:0.0|aggregation_strategy:mean|use_fast:yes}
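A signature string like the one above can be taken apart with a few lines of plain Python. The sketch below assumes only the layout visible in the printed output (fields separated by `|`, with nested `[...]` and `{...}` groups); `split_top_level` is a hypothetical helper and not part of the toolkit.

```python
def split_top_level(text: str, sep: str) -> list[str]:
    """Split on `sep`, ignoring separators nested inside [] or {}."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


# Signature copied from the notebook output, with a short name for readability.
signature = (
    "DTWp|higher_is_better:no"
    "|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,"
    "fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]"
    "|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast"
    "|default_distance:0.0|aggregation_strategy:mean|use_fast:yes}"
)

name, *fields = split_top_level(signature, "|")
parsed = dict(field.split(":", 1) for field in fields)
preprocessors = split_top_level(parsed["pose_preprocessors"][1:-1], ",")

print(name)
for step in preprocessors:
    print(" ", step)
```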

## Construct the same metric from scratch

In [None]:
from pose_evaluation.metrics.pose_processors import (
    AddTOffsetsToZPoseProcessor,
    FillMaskedOrInvalidValuesPoseProcessor,
    FirstFramePadShorterPosesProcessor,
    GetHandsOnlyHolisticPoseProcessor,
    GetYoutubeASLKeypointsPoseProcessor,
    HideLegsPosesProcessor,
    InterpolateAllToSetFPSPoseProcessor,
    MaskInvalidValuesPoseProcessor,
    NormalizePosesProcessor,
    ReduceHolisticPoseProcessor,
    ReducePosesToCommonComponentsProcessor,
    RemoveWorldLandmarksProcessor,
    TrimMeaninglessFramesPoseProcessor,
    ZeroPadShorterPosesProcessor,
)

# select distance measure with default distance
measure = DTWDTAIImplementationDistanceMeasure(name="dtaiDTWAggregatedDistanceMeasureFast", use_fast=True, default_distance=0.0)

# create pose preprocessing pipeline
pose_preprocessors = []
pose_preprocessors.append(TrimMeaninglessFramesPoseProcessor())
# pose_preprocessors.append(NormalizePosesProcessor()) # this metric doesn't do normalization
pose_preprocessors.append(GetHandsOnlyHolisticPoseProcessor()) # select only the hands
pose_preprocessors.append(FillMaskedOrInvalidValuesPoseProcessor(masked_fill_value=10.0)) # fill masked values with 10.0
# pose_preprocessors.append(InterpolateAllToSetFPSPoseProcessor(fps=None)) # don't interpolate

# reduce pairs of poses to common components
pose_preprocessors.append(ReducePosesToCommonComponentsProcessor())


DTWp_from_scratch = DistanceMetric(
    distance_measure=measure,
    name="DTWp",
    pose_preprocessors=pose_preprocessors,
    )
DTWp_from_scratch.get_signature()

DTWp|higher_is_better:no|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:0.0|aggregation_strategy:mean|use_fast:yes}

## Compare signatures
Other than the _name_, the two signatures are identical.

In [None]:
print(DTWp.get_signature().format())
print("\t\t\t\t\t\t\t\t\t\t\t\t\t    "+DTWp_from_scratch.get_signature().format())

startendtrimmed_unnormalized_hands_defaultdist0.0_nointerp_dtw_fillmasked10.0_dtaiDTWAggregatedDistanceMetricFast|higher_is_better:no|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:0.0|aggregation_strategy:mean|use_fast:yes}
													    DTWp|higher_is_better:no|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:0.0|aggregation_strategy:mean|use_fast:yes}


In [None]:
DTWp_sig_without_name = DTWp.get_signature().format().replace(DTWp.name,"")
DTWp_from_scratch_without_name = DTWp_from_scratch.get_signature().format().replace(DTWp_from_scratch.name, "")
print(DTWp_sig_without_name)
print(DTWp_from_scratch_without_name)
DTWp_sig_without_name == DTWp_from_scratch_without_name

|higher_is_better:no|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:0.0|aggregation_strategy:mean|use_fast:yes}
|higher_is_better:no|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:0.0|aggregation_strategy:mean|use_fast:yes}


True
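One caveat with `str.replace` is that it removes every occurrence of the name, not just the leading one. A slightly more defensive variant, sketched below on made-up signature strings, drops only the text before the first `|`. `strip_name` is a hypothetical helper, and the sketch assumes the name itself contains no `|`.

```python
def strip_name(signature: str) -> str:
    """Drop everything before the first '|' (assumed to be the metric name)."""
    _, _, rest = signature.partition("|")
    return "|" + rest


# Toy signatures: the first metric is (unhelpfully) named "mean".
sig_a = "mean|higher_is_better:no|distance_measure:{m|aggregation_strategy:mean}"
sig_b = "other|higher_is_better:no|distance_measure:{m|aggregation_strategy:mean}"

same_by_partition = strip_name(sig_a) == strip_name(sig_b)
same_by_replace = sig_a.replace("mean", "") == sig_b.replace("other", "")
print(same_by_partition, same_by_replace)
```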

# nDTWp

aka

DTW+Default1.0+MaskFill1.0+Norm.+Hands-Only

## Convenience Function

In [None]:
###############################################
# Construct nDTWp via convenience function, aka
# DTW +Default1.0 +MaskFill1.0 +Norm. +Hands-Only
# untrimmed_normalizedbyshoulders_hands_defaultdist1.0_nointerp_dtw_fillmasked1.0_dtaiDTWAggregatedDistanceMetricFast

nDTWp = construct_metric(
    distance_measure=DTWDTAIImplementationDistanceMeasure(name="dtaiDTWAggregatedDistanceMeasureFast", use_fast=True),
    default_distance=1.0,
    trim_meaningless_frames=False,
    normalize=True,
    sequence_alignment="dtw",
    keypoint_selection="hands", # keep only hand keypoints for all poses
    masked_fill_value=1.0, # fill masked values with 1.0
    fps=None, # don't interpolate fps
    name = None, # autogenerate name
    )
nDTWp.name

'untrimmed_normalizedbyshoulders_hands_defaultdist1.0_nointerp_dtw_fillmasked1.0_dtaiDTWAggregatedDistanceMetricFast'

## From Scratch

In [None]:
from pose_evaluation.metrics.pose_processors import (
    AddTOffsetsToZPoseProcessor,
    FillMaskedOrInvalidValuesPoseProcessor,
    FirstFramePadShorterPosesProcessor,
    GetHandsOnlyHolisticPoseProcessor,
    GetYoutubeASLKeypointsPoseProcessor,
    HideLegsPosesProcessor,
    InterpolateAllToSetFPSPoseProcessor,
    MaskInvalidValuesPoseProcessor,
    NormalizePosesProcessor,
    ReduceHolisticPoseProcessor,
    ReducePosesToCommonComponentsProcessor,
    RemoveWorldLandmarksProcessor,
    TrimMeaninglessFramesPoseProcessor,
    ZeroPadShorterPosesProcessor,
)

# select distance measure with default distance
measure = DTWDTAIImplementationDistanceMeasure(name="dtaiDTWAggregatedDistanceMeasureFast", use_fast=True, default_distance=1.0)

# create pose preprocessing pipeline
pose_preprocessors = []
# pose_preprocessors.append(TrimMeaninglessFramesPoseProcessor()) # don't trim
pose_preprocessors.append(NormalizePosesProcessor()) # this metric DOES do normalization
pose_preprocessors.append(GetHandsOnlyHolisticPoseProcessor()) # select only the hands
pose_preprocessors.append(FillMaskedOrInvalidValuesPoseProcessor(masked_fill_value=1.0)) # fill masked values with 1.0
# pose_preprocessors.append(InterpolateAllToSetFPSPoseProcessor(fps=None)) # don't interpolate

# reduce pairs of poses to common components
pose_preprocessors.append(ReducePosesToCommonComponentsProcessor())


nDTWp_from_scratch = DistanceMetric(
    distance_measure=measure,
    name="nDTWp",
    pose_preprocessors=pose_preprocessors,
    )
nDTWp_from_scratch.get_signature()

nDTWp|higher_is_better:no|pose_preprocessors:[normalize_poses|scale_factor:1,get_hands_only,fill_masked_or_invalid|fill_val:1.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:1.0|aggregation_strategy:mean|use_fast:yes}

## Compare signatures
Other than the _name_, the two signatures are identical.

In [None]:
nDTWp_sig_without_name = nDTWp.get_signature().format().replace(nDTWp.name,"")
nDTWp_from_scratch_without_name = nDTWp_from_scratch.get_signature().format().replace(nDTWp_from_scratch.name, "")
print(nDTWp_sig_without_name)
print(nDTWp_from_scratch_without_name)
nDTWp_sig_without_name == nDTWp_from_scratch_without_name

|higher_is_better:no|pose_preprocessors:[normalize_poses|scale_factor:1,get_hands_only,fill_masked_or_invalid|fill_val:1.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:1.0|aggregation_strategy:mean|use_fast:yes}
|higher_is_better:no|pose_preprocessors:[normalize_poses|scale_factor:1,get_hands_only,fill_masked_or_invalid|fill_val:1.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:1.0|aggregation_strategy:mean|use_fast:yes}


True

# Demonstration

Let us load in some poses and demonstrate!

We use the _pose-format_ library (https://github.com/sign-language-processing/pose) to read the files.

In [None]:
house_1_path = Path("/content/pose-evaluation/pose_evaluation/utils/test/test_data/mediapipe/standard_landmarks/colin-1-HOUSE.pose")

### Pose with Mediapipe Landmarks
Here we load in a file with Mediapipe format landmarks. There are 576 keypoints.

In [None]:
house1_pose = Pose.read(house_1_path.read_bytes())
print(house1_pose)

Pose
PoseHeader
Version: 0.20000000298023224
PoseHeaderDimensions(width=1280, height=720, depth=0)
Bounding Box: False
Components:
PoseHeaderComponent: POSE_LANDMARKS
  Format: XYZC
  Points: ['NOSE', 'LEFT_EYE_INNER', 'LEFT_EYE', 'LEFT_EYE_OUTER', 'RIGHT_EYE_INNER', 'RIGHT_EYE', 'RIGHT_EYE_OUTER', 'LEFT_EAR', 'RIGHT_EAR', 'MOUTH_LEFT', 'MOUTH_RIGHT', 'LEFT_SHOULDER', 'RIGHT_SHOULDER', 'LEFT_ELBOW', 'RIGHT_ELBOW', 'LEFT_WRIST', 'RIGHT_WRIST', 'LEFT_PINKY', 'RIGHT_PINKY', 'LEFT_INDEX', 'RIGHT_INDEX', 'LEFT_THUMB', 'RIGHT_THUMB', 'LEFT_HIP', 'RIGHT_HIP', 'LEFT_KNEE', 'RIGHT_KNEE', 'LEFT_ANKLE', 'RIGHT_ANKLE', 'LEFT_HEEL', 'RIGHT_HEEL', 'LEFT_FOOT_INDEX', 'RIGHT_FOOT_INDEX']
  Limbs: 35
  Colors: 1

PoseHeaderComponent: FACE_LANDMARKS
  Format: XYZC
  Points: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '4

Note:
(93, 1, 576, 3)

This implies the file has data for 93 frames, 1 person, 576 keypoints, and 3 dimensions (xyz coordinates).

### Longer Pose with Refined Mediapipe Landmarks

Mediapipe has an option to "refine" landmarks. The file below was extracted with that option and contains 586 keypoints instead of 576.


In [None]:
house_2_path = Path("/content/pose-evaluation/pose_evaluation/utils/test/test_data/mediapipe/refined_landmarks/colin-HOUSE-needs-trim.pose")
house2_pose = Pose.read(house_2_path.read_bytes())
print(house2_pose)

Pose
PoseHeader
Version: 0.20000000298023224
PoseHeaderDimensions(width=1280, height=720, depth=0)
Bounding Box: False
Components:
PoseHeaderComponent: POSE_LANDMARKS
  Format: XYZC
  Points: ['NOSE', 'LEFT_EYE_INNER', 'LEFT_EYE', 'LEFT_EYE_OUTER', 'RIGHT_EYE_INNER', 'RIGHT_EYE', 'RIGHT_EYE_OUTER', 'LEFT_EAR', 'RIGHT_EAR', 'MOUTH_LEFT', 'MOUTH_RIGHT', 'LEFT_SHOULDER', 'RIGHT_SHOULDER', 'LEFT_ELBOW', 'RIGHT_ELBOW', 'LEFT_WRIST', 'RIGHT_WRIST', 'LEFT_PINKY', 'RIGHT_PINKY', 'LEFT_INDEX', 'RIGHT_INDEX', 'LEFT_THUMB', 'RIGHT_THUMB', 'LEFT_HIP', 'RIGHT_HIP', 'LEFT_KNEE', 'RIGHT_KNEE', 'LEFT_ANKLE', 'RIGHT_ANKLE', 'LEFT_HEEL', 'RIGHT_HEEL', 'LEFT_FOOT_INDEX', 'RIGHT_FOOT_INDEX']
  Limbs: 35
  Colors: 1

PoseHeaderComponent: FACE_LANDMARKS
  Format: XYZC
  Points: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '4

Note:
(172, 1, 586, 3)

This implies the file has data for 172 frames, 1 person, 586 keypoints, and 3 dimensions (xyz coordinates).

In [None]:
type(house2_pose.body.data)

numpy.ma.core.MaskedArray

### Do these have masked/missing values?

In [None]:
import numpy.ma as ma


In [None]:
ma.count_masked(house1_pose.body.data)

6489

In [None]:
frames_with_missing_values = 0
for frame_index, frame in enumerate(house1_pose.body.data):
  if ma.count_masked(frame) > 0:
    # print(f"Frame {frame_index} is missing {ma.count_masked(frame)} values")
    frames_with_missing_values += 1
print(f"There are {frames_with_missing_values} frames with missing values")

There are 54 frames with missing values


In [None]:
ma.count_masked(house2_pose.body.data)

17703

In [None]:
frames_with_missing_values = 0
for frame_index, frame in enumerate(house2_pose.body.data):
  if ma.count_masked(frame) > 0:
    # for person_index, person in enumerate(frame):
      # for keypoint_index, keypoint in enumerate(person):
        # if ma.is_masked(keypoint):
          # print(f"\tKeypoint {keypoint_index} is missing {ma.count_masked(keypoint)}")
    # print(f"Frame {frame_index} is missing {ma.count_masked(frame)} values")
    frames_with_missing_values += 1
print(f"There are {frames_with_missing_values} frames with missing values")

There are 141 frames with missing values
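The per-frame loop above can also be written as a single vectorized NumPy expression. The sketch below uses a small synthetic masked array with the same (frames, people, keypoints, dimensions) layout rather than the real pose files; `toy_data` and its mask are invented for illustration.

```python
import numpy as np
import numpy.ma as ma

# Synthetic data: 4 frames, 1 person, 3 keypoints, xyz.
shape = (4, 1, 3, 3)
mask = np.zeros(shape, dtype=bool)
mask[1, 0, 0, :] = True  # frame 1: keypoint 0 undetected
mask[3, 0, 2, :] = True  # frame 3: keypoint 2 undetected
toy_data = ma.masked_array(np.zeros(shape), mask=mask)

# A frame counts as "missing values" if any entry in it is masked.
frame_has_missing = ma.getmaskarray(toy_data).any(axis=(1, 2, 3))
frames_with_missing_values = int(frame_has_missing.sum())

print(f"Masked values: {ma.count_masked(toy_data)}")
print(f"There are {frames_with_missing_values} frames with missing values")
```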


### How to compare these?
* One is much longer than the other (172 frames vs. 93).
* One has more keypoints than the other (586 keypoints vs. 576).
* Different keypoints are missing at different times.

Previous pose metrics, e.g. DTW-MJE, may not define how to deal with these issues.

Fortunately, DTW$p$ and nDTW$p$ already have strategies defined for these and other issues!
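To make the keypoint-mismatch problem concrete, one possible strategy is to keep only the points that both poses share. The sketch below shows that idea on toy headers represented as plain dictionaries; `common_points` and the point names are hypothetical, and this is not the toolkit's own implementation.

```python
def common_points(header_a: dict, header_b: dict) -> dict:
    """Keep, per component, only the point names present in both headers."""
    shared = {}
    for component, points_a in header_a.items():
        if component not in header_b:
            continue  # component missing from the other pose entirely
        points_b = set(header_b[component])
        kept = [p for p in points_a if p in points_b]  # preserve original order
        if kept:
            shared[component] = kept
    return shared


# Toy headers: the "refined" pose has two extra face points.
standard = {
    "POSE_LANDMARKS": ["NOSE", "LEFT_WRIST"],
    "FACE_LANDMARKS": ["0", "1"],
}
refined = {
    "POSE_LANDMARKS": ["NOSE", "LEFT_WRIST"],
    "FACE_LANDMARKS": ["0", "1", "extra_a", "extra_b"],
}

shared = common_points(refined, standard)
n_shared = sum(len(points) for points in shared.values())
print(shared, n_shared)
```

After this reduction both poses have the same keypoint layout, so a frame-to-frame distance is well defined.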


#### Get Scores with DTW$p$

In [None]:
DTWp(house1_pose, house2_pose)

1991.93458720359

#### Get Scores with DTW$p$ with signatures

In [None]:
DTWp.score_with_signature(house1_pose, house2_pose)

startendtrimmed_unnormalized_hands_defaultdist0.0_nointerp_dtw_fillmasked10.0_dtaiDTWAggregatedDistanceMetricFast|higher_is_better:no|pose_preprocessors:[trim_pose|start:yes|end:yes,get_hands_only,fill_masked_or_invalid|fill_val:10.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:0.0|aggregation_strategy:mean|use_fast:yes} = 36.58

#### Preprocessing makes this possible
This works because of the preprocessing pipeline included in the metric. Let's examine the pipeline.

In [None]:
DTWp.pose_preprocessors

[trim_pose|start:yes|end:yes,
 get_hands_only,
 fill_masked_or_invalid|fill_val:10.0,
 reduce_poses_to_intersection]

DTWp applies the following preprocessors in this order:
* `trim_pose|start:yes|end:yes`: trims the starting and ending sections of the video where the hands are below the shoulders (and therefore not signing).
* `get_hands_only`: keeps only the keypoints in the hands.
* `fill_masked_or_invalid|fill_val:10.0`: fills masked or invalid values with 10.0.
* `reduce_poses_to_intersection`: reduces each pair of poses to the components they have in common.

We can also run the preprocessing separately from scoring and examine the results.

In [None]:
preprocessed_poses = DTWp.process_poses([house1_pose, house2_pose])
house1_pose_preprocessed = preprocessed_poses[0]
house2_pose_preprocessed = preprocessed_poses[1]


In [None]:
house1_pose_preprocessed.body.data.shape


(63, 1, 42, 3)

The first pose has gotten shorter (63 frames instead of 93) and now has only 42 keypoints.

In [None]:
house2_pose_preprocessed.body.data.shape

(45, 1, 42, 3)

The second pose has gotten even shorter (45 frames instead of 172), despite being much longer originally. In this video I (Colin) deliberately sat for a while not moving, hands out of view, before performing the sign. These parts of the video were removed!
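The trimming idea can be sketched in a few lines. This is a simplified illustration of "drop leading and trailing frames where the hands are below the shoulders", not the toolkit's `TrimMeaninglessFramesPoseProcessor`. It assumes image coordinates in which y grows downward, and `trim_bounds` plus the per-frame values are made up.

```python
def trim_bounds(hand_y, shoulder_y):
    """Return (start, end) so that frames[start:end] spans the first to the last
    frame in which the hand is above the shoulder (smaller y = higher)."""
    active = [i for i, (h, s) in enumerate(zip(hand_y, shoulder_y)) if h < s]
    if not active:
        return 0, len(hand_y)  # nothing looks like signing: keep everything
    return active[0], active[-1] + 1


# Toy sequence: hands resting low, raised for four frames, then lowered again.
shoulder_y = [300] * 8
hand_y = [500, 480, 290, 250, 260, 280, 450, 500]

start, end = trim_bounds(hand_y, shoulder_y)
trimmed = hand_y[start:end]
print(start, end, len(trimmed))
```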

In [None]:
ma.count_masked(house1_pose_preprocessed.body.data)

0

In [None]:
ma.count_masked(house2_pose_preprocessed.body.data)

0

Neither sequence contains masked values now!
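For intuition, filling masked values can be reproduced on a toy array with NumPy's own `MaskedArray.filled`. This only illustrates the effect; the toolkit's processor may work differently internally, and `toy` is invented data. Note that `filled` returns a plain `ndarray`.

```python
import numpy as np
import numpy.ma as ma

# Three "keypoints" with xy values; the middle one was not detected.
toy = ma.masked_array(
    np.array([[0.1, 0.2], [0.0, 0.0], [0.5, 0.6]]),
    mask=[[False, False], [True, True], [False, False]],
)

masked_before = ma.count_masked(toy)
filled = toy.filled(10.0)  # masked entries become 10.0, others are unchanged

print(masked_before)
print(filled)
```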

#### Get Scores with nDTW$p$

In [None]:
nDTWp(house1_pose, house2_pose)

4.158450855906123

In [None]:
nDTWp.score_with_signature(house1_pose, house2_pose)

untrimmed_normalizedbyshoulders_hands_defaultdist1.0_nointerp_dtw_fillmasked1.0_dtaiDTWAggregatedDistanceMetricFast|higher_is_better:no|pose_preprocessors:[normalize_poses|scale_factor:1,get_hands_only,fill_masked_or_invalid|fill_val:1.0,reduce_poses_to_intersection]|distance_measure:{dtaiDTWAggregatedDistanceMeasureFast|default_distance:1.0|aggregation_strategy:mean|use_fast:yes} = 4.16

In [None]:
nDTWp.pose_preprocessors

[normalize_poses|scale_factor:1,
 get_hands_only,
 fill_masked_or_invalid|fill_val:1.0,
 reduce_poses_to_intersection]
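The `normalizedbyshoulders` part of the metric name suggests that poses are rescaled relative to the shoulders. The sketch below shows one simple way to do this for a single 2-D frame: center on the shoulder midpoint and scale so that the shoulder width equals `scale_factor`. It is a simplified assumption about the idea, not the library's actual procedure, and `normalize_by_shoulders` and the coordinates are hypothetical.

```python
import math


def normalize_by_shoulders(points, left="LEFT_SHOULDER", right="RIGHT_SHOULDER",
                           scale_factor=1.0):
    """Center one frame on the shoulder midpoint and scale shoulder width to scale_factor."""
    (lx, ly), (rx, ry) = points[left], points[right]
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    width = math.hypot(lx - rx, ly - ry)
    scale = scale_factor / width
    return {name: ((x - cx) * scale, (y - cy) * scale)
            for name, (x, y) in points.items()}


# Toy frame in pixel coordinates.
frame = {
    "LEFT_SHOULDER": (700.0, 300.0),
    "RIGHT_SHOULDER": (500.0, 300.0),
    "RIGHT_WRIST": (500.0, 200.0),
}

normalized = normalize_by_shoulders(frame)
print(normalized)
```

After normalization, distances are expressed in shoulder widths rather than pixels, which is one reason the nDTW$p$ score above is on a much smaller scale than an unnormalized score.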