
How does MediaPipe calculate the Z world coordinate? #5261

Open
linahourieh opened this issue Mar 25, 2024 · 2 comments
Assignees
Labels
stat:awaiting googler — Waiting for Google Engineer's Response
task:hand landmarker — Issues related to hand landmarker: Identify and track hands and fingers
type:support — General questions

Comments

@linahourieh

Description of issue (what needs changing)

Can you please specify how MediaPipe computes the Z coordinates in the hand detection module?

Clear description

This is essential information for developers who want to use MediaPipe to build medical applications.

Correct links

No response

Parameters defined

No response

Returns defined

No response

Raises listed and defined

No response

Usage example

No response

Request visuals, if applicable

No response

Submit a pull request?

No response

@kuaashish kuaashish assigned kuaashish and unassigned ayushgdev Mar 26, 2024
@kuaashish kuaashish added task:hand landmarker Issues related to hand landmarker: Identify and track hands and fingers type:support General questions labels Mar 26, 2024
@kuaashish
Collaborator

kuaashish commented Mar 27, 2024

Hi @linahourieh,

The hand model utilizes a method called "scaled orthographic projection" or weak perspective, maintaining a constant average depth (Z avg).

Weak-perspective projection combines orthographic projection with scaling to simulate perspective projection, assuming uniform distance from the camera for all points on a 3D object.

We opt for weak perspective because it closely approximates perspective in many scenarios, particularly when the average variation in object depth (delta Z) along the line of sight is small compared to the fixed average depth (Z avg). This approach prevents distant objects from distorting due to perspective, instead uniformly scaling them up or down.
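To make the difference concrete, here is a minimal sketch (illustrative only, not MediaPipe source code) comparing full perspective projection against weak-perspective projection. The function names, the focal length `f`, and the sample points are all assumptions for demonstration; the point is that when the depth variation is small relative to the average depth Z avg, the two projections nearly coincide.

```python
# Illustrative sketch: full perspective vs. weak-perspective projection.
# Names (f, z_avg, project_*) are hypothetical, not MediaPipe API.

def project_perspective(x, y, z, f=1.0):
    """Full perspective: each point is divided by its own depth z."""
    return f * x / z, f * y / z

def project_weak(x, y, z_avg, f=1.0):
    """Weak perspective: orthographic projection followed by a single
    uniform scale f / z_avg shared by every point on the object."""
    return f * x / z_avg, f * y / z_avg

# A "hand" about 0.5 m from the camera with ~2 cm of depth variation
# (delta Z small compared to Z avg):
z_avg = 0.5
points = [(0.03, 0.02, 0.49), (0.05, -0.01, 0.51)]
for x, y, z in points:
    px, py = project_perspective(x, y, z)
    wx, wy = project_weak(x, y, z_avg)
    # The discrepancy stays on the order of (z - z_avg) / z_avg,
    # i.e. a few percent here, which is why weak perspective is a
    # good approximation in this regime.
    print(f"perspective=({px:.4f}, {py:.4f})  weak=({wx:.4f}, {wy:.4f})")
```

Under this approximation, all landmarks on the hand are scaled by the same factor, so nearby parts of the hand are not warped relative to distant ones.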

The model predicts a relative depth (z) based on the Z avg of typical hand depth, such as when holding a phone with one hand while the other is tracked, or when both hands are near the phone. The z range is unlimited but scales proportionally with x and y dimensions through weak projection, and shares the same units as x and y.

A primary landmark point (wrist) serves as the reference for other landmark depths, normalized via weak projection with respect to x and y coordinates.
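The wrist-relative convention above can be sketched as follows. This is a hypothetical illustration of the idea, not MediaPipe's implementation: the wrist is landmark index 0 (as in MediaPipe's hand landmark ordering), each landmark's z is offset so the wrist sits at depth 0, and z remains in the same units as x and y.

```python
# Hypothetical sketch of wrist-relative depth (not MediaPipe source).

WRIST = 0  # MediaPipe hand landmark index 0 is the wrist.

def wrist_relative_z(landmarks):
    """Given landmarks as (x, y, z) tuples, return the same points with
    z expressed relative to the wrist: the wrist gets z = 0, points in
    front of the wrist get negative or positive offsets in the same
    units as x and y."""
    wrist_z = landmarks[WRIST][2]
    return [(x, y, z - wrist_z) for x, y, z in landmarks]

# Example: two landmarks, the second 0.05 units deeper than the wrist.
hand = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.35)]
print(wrist_relative_z(hand))
```

Because z is only defined relative to this reference point, it tells you how far a landmark is in front of or behind the wrist, not its absolute distance from the camera.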

For a deeper understanding, we recommend reviewing this thread.

Thank you!

@kuaashish kuaashish added the stat:awaiting response Waiting for user response label Mar 27, 2024
@linahourieh
Author

Thank you for the feedback!

So, as far as I understand, the Z-axis origin is the wrist, and a world landmark's Z is an estimate based on scaled orthographic projection.

If I am interested in tracking the trajectory of a landmark (e.g., a fingertip) moving through space, would I then need a camera that measures the third dimension?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Waiting for user response label Mar 28, 2024
@kuaashish kuaashish assigned yichunk and unassigned kuaashish Apr 8, 2024
@kuaashish kuaashish added the stat:awaiting googler Waiting for Google Engineer's Response label Apr 8, 2024