
How does MediaPipe calculate the Z world coordinate? #5261

Open
linahourieh opened this issue Mar 25, 2024 · 2 comments
Assignees
Labels
stat:awaiting googler — Waiting for Google Engineer's Response
task:hand landmarker — Issues related to hand landmarker: Identify and track hands and fingers
type:support — General questions

Comments

@linahourieh

Description of issue (what needs changing)

Can you please specify how MediaPipe computes the Z coordinates in the hand detection module?

Clear description

This is essential information for developers who want to use MediaPipe to build medical applications.

Correct links

No response

Parameters defined

No response

Returns defined

No response

Raises listed and defined

No response

Usage example

No response

Request visuals, if applicable

No response

Submit a pull request?

No response

@kuaashish kuaashish assigned kuaashish and unassigned ayushgdev Mar 26, 2024
@kuaashish kuaashish added task:hand landmarker Issues related to hand landmarker: Identify and track hands and fingers type:support General questions labels Mar 26, 2024
@kuaashish
Collaborator

kuaashish commented Mar 27, 2024

Hi @linahourieh,

The hand model utilizes a method called "scaled orthographic projection" or weak perspective, maintaining a constant average depth (Z avg).

Weak-perspective projection combines orthographic projection with scaling to simulate perspective projection, assuming uniform distance from the camera for all points on a 3D object.

We opt for weak perspective because it closely approximates perspective in many scenarios, particularly when the average variation in object depth (delta Z) along the line of sight is small compared to the fixed average depth (Z avg). This approach prevents distant objects from distorting due to perspective, instead uniformly scaling them up or down.
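To make the difference concrete, here is a minimal sketch (illustrative only, not MediaPipe source code) comparing full perspective projection against weak-perspective projection. The function names, the focal length `f`, and the sample points are all assumptions for demonstration; the point is that when the depth variation is small relative to the average depth Z avg, the two projections nearly coincide.

```python
# Illustrative sketch: full perspective vs. weak-perspective projection.
# Names (f, z_avg, project_*) are hypothetical, not MediaPipe API.

def project_perspective(x, y, z, f=1.0):
    """Full perspective: each point is divided by its own depth z."""
    return f * x / z, f * y / z

def project_weak(x, y, z_avg, f=1.0):
    """Weak perspective: orthographic projection followed by a single
    uniform scale f / z_avg shared by every point on the object."""
    return f * x / z_avg, f * y / z_avg

# A "hand" about 0.5 m from the camera with ~2 cm of depth variation
# (delta Z small compared to Z avg):
z_avg = 0.5
points = [(0.03, 0.02, 0.49), (0.05, -0.01, 0.51)]
for x, y, z in points:
    px, py = project_perspective(x, y, z)
    wx, wy = project_weak(x, y, z_avg)
    # The discrepancy stays on the order of (z - z_avg) / z_avg,
    # i.e. a few percent here, which is why weak perspective is a
    # good approximation in this regime.
    print(f"perspective=({px:.4f}, {py:.4f})  weak=({wx:.4f}, {wy:.4f})")
```

Under this approximation, all landmarks on the hand are scaled by the same factor, so nearby parts of the hand are not warped relative to distant ones.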

The model predicts a relative depth (z) based on the Z avg of typical hand depth, such as when holding a phone with one hand while the other is tracked, or when both hands are near the phone. The z range is unlimited but scales proportionally with x and y dimensions through weak projection, and shares the same units as x and y.

A primary landmark point (wrist) serves as the reference for other landmark depths, normalized via weak projection with respect to x and y coordinates.
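The wrist-relative convention above can be sketched as follows. This is a hypothetical illustration of the idea, not MediaPipe's implementation: the wrist is landmark index 0 (as in MediaPipe's hand landmark ordering), each landmark's z is offset so the wrist sits at depth 0, and z remains in the same units as x and y.

```python
# Hypothetical sketch of wrist-relative depth (not MediaPipe source).

WRIST = 0  # MediaPipe hand landmark index 0 is the wrist.

def wrist_relative_z(landmarks):
    """Given landmarks as (x, y, z) tuples, return the same points with
    z expressed relative to the wrist: the wrist gets z = 0, points in
    front of the wrist get negative or positive offsets in the same
    units as x and y."""
    wrist_z = landmarks[WRIST][2]
    return [(x, y, z - wrist_z) for x, y, z in landmarks]

# Example: two landmarks, the second 0.05 units deeper than the wrist.
hand = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.35)]
print(wrist_relative_z(hand))
```

Because z is only defined relative to this reference point, it tells you how far a landmark is in front of or behind the wrist, not its absolute distance from the camera.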

For a deeper understanding, we recommend reviewing this thread.

Thank you!

@kuaashish kuaashish added the stat:awaiting response Waiting for user response label Mar 27, 2024
@linahourieh
Author

Thank you for the feedback!

So, as far as I understand, the Z-axis origin is the wrist, and a world landmark's Z is an estimate based on scaled orthographic projection.

If I am interested in tracking the trajectory of a landmark (e.g., a fingertip) moving through space, would I then need a camera that measures the third dimension?

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Waiting for user response label Mar 28, 2024
@kuaashish kuaashish assigned yichunk and unassigned kuaashish Apr 8, 2024
@kuaashish kuaashish added the stat:awaiting googler Waiting for Google Engineer's Response label Apr 8, 2024