Feat/keypoints from mediapipe #1232

Merged
merged 11 commits into roboflow:develop
Jun 13, 2024

Conversation

David-rn
Contributor

Description

As described in issue #1174, this PR extends the KeyPoints class with a from_mediapipe connector.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or an example of how you tested the change.

The following Google Colab describes the process used for testing this new feature. It lets you upload a picture to test with. (The Colab upload feature only worked in Chrome for me.)

Any specific deployment considerations

  • This connector needs an additional resolution parameter, since MediaPipe results only provide normalized coordinates in the range [0.0, 1.0] (see the sketch after this list). I didn't find an equivalent of VideoInfo for images, so the parameter accepts a tuple of ints.

  • Another consideration is that MediaPipe only outputs the prediction for the most prominent person in the image. So even if there are several people in the image, the output will contain the keypoints of only one person. The implementation takes this into account.
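For illustration, here is a minimal sketch of the scaling that the resolution parameter makes possible. The helper name is hypothetical, and the exact parameter name may differ from the PR's final API:

import numpy as np

def scale_landmarks(landmarks, resolution_wh):
    # Hypothetical helper: MediaPipe landmarks expose .x and .y
    # normalized to [0.0, 1.0]; multiplying by the image size
    # yields pixel coordinates.
    width, height = resolution_wh
    return np.array([[lm.x * width, lm.y * height] for lm in landmarks])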

Docs

  • Docs updated? What were the changes:

@CLAassistant

CLAassistant commented May 27, 2024

CLA assistant check
All committers have signed the CLA.

@SkalskiP
Collaborator

Hi, @David-rn 👋🏻 Thanks a lot for submitting your PR. @LinasKo, could you take a look?

Collaborator

@LinasKo left a comment


Overall, this is a very neat addition to our codebase: easy to test, immediately visible benefit, neatly coded up.

I left a few smaller changes, and I'd like to ask for some larger ones:

  1. I'd like to verify that using this inside a PoseLandmarker context block works. If you follow the guide, you'll see PoseLandmarkerOptions being created and then with PoseLandmarker.create_from_options(options) as landmarker: being entered. Inside, KeyPoints.from_mediapipe should work.
  2. Next, and this is something I had overlooked for the longest time - MediaPipe does in fact support more than one person! To enable it, you need to set the number of poses explicitly by passing num_poses=N to PoseLandmarkerOptions.

Would you mind making sure these work and adding the tests to the Colab?
The from_mediapipe docstring doesn't need to mention these examples.
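A minimal sketch of the two requests above, assuming the MediaPipe Tasks API (the model file path is a placeholder, and the .task model would need to be downloaded beforehand; a complete test script appears later in this thread):

import mediapipe as mp
import numpy as np
import supervision as sv

options = mp.tasks.vision.PoseLandmarkerOptions(
    base_options=mp.tasks.BaseOptions(model_asset_path="pose_landmarker_lite.task"),
    running_mode=mp.tasks.vision.RunningMode.IMAGE,
    num_poses=4,  # request up to 4 people instead of the default single pose
)

rgb = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder RGB frame
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)

# The connector should work inside the context block:
with mp.tasks.vision.PoseLandmarker.create_from_options(options) as landmarker:
    result = landmarker.detect(mp_image)
    key_points = sv.KeyPoints.from_mediapipe(result, (640, 480))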

(8 resolved review threads on supervision/keypoint/core.py)
@David-rn
Contributor Author

David-rn commented Jun 2, 2024

Hi @LinasKo! Thanks for the suggestions!

  1. I'd like to verify that using this inside a PoseLandmarker context block works. If you follow the guide, you'll see PoseLandmarkerOptions being created and then with PoseLandmarker.create_from_options(options) as landmarker: being entered. Inside, KeyPoints.from_mediapipe should work.

Regarding the first comment: I discovered that mp.solutions.pose.Pose (the one I implemented) has been a legacy solution since March 1, 2023, as stated here. The confusion originated in the example Colab. The API you suggest in your comment is the new way to go; however, its result object is not the same as the legacy one.

For this reason, should we provide support for both result types, or just focus on the new one? This is closely linked to the next comment.

  2. Next, and this is something I had overlooked for the longest time - MediaPipe does in fact support more than one person! To enable it, you need to set the number of poses explicitly by passing num_poses=N to PoseLandmarkerOptions.

Regarding the second comment: indeed, it is now possible to obtain more than one pose, but only with the new implementation, whose output object is of type PoseLandmarkerResult. That object holds a list of poses, whereas the legacy result holds a different structure.
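To illustrate the difference (my reading of the two APIs; attribute names taken from the MediaPipe docs):

# Legacy Solutions API: one person per result.
#   legacy = mp.solutions.pose.Pose().process(rgb_image)
#   legacy.pose_landmarks           -> NormalizedLandmarkList or None
#   legacy.pose_landmarks.landmark  -> the landmarks of that single person
#
# New Tasks API: a list of people per result.
#   result = landmarker.detect(mp_image)  # PoseLandmarkerResult
#   result.pose_landmarks                 -> list, one entry per detected person
#   result.pose_landmarks[i]              -> the landmarks of person i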

@LinasKo
Copy link
Collaborator

LinasKo commented Jun 2, 2024

  1. How hard would it be to support both? If it's tricky to reason about or the code involved is complicated, let's stick to the new version. Even a few years back, I was already using the newer API, I believe.
  2. Aye, let's add the newer one for sure in that case.

@David-rn
Copy link
Contributor Author

David-rn commented Jun 2, 2024

I think it wasn't that hard, but let me know whether this is a nice solution:

I parsed the legacy result into a list (matching the new format), so the loop that fills the new KeyPoints object stays the same for both result types; a sketch of the idea follows below. I also updated the Google Colab to make the checking process easier.
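Roughly (the helper name is hypothetical; this is the idea, not the exact code from the PR):

def to_pose_list(results):
    # Hypothetical helper: coerce either MediaPipe result type into a
    # list of per-person landmark sequences, matching the new Tasks API.
    landmarks = results.pose_landmarks
    if landmarks is None:
        return []  # legacy API with no detection
    if isinstance(landmarks, list):
        return landmarks  # new Tasks API: already one entry per person
    return [landmarks.landmark]  # legacy API: wrap the single detected pose

The loop filling the KeyPoints object can then iterate over to_pose_list(results) regardless of which API produced the result.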

@LinasKo
Copy link
Collaborator

LinasKo commented Jun 6, 2024

Hi @David-rn,

Apologies for the delay! Good to hear it supports both cases - I'm prioritizing this for review next week.

@LinasKo
Copy link
Collaborator

LinasKo commented Jun 11, 2024

Tidied up a few small aspects of this:

  1. Used ( instead of [ to make mypy happy.
  2. Removed the legacy example. With our current plans to also add support for MediaPipe face landmarks, keeping it would bloat the docs.
  3. Brought the branch up to speed with develop.
  4. Added an inline suggestion to download the model task file. I initially intended to add a URL outside the code block, but it doesn't render nicely.
  5. Tested on a local webcam.
Code
import cv2
import mediapipe as mp
import supervision as sv

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path="pose_landmarker_lite.task"),
    running_mode=VisionRunningMode.IMAGE,
    num_poses=4,
)

cap = cv2.VideoCapture(0)
edge_annotator = sv.EdgeAnnotator(color=sv.Color.GREEN, thickness=2)

with PoseLandmarker.create_from_options(options) as landmarker:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # MediaPipe expects RGB input; OpenCV captures BGR.
        mp_image = mp.Image(
            image_format=mp.ImageFormat.SRGB,
            data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB),
        )
        pose_landmarker_result = landmarker.detect(mp_image)

        image_height, image_width, _ = frame.shape
        keypoints = sv.KeyPoints.from_mediapipe(
            pose_landmarker_result, (image_width, image_height)
        )

        annotated_image = edge_annotator.annotate(
            scene=frame.copy(),
            key_points=keypoints,
        )

        cv2.imshow("image", annotated_image)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()

I like how this looks. @SkalskiP, up for giving the code a quick glance?

@SkalskiP
Collaborator

Fantastic job, @David-rn and @LinasKo 🔥 ! Merging!

@SkalskiP SkalskiP merged commit f96e776 into roboflow:develop Jun 13, 2024
9 checks passed
Development

Successfully merging this pull request may close these issues.

[KeyPoints] - add from_mediapipe connector allowing Supervision to be used with Google MediaPipe models