Feat/keypoints from mediapipe #1232
Conversation
Overall, this is a very neat addition to our codebase. Easy to test, immediately visible benefit, neatly coded up.

I left a few smaller changes, and I'd like to ask for some larger ones:

- I'd like to verify that using this inside a `PoseLandmarker` context block works. If you follow the guide, you'll see `PoseLandmarkerOptions` being created and then `with PoseLandmarker.create_from_options(options) as landmarker:` called. Inside, `Detections.from_mediapipe` should work.
- Next, and that's something I've overlooked for the longest time - MediaPipe does in fact support more than one person! To do it, you need to set the number of poses explicitly by passing `num_poses=N` to `PoseLandmarkerOptions`.

Would you mind making sure these work and adding the test to the Colab?

The docstring of `from_mediapipe` doesn't need to mention these examples.
Hi @LinasKo! Thanks for the suggestions!
Regarding the first comment, I discovered that the use of … For this reason, should we provide support for both types, or just focus on the new result? This is quite linked to the next comment.

Regarding the second comment: indeed, it is now possible to obtain more than one pose, but only with the new implementation, whose output object is of type …

I think it wasn't that hard, but let me know if it is a nice solution: I parsed the legacy result into a list (same as the new one), so the loop that fills the new `KeyPoints` object can remain the same for both results. I also updated the Google Colab to make the checking process easier.
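The parsing approach described above can be sketched roughly as follows. This is an illustrative sketch, not supervision's actual implementation: the helper name is hypothetical, and `SimpleNamespace` stand-ins replace the real MediaPipe result objects (the legacy Pose solution exposes a single landmark container with a `.landmark` attribute, while the Tasks API `PoseLandmarkerResult` exposes a list of per-person landmark lists):

```python
from types import SimpleNamespace

def pose_landmarks_as_list(result):
    # Hypothetical helper: normalize both MediaPipe result shapes into a
    # list of per-person landmark lists, so one loop can handle either.
    landmarks = getattr(result, "pose_landmarks", None)
    if landmarks is None:
        return []                      # nothing detected
    if isinstance(landmarks, list):
        return landmarks               # Tasks API: already one list per person
    return [landmarks.landmark]        # legacy: wrap the single pose in a list

# Stand-ins for the two result shapes (mediapipe itself is not imported here)
legacy = SimpleNamespace(pose_landmarks=SimpleNamespace(landmark=["lm0", "lm1"]))
tasks = SimpleNamespace(pose_landmarks=[["lm0", "lm1"], ["lm0", "lm1"]])

print(len(pose_landmarks_as_list(legacy)))  # 1 pose
print(len(pose_landmarks_as_list(tasks)))   # 2 poses
```

With this normalization in place, the downstream loop that builds the `KeyPoints` object only ever sees a list of poses, regardless of which API produced the result.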
Hi @David-rn, apologies for the delay! Good to hear it supports both cases - I'm prioritizing reviewing this next week.
Tidied up a few small aspects of this:

```python
import mediapipe as mp  # noqa
import cv2
import numpy as np
import supervision as sv

BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path="pose_landmarker_lite.task"),
    running_mode=VisionRunningMode.IMAGE,
    num_poses=4,
)

cap = cv2.VideoCapture(0)

with PoseLandmarker.create_from_options(options) as landmarker:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        mp_image = mp.Image(
            image_format=mp.ImageFormat.SRGB,
            data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB),
        )
        pose_landmarker_result = landmarker.detect(mp_image)
        image_height, image_width, _ = frame.shape
        keypoints = sv.KeyPoints.from_mediapipe(
            pose_landmarker_result, (image_width, image_height)
        )
        edge_annotator = sv.EdgeAnnotator(color=sv.Color.GREEN, thickness=2)
        annotated_image = edge_annotator.annotate(
            scene=frame.copy(),
            key_points=keypoints,
        )
        cv2.imshow("image", annotated_image)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
```

I like how this looks. @SkalskiP, up for giving the code a quick glance?
Description

As described in issue #1174, this PR extends the `KeyPoints` class with a `from_mediapipe` connector.

Type of change
How has this change been tested, please provide a testcase or example of how you tested the change?
The following Google Colab describes the process used for testing this new feature. It allows uploading a picture to test it. (This Colab feature only worked in Chrome for me.)
Any specific deployment considerations

This connector needs an additional parameter, `resolution`, since the result from MediaPipe only provides normalized coordinates in [0.0, 1.0]. I didn't find an equivalent of `VideoInfo` for images, so the parameter accepts a tuple of ints.

Another consideration is that MediaPipe only outputs the prediction of the most prominent person in the image. So, even if there are several people in the image, the output will only contain the keypoints of one person. The implementation takes this into account.
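For context, the denormalization that such a `resolution` parameter enables can be sketched like this. It is an illustrative helper, not supervision's actual implementation; the function name and parameter names are hypothetical:

```python
def denormalize_keypoints(normalized_xy, resolution_wh):
    # Scale normalized [0.0, 1.0] (x, y) pairs to pixel coordinates.
    # `resolution_wh` mirrors the `resolution` parameter described above:
    # a (width, height) tuple of ints.
    width, height = resolution_wh
    return [(x * width, y * height) for x, y in normalized_xy]

pts = denormalize_keypoints([(0.5, 0.5), (0.25, 1.0)], (640, 480))
print(pts)  # [(320.0, 240.0), (160.0, 480.0)]
```

Passing the resolution explicitly keeps the connector independent of any image-loading backend, at the cost of the caller having to supply `(width, height)` themselves.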
Docs