Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pose detection api #39

Merged
merged 7 commits into from
Sep 28, 2023
Merged

Pose detection api #39

merged 7 commits into from
Sep 28, 2023

Conversation

ziyuan-linn
Copy link
Member

Hi everyone, this PR updates the Bodypose (poseDetection) API. The changes here are very similar to the changes made to the handpose API.

  • add detect(), detectStart(), and detectStop() function
  • remove EventEmitter interface
  • add support for p5 preload to ml5.handpose() function (global mode only for now)
  • modify output to expose keypoints by name
  • renamed the model from "poseDetection" to "bodypose" for now

TO-DO:

  • add more examples such as getting a few keypoints by name, detecting a single image, etc

Copy link
Member

@shiffman shiffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I'm good for this to merge, I left one small comment about the example that does need to be addressed now. Other examples to make (which can be done in a new branch)

  • Example for single pose from image
  • Example for parts by keyword

Other general comments / questions:

  • With the PoseNet model there was also a "skeleton" which if my memory is right was a data structure that indicated which body parts are connected. This doesn't need to be part of the ml5 function, but might be a nice example?
  • Should we "future proof" this and build in an optional "string" argument to indicate what model to use. Right now it would only be "BlazePose", is that correct? It might be nice also from a pedagogical standpoint to emphasize there is a specific pre-trained model being used and not a "magic" algorithm.

I'm happy for this to be merged and any of these questions can be moved to separate issues for tracking.

for (let j = 0; j < pose.keypoints.length; j++) {
let keypoint = pose.keypoints[j];
// Only draw a circle if the keypoint's confidence is bigger than 0.2
if (keypoint.score > 0.2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useful to show the possibility but I have a few questions about it:

  • The example is simpler to look at without it, should we include it?
  • If we include it, maybe a lower threshold is fine? (0.1?)
  • Is score the term from the model itself? Another option is confidence. I don't have a preference I just seem to remember we're using confidence for the image classifiers (teachable machine, etc) so maybe we should be consistent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included the check to make the example output look less confusing since MoveNet will try to fit all the landmark points inside the video even though a body part is clearly out of frame, but I would also be happy to remove it for simplicity.
(The top image is without the confidence check, and the bottom image has the check).
Screenshot 2023-09-01 141524
Screenshot 2023-09-01 141702

If we decide to include it I think 0.1 would be a good idea.

I believe score is the term used for tf.js landmark models. I do agree that it should be consistent across different features. Personally, I think confidence is a little more intuitive, but I also don't have a strong preference.

Please let me know your thoughts!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, in that case I think we should keep the if statement, it's confusing to see all the extra points that are out of frame!

I always hesitate to change the name of something from the original source model, but I think we could make the decision to use "confidence" across the board instead of "score". Any thoughts @MOQN @sproutleaf @QuinnHe @gohai? I'm fine either way!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my personal opinion, "confidence" will make more sense to beginners and different naming conventions may easily create confusion. The "scores" are confidence scores in the end 😉.

@ziyuan-linn
Copy link
Member Author

Hi @shiffman, thank you for the review! I think the single pose, parts by keyword, and skeleton examples are great ideas. I would be happy to work on those after merging.

The current model used is MoveNet. It may be a little tricky to implement BlazePose since some of its option arguments are different from MoveNet. Personally, I think the best way to incorporate the BlazePose model would be through the existing "modelType" option. Right now it can be set to "SINGLEPOSE_LIGHTNING", "SINGLEPOSE_THUNDER", and "MULTIPOSE_THUNDER". Maybe it can be set to "movenet_singlepose_lighting", "movenet_singlepose_thunder", "movenet_singlepose_thunder", "blazepose_lite", and "blazepose_full" in the future? We can also rename "modelType" to "model" if it makes more sense. We can give an explanation for each option on the website. I'm willing to look into this further. Please let me know what you think.

@shiffman
Copy link
Member

shiffman commented Sep 1, 2023

Thanks for the detailed explanation! Apologies, I got things backwards (MoveNet vs. BlazePose). What I was imagining was the following:

let bodypose = ml5.bodyPose();            // Default MoveNet
let bodypose = ml5.bodyPose("MoveNet");   // Explicitly using MoveNet
let bodypose = ml5.bodyPose("BlazePose"); // Explicitly using BlazePose (if and when we implement it or other models).

But I see now that there is more nuance here and lots of different versions of the model. What is the default one if none is specified: "MULTIPOSE_THUNDER"? I might suggest implementing something like what I suggested above but the options argument is also available to be even more specific? But maybe that overcomplicates things.

I don't think we need to solve this now. Since it's only MoveNet we can just default to that and let people change the "specific" model with options.

@ziyuan-linn
Copy link
Member Author

Ah, sorry I misunderstood your suggestion earlier. I do think it would be a good idea to have an extra string argument to bodypose() to differentiate between MoveNet and BlazePose. And Yes, the current default model is "MULTIPOSE_THUNDER". I think this can probably stay as is until we get around to implementing Blazepose. The modification should be simple since we are using the handleArguments() function to parse the parameters.

An alternative solution on top of my head is to create a new ml5 model: ml5.bodypose() and ml5.bodypose3D()(?). This might be easier to document on the website and cause less confusion since MoveNet and BlazePose 's config settings are more different than similar, and BlazePose alone has two different runtime, tf.js or MediaPipe.

@ziyuan-linn
Copy link
Member Author

@shiffman I think this branch is ready to merge. Please let me know if there is any issue that needs immediate addressing. I will make updates (add more examples, change "score" to "confidence", add BlazePose...) in other branches. :)

@shiffman
Copy link
Member

Yes, this is great! I'll merge and we can discuss and implement next steps in other branches!

@shiffman shiffman merged commit 2602846 into main Sep 28, 2023
@gohai gohai deleted the pose-detection-api branch October 2, 2023 11:12
@MOQN
Copy link
Member

MOQN commented Oct 7, 2023

@ziyuan-linn Thank you for all of your amazing work! Quinn and I are planning to explore the newly implemented models/features for the website. Let me reach out via email!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants