
Demo of body 3d pose in space #3

Closed

gb2111 opened this issue Jul 16, 2021 · 12 comments

Comments


gb2111 commented Jul 16, 2021

Hello.
I was looking at 05_wrist_rom, where I can see the hand in 3D being properly positioned along all three axes, that is left, right, and depth, with the default parameters.
In the new MediaPipe there is also a position in meters, but the origin is at the hips. Could you consider making a demo of the body pose located in space, the same way you now have for the hand?
Thank you for the great examples.

@guanming001 (Collaborator)

Hi @gb2111, thank you for your interest in this project.

I only tried to convert the hand joints to 3D space with reference to the camera coordinate frame by making some simplifying assumptions, such as that the distance from the wrist to the index finger MCP is around 8 cm and the distance from the hand to the camera is around 0.6 m (details can be found in the convert_joint_to_camera_coor function).

However, when I tried to apply similar assumptions to the human body, the results were not as good, perhaps due to the greater diversity in human body dimensions; also, the distance from the body to a single camera view is much harder to estimate due to depth ambiguity.
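
To make the idea concrete, here is a minimal sketch of that kind of assumption-based conversion in Python (the function name and exact scaling are illustrative, not the repo's actual convert_joint_to_camera_coor; only the 8 cm reference length and 0.6 m nominal distance come from the comment above):

import numpy as np

def joints_to_camera_coor(joints, ref_idx=(0, 5),
                          ref_len_m=0.08, nominal_depth_m=0.6):
    # joints: (21, 3) MediaPipe hand landmarks in relative units
    # (wrist = index 0, index finger MCP = index 5)
    joints = np.asarray(joints, dtype=float)
    # Length of the reference segment in the landmarks' own units
    rel_len = np.linalg.norm(joints[ref_idx[0]] - joints[ref_idx[1]])
    # Scale so the reference segment measures its assumed metric length
    joints_m = joints * (ref_len_m / rel_len)
    # Push the whole hand to an assumed distance in front of the camera
    joints_m[:, 2] += nominal_depth_m
    return joints_m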


gb2111 commented Jul 18, 2021

But I have seen that you had some kind of depth estimation, and you must have made some updates.
In the README of this repo, the 08_skeleton_3D gif puts the skeleton in some perspective, while now when I run the demo it sits at the camera origin. I think 01_video was also in perspective before, and now the middle of the hips is simply at the camera origin.

Second question: do you somehow normalize the skeleton so that it always keeps the same size?
Thanks.

@guanming001 (Collaborator)

Hi @gb2111

I made some updates to try to estimate the body joints in camera coordinates (details can be found in the convert_body_joint_to_camera_coor function).

Below is the result when tested on a video clip from The Greatest Showman, inspired by the work of VIBE: Video Inference for Human Body Pose and Shape Estimation [CVPR 2020].
When the person is closer to the camera, his 3D joints also appear closer. Take note that it is still not very stable or accurate, as the actual camera intrinsics are unknown and only 4 keypoints are used to estimate the 3D pose. If you have any suggestions/improvements, feel free to make a pull request.

[GIF: estimated 3D body pose tracking the person in the clip]

P.S. For the 08_skeleton_3D gif, I think the hip was hardcoded and fixed at some distance in front of the camera.
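
Since the real intrinsics are unknown, one common fallback (an assumption on my part, not necessarily what the repo does) is to approximate the camera matrix from the image size alone:

import numpy as np

def approx_camera_matrix(img_width, img_height):
    # Rough pinhole model: focal length ~ image width (in pixels),
    # principal point at the image center, zero skew
    f = float(img_width)
    return np.array([[f, 0.0, img_width / 2.0],
                     [0.0, f, img_height / 2.0],
                     [0.0, 0.0, 1.0]])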


gb2111 commented Jul 18, 2021

Yes, actually, I have an idea :)
With each frame, we have

  1. the 3D model (fixed size, not scaled), and
  2. the 2D projection of the model onto the image.

This is perfect input for SolvePnP from OpenCV. What do you think? This function would address the situation where a person leans forward, which seems to produce jitter in your gif.
Unfortunately, I don't have the Python skills to use it; I know it mostly from C#.
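
For reference, a minimal self-contained sketch of the call in Python (the four coplanar model points and matching pixels are synthetic, chosen so the recovered translation comes out near [0, 0, 1] m; in the real demo they would be skeleton joints and their detected pixel positions):

import cv2
import numpy as np

# A 20 cm square (meters) and its projection through a pinhole camera
# with focal length 640 px and center (320, 240), placed at z = 1 m
points_model = np.array([[-0.1, -0.1, 0.0], [ 0.1, -0.1, 0.0],
                         [ 0.1,  0.1, 0.0], [-0.1,  0.1, 0.0]])
image_points = np.array([[256.0, 176.0], [384.0, 176.0],
                         [384.0, 304.0], [256.0, 304.0]])
camera_matrix = np.array([[640.0,   0.0, 320.0],
                          [  0.0, 640.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(4)  # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(points_model, image_points,
                              camera_matrix, dist_coeffs)
print(tvec.ravel())  # ~ [0, 0, 1]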

Also, can you tell me whether or not you scale the model? If not, how do you make it a fixed size?

Edit: I tested it, and at first look it works very well. I will take a closer look tomorrow. Can we add more points, like the knees, to the estimation to improve accuracy?

I hope you can take a look at SolvePnP as well ;)
Best regards.


guanming001 commented Jul 19, 2021

I have added two options to the convert_body_joint_to_camera_coor function; you can give them a try:

  • scale_body tries to keep the body dimensions fixed (e.g. hip width, arm length, leg length); a sketch of the idea follows this list
  • use_solvepnp works, but sometimes the depth of the joints may be estimated wrongly (i.e. the translation in the z direction is negative)
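
A rough illustration of the fixed-dimension idea behind scale_body (the segment indices follow MediaPipe's pose landmark numbering, but the target lengths are made-up placeholders, not the repo's actual values):

import numpy as np

# Illustrative (landmark index pair -> assumed true length in meters);
# MediaPipe pose: 11/12 = shoulders, 23/24 = hips
REF_SEGMENTS = {(11, 12): 0.38, (23, 24): 0.26}

def scale_body(joints):
    # Uniformly rescale joints so the reference segments match
    # their assumed metric lengths (averaged over all segments)
    joints = np.asarray(joints, dtype=float)
    ratios = [length / np.linalg.norm(joints[i] - joints[j])
              for (i, j), length in REF_SEGMENTS.items()]
    return joints * np.mean(ratios)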


gb2111 commented Jul 19, 2021

Thank you for adding this! I'd say they both work very well. I can see you are using all 33 landmarks for SolvePnP. I am surprised it works :) I can't tell which one works better, apart from the issue you mentioned about the SolvePnP z translation being negative.
I am not sure why you added scale_body, since you already take the joints from pose_world_landmarks, which are already unified?

@guanming001 (Collaborator)

The scale_body option was just to test whether a fixed-size body model works, but in any case the latest version of MediaPipe (0.8.6) already offers quite a useful estimate of the real-world 3D body joints, so the scale_body option may not be necessary.
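
For context, those metric-scale joints come straight from the MediaPipe Python API; a minimal sketch of reading them (the input file name is hypothetical):

import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose()
image = cv2.imread('person.jpg')  # hypothetical input image
results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_world_landmarks:
    for lm in results.pose_world_landmarks.landmark:
        # x, y, z are in meters, with the origin between the hips
        print(lm.x, lm.y, lm.z)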


gb2111 commented Jul 19, 2021

Thanks for the clarification.
Honestly, I think it works quite well.
Again, I appreciate that you made these changes.
Edit: If you find anything about the PnP function and the negative direction, I hope you will update the repo. If I find anything, I will let you know :)

@gb2111 gb2111 closed this as completed Jul 19, 2021
@gb2111 gb2111 reopened this Jul 19, 2021
@gb2111 gb2111 closed this as completed Jul 19, 2021

gb2111 commented Jul 19, 2021

@guanming001
I wonder if you could make one more attempt, namely using the same 4 points with SolvePnP. It seems to me that the legs, especially when kneeling, have a bad impact on the estimate.
You need to hold rvec and tvec in global variables, always pass them to the function as arguments, and read them back from the result.
I am using something similar in C#, and it allows using 4 points when you set useExtrinsicGuess and pass the previous rotation and translation vectors:

# Feed the previous frame's solution back in as the initial guess
(_, rotation_vector, translation_vector) = cv2.solvePnP(
    points_model,          # (N, 3) model joints
    image_points,          # (N, 2) matching pixel coordinates
    self.camera_matrix,
    self.dist_coeffs,
    rvec=self.r_vec,       # previous rotation as the starting point
    tvec=self.t_vec,       # previous translation as the starting point
    useExtrinsicGuess=True)
# Remember the solution for the next frame
self.r_vec = rotation_vector
self.t_vec = translation_vector

@gb2111 gb2111 reopened this Jul 19, 2021

gb2111 commented Jul 20, 2021

Ok. I improved my Python skills, added class variables for the rotation and translation, and use them as described above. I initialize the translation vector as [0,0,1] so the estimated pose is always in a good location. Now I can use SolvePnP either with 4 landmarks or with all of them.
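
A minimal sketch of that initialization (attribute names match the snippet above; the class name is hypothetical and the shapes are what cv2.solvePnP expects):

import numpy as np

class BodySolver:  # hypothetical wrapper class
    def __init__(self):
        # Previous solution, fed back via useExtrinsicGuess=True
        self.r_vec = np.zeros((3, 1))
        # Start 1 m in front of the camera so the first guess is sensible
        self.t_vec = np.array([[0.0], [0.0], [1.0]])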

When using 4 points, I needed to remove the multiplication by rmat = cv2.Rodrigues(rvec)[0], which doesn't seem to be required, since we already get a very good rotation from MediaPipe and here we only need the position. So in the end I removed it entirely.

So now we have, in theory, 3 methods. I cannot tell which one is better. Ideally it would be good to compare them one by one on the same video, always with the same camera view. If you have a code snippet for initializing the camera with a view, please share it.

Thanks again for adding pnp to your repo!

@guanming001 (Collaborator)

Thank you for your suggestions!

By initializing rvec and tvec and enabling useExtrinsicGuess, solvePnP can use 4 landmarks, and the issue of the negative z translation is gone.

You can initialize the camera view by simply pressing the 'r' key, which will reset to the default camera view.

If you want a different view, you can change the value of the identity matrix in self.pinhole.extrinsic.

Or you can replace the reset_view function with the code below:

def reset_view(self):
    # Set camera view
    ctr = self.vis.get_view_control()
    ctr.set_up([0,-1,0])     # set up direction as the -y axis
    ctr.set_front([1,-1,-1]) # set front (viewing) direction
    ctr.set_lookat([0,0,3])  # look at a point 3 m in front of the camera
    ctr.set_zoom(0.5)        # zoom out slightly


gb2111 commented Jul 20, 2021

I will give it a try.

Thank you.

@gb2111 gb2111 closed this as completed Jul 22, 2021