This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Doubts #60

Open
manansaxena opened this issue Jul 20, 2019 · 4 comments

Comments

@manansaxena

manansaxena commented Jul 20, 2019

For in-the-wild videos, I had some doubts.

1) All the predictions are in camera space, right? How can I get them in a real-world coordinate system?

2) What is the location of the root joint, given that all the predictions are made with it as the origin?

3) I have the intrinsic and extrinsic camera parameters for my in-the-wild video. Where should I input them so they contribute to the result, or how should I pre-process my video with these parameters?

4) I see that the root joint is always fixed, and you said to regress it for movement. Could you please explain in a little more detail what you mean?

5) What should be the default FPS of the video? I wanted to get results at 200 FPS; is that possible?
Thanks

@dariopavllo
Contributor

  1. Yes, all predictions are in camera space. This is the only option, as the model has no knowledge of the world coordinate system, and in Human3.6M you have multiple cameras with different world coordinates. To predict in world coordinates, you should transform the points after predicting them (see the first sketch after this list).

  2. The root joint is at the origin (0, 0, 0). If you want to predict the position of the root joint you have to use the separate trajectory model, although its position may be imprecise.

  3. You can use the extrinsic parameters to transform all poses to camera space. As for the intrinsic parameters, you can use them to undistort the video (especially if the distortion is significant), although this is not strictly necessary; see the second sketch after this list.

  4. To achieve the best precision, it's important to disentangle pose and global position, so we predict the pose relative to the root (hip) joint in our main model and use a separate, optional model for the trajectory.

  5. The Human3.6M model was trained on 50 FPS video, but it should be somewhat robust to different frame rates. Provided that you have sufficient computing power, you can use any frame rate. Our model is very fast, but the bottleneck may lie in the 2D keypoint detector.
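
For point 1, a minimal NumPy sketch (mine, not from this repo; the array shapes and the world-to-camera extrinsics convention are assumptions) of transforming camera-space predictions to world coordinates:

```python
import numpy as np

def cam_to_world(poses_cam, R_w2c, t_w2c):
    """Map camera-space predictions back to world coordinates.

    poses_cam: (T, J, 3) per-frame joint positions in camera space.
    R_w2c (3, 3), t_w2c (3,): extrinsics, assumed to satisfy
        X_cam = R_w2c @ X_world + t_w2c.
    """
    R_c2w = R_w2c.T                     # a rotation's inverse is its transpose
    t_c2w = -R_c2w @ t_w2c              # invert the translation as well
    return poses_cam @ R_c2w.T + t_c2w  # broadcasts over frames and joints
```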
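
For the optional undistortion in point 3, a hedged OpenCV sketch; the function name and generator structure are illustrative, not part of this repo, and `K` and `dist` would come from your own calibration:

```python
import cv2

def undistorted_frames(video_path, K, dist):
    """Yield undistorted frames to feed into the 2D keypoint detector.

    K: (3, 3) intrinsic matrix; dist: distortion coefficients,
    both obtained from your own camera calibration.
    """
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield cv2.undistort(frame, K, dist)
    cap.release()
```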

@manansaxena
Author

Hi,
Thanks a lot for the response.
Could you please explain in more detail how to use the trajectory model for in-the-wild videos?
And have you made the trained model public?
Thanks

@dariopavllo
Contributor

The trajectory model predicts the position of the root joint and uses the same architecture as the pose model (with 1 output joint instead of 17). As we state in the paper, the reason we use two separate models is that they tend to affect each other negatively when trained in a multi-task fashion.

We currently don't have a public pretrained model for the trajectory, but you can train it yourself by running semi-supervised training.
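
If it helps, composing the two outputs is just an addition; the names and shapes below are my assumptions, not the repo's API:

```python
import numpy as np

# Placeholders standing in for real model outputs:
pose_rel = np.zeros((100, 17, 3))  # root-relative joints from the pose model
traj = np.zeros((100, 1, 3))       # root (hip) positions from the trajectory model

# Broadcast the predicted root position onto every joint to get global poses.
global_pose = pose_rel + traj      # shape (100, 17, 3)
```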

@Walter0807

Walter0807 commented Nov 1, 2019

This is the only model to do it as the model has no knowledge of the real coordinate system

I think the 2D keypoints correspond to pixel space, so does this mean we can't align the 2D keypoints with the 3D predictions?
