This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Doubts #60

Open
manansaxena opened this issue Jul 20, 2019 · 4 comments

Comments

@manansaxena

manansaxena commented Jul 20, 2019

For in-the-wild videos, I had some doubts.

1) All the predictions are in camera space, right? How can I get them in a real-world coordinate system?

2) What is the location of the root joint, given that all the predictions are made with it as the origin?

3) I have the intrinsic and extrinsic camera parameters for my in-the-wild video. Where should I input them so they contribute to the result, or how should I pre-process my video with these parameters?

4) I see that the root joint is always fixed, and you said to regress it for movement. Could you please explain in a little more detail what you mean?

5) What should be the default FPS of the video? I wanted to get results at 200 FPS; is that possible?
Thanks

@dariopavllo
Contributor

  1. Yes, all predictions are in camera space. This is the only option, as the model has no knowledge of the world coordinate system, and in Human3.6M you have multiple cameras with different world coordinates. To predict in world coordinates, you should transform the points after predicting them (see the first sketch after this list).

  2. The root joint is at the origin (0, 0, 0). If you want to predict the position of the root joint you have to use the separate trajectory model, although its position may be imprecise.

  3. You can use the extrinsic parameters to transform all poses to camera space. As for the intrinsic parameters, you can use them to undistort the video (especially if the distortion is significant), although this is not strictly necessary; see the second sketch after this list.

  4. To achieve the best precision, it's important to disentangle pose and global position, so we predict the pose relative to the root (hip) joint in our main model and use a separate, optional model for the trajectory.

  5. The Human3.6M model was trained on 50 FPS video, but it should be somewhat robust to different frame rates. Provided that you have sufficient computing power, you can use any frame rate. Our model is very fast, but the bottleneck may lie in the 2D keypoint detector.
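
For point 1, a minimal NumPy sketch (mine, not from this repo; the array shapes and the world-to-camera extrinsics convention are assumptions) of transforming camera-space predictions to world coordinates:

```python
import numpy as np

def cam_to_world(poses_cam, R_w2c, t_w2c):
    """Map camera-space predictions back to world coordinates.

    poses_cam: (T, J, 3) per-frame joint positions in camera space.
    R_w2c (3, 3), t_w2c (3,): extrinsics, assumed to satisfy
        X_cam = R_w2c @ X_world + t_w2c.
    """
    R_c2w = R_w2c.T                     # a rotation's inverse is its transpose
    t_c2w = -R_c2w @ t_w2c              # invert the translation as well
    return poses_cam @ R_c2w.T + t_c2w  # broadcasts over frames and joints
```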
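
For the optional undistortion in point 3, a hedged OpenCV sketch; the function name and generator structure are illustrative, not part of this repo, and `K` and `dist` would come from your own calibration:

```python
import cv2

def undistorted_frames(video_path, K, dist):
    """Yield undistorted frames to feed into the 2D keypoint detector.

    K: (3, 3) intrinsic matrix; dist: distortion coefficients,
    both obtained from your own camera calibration.
    """
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield cv2.undistort(frame, K, dist)
    cap.release()
```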

@manansaxena
Author

Hi,
Thanks a lot for the response.
Could you please explain in more detail how to use the trajectory model for in-the-wild videos?
And have you made the trained model public?
Thanks

@dariopavllo
Contributor

The trajectory model predicts the position of the root joint and uses the same architecture as the pose model (with 1 output joint instead of 17). As we state in the paper, the reason we use two separate models is that they tend to affect each other negatively when trained in a multi-task fashion.

We currently don't have a public pretrained model for the trajectory, but you can train it yourself by running semi-supervised training.
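
If it helps, composing the two outputs is just an addition; the names and shapes below are my assumptions, not the repo's API:

```python
import numpy as np

# Placeholders standing in for real model outputs:
pose_rel = np.zeros((100, 17, 3))  # root-relative joints from the pose model
traj = np.zeros((100, 1, 3))       # root (hip) positions from the trajectory model

# Broadcast the predicted root position onto every joint to get global poses.
global_pose = pose_rel + traj      # shape (100, 17, 3)
```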

@Walter0807

Walter0807 commented Nov 1, 2019

This is the only model to do it as the model has no knowledge of the real coordinate system

I think the 2D keypoints correspond to pixel space, so does this mean we can't align the 2D keypoints with the 3D predictions?
