
the estimation result on 3DPW using ResNet50 backbone #33

Closed
tinatiansjz opened this issue Sep 1, 2021 · 4 comments

Comments

@tinatiansjz

Hi, I'm curious about the quantitative performance of METRO with a ResNet50 backbone on the 3DPW dataset, since the official repo doesn't provide pre-trained models with ResNet50. I'd be grateful for any advice.

@tinatiansjz
Author

Also, am I right that the joints used for evaluation are the ones predicted directly by the network, rather than the ones regressed from the mesh using the pre-defined regression matrix?

@kevinlin311tw
Member

kevinlin311tw commented Sep 6, 2021

Q&A1: Unfortunately we didn't try ResNet50 backbone on 3DPW dataset. In our early experiments, we were using Human3.6M for the ablation study of the use of different backbones. We found HRNet gives better results, so we use HRNet for the rest of the experiments.

Q&A2: Yes. We mainly evaluate the 3D joints which are regressed from the 3D mesh via the pre-defined regression matrix. This is because we want to understand the 3D pose of the estimated 3D mesh. In fact, in our early explorations, we have tried to evaluate the 3D joints which are directly predicted by the network. It actually gives very similar results (about ~0.1 mPJPE improvement).

@kevinlin311tw
Member

Just want to add more comments about Q2.

Since the 3D joints we use are computed from the 3D mesh, you may wonder what happens if we use the transformer to predict only 3D vertices. In our early explorations we did try predicting vertices only, but the training did not converge well.

To make training converge, we found that we need both joint queries and vertex queries, and that we need to train the transformer to predict both joints and vertices. We think this is probably because this setup lets the self-attention mechanism directly learn non-local interactions between joints and vertices, which leads to further improvements. A rough illustration is sketched below.
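As a rough sketch (not the actual METRO implementation; the token counts, layer sizes, and module choices below are assumptions), "both joint and vertex queries" means the transformer takes one token per joint and one token per vertex, so self-attention spans both sets, and separate heads read off 3D joints and 3D vertices:

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 14 joints, a coarse mesh of a few hundred
# vertices, and a 512-d token feature. These are assumptions for the sketch.
num_joints, num_vertices, feat_dim = 14, 431, 512

joint_queries = torch.rand(2, num_joints, feat_dim)     # (batch, J, D)
vertex_queries = torch.rand(2, num_vertices, feat_dim)  # (batch, V, D)

# Concatenate so self-attention can model joint-vertex interactions directly.
tokens = torch.cat([joint_queries, vertex_queries], dim=1)  # (batch, J+V, D)

encoder_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
out = encoder(tokens)  # (batch, J+V, D)

# Separate linear heads predict 3D coordinates from their own tokens.
joint_head = nn.Linear(feat_dim, 3)
vertex_head = nn.Linear(feat_dim, 3)
pred_joints = joint_head(out[:, :num_joints])     # (batch, J, 3)
pred_vertices = vertex_head(out[:, num_joints:])  # (batch, V, 3)
```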

@tinatiansjz
Author

Thank you for your clear explanation! My confusion has been dispelled :)
