ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation #24915

amyeroberts · 2023-07-19T11:14:22Z

Model description

ViTPose is used for 2D human pose estimation, a subset of the keypoint detection task (#24044).

It provides a simple baseline for vision transformer-based human pose estimation. It uses a pretrained vision transformer backbone to extract features and a simple decoder head to process them. Despite having no elaborate design, ViTPose obtained state-of-the-art (SOTA) performance of 80.9 AP on the MS COCO Keypoint test-dev set.
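
For readers new to the task, here is a rough sketch of what post-processing the decoder head's output could look like. It assumes the head emits one heatmap per keypoint and that heatmaps are 4x smaller than the input image; both are assumptions for illustration, and `decode_keypoints` and `stride` are hypothetical names, not part of the ViTPose code or of Transformers.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # one H x W score map for a single keypoint


def decode_keypoints(
    heatmaps: List[Heatmap], stride: int = 4
) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) per keypoint by taking each heatmap's argmax.

    `stride` is the assumed ratio between input-image and heatmap resolution.
    """
    keypoints = []
    for heatmap in heatmaps:
        best_score = float("-inf")
        best_x = best_y = 0
        for y, row in enumerate(heatmap):
            for x, score in enumerate(row):
                if score > best_score:
                    best_score, best_x, best_y = score, x, y
        # Map heatmap coordinates back to input-image pixels.
        keypoints.append((best_x * stride, best_y * stride, best_score))
    return keypoints


# Toy data: two 3x4 heatmaps standing in for decoder-head output.
toy_heatmaps = [
    [[0.0, 0.1, 0.0, 0.0],
     [0.0, 0.9, 0.2, 0.0],
     [0.0, 0.1, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.3],
     [0.0, 0.0, 0.0, 0.7]],
]
keypoints = decode_keypoints(toy_heatmaps)
print(keypoints)
```

A plain argmax keeps the sketch short; real pipelines usually add sub-pixel refinement and map coordinates back through the person crop, which is left out here.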

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

Code and weights: https://github.com/ViTAE-Transformer/ViTPose
Paper: https://arxiv.org/abs/2204.12484

@Annbless

ydshieh · 2023-07-20T14:10:13Z

Glad you get something different to work on 🚀 👀 🎉

shauray8 · 2023-07-20T17:33:36Z

Hi, @amyeroberts, I don't know if you are working on this but if not I would be more than happy to take it up.

ydshieh · 2023-07-20T17:57:22Z

Oh, this is the issue page, not the PR page!

amyeroberts · 2023-07-21T12:11:58Z

@shauray8 You're very welcome to take this up! :)

This model presents a new task for the library, so there might be some iterations and discussions on what the inputs and outputs should look like. The model translation should be fairly straightforward though, so I'd suggest starting with a PR that implements that and then on the PR we can figure out what works best.

amyeroberts added the New model label Jul 19, 2023

shauray8 mentioned this issue Jul 21, 2023: [WIP]Add ViTPose to Transformers #25001 (Closed, 4 tasks)

amyeroberts added the Vision label Dec 18, 2023

amyeroberts mentioned this issue May 8, 2024: Add ViTPose #30530 (Open, 1 task)
