Add ViTPose #30530

NielsRogge · 2024-04-28T20:01:37Z

What does this PR do?

This PR adds ViTPose as introduced in ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation.

Here's a demo notebook - note that the API might change: https://colab.research.google.com/drive/15_3gjcC0wtKSH85k76zewt81eUJIEWWA?usp=sharing.

To do:

get rid of cv2 dependency (?)

SangbumChoi · 2024-04-30T01:48:07Z

@NielsRogge Hi Niels! Does this PR currently work properly? (I'd like to run some tests on it.)

amyeroberts · 2024-05-08T19:26:27Z

Thanks for working on this @NielsRogge!

I can see there are a few bits unfinished, e.g. tests. Is there a particular bit of code you'd like me to look at from a maintainer's perspective?

For the issue description, there's a related model request here: #24915

amyeroberts · 2024-05-14T08:38:04Z

@NielsRogge I'm unsubscribing atm, so that I don't get notifications on every new push. You just need to ping me again with my username when it's ready for review and I'll get notified

NielsRogge · 2024-05-16T06:23:54Z

It would be great to have a first round of review as the PR is in a ready state. @amyeroberts

SangbumChoi · 2024-05-16T13:23:03Z

src/transformers/models/vitpose/convert_vitpose_to_hf.py

+
+
+name_to_path = {
+    "vitpose-base-simple": "/Users/nielsrogge/Documents/ViTPose/vitpose-b-simple.pth",


Suggested change

"vitpose-base-simple": "/Users/nielsrogge/Documents/ViTPose/vitpose-b-simple.pth",

"vitpose-base-simple": "https:\/\/4mjpca.sn.files.1drv.com\/y4mip6jbupeZ3YzICoNJYUb6yGEheWXkicKj0tvp1Sfq8BztlH8ieD63z2ZRYiTBzvDxKXFqd_wa5m8NHnBsURmpClZySMSJjS3hxrU2bFArawJ5mAVZsni4LmsfWs_K1dnIzDumXXuanSopYKm0O-Bx5z4JerIfGoE6riAtY_ni5_paFl46jGTE82U8J10Cm3gxHv2DSfOkrgV7SkmUKvnjg\/vitpose-b-simple.pth?download&psid=1",

import requests


def download_file(url, local_filename):
    # Sending requests with stream=True allows downloading large files
    with requests.get(url, stream=True) as response:
        response.raise_for_status()  # Raise an exception if the request fails
        with open(local_filename, 'wb') as file:
            # Efficiently download large files by iterating over content in chunks
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
    return local_filename


# Given URL and the filename to save
url = "https://4mjpca.sn.files.1drv.com/y4mip6jbupeZ3YzICoNJYUb6yGEheWXkicKj0tvp1Sfq8BztlH8ieD63z2ZRYiTBzvDxKXFqd_wa5m8NHnBsURmpClZySMSJjS3hxrU2bFArawJ5mAVZsni4LmsfWs_K1dnIzDumXXuanSopYKm0O-Bx5z4JerIfGoE6riAtY_ni5_paFl46jGTE82U8J10Cm3gxHv2DSfOkrgV7SkmUKvnjg/vitpose-b-simple.pth?download&psid=1"
local_filename = "vitpose-b-simple.pth"

# Execute file download
download_file(url, local_filename)
print(f"File downloaded as {local_filename}")

I think the conversion script could use weights hosted in the cloud like this, instead of a local path!
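A minimal sketch of how a conversion script could accept either a local path or a hosted checkpoint URL as a value in `name_to_path`. The helper name `resolve_checkpoint` and its `cache_dir` parameter are hypothetical and not part of the PR; only the standard library is used, and an already downloaded file is reused rather than fetched again.

```python
import os
import urllib.request
from urllib.parse import urlparse


def resolve_checkpoint(source, cache_dir="."):
    """Return a local file path for `source` (hypothetical helper, not from the PR).

    If `source` is an http(s) URL, download it into `cache_dir` unless a file
    with the same name already exists there. Any other value is treated as a
    local path and returned unchanged.
    """
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https"):
        return source

    # Use the last path component of the URL (query string excluded) as file name
    filename = os.path.basename(parsed.path)
    local_path = os.path.join(cache_dir, filename)
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(source, local_path)
    return local_path
```

The conversion entry point could then call `resolve_checkpoint(name_to_path[model_name])` before loading the state dict, so local and remote checkpoints go through the same code path.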

amyeroberts

Thanks for adding this model!

I've only done a first high-level pass. Normally I'd ask for the backbone to be added in a separate PR, but as the modeling files are relatively small, I think it's OK.

Main comments are about the image processing: post-processing should take and return torch tensors; there should be more tests to make sure the pre- and post-processing work on batched inputs and that outputs are as expected, particularly for the custom transforms; and the cv2 logic should be removed.
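To illustrate the first two points, here is a rough sketch (not the PR's implementation) of a post-processing step that takes and returns torch tensors, together with a check that batched results match per-sample results. The function name `heatmaps_to_keypoints` and the `(batch, keypoints, height, width)` heatmap layout are assumptions; the real `post_process_pose_estimation` likely does more, such as mapping coordinates back to the original image, which is left out here.

```python
import torch


def heatmaps_to_keypoints(heatmaps: torch.Tensor):
    """Take the argmax of each heatmap (hypothetical helper).

    heatmaps: tensor of shape (batch, num_keypoints, height, width).
    Returns keypoints of shape (batch, num_keypoints, 2) as (x, y) in heatmap
    coordinates, and scores of shape (batch, num_keypoints).
    """
    batch_size, num_keypoints, _, width = heatmaps.shape
    flat = heatmaps.reshape(batch_size, num_keypoints, -1)
    scores, indices = flat.max(dim=-1)

    # Convert flat indices back to 2D coordinates
    xs = (indices % width).to(heatmaps.dtype)
    ys = torch.div(indices, width, rounding_mode="floor").to(heatmaps.dtype)
    keypoints = torch.stack([xs, ys], dim=-1)
    return keypoints, scores


# Batched output should equal the per-sample output stacked together.
# 17 keypoints and a 64x48 heatmap are illustrative values only.
torch.manual_seed(0)
heatmaps = torch.rand(2, 17, 64, 48)
batched_keypoints, batched_scores = heatmaps_to_keypoints(heatmaps)
for i in range(heatmaps.shape[0]):
    single_keypoints, single_scores = heatmaps_to_keypoints(heatmaps[i : i + 1])
    assert torch.equal(batched_keypoints[i], single_keypoints[0])
    assert torch.equal(batched_scores[i], single_scores[0])
```

A test in this style catches the common failure mode where a transform silently assumes a batch size of one.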

amyeroberts · 2024-05-21T17:49:12Z

src/transformers/models/vitpose/image_processing_vitpose.py

+if is_cv2_available():
+    # TODO get rid of cv2?
+    import cv2
+


amyeroberts · 2024-05-21T17:54:02Z

src/transformers/models/vitpose/image_processing_vitpose.py

+        cv2_image = (
+            image
+            if input_data_format == ChannelDimension.LAST
+            else to_channel_dimension_format(image, ChannelDimension.LAST, input_data_format)
+        )
+        image = cv2.warpAffine(cv2_image, transformation, size, flags=cv2.INTER_LINEAR)


All cv2 logic should be removed
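One possible cv2-free replacement for the `cv2.warpAffine` call above, sketched in plain NumPy. It assumes a channels-last image, a 2x3 forward (source to destination) matrix, and `size` given as `(width, height)`, mirroring the cv2 call; pixels sampled outside the source image are treated as zero. The function name is hypothetical, the output is not guaranteed to match cv2 bit for bit, and this is not necessarily the approach the PR ends up taking.

```python
import numpy as np


def warp_affine_bilinear(image, matrix, size):
    """Apply a 2x3 affine transform with bilinear interpolation (sketch).

    image: array of shape (height, width, channels).
    matrix: 2x3 forward transform mapping source (x, y) to destination (x, y).
    size: (out_width, out_height) of the output image.
    """
    out_width, out_height = size
    height, width, channels = image.shape

    # Invert the forward transform so each output pixel can look up its source
    forward = np.vstack([np.asarray(matrix, dtype=np.float64), [0.0, 0.0, 1.0]])
    inverse = np.linalg.inv(forward)[:2]

    ys, xs = np.mgrid[0:out_height, 0:out_width]
    destination = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src_x, src_y = inverse @ destination

    x0 = np.floor(src_x).astype(np.int64)
    y0 = np.floor(src_y).astype(np.int64)
    wx = (src_x - x0)[:, None]
    wy = (src_y - y0)[:, None]

    def sample(yi, xi):
        # Read source pixels, returning zeros for out-of-bounds coordinates
        valid = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
        values = np.zeros((xi.size, channels), dtype=np.float64)
        values[valid] = image[yi[valid], xi[valid]]
        return values

    top = (1 - wx) * sample(y0, x0) + wx * sample(y0, x0 + 1)
    bottom = (1 - wx) * sample(y0 + 1, x0) + wx * sample(y0 + 1, x0 + 1)
    warped = (1 - wy) * top + wy * bottom
    return warped.reshape(out_height, out_width, channels)
```

Because the function only depends on NumPy, it can be compared against the existing cv2 output in a test before the cv2 import is dropped.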

src/transformers/models/vitpose/image_processing_vitpose.py

src/transformers/models/vitpose/modeling_vitpose.py

src/transformers/models/vitpose/configuration_vitpose.py

src/transformers/image_transforms.py

amyeroberts · 2024-05-21T18:44:43Z

src/transformers/image_transforms.py

+    elif width < aspect_ratio * height:
+        width = height * aspect_ratio
+
+    # pixel std is 200.0
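For context, the lines above belong to the step that turns a person bounding box into a center and scale. A hedged sketch of that computation follows; it is not the PR's exact code. The box is assumed to be in COCO format `(x, y, width, height)`, the first branch of the conditional is inferred as the mirror of the `elif` shown in the diff, and the `padding` factor of 1.25 is an assumption borrowed from common top-down pose pipelines. Only the division by a pixel std of 200.0 comes from the diff itself.

```python
def box_to_center_and_scale(box, image_width, image_height, pixel_std=200.0, padding=1.25):
    """Convert a COCO-format box into a center and scale (illustrative sketch).

    box: (x, y, width, height) with (x, y) the top-left corner (assumed format).
    image_width, image_height: model input size, used only for its aspect ratio.
    """
    x, y, width, height = box
    center = (x + 0.5 * width, y + 0.5 * height)

    # Grow the shorter side so the box matches the model's aspect ratio
    aspect_ratio = image_width / image_height
    if width > aspect_ratio * height:
        height = width / aspect_ratio
    elif width < aspect_ratio * height:
        width = height * aspect_ratio

    # pixel std is 200.0; padding (assumed value) adds context around the person
    scale = (width / pixel_std * padding, height / pixel_std * padding)
    return center, scale


center, scale = box_to_center_and_scale((10.0, 20.0, 30.0, 80.0), image_width=192, image_height=256)
print(center, scale)
```

Here the box is narrower than the 192:256 target, so its width is expanded from 30 to 60 before scaling, while the center stays at the middle of the original box.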


docs/source/en/model_doc/vitpose.md

NielsRogge and others added 30 commits May 25, 2022 12:23

First draft

03e4321

Make fixup

84ac7fe

Make forward pass work

90018b0

Improve code

5ce0b8b

Fix merge

3009a8a

More improvements

8f39773

More improvements

067f593

Make predictions match

7360c22

More improvements

a1b154a

Improve image processor

4bd07c3

Fix model tests

44f694a

Add classic decoder

41c1778

Merge remote-tracking branch 'upstream/main' into add_vitpose

1773f8d

Convert classic decoder

ceb3d3c

Verify image processor

fedf2cc

Fix classic decoder logits

38dedcd

Clean up

4cdbc03

Add post_process_pose_estimation

95aae6d

Improve post_process_pose_estimation

2531c19

Use AutoBackbone

e06d678

Add support for MoE models

c4a7df1

Fix tests, improve num_experts

b09592c

Improve variable names

04930ec

Fix merge

3432448

Make fixup

676aa5c

More improvements

547d0da

Improve post_process_pose_estimation

4435fd6

Compute centers and scales

db0e72b

Improve postprocessing

027100d

More improvements

fc8e5e0

Add cv2 to doc tests

c4ccdb6

NielsRogge added 5 commits May 6, 2024 15:16

Fix merge

3eb3865

Remove script

e09aa53

Improve conversion script

2203538

Add coco_to_pascal_voc

ee5f191

Add box_to_center_and_scale to image_transforms

dcd4401

NielsRogge requested a review from amyeroberts May 8, 2024 11:27

Update tests

97a0e09

NielsRogge added 3 commits May 11, 2024 14:23

Add integration test

d579009

Fix merge

4cfa299

Fix merge

9b8b4d1

SangbumChoi reviewed May 16, 2024

amyeroberts reviewed May 21, 2024

NielsRogge added 13 commits May 22, 2024 11:34

Address comments

13ee55f

Replace numpy by pytorch, improve docstrings

3b22ef8

Remove get_input_embeddings

4873d38

Address comments

b32c1aa

Move coco_to_pascal_voc

1a16aa6

Address comment

b84f23c

Fix style

20c44b9

Address comments

7aedeff

Fix test

65ee995

Address comment

f75119a

Merge remote-tracking branch 'upstream/main' into add_vitpose_autobackbone

d761e81

Remove udp

8588a0c

Remove comment

6238277
