Add LLaVa 1.6 #29012
Conversation
Thanks for iterating on adding this model!
The image processor still needs a bit of work. Namely, numpy equivalents of PIL methods should be written instead of converting back and forth between numpy and PIL, which is very inefficient and also breaks assumptions about the data format of the images as they go through the processor.
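As a minimal sketch of what "numpy equivalents of PIL methods" could look like, here is a pad-to-square helper written directly against numpy arrays, with no PIL round-trip. The function name and signature are illustrative, not the PR's actual code:

```python
import numpy as np

def expand_to_square(image: np.ndarray, background_color) -> np.ndarray:
    """Pad an (H, W, C) numpy image to a square canvas.

    Hypothetical numpy equivalent of a PIL-based padding step: the image
    stays a numpy array throughout, so dtype and channel layout are
    preserved and no PIL conversion is needed.
    """
    height, width, num_channels = image.shape
    if height == width:
        return image
    size = max(height, width)
    # Fill the canvas with the background color, then paste the image centered.
    result = np.full((size, size, num_channels), background_color, dtype=image.dtype)
    y_offset = (size - height) // 2
    x_offset = (size - width) // 2
    result[y_offset : y_offset + height, x_offset : x_offset + width] = image
    return result
```

Keeping everything in numpy also makes the intermediate data format explicit, which is the assumption-breaking issue mentioned above.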
Please make sure to look over the PR before asking for review - there's a file which should be included in the PR.
The modified forward pass looks OK to me in terms of logic, but it does make things more complex than before. Would be good to have @ArthurZucker's input on whether that's OK.
dim=0,
)

device = "cuda:3"
to be resolved
# verify inputs
if model_id == "liuhaotian/llava-v1.6-mistral-7b":
    # replace -200 by 32000 (ask Arthur)
The comment should reflect this
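The remapping the quoted diff refers to can be sketched as follows. The original LLaVa code marks image positions in `input_ids` with the placeholder index -200; during checkpoint conversion this placeholder has to be replaced with the id of the `<image>` token actually added to the tokenizer (assumed here to be 32000 for this checkpoint; the surrounding token ids are made up for illustration):

```python
import torch

# Hypothetical sketch of the -200 -> 32000 replacement from the diff above.
# -200 is the placeholder image-token index used by the original LLaVa code;
# 32000 is assumed to be the id of the <image> token added to the tokenizer.
input_ids = torch.tensor([[1, 733, -200, 28705, 2]])  # made-up example sequence
image_token_id = 32000
input_ids[input_ids == -200] = image_token_id
```

After this step, the sequence contains only valid vocabulary ids, which is what the converted model expects.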
Thanks, I've addressed all comments! Will update the conversion script to use cuda:0 when all checkpoints are uploaded (the 34b models still need to be uploaded).
- 🔴 changes to the LlavaForCausalLM are just too big to say it's still Llava. We can either have a new forward, or do what we always do and add a new "Architecture" to properly separate both. I don't mind having it in the same file, but the expected input is different from normal Llava.
- 🔴 the changes are not explained in the slightest, so coming from the Llava code I wrote I have no idea what is happening, why we need to iterate over the images, etc. This needs to be explained and needs a separate code path.
- 🟢 on pretty much all the other changes (processor tests etc.) and I'll leave those to @amyeroberts!
# NOTE we only support multimodal_patch_merge_type == "spatial_unpad"
new_image_features = []
for image_idx, image_feature in enumerate(image_features):
The goal is not to follow the original implementation. That means more work for us, of course, but we did not follow the original implementation either and still successfully vectorized the forward pass 😉
Ok, thanks for your review.
I feel like adding a whole new architecture is a bit of an overkill, as the only change in LLaVa 1.6 is that pixel values are now 5-dimensional instead of 4-dimensional: (batch_size, num_patches, num_channels, height, width). LLaVa 1.6 is also still part of the same paper; they just extend it to work on arbitrary image resolutions. Could you clarify what you mean by a new forward? Since LlavaForConditionalGeneration can only have a single forward method defined, I assume you mean defining a whole new class as well?
I'll add some more comments to explain what's happening.
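The shape difference discussed here can be sketched as follows. This is a minimal illustration with made-up sizes and a stubbed-out vision tower, not the PR's actual implementation:

```python
import torch

# LLaVa 1.6 pixel values carry an extra num_patches dimension compared to
# LLaVa 1.5's (batch_size, num_channels, height, width). Sizes are made up.
batch_size, num_patches, num_channels, height, width = 2, 5, 3, 336, 336
pixel_values = torch.randn(batch_size, num_patches, num_channels, height, width)

# One way to reuse a vision tower that expects 4-dim input is to fold the
# patch dimension into the batch dimension and unfold it afterwards.
flat = pixel_values.flatten(0, 1)  # (batch_size * num_patches, C, H, W)
features = flat.mean(dim=(2, 3))   # stand-in for the real vision tower output
image_features = features.view(batch_size, num_patches, -1)
```

Folding the patch dimension like this keeps the forward pass vectorized rather than iterating over images one by one.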
Closing this PR in favor of #29586
What does this PR do?
This PR adds the new LLaVa 1.6 model.
To do:
- check whether `image_sizes` are the same for all images
- make `image_sizes` a tensor instead of a list