
Add LLaVa 1.6 #29012

Closed · wants to merge 72 commits

Conversation

@NielsRogge (Contributor) commented Feb 14, 2024

What does this PR do?

This PR adds the new LLaVa 1.6 model.

To do:

  • figure out how batched generation works => it is currently only supported when the image_sizes are the same for all images
  • make image_sizes a tensor instead of a list (see the sketch after this list)
  • make sure llava 1.5 still works
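As a hedged illustration of the image_sizes to-do (the variable names and (H, W) convention here are assumptions, not the final API), the processor could stack the per-image size pairs into a single tensor rather than returning a Python list:

```python
import torch

# Hypothetical sketch: stack per-image (height, width) pairs into one tensor,
# per the "make image_sizes a tensor instead of a list" to-do.
image_sizes = [(672, 448), (336, 336)]                     # one (H, W) per image
image_sizes = torch.tensor(image_sizes, dtype=torch.long)  # shape (num_images, 2)
print(image_sizes.shape)                                   # torch.Size([2, 2])
```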

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator) left a comment

Thanks for iterating on adding this model!

The image processor still needs a bit of work. Namely, numpy equivalents of the PIL methods should be written instead of converting back and forth between numpy and PIL, which is very inefficient and also breaks assumptions about the data format of the images as they go through the processor.
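To illustrate the kind of change being requested, here is a minimal sketch of a numpy-native helper (the helper name and the (H, W, C) channel-last layout are this sketch's assumptions, not the actual transformers implementation):

```python
import numpy as np

def pad_to_size(image: np.ndarray, target_height: int, target_width: int) -> np.ndarray:
    """Zero-pad an (H, W, C) image to (target_height, target_width, C),
    staying in numpy instead of round-tripping through PIL."""
    height, width = image.shape[:2]
    pad_bottom = target_height - height
    pad_right = target_width - width
    return np.pad(image, ((0, pad_bottom), (0, pad_right), (0, 0)))
```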

Please make sure to look over the PR before asking for review - there's a file which shouldn't be included in the PR.

The modified forward pass looks OK to me in terms of logic, but it does make things more complex than before. Would be good to have @ArthurZucker's input there on whether it's OK.

```python
dim=0,
)

device = "cuda:3"
```

to be resolved


```python
# verify inputs
if model_id == "liuhaotian/llava-v1.6-mistral-7b":
    # replace -200 by 32000 (ask Arthur)
```
The comment should reflect this
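For context, a hedged sketch of what such a remapping could look like (treating -200 as the original repo's image-token placeholder and 32000 as the corresponding tokenizer id follows the code comment above; the example input_ids are invented):

```python
import torch

# Hypothetical sketch: remap the placeholder image-token id (-200 in the
# original LLaVa repo) to the tokenizer's <image> token id (32000 here).
input_ids = torch.tensor([[1, 733, -200, 28705, 13]])
input_ids = torch.where(input_ids == -200, torch.tensor(32000), input_ids)
print(input_ids)  # tensor([[    1,   733, 32000, 28705,    13]])
```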

Outdated, resolved review threads:
  • src/transformers/models/llava/test_image_processor.py (1 thread)
  • src/transformers/models/llava/image_processing_llava.py (7 threads)
@NielsRogge (Contributor, Author) commented

Thanks, I've addressed all comments!

Will update the conversion script to use cuda:0 once all checkpoints are uploaded (the 34b models still need to be uploaded).

@ArthurZucker (Collaborator) left a comment

  • 🔴 The changes to the LlavaForCausalLM are just too big to say it's still Llava. We can either have a new forward, or we do what we always do and add a new "architecture" to properly separate both. I don't mind having it in the same file, but the expected input is different from normal Llava (see the sketch after this list).
  • 🔴 The changes are not explained in the slightest, so coming from the llava code I wrote I have no idea what is happening, why we need to iterate over the images, etc. This needs to be explained and needs a separate code path.
  • 🟢 on pretty much all the other changes (processor, tests, etc.), and I leave them to @amyeroberts!
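Purely as an illustration of the "new architecture" option (the class name and signature here are hypothetical, not what the PR or the library defines):

```python
from transformers.models.llava.modeling_llava import LlavaPreTrainedModel

# Hypothetical sketch: leave the original Llava model untouched and add a
# sibling class (name invented here) whose forward expects 5-D pixel_values.
class LlavaV16ForConditionalGeneration(LlavaPreTrainedModel):
    def forward(self, input_ids, pixel_values=None, image_sizes=None, **kwargs):
        # pixel_values: (batch_size, num_patches, num_channels, height, width).
        # A separate code path here keeps the 4-D Llava 1.5 forward unchanged.
        ...
```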


```python
# NOTE we only support multimodal_patch_merge_type == "spatial_unpad"
new_image_features = []
for image_idx, image_feature in enumerate(image_features):
```

The goal is to not follow the original implementation. Which means more work for us of course, but we did not follow the original implementation for Llava 1.5 either and successfully vectorized the forward pass 😉
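As a hedged illustration of the kind of vectorization being asked for (a generic pattern, not the actual Llava code; shapes and the masked-mean reduction are invented for the example), variable-length per-image features can be padded into one batch tensor and processed with a mask instead of a Python loop:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical sketch: replace a per-image Python loop with one padded,
# masked batch operation over variable-length feature sequences.
image_features = [torch.randn(5, 8), torch.randn(3, 8)]   # (num_tokens_i, dim)
lengths = torch.tensor([f.shape[0] for f in image_features])
padded = pad_sequence(image_features, batch_first=True)   # (batch, max_tokens, dim)
mask = torch.arange(padded.shape[1])[None, :] < lengths[:, None]
pooled = (padded * mask[..., None]).sum(dim=1) / lengths[:, None]  # masked mean
```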

@NielsRogge (Contributor, Author) commented Mar 8, 2024

Ok, thanks for your review.

> 🔴 The changes to the LlavaForCausalLM are just too big to say it's still Llava. We can either have a new forward, or we do what we always do and add a new "architecture" to properly separate both. I don't mind having it in the same file, but the expected input is different from normal Llava.

I feel like adding a whole new architecture is a bit of overkill, as the only change in llava 1.6 is that pixel values now have 5 dimensions instead of 4 (batch_size, num_patches, num_channels, height, width). Llava 1.6 is also still part of the same paper; they just extend it to work on arbitrary image resolutions.
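A hedged sketch of how 5-D pixel values can be handled (a common pattern; the actual Llava 1.6 code may differ, and the mean over pixels is just a stand-in for the vision tower): flatten the patch dimension into the batch before encoding, then split the features back per image.

```python
import torch

# Hypothetical sketch: 5-D pixel_values -> 4-D for the vision encoder.
batch_size, num_patches = 2, 5
pixel_values = torch.randn(batch_size, num_patches, 3, 336, 336)
flat = pixel_values.flatten(0, 1)                       # (batch*patches, 3, 336, 336)
features = flat.mean(dim=(2, 3))                        # stand-in for the vision tower
features = features.view(batch_size, num_patches, -1)   # re-split per image
```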

Could you clarify what you mean by a new forward? The model LlavaForConditionalGeneration can only have a single forward method defined, I assume. Do you mean defining a whole new class as well?

> 🔴 The changes are not explained in the slightest, so coming from the llava code I wrote I have no idea what is happening, why we need to iterate over the images, etc. This needs to be explained and needs a separate code path.

I'll add some more comments to explain what's happening.

@NielsRogge mentioned this pull request Mar 11, 2024
@NielsRogge (Contributor, Author) commented

Closing this PR in favor of #29586

@NielsRogge closed this Mar 17, 2024