
Fix _merge_input_ids_with_image_features for llava model #28333

Merged · 9 commits · Jan 10, 2024

Conversation

@VictorSanh (Member) commented Jan 3, 2024

Bug detected by @Sakshi-Bhargava

The method LlavaForConditionalGeneration._merge_input_ids_with_image_features takes care of merging the input_embeds with the hidden states obtained from the vision encoder. The merge output is fed to the language model part of the model.

However, labels was omitted from the merge, and when trying to compute a loss, the shapes of the logits and the labels are not compatible.

This fix ensures that labels is also properly merged.
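The idea behind the fix can be sketched as follows. This is a simplified, hypothetical stand-in for the real `_merge_input_ids_with_image_features` (which also handles embeddings and attention masks), assuming exactly one `<image>` token per sequence and a made-up patch count:

```python
import torch

IGNORE_INDEX = -100  # loss is skipped at these positions

def merge_labels_with_image_features(labels, image_token_id, num_image_patches):
    """Expand `labels` so the single <image> token in each row is replaced by
    `num_image_patches` IGNORE_INDEX entries, mirroring how the input
    embeddings are expanded with the vision features."""
    merged = []
    for row in labels:
        # position of the <image> placeholder token in this row
        pos = (row == image_token_id).nonzero(as_tuple=True)[0].item()
        pad = torch.full((num_image_patches,), IGNORE_INDEX, dtype=row.dtype)
        merged.append(torch.cat([row[:pos], pad, row[pos + 1:]]))
    return torch.stack(merged)

# toy example: pretend 32000 is the <image> token id and the vision tower yields 4 patches
labels = torch.tensor([[1, 15043, 7084, 32000, 2]])
merged = merge_labels_with_image_features(labels, image_token_id=32000, num_image_patches=4)
# merged row: [1, 15043, 7084, -100, -100, -100, -100, 2] -> length 5 - 1 + 4 = 8
```

With the labels expanded like this, they line up position-for-position with the merged logits, so the shifted loss computation no longer hits a shape mismatch.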

Dummy reproduction case (still respects the model's hidden sizes):

import torch
from transformers import LlavaForConditionalGeneration
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-13b-hf")

pixel_values = torch.randn(
    (2, 3, 336, 336),
    dtype=torch.float
)
input_ids = torch.tensor(
    [
        [32001, 32001, 1, 15043,  7084, 32000, 29871,    13, 7900],
        [1, 15043,  7084, 29901, 29871, 32000, 29871,    13, 7900]
    ], dtype=torch.long
)
attention_mask = torch.tensor(
    [
        [0, 0, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1]
    ], dtype=torch.long
)

output = model(
    pixel_values=pixel_values,
    input_ids=input_ids,
    attention_mask=attention_mask,
    labels=input_ids,
)

Without the fix, this yields the following error (the 576 vision-patch features replace the single <image> token, so the merged logits cover 9 − 1 + 576 = 584 positions, shifted to 583, while the un-merged labels still shift to length 8):

    output = model(
  File "/victor/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/victor/code/transformers/src/transformers/models/llava/modeling_llava.py", line 486, in forward
    shift_labels = labels[..., 1:][shift_attention_mask.to(labels.device) != 0].contiguous()
IndexError: The shape of the mask [2, 583] at index 1 does not match the shape of the indexed tensor [2, 8] at index 1

cc @gullalc @younesbelkada @amyeroberts

@ArthurZucker (Collaborator) left a comment

Thanks for adding the support for training 😉
Let's add a test as well

Review comments on src/transformers/models/llava/modeling_llava.py (outdated, resolved)
VictorSanh and others added 3 commits January 4, 2024 11:50
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@VictorSanh (Member, Author)

Addressed the comments and moved the dummy test case into proper tests.
Let me know if you would like something more involved test-wise!

@ArthurZucker (Collaborator) left a comment

LGTM, thanks! Let's also make sure that loss.backward() works (we usually have an automatic test for this here):

def test_training(self):

Review comment on tests/models/llava/test_modeling_llava.py (outdated, resolved)
@younesbelkada (Contributor) left a comment

Thanks for working on this! I left a single comment about the test.

Review comment on tests/models/llava/test_modeling_llava.py (outdated, resolved)
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@younesbelkada (Contributor) left a comment

Awesome work @VictorSanh ! Thanks a lot for the fix!

@ArthurZucker (Collaborator)

Good to go ! Feel free to merge @VictorSanh 🤗

@VictorSanh (Member, Author)

I am not cool enough to have merge access. The days when I could merge stuff on hf transformers whenever I wanted are long past haha

@VictorSanh (Member, Author)

So either you @ArthurZucker or @younesbelkada need to merge lol 😅
But perhaps I can be promoted to core maintainer with this PR @LysandreJik?

@younesbelkada younesbelkada merged commit 0f2f0c6 into huggingface:main Jan 10, 2024
18 checks passed
@ArthurZucker (Collaborator)

Ooops 🤣

@LysandreJik (Member)

Considering it @VictorSanh!

staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
…ce#28333)

* fix `_merge_input_ids_with_image_features` for llava model

* Update src/transformers/models/llava/modeling_llava.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* adress comments

* style and tests

* ooops

* test the backward too

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update tests/models/vipllava/test_modeling_vipllava.py

* style and quality

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
MadElf1337 pushed a commit to MadElf1337/transformers that referenced this pull request Jan 15, 2024
wgifford pushed a commit to wgifford/transformers that referenced this pull request Jan 21, 2024
AjayP13 pushed a commit to AjayP13/transformers that referenced this pull request Jan 22, 2024
@alexandrosXe

Hi,
I am still getting the following error when trying to fine-tune the model for conditional generation; the forward call raises:

final_labels[batch_indices, text_to_overwrite] = labels[batch_indices, non_image_indices]
IndexError: index 8 is out of bounds for dimension 1 with size 8

The same code works fine if I just switch the model to another VLM such as InstructBLIP.

Thank you and kind regards,
Alexandros Xenos

@VictorSanh (Member, Author)

@alexandrosXe do you have a reproduction case we can start debugging from?

@alexandrosXe

> @alexandrosXe do you have a reproduction case we can start debugging from?
@VictorSanh Thank you for replying so fast!
This code can reproduce my error:

from PIL import Image
import requests
from transformers import AutoProcessor, LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

prompt = "<image>\nUSER: What's the content of the image?\nASSISTANT:"
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)
answer = "The image has a stop sign in the corner of the road"


inputs = processor(text=prompt, images=image, return_tensors="pt")

labels = processor.tokenizer(answer, return_tensors="pt")
label_ids = labels["input_ids"]
label_mask = labels["attention_mask"].bool()
label_ids = label_ids.masked_fill(~label_mask, -100)  # we don't count the loss on padded tokens
loss = model(**inputs, labels=label_ids).loss
print("loss: ", loss)

@ArthurZucker (Collaborator)

Can you also make sure you are using the latest version of transformers?

@alexandrosXe

> Can you also make sure you are using the latest version of transformers?

@ArthurZucker I am using transformers version 4.37.2.

@ArthurZucker (Collaborator)

Yes, though it seems loss = model(**inputs, labels=inputs["input_ids"]) works well. labels is expected to have the same shape as input_ids:

        labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
            config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
            (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.

Per the doc, labels should have length sequence_length, matching input_ids.
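In practice this means labels must be built over the full prompt+answer token sequence, masking the prompt positions with -100, rather than tokenizing the answer on its own. A rough sketch with made-up token ids (the `build_labels` helper is hypothetical, not part of transformers):

```python
import torch

def build_labels(prompt_ids, answer_ids):
    """input_ids and labels must have the same length: concatenate prompt and
    answer, then mask the prompt positions with -100 so the loss is only
    computed on the answer tokens."""
    input_ids = torch.tensor([prompt_ids + answer_ids])
    labels = torch.tensor([[-100] * len(prompt_ids) + answer_ids])
    return input_ids, labels

prompt_ids = [1, 15043, 7084, 29901]  # e.g. a tokenized "USER: ..." prompt
answer_ids = [450, 1967, 756, 2]      # e.g. the tokenized assistant answer
input_ids, labels = build_labels(prompt_ids, answer_ids)
# both have shape (1, 8); the first 4 label positions are -100
```

Passing labels built this way keeps the shapes aligned through the image-feature merge, which is why tokenizing only the answer (as in the repro above) fails.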

@alexandrosXe

@ArthurZucker Thank you, it was my fault for using the wrong labels. Now everything works fine!
