
Idefics-2-base model fine-tuning throws indexing error #30464

Closed
2 of 4 tasks
rabiulcste opened this issue Apr 24, 2024 · 10 comments

@rabiulcste

System Info

  • transformers version: 4.40.0.dev0
  • Platform: Linux-5.15.0-101-generic-x86_64-with-glibc2.17
  • Python version: 3.8.2
  • Huggingface_hub version: 0.20.2
  • Safetensors version: 0.4.2
  • Accelerate version: 0.29.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu118 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run the Colab script (https://colab.research.google.com/drive/1NtcTgRbSBKN7pYD3Vdx1j9m8pt3fhFDB?usp=sharing) with HuggingFaceM4/idefics2-8b-base as the model name. The error does not appear with the instruction-tuned checkpoint HuggingFaceM4/idefics2-8b.
  2. The following errors are thrown:

../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.


  ... in training_step
    loss = self.compute_loss(model, inputs)
  File "/lib/python3.8/site-packages/transformers/trainer.py", line 3160, in compute_loss
    outputs = model(**inputs)
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/lib/python3.8/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1823, in forward
    outputs = self.model(
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/lib/python3.8/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1609, in forward
    pixel_values = pixel_values.to(dtype=self.dtype)  # fp16 compatibility

Expected behavior

The base model should fine-tune without errors, just as the instruction-tuned one does. I suspect it might be a tokenization issue.

@BiliBraker

I have the same issue.

@amyeroberts
Collaborator

Hi @rabiulcste @BiliBraker thanks for reporting!

cc @VictorSanh In case you have an immediate idea why this is happening?

@jjkjkj

jjkjkj commented Apr 26, 2024

The problem you're running into is that the tokenizer for the base model is incorrect and contains the <end_of_utterance> token (probably it's exactly the same as the chat model's tokenizer), but the base model's embedding layer doesn't have a row for it. So if you reuse the dataset/collator code from fine-tuning the chat model and call processor.apply_chat_template, it will add a token id that doesn't exist in the embedding table, and the model's embedding layer will fail.

import torch
from transformers import AutoProcessor, Idefics2ForConditionalGeneration

processor_base = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-base")
processor_chat = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")

base = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base",
    torch_dtype=torch.float16,
)
chat = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
)

# Largest token id each tokenizer can produce
print("Tokenizer chat max token:", max(processor_chat.tokenizer.get_vocab().values()))
print("Tokenizer base max token:", max(processor_base.tokenizer.get_vocab().values()))

# Size of each model's input embedding table
print("chat embedding:", chat.base_model.get_submodule('text_model').get_submodule('embed_tokens'))
print("base embedding:", base.base_model.get_submodule('text_model').get_submodule('embed_tokens'))

print("last token:", processor_chat.tokenizer.convert_ids_to_tokens(max(processor_chat.tokenizer.get_vocab().values())))

Tokenizer chat max token: 32002
Tokenizer base max token: 32002
chat embedding: Embedding(32003, 4096, padding_idx=0)
base embedding: Embedding(32002, 4096, padding_idx=0)
last token: <end_of_utterance>
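
As a quick check on your own data, you can compare the largest token id your collator actually produces against the embedding-table size. A sketch, assuming the objects defined above; texts and images stand in for whatever your dataset/collator yields:

# `texts` / `images` are placeholders for your own collator's output.
batch = processor_base(text=texts, images=images, return_tensors="pt")

embed_rows = base.get_input_embeddings().num_embeddings
max_id = batch["input_ids"].max().item()
print("max input id:", max_id, "| embedding rows:", embed_rows)
# Any id >= embed_rows will trip the device-side indexSelectLargeIndex assert above.
assert max_id < embed_rows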

@rabiulcste
Author

@jjkjkj That's a good find. So for now I just removed the token and it seems to be working:

  text = processor.apply_chat_template(messages, add_generation_prompt=False)
  if "base" in args.model_name: # hack to remove the end of utterance token
      text = text.replace("<end_of_utterance>", "")
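
An alternative I haven't fully tested would be to keep the template as-is and instead give the base model an embedding row for the extra token, using the standard resize_token_embeddings API. A sketch only, not what I ended up using:

import torch
from transformers import AutoProcessor, Idefics2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-base")
model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base", torch_dtype=torch.float16
)

# len(tokenizer) counts added tokens such as <end_of_utterance> (32003 here),
# while the base checkpoint only ships 32002 embedding rows.
vocab_size = len(processor.tokenizer)
if model.get_input_embeddings().num_embeddings < vocab_size:
    # Adds randomly initialised row(s) for the missing id(s); they would need
    # to be trained (or simply never generated) to be meaningful.
    model.resize_token_embeddings(vocab_size)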

@rabiulcste
Author

I wanted to mention another issue in the same script. When lora is set to True, I get this error:

Traceback (most recent call last):
  File "/apps/arch/distro/python/3.8/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/apps/arch/distro/python/3.8/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "synth-diffuse/evals/idefics2_fine_tuning.py", line 299, in <module>
    main(args)
  File "synth-diffuse/evals/idefics2_fine_tuning.py", line 103, in main
    model.add_adapter(lora_config)
  File "/lib/python3.8/site-packages/transformers/integrations/peft.py", line 264, in add_adapter
    inject_adapter_in_model(adapter_config, self, adapter_name)
  File "/lib/python3.8/site-packages/peft/mapping.py", line 166, in inject_adapter_in_model
    peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
  File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 136, in __init__
    super().__init__(model, config, adapter_name)
  File "/lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 148, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 325, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
  File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 220, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
  File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 295, in _create_new_module
    new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
  File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 1056, in dispatch_default
    new_module = Linear(target, adapter_name, **kwargs)
  File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 356, in __init__
    self.update_layer(
  File "/s/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 126, in update_layer
    self.dora_init(adapter_name)
  File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 191, in dora_init
    lora_weight = lora_B.weight @ lora_A.weight
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

It doesn't occur when QLoRA is set to True.
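
For what it's worth, the traceback suggests the DoRA init is doing a half-precision matmul on CPU (lora_B.weight @ lora_A.weight), which PyTorch doesn't support. A workaround sketch, with illustrative (not my actual) LoRA parameters: load the model in float32, attach the adapter, then cast/move it afterwards.

import torch
from peft import LoraConfig
from transformers import Idefics2ForConditionalGeneration

model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base",
    torch_dtype=torch.float32,  # fp32 so the CPU matmul in dora_init works
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative choice
    use_dora=True,
)

model.add_adapter(lora_config)         # DoRA init now runs in fp32
model.to("cuda", dtype=torch.float16)  # cast/move only after injection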

@amyeroberts
Collaborator

@rabiulcste Can you open a new issue with this info? This helps us keep better track of what has and hasn't been resolved, as well as helping others find similar issues.

@VictorSanh
Member

cc @VictorSanh In case you have an immediate idea why this is happening?

Does not ring a bell unfortunately :/ I need to focus on the idefics2 second release wave, but I will for sure allocate time to dig into this during the week if it's not solved by then.

@rabiulcste
Author

@rabiulcste Can you open a new issue with this info? This helps us keep better track of what has and hasn't been resolved, as well as helping others find similar issues.

Sure, I'll create a new issue then. I have a couple more issues though :) Is it suggested to create a separate issue for each?

@amyeroberts
Collaborator

@rabiulcste Yes please, as long as they're independent.

@amyeroberts
Collaborator

@VictorSanh No need to dig! The issue was found and explained by @jjkjkj above: it was due to the presence of the <end_of_utterance> token in the base model's tokenizer but not in its embedding layer.

In fact, we can now close this issue :)
