
Idefics-2-base model fine-tuning throws indexing error #30464

Closed
2 of 4 tasks
rabiulcste opened this issue Apr 24, 2024 · 10 comments

@rabiulcste

System Info

  • transformers version: 4.40.0.dev0
  • Platform: Linux-5.15.0-101-generic-x86_64-with-glibc2.17
  • Python version: 3.8.2
  • Huggingface_hub version: 0.20.2
  • Safetensors version: 0.4.2
  • Accelerate version: 0.29.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu118 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run the Colab script (https://colab.research.google.com/drive/1NtcTgRbSBKN7pYD3Vdx1j9m8pt3fhFDB?usp=sharing) with HuggingFaceM4/idefics2-8b-base as the model name. The error does not appear with the instruction-tuned checkpoint HuggingFaceM4/idefics2-8b.
  2. The following errors are thrown:

../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [126,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.


  ... in training_step
    loss = self.compute_loss(model, inputs)
  File "/lib/python3.8/site-packages/transformers/trainer.py", line 3160, in compute_loss
    outputs = model(**inputs)
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/lib/python3.8/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1823, in forward
    outputs = self.model(
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/lib/python3.8/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 1609, in forward
    pixel_values = pixel_values.to(dtype=self.dtype)  # fp16 compatibility

Expected behavior

The base model should fine-tune without errors, just as the instruction-tuned one does. I suspect it might be a tokenization issue.

@BiliBraker

I have the same issue.

@amyeroberts
Collaborator

Hi @rabiulcste @BiliBraker thanks for reporting!

cc @VictorSanh In case you have an immediate idea why this is happening?

@jjkjkj

jjkjkj commented Apr 26, 2024

The problem you're running into is that the tokenizer for the base model is incorrect and contains the <end_of_utterance> token (probably it's exactly the same as the chat model's tokenizer), but the base model's embedding layer doesn't have a row for it. So if you reuse the dataset/collator code from fine-tuning the chat model and call processor.apply_chat_template, it will add a token id that doesn't exist in the embedding table, and the model's embedding layer will fail.

import torch
from transformers import AutoProcessor, Idefics2ForConditionalGeneration

processor_base = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-base")
processor_chat = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")

base = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base",
    torch_dtype=torch.float16,
)
chat = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    torch_dtype=torch.float16,
)

# Largest token id each tokenizer can produce
print("Tokenizer chat max token:", max(processor_chat.tokenizer.get_vocab().values()))
print("Tokenizer base max token:", max(processor_base.tokenizer.get_vocab().values()))

# Size of each model's input embedding table
print("chat embedding:", chat.base_model.get_submodule('text_model').get_submodule('embed_tokens'))
print("base embedding:", base.base_model.get_submodule('text_model').get_submodule('embed_tokens'))

print("last token:", processor_chat.tokenizer.convert_ids_to_tokens(max(processor_chat.tokenizer.get_vocab().values())))

Tokenizer chat max token: 32002
Tokenizer base max token: 32002
chat embedding: Embedding(32003, 4096, padding_idx=0)
base embedding: Embedding(32002, 4096, padding_idx=0)
last token: <end_of_utterance>
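
As a quick check on your own data, you can compare the largest token id your collator actually produces against the embedding-table size. A sketch, assuming the objects defined above; texts and images stand in for whatever your dataset/collator yields:

# `texts` / `images` are placeholders for your own collator's output.
batch = processor_base(text=texts, images=images, return_tensors="pt")

embed_rows = base.get_input_embeddings().num_embeddings
max_id = batch["input_ids"].max().item()
print("max input id:", max_id, "| embedding rows:", embed_rows)
# Any id >= embed_rows will trip the device-side indexSelectLargeIndex assert above.
assert max_id < embed_rows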

@rabiulcste
Author

@jjkjkj That's a good find. So for now I just removed the token and it seems to be working:

  text = processor.apply_chat_template(messages, add_generation_prompt=False)
  if "base" in args.model_name: # hack to remove the end of utterance token
      text = text.replace("<end_of_utterance>", "")
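
An alternative I haven't fully tested would be to keep the template as-is and instead give the base model an embedding row for the extra token, using the standard resize_token_embeddings API. A sketch only, not what I ended up using:

import torch
from transformers import AutoProcessor, Idefics2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-base")
model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base", torch_dtype=torch.float16
)

# len(tokenizer) counts added tokens such as <end_of_utterance> (32003 here),
# while the base checkpoint only ships 32002 embedding rows.
vocab_size = len(processor.tokenizer)
if model.get_input_embeddings().num_embeddings < vocab_size:
    # Adds randomly initialised row(s) for the missing id(s); they would need
    # to be trained (or simply never generated) to be meaningful.
    model.resize_token_embeddings(vocab_size)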

@rabiulcste
Author

I wanted to mention another issue in the same script. When lora is set to True, I get this error:

Traceback (most recent call last):
  File "/apps/arch/distro/python/3.8/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/apps/arch/distro/python/3.8/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "synth-diffuse/evals/idefics2_fine_tuning.py", line 299, in <module>
    main(args)
  File "synth-diffuse/evals/idefics2_fine_tuning.py", line 103, in main
    model.add_adapter(lora_config)
  File "/lib/python3.8/site-packages/transformers/integrations/peft.py", line 264, in add_adapter
    inject_adapter_in_model(adapter_config, self, adapter_name)
  File "/lib/python3.8/site-packages/peft/mapping.py", line 166, in inject_adapter_in_model
    peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
  File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 136, in __init__
    super().__init__(model, config, adapter_name)
  File "/lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 148, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 325, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
  File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 220, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
  File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 295, in _create_new_module
    new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
  File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 1056, in dispatch_default
    new_module = Linear(target, adapter_name, **kwargs)
  File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 356, in __init__
    self.update_layer(
  File "/s/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 126, in update_layer
    self.dora_init(adapter_name)
  File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 191, in dora_init
    lora_weight = lora_B.weight @ lora_A.weight
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

It doesn't occur when QLoRA is set to True.
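
For what it's worth, the traceback suggests the DoRA init is doing a half-precision matmul on CPU (lora_B.weight @ lora_A.weight), which PyTorch doesn't support. A workaround sketch, with illustrative (not my actual) LoRA parameters: load the model in float32, attach the adapter, then cast/move it afterwards.

import torch
from peft import LoraConfig
from transformers import Idefics2ForConditionalGeneration

model = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base",
    torch_dtype=torch.float32,  # fp32 so the CPU matmul in dora_init works
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative choice
    use_dora=True,
)

model.add_adapter(lora_config)         # DoRA init now runs in fp32
model.to("cuda", dtype=torch.float16)  # cast/move only after injection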

@amyeroberts
Collaborator

@rabiulcste Can you open a new issue with this info? This helps us keep better track of what has and hasn't been resolved, as well as helping others find similar issues.

@VictorSanh
Member

cc @VictorSanh In case you have an immediate idea why this is happening?

Does not ring a bell unfortunately :/ I need to focus on the idefics2 second release wave, but I will for sure allocate time to dig into this during the week if it's not solved by then.

@rabiulcste
Author

@rabiulcste Can you open a new issue with this info? This helps us keep better track of what has and hasn't been resolved, as well as helping others find similar issues.

Sure, I'll create a new issue then. I have a couple more issues though :) Is it suggested to create a separate issue for each?

@amyeroberts
Collaborator

@rabiulcste Yes please, as long as they're independent.

@amyeroberts
Collaborator

@VictorSanh No need to dig! The issue was found and explained by @jjkjkj above: it was due to the presence of the <end_of_utterance> token in the base model's tokenizer but not in its embedding layer.

In fact, we can now close this issue :)
