
Generation with HybridCache fails (affecting Gemma-2) #31664

Closed
sanchit-gandhi opened this issue Jun 27, 2024 · 2 comments · Fixed by #31661

Comments

sanchit-gandhi commented Jun 27, 2024

System Info

  • transformers version: 4.43.0.dev0
  • Platform: macOS-14.5-arm64-arm-64bit
  • Python version: 3.11.6
  • Huggingface_hub version: 0.23.3
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): 0.8.1 (cpu)
  • Jax version: 0.4.24
  • JaxLib version: 0.4.24

Who can help?

@sanchit-gandhi

Reproduction

Generation with Gemma-2 currently fails on main, e.g. with the following toy code snippet:

import torch
from transformers.models.gemma2 import Gemma2Config, Gemma2ForCausalLM

config = Gemma2Config(
    num_hidden_layers=1,
    vocab_size=128,
    hidden_size=16,
    intermediate_size=32,
    num_attention_heads=1,
    num_key_value_heads=1,
)
model = Gemma2ForCausalLM(config)

input_ids = torch.ones((1, 10), dtype=torch.int)
model.generate(input_ids)

Traceback:

Traceback (most recent call last):
  File "/Users/sanchitgandhi/transformers/debug_gemma2.py", line 8, in <module>
    model.generate(input_ids)
  File "/Users/sanchitgandhi/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sanchitgandhi/transformers/src/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/Users/sanchitgandhi/transformers/src/transformers/generation/utils.py", line 2644, in _sample
    model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sanchitgandhi/transformers/src/transformers/generation/utils.py", line 1409, in _get_initial_cache_position
    model_kwargs["cache_position"] = torch.arange(past_length, cur_len, device=input_ids.device)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: arange() received an invalid combination of arguments - got (NoneType, int, device=torch.device), but expected one of:
 * (Number end, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (Number start, Number end, *, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (Number start, Number end, Number step, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
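For context, the traceback shows that the past sequence length reported for the cache comes back as None, which then reaches torch.arange as its start argument. A minimal sketch of that failure mode (the variable names mirror _get_initial_cache_position, but this is a standalone illustration, not the transformers code):

```python
import torch

# past_length is None because the cache did not report a length;
# torch.arange rejects None as its start argument with a TypeError.
past_length = None  # what the cache reported
cur_len = 10        # length of the prompt

try:
    cache_position = torch.arange(past_length, cur_len)
except TypeError as err:
    print(type(err).__name__)
```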

Expected behavior

This is fixed by handling the exception in generate (PR #31661).

cc @ArthurZucker

@ArthurZucker
Collaborator

Thanks for fixing!


fst813 commented Jul 3, 2024

transformers version: 4.42.3
I have another error:

  File "/home/ss/train_frame/LLaMA-Factory/src/train.py", line 30, in <module>
    main()
  File "/home/ss/train_frame/LLaMA-Factory/src/train.py", line 21, in main
    run_exp()
  File "/home/ss/train_frame/LLaMA-Factory/src/llamafactory/train/tuner.py", line 93, in run_exp
    run_exe(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/ss/train_frame/LLaMA-Factory/src/llamafactory/train/tuner.py", line 47, in run_exe
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/ss/train_frame/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 107, in run_sft
    predict_results = trainer.predict(dataset, metric_key_prefix="predict", **gen_kwargs)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 244, in predict
    return super().predict(test_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/transformers/trainer.py", line 3717, in predict
    output = eval_loop(
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/transformers/trainer.py", line 3826, in evaluation_loop
    losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/home/ss/train_frame/LLaMA-Factory/src/llamafactory/train/sft/trainer.py", line 99, in prediction_step
    loss, generated_tokens, _ = super().prediction_step(  # ignore the returned labels (may be truncated)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 310, in prediction_step
    generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/transformers/generation/utils.py", line 2651, in _sample
    outputs = self(
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/accelerate/utils/operations.py", line 789, in convert_to_fp32
    return recursively_apply(_convert_to_fp32, tensor, test_type=_is_fp16_bf16_tensor)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/accelerate/utils/operations.py", line 118, in recursively_apply
    {
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/accelerate/utils/operations.py", line 119, in <dictcomp>
    k: recursively_apply(
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/accelerate/utils/operations.py", line 126, in recursively_apply
    return func(data, *args, **kwargs)
  File "/home/ss/anaconda3-new/envs/train/lib/python3.10/site-packages/accelerate/utils/operations.py", line 781, in _convert_to_fp32
    return tensor.float()
AttributeError: 'HybridCache' object has no attribute 'float'
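For context on this second error: the fp32 conversion walks the model outputs recursively and calls .float() on anything that looks like a half-precision tensor. An object like HybridCache that exposes a dtype attribute but has no float() method can pass such a dtype-based check and then fail. A minimal sketch of that failure mode (FakeCache and convert_to_fp32_like are hypothetical stand-ins, not accelerate's actual code):

```python
import torch

class FakeCache:
    # Looks like a half-precision tensor to a dtype-based check,
    # but has no .float() method.
    dtype = torch.float16

def convert_to_fp32_like(obj):
    # Sketch of a dtype-based "is half-precision tensor" check (assumption):
    # the object passes the check, then .float() raises AttributeError.
    if hasattr(obj, "dtype") and obj.dtype in (torch.float16, torch.bfloat16):
        return obj.float()
    return obj

try:
    convert_to_fp32_like(FakeCache())
except AttributeError as err:
    print(err)  # 'FakeCache' object has no attribute 'float'
```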
