
Qwen2-1.5B eos_token NoneType Error prevents generation #31665

Closed · dcsuka opened this issue Jun 27, 2024 · 2 comments

dcsuka (Author) commented Jun 27, 2024

System Info

  • transformers version: 4.42.0
  • Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0+cu121 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA A100-SXM4-40GB

Who can help?

@ArthurZucker
@gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the following code using Qwen2-1.5B-Instruct downloaded from Hugging Face:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map=None
)
model = AutoModelForCausalLM.from_pretrained("/home/ubuntu/Qwen2-1.5B-Instruct", **model_kwargs).to('cuda')

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct", trust_remote_code=True)
tokenizer.padding_side = "left"
model.generation_config.pad_token_id = tokenizer.pad_token_id

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What are fun places to travel?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    padding=True
).to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=16384,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=terminators,
    do_sample=False
)
tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

Produces the following error:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_15236/399264495.py in <module>
      7     padding=True
      8 ).to(model.device)
----> 9 outputs = model.generate(
     10     input_ids,
     11     max_new_tokens=16384,

~/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116 
    117     return decorate_context

~/.local/lib/python3.10/site-packages/transformers/generation/utils.py in generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   1662 
   1663         device = inputs_tensor.device
-> 1664         self._prepare_special_tokens(generation_config, kwargs_has_attention_mask, device=device)
   1665 
   1666         # decoder-only models must use left-padding for batched generation.

~/.local/lib/python3.10/site-packages/transformers/generation/utils.py in _prepare_special_tokens(self, generation_config, kwargs_has_attention_mask, device)
   1482             generation_config.bos_token_id, self.generation_config.bos_token_id, device=device
   1483         )
-> 1484         eos_token_id = _tensor_or_none(
   1485             generation_config.eos_token_id, self.generation_config.eos_token_id, device=device
   1486         )

~/.local/lib/python3.10/site-packages/transformers/generation/utils.py in _tensor_or_none(token_kwargs, token_self, device)
   1477             if token is None or isinstance(token, torch.Tensor):
   1478                 return token
-> 1479             return torch.tensor(token, device=device, dtype=torch.long)
   1480 
   1481         bos_token_id = _tensor_or_none(

TypeError: 'NoneType' object cannot be interpreted as an integer
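
The None here does not appear to be the model's eos token itself but the second entry of terminators: tokenizer.convert_tokens_to_ids returns None for a token that is not in the vocabulary (Qwen2's tokenizer defines no unk_token to fall back on), and <|eot_id|> is a Llama 3 token rather than a Qwen2 one. A quick diagnostic, assuming the tokenizer loaded above (the exact ids come from the Qwen2 vocabulary and are worth verifying locally):

# <|eot_id|> is absent from the Qwen2 vocabulary, so it maps to None;
# generate() later fails on torch.tensor(None, ...).
print(tokenizer.convert_tokens_to_ids("<|eot_id|>"))  # None
# Qwen2-Instruct marks the end of a turn with <|im_end|> instead.
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))  # 151645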

Expected behavior

Proper generation would take place.
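
A possible workaround (a sketch, not an official fix): build the terminator list from Qwen2's own end-of-turn token and drop anything that resolved to None before calling generate:

# Filter out ids that did not resolve, keeping only real token ids.
eot_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
terminators = [t for t in (tokenizer.eos_token_id, eot_id) if t is not None]

outputs = model.generate(
    input_ids,
    max_new_tokens=16384,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=terminators,
    do_sample=False,
)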

dcsuka (Author) commented Jun 27, 2024

dcsuka closed this as completed Jun 27, 2024
ArthurZucker (Collaborator) commented:

thanks for updating!
