'eos_token_id' for llama model.generate is not working #24644

Closed
devymex opened this issue Jul 4, 2023 · 3 comments

devymex commented Jul 4, 2023

System Info

  • transformers version: 4.30.2
  • Platform: Linux-5.4.0-137-generic-x86_64-with-glibc2.31
  • Python version: 3.10.0
  • Huggingface_hub version: 0.15.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import transformers, torch

weights_dir = "weights/recovered"
question = 'Hello, there!'

model = transformers.AutoModelForCausalLM.from_pretrained(weights_dir)
model = model.cuda()
print(model.config)
# LlamaConfig {
#   "_name_or_path": "weights/recovered",
#   "architectures": [
#     "LlamaForCausalLM"
#   ],
#   "bos_token_id": 1,
#   "eos_token_id": 2,
#   "hidden_act": "silu",
#   "hidden_size": 4096,
#   "initializer_range": 0.02,
#   "intermediate_size": 11008,
#   "max_position_embeddings": 2048,
#   "model_type": "llama",
#   "num_attention_heads": 32,
#   "num_hidden_layers": 32,
#   "pad_token_id": 0,
#   "rms_norm_eps": 1e-06,
#   "tie_word_embeddings": false,
#   "torch_dtype": "float32",
#   "transformers_version": "4.30.2",
#   "use_cache": true,
#   "vocab_size": 32001
# }

tokenizer = transformers.AutoTokenizer.from_pretrained(weights_dir)
question_ids = tokenizer.encode(question + tokenizer.eos_token, return_tensors='pt')
question_ids = question_ids.cuda()

print(tokenizer.eos_token_id, tokenizer.bos_token_id, tokenizer.pad_token_id)
# 2, 1, 32000

print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,   829, 29879, 29958]],
#        device='cuda:0')

print(tokenizer.decode(question_ids[0]))
# <s> Hello, there!</s>

outputs = model.generate(
    question_ids,
    eos_token_id=2,
    max_new_tokens=200,
    num_beams=4,
    num_return_sequences=2,
    early_stopping=True,
)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
# Hello, there!</s>
# Hello, there!</s>
# <s>Hello, there!</s>

No matter how I change the parameters of model.generate, it always ignores </s> as the ending token (id: 2).

In addition, skip_special_tokens in tokenizer.decode is not working either.

What am I doing wrong? Please help, many thanks!
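
A quick way to see what is going on (a sketch, not part of the original report; it assumes the same local checkpoint as above) is to map the suspicious ids back to tokens:

from transformers import AutoTokenizer

# Sketch: inspect the ids that the literal string "</s>" produced in
# question_ids above. Assumes the "weights/recovered" checkpoint.
tokenizer = AutoTokenizer.from_pretrained("weights/recovered")
print(tokenizer.convert_ids_to_tokens([829, 29879, 29958]))
# For the Llama vocab this should print ['</', 's', '>'] -- three ordinary
# pieces rather than the single eos token (id 2), so generate never sees eos.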

Expected behavior

model.generate should stop at the first occurrence of </s> (token id 2).

@amyeroberts
Collaborator

cc @ArthurZucker

@ArthurZucker
Collaborator

Hey! A few things to note; here is a working snippet illustrating them:

from transformers import LlamaTokenizer, AutoModelForCausalLM, AutoTokenizer
weights_dir = "huggyllama/llama-7b"
question = 'Hello, there!'

# if you want to add eos, set `add_eos_token=True`
tokenizer = LlamaTokenizer.from_pretrained(weights_dir, add_eos_token=True)
question_ids = tokenizer.encode(question, return_tensors='pt')
print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,     2]])
print(tokenizer.decode(question_ids[0], skip_special_tokens=True))
# 'Hello, there!'


# if you are not using the correct tokenizer version, special tokens are encoded incorrectly
tokenizer = AutoTokenizer.from_pretrained(weights_dir, add_eos_token=True)
print(tokenizer.is_fast)
# True
question_ids = tokenizer.encode('Hello, there!</s>', return_tensors='pt')
print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,   829, 29879, 29958,     2]])
question_ids = tokenizer.encode('Hello, there! </s>', return_tensors='pt')
print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,     2,     2]])
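
If re-tokenizing is not an option, an alternative sketch (not from the thread) is to append the eos id numerically, so no string parsing is involved:

import torch
from transformers import AutoTokenizer

# Sketch: concatenate the eos id directly instead of the "</s>" string.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
question_ids = tokenizer.encode("Hello, there!", return_tensors="pt")
eos = torch.tensor([[tokenizer.eos_token_id]], dtype=question_ids.dtype)
question_ids = torch.cat([question_ids, eos], dim=-1)
print(question_ids)
# e.g. tensor([[    1, 15043, 29892,   727, 29991,     2]])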

devymex commented Jul 6, 2023

@ArthurZucker Many thanks! add_eos_token=True did the trick!
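
For completeness, a sketch of the fix applied to the original reproduction (assumptions: the user's local checkpoint path and the generate arguments from the report):

import transformers

weights_dir = "weights/recovered"  # the local checkpoint from the report

# Let the tokenizer append eos itself instead of concatenating "</s>" as text.
tokenizer = transformers.AutoTokenizer.from_pretrained(weights_dir, add_eos_token=True)
model = transformers.AutoModelForCausalLM.from_pretrained(weights_dir).cuda()

question_ids = tokenizer.encode("Hello, there!", return_tensors="pt").cuda()
outputs = model.generate(
    question_ids,
    eos_token_id=tokenizer.eos_token_id,  # id 2 for this checkpoint
    max_new_tokens=200,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Generation should now stop at the first </s> the model emits.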
