'eos_token_id' for llama model.generate is not working #24644

Closed
devymex opened this issue Jul 4, 2023 · 3 comments

devymex commented Jul 4, 2023

System Info

  • transformers version: 4.30.2
  • Platform: Linux-5.4.0-137-generic-x86_64-with-glibc2.31
  • Python version: 3.10.0
  • Huggingface_hub version: 0.15.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import transformers, torch

weights_dir = "weights/recovered"
question = 'Hello, there!'

model = transformers.AutoModelForCausalLM.from_pretrained(weights_dir)
model = model.cuda()
print(model.config)
# LlamaConfig {
#   "_name_or_path": "weights/recovered",
#   "architectures": [
#     "LlamaForCausalLM"
#   ],
#   "bos_token_id": 1,
#   "eos_token_id": 2,
#   "hidden_act": "silu",
#   "hidden_size": 4096,
#   "initializer_range": 0.02,
#   "intermediate_size": 11008,
#   "max_position_embeddings": 2048,
#   "model_type": "llama",
#   "num_attention_heads": 32,
#   "num_hidden_layers": 32,
#   "pad_token_id": 0,
#   "rms_norm_eps": 1e-06,
#   "tie_word_embeddings": false,
#   "torch_dtype": "float32",
#   "transformers_version": "4.30.2",
#   "use_cache": true,
#   "vocab_size": 32001
# }

tokenizer = transformers.AutoTokenizer.from_pretrained(weights_dir)
question_ids = tokenizer.encode(question + tokenizer.eos_token, return_tensors='pt')
question_ids = question_ids.cuda()

print(tokenizer.eos_token_id, tokenizer.bos_token_id, tokenizer.pad_token_id)
# 2, 1, 32000

print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,   829, 29879, 29958]],
#        device='cuda:0')

print(tokenizer.decode(question_ids[0]))
# <s> Hello, there!</s>

outputs = model.generate(
    question_ids,
    eos_token_id=2,
    max_new_tokens=200,
    num_beams=4,
    num_return_sequences=2,
    early_stopping=True,
)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
# Hello, there!</s>
# Hello, there!</s>
# <s>Hello, there!</s>

No matter how I change the parameters of model.generate, it always ignores </s> as the ending token (id: 2).

In addition, skip_special_tokens in tokenizer.decode is not working either.

What am I doing wrong? Please help, many thanks!
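
A quick way to see what is going on (a sketch, not part of the original report; it assumes the same local checkpoint as above) is to map the suspicious ids back to tokens:

from transformers import AutoTokenizer

# Sketch: inspect the ids that the literal string "</s>" produced in
# question_ids above. Assumes the "weights/recovered" checkpoint.
tokenizer = AutoTokenizer.from_pretrained("weights/recovered")
print(tokenizer.convert_ids_to_tokens([829, 29879, 29958]))
# For the Llama vocab this should print ['</', 's', '>'] -- three ordinary
# pieces rather than the single eos token (id 2), so generate never sees eos.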

Expected behavior

model.generate should stop at the first occurrence of </s> (token id 2).

@amyeroberts
Collaborator

cc @ArthurZucker

@ArthurZucker
Collaborator

Hey! A few things to note; here is a working snippet illustrating them:

from transformers import LlamaTokenizer, AutoModelForCausalLM, AutoTokenizer
weights_dir = "huggyllama/llama-7b"
question = 'Hello, there!'

# if you want to add eos, set `add_eos_token=True`
tokenizer = LlamaTokenizer.from_pretrained(weights_dir, add_eos_token=True)
question_ids = tokenizer.encode(question, return_tensors='pt')
print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,     2]])
print(tokenizer.decode(question_ids[0], skip_special_tokens=True))
# 'Hello, there!'


# if you are not using the correct tokenizer version, special tokens are encoded incorrectly
tokenizer = AutoTokenizer.from_pretrained(weights_dir, add_eos_token=True)
print(tokenizer.is_fast)
# True
question_ids = tokenizer.encode('Hello, there!</s>', return_tensors='pt')
print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,   829, 29879, 29958,     2]])
question_ids = tokenizer.encode('Hello, there! </s>', return_tensors='pt')
print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,     2,     2]])
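
If re-tokenizing is not an option, an alternative sketch (not from the thread) is to append the eos id numerically, so no string parsing is involved:

import torch
from transformers import AutoTokenizer

# Sketch: concatenate the eos id directly instead of the "</s>" string.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
question_ids = tokenizer.encode("Hello, there!", return_tensors="pt")
eos = torch.tensor([[tokenizer.eos_token_id]], dtype=question_ids.dtype)
question_ids = torch.cat([question_ids, eos], dim=-1)
print(question_ids)
# e.g. tensor([[    1, 15043, 29892,   727, 29991,     2]])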

devymex commented Jul 6, 2023

@ArthurZucker Many thanks! add_eos_token=True did the trick!
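
For completeness, a sketch of the fix applied to the original reproduction (assumptions: the user's local checkpoint path and the generate arguments from the report):

import transformers

weights_dir = "weights/recovered"  # the local checkpoint from the report

# Let the tokenizer append eos itself instead of concatenating "</s>" as text.
tokenizer = transformers.AutoTokenizer.from_pretrained(weights_dir, add_eos_token=True)
model = transformers.AutoModelForCausalLM.from_pretrained(weights_dir).cuda()

question_ids = tokenizer.encode("Hello, there!", return_tensors="pt").cuda()
outputs = model.generate(
    question_ids,
    eos_token_id=tokenizer.eos_token_id,  # id 2 for this checkpoint
    max_new_tokens=200,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Generation should now stop at the first </s> the model emits.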
