
Pipeline of "text-generation" with model "meta-llama/Llama-2-7b-chat-hf" doesn't respect temperature #25326

Closed
2 of 4 tasks
kechan opened this issue Aug 5, 2023 · 5 comments

Comments

kechan commented Aug 5, 2023

System Info

  • transformers version: 4.28.1
  • Platform: macOS-13.5-arm64-arm-64bit
  • Python version: 3.9.6
  • Huggingface_hub version: 0.13.3
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.1.0.dev20230331 (False)
  • Tensorflow version (GPU?): 2.12.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Instantiate the tokenizer and model with "meta-llama/Llama-2-7b-chat-hf".
  2. Instantiate pipeline("text-generation", model=model, tokenizer=tokenizer, torch_dtype=torch.float16, device=torch.device('mps')).
  3. Run: pipeline("what is the recipe of mayonnaise?", temperature=0.9, top_k=50, top_p=0.9, max_length=500).
  4. Run it multiple times or with different temperature values.
  5. The generated text is always the same.

Expected behavior

Expect random variation in generated text with each run.

kechan (Author) commented Aug 5, 2023

import torch
import transformers
from transformers import LlamaTokenizer, LlamaForCausalLM

# access_token is the user's Hugging Face access token
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_name, use_auth_token=access_token)
model = LlamaForCausalLM.from_pretrained(model_name, use_auth_token=access_token)

pipeline = transformers.pipeline("text-generation",
                                 model=model,
                                 tokenizer=tokenizer,
                                 torch_dtype=torch.float16,
                                 device=torch.device('mps', index=0))

sequences = pipeline("what is the recipe of mayonnaise?",
                     temperature=0.9,
                     top_k=50,
                     top_p=0.9,
                     max_length=500)

for seq in sequences:
    print(seq['generated_text'])

amyeroberts (Collaborator) commented Aug 7, 2023

Hi @kechan, thanks for raising this issue!

You should pass do_sample=True to the pipeline to sample from the logits; otherwise, greedy decoding is used for this model.

cc @ArthurZucker
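
As a minimal sketch of the suggested change (assuming the same pipeline object built in the snippet above), passing do_sample=True makes the temperature, top_k, and top_p arguments take effect, so repeated runs should produce different text:

sequences = pipeline("what is the recipe of mayonnaise?",
                     do_sample=True,   # enable sampling; without it the pipeline decodes greedily
                     temperature=0.9,
                     top_k=50,
                     top_p=0.9,
                     max_length=500)

for seq in sequences:
    print(seq['generated_text'])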

kechan (Author) commented Aug 7, 2023

@amyeroberts Thanks for the help. I totally missed this one! I wonder why this is needed if sampling is already implied by the values of temperature, top_k, or top_p.

amyeroberts (Collaborator)

Best short explanation is this comment: #22405 (comment)

Generate is a very powerful functionality that's had lots of arguments and logic added over time. @gante's doing a lot of work - refactoring, docs, demos - to make this easier for people to use, but there's always a balance between simplifying and keeping backwards compatibility of behaviours :)
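
One way to see what the flag controls (a sketch, assuming the model loaded earlier and a transformers version that exposes model.generation_config): unless do_sample resolves to True, generation falls back to greedy decoding and the sampling parameters are simply ignored.

# Sketch: inspect the effective generation defaults for the loaded model.
print(model.generation_config)   # if do_sample is False, decoding is greedy and
                                 # temperature/top_k/top_p have no effect

# Enabling sampling explicitly makes those parameters take effect:
inputs = tokenizer("what is the recipe of mayonnaise?", return_tensors="pt")
output = model.generate(**inputs,
                        do_sample=True,
                        temperature=0.9,
                        top_k=50,
                        top_p=0.9,
                        max_length=500)
print(tokenizer.decode(output[0], skip_special_tokens=True))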

github-actions bot commented Sep 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
