
Pipeline of "text-generation" with model "meta-llama/Llama-2-7b-chat-hf" doesn't respect temperature #25326

Closed
2 of 4 tasks
kechan opened this issue Aug 5, 2023 · 5 comments

Comments

kechan commented Aug 5, 2023

System Info

  • transformers version: 4.28.1
  • Platform: macOS-13.5-arm64-arm-64bit
  • Python version: 3.9.6
  • Huggingface_hub version: 0.13.3
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.1.0.dev20230331 (False)
  • Tensorflow version (GPU?): 2.12.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Instantiate the tokenizer and model with "meta-llama/Llama-2-7b-chat-hf".
  2. Instantiate pipeline("text-generation", model=model, tokenizer=tokenizer, torch_dtype=torch.float16, device=torch.device('mps')).
  3. Run: pipeline("what is the recipe of mayonnaise?", temperature=0.9, top_k=50, top_p=0.9, max_length=500).
  4. Run it multiple times or with different temperature values.
  5. The generated text is always the same.

Expected behavior

Expect random variation in generated text with each run.

kechan (Author) commented Aug 5, 2023

import torch
import transformers
from transformers import LlamaTokenizer, LlamaForCausalLM

# access_token is the user's Hugging Face access token
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_name, use_auth_token=access_token)
model = LlamaForCausalLM.from_pretrained(model_name, use_auth_token=access_token)

pipeline = transformers.pipeline("text-generation",
                                 model=model,
                                 tokenizer=tokenizer,
                                 torch_dtype=torch.float16,
                                 device=torch.device('mps', index=0))

sequences = pipeline("what is the recipe of mayonnaise?",
                     temperature=0.9,
                     top_k=50,
                     top_p=0.9,
                     max_length=500)

for seq in sequences:
    print(seq['generated_text'])

amyeroberts (Collaborator) commented Aug 7, 2023

Hi @kechan, thanks for raising this issue!

You should pass do_sample=True to the pipeline to sample from the logits; otherwise, greedy decoding is used for this model.

cc @ArthurZucker
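
As a minimal sketch of the suggested change (assuming the same pipeline object built in the snippet above), passing do_sample=True makes the temperature, top_k, and top_p arguments take effect, so repeated runs should produce different text:

sequences = pipeline("what is the recipe of mayonnaise?",
                     do_sample=True,   # enable sampling; without it the pipeline decodes greedily
                     temperature=0.9,
                     top_k=50,
                     top_p=0.9,
                     max_length=500)

for seq in sequences:
    print(seq['generated_text'])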

kechan (Author) commented Aug 7, 2023

@amyeroberts Thanks for the help. I totally missed this one! I wonder why this is needed if sampling is already implied by the values of temperature, top_k, or top_p.

amyeroberts (Collaborator)

Best short explanation is this comment: #22405 (comment)

Generate is a very powerful functionality that's had lots of arguments and logic added over time. @gante's doing a lot of work - refactoring, docs, demos - to make this easier for people to use, but there's always a balance between simplifying and keeping backwards compatibility of behaviours :)
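
One way to see what the flag controls (a sketch, assuming the model loaded earlier and a transformers version that exposes model.generation_config): unless do_sample resolves to True, generation falls back to greedy decoding and the sampling parameters are simply ignored.

# Sketch: inspect the effective generation defaults for the loaded model.
print(model.generation_config)   # if do_sample is False, decoding is greedy and
                                 # temperature/top_k/top_p have no effect

# Enabling sampling explicitly makes those parameters take effect:
inputs = tokenizer("what is the recipe of mayonnaise?", return_tensors="pt")
output = model.generate(**inputs,
                        do_sample=True,
                        temperature=0.9,
                        top_k=50,
                        top_p=0.9,
                        max_length=500)
print(tokenizer.decode(output[0], skip_special_tokens=True))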

github-actions bot commented Sep 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
