model.generate temperature parameter is completely ineffective #22405

Closed

AndreaSottana opened this issue Mar 27, 2023 · 2 comments

@AndreaSottana
Contributor

System Info

  • transformers version: 4.26.1
  • Platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.31
  • Python version: 3.10.4
  • Huggingface_hub version: 0.12.1
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help?

@gante @sg

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hello,
I am trying to generate text using different models and different temperature parameters. I have noticed, however, that while changing hyperparameters such as num_beams affects the output text, changing the temperature parameter doesn't seem to do anything: setting the temperature to 0.0 or 1.0 (very different values) always leads to the same output. I have observed this across multiple language models. I suspect this might be a bug where the temperature that is set never actually reaches the model. To reproduce, run the example below (feel free to try different text samples to convince yourself it's not a one-off occurrence).

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoConfig
from accelerate import init_empty_weights, infer_auto_device_map
import torch

tokenizer = AutoTokenizer.from_pretrained('bigscience/T0pp')  # feel free to try a different model
config = AutoConfig.from_pretrained('bigscience/T0pp')

max_memory = {i: "24GiB" for i in range(torch.cuda.device_count())}
with init_empty_weights():
    model = AutoModelForSeq2SeqLM.from_config(config)
    device_map = infer_auto_device_map(model, no_split_module_classes=['T5Block'])
    print(device_map)
    device_map['lm_head'] = 0

model = AutoModelForSeq2SeqLM.from_pretrained('bigscience/T0pp', device_map=device_map, load_in_8bit=True, max_memory=max_memory)
text = "Complete the following story: Once upon a time there was a "
input_ids = tokenizer.encode(text, return_tensors='pt').to(0)
for temp in [0.0, 1.0]:
    beam_outputs = model.generate(
        input_ids, 
        max_length=512,
        num_beams=5,
        no_repeat_ngram_size=4,
        temperature=temp,
        num_return_sequences=1, 
        early_stopping=True,
    )
    print(tokenizer.decode(beam_outputs[0], skip_special_tokens=True))

Expected behavior

I would expect the two printed outputs to be different. I understand that they might occasionally be the same, but I have tried over 1,000 different inputs and the generated outputs with temperature=0 and temperature=1 are ALWAYS identical, which suggests something is wrong.

@gante
Member

gante commented Mar 27, 2023

Hey @AndreaSottana 👋

I would recommend reading our blog post on how to generate.

TL;DR -- there are several generation modes, and not all .generate() parameters are active for a given generation mode. In particular, the popular temperature, top_p, and top_k are only active when do_sample=True is also passed. Some tasks benefit from do_sample=True, while others do not. Popular tools like ChatGPT operate with sampling.
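For illustration, a minimal sketch of what enabling sampling could look like, reusing the model, tokenizer, and input_ids from the reproduction above (num_beams is dropped to keep it plain multinomial sampling, and the temperature values are assumed strictly positive, since 0.0 is not valid when sampling):

for temp in [0.7, 1.5]:
    sampled_outputs = model.generate(
        input_ids,
        max_length=512,
        do_sample=True,  # sampling mode: temperature, top_p and top_k now take effect
        temperature=temp,
        no_repeat_ngram_size=4,
        num_return_sequences=1,
    )
    print(tokenizer.decode(sampled_outputs[0], skip_special_tokens=True))

With do_sample=True, the two temperature values should now generally produce different outputs.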

We are aware that our .generate() has too many options and too few checks/examples; we are working on it 🤗

@AndreaSottana
Contributor Author

Hi @gante
Thank you very much for clarifying. I wasn't aware that some parameters have no effect when do_sample=False. Closing the issue for now.
