Inconsistent behavior in `generate` when `output_scores=True` #17424
Comments
Great find @shijie-wu. We've settled on outputting the processed scores since those are the ones that determine the next token, e.g. the argmax and sampling are taken on those scores, which fits the name of the flag (`output_scores`). It's a good point that people might need the "raw scores" though. I think it's sensible to output the model's output logits in this case, as this would be the most understandable & consistent across generation methods. E.g. every LM outputs logits, which is the "rawest" score, so I'd be fine with adding a flag for that.
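The point that the processed scores determine the next token can be illustrated with a toy example. This is plain Python with hypothetical values; `ban_token` only imitates the effect of a processor such as transformers' `NoBadWordsLogitsProcessor` and is not library code:

```python
import math

def ban_token(scores, banned_id):
    # Mimics a logits processor that forbids a token: its score is
    # pushed to -inf so argmax/sampling can never select it.
    out = list(scores)
    out[banned_id] = -math.inf
    return out

raw = [2.0, 1.5, 0.1]          # raw model logits
processed = ban_token(raw, 0)  # the scores generation actually acts on

raw_argmax = max(range(len(raw)), key=raw.__getitem__)
proc_argmax = max(range(len(processed)), key=processed.__getitem__)
# raw logits would pick token 0, but the processed scores pick token 1,
# so only the processed scores explain the generated sequence.
```

This is why returning the processed scores matches what `generate` actually did, while the raw logits answer a different question.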
@patrickvonplaten regarding a flag for the logits: on paper yes... but we are starting to get many boolean flags to control the output of internal variables (related issue: #17016, where the output of past key values is requested). I wonder whether there is a better way to collect and expose the internals of generate for advanced uses 🤔
For testing purposes, especially for the PT/TF generation equivalence tests, I think it would be better to be able to return the raw scores from the models --> so we can identify which parts go wrong if any test failure occurs.
Just a general comment that may seem obvious to some, but I feel like it's always good to restate common options when dealing with such issues (rampant "too many options" vs. enabling users to do powerful things).

#Idea number 1:

#Idea number 2:

#Idea number 3: In general, for power users wanting to access internals, I think enabling tons of options to flag what needs to be outputted is just asking for general computing as parameters. Exposing the internals seems like a better option.

#Idea number 4: It's OK to say no; more is not always better.
Thank you for sharing the options! Option 3 seems to be the fastest way to enable returning raw logits without any code change. However, I just went through the relevant path. It seems the user-provided ... IMO, callbacks like custom ...
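The callback idea discussed above can be sketched without transformers itself: a pass-through processor that follows the `LogitsProcessor` calling convention (`__call__(input_ids, scores)` returning scores) and snapshots the raw logits before any real processor modifies them. The class name `RawScoreRecorder` and the toy processor list are hypothetical, not transformers API:

```python
class RawScoreRecorder:
    """Pass-through processor sketch. Placed first in the processor
    list, it sees the raw logits before any other processor runs."""

    def __init__(self):
        self.raw_scores = []

    def __call__(self, input_ids, scores):
        self.raw_scores.append(list(scores))  # snapshot raw logits
        return scores  # return unchanged so later processors still apply

# Hypothetical pipeline: recorder first, then a "temperature" processor.
recorder = RawScoreRecorder()
processors = [recorder, lambda ids, s: [x / 2.0 for x in s]]

scores = [4.0, 2.0]
for proc in processors:
    scores = proc(None, scores)
# recorder.raw_scores now holds the raw logits; scores holds the processed ones.
```

The appeal of this pattern is that it needs no new flag on `generate`: the user opts in by constructing the recorder and reading its state afterwards.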
System Info
main branch
Who can help?
@patrickvonplaten, @Narsil, @gante
Information

Tasks
An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
In `generate` when `output_scores=True`, the behavior is inconsistent. In `greedy_search` mode, the scores are raw logits:

transformers/src/transformers/generation_utils.py, lines 1690 to 1695 in 740a157

but in `sample` mode (and the various beam search modes), the scores are processed logits:

transformers/src/transformers/generation_utils.py, lines 1945 to 1954 in 740a157
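The discrepancy can be shown with a minimal sketch, assuming top-k filtering as the processing step. This is pure Python with hypothetical values; `top_k_warp` only imitates the effect of transformers' `TopKLogitsWarper`:

```python
import math

def top_k_warp(scores, k):
    # Everything outside the top-k scores is pushed to -inf,
    # the way sampling warpers do before the softmax.
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s if s >= threshold else -math.inf for s in scores]

raw = [1.0, 3.0, 2.0]           # what greedy_search stores in `scores`
processed = top_k_warp(raw, 1)  # what sample stores after its warpers ran
# The two tuples a user gets back for the same model output differ,
# which is exactly the inconsistency reported here.
```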
Expected behavior

In `generate` when `output_scores=True`, the returned scores should be consistent: either raw logits or processed logits. For my use case I only need the raw logits, but there might be use cases which require the processed logits. So there are multiple options:

- return raw logits when `output_scores=True`
- return processed logits when `output_scores=True`
- return processed logits when `output_scores=True`, and raw logits when `output_raw_scores=True`
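The third option can be sketched with a toy generation loop. All names here (`fake_model`, `process`, `toy_generate`, and the `output_raw_scores` flag itself) are hypothetical illustrations, not transformers code:

```python
import math

def fake_model(step):
    # Stand-in for a language-model forward pass (hypothetical logits).
    return [float(step), 2.0, 1.0]

def process(scores):
    # Stand-in for the logits-processor pipeline (here: ban token 0).
    out = list(scores)
    out[0] = -math.inf
    return out

def toy_generate(steps, output_scores=False, output_raw_scores=False):
    # Option 3: `output_scores` collects processed logits, while a
    # separate `output_raw_scores` flag collects the raw model logits.
    scores, raw_scores = [], []
    for step in range(steps):
        logits = fake_model(step)
        processed = process(logits)
        if output_raw_scores:
            raw_scores.append(logits)
        if output_scores:
            scores.append(processed)
        # the next token would be chosen from `processed`
    return scores, raw_scores

scores, raw = toy_generate(2, output_scores=True, output_raw_scores=True)
```

Under this scheme existing callers keep today's behavior, and raw logits become an explicit opt-in rather than a change to what `output_scores` means.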