Generation issues with seq2seq LMs #23413
Hey @abarbet 👋 This issue may arise when beam search, sampling, and long outputs are used together. A potential bug in PyTorch itself compounds it. You can read the full story in this issue. TL;DR: my immediate suggestion would be to avoid using
Ah, thank you, that issue is very helpful! Do you have any idea why we would see a similar error in

The only thing I can think of, if it's not caused by a sampling bug, is some kind of destructive learning in the PPO step that causes token distributions to get completely out of whack.
@abarbet It may be due to this PyTorch issue, where the sampling step may pick very low probability tokens that it shouldn't and, in turn, cause computations to derail. Try running your script with PT 1.x instead of 2.0!
For me, this issue also occurs with PyTorch 1.13.1.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello, has a fix been found for this issue? I am using the latest version of

edit: can confirm now that
@yungsinatra0 The issue should only be gone with the next PT release (i.e.
System Info

- `transformers` version: 4.27.1

Who can help?

@ArthurZucker @gante

Information

Tasks

- `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
This has most recently arisen when using `trlX` to do reinforcement learning on `flan-T5`. I wrote an issue on their own repo, but there has been no response, and it is somewhat better suited to be an issue in this repo since it has to do with `transformers` code at its core.

The main issue is that calling `generate` with a seq2seq model, namely `flan-t5`, sometimes produces the following error: ``RuntimeError: probability tensor contains either `inf`, `nan` or element < 0``. This has been well documented in other issues like this one, but the behavior in that issue is more custom than calling `generate` in its standard configuration.

Here is a code example to reproduce:
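The original reproduction snippet did not survive the page capture. As a stand-in (not the author's code), here is a minimal pure-Python sketch of the underlying failure mode: when softmax is computed in half precision, a large logit can overflow `exp` to `inf`, and `inf / inf` yields `nan`, which is exactly the kind of invalid probability tensor that `torch.multinomial` rejects with this `RuntimeError`:

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value

def exp_fp16like(x):
    # Emulate half-precision overflow: exp(x) saturates to inf once it
    # exceeds the float16 maximum (exp(11.1) is already ~66000 > 65504).
    v = math.exp(x)
    return v if v <= FP16_MAX else float("inf")

def naive_softmax(logits):
    # Naive softmax without the max-subtraction trick, so a single
    # overflowing logit poisons the whole distribution.
    exps = [exp_fp16like(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = naive_softmax([12.0, 3.0, 1.0])  # logit 12 overflows exp to inf
print(probs[0])  # nan: inf / inf -- an invalid "probability"
```

Sampling from such a distribution is what surfaces the `RuntimeError`; the real code path runs through the model's logits rather than this toy list.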
NB: `temperature` seems to be one of the main causes of this issue, as removing this kwarg from the generate call does not produce the error in the above case. However, that is not true of all cases. I have seen the error in my `trlX` training loops with kwargs as simple as `{"max_new_tokens": 512, "do_sample": True, "top_k": 0, "top_p": 1}`. Thus it seems this error is not always related to temperature.

Expected behavior
The expected behavior in this case would be for the sampling to work every time instead of having strange edge cases where tokens are unreachable.
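As a side note on the `temperature` kwarg discussed above, here is a small illustrative sketch (not `transformers` internals) of how sampling temperature rescales logits before softmax; low temperatures stretch the logits apart, which in reduced precision pushes `exp` closer to overflow:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature, then apply a numerically stable
    # softmax (subtracting the max keeps exp() finite).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]
print(softmax_with_temperature(logits, 1.0))  # moderately peaked
print(softmax_with_temperature(logits, 0.2))  # nearly one-hot
```

The max-subtraction step is the standard guard against overflow; a kernel that skips it (or runs in float16 with extreme scaled logits) is where the `inf`/`nan` probabilities reported in this issue can originate.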