usage of past_key_values produces different output than the whole sequence at once #26344
Comments
Hey! Thanks for opening an issue. This is pretty much a duplicate of #25420, where we dive deep into this!
Hey @IvanSedykh 👋 As Arthur wrote, this is a duplicate of #25420 -- you can find a detailed answer there.
Hi @gante!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers 4.33.1
Who can help?
@ArthurZucker @younesbelkada @gan
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
When I use past_key_values, the model does not produce the same logits as when I input the whole sequence at once. Please see the code snippet below for more details.
Expected behavior
If I've understood the idea of KV caching correctly, the outputs should be exactly the same. This is important because the generate method relies heavily on past_key_values, so if there is a bug somewhere, it affects a lot of applications.
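The expectation above can be illustrated with a toy, pure-Python sketch of a single causal attention layer. This is not the transformers implementation: the `forward` and `attend` functions, the `past` argument, and the use of the token vectors themselves as queries, keys, and values are all assumptions made to keep the example short. It only shows the idea that feeding a prefix, keeping its keys and values, and then feeding the rest should match one pass over the whole sequence.

```python
import math
import random


def attend(query, keys, values):
    """Softmax attention of a single query vector over the given keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / norm for i in range(dim)]


def forward(tokens, past=None):
    """Toy causal attention layer (hypothetical, for illustration only).

    `past` plays the role of past_key_values: a (keys, values) pair from
    earlier calls. Returns the outputs for `tokens` and the updated cache.
    Queries, keys and values are the raw token vectors (no learned
    projections), which is a simplifying assumption.
    """
    keys, values = ([], []) if past is None else (list(past[0]), list(past[1]))
    outputs = []
    for token in tokens:
        keys.append(token)
        values.append(token)
        # Each token attends only to itself and the tokens before it.
        outputs.append(attend(token, keys, values))
    return outputs, (keys, values)


random.seed(0)
sequence = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]

# Whole sequence at once.
full_out, _ = forward(sequence)

# Same sequence in two calls: a 4-token prefix, then the rest using the cache.
prefix_out, cache = forward(sequence[:4])
suffix_out, _ = forward(sequence[4:], past=cache)
cached_out = prefix_out + suffix_out

# Largest element-wise gap between the two ways of computing the outputs.
max_diff = max(
    abs(a - b)
    for row_full, row_cached in zip(full_out, cached_out)
    for a, b in zip(row_full, row_cached)
)
print(f"max abs difference: {max_diff:.3e}")
```

In this toy both paths perform the same arithmetic in the same order, so the gap stays within a tiny tolerance. In a real model the cached and uncached paths typically run matrix multiplications of different shapes, so floating-point rounding may differ slightly, especially in half precision; comparing logits with a tolerance (for example with `torch.allclose`) rather than exact equality is likely the more meaningful check.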