[T5] Fix Cross Attention position bias #4499
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #4499      +/-   ##
==========================================
- Coverage   77.83%   77.82%   -0.02%
==========================================
  Files         123      123
  Lines       20514    20514
==========================================
- Hits        15968    15964       -4
- Misses       4546     4550       +4
Continue to review full report at Codecov.
Hi @ZhuBaohe, thanks for your PR! Can you explain in a bit more detail what the fix is doing here? :-)
It fixes a bug where the variable encoder_decoder_position_bias was incorrectly assigned the cross-attention weights rather than the cross-attention position bias. See line 745 of modeling_t5.py:
encoder_decoder_position_bias should be assigned from layer_outputs[5] instead of layer_outputs[4].
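For context, a minimal sketch of the corrected assignment inside the T5Stack decoder loop (variable names and the surrounding structure are assumptions based on this discussion, not a verbatim excerpt of modeling_t5.py):

```python
# Sketch of the decoder loop in T5Stack.forward (structure assumed,
# not copied verbatim from modeling_t5.py).
for i, layer_module in enumerate(self.block):
    layer_outputs = layer_module(
        hidden_states,
        attention_mask=extended_attention_mask,
        position_bias=position_bias,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_extended_attention_mask,
        encoder_decoder_position_bias=encoder_decoder_position_bias,
        head_mask=head_mask[i],
    )
    hidden_states = layer_outputs[0]
    if i == 0:
        # layer_outputs[4] holds the cross-attention *weights*;
        # the cross-attention *position bias* is at index 5.
        # Before the fix: encoder_decoder_position_bias = layer_outputs[4]
        encoder_decoder_position_bias = layer_outputs[5]
```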
Great, I agree with you. Previously, the attention weights of the cross-attention layer were taken instead of the bias. @LysandreJik @thomwolf I am quite surprised that we did not catch this error earlier. I checked the slow tests, and the summarization / translation results are equivalent to before. So good to merge for me!
Indeed, thanks @ZhuBaohe
Surprising indeed @patrickvonplaten, I did fix a similar bug when implementing T5. We should switch to NamedTuples one day 😄
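On the NamedTuple point, a hypothetical sketch (field names are illustrative, not the library's API) of how a named output container would rule out this kind of positional mix-up:

```python
from typing import NamedTuple, Optional
import torch

class T5LayerOutput(NamedTuple):
    # Hypothetical container for a T5 block's outputs; names are illustrative.
    hidden_states: torch.Tensor
    present_key_value_state: Optional[tuple]
    self_attention_position_bias: Optional[torch.Tensor]
    self_attention_weights: Optional[torch.Tensor]
    cross_attention_weights: Optional[torch.Tensor]
    cross_attention_position_bias: Optional[torch.Tensor]

# With named fields the assignment is unambiguous and cannot silently
# pick up the attention weights by mistake:
# encoder_decoder_position_bias = layer_outputs.cross_attention_position_bias
```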
This PR fixes the cross-attention position bias assignment in the T5Stack class.