[T5] Fix Cross Attention position bias #4499
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #4499      +/-   ##
==========================================
- Coverage   77.83%   77.82%   -0.02%
==========================================
  Files         123      123
  Lines       20514    20514
==========================================
- Hits        15968    15964       -4
- Misses       4546     4550       +4
Continue to review full report at Codecov.
Hi @ZhuBaohe, thanks for your PR! Can you explain in a bit more detail what the fix is doing here? :-)
It fixes a bug where the variable encoder_decoder_position_bias was incorrectly assigned the cross-attention weights rather than the cross-attention position bias. See line 745 of modeling_t5.py:
encoder_decoder_position_bias should be assigned from layer_outputs[5] instead of layer_outputs[4].
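For context, a minimal sketch of the corrected assignment inside the T5Stack decoder loop (variable names and the surrounding structure are assumptions based on this discussion, not a verbatim excerpt of modeling_t5.py):

```python
# Sketch of the decoder loop in T5Stack.forward (structure assumed,
# not copied verbatim from modeling_t5.py).
for i, layer_module in enumerate(self.block):
    layer_outputs = layer_module(
        hidden_states,
        attention_mask=extended_attention_mask,
        position_bias=position_bias,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_extended_attention_mask,
        encoder_decoder_position_bias=encoder_decoder_position_bias,
        head_mask=head_mask[i],
    )
    hidden_states = layer_outputs[0]
    if i == 0:
        # layer_outputs[4] holds the cross-attention *weights*;
        # the cross-attention *position bias* is at index 5.
        # Before the fix: encoder_decoder_position_bias = layer_outputs[4]
        encoder_decoder_position_bias = layer_outputs[5]
```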
Great, I agree with you. Previously, the attention weights of the cross-attention layer were taken instead of the bias. @LysandreJik @thomwolf I am quite surprised that we did not catch this error earlier. I checked the slow tests, and the summarization / translation results are equivalent to before. So good to merge for me!
Indeed, thanks @ZhuBaohe
Surprising indeed @patrickvonplaten, I did fix a similar bug when implementing T5. We should switch to NamedTuples one day 😄
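On the NamedTuple point, a hypothetical sketch (field names are illustrative, not the library's API) of how a named output container would rule out this kind of positional mix-up:

```python
from typing import NamedTuple, Optional
import torch

class T5LayerOutput(NamedTuple):
    # Hypothetical container for a T5 block's outputs; names are illustrative.
    hidden_states: torch.Tensor
    present_key_value_state: Optional[tuple]
    self_attention_position_bias: Optional[torch.Tensor]
    self_attention_weights: Optional[torch.Tensor]
    cross_attention_weights: Optional[torch.Tensor]
    cross_attention_position_bias: Optional[torch.Tensor]

# With named fields the assignment is unambiguous and cannot silently
# pick up the attention weights by mistake:
# encoder_decoder_position_bias = layer_outputs.cross_attention_position_bias
```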
This PR fixes the cross-attention position bias assignment in the T5Stack class.