Deberta v2 code simplification #15732
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
This looks good to me! Thanks @guillaume-be.
Waiting for a quick review from @BigBird01.
@guillaume-be please ping me again in ~1 week to merge that PR if it hasn't moved then. Thanks for your contribution!
Hello @LysandreJik, @BigBird01, thank you!
```diff
@@ -766,7 +766,7 @@ def disentangled_attention_bias(self, query_layer, key_layer, relative_pos, rel_
         score = 0
         # content->position
-        if "c2p" in self.pos_att_type:
+        if "c2p" in self.pos_att_type or "p2p" in self.pos_att_type:
```
This line should be kept unchanged. Actually, p2p is not used in our current public model.
If we keep this line unchanged, I suggest removing the `p2p` support entirely (as is the case for Deberta). Otherwise the variable `c2p_pos` used by the `p2p` attention would not be initialized. I will push an update to this PR to make these changes.
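The initialization issue described above can be sketched with a minimal, hypothetical stand-in for the disentangled attention bias (simplified names, not the actual `transformers` code): `c2p_pos` is only assigned inside the `c2p` branch, so enabling `p2p` without `c2p` fails when the `p2p` term tries to reuse it.

```python
# Minimal sketch (hypothetical, simplified) of the bug discussed above:
# `c2p_pos` is assigned only inside the `c2p` branch, so a `p2p`-only
# configuration raises an UnboundLocalError (a subclass of NameError)
# when the `p2p` term reuses it.
def attention_bias_sketch(pos_att_type):
    score = 0
    if "c2p" in pos_att_type:
        c2p_pos = 7  # stand-in for the clamped relative-position index
        score += c2p_pos  # content->position term
    if "p2p" in pos_att_type:
        score += c2p_pos  # fails if "c2p" was not enabled
    return score

attention_bias_sketch(["c2p", "p2p"])  # works: c2p_pos was assigned
try:
    attention_bias_sketch(["p2p"])  # c2p_pos never assigned
except NameError as e:
    print(e)
```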
```diff
@@ -843,7 +843,7 @@ def disentangled_attention_bias(self, query_layer, key_layer, relative_pos, rel_
         score = 0
         # content->position
-        if "c2p" in self.pos_att_type:
+        if "c2p" in self.pos_att_type or "p2p" in self.pos_att_type:
```
Same as the above comment.
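The condition change shown in the two diffs above can be sketched as follows (hypothetical, simplified names — not the actual `transformers` implementation): computing the shared index whenever either attention type needs it lets a `p2p`-only configuration run without error.

```python
# Hedged sketch of the combined-condition fix from the diffs above:
# compute the shared clamped position index whenever either the `c2p`
# or the `p2p` attention type needs it.
def attention_bias_fixed(pos_att_type):
    score = 0
    if "c2p" in pos_att_type or "p2p" in pos_att_type:
        c2p_pos = 7  # stand-in for the shared clamped relative-position index
        if "c2p" in pos_att_type:
            score += c2p_pos      # content->position term
        if "p2p" in pos_att_type:
            score += 2 * c2p_pos  # position->position term reusing c2p_pos
    return score
```

Note that the reviewer's suggestion, and the final state of this PR, was instead to drop `p2p` support entirely, which makes the original single-condition guard correct again.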
LGTM, also pinging @anton-l as it affects SEW/SEW-D
Hello @LysandreJik, @anton-l
Hi @guillaume-be, no further changes required from my side (SEW-D models), feel free to merge if everything else is ok 🙂
Thanks for your PR, merging!
* Removed spurious subtraction
* Fixed condition checking for attention type
* Fixed sew_d copy of DeBERTa v2 attention
* Removed unused `p2p` attention type from DebertaV2-class models
* Fixed docs style
What does this PR do?
This PR simplifies and fixes the code for the DeBERTa V2 disentangled attention bias calculation:

- A spurious subtraction `x - x`, always resulting in 0, is removed.
- The check whether `p2p` is in the attention type is performed at the wrong position (`transformers/src/transformers/models/deberta_v2/modeling_deberta_v2.py`, line 781 in 2c2a31f).
- `p2c` attention is not used for `p2p`, but `c2p` is. Currently the execution would fail if the attention types include `p2p` but not `c2p`: the `c2p_pos` variable used at line 813 in `modeling_deberta_v2.py` (2c2a31f) is only assigned at line 772, inside the `c2p` branch.

This PR moves the `p2p` attention flag check to the right position and simplifies the `p2c` attention calculation.

Who can review?
@LysandreJik
@BigBird01