
Deberta v2 code simplification #15732

Merged

Conversation

guillaume-be
Contributor

What does this PR do?

This PR simplifies and fixes the code for the DeBERTa V2 disentangled attention bias calculation:

  1. Removes a spurious subtraction of the form x - x, which always results in 0
  2. Fixes the condition checking for the attention type and the attention score calculation. In the current version, the additional check for whether p2p is among the attention types is performed at the wrong position (
    if "p2c" in self.pos_att_type or "p2p" in self.pos_att_type:
    ). The p2c attention is not used for p2p, but c2p is. As a result, execution currently fails if the attention types include p2p but not c2p: the c2p_pos variable, defined only at
    c2p_pos = torch.clamp(relative_pos + att_span, 0, att_span * 2 - 1)
    inside the c2p branch, would be read while undefined. This PR moves the check for the p2p attention flag to the right position and simplifies the p2c attention calculation.
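The failure mode described in point 2 can be illustrated with a pure-Python toy (not the real transformers implementation; `torch.clamp` is replaced by `min`/`max`, and the actual attention-score gathers are replaced by a dummy sum — only the guard structure and the identifiers `pos_att_type`, `relative_pos`, `att_span`, `c2p_pos` mirror the code quoted above):

```python
def disentangled_bias_sketch(pos_att_type, relative_pos, att_span):
    """Toy sketch of the pre-PR guard structure in disentangled_attention_bias.

    The scoring itself is a dummy; the point is that c2p_pos is only
    assigned inside the c2p branch, yet the p2p branch also reads it.
    """
    score = [0.0] * len(relative_pos)
    # content->position: clamp each relative position into [0, 2 * att_span - 1]
    if "c2p" in pos_att_type:
        c2p_pos = [min(max(p + att_span, 0), att_span * 2 - 1) for p in relative_pos]
        score = [s + c for s, c in zip(score, c2p_pos)]
    # position->position: pre-PR code also read c2p_pos here
    if "p2p" in pos_att_type:
        # Raises UnboundLocalError when "c2p" is not in pos_att_type,
        # because c2p_pos was never assigned above.
        score = [s + c for s, c in zip(score, c2p_pos)]
    return score
```

With `pos_att_type = ["c2p"]` this runs fine, but `["p2p"]` alone crashes, which is exactly the bug the PR addresses.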

Who can review?

@LysandreJik
@BigBird01

@HuggingFaceDocBuilder

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Member

@LysandreJik LysandreJik left a comment


This looks good to me! Thanks @guillaume-be.

Waiting for a quick review from @BigBird01.

@guillaume-be please ping me again in ~1 week to merge this PR if it hasn't moved by then. Thanks for your contribution!

@guillaume-be
Contributor Author

Hello @LysandreJik , @BigBird01 ,
As discussed a couple of weeks back, I wanted to reach out to see whether any further work is required on this PR.

Thank you!

@@ -766,7 +766,7 @@ def disentangled_attention_bias(self, query_layer, key_layer, relative_pos, rel_

 score = 0
 # content->position
-if "c2p" in self.pos_att_type:
+if "c2p" in self.pos_att_type or "p2p" in self.pos_att_type:
Contributor


This line should be kept unchanged. Actually, p2p is not used in our current public model.

Contributor Author


If we keep this line unchanged, I suggest removing p2p support entirely (as is the case for DeBERTa), since otherwise the c2p_pos variable used by the p2p attention would never be initialized. I will push an update to this PR to make these changes.
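The agreed-upon resolution (drop p2p support rather than widen the guard) can be sketched in the same pure-Python toy style; this is a hypothetical illustration of the post-PR guard shape, not the real transformers code:

```python
def c2p_bias_after_pr(pos_att_type, relative_pos, att_span):
    """Toy sketch of the post-PR guard structure.

    With p2p support removed, c2p_pos is defined and used only inside the
    c2p branch, so the original one-condition guard is safe again.
    """
    score = [0.0] * len(relative_pos)
    # content->position: clamp each relative position into [0, 2 * att_span - 1]
    if "c2p" in pos_att_type:
        c2p_pos = [min(max(p + att_span, 0), att_span * 2 - 1) for p in relative_pos]
        score = [s + c for s, c in zip(score, c2p_pos)]
    return score
```

An unknown attention type (such as "p2p") now simply contributes nothing instead of crashing on an uninitialized variable.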

@@ -843,7 +843,7 @@ def disentangled_attention_bias(self, query_layer, key_layer, relative_pos, rel_

 score = 0
 # content->position
-if "c2p" in self.pos_att_type:
+if "c2p" in self.pos_att_type or "p2p" in self.pos_att_type:
Contributor


Same as the above comment.

@LysandreJik
Member

LGTM, also pinging @anton-l as it affects SEW/SEW-D

@guillaume-be
Contributor Author

Hello @LysandreJik , @anton-l
I was wondering whether any further work is required on this PR?
Thank you!

@anton-l
Member

anton-l commented Mar 20, 2022

Hi @guillaume-be, no further changes required from my side (SEW-D models), feel free to merge if everything else is ok 🙂

@LysandreJik
Member

Thanks for your PR, merging!

@LysandreJik LysandreJik merged commit 319cbbe into huggingface:master Mar 21, 2022
FrancescoSaverioZuppichini pushed a commit that referenced this pull request Mar 24, 2022
* Removed spurious subtraction

* Fixed condition checking for attention type

* Fixed sew_d copy of DeBERTa v2 attention

* Removed unused `p2p` attention type from DebertaV2-class models

* Fixed docs style