[Bert2Bert] allow bert2bert + relative embeddings #14324
Conversation
Thanks for fixing! IMO, we can make this better by adding an init argument to the attention layer with this flag.
# cross attention cannot have relative position embeddings
cross_attention_config = copy.deepcopy(config)
cross_attention_config.position_embedding_type = "absolute"
self.crossattention = BertAttention(cross_attention_config)
This is a bit hackish. If we start to turn this argument on and off, maybe it should be an init argument? It can default to None, in which case we take the config value (so this is 100% backward compatible).
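A rough sketch of what this suggestion could look like, assuming the attention layer's constructor is extended with an optional argument; the class and attribute names mirror the existing `BertSelfAttention`, but the signature change shown here is only the reviewer's proposal, not the code in this PR:

```python
import torch.nn as nn


class BertSelfAttention(nn.Module):
    # Sketch: only the proposed constructor change is shown.
    def __init__(self, config, position_embedding_type=None):
        super().__init__()
        # None means "inherit from the config", so existing callers keep their
        # behavior and the change stays backward compatible.
        self.position_embedding_type = position_embedding_type or getattr(
            config, "position_embedding_type", "absolute"
        )
        # ... remaining attention parameters would be defined here as usual ...
```

With that argument in place, the cross-attention layer could simply be built with `position_embedding_type="absolute"` instead of deep-copying and mutating the config.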
yeah agree
Thanks @patrickvonplaten for fixing this! It looks correct that relative position embeddings should not be used on cross-attention layers.
Yes, LGTM!
Merging now to prevent merge conflicts.
What does this PR do?
Fixes #14010
Everything is explained in #14010. IMO, Bert2Bert-like models should not (and also cannot really) make use of positional bias in the cross-attention layers. The PR therefore forces cross-attention layers to always use "absolute" position encodings.
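The motivation is that relative position embeddings encode a query–key distance, which is not meaningful when the queries come from the decoder sequence and the keys from the encoder sequence. A minimal sketch (not part of the PR) of the setup this change enables: a Bert2Bert encoder-decoder whose self-attention layers use relative position embeddings, while the cross-attention layers internally fall back to "absolute". The small config values are illustrative only.

```python
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Tiny, randomly initialized configs just for illustration.
encoder_config = BertConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=256,
    position_embedding_type="relative_key_query",
)
decoder_config = BertConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=256,
    position_embedding_type="relative_key_query",
    is_decoder=True,
    add_cross_attention=True,
)

# Build the Bert2Bert model; with this PR, its cross-attention layers
# use "absolute" position embeddings regardless of the config value.
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)
```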
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.