Support various BERT relative position embeddings (2nd) #8276
Conversation
Hey @zhiheng-huang, it would be great if you could take a look at the failing tests :-)
Hey @patrickvonplaten, I fixed all failing tests except check_code_quality. Currently the relative embedding is implemented for BERT only. In check_code_quality, ...
Hey @zhiheng-huang, sadly there is still a problem with the git commit history. As you can see, 54 files are changed in this PR. Could you make sure to keep the commit tree clean? It is not really possible to review the PR otherwise :-/ Try to make use of git rebase.
In the worst case, you can just make the changes to the files you intend to change without ...
Rebased and removed the unintended merge commit. @patrickvonplaten, can you comment on the ...
I'm fine with this PR as it extends BERT functionality for all BERT-like models. What do you think @LysandreJik @sgugger ?
src/transformers/modeling_albert.py
@@ -216,7 +216,6 @@ def __init__(self, config):
        # position_ids (1, len position emb) is contiguous in memory and exported when serialized
        self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)))

    # Copied from transformers.modeling_bert.BertEmbeddings.forward
This new feature would render the BERT and ALBERT embeddings different, which I think is fine.
src/transformers/modeling_longformer.py
@@ -307,7 +307,6 @@ class LongformerEmbeddings(nn.Module):
    Same as BertEmbeddings with a tiny tweak for positional embeddings indexing.
    """

    # Copied from transformers.modeling_bert.BertEmbeddings.__init__
Longformer also cannot have this new BertEmbedding feature.
Don't have a strong opinion on this, but it looks useful and the implementation seems robust. We absolutely need some tests for this, however.
Hi @patrickvonplaten @LysandreJik, I see one approval already, is it ready to merge? If not, can you point to the embedding unit tests (for example, for the absolute position embeddings) so I can try to come up with similar tests?
Regarding tests, I think adding integration tests in the modeling test file would be great. The BERT model doesn't have any such tests right now, but you can take inspiration from the integration tests of other models. You could add a couple of tests, each checking that you get the expected results: this will ensure the implementation will not diverge in the future. If you need a checkpoint, you can use one of your pre-trained models. Let me know if you need anything.
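For illustration, a rough sketch of what such an integration test could look like (the checkpoint name is only a placeholder for one of the relative-position checkpoints described in this PR, and the expected output values would have to be recorded from a reference run):

import unittest

import torch

from transformers import BertModel
from transformers.testing_utils import require_torch, slow


@require_torch
class BertRelativePositionEmbeddingIntegrationTest(unittest.TestCase):
    @slow
    def test_inference_with_relative_key_embeddings(self):
        # Placeholder checkpoint name: swap in the actual pre-trained relative-position model.
        model = BertModel.from_pretrained("zhiheng-huang/bert-base-uncased-embedding-relative-key")
        model.eval()

        input_ids = torch.tensor([[101, 7592, 1010, 2088, 102]])  # "[CLS] hello , world [SEP]"
        with torch.no_grad():
            output = model(input_ids)[0]

        # Shape check: batch_size x sequence_length x hidden_size
        self.assertEqual(output.shape, torch.Size([1, 5, 768]))
        # A real integration test would also pin a small slice of the output, e.g.
        # self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4)),
        # where expected_slice is taken from a reference run.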
Yeah sorry, I put my approval too quickly here.
So 1) I just fixed the remaining tests. One thing which is important to know is that the BertEmbeddings are also used by LayoutLM, Roberta, ELECTRA, Longformer, and Albert via our copy mechanism. The new functionality, however, only applies to Roberta, ELECTRA, and LayoutLM as well, so for now I removed the copy mechanism of BertEmbeddings for Longformer and Albert. For the other models we have to make sure that the new embeddings work as well =>
- We need tests verifying that those embeddings can be used for all of LayoutLM, Roberta, ELECTRA, and BERT. I think it can be as simple as running a forward pass with those new embeddings in each tests/test_modeling_<bert, roberta, electra, layoutlm>.py file (see the sketch right after this comment).
Let me know if you need help implementing those! After that I think we are good to merge.
@patrickvonplaten @LysandreJik
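A minimal sketch of such a forward-pass test, assuming the existing <model>ModelTester helpers (prepare_config_and_inputs / create_and_check_model) in each of those test files and the position_embedding_type config attribute introduced by this PR:

def test_model_various_embeddings(self):
    # prepare_config_and_inputs returns the config first, followed by the model inputs
    config_and_inputs = self.model_tester.prepare_config_and_inputs()
    for embedding_type in ["absolute", "relative_key", "relative_key_query"]:
        config_and_inputs[0].position_embedding_type = embedding_type
        self.model_tester.create_and_check_model(*config_and_inputs)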
Awesome job @zhiheng-huang and thanks a lot for bearing with me through the PR.
The tests look great!
I made the docstring a bit prettier.
@LysandreJik and @sgugger can you take a last look?
This looks good to me. Thanks a lot for your PR!
Any reason ALBERT and Longformer don't get this new functionality? (But RoBERTa and ELECTRA do?)
tests/test_modeling_bert.py
@require_sentencepiece
@require_tokenizers
The tests do not require sentencepiece nor tokenizers as far as I can see.
Good catch. Fixed for both test_modeling_bert.py and test_modeling_roberta.py.
@zhiheng-huang - Let me fix the CI later, don't worry about it :-)
Great question! I think ALBERT should get this functionality (I just added it - great catch!). Longformer has weird attention_scores which do not work with those embeddings.
Very nice, love the tests. Thanks!
Good to merge! Thanks a mille @zhiheng-huang!
Thanks! @patrickvonplaten @sgugger @LysandreJik
What does this PR do?
Creating a new PR for #8108 to keep the git history/commits cleaner.
The default BERT model bert-base-uncased was pre-trained with absolute position embeddings. We provide three pre-trained models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as the original BERT model, but with different relative position embeddings (Shaw et al., Self-Attention with Relative Position Representations, https://arxiv.org/abs/1803.02155, and Huang et al., Improve Transformer Models with Better Relative Position Embeddings, https://arxiv.org/abs/2009.13658, accepted in the findings of EMNLP 2020). We show how to fine-tune these pre-trained models on the SQuAD 1.1 dataset. Our proposed relative position embedding method boosts the BERT base model (with default absolute position embeddings) from an F1 score of 88.52 to 90.54 with similar training/inference speed. It also boosts the bert-large-uncased-whole-word-masking model from 93.15 to 93.52 with 3 additional fine-tuning epochs. See examples/question-answering/README.md for more details.

Fixes #8108
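As a quick illustration of how a user opts into the new embeddings (a minimal sketch, assuming the position_embedding_type config option added by this PR; the model here is randomly initialized and only demonstrates the configuration):

import torch

from transformers import BertConfig, BertModel

# "absolute" is the default; "relative_key" follows Shaw et al. and
# "relative_key_query" follows Huang et al.
config = BertConfig(position_embedding_type="relative_key_query")
model = BertModel(config)
model.eval()

input_ids = torch.tensor([[101, 7592, 2088, 102]])  # a tiny toy input
with torch.no_grad():
    outputs = model(input_ids)

print(outputs[0].shape)  # torch.Size([1, 4, 768])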
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@patrickvonplaten @LysandreJik @julien-c