Support various BERT relative position embeddings (2nd) #8276

Merged: 21 commits merged into huggingface:master from zhiheng-huang:transformers-relative-embedding-test on Nov 24, 2020

Conversation

@zhiheng-huang (Contributor) commented Nov 3, 2020

What does this PR do?

Creating a new PR for #8108 to keep cleaner git history/commits.

The default BERT model, bert-base-uncased, was pre-trained with absolute position embeddings. We provide three pre-trained models that were trained on the same data as the original BERT (BooksCorpus and English Wikipedia) but with different relative position embeddings (Shaw et al., Self-Attention with Relative Position Representations, https://arxiv.org/abs/1803.02155, and Huang et al., Improve Transformer Models with Better Relative Position Embeddings, https://arxiv.org/abs/2009.13658, accepted to Findings of EMNLP 2020).

We show how to fine-tune these pre-trained models on the SQuAD 1.1 dataset. Our proposed relative position embedding method boosts the BERT base model (with the default absolute position embeddings) from an F1 score of 88.52 to 90.54 with similar training/inference speed. It also boosts the bert-large-uncased-whole-word-masking model from 93.15 to 93.52 with 3 additional fine-tuning epochs. See examples/question-answering/README.md for more details.
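
As a usage sketch (not part of this PR's diff), one of the released checkpoints can be loaded like any other BERT model; the checkpoint name below is one of those referenced later in this thread, and the actual fine-tuning commands are in examples/question-answering/README.md:

```python
# Minimal sketch, assuming the relative-position-embedding checkpoints
# described in this PR are available on the model hub. The QA head loaded
# here is freshly initialized and still needs SQuAD fine-tuning.
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "zhiheng-huang/bert-base-uncased-embedding-relative-key-query"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# The embedding variant is recorded in the config: "relative_key_query"
# here, vs. the default "absolute" used by bert-base-uncased.
print(model.config.position_embedding_type)
```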

Fixes #8108

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@patrickvonplaten @LysandreJik @julien-c

@patrickvonplaten (Contributor)

Hey @zhiheng-huang,

it would be great if you could take a look at the failing tests :-)

@julien-c added the "model card" label (Related to pretrained model cards) on Nov 6, 2020
@zhiheng-huang (Contributor, Author) commented Nov 6, 2020

Hey @patrickvonplaten, I fixed all failing tests except check_code_quality. Currently the relative embedding is implemented for BERT only. In check_code_quality, utils/check_copies.py tries to copy the relative embedding implementation from the BERT model to other models, including albert, electra, roberta, etc. I understand this may make the relative embedding methods ready to be used in those models. However, we haven't pre-trained those types of models with relative embeddings and thus cannot assess their effectiveness. Please advise whether I should fix this failing test (by ensuring the relative embedding implementation is copied to those BERT variants) or leave it as is.

@patrickvonplaten (Contributor)

Hey @zhiheng-huang,

Sadly, there is still a problem with the git commit history. As you can see, 54 files are changed in this PR. Could you make sure to keep the commit tree clean? It is not really possible to review the PR otherwise :-/

Maybe try making use of git rebase to avoid appending master's commit history to your branch.

@patrickvonplaten (Contributor)

In the worst case, you can just make the changes to the files you intend to change without rebasing or merging and then I can review and merge/rebase for you.

@zhiheng-huang (Contributor, Author) commented Nov 6, 2020

Rebased and removed the unintended merge commit. @patrickvonplaten, can you comment on the utils/check_copies.py question so we can move forward?

@patrickvonplaten (Contributor) left a comment

I'm fine with this PR as it extends BERT functionality for all BERT-like models. What do you think @LysandreJik @sgugger ?

@@ -216,7 +216,6 @@ def __init__(self, config):
# position_ids (1, len position emb) is contiguous in memory and exported when serialized
self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)))

# Copied from transformers.modeling_bert.BertEmbeddings.forward

Contributor: This new feature would make the BERT and ALBERT embeddings different, which I think is fine.

@@ -307,7 +307,6 @@ class LongformerEmbeddings(nn.Module):
Same as BertEmbeddings with a tiny tweak for positional embeddings indexing.
"""

# Copied from transformers.modeling_bert.BertEmbeddings.__init__

Contributor: Longformer also cannot have this new BertEmbeddings feature.

@LysandreJik (Member) left a comment

Don't have a strong opinion on this, but it looks useful and the implementation seems robust. We absolutely need some tests for this, however.

@zhiheng-huang (Contributor, Author)

Hi @patrickvonplaten @LysandreJik, I see one approval already; is it ready to merge? If not, can you point me to the existing embedding unit tests (for example, for absolute position embeddings) so I can try to come up with similar tests?

@LysandreJik (Member)

Regarding tests, I think adding integration tests to test_modeling_bert.py would be nice. What do you think @patrickvonplaten?

The BERT model doesn't have any such tests right now, but you can take inspiration from the RobertaModelIntegrationTest class in test_modeling_roberta.py, which you can find here.

You could add a couple of tests, each testing that you get the expected results: this will ensure the implementation will not diverge in the future. If you need a checkpoint, you can use lysandre/tiny-bert-random, which is a very small model (with random values), so it will be very light on the CI.

Let me know if you need anything.
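
Purely as an illustrative sketch (not the test ultimately added in this PR), such an integration test against one of the checkpoints from this thread could look like the following; the expected-value check is left as a placeholder, since the real numbers would have to be generated from the model:

```python
import unittest

import torch

from transformers import BertModel


class BertRelativeEmbeddingIntegrationTest(unittest.TestCase):
    def test_inference_relative_key(self):
        # Checkpoint name taken from this PR thread.
        model = BertModel.from_pretrained(
            "zhiheng-huang/bert-base-uncased-embedding-relative-key"
        )
        model.eval()

        input_ids = torch.tensor([[101, 7592, 1010, 2088, 999, 102]])
        with torch.no_grad():
            output = model(input_ids)[0]

        # Shape check: (batch_size, sequence_length, hidden_size).
        self.assertEqual(output.shape, torch.Size([1, 6, 768]))

        # Placeholder: compare a slice of the output against values generated
        # once from the trusted model, so the implementation cannot silently diverge.
        # expected_slice = torch.tensor([[...]])
        # self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))
```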

@patrickvonplaten (Contributor) left a comment

Yeah, sorry, I put my approval in too quickly here.

So 1) I just fixed the remaining tests. One thing that is important to know is that the BertEmbeddings are also used by LayoutLM, Roberta, ELECTRA, Longformer, and Albert via our copy mechanism. However, the new functionality only carries over to Roberta, ELECTRA, and LayoutLM, so I removed the BertEmbeddings copy mechanism for Longformer and Albert for now. For the other models, we have to make sure that the new embeddings work as well =>

  1. We need tests verifying that those embeddings can be used for all of LayoutLM, Roberta, ELECTRA, and BERT. I think it can be as simple as running a forward pass with those new embeddings in each tests/test_modeling_<bert, roberta, electra, layoutlm>.py file (a sketch of such a forward check follows below).

Let me know if you need help implementing those! After that, I think we are good to merge.
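
A minimal sketch of the kind of forward-pass check being suggested (illustrative only; the real tests use the shared ModelTester setup in those files), using BERT as the example and assuming a tiny config:

```python
import torch

from transformers import BertConfig, BertModel


def check_forward_with_all_position_embedding_types():
    # Run a forward pass for each supported position embedding type and make
    # sure the output has the expected shape.
    for embedding_type in ["absolute", "relative_key", "relative_key_query"]:
        config = BertConfig(
            hidden_size=32,
            num_hidden_layers=2,
            num_attention_heads=4,
            intermediate_size=37,
            position_embedding_type=embedding_type,
        )
        model = BertModel(config)
        model.eval()

        input_ids = torch.randint(0, config.vocab_size, (2, 7))
        with torch.no_grad():
            output = model(input_ids)[0]

        assert output.shape == (2, 7, config.hidden_size)
```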

@zhiheng-huang (Contributor, Author) commented Nov 19, 2020

@patrickvonplaten @LysandreJik

  1. Added forward tests to ensure the forward pass runs okay for LayoutLM, Roberta, ELECTRA, and BERT with the three position embedding types: "absolute", "relative_key", and "relative_key_query".
  2. Added an integration test for the BERT checkpoints bert-base-uncased, zhiheng-huang/bert-base-uncased-embedding-relative-key, and zhiheng-huang/bert-base-uncased-embedding-relative-key-query to ensure that the models' predictions match the expected outputs.

@patrickvonplaten (Contributor) left a comment

Awesome job @zhiheng-huang and thanks a lot for bearing with me through the PR.
The tests look great!

I made the docstring a bit prettier.

@LysandreJik and @sgugger can you take a last look?

@sgugger (Collaborator) left a comment

This looks good to me. Thanks a lot for your PR!
Any reason ALBERT and Longformer don't get this new functionality? (But RoBERTa and ELECTRA do?)

Comment on lines 488 to 489
@require_sentencepiece
@require_tokenizers

Collaborator: The tests do not require sentencepiece or tokenizers as far as I can see.

Contributor (Author): Good catch. Fixed for both test_modeling_bert.py and test_modeling_roberta.py.

@patrickvonplaten (Contributor)

@zhiheng-huang - Let me fix the CI later, don't worry about it :-)

@patrickvonplaten (Contributor)

> This looks good to me. Thanks a lot for your PR!
> Any reason ALBERT and Longformer don't get this new functionality? (But RoBERTa and ELECTRA do?)

Great question! ALBERT should get this functionality (I just added it - great catch!). Longformer has weird attention_scores which do not work with those embeddings.

@LysandreJik (Member) left a comment

Very nice, love the tests. Thanks!

@patrickvonplaten (Contributor)

Good to merge! Thanks a mille @zhiheng-huang!

@patrickvonplaten merged commit 2c83b3c into huggingface:master on Nov 24, 2020
@zhiheng-huang deleted the transformers-relative-embedding-test branch on November 24, 2020 at 22:16
@zhiheng-huang (Contributor, Author)

> Good to merge! Thanks a mille @zhiheng-huang!

Thanks! @patrickvonplaten @sgugger @LysandreJik
