Support various BERT relative position embeddings (2nd) #8276
Conversation
Hey @zhiheng-huang, it would be great if you could take a look at the failing tests :-)
Hey @patrickvonplaten, I fixed all failing tests except check_code_quality. Currently the relative embedding is implemented for BERT only. In check_code_quality, ...
Hey @zhiheng-huang, sadly there is still a problem with the git commit history. As you can see, 54 files are changed in this PR. Could you make sure to keep the commit tree clean? It is not really possible to review the PR otherwise :-/ Try to make use of git rebase.
In the worst case, you can just make the changes to the files you intend to change without ...
Rebased and removed the unintended merge commit. @patrickvonplaten, can you comment on the ...
I'm fine with this PR as it extends BERT functionality for all BERT-like models. What do you think @LysandreJik @sgugger ?
src/transformers/modeling_albert.py
@@ -216,7 +216,6 @@ def __init__(self, config):
        # position_ids (1, len position emb) is contiguous in memory and exported when serialized
        self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)))

    # Copied from transformers.modeling_bert.BertEmbeddings.forward
This new feature would render the BERT and ALBERT embeddings different, which I think is fine.
src/transformers/modeling_longformer.py
@@ -307,7 +307,6 @@ class LongformerEmbeddings(nn.Module):
    Same as BertEmbeddings with a tiny tweak for positional embeddings indexing.
    """

    # Copied from transformers.modeling_bert.BertEmbeddings.__init__
Longformer also cannot have this new BertEmbedding feature.
Don't have a strong opinion on this, but it looks useful and the implementation seems robust. We absolutely need some tests for this, however.
Hi @patrickvonplaten @LysandreJik, I see one approval already, is it ready to merge? If not, can you point to the embedding unit tests (for example, for the absolute position embeddings) so I can try to come up with similar tests?
Regarding tests, I think adding integration tests in the modeling test file would be great. The BERT model doesn't have any such tests right now, but you can take inspiration from the integration tests of other models. You could add a couple of tests, each checking that you get the expected results: this will ensure the implementation will not diverge in the future. If you need a checkpoint, you can use one of your pre-trained models. Let me know if you need anything.
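For illustration, a rough sketch of what such an integration test could look like (the checkpoint name is only a placeholder for one of the relative-position checkpoints described in this PR, and the expected output values would have to be recorded from a reference run):

import unittest

import torch

from transformers import BertModel
from transformers.testing_utils import require_torch, slow


@require_torch
class BertRelativePositionEmbeddingIntegrationTest(unittest.TestCase):
    @slow
    def test_inference_with_relative_key_embeddings(self):
        # Placeholder checkpoint name: swap in the actual pre-trained relative-position model.
        model = BertModel.from_pretrained("zhiheng-huang/bert-base-uncased-embedding-relative-key")
        model.eval()

        input_ids = torch.tensor([[101, 7592, 1010, 2088, 102]])  # "[CLS] hello , world [SEP]"
        with torch.no_grad():
            output = model(input_ids)[0]

        # Shape check: batch_size x sequence_length x hidden_size
        self.assertEqual(output.shape, torch.Size([1, 5, 768]))
        # A real integration test would also pin a small slice of the output, e.g.
        # self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4)),
        # where expected_slice is taken from a reference run.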
Yeah sorry, I put my approval too quickly here.
So 1) I just fixed the remaining tests. One thing which is important to know is that the BertEmbeddings are also used by LayoutLM, Roberta, ELECTRA, Longformer, and Albert via our copy mechanism. The new functionality, however, only applies to Roberta, ELECTRA, and LayoutLM as well, so for now I removed the copy mechanism of BertEmbeddings for Longformer and Albert. For the other models we have to make sure that the new embeddings work as well =>
- We need tests verifying that those embeddings can be used for all of LayoutLM, Roberta, ELECTRA, and BERT. I think it can be as simple as running a forward pass with those new embeddings in each tests/test_modeling_<bert, roberta, electra, layoutlm>.py file (see the sketch right after this comment).
Let me know if you need help implementing those! After that I think we are good to merge.
@patrickvonplaten @LysandreJik
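A minimal sketch of such a forward-pass test, assuming the existing <model>ModelTester helpers (prepare_config_and_inputs / create_and_check_model) in each of those test files and the position_embedding_type config attribute introduced by this PR:

def test_model_various_embeddings(self):
    # prepare_config_and_inputs returns the config first, followed by the model inputs
    config_and_inputs = self.model_tester.prepare_config_and_inputs()
    for embedding_type in ["absolute", "relative_key", "relative_key_query"]:
        config_and_inputs[0].position_embedding_type = embedding_type
        self.model_tester.create_and_check_model(*config_and_inputs)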
Awesome job @zhiheng-huang and thanks a lot for bearing with me through the PR.
The tests look great!
I made the docstring a bit prettier.
@LysandreJik and @sgugger can you take a last look?
This looks good to me. Thanks a lot for your PR!
Any reason ALBERT and Longformer don't get this new functionality? (But RoBERTa and ELECTRA do?)
tests/test_modeling_bert.py
@require_sentencepiece
@require_tokenizers
The tests do not require sentencepiece nor tokenizers as far as I can see.
Good catch. Fixed for both test_modeling_bert.py and test_modeling_roberta.py.
@zhiheng-huang - Let me fix the CI later, don't worry about it :-)
Great question! I think ALBERT should get this functionality (I just added it - great catch!). Longformer has weird attention_scores which do not work with those embeddings.
Very nice, love the tests. Thanks!
Good to merge! Thanks a mille @zhiheng-huang!
Thanks! @patrickvonplaten @sgugger @LysandreJik
What does this PR do?
Creating a new PR for #8108 to keep the git history/commits cleaner.
The default BERT model bert-base-uncased was pre-trained with absolute position embeddings. We provide three pre-trained models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as the original BERT model, but with different relative position embeddings (Shaw et al., Self-Attention with Relative Position Representations, https://arxiv.org/abs/1803.02155, and Huang et al., Improve Transformer Models with Better Relative Position Embeddings, https://arxiv.org/abs/2009.13658, accepted in the findings of EMNLP 2020). We show how to fine-tune these pre-trained models on the SQuAD 1.1 dataset. Our proposed relative position embedding method boosts the BERT base model (with default absolute position embeddings) from an F1 score of 88.52 to 90.54 with similar training/inference speed. It also boosts the bert-large-uncased-whole-word-masking model from 93.15 to 93.52 with 3 additional fine-tuning epochs. See examples/question-answering/README.md for more details.

Fixes #8108
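As a quick illustration of how a user opts into the new embeddings (a minimal sketch, assuming the position_embedding_type config option added by this PR; the model here is randomly initialized and only demonstrates the configuration):

import torch

from transformers import BertConfig, BertModel

# "absolute" is the default; "relative_key" follows Shaw et al. and
# "relative_key_query" follows Huang et al.
config = BertConfig(position_embedding_type="relative_key_query")
model = BertModel(config)
model.eval()

input_ids = torch.tensor([[101, 7592, 2088, 102]])  # a tiny toy input
with torch.no_grad():
    outputs = model(input_ids)

print(outputs[0].shape)  # torch.Size([1, 4, 768])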
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@patrickvonplaten @LysandreJik @julien-c