Remove outdated BERT tips #6217

Merged
merged 4 commits into master from JetRunner-patch-1 on Aug 3, 2020

Conversation

Contributor

@JetRunner JetRunner commented Aug 3, 2020

Why remove the tips:

  • BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
    the right rather than the left.

Yes, but since we don't provide an option to pad from the left, I think this tip isn't necessary.
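
For context, a minimal sketch of how padding side shows up in the tokenizer API (assuming `bert-base-uncased` as the checkpoint; recent versions expose a `padding_side` attribute):

```python
from transformers import BertTokenizer

# BERT uses absolute position embeddings, and the tokenizer pads on the
# right by default, so the tip only restates the default behaviour.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.padding_side)  # "right"

# Padding a batch to the longest sequence therefore appends [PAD] ids
# at the end of the shorter input.
batch = tokenizer(
    ["a short sentence", "a somewhat longer example sentence"],
    padding=True,
)
print(batch["input_ids"])
```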

  • BERT was trained with a masked language modeling (MLM) objective. It is therefore efficient at predicting masked
    tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language
    modeling (CLM) objective are better in that regard.

No. T5 & BART proved it wrong.

  • Alongside MLM, BERT was trained using a next sentence prediction (NSP) objective using the [CLS] token as a sequence
    approximate. The user may use this token (the first token in a sequence built with special tokens) to get a sequence
    prediction rather than a token prediction. However, averaging over the sequence may yield better results than using
    the [CLS] token.

No. [CLS] can do learnable self-attention pooling, which is much better than parameter-free average pooling, especially when fine-tuned (cf. Sentence-BERT).
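
To make the two pooling strategies concrete, a minimal sketch (assuming a recent transformers version where model outputs expose `pooler_output` and `last_hidden_state`):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS]-based pooling: a learned dense layer + tanh applied to the
# first token's hidden state (what the pooler head produces).
cls_pooled = outputs.pooler_output

# Parameter-free mean pooling over all token hidden states (in practice,
# padded positions should be masked out before averaging).
mean_pooled = outputs.last_hidden_state.mean(dim=1)

print(cls_pooled.shape, mean_pooled.shape)  # both torch.Size([1, 768])
```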

@JetRunner JetRunner changed the title from "Remove out-dated BERT tips" to "Remove outdated BERT tips" on Aug 3, 2020

codecov bot commented Aug 3, 2020

Codecov Report

Merging #6217 into master will decrease coverage by 0.64%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6217      +/-   ##
==========================================
- Coverage   79.53%   78.88%   -0.65%     
==========================================
  Files         146      146              
  Lines       26586    26586              
==========================================
- Hits        21145    20973     -172     
- Misses       5441     5613     +172     
Impacted Files Coverage Δ
src/transformers/modeling_outputs.py 100.00% <ø> (ø)
src/transformers/tokenization_xlm.py 16.26% <0.00%> (-66.67%) ⬇️
src/transformers/tokenization_marian.py 68.14% <0.00%> (-25.67%) ⬇️
src/transformers/generation_tf_utils.py 86.71% <0.00%> (+0.25%) ⬆️
src/transformers/file_utils.py 80.30% <0.00%> (+0.25%) ⬆️
src/transformers/tokenization_dpr.py 57.65% <0.00%> (+4.50%) ⬆️
src/transformers/tokenization_ctrl.py 96.11% <0.00%> (+17.47%) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b6b2f22...8a7c591.

Collaborator

sgugger commented Aug 3, 2020

I'd personally keep the first two tips (users may have their own custom scripts for padding, and BERT is not good at language generation). For the third, as discussed, I agree.
The docstrings of BaseModelOutputWithPooling should be changed accordingly, but I can take care of that in a separate PR.

Contributor Author

@sgugger I've restored tip no. 1 and updated tip no. 2. I also took care of BaseModelOutputWithPooling.

Collaborator

@sgugger sgugger left a comment


Thanks a lot! This is good to go IMO.

@JetRunner JetRunner merged commit 3c289fb into master Aug 3, 2020
@JetRunner JetRunner deleted the JetRunner-patch-1 branch August 3, 2020 17:17
Mehrad0711 pushed a commit to Mehrad0711/transformers that referenced this pull request Aug 3, 2020
* Remove out-dated BERT tips

* Update modeling_outputs.py

* Update bert.rst

* Update bert.rst