Remove outdated BERT tips #6217
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #6217      +/-   ##
==========================================
- Coverage   79.53%   78.88%   -0.65%
==========================================
  Files         146      146
  Lines       26586    26586
==========================================
- Hits        21145    20973     -172
- Misses       5441     5613     +172
```
Continue to review full report at Codecov.
I'd personally leave the first two tricks (users may have their own custom scripts for padding, and BERT is not good at language generation). For the third, as discussed, I agree.
@sgugger I've restored tip no. 1 and updated tip no. 2. Also took care of
Thanks a lot! This is good to go IMO.
* Remove out-dated BERT tips
* Update modeling_outputs.py
* Update bert.rst
* Update bert.rst
Why remove the tips:
1. Yes, but since we don't provide an option to pad from the left, I think the tip is not necessary (see the first sketch below).
2. No. T5 and BART proved it wrong.
3. No. [CLS] can do learnable self-attention pooling, which is much better than parameter-free average pooling, especially when fine-tuned (cf. Sentence-BERT; see the second sketch below).