Add strip_accents to basic BertTokenizer. #6280
Strange CI problem with checksum of Torch:
Nice, LGTM. Agree with @JetRunner's suggestion.
Codecov Report
@@ Coverage Diff @@
## master #6280 +/- ##
=======================================
Coverage 79.64% 79.64%
=======================================
Files 147 147
Lines 27120 27125 +5
=======================================
+ Hits 21600 21605 +5
Misses 5520 5520
You can ignore the HASH errors, we're working on solving these but they're unrelated to your PR.
This is ready to be merged IMO. 😁
Wow - this was really fast from first commit to merge. Many thanks to the contributors. This makes open source development twice as much fun.
The BertTokenizerFast can turn off strip_accents with `strip_accents=False`. This PR also adds this option to the basic BertTokenizer. Also see #6186
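
To illustrate what the option controls, here is a minimal standard-library sketch of accent stripping next to lowercasing. It is a simplified stand-in, not the code from this PR: the helper names `strip_accents_from` and `normalize_token` are hypothetical, and treating `strip_accents=None` as "follow `do_lower_case`" is an assumption modelled on the original BERT behaviour.

```python
import unicodedata
from typing import Optional


def strip_accents_from(text: str) -> str:
    """Drop combining marks (Unicode category Mn) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_token(
    token: str,
    do_lower_case: bool = True,
    strip_accents: Optional[bool] = None,
) -> str:
    """Hypothetical helper: lowercase and/or strip accents from one token.

    Assumption: strip_accents=None falls back to the value of do_lower_case,
    so accents are only kept in lowercase mode when strip_accents is
    explicitly False.
    """
    if strip_accents is None:
        strip_accents = do_lower_case
    if do_lower_case:
        token = token.lower()
    if strip_accents:
        token = strip_accents_from(token)
    return token


if __name__ == "__main__":
    word = "Caf\u00e9"  # "Café" with a precomposed e-acute
    print(normalize_token(word))                       # lowercase, accents stripped
    print(normalize_token(word, strip_accents=False))  # lowercase, accents kept
```

With the real library, the equivalent switch would be passing `strip_accents=False` when constructing the tokenizer, as described above; the sketch only shows the text normalization that the flag turns on or off.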