Skip to content

Conversation

@IMvision12
Copy link
Contributor

What does this PR do?

Fixes #19303

Added BertTokenizer class in tokenization_convbert.py and BertTokenizerFast in tokenization_convbert_fast.py

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 5, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Left a couple of pointers on how to finish the work :-)

@IMvision12
Copy link
Contributor Author

IMvision12 commented Oct 5, 2022

Done! @sgugger do i need to do same for convbert_fast?

@IMvision12 IMvision12 requested a review from sgugger October 5, 2022 15:07
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating! We're on the right path.
Now we just need to make sure the util scanning the copies is happy (e.g., the copies should match the original). I've left comments in that direction and you can check locally if the util is happy by running make repo-consistency

@IMvision12
Copy link
Contributor Author

@sgugger Thanks for quick feeback,
For tokenization_convbert.py i have change the comment to # Copied from transformers.models.bert.tokenization_bert.BertTokenizer with ConvBertTokenizer->BertTokenizer

and for tokenization_convbert_fast.py i have change the comment to # Copied from transformers.models.bert.tokenization_bert_fast.BertTokenizerFast with ConvBertTokenizerFast->ConvBertTokenizer

for convbert_fast make repo-consistency gives error : - src/transformers\models\convbert\tokenization_convbert_fast.py: copy does not match models.bert.tokenization_bert_fast.BertTokenizerFast at line 55

@sgugger
Copy link
Collaborator

sgugger commented Oct 5, 2022

You'll need to add broader patterns than just the full name of the tokenizer as BERT is used in the docstrings for instance. To see what the copy utils wants to modify, you can run make fix-copies locally :-)

@IMvision12
Copy link
Contributor Author

IMvision12 commented Oct 5, 2022

After running make fix-copies it changes slow_tokenizer_class = BertTokenizer but it should be ConvBertTokenizer in tokenization_convbert_fast.py

@sgugger
Copy link
Collaborator

sgugger commented Oct 5, 2022

Yes, that's why I made the suggestions above.

@IMvision12
Copy link
Contributor Author

IMvision12 commented Oct 5, 2022

So, do i need to copy BertTokenizer and BertTokenizerFast class to tokenization_convbert_fast.py as after adding those classes it passes all tests

@sgugger
Copy link
Collaborator

sgugger commented Oct 5, 2022

Just accept the suggestions above.

@IMvision12 IMvision12 requested a review from sgugger October 5, 2022 16:38
@IMvision12
Copy link
Contributor Author

IMvision12 commented Oct 5, 2022

Done @sgugger are there any more changes?

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following what you're doing in the fast tokenizer file. The goal is to have ConvBertTokenizerFast stop depending on BertTokenizerFast by copying the code from that file. Not put the code of BertTokenizer there.

@IMvision12 IMvision12 requested a review from sgugger October 5, 2022 19:44
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for iterating! This is almost ready to merge, just a couple of last nits.

@IMvision12 IMvision12 requested a review from sgugger October 6, 2022 01:34
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for iterating on this PR! Looks perfect now.

@sgugger sgugger merged commit 7e348aa into huggingface:main Oct 7, 2022
@IMvision12 IMvision12 deleted the ConvBert branch October 7, 2022 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make all models folder independent from each other

3 participants