
BertTokenizer and encode_plus() #9655

Closed
SimplyLucKey opened this issue Jan 18, 2021 · 4 comments

Comments

SimplyLucKey commented Jan 18, 2021

I see that from version 2.4.0 onwards I was able to use encode_plus() with BertTokenizer.

However, it seems that is no longer the case:
AttributeError: 'BertTokenizer' object has no attribute 'encoder_plus'

Is there a replacement for encode_plus()?

@thomwolf
Member

No, it’s still there and unchanged. It’s just that you made a typo and typed encoder_plus instead of encode_plus, from what I can tell.

That said, we recommend just using the __call__ method now, which is a shortcut wrapping all the encode methods in a single API. You can read more about the additional features added in v3 and v4 in the docs if you want to simplify your preprocessing.
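For illustration, a minimal sketch of the __call__ shortcut described above; the model name and sample text are placeholders, and it assumes transformers v3 or later:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode_plus() is still available...
enc = tokenizer.encode_plus("Hello world", return_tensors="pt")

# ...but calling the tokenizer directly produces the same fields
enc = tokenizer("Hello world", return_tensors="pt")
print(enc.keys())  # input_ids, token_type_ids, attention_mask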


@SimplyLucKey
Author


Oops, sorry, I completely missed that. Thank you!

@vedantg12

from transformers import BertTokenizer

long_text = "This is a very very long text. " * 300
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

# tokenize without truncation
inputs_no_trunc = tokenizer.encode_plus(long_text, add_special_tokens=False, return_tensors='pt')

I get the following error:

AttributeError: 'BertTokenizer' object has no attribute 'encode_plus'

Is there a substitute for this?
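As noted earlier in the thread, the tokenizer's __call__ method covers the same functionality; a minimal sketch of the equivalent call, assuming transformers v3 or later and the variables defined in the snippet above:

inputs_no_trunc = tokenizer(long_text, add_special_tokens=False, return_tensors='pt')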
