
BertTokenizer and encode_plus() #9655

Closed
SimplyLucKey opened this issue Jan 18, 2021 · 4 comments

Comments

SimplyLucKey commented Jan 18, 2021

I see that from version 2.4.0 onwards I was able to use encode_plus() with BertTokenizer.

However, it seems that is no longer the case:
AttributeError: 'BertTokenizer' object has no attribute 'encoder_plus'

Is there a replacement for encode_plus()?

@thomwolf
Member

No, it’s still there and unchanged. It’s just that you made a typo and typed encoder_plus instead of encode_plus, from what I can tell.

That said, we recommend just using the __call__ method now, which is a shortcut wrapping all the encode methods in a single API. You can read more about the additional features added in v3 and v4 in the docs if you want to simplify your preprocessing.
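For illustration, a minimal sketch of the __call__ shortcut described above; the model name and sample text are placeholders, and it assumes transformers v3 or later:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode_plus() is still available...
enc = tokenizer.encode_plus("Hello world", return_tensors="pt")

# ...but calling the tokenizer directly produces the same fields
enc = tokenizer("Hello world", return_tensors="pt")
print(enc.keys())  # input_ids, token_type_ids, attention_mask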


@SimplyLucKey
Author


Oops, sorry, I completely missed that. Thank you!

@vedantg12

from transformers import BertTokenizer

long_text = "This is a very very long text. " * 300
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

# tokenize without truncation
inputs_no_trunc = tokenizer.encode_plus(long_text, add_special_tokens=False, return_tensors='pt')

I get the following error:

AttributeError: 'BertTokenizer' object has no attribute 'encode_plus'

Is there a substitute for this?
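As noted earlier in the thread, the tokenizer's __call__ method covers the same functionality; a minimal sketch of the equivalent call, assuming transformers v3 or later and the variables defined in the snippet above:

inputs_no_trunc = tokenizer(long_text, add_special_tokens=False, return_tensors='pt')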
