Add the fast implementation of BlenderbotTokenizer #13634

Closed
stancld opened this issue Sep 17, 2021 · 6 comments · Fixed by #13720

Comments

@stancld
Contributor

stancld commented Sep 17, 2021

🚀 Feature request

As is the case for other models' tokenizers, add a fast implementation of BlenderbotTokenizer.

Motivation

To have faster tokenization for Blenderbot models. (Also, the implementation should be pretty straightforward considering the similarity to the RobertaTokenizer/RobertaTokenizerFast.)

Your contribution

I would like to look into this and would be glad to add it.

@LysandreJik
Member

That sounds great @stancld, we would love a PR!

@stancld
Contributor Author

stancld commented Sep 20, 2021

@LysandreJik - I found a minor formatting issue in tokenizer_config.json at https://huggingface.co/facebook/blenderbot-3B/blob/main/tokenizer_config.json: it contains "add_prefix_space": "true" (a string) instead of "add_prefix_space": true (a boolean). This causes the error below during the slow-to-fast tokenizer conversion. It could be handled in the converter source code, though I believe it would be better to fix the config file. Is there a way to send a PR to the HF Hub?

Error:

TypeError: Can't convert 'true' to PyBool
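
For illustration, here is a minimal sketch of what handling this on the converter side could look like: coercing a string-valued boolean to a real bool before it is passed on. The helper names `coerce_bool` and `normalize_tokenizer_config` are hypothetical and not part of transformers; this is only an assumption about how such a workaround might be written, not the fix that was applied.

```python
import json


def coerce_bool(value):
    """Return a real bool for the strings "true"/"false"; pass anything else through."""
    if isinstance(value, str) and value.strip().lower() in {"true", "false"}:
        return value.strip().lower() == "true"
    return value


def normalize_tokenizer_config(config, keys=("add_prefix_space",)):
    """Return a copy of the config with the listed keys coerced to bools where possible."""
    fixed = dict(config)
    for key in keys:
        if key in fixed:
            fixed[key] = coerce_bool(fixed[key])
    return fixed


# Mimics the malformed entry described above (string instead of boolean).
raw = json.loads('{"add_prefix_space": "true"}')
fixed = normalize_tokenizer_config(raw)
```

Values that are already booleans, or strings that are not "true"/"false", are left unchanged, so the helper is safe to apply to a correct config as well.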

@LysandreJik
Member

Ah! There's no way to do that as of now - let me handle that for you.

@LysandreJik
Member

LysandreJik commented Sep 21, 2021

Should be done with huggingface#c468b23

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@stancld
Contributor Author

stancld commented Oct 22, 2021

Still open, see #13720.
