
Add BlenderBot small tokenizer to the init #13367

Merged 4 commits into master from blenderbot-small-fix on Sep 22, 2021

Conversation

LysandreJik (Member) commented:

This class was forgotten; this PR adds it to the init and the documentation.
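
Adding a class to the top-level init makes it importable directly from transformers. A minimal sketch of the resulting usage, shown here with the slow BlenderbotSmallTokenizer (the fast variant, which the traceback below suggests is the class being added, follows the same import pattern):

from transformers import BlenderbotSmallTokenizer

# Load the tokenizer from the same Hub checkpoint used in the reproduction below.
tokenizer = BlenderbotSmallTokenizer.from_pretrained("facebook/blenderbot_small-90M")
print(tokenizer.tokenize("hello world"))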

@patil-suraj (Contributor) left a comment:

Thanks a lot for adding this!

[Review thread on src/transformers/__init__.py: outdated, resolved]
LysandreJik and others added 2 commits on September 1, 2021 at 12:57:
Co-authored-by: Suraj Patil <surajp815@gmail.com>
@sgugger (Collaborator) left a comment:

LGTM!

@LysandreJik (Member, Author) commented:

Actually this tokenizer seems a bit broken:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/lysandre/Workspaces/Python/transformers/src/transformers/models/auto/tokenization_auto.py", line 469, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/lysandre/Workspaces/Python/transformers/src/transformers/tokenization_utils_base.py", line 1741, in from_pretrained
    return cls._from_pretrained(
  File "/home/lysandre/Workspaces/Python/transformers/src/transformers/tokenization_utils_base.py", line 1858, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/lysandre/Workspaces/Python/transformers/src/transformers/models/blenderbot_small/tokenization_blenderbot_small_fast.py", line 76, in __init__
    ByteLevelBPETokenizer(
  File "/home/lysandre/transformers/.env/lib/python3.8/site-packages/tokenizers/implementations/byte_level_bpe.py", line 36, in __init__
    BPE(
Exception: Error while initializing BPE: Token `_</w>` out of vocabulary
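
For context, this error comes from the BPE constructor validating that every token referenced by the merges exists in the vocabulary. A minimal sketch of the same class of failure with the tokenizers library, using a hypothetical toy vocab/merges pair rather than the actual BlenderBot files, and assuming the dict/list constructor arguments available in tokenizers at the time:

from tokenizers import ByteLevelBPETokenizer

# Hypothetical toy vocab/merges, for illustration only.
vocab = {"a": 0, "b": 1, "ab": 2}
merges = [("a", "c")]  # "c" is missing from the vocab

# Raises: Exception: Error while initializing BPE: Token `c` out of vocabulary
tokenizer = ByteLevelBPETokenizer(vocab, merges)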

@LysandreJik (Member, Author) commented on Sep 1, 2021:

This error only surfaces after the fix in 6d90d5a.

cc @patil-suraj: if I recall correctly, you implemented this tokenizer. Do you remember what might have gone wrong?

@LysandreJik merged commit 5b57075 into master on Sep 22, 2021
@LysandreJik deleted the blenderbot-small-fix branch on September 22, 2021
Narsil pushed a commit to Narsil/transformers that referenced this pull request on Sep 25, 2021:

* Add BlenderBot small tokenizer to the init
* Update src/transformers/__init__.py
* Style
* Bugfix

Co-authored-by: Suraj Patil <surajp815@gmail.com>
stas00 (Oct 12, 2021) and Albertobegue (Jan 13 and Jan 27, 2022) pushed commits referencing this pull request with the same commit list.