[BUG] Fast tokenizer does not deal with AddedTokens properly (no problem in Transformers python tokenizer impl.) #1544
Comments
Hey! I think most of these can be removed if you set the … Basically the …
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
transformers version: 4.42.3. The output is exactly the same as when I first reported this issue.
Hey! As you mention, the issue only appears if you do not call from_slow. I cannot update the tokenizer for you; I'll ping the team internally for sure, but we need to re-upload tokenizer.json!
Can you open a PR on the hub and ping me here with the link? 🤗
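For anyone following along, here is a minimal sketch of what regenerating the tokenizer and submitting it as a hub PR could look like, assuming the fix is simply re-serializing with from_slow=True; the create_pr flow is an assumption, not something spelled out in this thread:

```python
from transformers import LlamaTokenizerFast

# Rebuild the fast tokenizer from the slow (SentencePiece) files so the
# AddedToken entries are serialized correctly into a fresh tokenizer.json.
tokenizer = LlamaTokenizerFast.from_pretrained("HuggingFaceM4/idefics2-8b", from_slow=True)

# Open a pull request on the hub with the regenerated files.
# (create_pr=True is an assumption about how the fix would be submitted.)
tokenizer.push_to_hub("HuggingFaceM4/idefics2-8b", create_pr=True)
```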
Closing as the issue is with the model on the hub!
When I try to add some tokens to the vocab, there are 3 issues in Fast-type tokenizers; there is no problem in the python tokenizer, though.

Source code to reproduce the issue
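The original snippet was not preserved here; below is a minimal sketch of the kind of reproduction described above. The token string is hypothetical, and it assumes the repo ships the slow SentencePiece files that LlamaTokenizer needs:

```python
from transformers import LlamaTokenizer, LlamaTokenizerFast

new_token = "<my_token>"  # hypothetical token; the originals were not preserved

# Slow (python) tokenizer: adding a token behaves as expected.
slow = LlamaTokenizer.from_pretrained("HuggingFaceM4/idefics2-8b")
slow.add_tokens([new_token])
print(slow.tokenize(f"hello {new_token} world"))

# Fast tokenizer: per this report, the same AddedToken is mishandled
# unless the tokenizer is loaded with from_slow=True.
fast = LlamaTokenizerFast.from_pretrained("HuggingFaceM4/idefics2-8b")
fast.add_tokens([new_token])
print(fast.tokenize(f"hello {new_token} world"))
```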
Execution result
Additional Note
If I use the from_slow option to load the Fast tokenizer, there is no problem:

```python
tokenizer = LlamaTokenizerFast.from_pretrained("HuggingFaceM4/idefics2-8b", from_slow=True)
```
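Presumably this works because from_slow=True rebuilds the fast tokenizer from the SentencePiece files at load time instead of reading the tokenizer.json shipped with the repo, which is consistent with the maintainer's note above that tokenizer.json needs to be re-uploaded.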