Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3 #31789
Hello! Is it possible you have an outdated version of

Thanks, updating the
This PR:
- makes new document columns nullable
- upgrades rag transformers and tokenizers to fix the error huggingface/transformers#31789

Part of #108
Fix Mistral truss-examples, see [issue](huggingface/transformers#31789) for context. Something changed in the tokenizers library, so we need to update these examples. This is the exception that we're seeing:

```
Exception while loading model
Traceback (most recent call last):
  File "/app/model_wrapper.py", line 118, in load
    self.try_load()
  File "/app/model_wrapper.py", line 179, in try_load
    retry(
  File "/app/common/retry.py", line 20, in retry
    raise exc
  File "/app/common/retry.py", line 15, in retry
    fn()
  File "/app/model/model.py", line 34, in load
    self.tokenizer = AutoTokenizer.from_pretrained(
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/auto/tokenization_auto.py", line 751, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/transformers/tokenization_utils_base.py", line 2045, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.11/dist-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/llama/tokenization_llama_fast.py", line 122, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/dist-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
```
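The traceback ends in `TokenizerFast.from_file`, which deserializes `tokenizer.json`; the Rust side treats the `pre_tokenizer` entry as an untagged enum and rejects any variant it does not know. A minimal pure-Python illustration of that failure mode (the names `KNOWN_PRE_TOKENIZERS` and `check_pre_tokenizer` are ours for illustration, not part of the tokenizers library):

```python
import json

# Illustrative sketch: an "untagged enum" deserializer tries each known
# variant in turn and fails if none matches. An older tokenizers release
# does not know pre_tokenizer types introduced later, so a freshly updated
# tokenizer.json from the Hub can no longer be parsed.
KNOWN_PRE_TOKENIZERS = {"Whitespace", "ByteLevel", "Metaspace"}  # illustrative subset

def check_pre_tokenizer(tokenizer_json: str) -> str:
    config = json.loads(tokenizer_json)
    kind = config["pre_tokenizer"]["type"]
    if kind not in KNOWN_PRE_TOKENIZERS:
        raise ValueError(
            "data did not match any variant of untagged enum PyPreTokenizerTypeWrapper"
        )
    return kind

old_style = json.dumps({"pre_tokenizer": {"type": "Metaspace"}})
new_style = json.dumps({"pre_tokenizer": {"type": "SomeNewerVariant"}})

print(check_pre_tokenizer(old_style))  # Metaspace
try:
    check_pre_tokenizer(new_style)
except ValueError as exc:
    print(exc)
```

This is why the error appears only after the model repo's `tokenizer.json` is regenerated with a newer library than the one installed locally.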
# Context

There was a recent change to the Mistral repo on Hugging Face where they started using a newer transformers feature, which resulted in huggingface/transformers#31789. This bumps transformers to fix the error across all of our Mistral models. As a follow-up, we should start pinning the HF repository revision for all of our examples to prevent this from happening again.

# Testing

I have tested a couple of the TRT examples, but not _everything_.
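The follow-up suggested above, pinning the Hub repository, can be sketched as follows. `revision` is a real `from_pretrained` argument that accepts a branch, tag, or commit SHA; the `"<commit-sha>"` value is a placeholder, and the helper function is ours, not part of transformers:

```python
# Sketch: pin the Hub revision so upstream changes to the model repo
# cannot break deployed examples. The actual from_pretrained call needs
# network access and auth, so here we only build the kwargs a deployment
# script might pass to it.
def pinned_load_kwargs(model_id: str, commit_sha: str) -> dict:
    return {
        "pretrained_model_name_or_path": model_id,
        "revision": commit_sha,  # freeze the exact repo snapshot
    }

kwargs = pinned_load_kwargs("mistralai/Mistral-7B-v0.1", "<commit-sha>")
# tokenizer = AutoTokenizer.from_pretrained(**kwargs)  # requires network + auth
```

With a pinned revision, a regenerated `tokenizer.json` on the repo's main branch would not affect existing deployments.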
I have this issue, and a partial update does not work:

```
!pip install -U 'tokenizers<0.15'
# Successfully installed tokenizers-0.14.1

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True)
# Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3

!pip install -U 'tokenizers'
# Successfully installed tokenizers-0.19.1
# RESTART NOTEBOOK

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True)
# ImportError: tokenizers>=0.14,<0.15 is required for a normal functioning of this module, but found tokenizers==0.19.1.
```

A full update solves the issue for me:

```
!pip install -U transformers
# Successfully installed huggingface-hub-0.23.4 transformers-4.42.3
```

It also works when run from a xonsh shell.
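The `ImportError` above shows why the partial upgrade fails in the other direction: each transformers release pins a narrow tokenizers range, so upgrading only tokenizers trips the import-time version check. A minimal sketch of that check using stdlib tuple comparison (`parse` and `satisfies` are our helpers, not transformers code):

```python
# Sketch of an import-time dependency-range check of the form
# "lower <= installed < upper", as in "tokenizers>=0.14,<0.15".
def parse(version: str) -> tuple:
    # naive numeric parse; real tools handle pre-releases, epochs, etc.
    return tuple(int(part) for part in version.split("."))

def satisfies(installed: str, lower: str, upper: str) -> bool:
    return parse(lower) <= parse(installed) < parse(upper)

print(satisfies("0.14.1", "0.14", "0.15"))  # True  -> import succeeds
print(satisfies("0.19.1", "0.14", "0.15"))  # False -> ImportError
```

So the two libraries must move together: upgrading transformers pulls in a tokenizers release that both satisfies the pin and understands the newer `tokenizer.json`.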
I have the same issue, and the package versions are the latest: transformers==4.43.3, tokenizers==0.19.1.
@littlerookie sorry, but I can't reproduce this:
System Info

`transformers` version: 4.39.0.dev0

Who can help?

@ArthurZucker

Information

Tasks

`examples` folder (such as GLUE/SQuAD, ...)

Reproduction

Expected behavior
There should be no error when loading the tokenizer.