sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())] #20011
Comments
It looks like you are using the tokenizer with a broken sentencepiece vocab. In any case, we would need a reproducer with a file we have access to in order to investigate.
Ran into the same issue. How did you solve it?
The whole problem was the vocab. I just took a different one.
What's wrong with the vocab? How do I change it correctly?
Make sure your vocab files (*.bin files) have been downloaded fully. In my case, I hadn't installed git-lfs, so git clone of the repo from Hugging Face failed for these files. Download the files directly or use git-lfs.
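A quick way to check for the git-lfs situation described above is to look at the first bytes of the file. When a repo is cloned without git-lfs, large files are left as small text pointer files that begin with the line `version https://git-lfs.github.com/spec/v1`, and sentencepiece cannot parse those as a model. The sketch below is only an illustration: the helper names `looks_like_lfs_pointer` and `describe_vocab_file` are made up for this example and are not part of transformers or sentencepiece, and a "not-a-pointer" result does not prove the file is a valid model.

```python
from pathlib import Path

# Header that starts every Git LFS pointer file (Git LFS pointer spec, v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path):
    """Return True if the file begins with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def describe_vocab_file(path):
    """Classify a vocab file as 'missing', 'empty', 'lfs-pointer' or 'not-a-pointer'.

    Hypothetical helper: 'not-a-pointer' only rules out the git-lfs case,
    it does not guarantee that sentencepiece can load the file.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    if looks_like_lfs_pointer(p):
        return "lfs-pointer"
    return "not-a-pointer"
```

For the file from this issue, that would be `describe_vocab_file('vocab_ruturk.spm')`. If it returns "lfs-pointer", install git-lfs and run `git lfs pull` in the cloned repo, or download the file directly.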
System Info
transformers version: 4.24.0
Who can help?
@patrickvonplaten
@sgugger
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
from transformers import T5Tokenizer
tokenizer = T5Tokenizer(vocab_file='vocab_ruturk.spm')
Traceback (most recent call last):
  File "app.py", line 3, in <module>
    tokenizer = T5Tokenizer(vocab_file='vocab.ruturk.spm')
  File "env\lib\site-packages\transformers\models\t5\tokenization_t5.py", line 157, in __init__
    self.sp_model.Load(vocab_file)
  File "env\lib\site-packages\sentencepiece\__init__.py", line 910, in Load
    return self.LoadFromFile(model_file)
  File "env\lib\site-packages\sentencepiece\__init__.py", line 311, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: a\sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
Expected behavior
No errors