
sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())] #20011

Closed
2 of 4 tasks
showpiecep opened this issue Nov 1, 2022 · 5 comments

Comments

@showpiecep

System Info

  • transformers version: 4.24.0
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.9.2
  • Huggingface_hub version: 0.10.1
  • PyTorch version (GPU?): 1.13.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@patrickvonplaten
@sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import T5Tokenizer
tokenizer = T5Tokenizer(vocab_file='vocab_ruturk.spm')

Traceback (most recent call last):
  File "app.py", line 3, in <module>
    tokenizer = T5Tokenizer(vocab_file='vocab.ruturk.spm')
  File "env\lib\site-packages\transformers\models\t5\tokenization_t5.py", line 157, in __init__
    self.sp_model.Load(vocab_file)
  File "env\lib\site-packages\sentencepiece\__init__.py", line 910, in Load
    return self.LoadFromFile(model_file)
  File "env\lib\site-packages\sentencepiece\__init__.py", line 311, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: a\sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
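The RuntimeError means sentencepiece could not parse the file as a serialized model protobuf. A minimal, stdlib-only sanity check can catch the common causes before the tokenizer is even constructed: an empty, truncated, or plain-text file (e.g. an HTML error page) saved under a `.spm` name. The function name and the size threshold below are assumptions for illustration, not part of any library API:

```python
import os

def looks_like_binary_model(path, min_size=1024):
    """Rough heuristic: a usable .spm model is a non-trivial binary
    protobuf, not an empty file or a small text file saved under the
    wrong name. min_size is an assumed threshold, not a spec value."""
    if not os.path.isfile(path) or os.path.getsize(path) < min_size:
        return False
    with open(path, "rb") as f:
        head = f.read(256)
    # Plain-text content (HTML, JSON error bodies, LFS pointers) decodes
    # cleanly as ASCII; a protobuf model almost always contains
    # non-ASCII bytes early on.
    try:
        head.decode("ascii")
        return False
    except UnicodeDecodeError:
        return True
```

If this returns False for your vocab file, the download or export is broken and no tokenizer setting will fix it.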

Expected behavior

No errors

@sgugger
Collaborator

sgugger commented Nov 2, 2022

It looks like you are using the tokenizer with a broken sentencepiece vocab. In any case, we would need a reproducer with a file we can access in order to investigate.

@cliangyu

cliangyu commented Nov 5, 2022

Ran into the same issue. How did you solve it?

@showpiecep
Author

showpiecep commented Nov 6, 2022

Ran into the same issue. How did you solve it?

The whole problem was the vocab. I just took a different one.

@AlexWortega

What's wrong with the vocab? How do I fix it correctly?

@jiangzhuolin

What's wrong with the vocab? How do I fix it correctly?

Make sure your vocab files (the *.bin files) were downloaded in full. In my case, I had not installed git-lfs, so git clone from the Hugging Face Hub silently failed for these large files. Download them manually or install git-lfs and clone again.
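Following up on the git-lfs point: when a repo is cloned without Git LFS installed, what lands on disk for each large file is a small text pointer stub, not the binary blob, which is exactly what makes sentencepiece's parser fail. The prefix below is the documented first line of the Git LFS pointer format; the helper function name is an assumption for illustration:

```python
# First line every Git LFS pointer file starts with, per the LFS spec.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_git_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub rather than
    the actual binary blob it stands in for."""
    with open(path, "rb") as f:
        return f.read(len(LFS_PREFIX)) == LFS_PREFIX
```

If this returns True for your vocab file, re-download it (or install git-lfs and run `git lfs pull` in the cloned repo).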
