Error when running data_utils #61

Open
z1968357787 opened this issue Apr 8, 2023 · 0 comments

@z1968357787

INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg1hbjeku
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg1hbjeku/_remote_module_non_scriptable.py
INFO:lightning_fabric.utilities.seed:Global seed set to 42
Traceback (most recent call last):
  File "/home/cike/zzp/alpaca/chatglm_finetuning/data_utils.py", line 272, in <module>
    tokenizer, config, _, _ = dataHelper.load_tokenizer_and_config(tokenizer_class_name=ChatGLMTokenizer,config_class_name=ChatGLMConfig)
  File "/home/cike/anaconda/envs/alpaca/lib/python3.9/site-packages/deep_training/data_helper/data_helper.py", line 257, in load_tokenizer_and_config
    tokenizer = load_tokenizer(tokenizer_name=tokenizer_name or model_args.tokenizer_name,
  File "/home/cike/anaconda/envs/alpaca/lib/python3.9/site-packages/deep_training/data_helper/data_module.py", line 29, in load_tokenizer
    tokenizer = class_name.from_pretrained(tokenizer_name, **tokenizer_kwargs)
  File "/home/cike/anaconda/envs/alpaca/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/home/cike/anaconda/envs/alpaca/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/cike/zzp/alpaca/chatglm_finetuning/tokenization_chatglm.py", line 211, in __init__
    self.sp_tokenizer = SPTokenizer(vocab_file)
  File "/home/cike/zzp/alpaca/chatglm_finetuning/tokenization_chatglm.py", line 32, in __init__
    self.text_tokenizer = self._build_text_tokenizer(encode_special_tokens=False)
  File "/home/cike/zzp/alpaca/chatglm_finetuning/tokenization_chatglm.py", line 65, in _build_text_tokenizer
    self._configure_tokenizer(
  File "/home/cike/zzp/alpaca/chatglm_finetuning/tokenization_chatglm.py", line 61, in _configure_tokenizer
    text_tokenizer.refresh()
  File "/home/cike/anaconda/envs/alpaca/lib/python3.9/site-packages/icetk/text_tokenizer.py", line 31, in refresh
    self.sp.Load(model_proto=self.proto.SerializeToString())
  File "/home/cike/anaconda/envs/alpaca/lib/python3.9/site-packages/sentencepiece/__init__.py", line 904, in Load
    return self.LoadFromSerializedProto(model_proto)
  File "/home/cike/anaconda/envs/alpaca/lib/python3.9/site-packages/sentencepiece/__init__.py", line 250, in LoadFromSerializedProto
    return _sentencepiece.SentencePieceProcessor_LoadFromSerializedProto(self, serialized)
RuntimeError: Internal: [MASK] is already defined.
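
The error is raised inside sentencepiece while icetk reloads its tokenizer model, so the installed versions of those packages are probably useful context for anyone trying to reproduce this. A minimal sketch for collecting them is below. The distribution names are assumptions inferred from the import paths in the traceback (`protobuf` is a guess based on the `SerializeToString()` call), and this only gathers information; it does not fix anything, and the cause is not established in this issue.

```python
from importlib import metadata

# Distribution names are assumptions inferred from the traceback's import paths;
# "protobuf" is a guess based on the SerializeToString() call in icetk.
PACKAGES = ["sentencepiece", "icetk", "transformers", "deep_training", "protobuf", "torch"]


def installed_versions(names):
    """Return {name: version string, or None if the distribution is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same conda environment (`alpaca` in the paths above) and pasting the output here would make the report easier to compare against other setups.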
