
CodeGeeX2 model conversion error #228

Open
norlandsoft opened this issue Dec 16, 2023 · 2 comments


@norlandsoft

Running:
python3 chatglm_cpp/convert.py -i modules/codegeex2-6b -t q4_0 -o codegeex-ggml.bin
Error:
Traceback (most recent call last):
  File "chatglm_cpp/convert.py", line 543, in <module>
    main()
  File "chatglm_cpp/convert.py", line 537, in main
    convert(f, args.model_name_or_path, dtype=args.type)
  File "chatglm_cpp/convert.py", line 469, in convert
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
  File "/Users/eric/miniconda3/envs/chatglm.cpp/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/Users/eric/miniconda3/envs/chatglm.cpp/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/Users/eric/miniconda3/envs/chatglm.cpp/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/Users/eric/.cache/huggingface/modules/transformers_modules/codegeex2-6b/tokenization_chatglm.py", line 69, in __init__
    super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=False, **kwargs)
  File "/Users/eric/miniconda3/envs/chatglm.cpp/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/Users/eric/miniconda3/envs/chatglm.cpp/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/Users/eric/.cache/huggingface/modules/transformers_modules/codegeex2-6b/tokenization_chatglm.py", line 99, in get_vocab
    vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
  File "/Users/eric/.cache/huggingface/modules/transformers_modules/codegeex2-6b/tokenization_chatglm.py", line 95, in vocab_size
    return self.tokenizer.n_words
AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'

@sosoayaen


#190 (comment)

@sosoayaen

chatchat-space/Langchain-Chatchat#1835 (comment)

Either of these two approaches solves the problem: downgrade transformers, or overwrite the tokenizer files under codegeex2-6b with the ones from chatglm3-6b.
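For context, the AttributeError comes from an initialization-order problem: newer transformers versions call get_vocab() from the base tokenizer's __init__, before the custom ChatGLMTokenizer has set self.tokenizer. A minimal, self-contained sketch of the pattern (the class names Base, BrokenTok, and FixedTok are hypothetical stand-ins, not the real transformers classes):

```python
class Base:
    """Stand-in for PreTrainedTokenizer: its __init__ calls back into
    a method the subclass is expected to implement."""
    def __init__(self, **kwargs):
        # Newer transformers versions call get_vocab() during __init__
        # (via _add_tokens), before the subclass finishes its own setup.
        self.vocab = self.get_vocab()

class BrokenTok(Base):
    """Mirrors the failing tokenizer: super().__init__ runs first."""
    def __init__(self):
        super().__init__()           # get_vocab() fires here ...
        self.tokenizer = {"a": 0}    # ... but this attribute isn't set yet

    def get_vocab(self):
        return dict(self.tokenizer)  # AttributeError when called too early

class FixedTok(Base):
    """The fix: set subclass state before delegating to the base class."""
    def __init__(self):
        self.tokenizer = {"a": 0}    # set state first
        super().__init__()           # now get_vocab() succeeds

    def get_vocab(self):
        return dict(self.tokenizer)

try:
    BrokenTok()
except AttributeError as e:
    print("broken:", e)

print("fixed:", FixedTok().vocab)
```

This is why copying the chatglm3-6b tokenization files works: that version of tokenization_chatglm.py sets up its internal state before calling super().__init__. Downgrading transformers (the linked comments suggest a pre-4.34 release) avoids the early get_vocab() call instead.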
