Custom fine-tuned DeepSeek coder model unable to be quantized to Fp16 #5234

Closed
jackshiwl opened this issue Jan 31, 2024 · 12 comments

@jackshiwl

Hi,

I am trying to quantize my custom fine-tuned deepseek-7b instruct model, and I am unable to do so. I followed the documentation:

# Convert to fp16
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}

but it produces this error:

/content/llama.cpp/gguf-py
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00001-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00002-of-00003.safetensors
Loading model file deepseek-coder-6.7b-instruct-finetuned/model-00003-of-00003.safetensors
params = Params(n_vocab=32256, n_embd=4096, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-06, n_experts=None, n_experts_used=None, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('deepseek-coder-6.7b-instruct-finetuned'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': PosixPath('deepseek-coder-6.7b-instruct-finetuned/tokenizer.json')}
Loading vocab file 'deepseek-coder-6.7b-instruct-finetuned/tokenizer.json', type 'spm'
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1662, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
  File "/content/llama.cpp/convert.py", line 1618, in main
    vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
  File "/content/llama.cpp/convert.py", line 1422, in load_vocab
    vocab = SentencePieceVocab(
  File "/content/llama.cpp/convert.py", line 449, in __init__
    self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 447, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

I cannot seem to find similar errors in the GitHub issues. Any insight into this would be greatly appreciated.
One can replicate this experiment by quantizing a DeepSeek 7B instruct coder model.
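
For context, the RuntimeError above is the generic message SentencePiece raises when the file it is handed is not a serialized model proto. Below is a minimal sketch (not part of the original report; the directory name is taken from the log above) to check whether tokenizer.json is actually a Hugging Face fast-tokenizer JSON file rather than a SentencePiece .model, which would explain why the 'spm' vocab loader rejects it:

import json
from pathlib import Path

import sentencepiece as spm  # pip install sentencepiece

model_dir = Path("deepseek-coder-6.7b-instruct-finetuned")  # adjust to your model path
tok = model_dir / "tokenizer.json"

try:
    json.loads(tok.read_text(encoding="utf-8"))
    print(f"{tok} is JSON -> HF fast tokenizer, not a SentencePiece model")
except json.JSONDecodeError:
    print(f"{tok} is not JSON; trying to load it as a SentencePiece model")
    spm.SentencePieceProcessor(model_file=str(tok))  # raises the same RuntimeError if corrupt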

@cmp-nct
Contributor

cmp-nct commented Feb 1, 2024

Reads like a broken tokenizer file?
Given the vocab appears not to have been fine-tuned, maybe get the original from here: https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/tree/main ?
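
If you want to try swapping in the upstream tokenizer files, a hedged sketch using huggingface_hub (the repo id comes from the link above; the exact file list is an assumption about what the repo ships):

from huggingface_hub import hf_hub_download

repo = "deepseek-ai/deepseek-coder-6.7b-instruct"
for fname in ("tokenizer.json", "tokenizer_config.json"):  # assumed filenames
    path = hf_hub_download(repo_id=repo, filename=fname,
                           local_dir="deepseek-coder-6.7b-instruct-finetuned")
    print("downloaded", path)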

@jackshiwl
Author

Thanks for your response. However, where do I find the vocab file in that Hugging Face repo? I assume you meant the vocab.json file?

@cmp-nct
Contributor

cmp-nct commented Feb 2, 2024

The tokenizer and vocab files; I'm not sure which ones are used.
But given the vocabulary is the same in your fine-tune, I'd assume they are identical.
You could also double-check your local directory to see if any of those files are broken.
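
One hedged way to double-check the local files is to see whether transformers can load them at all. This is a sketch, not from the thread; the local path is the one from the log above:

from transformers import AutoTokenizer

# If this loads and round-trips text, the files are not corrupted; the convert.py
# failure would then be about the vocab *type* (spm vs bpe/hfft), not broken files.
tok = AutoTokenizer.from_pretrained("deepseek-coder-6.7b-instruct-finetuned")
ids = tok("def hello():\n    print('hi')")["input_ids"]
print(len(ids), tok.decode(ids))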

@jackshiwl
Author

The files are not broken. This is an issue for other people as well. In fact, you don't have to quantize a custom DeepSeek model to get this error; if you just quantize the original 7B model, it will throw this error too.

@vlsav

vlsav commented Feb 10, 2024

Same story with the latest set of DeepSeek Math models.
python convert.py deepseek-math-7b-rl --pad-vocab
Loading model file deepseek-math-7b-rl\pytorch_model-00001-of-000002.bin
Loading model file deepseek-math-7b-rl\pytorch_model-00001-of-000002.bin
Loading model file deepseek-math-7b-rl\pytorch_model-00002-of-000002.bin
params = Params(n_vocab=102400, n_embd=4096, n_layer=30, n_ctx=4096, n_ff=11008, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=None, f_rope_freq_base=10000, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=WindowsPath('deepseek-math-7b-rl'))
Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': WindowsPath('deepseek-math-7b-rl/tokenizer.json')}
Loading vocab file 'deepseek-math-7b-rl\tokenizer.json', type 'spm'
Traceback (most recent call last):
  File "D:\Util\llama.cpp\convert.py", line 1478, in <module>
    main()
  File "D:\Util\llama.cpp\convert.py", line 1446, in main
    vocab, special_vocab = vocab_factory.load_vocab(args.vocab_type, model_parent_path)
  File "D:\Util\llama.cpp\convert.py", line 1332, in load_vocab
    vocab = SentencePieceVocab(
  File "D:\Util\llama.cpp\convert.py", line 394, in __init__
    self.sentencepiece_tokenizer = SentencePieceProcessor(str(fname_tokenizer))
  File "D:\Util\miniconda3\envs\llamacpp\lib\site-packages\sentencepiece\__init__.py", line 447, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "D:\Util\miniconda3\envs\llamacpp\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "D:\Util\miniconda3\envs\llamacpp\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: D:\a\sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

python convert.py deepseek-math-7b-rl --vocab-type hfft --pad-vocab
makes a broken model; llama.cpp cannot load it.

python convert.py deepseek-math-7b-rl --vocab-type bpe --pad-vocab
Makes a loadable model, but it generates a lot of garbage and in general very strange output.
Convert shows the following message about vocab generation:
Vocab info: <BpeVocab with 100000 base tokens and 2 added tokens>
Special vocab info: <SpecialVocab with 99757 merges, special tokens {'bos': 100000, 'eos': 100001}, add special tokens {'bos': True, 'eos': False}>
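
A hedged way to see which vocab type actually matches the model is to look inside tokenizer.json; the Hugging Face fast-tokenizer JSON records the model type (BPE, Unigram, ...) plus the base vocab, merges, and added tokens, which is where the counts above come from. Sketch (not from the thread; the path is the one from the log above):

import json

with open("deepseek-math-7b-rl/tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

# Field names follow the Hugging Face tokenizers JSON layout.
print("model type :", tok["model"]["type"])             # e.g. 'BPE' -> not an spm vocab
print("base vocab :", len(tok["model"]["vocab"]))
print("merges     :", len(tok["model"].get("merges", [])))
print("added      :", len(tok.get("added_tokens", [])))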

@RonanKMcGovern

Any insights, @jackshiwl?

@Nold360
Contributor

Nold360 commented Mar 1, 2024

Can confirm this issue. Although it converts the model using vocab-type hfft, the model will not load:

llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 2387/102400 vs 2400/102400 ).
[...]
terminate called after throwing an instance of 'std::out_of_range'
  what():  unordered_map::at
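
The crash appears to come from the loader treating the vocab as SPM and looking up the byte token for newline (<0x0A>), which a BPE vocab does not contain, hence the unordered_map::at throw. A hedged sketch (not from the thread; path taken from the earlier log) to check whether that byte token exists in the exported vocab:

import json

with open("deepseek-math-7b-rl/tokenizer.json", encoding="utf-8") as f:
    vocab = json.load(f)["model"]["vocab"]

# False for a BPE vocab -> the SPM-style newline lookup has nothing to find.
print("<0x0A> present:", "<0x0A>" in vocab)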

@jackshiwl
Author

Hi all, I am not investigating this issue anymore; I am using another model. Hope someone can fix this / look into this, @cmp-nct.

@itsdotscience

itsdotscience commented Mar 10, 2024

It seems there was a change recently that pins bpe to vocab.json. From the HF docs, it looks like any compatible PreTrainedTokenizer that transformers supports could be represented by tokenizer.json:

https://huggingface.co/docs/transformers/en/fast_tokenizers

3 weeks ago, b2213 convert.py output:

Loading vocab file '/ai/models/tokenizer.json', type 'bpe'
Vocab info: <BpeVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>

Current mainline convert.py output:

Loading vocab file PosixPath('/ai/models/tokenizer.json'), type 'hfft'
fname_tokenizer: /ai/models
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Vocab info: <HfVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>

Result running latest main:

llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).

llm_load_print_meta: BOS token        = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token        = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: UNK token        = 0 '!'
llm_load_print_meta: PAD token        = 32014 '<|end▁of▁sentence|>'


terminate called after throwing an instance of 'std::out_of_range'
  what():  unordered_map::at
Aborted (core dumped)

3 weeks ago running main:

llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).

llm_load_print_meta: BOS token        = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token        = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: PAD token        = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token         = 126 'Ä'

We still have our mismatch, but the type is bpe rather than spm. It also produces text as expected, no garbage, rather than a segfault.

Edit: I had another moment, so I tried just copying tokenizer.json to vocab.json and setting vocab-type to bpe.

Loading vocab file PosixPath('/ai/models/vocab.json'), type 'bpe'
/ai/models
Vocab info: <BpeVocab with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>

I confirmed that both b2213 and the current main's convert.py, if you do the above, generate an f32 with an identical sha256 hash.
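
For reference, a hedged sketch of the workaround described above (copy tokenizer.json to vocab.json, convert with --vocab-type bpe, then hash the output); the paths and convert.py flags are the ones used in this thread, everything else is an assumption:

import hashlib
import shutil
import subprocess

model_dir = "/ai/models"  # path from the log above
shutil.copyfile(f"{model_dir}/tokenizer.json", f"{model_dir}/vocab.json")

# Run the conversion with the bpe vocab type, as in the workaround above.
subprocess.run(
    ["python", "llama.cpp/convert.py", model_dir,
     "--vocab-type", "bpe", "--outtype", "f32",
     "--outfile", f"{model_dir}/model.f32.gguf"],
    check=True,
)

def sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the hash produced by the other llama.cpp checkout.
print(sha256(f"{model_dir}/model.f32.gguf"))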

@christopherthompson81

There's a PR from the DeepSeek team about this. Basically, their tokenizer needs to be supported in llama.cpp for this to work.

@hyperbolic-c

Can confirm this issue. Although it converts the model using vocab-type hfft, the model will not load:

llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 2387/102400 vs 2400/102400 ).
[...]
terminate called after throwing an instance of 'std::out_of_range'
  what():  unordered_map::at

@Nold360 Yeah, I got the same error. Did you find any way to solve it? Thanks. It cannot be converted with convert-hf-to-gguf.py either.

@github-actions github-actions bot added the stale label Apr 15, 2024
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.
