
Bug: GGUF of Llama 3 8B appears to use smaug-bpe pretokenizer? #7724

Closed
jim-plus opened this issue Jun 4, 2024 · 6 comments
Labels
bug-unconfirmed, high severity (used to report high-severity bugs in llama.cpp: malfunctioning hinders important workflows)

Comments


jim-plus commented Jun 4, 2024

What happened?

Although running convert-hf-to-gguf.py and then quantize completed without errors and appears to generate GGUFs of the correct size for Llama 3 8B, the resulting files report the smaug-bpe pre-tokenizer. They will not load in current ooba (text-generation-webui). Unsure whether this is a temporary mismatch.

Example of broken quants here:
https://huggingface.co/grimjim/Llama-3-Luminurse-v0.1-OAS-8B-GGUF/tree/main

I tried two conversion methods, and the incompatible result occurred in both cases:

python llama.cpp/convert-hf-to-gguf.py ./text-generation-webui/models/xp98-8B --outfile temp.gguf --outtype f32
llama.cpp\build\bin\release\quantize temp.gguf ./text-generation-webui/models/xp98b.Q8_0.gguf q8_0

python llama.cpp/convert-hf-to-gguf.py ./text-generation-webui/models/xp98-8B --outfile temp.gguf --outtype bf16
llama.cpp\build\bin\release\quantize temp.gguf ./text-generation-webui/models/xp98b.Q8_0.gguf q8_0
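For reference, which pre-tokenizer a GGUF actually ended up with can be checked by reading the tokenizer.ggml.pre metadata key with the gguf Python package that ships in llama.cpp/gguf-py. A minimal sketch, assuming the package is importable (e.g. pip install gguf); the parts/data indexing reflects my reading of GGUFReader and may differ between versions:

# sketch: print the pre-tokenizer recorded in a GGUF file
from gguf import GGUFReader

reader = GGUFReader("temp.gguf")  # output of the convert step above
field = reader.get_field("tokenizer.ggml.pre")
if field is None:
    print("tokenizer.ggml.pre is not set")
else:
    # for string-valued keys, data[0] indexes the raw UTF-8 bytes inside parts
    print(bytes(field.parts[field.data[0]]).decode("utf-8"))

For Llama 3 8B the expected value is llama-bpe; the quants linked above report smaug-bpe instead.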

Name and Version

version: 3070 (3413ae2)
built with MSVC 19.39.33521.0 for x64

What operating system are you seeing the problem on?

No response

Relevant log output

No response

jim-plus added the bug-unconfirmed and high severity labels on Jun 4, 2024

jim-plus commented Jun 5, 2024

Should the pre-tokenizer not be llama-bpe?

@morrissas

I have the same problem.
[Screenshot 2024-06-09 212151]

@jabberjabberjabber

It appears this happens when the model creator uses the wrong tokenizer config.

You can fix it using this script:

llama.cpp/gguf-py/scripts$ python gguf-new-metadata.py --pre-tokenizer llama-bpe input output
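Note that, as far as I can tell, gguf-new-metadata.py does not edit the file in place: it writes a new GGUF to the output path with the overridden key and copies the tensors over unchanged, so you need disk space for a second copy of the model.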


jim-plus commented Jun 11, 2024

Confirmed that the following works with the version I pulled yesterday, and the corrected GGUF loads in current ooba. That still leaves the question of why smaug-bpe was selected for Llama 3 8B instead of llama-bpe. The choice appears to be made in convert-hf-to-gguf.py, and there is no apparent command-line option to override it.

python llama.cpp/gguf-py/scripts/gguf-new-metadata.py --pre-tokenizer llama-bpe input_gguf output_gguf
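For anyone wondering why smaug-bpe gets chosen in the first place: convert-hf-to-gguf.py detects the pre-tokenizer in get_vocab_base_pre() by tokenizing a fixed test string with the model's Hugging Face tokenizer, hashing the resulting token IDs, and looking that hash up in a table of known models. The sketch below only illustrates the shape of that check; the test string and hashes are placeholders, not the script's real values. A merge that ships Smaug's tokenizer files will hash to Smaug's entry and get tagged smaug-bpe.

# illustrative sketch of the detection in get_vocab_base_pre() of convert-hf-to-gguf.py;
# the test string and hashes below are placeholders, not the script's real values
from hashlib import sha256

def detect_pre_tokenizer(tokenizer):
    chktxt = "Hello World 123 !?"  # placeholder; the real script uses a long mixed-script text
    chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
    known_hashes = {
        "<hash produced by the Llama 3 tokenizer>": "llama-bpe",  # placeholder
        "<hash produced by the Smaug tokenizer>": "smaug-bpe",    # placeholder
    }
    return known_hashes.get(chkhsh)  # None means an unrecognized tokenizer

Since there is no command-line flag to override the result, patching the metadata afterwards (or fixing the tokenizer files in the source repo) is the workaround.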

Galunid (Collaborator) commented Jun 11, 2024

I'll close the issue, since this happens on the model creator's side.

@Galunid Galunid closed this as completed Jun 11, 2024
@jim-plus

Sure. I have a more focused question regarding convert-hf-to-gguf.py and its choice of default pre-tokenizer, but I will explore the script more before asking it.
