
Bug: GGUF of Llama 3 8B appears to use smaug-bpe pretokenizer? #7724

Closed
jim-plus opened this issue Jun 4, 2024 · 6 comments
Labels
bug-unconfirmed, high severity (used to report high-severity bugs in llama.cpp: malfunctioning hinders important workflows)

Comments


jim-plus commented Jun 4, 2024

What happened?

Although running convert-hf-to-gguf.py and then quantize completed without errors and appears to generate GGUFs of the correct size for Llama 3 8B, the resulting files report the smaug-bpe pre-tokenizer. They will not load in current ooba (text-generation-webui). Unsure whether this is a temporary mismatch.

Example of broken quants here:
https://huggingface.co/grimjim/Llama-3-Luminurse-v0.1-OAS-8B-GGUF/tree/main

I tried two conversion methods, and the incompatible result occurred in both cases:

python llama.cpp/convert-hf-to-gguf.py ./text-generation-webui/models/xp98-8B --outfile temp.gguf --outtype f32
llama.cpp\build\bin\release\quantize temp.gguf ./text-generation-webui/models/xp98b.Q8_0.gguf q8_0

python llama.cpp/convert-hf-to-gguf.py ./text-generation-webui/models/xp98-8B --outfile temp.gguf --outtype bf16
llama.cpp\build\bin\release\quantize temp.gguf ./text-generation-webui/models/xp98b.Q8_0.gguf q8_0
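For reference, which pre-tokenizer a GGUF actually ended up with can be checked by reading the tokenizer.ggml.pre metadata key with the gguf Python package that ships in llama.cpp/gguf-py. A minimal sketch, assuming the package is importable (e.g. pip install gguf); the parts/data indexing reflects my reading of GGUFReader and may differ between versions:

# sketch: print the pre-tokenizer recorded in a GGUF file
from gguf import GGUFReader

reader = GGUFReader("temp.gguf")  # output of the convert step above
field = reader.get_field("tokenizer.ggml.pre")
if field is None:
    print("tokenizer.ggml.pre is not set")
else:
    # for string-valued keys, data[0] indexes the raw UTF-8 bytes inside parts
    print(bytes(field.parts[field.data[0]]).decode("utf-8"))

For Llama 3 8B the expected value is llama-bpe; the quants linked above report smaug-bpe instead.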

Name and Version

version: 3070 (3413ae2)
built with MSVC 19.39.33521.0 for x64

What operating system are you seeing the problem on?

No response

Relevant log output

No response

jim-plus added the bug-unconfirmed and high severity labels on Jun 4, 2024

jim-plus commented Jun 5, 2024

Should the pre-tokenizer not be llama-bpe?

@morrissas

I have the same problem.
[Screenshot 2024-06-09 212151]

@jabberjabberjabber

It appears this happens when the model creator uses the wrong tokenizer config.

You can fix it using this script:

llama.cpp/gguf-py/scripts$ python gguf-new-metadata.py --pre-tokenizer llama-bpe input output
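Note that, as far as I can tell, gguf-new-metadata.py does not edit the file in place: it writes a new GGUF to the output path with the overridden key and copies the tensors over unchanged, so you need disk space for a second copy of the model.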


jim-plus commented Jun 11, 2024

Confirmed that the following works with the version I pulled yesterday, and the corrected GGUF loads in current ooba. That still leaves the question of why smaug-bpe was selected for Llama 3 8B instead of llama-bpe. The choice appears to be made in convert-hf-to-gguf.py, and there is no apparent command-line option to override it.

python llama.cpp/gguf-py/scripts/gguf-new-metadata.py --pre-tokenizer llama-bpe input_gguf output_gguf
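For anyone wondering why smaug-bpe gets chosen in the first place: convert-hf-to-gguf.py detects the pre-tokenizer in get_vocab_base_pre() by tokenizing a fixed test string with the model's Hugging Face tokenizer, hashing the resulting token IDs, and looking that hash up in a table of known models. The sketch below only illustrates the shape of that check; the test string and hashes are placeholders, not the script's real values. A merge that ships Smaug's tokenizer files will hash to Smaug's entry and get tagged smaug-bpe.

# illustrative sketch of the detection in get_vocab_base_pre() of convert-hf-to-gguf.py;
# the test string and hashes below are placeholders, not the script's real values
from hashlib import sha256

def detect_pre_tokenizer(tokenizer):
    chktxt = "Hello World 123 !?"  # placeholder; the real script uses a long mixed-script text
    chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
    known_hashes = {
        "<hash produced by the Llama 3 tokenizer>": "llama-bpe",  # placeholder
        "<hash produced by the Smaug tokenizer>": "smaug-bpe",    # placeholder
    }
    return known_hashes.get(chkhsh)  # None means an unrecognized tokenizer

Since there is no command-line flag to override the result, patching the metadata afterwards (or fixing the tokenizer files in the source repo) is the workaround.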

Galunid (Collaborator) commented Jun 11, 2024

I'll close the issue, since this happens on the model creator's side.

@Galunid Galunid closed this as completed Jun 11, 2024
@jim-plus

Sure. I have a more focused question regarding convert-hf-to-gguf.py and its choice of default pre-tokenizer, but I will explore the script more before asking it.
