
Llama3-8b & Perplexity.exe Issue #7291

Closed

InferenceIllusionist opened this issue May 15, 2024 · 6 comments

InferenceIllusionist commented May 15, 2024

Hi there, creating an issue for a possible bug encountered while running some tests with perplexity.exe on Llama 3 earlier today. The program is exiting without running the KL-divergence calculation.

Steps to Reproduce:
Using the 8b base model, downloaded from https://huggingface.co/meta-llama/Meta-Llama-3-8B, I ran convert-hf-to-gguf.py with the following command:

python convert-hf-to-gguf.py .\models\Meta-Llama-3-8B

So far so good, but then when I try to run perplexity.exe, it exits without outputting anything:

.\perplexity -m models\Meta-Llama-3-8B\ggml-model-f16.gguf -bf .\evaluations\arc-challenge-validation.bin --kl-divergence-base .\models\Meta-Llama-3-8B\Meta-Llama-3-8B-divergence.dat -t 24 -ngl 200

Here's a screenshot from right before it exits; tokenizing the input appears to be the final step reached:

[screenshot: l3-kld]

Is there something I'm missing here? I was able to run this same command for a model based on Mistral 7b without any problems, as shown in the thread linked above. For what it's worth, I'm also not getting any pre-tokenizer warnings when loading the fp16 GGUF.

Quick edit: Verified that this is happening with the f32 and Q8_0 versions as well.
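(For reference, a Q8_0 like the one tested above would typically be produced from the f16 GGUF with llama.cpp's quantize tool; the output path below is illustrative, not taken from this thread:)

.\quantize .\models\Meta-Llama-3-8B\ggml-model-f16.gguf .\models\Meta-Llama-3-8B\ggml-model-Q8_0.gguf Q8_0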

  • Version: b2861 (compiled from source earlier today)
  • Model affected: meta-llama/Meta-Llama-3-8B
@JohannesGaessler (Collaborator)

https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity#perplexity

Once you have the file, supply perplexity with the quantized model, the logits file via --kl-divergence-base, and finally the --kl-divergence argument to indicate that the program should calculate the so-called Kullback-Leibler divergence.
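Concretely, the two-step workflow from that README looks roughly like this (model and file names are illustrative, not taken from this issue): first record the logits of the full-precision model over some text corpus, then point the quantized run at the resulting file:

.\perplexity -m ggml-model-f16.gguf -f input.txt --kl-divergence-base logits-f16.dat
.\perplexity -m ggml-model-Q8_0.gguf --kl-divergence-base logits-f16.dat --kl-divergence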

@InferenceIllusionist (Author)

Hey, thanks for your reply. I took another look at the documentation for perplexity that you linked. These instructions work fine for other models, but the step prior to the one you highlighted (recording the logits from the FP16 version of a model) is not working on L3 for me.

For example when I pass these arguments to perplexity.exe for a model based on Mistral-7b to record the logits:

.\perplexity -m models\Excalibur-7b\ggml-model-f16.gguf -bf .\evaluations\arc-challenge-validation.bin --kl-divergence-base .\models\Excalibur-7b\Excalibur-7b-divergence.dat -ngl 200 -t 24 

I get the expected result, which outputs each chunk as it runs. The final line in the output is a PPL estimate, and the file Excalibur-7b-divergence.dat is created. This is the file that then gets passed, along with the --kl-divergence argument and the quantized model, in the step you mentioned.
[screenshot: excal-kl-base]
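The follow-up step would then look something like this (the Q8_0 path is illustrative; only the logit-recording step is shown above):

.\perplexity -m models\Excalibur-7b\ggml-model-Q8_0.gguf --kl-divergence-base .\models\Excalibur-7b\Excalibur-7b-divergence.dat --kl-divergence -ngl 200 -t 24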

Following these same steps for L3, perplexity exits prematurely before printing the expected next line ('perplexity: tokenization took [x] ms'), so the chunks/PPL estimate are not displayed and no Meta-Llama-3-8B-divergence.dat logits file is created:
[screenshot: l3-ppl]

To be clear, I'm specifying the same arc-challenge-validation.bin binary file in both cases, so the file itself shouldn't be the issue either. It looks like some sort of problem with how the binary file is tokenized, but I'm not seeing any other leads as to what could be happening. Happy to provide further screenshots or clarify any points if needed, and thanks again for your assistance here.

@JohannesGaessler (Collaborator)

Does it work with a plain text file?

@Dampfinchen commented May 15, 2024

It could be related to text formatting. When I was using my usual copy-pasted prompt from Wikipedia, main.exe exited like that too. When I deleted the copy-pasted text and just wrote "Hello", it worked well.

@InferenceIllusionist (Author)

Does it work with a plain text file?

Yes, it does. I was able to run it with groups_merged.txt without issues just as a test:
[screenshot: l3-kld-base-groups-merged]
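For reference, that test run was along these lines; the exact invocation isn't shown in the screenshot, but with a plain text file the input goes through -f rather than -bf (paths illustrative):

.\perplexity -m models\Meta-Llama-3-8B\ggml-model-f16.gguf -f .\groups_merged.txt --kl-divergence-base .\models\Meta-Llama-3-8B\Meta-Llama-3-8B-divergence.dat -t 24 -ngl 200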

That said, using groups_merged.txt to calculate the KL-divergence base would not be ideal in this case, since that file is also expected to be used in some capacity for calculating the importance matrix. And since it's not an evaluation dataset, sticking with the arc-challenge-validation binary file that has worked for other kinds of models would be better if at all possible.

When I deleted the copy pasted text and just wrote "Hello", it worked well.

I also tried typing the command from the first post manually, to make sure there were no invisible characters messing anything up, but I'm still getting the same result. I think it's something to do with how tokenization handles the binary file, since the process does kick off and finish when using groups_merged.txt.

@InferenceIllusionist (Author)

InferenceIllusionist commented May 16, 2024

Following up here: the solution was to convert the arc-challenge binary file to .json. With that, I was able to get the baseline KLD, and the resulting divergence.dat file was populated as expected.
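For anyone landing here later: the exact conversion tooling isn't shown in this thread, and the JSON schema perplexity expects isn't given here either, but one plausible way to regenerate the ARC-Challenge validation data as JSON is via the Hugging Face datasets library. The field names below mirror the raw dataset and are an assumption, not the confirmed format:

import json
from datasets import load_dataset

# Pull the ARC-Challenge validation split from Hugging Face
# (assumption: this is the dataset behind arc-challenge-validation.bin).
ds = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="validation")

# Flatten each record; these field names follow the raw dataset, not a
# format that perplexity is confirmed to accept.
records = [
    {
        "question": row["question"],
        "choices": row["choices"]["text"],
        "labels": row["choices"]["label"],
        "answer": row["answerKey"],
    }
    for row in ds
]

with open("arc-challenge-validation.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)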

Should be good to close here, appreciate everyone's input.
