
Llama3-8b & Perplexity.exe Issue #7291

Closed

InferenceIllusionist opened this issue May 15, 2024 · 6 comments

InferenceIllusionist commented May 15, 2024

Hi there, creating an issue for a possible bug encountered while running some tests with perplexity.exe on Llama 3 earlier today. The program is exiting without running the KL-divergence calculation.

Steps to Reproduce:
Using the 8b base model, downloaded from https://huggingface.co/meta-llama/Meta-Llama-3-8B, I ran convert-hf-to-gguf.py with the following command:

python convert-hf-to-gguf.py .\models\Meta-Llama-3-8B

So far so good, but then when I try to run perplexity.exe, it exits without outputting anything:

.\perplexity -m models\Meta-Llama-3-8B\ggml-model-f16.gguf -bf .\evaluations\arc-challenge-validation.bin --kl-divergence-base .\models\Meta-Llama-3-8B\Meta-Llama-3-8B-divergence.dat -t 24 -ngl 200

Here's a screenshot from right before it exits; tokenizing the input appears to be the final step reached:

[screenshot: l3-kld]

Is there something I'm missing here? I was able to run this same command for a model based on Mistral 7b without any problems, as shown in the thread linked above. For what it's worth, I'm also not getting any pre-tokenizer warnings when loading the fp16 GGUF.

Quick edit: Verified that this is happening with the f32 and Q8_0 versions as well.
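(For reference, a Q8_0 like the one tested above would typically be produced from the f16 GGUF with llama.cpp's quantize tool; the output path below is illustrative, not taken from this thread:)

.\quantize .\models\Meta-Llama-3-8B\ggml-model-f16.gguf .\models\Meta-Llama-3-8B\ggml-model-Q8_0.gguf Q8_0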

  • Version: b2861 (compiled from source earlier today)
  • Model affected: meta-llama/Meta-Llama-3-8B
@JohannesGaessler (Collaborator)

https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity#perplexity

Once you have the file, supply perplexity with the quantized model, the logits file via --kl-divergence-base, and finally the --kl-divergence argument to indicate that the program should calculate the so-called Kullback-Leibler divergence.
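Concretely, the two-step workflow from that README looks roughly like this (model and file names are illustrative, not taken from this issue): first record the logits of the full-precision model over some text corpus, then point the quantized run at the resulting file:

.\perplexity -m ggml-model-f16.gguf -f input.txt --kl-divergence-base logits-f16.dat
.\perplexity -m ggml-model-Q8_0.gguf --kl-divergence-base logits-f16.dat --kl-divergence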

@InferenceIllusionist (Author)

Hey, thanks for your reply. I took another look at the documentation for perplexity that you linked. These instructions work fine for other models, but the step prior to the one you highlighted (recording the logits from the FP16 version of a model) is not working on L3 for me.

For example when I pass these arguments to perplexity.exe for a model based on Mistral-7b to record the logits:

.\perplexity -m models\Excalibur-7b\ggml-model-f16.gguf -bf .\evaluations\arc-challenge-validation.bin --kl-divergence-base .\models\Excalibur-7b\Excalibur-7b-divergence.dat -ngl 200 -t 24 

I get the expected result, which outputs each chunk as it runs. The final line in the output is a PPL estimate, and the file Excalibur-7b-divergence.dat is created. This is the file that then gets passed, along with the --kl-divergence argument and the quantized model, in the step you mentioned.
[screenshot: excal-kl-base]
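The follow-up step would then look something like this (the Q8_0 path is illustrative; only the logit-recording step is shown above):

.\perplexity -m models\Excalibur-7b\ggml-model-Q8_0.gguf --kl-divergence-base .\models\Excalibur-7b\Excalibur-7b-divergence.dat --kl-divergence -ngl 200 -t 24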

Following these same steps for L3, perplexity exits prematurely before printing the expected next line ('perplexity: tokenization took [x] ms'), so the chunks/PPL estimate are not displayed and no Meta-Llama-3-8B-divergence.dat logits file is created:
[screenshot: l3-ppl]

To be clear, I'm specifying the same arc-challenge-validation.bin binary file in both cases, so the file itself shouldn't be the issue either. It looks like some sort of problem with how the binary file is tokenized, but I'm not seeing any other leads as to what could be happening. Happy to provide further screenshots or clarify any points if needed, and thanks again for your assistance here.

@JohannesGaessler (Collaborator)

Does it work with a plain text file?

@Dampfinchen commented May 15, 2024

It could be related to text formatting. When I was using my usual copy-pasted prompt from Wikipedia, main.exe exited like that too. When I deleted the copy-pasted text and just wrote "Hello", it worked well.

@InferenceIllusionist (Author)

Does it work with a plain text file?

Yes, it does. I was able to run it with groups_merged.txt without issues just as a test:
[screenshot: l3-kld-base-groups-merged]
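For reference, that test run was along these lines; the exact invocation isn't shown in the screenshot, but with a plain text file the input goes through -f rather than -bf (paths illustrative):

.\perplexity -m models\Meta-Llama-3-8B\ggml-model-f16.gguf -f .\groups_merged.txt --kl-divergence-base .\models\Meta-Llama-3-8B\Meta-Llama-3-8B-divergence.dat -t 24 -ngl 200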

That said, using groups_merged.txt to calculate the KL-divergence base would not be ideal in this case, since that file is also expected to be used in some capacity for calculating the importance matrix. And since it's not an evaluation dataset, sticking with the arc-challenge-validation binary file that has worked for other kinds of models would be better if at all possible.

When I deleted the copy pasted text and just wrote "Hello", it worked well.

I also tried typing the command from the first post manually, to make sure there were no invisible characters messing anything up, but I'm still getting the same result. I think it's something to do with how tokenization handles the binary file, since the process does kick off and finish when using groups_merged.txt.

@InferenceIllusionist (Author)

InferenceIllusionist commented May 16, 2024

Following up here: the solution was to convert the arc-challenge binary file to .json. With that, I was able to get the baseline KLD, and the resulting divergence.dat file was populated as expected.
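For anyone landing here later: the exact conversion tooling isn't shown in this thread, and the JSON schema perplexity expects isn't given here either, but one plausible way to regenerate the ARC-Challenge validation data as JSON is via the Hugging Face datasets library. The field names below mirror the raw dataset and are an assumption, not the confirmed format:

import json
from datasets import load_dataset

# Pull the ARC-Challenge validation split from Hugging Face
# (assumption: this is the dataset behind arc-challenge-validation.bin).
ds = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="validation")

# Flatten each record; these field names follow the raw dataset, not a
# format that perplexity is confirmed to accept.
records = [
    {
        "question": row["question"],
        "choices": row["choices"]["text"],
        "labels": row["choices"]["label"],
        "answer": row["answerKey"],
    }
    for row in ds
]

with open("arc-challenge-validation.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)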

Should be good to close here, appreciate everyone's input.
