
Add a warning when quantizing to IQ3_XXS without an imatrix #5332

Closed · wants to merge 1 commit

Conversation

ymcui (Contributor) commented Feb 5, 2024

I believe we forgot to add IQ3_XXS to the warning list for quantizing without an importance matrix; IQ3_XXS performs extremely badly without an imatrix.

| Llama-2-7B @ IQ3_XXS | PPL |
| --- | --- |
| w/o imatrix | 107.1787 +/- 2.65072 |
| w/ imatrix | 5.5101 +/- 0.09247 |

Note: The imatrix was calculated using the first 100 chunks of wiki.train.raw

Commit: iq3_xxs should not be used without imatrix
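
For context, the guard in question is of roughly this shape (a minimal sketch, not the exact diff from this PR; the function name and the `imatrix_data` parameter are illustrative, while `LLAMA_FTYPE_MOSTLY_IQ3_XXS` is the real enum value from `llama.h`):

```c
#include <stdio.h>
#include "llama.h"

// Sketch only: warn when IQ3_XXS is requested but no importance matrix
// has been loaded. imatrix_data stands in for however the quantize tool
// tracks its loaded imatrix.
static void warn_if_imatrix_missing(enum llama_ftype ftype, const void * imatrix_data) {
    if (imatrix_data == NULL && ftype == LLAMA_FTYPE_MOSTLY_IQ3_XXS) {
        fprintf(stderr, "WARNING: quantizing to IQ3_XXS without an importance "
                        "matrix severely degrades quality (see PPL above)\n");
    }
}
```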
ggerganov (Owner) commented

I suppose @ikawrakow had something in mind when leaving the imatrix optional for IQ3_XXS.

Otherwise, the IQ3_XXS type should also be added here:

llama.cpp/ggml.c, lines 19116 to 19122 at 4833ac2:

```c
bool ggml_quantize_requires_imatrix(enum ggml_type type) {
    return
        type == GGML_TYPE_IQ2_XXS ||
        type == GGML_TYPE_IQ2_XS;
}
```
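
Adding IQ3_XXS there would make the imatrix a hard requirement rather than just a warning; the function would then read (a sketch of the suggested one-line change):

```c
bool ggml_quantize_requires_imatrix(enum ggml_type type) {
    return
        type == GGML_TYPE_IQ2_XXS ||
        type == GGML_TYPE_IQ2_XS  ||
        type == GGML_TYPE_IQ3_XXS;
}
```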

ikawrakow (Contributor) commented

I did indeed have something in mind but forgot to add it. In the repo where I develop new quantization types I have all sorts of tweaks that are not in mainline llama.cpp. One of them is to use a better quantization for ffn_down and attn_v when no imatrix has been provided. Because of that, I was seeing reasonable results for IQ3_XXS even without an imatrix (e.g., PPL = 6.487 for LLaMA-v2-7B with a context of 512). When I added IQ3_XXS to llama.cpp, I forgot about this detail.

I'll make a PR to address the issue.
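
For illustration, the fallback described above might look roughly like this (a hypothetical sketch, not the code from that repo or the follow-up PR; the function name, the have_imatrix flag, and the Q4_K choice are all assumptions):

```c
#include <stdbool.h>
#include <string.h>
#include "ggml.h"

// Hypothetical sketch: with no imatrix available, route the tensors that
// are most sensitive at ~3 bpw (ffn_down, attn_v) to a k-quant that holds
// up without importance weighting; quantize everything else as requested.
static enum ggml_type pick_type(const char * tensor_name, enum ggml_type requested,
                                bool have_imatrix) {
    if (!have_imatrix && requested == GGML_TYPE_IQ3_XXS) {
        if (strstr(tensor_name, "ffn_down") != NULL ||
            strstr(tensor_name, "attn_v")  != NULL) {
            return GGML_TYPE_Q4_K; // assumed fallback; the actual choice may differ
        }
    }
    return requested;
}
```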
