
iq3_xxs: guards for the no-imatrix situation #5334

Merged: 1 commit into master from ik/iq3xxs_noimatrix_guard on Feb 5, 2024

Conversation

ikawrakow (Contributor)

IQ3_XXS can give very poor quantization quality when used without an importance matrix (imatrix); see #5332.

Instead of adding a warning or outright disallowing IQ3_XXS quantization without an imatrix, this PR prevents the bad outcome by using Q3_K for the attn_v tensors and a mix of Q4_K and Q3_K for the ffn_down tensors when no imatrix has been supplied. This results in a somewhat larger quantized model (e.g., 2.61 GiB vs 2.5 GiB for 7B LLaMAs) but a far more reasonable perplexity (e.g., PPL = 5.4923 for LLaMA-v2-7B at a context of 4096, versus 100+ without the guard).
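To make the guard concrete, here is a minimal C++ sketch of the fallback selection described above. The function name, its parameters, and the exact layer split for the Q4_K/Q3_K mix are illustrative assumptions, not the actual code or values from the PR; the real logic lives in llama.cpp's tensor-type selection during quantization.

```cpp
// Minimal sketch (not the actual diff) of the no-imatrix guard for IQ3_XXS.
// The function name, parameters, and the Q4_K/Q3_K split below are
// illustrative assumptions only.
#include <cstdio>
#include <string>

enum class QType { Q3_K, Q4_K, IQ3_XXS };

static QType pick_iq3_xxs_type(const std::string & tensor_name,
                               int i_layer, int n_layer,
                               bool have_imatrix) {
    if (have_imatrix) {
        return QType::IQ3_XXS;      // imatrix supplied: keep IQ3_XXS everywhere
    }
    // No imatrix: guard the most sensitive tensors.
    if (tensor_name.find("attn_v.weight") != std::string::npos) {
        return QType::Q3_K;         // attention V projection falls back to Q3_K
    }
    if (tensor_name.find("ffn_down") != std::string::npos) {
        // Mix of Q4_K and Q3_K across layers (split chosen for illustration).
        return i_layer < n_layer/4 ? QType::Q4_K : QType::Q3_K;
    }
    return QType::IQ3_XXS;          // everything else stays IQ3_XXS
}

int main() {
    // Example: an attn_v tensor with no imatrix falls back to Q3_K (prints 0).
    std::printf("%d\n", (int) pick_iq3_xxs_type("blk.0.attn_v.weight", 0, 32, false));
}
```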

@ggerganov changed the title from "iq3_xxs: quards for the no-imatrix situation" to "iq3_xxs: guards for the no-imatrix situation" on Feb 5, 2024
@ikawrakow merged commit 89503dc into master on Feb 5, 2024
56 checks passed
@ikawrakow deleted the ik/iq3xxs_noimatrix_guard branch on February 5, 2024 at 10:32
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request on Mar 13, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
