
Add a warning when quantizing to IQ3_XXS without an imatrix #5332

Closed · wants to merge 1 commit

Conversation

ymcui (Contributor) commented Feb 5, 2024

I believe we forgot to add IQ3_XXS to the warning list for quantizing without an importance matrix; IQ3_XXS performs extremely badly without an imatrix.

| Llama-2-7B @ IQ3_XXS | PPL |
| --- | --- |
| w/o imatrix | 107.1787 +/- 2.65072 |
| w/ imatrix | 5.5101 +/- 0.09247 |

Note: The imatrix was calculated using the first 100 chunks of wiki.train.raw

Commit: iq3_xxs should not be used without imatrix
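
For context, the guard in question is of roughly this shape (a minimal sketch, not the exact diff from this PR; the function name and the `imatrix_data` parameter are illustrative, while `LLAMA_FTYPE_MOSTLY_IQ3_XXS` is the real enum value from `llama.h`):

```c
#include <stdio.h>
#include "llama.h"

// Sketch only: warn when IQ3_XXS is requested but no importance matrix
// has been loaded. imatrix_data stands in for however the quantize tool
// tracks its loaded imatrix.
static void warn_if_imatrix_missing(enum llama_ftype ftype, const void * imatrix_data) {
    if (imatrix_data == NULL && ftype == LLAMA_FTYPE_MOSTLY_IQ3_XXS) {
        fprintf(stderr, "WARNING: quantizing to IQ3_XXS without an importance "
                        "matrix severely degrades quality (see PPL above)\n");
    }
}
```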
ggerganov (Owner) commented

I suppose @ikawrakow had something in mind when leaving the imatrix optional for IQ3_XXS.

Otherwise, the IQ3_XXS type should also be added here:

llama.cpp/ggml.c, lines 19116 to 19122 at 4833ac2:

```c
bool ggml_quantize_requires_imatrix(enum ggml_type type) {
    return
        type == GGML_TYPE_IQ2_XXS ||
        type == GGML_TYPE_IQ2_XS;
}
```
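
Adding IQ3_XXS there would make the imatrix a hard requirement rather than just a warning; the function would then read (a sketch of the suggested one-line change):

```c
bool ggml_quantize_requires_imatrix(enum ggml_type type) {
    return
        type == GGML_TYPE_IQ2_XXS ||
        type == GGML_TYPE_IQ2_XS  ||
        type == GGML_TYPE_IQ3_XXS;
}
```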

ikawrakow (Contributor) commented

I did indeed have something in mind but forgot to add it. In the repo where I develop new quantization types I have all sorts of tweaks that are not in mainline llama.cpp. One of them is to use a better quantization for ffn_down and attn_v when no imatrix has been provided. Because of that, I was seeing reasonable results for IQ3_XXS even without an imatrix (e.g., PPL = 6.487 for LLaMA-v2-7B with a context of 512). When I added IQ3_XXS to llama.cpp, I forgot about this detail.

I'll make a PR to address the issue.
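
For illustration, the fallback described above might look roughly like this (a hypothetical sketch, not the code from that repo or the follow-up PR; the function name, the have_imatrix flag, and the Q4_K choice are all assumptions):

```c
#include <stdbool.h>
#include <string.h>
#include "ggml.h"

// Hypothetical sketch: with no imatrix available, route the tensors that
// are most sensitive at ~3 bpw (ffn_down, attn_v) to a k-quant that holds
// up without importance weighting; quantize everything else as requested.
static enum ggml_type pick_type(const char * tensor_name, enum ggml_type requested,
                                bool have_imatrix) {
    if (!have_imatrix && requested == GGML_TYPE_IQ3_XXS) {
        if (strstr(tensor_name, "ffn_down") != NULL ||
            strstr(tensor_name, "attn_v")  != NULL) {
            return GGML_TYPE_Q4_K; // assumed fallback; the actual choice may differ
        }
    }
    return requested;
}
```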
