[ggml-quants] Add memsets and other fixes for IQ quants#19861

Merged
ggerganov merged 4 commits into ggml-org:master from bartowski1182:master on Mar 6, 2026

Conversation

@bartowski1182
Contributor

@bartowski1182 bartowski1182 commented Feb 24, 2026

While trying to stop my Qwen3.5 quants from producing a flood of "Oops: found point X not on grid ..." errors, I (with Claude's help) came across a potentially significant issue.

Using gdb, it seems that L is often backed by non-zeroed memory, so when it's read it contains garbage data that sends the quantization awry whenever the search finds no candidates.

With this change I no longer saw ANY "Oops: found point.." errors with Qwen3.5, and the PPL is as expected.

This affected IQ2_XXS, IQ2_XS, IQ3_XXS, and IQ3_S. It seems that it's most often caused when the imatrix data contains a full block of 0s (as is common in MoE models with a large number of experts) or when the block itself is all 0s (for a particularly sparse model)

Now, memset is called on L before it is used.

Additional changes:

L -> Laux: basically a no-op; we only use the returned scale, so either L or Laux works here, since L is immediately memset right after. This change just makes it more obvious that this is a scratch buffer, but I don't mind leaving this line out.

eff_max <= 0 check: guards against dividing by 0 when the scale is 0; we set the scales to 0 and exit early, the same as when max < GROUP_MAX_EPS is triggered.

PPL tested on Qwen3-Coder-Next:

PPL IQ2_XS before:

Final estimate: PPL = 9.3646 +/- 0.16983

PPL IQ2_XS after:

Final estimate: PPL = 9.3703 +/- 0.17014

Difference: +0.0057

IQ2_XXS is now possible to make (with my imatrix at least) whereas before I got:

Oops: found point 16384 not on grid: 0 0 0 0 0 0 0 1
[   9/ 843]           blk.0.ffn_gate_exps.weight - [ 2048,   512,   512,     1], type =   bf16, converting to iq2_xs .. /llama.cpp/ggml/src/ggml-quants.c:3355: fatal error
libggml-base.so.0(+0x1848b)[0x70bdefd5648b]
libggml-base.so.0(ggml_print_backtrace+0x21f)[0x70bdefd568ef]
libggml-base.so.0(ggml_abort+0x152)[0x70bdefd56ac2]
libggml-base.so.0(+0x46b8c)[0x70bdefd84b8c]
libggml-base.so.0(quantize_iq2_xs+0x81)[0x70bdefda3d21]

For sanity, PPL IQ2_XXS:

Final estimate: PPL = 10.4755 +/- 0.19246

For IQ1_S and IQ1_M:

Similar to the above, the assert fires when we have all-zero weights, meaning the search loop failed to find a candidate. We now treat it as a zero block and populate the scales and shifts to avoid uninitialized memory.

This was the assert I was seeing:

blk.7.ffn_down_exps.weight - [ 1024,  4096,   512,     1], type =   bf16, converting to iq1_m .. /llama.cpp/ggml/src/ggml-quants.c:4534: GGML_ASSERT(besti1 >= 0 && besti2 >= 0 && best_k >= 0) failed

PPL IQ1_S before:

Final estimate: PPL = 15.6349 +/- 0.31023

PPL IQ1_S after:

Final estimate: PPL = 15.6509 +/- 0.31119

Difference: +0.016

ALL PPL calculations done with:

./llama-perplexity -m /models/Qwen3-Coder-Next.gguf --flash-attn on -f /training_data/wiki.test.raw --chunks 100 --batch-size 4096 --ubatch-size 4096

I used Qwen3-Coder-Next because I had existing "before" quants to compare against, and IQ2_XXS previously failed with the "Oops" crash.

Note: I think the PPL change comes from changes to the model conversion; I ran PPL against the full bf16 model.

Conversion done with old release b7936:

Final estimate: PPL = 8.0452 +/- 0.15022

Conversion from master:

Final estimate: PPL = 8.0508 +/- 0.15029

Difference: +0.0056

(in hindsight, wikitext for a coding model wasn't the best dataset... willing to redo the PPLs against a different dataset if necessary)

Disclaimer:

I don't fully understand this area of the code yet. The memset fix is obvious and had immediate, tangible improvements, but the rest is outside my wheelhouse. I'd really appreciate another set of experienced eyes on this; I've provided my test results to show that nothing has been broken, and I have reviewed the change to the best of my ability.

If any additional tests should be run before people are willing to review, I'll happily run them.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Feb 24, 2026
@bartowski1182 bartowski1182 changed the title [WIP] Add memsets and other fixes for IQ quants - NEED REVIEW [ggml-quants] Add memsets and other fixes for IQ quants - NEED REVIEW Feb 24, 2026
@bartowski1182 bartowski1182 changed the title [ggml-quants] Add memsets and other fixes for IQ quants - NEED REVIEW [ggml-quants] Add memsets and other fixes for IQ quants Feb 24, 2026
@compilade compilade self-requested a review February 24, 2026 20:24
@bartowski1182
Contributor Author

bartowski1182 commented Feb 25, 2026

So strangely after that change, IQ2_XXS PPL went down:

~~Final estimate: PPL = 10.2589 +/- 0.18574~~

Difference: -0.2166

but IQ2_XS PPL went up (pretty substantially):

~~Final estimate: PPL = 9.6630 +/- 0.17577~~

Difference: +0.2927

Not sure what to take from that..

Wait, I think I may have used a wrong branch versus the original ones I was testing, ignore this for now

@bartowski1182
Contributor Author

Okay so with the change, IQ2_XS PPL:

Final estimate: PPL = 9.3747 +/- 0.17023

Difference: +0.0044

IQ2_XXS PPL:

10.4916 +/- 0.19312

Difference: +0.0161

So strangely it's worse but only just barely 🤷

@bartowski1182
Contributor Author

bartowski1182 commented Feb 25, 2026

Okay so after compiling again with -O0 to avoid optimizations introducing random noise, the PPL from my initial commit compared to my most recent is identical, so I think the PPL differences (from iterating) can be safely ignored.

I think clearing the L value after make_qp_quants was also effectively a no-op

If scale is negative, as @TheAIBot noted, eff_max will be negative and we'll exit early, so scale is guaranteed to be positive

Then, scale is updated right before Laux is copied onto L

L is then updated if scale > 0, and the only time that couldn't happen is if this made scale negative:

                if (sumq2 > 0 && sumqx*sumqx > best*sumq2) {
                    scale = sumqx/sumq2; best = scale*sumqx;
                    memcpy(L, Laux, 32);
                }

But then as I mentioned, we memcpy onto L anyways

So in summary: the current state is clearer than it was, but the result is effectively identical :)

@bartowski1182 bartowski1182 marked this pull request as ready for review February 25, 2026 18:11
@bartowski1182
Contributor Author

@ggerganov ready to review, I can provide more of a TLDR if you don't want to dig through my larger posts

@bartowski1182
Contributor Author

@ggerganov bump in case you missed it :)

@ggerganov
Member

For normal models that do not have degenerate blocks full of 0s does the PPL match exactly before and after?

@bartowski1182
Contributor Author

bartowski1182 commented Mar 5, 2026

For normal models that do not have degenerate blocks full of 0s does the PPL match exactly before and after?

Tested on Qwen3.5-35B-A3B IQ2_XXS and actually the sha256sum is identical with and without this change (compiled with -O0)

PPL also matches (which I went to test before trying the sha256sum)

edit: Will also test IQ1_M/S since that change is different

@bartowski1182
Contributor Author

Decided to test all just to be sure @ggerganov

IQ1_M sha256sum didn't match, but the PPL did:

Final estimate: PPL = 13.4711 +/- 0.09659

Manually setting shifts[ib] = 0; in the max < GROUP_MAX_EPS_IQ1_M path must result in a tiny but irrelevant change

IQ1_S can't differ, because the only case where it could is one that previously triggered the assert; that said, I ran the comparisons anyway

IQ1_S, IQ3_XXS, IQ3_XS, IQ3_S, IQ2_S, IQ2_XS, and IQ2_XXS all have identical sha256sum before and after this PR on Qwen3.5-35B-A3B

@ggerganov ggerganov merged commit 649f064 into ggml-org:master Mar 6, 2026
78 checks passed