Conversation

@Alcpz (Contributor) commented Nov 26, 2025

For small shapes where the number of columns is small (e.g. 16), the current logic skipped some chunks due to rounding.

The issue was observed with NB_COLS 8 and ne01 16, and could potentially happen with NB_COLS 4 and other thread/shape combinations. This also affected the corner case where chunking is disabled.
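
For context, here is a minimal standalone sketch of the rounding problem (simplified chunking with hypothetical variable names, not the actual llama.cpp code): if the per-thread chunk width is floored to a multiple of NB_COLS, a small ne01 can yield a width of 0 and the trailing columns are never processed, whereas ceiling-based sizing covers the full range.

```c
/*
 * Illustrative sketch only (simplified chunking with hypothetical names,
 * not the actual llama.cpp code).
 */
#include <stdio.h>

int main(void) {
    const int NB_COLS = 8;  /* columns processed per kernel tile      */
    const int ne01    = 16; /* total output columns (the small shape) */
    const int nth     = 3;  /* worker threads splitting the columns   */

    /* Floor-based sizing: divide the columns across threads, then round
     * the chunk width *down* to a multiple of NB_COLS. For ne01 = 16 and
     * nth = 3 this yields 0, so no chunk covers any column at all. */
    int chunk_floor = (ne01 / nth) / NB_COLS * NB_COLS;

    /* Ceiling-based sizing rounds *up* instead, so every column falls
     * inside some chunk (the last chunk is simply clamped to ne01). */
    int chunk_ceil = ((ne01 + nth - 1) / nth + NB_COLS - 1) / NB_COLS * NB_COLS;

    printf("floor-based chunk width: %d\n", chunk_floor); /* prints 0 */
    printf("ceil-based  chunk width: %d\n", chunk_ceil);  /* prints 8 */
    return 0;
}
```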

@max-krasnyansky I checked the performance here and didn't see any issues. Let me know if you'd like me to run any particular test.

Performance

RPi 5

| model | test | 2f416b2 (7162) t/s | 3e18dba (7161) t/s |
| --- | --- | ---: | ---: |
| lfm2 350M Q4_0 | pp256 | 174.46 ± 0.07 | 173.41 ± 0.64 |
| lfm2 350M Q4_0 | tg128 | 51.58 ± 0.03 | 51.38 ± 0.26 |
| lfm2 700M Q4_0 | pp256 | 81.79 ± 0.01 | 82.55 ± 0.03 |
| lfm2 700M Q4_0 | tg128 | 25.78 ± 0.00 | 25.86 ± 0.00 |

M4 Max

| model | test | 2f416b2 (7162) t/s | 3e18dba (7161) t/s |
| --- | --- | ---: | ---: |
| lfm2 1.2B Q4_K Medium | pp256 | 682.39 ± 3.23 | 682.82 ± 2.97 |
| lfm2 1.2B Q4_K Medium | tg128 | 233.77 ± 4.45 | 234.96 ± 0.57 |
| lfm2 700M Q4_K Medium | pp256 | 1070.08 ± 2.77 | 1067.29 ± 7.14 |
| lfm2 700M Q4_K Medium | tg128 | 331.12 ± 1.27 | 333.13 ± 1.32 |
| llama 8B Q4_K Medium | pp256 | 100.26 ± 0.11 | 96.65 ± 1.75 |
| llama 8B Q4_K Medium | tg128 | 43.10 ± 0.50 | 41.69 ± 0.72 |
| qwen3 8B Q4_K Medium | pp256 | 94.40 ± 0.33 | 90.45 ± 0.34 |
| qwen3 8B Q4_K Medium | tg128 | 40.92 ± 0.33 | 40.29 ± 0.27 |

@max-krasnyansky (Collaborator) commented

Looks good to me. It's funny how many little corner cases we ended up having to deal with.
The original logic I added (i.e. 4x chunks per thread) seemed so simple and bulletproof :)

Tested on my Snapdragon Gen5 with a bunch of models (llama-3.2-1/2B, qwen3-0.6B .. 8B, LFM2s, ...).
nchunk selection looks good and the overall performance is the same. Merging ...

@max-krasnyansky merged commit 5449367 into ggml-org:master Nov 26, 2025
70 of 74 checks passed
@Alcpz (Contributor, Author) commented Nov 27, 2025

Yeah, totally. I guess these smaller cases aren't representative of the models that are out there, which is why we don't run into them. Thanks for the review and merge!

@Alcpz deleted the Alcpz/mul_mat_chunk_fix branch November 27, 2025 12:04
am17an pushed a commit to am17an/llama.cpp that referenced this pull request Nov 27, 2025

Labels

ggml (changes relating to the ggml tensor library for machine learning)
