metal : fix mul-mm condition + fix mul-mv permuted kernels #16494

ggerganov · 2025-10-10T07:28:42Z

From the llama-batched-bench numbers in #16490 (comment), I realized that the wrong Metal kernels were being used for SSM models. This PR also fixes some mul-mv kernels for the case where src0 is permuted.

make -j && ./bin/llama-batched-bench -hf unsloth/granite-4.0-h-micro-GGUF:Q4_0 -c 4096 -b 2048 -ub 512 -npp 512 -ntg 128 -npl 1,2,4 -ngl 99 -fa on

Before

PP	TG	B	N_KV	T_PP s	S_PP t/s	T_TG s	S_TG t/s	T s	S t/s
512	128	1	640	0.351	1458.49	1.358	94.25	1.709	374.45
512	128	2	1280	0.670	1527.90	3.064	83.54	3.735	342.75
512	128	4	2560	1.347	1520.46	4.693	109.10	6.040	423.86

After

PP	TG	B	N_KV	T_PP s	S_PP t/s	T_TG s	S_TG t/s	T s	S t/s
512	128	1	640	0.337	1520.61	1.359	94.16	1.696	377.33
512	128	2	1280	0.682	1501.49	1.976	129.57	2.658	481.61
512	128	4	2560	1.347	1520.02	2.895	176.89	4.242	603.51
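
To make the change easier to read, the token-generation speedup can be computed from the S_TG column of the two tables above. This is only a small sketch: the numbers are copied from the tables, and the names `before`, `after` and `speedup` are made up for illustration and are not part of llama.cpp.

```python
# Token-generation throughput (S_TG, t/s) copied from the Before/After
# tables above, keyed by batch size B. Names are illustrative only.
before = {1: 94.25, 2: 83.54, 4: 109.10}
after = {1: 94.16, 2: 129.57, 4: 176.89}


def speedup(before_tps, after_tps):
    """Return the after/before throughput ratio for each batch size."""
    return {b: after_tps[b] / before_tps[b] for b in before_tps}


tg_speedup = speedup(before, after)
for b, ratio in sorted(tg_speedup.items()):
    print(f"B={b}: {ratio:.2f}x")
```

With these numbers, B=1 is essentially unchanged, while B=2 and B=4 come out at roughly 1.55x and 1.62x. The prompt-processing column (S_PP) stays within a few percent in both tables.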

gabe-l-hart

I've tested this with llama-batched-bench and llama-parallel and confirmed that I see correctly managed parallel requests and the expected speed up. Thank you for this fix!

metal : fix mul-mm condition + fix mul-mv permuted kernels

0b9c1ae

ggerganov requested a review from CISC as a code owner October 10, 2025 07:28

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Oct 10, 2025

ggerganov mentioned this pull request Oct 10, 2025

graph : reuse SSM graphs #16490

Open

gabe-l-hart approved these changes Oct 10, 2025

ggerganov merged commit a3cb047 into master Oct 11, 2025
69 checks passed

ggerganov deleted the gg/metal-mul-mat-fixes branch October 11, 2025 13:54
