
vulkan: tune MMVQ for Intel Windows #19988

Merged

0cc4m merged 1 commit into master from 0cc4m/vulkan-intel-windows-mmvq-tune on Mar 2, 2026

Conversation


0cc4m (Collaborator) commented Feb 28, 2026

Tune MMVQ use for Intel Windows according to #17628 (comment)

@savvadesogle Please try it and see if performance is good.
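
The numbers reported below appear to come from llama.cpp's `llama-bench` tool. As a hedged sketch, an invocation matching the pp512/tg128 rows in the later tables might look like the following (the model path is a placeholder, not taken from this PR):

```shell
# Hypothetical llama-bench invocation for the pp512/tg128 measurements below.
# MODEL is an example path (assumption); point it at your own GGUF file.
MODEL=./models/qwen3-8b-Q4_K_M.gguf
CMD="./llama-bench -m $MODEL -ngl 99 -fa 1 -p 512 -n 128"
echo "$CMD"
```

`-ngl 99` offloads all layers to the Vulkan device, `-fa 1` enables flash attention, and `-p 512` / `-n 128` correspond to the pp512 and tg128 tests.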

github-actions bot added the `Vulkan` (Issues specific to the Vulkan backend) and `ggml` (changes relating to the ggml tensor library for machine learning) labels on Feb 28, 2026
@savvadesogle

@0cc4m omg, omg! Of course!! Thank you. I will try tomorrow ♥️

@savvadesogle

@0cc4m Hello. Done!

(image)

And

(image)

qwen3_8b_benchmark_results.csv


0cc4m commented Mar 1, 2026

Is that good performance? Please compare it to master.

@characharm (Contributor)

master

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp512 | 963.70 ± 5.96 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | tg128 | 52.15 ± 0.08 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 926.94 ± 6.03 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 51.50 ± 0.02 |

pr

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp512 | 970.71 ± 10.03 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | tg128 | 60.53 ± 0.08 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 924.87 ± 3.88 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 59.50 ± 0.10 |
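
For context, the token-generation gain in these numbers can be quantified with a quick calculation (tg128 values taken directly from the tables above):

```python
# Percent change in tg128 throughput between master and this PR,
# using the gpt-oss 20B numbers reported above.
def pct_gain(master_tps: float, pr_tps: float) -> float:
    """Return the relative improvement in percent."""
    return (pr_tps - master_tps) / master_tps * 100

# fa=1: 52.15 -> 60.53 t/s; fa=0: 51.50 -> 59.50 t/s
gain_fa1 = pct_gain(52.15, 60.53)
gain_fa0 = pct_gain(51.50, 59.50)
print(f"tg128 gain (fa=1): {gain_fa1:.1f}%")  # roughly +16.1%
print(f"tg128 gain (fa=0): {gain_fa0:.1f}%")  # roughly +15.5%
```

Prompt processing (pp512) is essentially unchanged, which is expected since MMVQ only affects the matrix-vector (token generation) path.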


0cc4m commented Mar 2, 2026

Great, thank you!

@0cc4m 0cc4m requested a review from jeffbolznv March 2, 2026 08:59
@0cc4m 0cc4m merged commit feefb92 into master Mar 2, 2026
77 of 78 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-intel-windows-mmvq-tune branch March 2, 2026 14:58
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
@Cath0deRay

I am seeing significant improvements in token-generation speed on Arrow Lake H. Excellent work, many thanks for all your efforts!

Benchmark Model: .\GGUF\qwen\Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | 30.53 B | Vulkan | 99 | 1 | pp512 | 222.90 ± 0.37 |
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | 30.53 B | Vulkan | 99 | 1 | tg128 | 18.28 ± 0.10 |

build: 2afcdb9 (8185)

Benchmark Model: .\GGUF\qwen\Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | 30.53 B | Vulkan | 99 | 1 | pp512 | 223.78 ± 0.21 |
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | 30.53 B | Vulkan | 99 | 1 | tg128 | 22.37 ± 0.43 |

build: feefb92 (8187)


Benchmark Model: .\GGUF\Qwen3.5-35B-A3B-UD-Q5_K_XL.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe ?B Q8_0 | 23.21 GiB | 34.66 B | Vulkan | 99 | 1 | pp512 | 165.12 ± 0.88 |
| qwen35moe ?B Q8_0 | 23.21 GiB | 34.66 B | Vulkan | 99 | 1 | tg128 | 11.57 ± 0.06 |

build: 2afcdb9 (8185)

Benchmark Model: .\GGUF\Qwen3.5-35B-A3B-UD-Q5_K_XL.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe ?B Q8_0 | 23.21 GiB | 34.66 B | Vulkan | 99 | 1 | pp512 | 165.68 ± 0.37 |
| qwen35moe ?B Q8_0 | 23.21 GiB | 34.66 B | Vulkan | 99 | 1 | tg128 | 12.76 ± 0.00 |

build: feefb92 (8187)


Benchmark Model: .\GGUF\qwen\Qwen3.5-27B-Q4_K_M.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | Vulkan | 99 | 1 | pp512 | 52.51 ± 0.61 |
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | Vulkan | 99 | 1 | tg128 | 2.38 ± 0.02 |

build: 2afcdb9 (8185)

Benchmark Model: .\GGUF\qwen\Qwen3.5-27B-Q4_K_M.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | Vulkan | 99 | 1 | pp512 | 52.84 ± 0.18 |
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | Vulkan | 99 | 1 | tg128 | 3.07 ± 0.00 |

build: feefb92 (8187)


Benchmark Model: .\GGUF\GLM-4.7-Flash-Q4_K_M.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8185\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 99 | 1 | pp512 | 161.99 ± 21.63 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 99 | 1 | tg128 | 13.21 ± 1.55 |

build: 2afcdb9 (8185)

Benchmark Model: .\GGUF\GLM-4.7-Flash-Q4_K_M.gguf

load_backend: loaded RPC backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) 130T GPU (32GB) (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\AI\bin\llamacpp\Vulkan\8187\ggml-cpu-alderlake.dll

| model | size | params | backend | ngl | mmap | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 99 | 1 | pp512 | 175.88 ± 0.72 |
| deepseek2 30B.A3B Q4_K - Medium | 17.05 GiB | 29.94 B | Vulkan | 99 | 1 | tg128 | 17.31 ± 0.02 |

build: feefb92 (8187)
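
Summarizing the Arrow Lake H runs above, the relative tg128 improvement per model can be computed directly from the reported throughputs (values copied from the tables; build 8185 is before this PR, 8187 after):

```python
# tg128 throughput (t/s) before (build 8185) and after (build 8187),
# copied from the benchmark tables above.
results = {
    "Qwen3-Coder-30B-A3B Q4_K_M": (18.28, 22.37),
    "Qwen3.5-35B-A3B Q5_K_XL": (11.57, 12.76),
    "Qwen3.5-27B Q4_K_M": (2.38, 3.07),
    "GLM-4.7-Flash Q4_K_M": (13.21, 17.31),
}

gains = {
    name: (after - before) / before * 100
    for name, (before, after) in results.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}% tg128")
```

The gains range from roughly 10% to 31%, with prompt processing (pp512) essentially unchanged in every run, consistent with MMVQ affecting only the matrix-vector path used during token generation.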


