Vulkan MGPU on Arc A770: -ub and -b must be re-tuned when GPU count changes #21258

Basten7 · 2026-04-01T11:25:00Z

I ran a set of llama-bench experiments on Ubuntu 24.04.4 with Intel Arc A770 GPUs using the Vulkan backend in llama.cpp build 8502. The main conclusion is simple:
-ub (physical micro-batch size) and -b (logical batch size) should not be treated as fixed values when changing the number of GPUs.

They are part of the multi-GPU (MGPU) tuning strategy, especially for prompt processing (PP).

Baseline:

GGML_VK_VISIBLE_DEVICES=0,1 ./llama-bench -m '/Works/llama-2-7b.Q4_0.gguf' -mg 0 -sm none,layer -fa 0 -n 64 -p 1024
load_backend: loaded RPC backend from /home/tto/Works/llama-b8502/libggml-rpc.so
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) A770 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = Intel(R) Arc(tm) A770 Graphics (DG2) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from /home/tto/Works/llama-b8502/libggml-vulkan.so
load_backend: loaded CPU backend from /home/tto/Works/llama-b8502/libggml-cpu-sapphirerapids.so

model	size	params	backend	ngl	sm	test	t/s
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	none	pp1024	1149.29 ± 0.35
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	none	tg64	47.54 ± 0.08
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	layer	pp1024	1423.65 ± 22.50
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	layer	tg64	31.91 ± 4.94
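
As a quick check on the baseline table, the 2-GPU layer split can be expressed as a ratio against the single-GPU run. This is only a sketch: the `ratio` helper is hypothetical (it is not part of llama.cpp), and the numbers are copied from the table above.

```shell
#!/bin/sh
# ratio ONE_GPU TWO_GPU: print two-GPU throughput relative to one GPU.
# Hypothetical helper for illustration; not a llama.cpp tool.
ratio() {
    awk -v one="$1" -v two="$2" 'BEGIN { printf "%.2f\n", two / one }'
}

ratio 1149.29 1423.65   # pp1024, -sm none -> -sm layer: prints 1.24
ratio 47.54 31.91       # tg64,   -sm none -> -sm layer: prints 0.67
```

With default batch settings, the second GPU adds about 24% to pp1024 while tg64 drops to about two thirds of the single-GPU value.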

-ub and -b Tuning

GGML_VK_VISIBLE_DEVICES=0,1 ./llama-bench -m '/Works/llama-2-7b.Q4_0.gguf' -mg 0 -sm none,layer -fa 0 -p 1024 -n 0 -ub 64,128,256 -b 1024,2048,4096

model	size	params	backend	ngl	n_batch	n_ubatch	sm	test	t/s
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	1024	64	none	pp1024	757.52 ± 0.30
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	1024	128	none	pp1024	1020.33 ± 0.69
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	1024	256	none	pp1024	1170.52 ± 1.28
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	2048	64	none	pp1024	753.34 ± 1.59
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	2048	128	none	pp1024	1016.69 ± 0.51
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	2048	256	none	pp1024	1173.99 ± 0.66
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	4096	64	none	pp1024	754.81 ± 0.76
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	4096	128	none	pp1024	1018.50 ± 0.58
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	4096	256	none	pp1024	1168.58 ± 0.47
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	1024	64	layer	pp1024	1244.46 ± 5.60
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	1024	128	layer	pp1024	1648.36 ± 1.61
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	1024	256	layer	pp1024	1725.74 ± 11.40
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	2048	64	layer	pp1024	1251.63 ± 4.32
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	2048	128	layer	pp1024	1641.82 ± 1.67
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	2048	256	layer	pp1024	1738.93 ± 9.56
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	4096	64	layer	pp1024	1251.57 ± 1.82
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	4096	128	layer	pp1024	1644.93 ± 1.26
llama 7B Q4_0	3.56 GiB	6.74 B	Vulkan	99	4096	256	layer	pp1024	1732.54 ± 5.80
build: `2d2d9c2` (8502)
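
To see which knob moves the result, the table can be grouped by split mode and n_ubatch, taking the lowest and highest pp1024 across the three n_batch values. The sketch below assumes the rows have been reduced to four columns (sm, n_batch, n_ubatch, t/s); `tuning_rows` and `batch_spread` are hypothetical helper names, and the data is copied from the table above.

```shell
#!/bin/sh
# tuning_rows: the pp1024 results above as "sm n_batch n_ubatch t/s".
tuning_rows() {
    cat <<'EOF'
none 1024 64 757.52
none 1024 128 1020.33
none 1024 256 1170.52
none 2048 64 753.34
none 2048 128 1016.69
none 2048 256 1173.99
none 4096 64 754.81
none 4096 128 1018.50
none 4096 256 1168.58
layer 1024 64 1244.46
layer 1024 128 1648.36
layer 1024 256 1725.74
layer 2048 64 1251.63
layer 2048 128 1641.82
layer 2048 256 1738.93
layer 4096 64 1251.57
layer 4096 128 1644.93
layer 4096 256 1732.54
EOF
}

# batch_spread: for each (sm, n_ubatch) pair, print the minimum and
# maximum t/s seen across all n_batch values.
batch_spread() {
    awk '
        {
            key = $1 " " $3          # group by split mode and n_ubatch
            v = $4 + 0
            if (!(key in lo) || v < lo[key]) lo[key] = v
            if (!(key in hi) || v > hi[key]) hi[key] = v
        }
        END { for (k in lo) printf "%s %.2f %.2f\n", k, lo[k], hi[k] }
    ' | LC_ALL=C sort -k1,1 -k2,2n
}

tuning_rows | batch_spread
```

Within each n_ubatch group the spread across n_batch is under 1%, whereas raising n_ubatch from 64 to 256 lifts the layer-mode result from roughly 1250 to roughly 1730 t/s. In this tested range, -ub is the knob that matters for PP.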

Final conclusion

These tests suggest three practical rules for llama.cpp Vulkan MGPU on Arc A770:
1. Do not reuse -ub and -b blindly when changing the GPU count.
2. For PP, retune -ub first, then refine -b.
3. Evaluate PP and token generation (TG) separately, because the best PP settings may not help TG at all. In these runs, the 2-GPU layer split raised pp1024 but lowered tg64 from 47.54 t/s to about 31 t/s.

On this setup, the best 2-GPU PP result in the tested range was approximately:

GGML_VK_VISIBLE_DEVICES=0,1 ./llama-bench -m /home/tto/Works/llama-2-7b.Q4_0.gguf -mg 0 -sm layer -fa 0 -p 1024 -n 64 -ub 256 -b 2048
with:
pp1024 = 1739.12 t/s
tg64 = 31.29 t/s

In MGPU mode, -ub and -b are not secondary knobs; they are part of the scaling strategy.
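
One way to make the retuning habitual is to generate a fresh sweep for every GPU set instead of copying the previous values. The following is a dry-run sketch under stated assumptions: `sweep_cmd` is a hypothetical helper, the default model path is a placeholder taken from the commands above, and the -ub and -b lists are simply the ranges tested here, not recommended values. It only prints the llama-bench commands; it does not run them.

```shell
#!/bin/sh
# Placeholder model path; override with MODEL=/path/to/model.gguf.
MODEL=${MODEL:-/Works/llama-2-7b.Q4_0.gguf}

# sweep_cmd DEVICES: print a llama-bench sweep for the given device list.
# A single device uses -sm none; a comma-separated list uses -sm layer.
sweep_cmd() {
    devices=$1
    case $devices in
        *,*) sm=layer ;;
        *)   sm=none ;;
    esac
    printf 'GGML_VK_VISIBLE_DEVICES=%s ./llama-bench -m %s -mg 0 -sm %s -fa 0 -p 1024 -n 64 -ub 64,128,256 -b 1024,2048,4096\n' \
        "$devices" "$MODEL" "$sm"
}

# Re-sweep for each GPU count rather than reusing earlier -ub/-b values.
for devs in 0 0,1; do
    sweep_cmd "$devs"
done
```

Because each command includes both -p 1024 and -n 64, llama-bench reports pp1024 and tg64 as separate rows, which keeps the PP and TG comparisons apart. Piping the script's output to `sh` would run the sweeps.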
