Troubleshooting
The default amdgpu compute timeout (2 s) is too short for long prefill submits. Set
amdgpu.lockup_timeout=10000,10000 (values in milliseconds) on the kernel command line via your
bootloader, regenerate the bootloader config, and reboot. See Installation.
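After rebooting, it can help to confirm that the parameter actually reached the kernel. The sketch below is one way to do that, assuming a Linux system where /proc/cmdline is readable; the helper name lockup_timeout_from_cmdline is made up for this example and is not part of VulkanForge.

```shell
#!/bin/sh
# Hypothetical helper (not part of VulkanForge): print the value of
# amdgpu.lockup_timeout found in a kernel command line string, if any.
lockup_timeout_from_cmdline() {
    # Put one parameter per line, then keep the value of the last match.
    printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^amdgpu\.lockup_timeout=//p' | tail -n 1
}

# Check the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    current=$(lockup_timeout_from_cmdline "$(cat /proc/cmdline)")
    if [ -n "$current" ]; then
        echo "amdgpu.lockup_timeout is set to: $current"
    else
        echo "amdgpu.lockup_timeout is not set on the running kernel"
    fi
fi
```

If the script reports that the parameter is not set, the bootloader config was most likely not regenerated or a different boot entry was used.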
The card has 16 GB; the usable budget is roughly 14.5 GB after overhead. VulkanForge prints a VRAM
budget and warns when free VRAM drops below the headroom threshold (VF_VRAM_HEADROOM_GIB, default
1.0). Options when a model is tight:
- Gemma-4-26B-A4B: set VULKANFORGE_KV_FP8=1 (halves KV-cache VRAM; recommended for the 26B MoE).
- 14B FP8 / multiple sessions: VF_CPU_LM_HEAD=1 frees ~970 MB by moving the vocab projection to the CPU (on 14B FP8 it is also +32 % on decode).
- Use a smaller quant (Q3_K_M vs Q4_K_M); see Supported Models.
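To see how close the card is to the headroom threshold before launching a model, free VRAM can be read from the amdgpu sysfs counters. The following is a minimal sketch, not VulkanForge's own check: the helper name vram_headroom_check is hypothetical, and the card0 path is an assumption that may differ on systems with more than one GPU.

```shell
#!/bin/sh
# Hypothetical helper (not VulkanForge's own logic): given total and used VRAM
# in bytes plus a headroom in GiB, print free GiB and whether it is below headroom.
vram_headroom_check() {
    awk -v total="$1" -v used="$2" -v headroom="$3" 'BEGIN {
        free_gib = (total - used) / (1024 * 1024 * 1024)
        status = (free_gib < headroom) ? "LOW" : "OK"
        printf "%.2f GiB free (%s)\n", free_gib, status
    }'
}

# Assumption: the GPU is card0; adjust the path on multi-GPU systems.
dev=/sys/class/drm/card0/device
if [ -r "$dev/mem_info_vram_total" ] && [ -r "$dev/mem_info_vram_used" ]; then
    vram_headroom_check "$(cat "$dev/mem_info_vram_total")" \
        "$(cat "$dev/mem_info_vram_used")" "${VF_VRAM_HEADROOM_GIB:-1.0}"
fi
```

A LOW result before the model is even loaded suggests that another process is holding VRAM, in which case the options above may not be enough on their own.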
The Gemma-4-26B-A4B MoE only fits comfortably in 16 GB with VULKANFORGE_KV_FP8=1. Without it the KV
cache may push you over the budget at larger context sizes. The FP8 KV-cache setting is value-preserving.
DeepSeek-R1-Distill emits <think>…</think> reasoning before its answer. With a small --max-tokens,
the visible output can still be inside the <think> block (the answer comes after). Raise
--max-tokens, or use --no-think-filter / VF_NO_THINK_FILTER=1 to see the raw stream. This is a
prompting/harness consideration, not a bug.
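When scripting around the raw stream, a simple test can tell whether generation stopped inside the reasoning block. The sketch below is illustrative only: think_block_unclosed is a hypothetical helper, the sample text is made up, and it assumes at most one reasoning block per response.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the text opens a <think> block
# that is never closed, i.e. generation stopped inside the reasoning.
think_block_unclosed() {
    case "$1" in
        *"</think>"*) return 1 ;;  # reasoning finished; an answer can follow
        *"<think>"*)  return 0 ;;  # opened but never closed: truncated
        *)            return 1 ;;  # no reasoning block at all
    esac
}

# Made-up sample of a response cut off by a small token limit.
raw='<think>The user asks about X. Let me consider'
if think_block_unclosed "$raw"; then
    echo "output ended inside <think>; raise --max-tokens"
fi
```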
v0.7.0's batched MoE router is llama-aligned and value-preserving on factual/structural output, but a
borderline top-k expert flip can make a free-form generation tail phrase differently than the
pre-v0.7.0 per-token router. To reproduce the exact older routing, set VF_MOE_ROUTER_BATCHED=0.
See Configuration.
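To illustrate what a borderline top-k flip means, the sketch below selects the k highest-scoring experts from a list of router scores. It is not VulkanForge's router: the top_k helper and the scores are invented for this example, and it only shows that when two scores are nearly tied, a very small numerical difference changes which expert is selected.

```shell
#!/bin/sh
# Illustration only: print the indices (0-based, ascending) of the k largest scores.
top_k() {
    k=$1; shift
    printf '%s\n' "$@" \
        | awk '{ print NR - 1, $1 }' \
        | LC_ALL=C sort -k2,2nr \
        | head -n "$k" | cut -d' ' -f1 \
        | sort -n | paste -sd' ' -
}

top_k 2 0.40 0.3000 0.3001 0.10   # prints "0 2"
top_k 2 0.40 0.3001 0.3000 0.10   # prints "0 1": a 0.0001 change swaps the second expert
```

Experts that are far from the selection boundary are unaffected by differences this small, which is consistent with the flip only showing up in free-form generation tails.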
Native FP8 WMMA is capability-driven. Check:
vulkaninfo 2>/dev/null | grep shaderFloat8CooperativeMatrix
If absent (e.g. Mesa 26.0.x), VulkanForge uses the BF16 conversion fallback, which is correct but slower on FP8 prefill. Upgrade to Mesa 26.1+ for the native path.
vulkanforge bench accepts Q4_K_M GGUF. Q8_0 loads in chat but is rejected by bench. Use a
Q4_K_M GGUF for benchmarking.
See also Installation · Hardware and Compatibility · Configuration.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 ·
Repository · Releases