Hardware and Compatibility
VulkanForge is built and tuned for the AMD Radeon RX 9070 XT (gfx1201, RDNA 4). The kernels assume
this architecture's properties:
- Cooperative matrix (`KHR_coopmat`) for flash-attention and GEMM.
- Native FP8 WMMA (`V_WMMA_F32_16X16X16_FP8_FP8`) when the driver advertises `shaderFloat8CooperativeMatrix` (Mesa 26.1+).
- Wave64 subgroups (the row-split flash-attention kernels require exactly 4 subgroups per workgroup at WG=256).
- 16 GB VRAM. The usable budget is roughly 14.5 GB after driver/runtime overhead; VulkanForge warns when free VRAM drops below a headroom threshold (see Troubleshooting).
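
The subgroup and VRAM figures in the list above reduce to simple arithmetic. The following is a minimal sketch of that arithmetic, not VulkanForge's own code: the function names are hypothetical, and the 0.5 GB default headroom is an illustrative placeholder because the actual threshold is not stated here.

```python
WAVE_SIZE = 64           # Wave64 subgroups, from the list above
WORKGROUP_SIZE = 256     # WG=256, from the list above
REQUIRED_SUBGROUPS = 4   # row-split flash-attention requirement
USABLE_VRAM_GB = 14.5    # approximate usable budget on a 16 GB card


def subgroups_per_workgroup(workgroup_size: int, subgroup_size: int) -> int:
    """Return how many subgroups make up one workgroup."""
    if subgroup_size <= 0 or workgroup_size % subgroup_size != 0:
        raise ValueError("workgroup size must be a positive multiple of the subgroup size")
    return workgroup_size // subgroup_size


def fits_budget(model_gb: float, kv_cache_gb: float, headroom_gb: float = 0.5) -> bool:
    """Check a planned allocation against the usable budget.

    headroom_gb is a hypothetical placeholder, not VulkanForge's real threshold.
    """
    return model_gb + kv_cache_gb + headroom_gb <= USABLE_VRAM_GB


if __name__ == "__main__":
    # Wave64 at WG=256 gives the 4 subgroups the kernels expect.
    print(subgroups_per_workgroup(WORKGROUP_SIZE, WAVE_SIZE) == REQUIRED_SUBGROUPS)
    # A 12 GB model with a 1.5 GB KV cache leaves room under the budget.
    print(fits_budget(12.0, 1.5))
```

With Wave32 subgroups the same workgroup would contain 8 subgroups instead of 4, which is why the row-split kernels depend on the Wave64 assumption.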
Other RDNA 4 cards may work but are untested. Non-RDNA-4 AMD, NVIDIA/CUDA, and Intel are out of scope — this is not a portable cross-hardware engine.
| Mesa version | What you get |
|---|---|
| 26.1+ (recommended) | Default path. Native FP8 WMMA via `shaderFloat8CooperativeMatrix`. |
| 26.0.6 | Legacy. GGUF + FP8 SafeTensors via the BF16 conversion path (no native FP8 WMMA). |
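
The table can be read as a lookup from Mesa version to code path. The sketch below encodes only the two rows listed; the function names and the returned labels are hypothetical, and it assumes a plain dotted numeric version string. A real check would query the driver for `shaderFloat8CooperativeMatrix` rather than compare version numbers.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '26.0.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def mesa_path(version: str) -> str:
    """Map a Mesa version to the path named in the table (labels are illustrative)."""
    parsed = parse_version(version)
    if parsed >= (26, 1):
        return "native-fp8-wmma"   # 26.1+ row: default path
    if parsed == (26, 0, 6):
        return "bf16-conversion"   # 26.0.6 row: legacy path
    return "unlisted"              # the table says nothing about other versions


if __name__ == "__main__":
    for candidate in ("26.1.0", "26.0.6", "25.3.1"):
        print(candidate, mesa_path(candidate))
```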
A Vulkan 1.4 loader is required. The backend is compute-only (no swapchain / graphics queues).
VulkanForge is single-stream: no batch inference, no concurrent sessions on one inference instance. This is a deliberate design choice for the local single-user case, not a temporary limitation. If you need batched / concurrent serving, use a batch engine (e.g. vLLM).
The server-side memory subsystem (Memory) is opt-in and runs entirely on the CPU — it does not use
the GPU or VRAM, and it never touches the ~14.5 GB inference budget. The embedder (Nomic-Embed v1.5-Q, 768-dim,
INT8) runs through ONNX Runtime: on Zen4 it takes the AVX-512/VNNI fast path and falls back to ONNX Runtime's
portable MLAS kernels on any other x86-64 CPU (VNNI is the fast path, not the only one). Its cost is
therefore CPU time + a little RAM (the model + the vector index) + disk (~/.vulkanforge/memory.db) — no VRAM
contention with the model. See Installation to enable it and Configuration for the flags.
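
To see which embedder path a given machine is likely to take, you can inspect the CPU feature flags yourself. This is a diagnostic sketch only, assuming Linux on x86-64, where `/proc/cpuinfo` lists `avx512_vnni` in its `flags` line. The function names are hypothetical, and ONNX Runtime makes its own kernel selection internally regardless of what this reports.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512_vnni(cpuinfo_text: str) -> bool:
    """True if the CPU reports the AVX-512 VNNI extension."""
    return "avx512_vnni" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Linux only: read the real flags from the running machine.
    with open("/proc/cpuinfo", encoding="utf-8") as handle:
        text = handle.read()
    print("AVX-512 VNNI available:", has_avx512_vnni(text))
```

A `False` result does not mean the embedder is unavailable; per the description above, it still runs through the portable MLAS kernels, only more slowly.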
See also: Installation · Supported Models · Benchmarks · Memory.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 ·
Repository · Releases