Hardware and Compatibility
VulkanForge is built and tuned for AMD Radeon RX 9070 XT — gfx1201, RDNA 4. The kernels assume
this architecture's properties:
- Cooperative matrix (`KHR_coopmat`) for flash-attention and GEMM.
- Native FP8 WMMA (`V_WMMA_F32_16X16X16_FP8_FP8`) when the driver advertises `shaderFloat8CooperativeMatrix` (Mesa 26.1+).
- Wave64 subgroups (the row-split flash-attention kernels require exactly 4 subgroups per workgroup at WG=256).
- 16 GB VRAM. The usable budget is roughly 14.5 GB after driver/runtime overhead; VulkanForge warns when free VRAM drops below a headroom threshold (see Troubleshooting).
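
The Wave64 requirement in the list above is simple arithmetic: 256 invocations per workgroup divided by 64 lanes per subgroup gives exactly 4 subgroups. The following is a minimal sketch of that check in Python, not VulkanForge code; the function names are hypothetical and exist only for illustration.

```python
# Illustrative only: these names are hypothetical, not part of VulkanForge.
WORKGROUP_SIZE = 256      # WG=256, as stated for the row-split kernels
REQUIRED_SUBGROUPS = 4    # the kernels require exactly 4 subgroups per workgroup


def subgroups_per_workgroup(workgroup_size: int, subgroup_size: int) -> int:
    """Return how many subgroups fit in one workgroup.

    Raises ValueError if the workgroup is not a whole number of subgroups.
    """
    if subgroup_size <= 0 or workgroup_size % subgroup_size != 0:
        raise ValueError("workgroup size must be a positive multiple of the subgroup size")
    return workgroup_size // subgroup_size


def meets_row_split_requirement(workgroup_size: int, subgroup_size: int) -> bool:
    """True when the workgroup splits into exactly the required 4 subgroups."""
    return subgroups_per_workgroup(workgroup_size, subgroup_size) == REQUIRED_SUBGROUPS


if __name__ == "__main__":
    for subgroup_size in (64, 32):
        count = subgroups_per_workgroup(WORKGROUP_SIZE, subgroup_size)
        ok = meets_row_split_requirement(WORKGROUP_SIZE, subgroup_size)
        print(f"subgroup size {subgroup_size}: {count} subgroups, requirement met: {ok}")
```

With a subgroup size of 32 the same workgroup would split into 8 subgroups, so the requirement would not be met at WG=256.
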
Other RDNA 4 cards may work but are untested. Non-RDNA-4 AMD, NVIDIA/CUDA, and Intel are out of scope — this is not a portable cross-hardware engine.
| Mesa version | What you get |
|---|---|
| 26.1+ (recommended) | Default path. Native FP8 WMMA via shaderFloat8CooperativeMatrix. |
| 26.0.6 | Legacy. GGUF + FP8 SafeTensors via the BF16 conversion path (no native FP8 WMMA). |
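
The table can be read as a two-way decision. The sketch below, in Python, shows one way to express it under stated assumptions: the function names and the returned labels are hypothetical, and the sketch treats every case that does not qualify for native FP8 as the BF16 conversion path. Mesa versions older than 26.0.6 are not covered by the table and are not modeled here.

```python
# Illustrative only: names and return labels are hypothetical, not VulkanForge's API.
from typing import Tuple

NATIVE_FP8_MIN_MESA = (26, 1)  # "26.1+" row in the table


def parse_mesa_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '26.0.6' into (26, 0, 6).

    Assumes purely numeric, dot-separated components.
    """
    return tuple(int(part) for part in text.split("."))


def select_fp8_path(mesa_version: str, advertises_fp8_coopmat: bool) -> str:
    """Pick the FP8 path described by the table.

    advertises_fp8_coopmat stands for the driver reporting
    shaderFloat8CooperativeMatrix.
    """
    version = parse_mesa_version(mesa_version)
    if version >= NATIVE_FP8_MIN_MESA and advertises_fp8_coopmat:
        return "native-fp8-wmma"
    # Assumption: everything else falls back to the legacy row of the table.
    return "bf16-conversion"


if __name__ == "__main__":
    print(select_fp8_path("26.1.0", True))
    print(select_fp8_path("26.0.6", False))
```

Keying the decision on both the version and the advertised feature mirrors the document's wording: the native path applies when the driver advertises the feature, which the table associates with Mesa 26.1 and later.
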
A Vulkan 1.4 loader is required. The backend is compute-only (no swapchain / graphics queues).
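
Both requirements can be checked from values that Vulkan reports. Vulkan packs an API version into a 32-bit integer (variant, major, minor, patch), and each queue family carries a flags bitmask in which `VK_QUEUE_COMPUTE_BIT` is `0x2`. The following Python sketch decodes such values; it does not call Vulkan, the helper names are hypothetical, and it is not VulkanForge's own startup check.

```python
# Illustrative only: helper names are hypothetical. The bit layout and flag
# values follow the Vulkan specification.
from typing import List, Optional, Tuple

REQUIRED_API = (1, 4)          # "Vulkan 1.4 loader is required"
VK_QUEUE_GRAPHICS_BIT = 0x1
VK_QUEUE_COMPUTE_BIT = 0x2


def make_api_version(major: int, minor: int, patch: int = 0, variant: int = 0) -> int:
    """Pack a version the way VK_MAKE_API_VERSION does."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def decode_api_version(packed: int) -> Tuple[int, int, int]:
    """Unpack (major, minor, patch) from a packed Vulkan API version."""
    major = (packed >> 22) & 0x7F
    minor = (packed >> 12) & 0x3FF
    patch = packed & 0xFFF
    return major, minor, patch


def loader_is_new_enough(packed: int) -> bool:
    """True when the packed version is at least the required major.minor."""
    major, minor, _ = decode_api_version(packed)
    return (major, minor) >= REQUIRED_API


def first_compute_family(queue_flags: List[int]) -> Optional[int]:
    """Index of the first queue family that supports compute, or None."""
    for index, flags in enumerate(queue_flags):
        if flags & VK_QUEUE_COMPUTE_BIT:
            return index
    return None


if __name__ == "__main__":
    print(loader_is_new_enough(make_api_version(1, 4, 0)))
    print(first_compute_family([0x4, 0x2 | 0x4, 0x1 | 0x2 | 0x4]))
```

In a real program the packed version would come from `vkEnumerateInstanceVersion` and the flags from `vkGetPhysicalDeviceQueueFamilyProperties`; a compute-only backend never needs a family with `VK_QUEUE_GRAPHICS_BIT` set.
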
VulkanForge is single-stream: no batch inference, no concurrent sessions on one inference instance. This is a deliberate design choice for the local single-user case, not a temporary limitation. If you need batched / concurrent serving, use a batch engine (e.g. vLLM).
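
For callers that wrap a single inference instance, the single-stream constraint means requests have to be serialized or rejected while one is in flight. The sketch below shows one possible caller-side guard in Python. It is an assumption about how an application might enforce the constraint, not a description of VulkanForge internals, and the class and exception names are hypothetical.

```python
# Illustrative only: a caller-side guard, not VulkanForge code.
import threading


class BusyError(RuntimeError):
    """Raised when a second request arrives while one is still running."""


class SingleStreamGuard:
    """Allow at most one in-flight request per inference instance."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def run(self, fn, *args, **kwargs):
        """Run fn exclusively, or raise BusyError if another call is active."""
        if not self._lock.acquire(blocking=False):
            raise BusyError("this instance is already serving a request")
        try:
            return fn(*args, **kwargs)
        finally:
            # Always release, even if fn raises, so the instance is usable again.
            self._lock.release()


if __name__ == "__main__":
    guard = SingleStreamGuard()
    print(guard.run(lambda prompt: prompt.upper(), "hello"))
```

Rejecting the second request rather than queueing it keeps the behavior explicit; an application that prefers queueing could call `acquire()` in blocking mode instead.
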
See also: Installation · Supported Models · Benchmarks.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 · Repository · Releases