Hardware and Compatibility

maeddesg edited this page Jun 9, 2026 · 2 revisions

Target hardware

VulkanForge is built and tuned for the AMD Radeon RX 9070 XT (gfx1201, RDNA 4). The kernels assume this architecture's properties:

  • Cooperative matrix (KHR_coopmat) for flash-attention and GEMM.
  • Native FP8 WMMA (V_WMMA_F32_16X16X16_FP8_FP8) when the driver advertises shaderFloat8CooperativeMatrix (Mesa 26.1+).
  • Wave64 subgroups (the row-split flash-attention kernels require exactly 4 subgroups per workgroup at WG=256).
  • 16 GB VRAM. The usable budget is roughly 14.5 GB after driver/runtime overhead; VulkanForge warns when free VRAM drops below a headroom threshold (see Troubleshooting).
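
The subgroup and VRAM figures above reduce to simple arithmetic. The following is a minimal sketch, not VulkanForge's own code: the function names are hypothetical, and only the constants (WG=256, 4 subgroups, roughly 14.5 GB) come from the list above.

```python
WORKGROUP_SIZE = 256      # WG=256, as stated above
REQUIRED_SUBGROUPS = 4    # row-split flash-attention requirement, as stated above
USABLE_VRAM_GB = 14.5     # approximate usable budget on a 16 GB card, as stated above


def subgroups_per_workgroup(workgroup_size: int, subgroup_size: int) -> int:
    """Number of subgroups that tile one workgroup."""
    if subgroup_size <= 0 or workgroup_size % subgroup_size != 0:
        raise ValueError("workgroup size must be a positive multiple of the subgroup size")
    return workgroup_size // subgroup_size


def supports_row_split(subgroup_size: int) -> bool:
    """True when WG=256 splits into exactly the required 4 subgroups."""
    return subgroups_per_workgroup(WORKGROUP_SIZE, subgroup_size) == REQUIRED_SUBGROUPS


def fits_vram_budget(required_gb: float, budget_gb: float = USABLE_VRAM_GB) -> bool:
    """Hypothetical helper: compare a footprint estimate against the usable budget."""
    return required_gb <= budget_gb
```

With Wave64, `supports_row_split(64)` holds because 256 / 64 = 4. With Wave32 the same workgroup would contain 8 subgroups, so the check fails.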

Other RDNA 4 cards may work but are untested. Non-RDNA-4 AMD, NVIDIA/CUDA, and Intel are out of scope — this is not a portable cross-hardware engine.

Driver matrix

Mesa version | What you get
26.1+ (recommended) | Default path. Native FP8 WMMA via shaderFloat8CooperativeMatrix.
26.0.6 | Legacy. GGUF + FP8 SafeTensors via the BF16 conversion path (no native FP8 WMMA).
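
The matrix can be read as a small decision rule. The sketch below is illustrative only: the function names and returned labels are hypothetical. It also makes assumptions the matrix does not state: a 26.1+ driver that does not advertise the feature falls back to the BF16 path, versions between 26.0.6 and 26.1 behave like 26.0.6, and anything older is reported as unlisted.

```python
def parse_mesa_version(text: str) -> tuple:
    """Parse a plain dotted version such as '26.0.6' (suffixes like '-devel' are not handled)."""
    return tuple(int(part) for part in text.strip().split("."))


def select_fp8_path(mesa_version: str, has_fp8_coopmat: bool) -> str:
    """Map a Mesa version and the shaderFloat8CooperativeMatrix flag to a path label.

    has_fp8_coopmat is whatever the driver reports for shaderFloat8CooperativeMatrix;
    querying it is outside the scope of this sketch.
    """
    version = parse_mesa_version(mesa_version)
    if version >= (26, 1) and has_fp8_coopmat:
        return "native_fp8_wmma"   # default path in the matrix
    if version >= (26, 0, 6):
        return "bf16_conversion"   # legacy path in the matrix (fallback is an assumption)
    return "unlisted"              # not covered by the matrix
```

For example, `select_fp8_path("26.1.0", True)` selects the native path, while `select_fp8_path("26.0.6", False)` selects the BF16 conversion path.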

A Vulkan 1.4 loader is required. The backend is compute-only (no swapchain / graphics queues).
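
As a rough illustration of these two requirements, the sketch below decodes a packed Vulkan API version and picks a compute-capable queue family from a list of queue flag masks. The bit layout and flag values follow the Vulkan specification. The helper names are hypothetical, and preferring a family without the graphics bit is an assumption, not a documented VulkanForge behavior.

```python
from typing import List, Optional, Tuple

VK_QUEUE_GRAPHICS_BIT = 0x1  # value from the Vulkan specification
VK_QUEUE_COMPUTE_BIT = 0x2   # value from the Vulkan specification


def make_api_version(major: int, minor: int, patch: int = 0) -> int:
    """Pack a version the way VK_MAKE_API_VERSION does (variant 0)."""
    return (major << 22) | (minor << 12) | patch


def decode_api_version(packed: int) -> Tuple[int, int, int]:
    """Unpack (major, minor, patch) from a packed Vulkan API version."""
    return ((packed >> 22) & 0x7F, (packed >> 12) & 0x3FF, packed & 0xFFF)


def meets_vulkan_1_4(packed: int) -> bool:
    """True when the packed version is 1.4 or newer."""
    major, minor, _ = decode_api_version(packed)
    return (major, minor) >= (1, 4)


def pick_compute_queue_family(queue_flags: List[int]) -> Optional[int]:
    """Return the index of a compute-capable family, preferring one without graphics."""
    compute = [i for i, flags in enumerate(queue_flags) if flags & VK_QUEUE_COMPUTE_BIT]
    dedicated = [i for i in compute if not queue_flags[i] & VK_QUEUE_GRAPHICS_BIT]
    if dedicated:
        return dedicated[0]
    return compute[0] if compute else None
```

In a real application the packed version would come from `vkEnumerateInstanceVersion` and the flag masks from `vkGetPhysicalDeviceQueueFamilyProperties`; here they are plain integers so the logic can be read in isolation.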

Single-user by design

VulkanForge is single-stream: no batch inference, no concurrent sessions on one inference instance. This is a deliberate design choice for the local single-user case, not a temporary limitation. If you need batched / concurrent serving, use a batch engine (e.g. vLLM).

See also: Installation · Supported Models · Benchmarks.
