Skip to content

Hardware and Compatibility

maeddesg edited this page Jun 15, 2026 · 2 revisions

Hardware & Compatibility

Target hardware

VulkanForge is built and tuned for AMD Radeon RX 9070 XTgfx1201, RDNA 4. The kernels assume this architecture's properties:

  • Cooperative matrix (KHR_coopmat) for flash-attention and GEMM.
  • Native FP8 WMMA (V_WMMA_F32_16X16X16_FP8_FP8) when the driver advertises shaderFloat8CooperativeMatrix (Mesa 26.1+).
  • Wave64 subgroups (the row-split flash-attention kernels require exactly 4 subgroups per workgroup at WG=256).
  • 16 GB VRAM. Usable budget is roughly ~14.5 GB after driver/runtime overhead; VulkanForge warns when free VRAM drops below a headroom threshold (see Troubleshooting).

Other RDNA 4 cards may work but are untested. Non-RDNA-4 AMD, NVIDIA/CUDA, and Intel are out of scope — this is not a portable cross-hardware engine.

Driver matrix

Mesa version What you get
26.1+ (recommended) Default path. Native FP8 WMMA via shaderFloat8CooperativeMatrix.
26.0.6 Legacy. GGUF + FP8 SafeTensors via the BF16 conversion path (no native FP8 WMMA).

Vulkan 1.4 loader is required. The backend is compute-only (no swapchain / graphics queues).

Single-user by design

VulkanForge is single-stream: no batch inference, no concurrent sessions on one inference instance. This is a deliberate design choice for the local single-user case, not a temporary limitation. If you need batched / concurrent serving, use a batch engine (e.g. vLLM).

Memory (optional) — CPU, not VRAM

The server-side memory subsystem (Memory) is opt-in and runs entirely on the CPU — it does not use the GPU or VRAM, and it never touches the ~14.5 GB inference budget. The embedder (Nomic-Embed v1.5-Q, 768-dim, INT8) runs through ONNX Runtime: on Zen4 it takes the AVX-512/VNNI fast path and falls back to ONNX Runtime's portable MLAS kernels on any other x86-64 CPU (VNNI is the fast path, not the only one). Its cost is therefore CPU time + a little RAM (the model + the vector index) + disk (~/.vulkanforge/memory.db) — no VRAM contention with the model. See Installation to enable it and Configuration for the flags.

See also: Installation · Supported Models · Benchmarks · Memory.

Clone this wiki locally