Hardware and Compatibility

Target hardware

VulkanForge is built and tuned for the AMD Radeon RX 9070 XT (gfx1201, RDNA 4). The kernels assume the following properties of this architecture:

Cooperative matrix support (KHR_coopmat, i.e. the VK_KHR_cooperative_matrix extension) for flash-attention and GEMM.
Native FP8 WMMA (`V_WMMA_F32_16X16X16_FP8_FP8`) when the driver advertises `shaderFloat8CooperativeMatrix` (Mesa 26.1+).
Wave64 subgroups (the row-split flash-attention kernels require exactly 4 subgroups per workgroup at WG=256).
16 GB VRAM. The usable budget is roughly 14.5 GB after driver and runtime overhead; VulkanForge warns when free VRAM drops below a headroom threshold (see Troubleshooting).
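
The two numeric constraints in the list above (4 subgroups per workgroup at WG=256, and a roughly 14.5 GB usable budget) can be checked with a few lines of arithmetic. This is only a sketch: the function names, and the 1.0 GB headroom used in the usage note, are illustrative assumptions and not VulkanForge's actual identifiers or defaults.

```python
# Sketch only: names and the headroom value are hypothetical, chosen for
# illustration; they are not taken from the VulkanForge source.

WAVE_SIZE = 64            # Wave64 subgroups
WORKGROUP_SIZE = 256      # WG=256
REQUIRED_SUBGROUPS = 4    # row-split flash-attention requirement
USABLE_BUDGET_GB = 14.5   # approximate usable VRAM on a 16 GB card


def subgroups_per_workgroup(workgroup_size: int, subgroup_size: int) -> int:
    """Number of subgroups in one workgroup; sizes must divide evenly."""
    if workgroup_size % subgroup_size != 0:
        raise ValueError("workgroup size must be a multiple of subgroup size")
    return workgroup_size // subgroup_size


def fits_budget(required_gb: float, budget_gb: float = USABLE_BUDGET_GB) -> bool:
    """True when the estimated footprint fits in the usable budget."""
    return required_gb <= budget_gb


def below_headroom(free_gb: float, headroom_gb: float) -> bool:
    """True when free VRAM has dropped below the chosen headroom threshold."""
    return free_gb < headroom_gb
```

With Wave64, `subgroups_per_workgroup(256, 64)` gives the required 4; a Wave32 device would give 8 at the same workgroup size and so would not satisfy the kernel's assumption. `below_headroom(0.5, 1.0)` shows the kind of condition that would trigger a warning under an assumed 1.0 GB headroom.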

Other RDNA 4 cards may work but are untested. Non-RDNA-4 AMD, NVIDIA/CUDA, and Intel hardware is out of scope: VulkanForge is not a portable cross-hardware engine.

Driver matrix

| Mesa version | What you get |
| --- | --- |
| 26.1+ (recommended) | Default path. Native FP8 WMMA via `shaderFloat8CooperativeMatrix`. |
| 26.0.6 | Legacy. GGUF + FP8 SafeTensors via the BF16 conversion path (no native FP8 WMMA). |
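
The driver matrix amounts to a two-way decision, which might be sketched as follows. The path labels and function names here are hypothetical, and requiring both the Mesa version and the feature flag is an assumption made for illustration; in practice the advertised `shaderFloat8CooperativeMatrix` feature is the signal the text describes.

```python
# Sketch only: the path labels and helper names are hypothetical.

def parse_mesa_version(text: str) -> tuple[int, ...]:
    """Turn '26.1.0' or '26.1.0-devel' into a tuple of ints, e.g. (26, 1, 0)."""
    core = text.strip().split()[0].split("-")[0]
    return tuple(int(part) for part in core.split("."))


def select_fp8_path(mesa_version: str, has_fp8_coopmat: bool) -> str:
    """Pick the FP8 execution path described in the driver matrix.

    has_fp8_coopmat stands for the driver advertising
    shaderFloat8CooperativeMatrix.
    """
    version = parse_mesa_version(mesa_version)
    if version >= (26, 1) and has_fp8_coopmat:
        return "native-fp8-wmma"
    # Legacy behaviour: FP8 weights go through the BF16 conversion path.
    return "bf16-conversion"
```

Tuple comparison keeps the version check simple: `(26, 0, 6) >= (26, 1)` is false, so Mesa 26.0.6 falls through to the BF16 conversion path.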

A Vulkan 1.4 loader is required. The backend is compute-only (no swapchain or graphics queues).
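
Vulkan reports API versions as a packed 32-bit integer (variant in the top 3 bits, then 7 bits of major, 10 bits of minor, 12 bits of patch), which is the value `vkEnumerateInstanceVersion` returns for the loader. A minimal sketch of checking the 1.4 requirement against such a value, with helper names that are illustrative and not part of VulkanForge:

```python
# Sketch only: helper names are hypothetical. The bit layout follows
# Vulkan's VK_MAKE_API_VERSION encoding.

def make_api_version(major: int, minor: int, patch: int = 0, variant: int = 0) -> int:
    """Pack a version the way VK_MAKE_API_VERSION does."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def decode_api_version(packed: int) -> tuple[int, int, int]:
    """Unpack a Vulkan API version into (major, minor, patch)."""
    major = (packed >> 22) & 0x7F
    minor = (packed >> 12) & 0x3FF
    patch = packed & 0xFFF
    return major, minor, patch


def meets_vulkan_1_4(packed: int) -> bool:
    """True when the packed version is at least 1.4 (patch is ignored)."""
    major, minor, _patch = decode_api_version(packed)
    return (major, minor) >= (1, 4)
```

For example, `make_api_version(1, 4)` is `4210688`, and `meets_vulkan_1_4` rejects any 1.3.x value regardless of its patch number.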

Single-user by design

VulkanForge is single-stream: no batch inference, no concurrent sessions on one inference instance. This is a deliberate design choice for the local single-user case, not a temporary limitation. If you need batched / concurrent serving, use a batch engine (e.g. vLLM).

Memory (optional) — CPU, not VRAM

The server-side memory subsystem (Memory) is opt-in and runs entirely on the CPU: it does not use the GPU, so it never touches the roughly 14.5 GB inference budget. The embedder (Nomic-Embed v1.5-Q, 768-dim, INT8) runs through ONNX Runtime. On Zen 4 it takes the AVX-512/VNNI fast path; on any other x86-64 CPU it falls back to ONNX Runtime's portable MLAS kernels, so VNNI is the fast path, not a requirement. Its cost is therefore CPU time, a little RAM (the model plus the vector index), and disk (~/.vulkanforge/memory.db), with no VRAM contention with the model. See Installation to enable it and Configuration for the flags.
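
To see whether a given Linux machine would be eligible for the AVX-512/VNNI fast path, one can inspect the CPU flags the kernel exposes. The sketch below assumes the x86 Linux `/proc/cpuinfo` format and the kernel's `avx512f` and `avx512_vnni` flag names; the function names are hypothetical, and the actual kernel selection happens inside ONNX Runtime, not in code like this.

```python
# Sketch only: inspects CPU flags; it does not influence which kernels
# ONNX Runtime picks. Function names are hypothetical.

def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512_vnni(flags: set[str]) -> bool:
    """True when both AVX-512 Foundation and AVX-512 VNNI are reported."""
    return {"avx512f", "avx512_vnni"} <= flags


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read and parse the flags from a cpuinfo file (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return parse_cpu_flags(handle.read())
```

On a Linux host, `has_avx512_vnni(read_cpu_flags())` returning `False` only means the portable fallback described above would be expected; the memory subsystem remains usable either way.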

See also: Installation · Supported Models · Benchmarks · Memory.

VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 · Repository · Releases
