Hardware and Compatibility

Target hardware

VulkanForge is built and tuned for the AMD Radeon RX 9070 XT (gfx1201, RDNA 4). The kernels assume this architecture's properties:

- Cooperative matrix (KHR_coopmat) for flash-attention and GEMM.
- Native FP8 WMMA (`V_WMMA_F32_16X16X16_FP8_FP8`) when the driver advertises `shaderFloat8CooperativeMatrix` (Mesa 26.1+).
- Wave64 subgroups. The row-split flash-attention kernels require exactly 4 subgroups per workgroup at WG=256.
- 16 GB VRAM. The usable budget is roughly 14.5 GB after driver/runtime overhead; VulkanForge warns when free VRAM drops below a headroom threshold (see Troubleshooting).
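The subgroup and VRAM figures in the list above reduce to simple arithmetic. The following is a minimal sketch of that arithmetic, not VulkanForge's actual code: the function names are hypothetical, the 1.5 GB overhead is only inferred from the difference between 16 GB and the roughly 14.5 GB usable budget, and the headroom threshold is left as a parameter because the text does not give its value.

```python
WORKGROUP_SIZE = 256     # WG=256, from the list above
WAVE64 = 64              # Wave64 subgroup width, from the list above
REQUIRED_SUBGROUPS = 4   # row-split flash-attention requirement


def subgroups_per_workgroup(workgroup_size: int, subgroup_size: int) -> int:
    """Return how many subgroups a workgroup splits into."""
    if subgroup_size <= 0 or workgroup_size % subgroup_size != 0:
        raise ValueError("workgroup size must be a positive multiple of subgroup size")
    return workgroup_size // subgroup_size


def row_split_supported(subgroup_size: int, workgroup_size: int = WORKGROUP_SIZE) -> bool:
    """True when the workgroup splits into exactly the required 4 subgroups."""
    try:
        return subgroups_per_workgroup(workgroup_size, subgroup_size) == REQUIRED_SUBGROUPS
    except ValueError:
        return False


def usable_budget_gb(total_gb: float = 16.0, overhead_gb: float = 1.5) -> float:
    """Approximate usable VRAM; overhead_gb is an assumed, inferred value."""
    return total_gb - overhead_gb


def should_warn(free_gb: float, headroom_gb: float) -> bool:
    """True when free VRAM is below the (hypothetical) headroom threshold."""
    return free_gb < headroom_gb
```

With these definitions, a Wave64 device gives `256 // 64 == 4` subgroups and passes the check, while a Wave32 configuration would give 8 subgroups and fail it.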

Other RDNA 4 cards may work but are untested. Non-RDNA-4 AMD, NVIDIA/CUDA, and Intel are out of scope: this is not a portable cross-hardware engine.

Driver matrix

| Mesa version | What you get |
| --- | --- |
| 26.1+ (recommended) | Default path. Native FP8 WMMA via `shaderFloat8CooperativeMatrix`. |
| 26.0.6 | Legacy. GGUF + FP8 SafeTensors via the BF16 conversion path (no native FP8 WMMA). |
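The driver matrix above can be read as a two-way decision. The sketch below illustrates that decision under stated assumptions: the function names and the returned path labels are hypothetical, the feature flag is passed in as a plain boolean rather than queried from Vulkan, and Mesa versions older than 26.0.6 are not modeled because the table does not cover them.

```python
from typing import Tuple

NATIVE_FP8_MIN_MESA = (26, 1)  # "26.1+" row of the driver matrix


def parse_mesa_version(text: str) -> Tuple[int, ...]:
    """Parse a version such as '26.1.0' or '26.0.6-devel' into a tuple of ints."""
    core = text.strip().split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def select_fp8_path(mesa_version: str, has_fp8_coopmat: bool) -> str:
    """Pick a path label (hypothetical names) from the Mesa version and feature flag.

    has_fp8_coopmat stands in for the driver advertising
    shaderFloat8CooperativeMatrix.
    """
    version = parse_mesa_version(mesa_version)
    if version[:2] >= NATIVE_FP8_MIN_MESA and has_fp8_coopmat:
        return "native-fp8-wmma"
    # Legacy row: FP8 SafeTensors go through the BF16 conversion path.
    return "bf16-conversion"
```

For example, `select_fp8_path("26.1.0", True)` selects the native path, while `select_fp8_path("26.0.6", False)` falls back to the BF16 conversion path. Requiring both the version and the feature flag is a conservative choice in this sketch, since the text ties native FP8 WMMA to the driver advertising the feature.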

A Vulkan 1.4 loader is required. The backend is compute-only (no swapchain or graphics queues).

Single-user by design

VulkanForge is single-stream: no batch inference, no concurrent sessions on one inference instance. This is a deliberate design choice for the local single-user case, not a temporary limitation. If you need batched / concurrent serving, use a batch engine (e.g. vLLM).

See also: Installation · Supported Models · Benchmarks.

VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 · Repository · Releases
