Home

VulkanForge

LLM inference engine for AMD RDNA 4 GPUs. Pure Rust + Vulkan compute shaders, ~14 MB static binary, no runtime dependencies beyond the system Vulkan loader. It is the first engine doing native FP8 WMMA over Vulkan on consumer AMD hardware (V_WMMA_F32_16X16X16_FP8_FP8 via Mesa 26.1+ shaderFloat8CooperativeMatrix).

This wiki documents the shipped v0.7.0 reality. It complements the README and CHANGELOG; it does not replace them.

Who it is for — and who it is not for

VulkanForge is a single-user, RDNA 4 / gfx1201-specific Vulkan inference engine. It targets one GPU (Radeon RX 9070 XT) running one request at a time, and it is tuned for that case.

A good fit if you own an RDNA 4 card, run single-user chat / single-stream inference locally on Linux + Mesa RADV, and want a tiny self-contained binary with native FP8.
Not a fit if you need batch serving / concurrent sessions, multi-GPU, NVIDIA/CUDA, or a general cross-hardware llama.cpp replacement. For batch throughput, vLLM is the right tool.

v0.7.0 — Prefill Parity

As of v0.7.0, prefill reaches parity with llama.cpp's Vulkan backend on dense models, and the Gemma-4 MoE prefill gap is largely closed; decode is unchanged. Measured in the same run against llama.cpp Vulkan (RX 9070 XT, RADV Mesa 26.1.2):

Dense prefill (Qwen3-8B / Llama-3.1-8B / Mistral-7B / DeepSeek-R1-8B) @p2048: 0.96–1.04× llama.cpp (parity, with Mistral ahead).
Gemma-4-26B-A4B MoE prefill @p2048: Q3_K_M 0.89× · QAT-Q4_0 0.83×.
Decode: 0.87–0.97× llama.cpp (unchanged).

Full table and test conditions are on the Benchmarks page.
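
As a reading aid for the "×" figures above, here is a minimal Rust sketch of how such a ratio could be computed. It assumes the ratio is VulkanForge throughput divided by llama.cpp throughput, both in tokens per second; the page does not spell out the formula. The function names, the 0.05 parity tolerance, and the numbers in `main` are hypothetical placeholders, not measurements or project code.

```rust
/// Relative throughput of an engine versus a baseline, both in tokens per second.
/// Values below 1.0 mean the engine is slower than the baseline.
/// Returns None when the inputs cannot form a meaningful ratio.
fn relative_throughput(engine_tok_s: f64, baseline_tok_s: f64) -> Option<f64> {
    if engine_tok_s.is_finite() && baseline_tok_s.is_finite() && baseline_tok_s > 0.0 {
        Some(engine_tok_s / baseline_tok_s)
    } else {
        None
    }
}

/// Treats a ratio within `tolerance` of 1.0 as parity.
/// The tolerance is an illustrative convention, not one defined by this wiki.
fn is_parity(ratio: f64, tolerance: f64) -> bool {
    (ratio - 1.0).abs() <= tolerance
}

fn main() {
    // Placeholder throughputs for illustration only.
    let engine_tok_s = 960.0;
    let baseline_tok_s = 1000.0;

    match relative_throughput(engine_tok_s, baseline_tok_s) {
        Some(ratio) => println!("{ratio:.2}x baseline, parity: {}", is_parity(ratio, 0.05)),
        None => println!("baseline throughput must be positive and finite"),
    }
}
```

Under this reading, a dense-prefill figure of 1.04× means the engine is slightly ahead of the baseline, and a decode figure of 0.87× means it is about 13% behind.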

Quick links

Get started: Installation · Hardware and Compatibility
Use it: Supported Models · Usage · Configuration
Reference: Benchmarks · Architecture · Troubleshooting

License

GPL-3.0. VulkanForge builds on the foundational work of oldnordic/ROCmForge (model loader, GGUF parser, CPU path, overall architecture). See Architecture for full attribution.

VulkanForge v0.7.0 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 · Repository · Releases



