Installation

VulkanForge is built from source. It is a single static binary once compiled.

Prerequisites

GPU: AMD RDNA 4 / gfx1201 (Radeon RX 9070 XT). See Hardware and Compatibility.
Driver: Mesa RADV ≥ 26.1 recommended (native FP8 WMMA via shaderFloat8CooperativeMatrix). Mesa 26.0.6 also works (GGUF + FP8 via the BF16 conversion path, no native FP8 WMMA). Vulkan 1.4 loader + headers.
Toolchain: Rust 1.85+ (edition 2024), Vulkan headers. The optional memory feature (Memory) needs a newer toolchain — Rust 1.89+ (it pulls in the edition-2024 sqlitegraph, which declares rust-version = 1.89; ort declares 1.88).
OS: a current Linux with RADV (Arch / CachyOS and similar).

Build

git clone https://github.com/maeddesg/vulkanforge.git
cd vulkanforge
cargo build --release   # Rust 1.85+, Vulkan headers required

The release binary is at target/release/vulkanforge.

Memory (optional). To compile the server-side memory subsystem in, add the memory feature:

cargo build --release --features memory   # Rust 1.89+ — see Prerequisites

This pulls in two native deps — the ONNX Runtime (downloaded by ort at build time) and bundled SQLite (rusqlite, a C compile) — so the first build takes several extra minutes and the binary grows from ~25 MB to ~58 MB. The default cargo build --release stays lean and pulls in neither. Memory is also off at runtime until you pass serve --memory; see Memory (optional) below and Memory.

The CLI chat & agentic coding client vf-clide is a separate crate (no engine dependencies) — build it on demand:

cargo build --release --manifest-path vf-clide/Cargo.toml   # → ./vf-clide/target/release/vf-clide

Kernel parameter (14B+ models)

For 14B+ models, raise the amdgpu compute timeout on the kernel command line — the default 2 s is too short for long prefill submits and will TDR-reset the GPU:

amdgpu.lockup_timeout=10000,10000

Add it to your bootloader (e.g. GRUB GRUB_CMDLINE_LINUX_DEFAULT), regenerate the config, reboot.

Verify the driver path

Check whether native FP8 WMMA is available on your driver:

vulkaninfo 2>/dev/null | grep shaderFloat8CooperativeMatrix

If present, VulkanForge auto-selects the native FP8 path; otherwise it falls back to the BF16 path.

Verify the install

Run a quick benchmark on a Q4_K_M GGUF:

vulkanforge bench --model ~/models/Qwen3-8B-Q4_K_M.gguf

This enumerates the GPU, loads the model, and prints decode + prefill numbers. For a chat sanity check, see Usage.

Memory (optional)

VulkanForge can keep a persistent, project-scoped semantic memory in the serve process — opt-in, off by default (Memory explains what it is and isn't). Local setup:

Build with the feature: cargo build --release --features memory (Rust 1.89+).
Activate per run: vulkanforge serve --model … --memory (or VULKANFORGE_MEMORY=1). Without it, /memory/* returns 503 and inference runs with no memory overhead (no embedder load, no database opened).
Where it lives: one SQLite file at ~/.vulkanforge/memory.db (override with VF_MEMORY_DB), plus the embedding model cached in the sibling ~/.vulkanforge/embed-cache/.
First start needs network once: the first --memory run downloads the Nomic embedding model (INT8, 768-dim) into embed-cache/; every start afterwards is fully offline.

The embedder runs on the CPU (AVX-512/VNNI fast path on Zen4, MLAS fallback elsewhere) — it uses no VRAM and doesn't touch the GPU budget. See Usage for the /memory/* endpoints, Configuration for the flags, and Hardware and Compatibility for the cost.

More driver / environment detail: see docs/INSTALLATION.md.

VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 · Repository · Releases

VulkanForge Wiki

Get Started

Use VulkanForge

Reference

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Installation

Installation

Prerequisites

Build

Kernel parameter (14B+ models)

Verify the driver path

Verify the install

Memory (optional)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

VulkanForge Wiki

Clone this wiki locally