Skip to content
maeddesg edited this page Jun 15, 2026 · 4 revisions

Installation

VulkanForge is built from source. It is a single static binary once compiled.

Prerequisites

  • GPU: AMD RDNA 4 / gfx1201 (Radeon RX 9070 XT). See Hardware and Compatibility.
  • Driver: Mesa RADV ≥ 26.1 recommended (native FP8 WMMA via shaderFloat8CooperativeMatrix). Mesa 26.0.6 also works (GGUF + FP8 via the BF16 conversion path, no native FP8 WMMA). Vulkan 1.4 loader + headers.
  • Toolchain: Rust 1.85+ (edition 2024), Vulkan headers. The optional memory feature (Memory) needs a newer toolchain — Rust 1.89+ (it pulls in the edition-2024 sqlitegraph, which declares rust-version = 1.89; ort declares 1.88).
  • OS: a current Linux with RADV (Arch / CachyOS and similar).

Build

git clone https://github.com/maeddesg/vulkanforge.git
cd vulkanforge
cargo build --release   # Rust 1.85+, Vulkan headers required

The release binary is at target/release/vulkanforge.

Memory (optional). To compile the server-side memory subsystem in, add the memory feature:

cargo build --release --features memory   # Rust 1.89+ — see Prerequisites

This pulls in two native deps — the ONNX Runtime (downloaded by ort at build time) and bundled SQLite (rusqlite, a C compile) — so the first build takes several extra minutes and the binary grows from ~25 MB to ~58 MB. The default cargo build --release stays lean and pulls in neither. Memory is also off at runtime until you pass serve --memory; see Memory (optional) below and Memory.

The CLI chat & agentic coding client vf-clide is a separate crate (no engine dependencies) — build it on demand:

cargo build --release --manifest-path vf-clide/Cargo.toml   # → ./vf-clide/target/release/vf-clide

Kernel parameter (14B+ models)

For 14B+ models, raise the amdgpu compute timeout on the kernel command line — the default 2 s is too short for long prefill submits and will TDR-reset the GPU:

amdgpu.lockup_timeout=10000,10000

Add it to your bootloader (e.g. GRUB GRUB_CMDLINE_LINUX_DEFAULT), regenerate the config, reboot.

Verify the driver path

Check whether native FP8 WMMA is available on your driver:

vulkaninfo 2>/dev/null | grep shaderFloat8CooperativeMatrix

If present, VulkanForge auto-selects the native FP8 path; otherwise it falls back to the BF16 path.

Verify the install

Run a quick benchmark on a Q4_K_M GGUF:

vulkanforge bench --model ~/models/Qwen3-8B-Q4_K_M.gguf

This enumerates the GPU, loads the model, and prints decode + prefill numbers. For a chat sanity check, see Usage.

Memory (optional)

VulkanForge can keep a persistent, project-scoped semantic memory in the serve process — opt-in, off by default (Memory explains what it is and isn't). Local setup:

  1. Build with the feature: cargo build --release --features memory (Rust 1.89+).
  2. Activate per run: vulkanforge serve --model … --memory (or VULKANFORGE_MEMORY=1). Without it, /memory/* returns 503 and inference runs with no memory overhead (no embedder load, no database opened).
  3. Where it lives: one SQLite file at ~/.vulkanforge/memory.db (override with VF_MEMORY_DB), plus the embedding model cached in the sibling ~/.vulkanforge/embed-cache/.
  4. First start needs network once: the first --memory run downloads the Nomic embedding model (INT8, 768-dim) into embed-cache/; every start afterwards is fully offline.

The embedder runs on the CPU (AVX-512/VNNI fast path on Zen4, MLAS fallback elsewhere) — it uses no VRAM and doesn't touch the GPU budget. See Usage for the /memory/* endpoints, Configuration for the flags, and Hardware and Compatibility for the cost.

More driver / environment detail: see docs/INSTALLATION.md.

Clone this wiki locally