Installation
VulkanForge is built from source. It is a single static binary once compiled.
Prerequisites
- GPU: AMD RDNA 4 / gfx1201 (Radeon RX 9070 XT). See Hardware and Compatibility.
- Driver: Mesa RADV ≥ 26.1 recommended (native FP8 WMMA via shaderFloat8CooperativeMatrix). Mesa 26.0.6 also works (GGUF + FP8 via the BF16 conversion path, no native FP8 WMMA). Vulkan 1.4 loader + headers.
- Toolchain: Rust 1.85+ (edition 2024), Vulkan headers. The optional memory feature (Memory) needs a newer toolchain, Rust 1.89+ (it pulls in the edition-2024 sqlitegraph, which declares rust-version = 1.89; ort declares 1.88).
- OS: a current Linux with RADV (Arch / CachyOS and similar).
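The toolchain requirement above can be checked before starting a build. The following is a minimal sketch, not part of VulkanForge: the helper names `version_ge`, `min_rust_for`, and `check_rust` are hypothetical, and the comparison assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed rustc against the minimum versions listed above.
# All function names here are illustrative, not part of VulkanForge.

# version_ge A B -> succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# min_rust_for FEATURES -> prints the minimum Rust version for that feature set.
min_rust_for() {
    case "$1" in
        *memory*) echo "1.89" ;;   # --features memory
        *)        echo "1.85" ;;   # default build
    esac
}

# check_rust FEATURES -> prints a one-line report; never fails the script.
check_rust() {
    local need have
    need="$(min_rust_for "${1:-}")"
    if ! command -v rustc >/dev/null 2>&1; then
        echo "rustc not found (need ${need}+)"
        return 0
    fi
    have="$(rustc --version | awk '{print $2}')"
    if version_ge "$have" "$need"; then
        echo "rustc ${have} OK (need ${need}+)"
    else
        echo "rustc ${have} too old (need ${need}+)"
    fi
}

check_rust ""        # default build
check_rust "memory"  # build with --features memory
```

The script only reports; it does not install or switch toolchains.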
git clone https://github.com/maeddesg/vulkanforge.git
cd vulkanforge
cargo build --release # Rust 1.85+, Vulkan headers required
The release binary is at target/release/vulkanforge.
Memory (optional). To compile in the server-side memory subsystem, add the memory feature:
cargo build --release --features memory # Rust 1.89+, see Prerequisites
This pulls in two native dependencies, the ONNX Runtime (downloaded by ort at build time) and bundled SQLite (rusqlite, a C compile), so the first build takes several extra minutes and the binary grows from ~25 MB to ~58 MB. The default cargo build --release stays lean and pulls in neither. Memory is also off at runtime until you pass serve --memory; see Memory (optional) below and Memory.
The CLI chat and agentic coding client vf-clide is a separate crate (no engine dependencies); build it on demand:
cargo build --release --manifest-path vf-clide/Cargo.toml # → ./vf-clide/target/release/vf-clide
For 14B+ models, raise the amdgpu compute timeout on the kernel command line. The default 2 s is too short for long prefill submits and triggers a GPU reset (TDR):
amdgpu.lockup_timeout=10000,10000
Add it to your bootloader configuration (e.g. GRUB_CMDLINE_LINUX_DEFAULT for GRUB), regenerate the config, and reboot.
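After the reboot, it can be useful to confirm that the parameter reached the kernel. The following is a minimal sketch that reads /proc/cmdline; the helper name `lockup_timeout_of` is hypothetical and not something VulkanForge ships.

```shell
#!/usr/bin/env bash
# Sketch: report whether amdgpu.lockup_timeout is on the running kernel's command line.

# lockup_timeout_of CMDLINE -> prints the value of amdgpu.lockup_timeout
# (the last occurrence if it appears more than once), or an empty line if absent.
lockup_timeout_of() {
    local arg value=""
    for arg in $1; do   # intentional word splitting on whitespace
        case "$arg" in
            amdgpu.lockup_timeout=*) value="${arg#amdgpu.lockup_timeout=}" ;;
        esac
    done
    printf '%s\n' "$value"
}

cmdline="$(cat /proc/cmdline 2>/dev/null || true)"
value="$(lockup_timeout_of "$cmdline")"
if [ -n "$value" ]; then
    echo "amdgpu.lockup_timeout=${value}"
else
    echo "amdgpu.lockup_timeout not set; the driver default applies"
fi
```

If the second message appears after a reboot, the bootloader config was probably not regenerated.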
Check whether native FP8 WMMA is available on your driver:
vulkaninfo 2>/dev/null | grep shaderFloat8CooperativeMatrix
If present, VulkanForge auto-selects the native FP8 path; otherwise it falls back to the BF16 path.
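The check above can be wrapped in a small script. This sketch is slightly stricter than the plain grep: it assumes vulkaninfo prints features as `name = true` or `name = false` and only reports the native path when the value is `true`. The function name `fp8_path` and the labels `native` and `bf16` are illustrative; VulkanForge makes its own selection at runtime and this script does not influence it.

```shell
#!/usr/bin/env bash
# Sketch: predict which FP8 path the driver allows, based on vulkaninfo output.

# fp8_path: reads vulkaninfo-style text on stdin, prints "native" or "bf16".
# Assumption: features are printed as "<name> = true|false".
fp8_path() {
    if grep -Eq 'shaderFloat8CooperativeMatrix[[:space:]]*=[[:space:]]*true'; then
        echo "native"
    else
        echo "bf16"
    fi
}

if command -v vulkaninfo >/dev/null 2>&1; then
    echo "FP8 path: $(vulkaninfo 2>/dev/null | fp8_path)"
else
    echo "vulkaninfo not found; cannot check"
fi
```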
Run a quick benchmark on a Q4_K_M GGUF:
vulkanforge bench --model ~/models/Qwen3-8B-Q4_K_M.gguf
This enumerates the GPU, loads the model, and prints decode + prefill numbers. For a chat sanity check, see Usage.
Memory (optional)
VulkanForge can keep a persistent, project-scoped semantic memory in the serve process. It is opt-in and off by default (Memory explains what it is and isn't). Local setup:
- Build with the feature: cargo build --release --features memory (Rust 1.89+).
- Activate per run: vulkanforge serve --model … --memory (or VULKANFORGE_MEMORY=1). Without it, /memory/* returns 503 and inference runs with no memory overhead (no embedder load, no database opened).
- Where it lives: one SQLite file at ~/.vulkanforge/memory.db (override with VF_MEMORY_DB), plus the embedding model cached in the sibling ~/.vulkanforge/embed-cache/.
- First start needs network once: the first --memory run downloads the Nomic embedding model (INT8, 768-dim) into embed-cache/; every start afterwards is fully offline.
The embedder runs on the CPU (AVX-512/VNNI fast path on Zen4, MLAS fallback elsewhere) — it uses no VRAM and
doesn't touch the GPU budget. See Usage for the /memory/* endpoints, Configuration for the flags, and
Hardware and Compatibility for the cost.
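To see where the memory files would live on a given machine, the default location and the VF_MEMORY_DB override described above can be resolved with a short script. This is a sketch under assumptions: the helper name `memory_db_path` is hypothetical, an empty VF_MEMORY_DB is treated as unset, and the embedding cache is assumed to stay at its default location even when the database path is overridden.

```shell
#!/usr/bin/env bash
# Sketch: show where the memory database and embedding cache are expected to be.

# memory_db_path -> prints VF_MEMORY_DB if set and non-empty, else the default path.
memory_db_path() {
    printf '%s\n' "${VF_MEMORY_DB:-$HOME/.vulkanforge/memory.db}"
}

db="$(memory_db_path)"
cache="$HOME/.vulkanforge/embed-cache"   # assumed fixed; see the lead-in

if [ -f "$db" ]; then
    echo "memory database: $db ($(du -h "$db" | cut -f1))"
else
    echo "memory database: $db (not created yet)"
fi

if [ -d "$cache" ]; then
    echo "embedding cache: $cache"
else
    echo "embedding cache: $cache (empty until the first --memory run)"
fi
```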
More driver / environment detail: see docs/INSTALLATION.md.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 ·
Repository · Releases