vf clide

vf-clide — the CLI chat client

vf-clide is a lean command-line chat client for the VulkanForge server, shipped alongside the engine since v0.8.0. It is its own crate (GPL-3.0) with no engine dependencies — it talks only to the OpenAI-compatible API over HTTP, so it builds and runs independently of the Vulkan stack.

As of v0.1.0 it is a chat client (streaming/non-streaming, REPL + headless). The agentic loop (tools, file access, memory) is Phase 2.

Full reference — install, flags, limitations, troubleshooting: vf-clide/README.md.

Build

cargo build --release --manifest-path vf-clide/Cargo.toml   # → ./vf-clide/target/release/vf-clide

Two terminals: server, then client

Terminal 1 — start the server (it stays in the foreground; see Usage):

vulkanforge serve --model ~/models/Qwen_Qwen3-14B-Q4_K_M.gguf --port 8080

Terminal 2 — the client:

# interactive REPL (streams live)
vf-clide --url http://localhost:8080

# headless one-shot
vf-clide --url http://localhost:8080 -p "Capital of Japan? One word."

REPL commands

/clear (drop history) · /model <name> (label) · /max-tokens <N> · /think · /no-think · /quit (/q, /exit).

Flags

Flag	Default	Purpose
`-p`, `--prompt`	—	Ask one question, print the answer, exit (headless)
`--url`	`http://localhost:8080`	Server address
`--model`	`Qwen3-14B-Q4_K_M`	Label sent in the request only (the server decides which model answers)
`--max-tokens`	`6144`	Token budget; generous so thinking models have room
`--no-think`	off	Append `/no_think` → answer without the reasoning block
`--no-stream`	off	Full answer instead of streaming
`--temperature`	`0.0`	Sampling temperature
`--project`	—	Project scope (placeholder; no effect yet)

Good to know

One model per server. --model is only a label — what answers is whatever the server loaded. Switch model = restart the server.
Visible markers, not silent failures. If an answer is cut off at the token limit it prints [truncated at the token limit (N) …] on stderr; if a thinking model produces only a <think> block with no visible answer it prints [empty answer — the budget was likely consumed by the <think> block …]. stdout stays clean.
The REPL needs a real terminal (TTY) — not pipe-able. For scripting use the headless -p mode.
Validated through the client across gemma (QAT / Q3 @KV-FP8), Qwen3 (14B / 8B), Llama-3.1-8B, Mistral-7B and DeepSeek-R1-Distill.

See Usage for the server side and Troubleshooting for empty/truncated answers and the context-size ceiling.

VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 · Repository · Releases

VulkanForge Wiki

Get Started

Use VulkanForge

Reference

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vf clide

vf-clide — the CLI chat client

Build

Two terminals: server, then client

REPL commands

Flags

Good to know

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

VulkanForge Wiki

Clone this wiki locally