-
Notifications
You must be signed in to change notification settings - Fork 1
vf clide
vf-clide is a lean command-line chat client for the VulkanForge server, shipped alongside the
engine since v0.8.0. It is its own crate (GPL-3.0) with no engine dependencies — it talks
only to the OpenAI-compatible API over HTTP, so it builds and runs independently of the Vulkan stack.
As of v0.1.0 it is a chat client (streaming/non-streaming, REPL + headless). The agentic loop (tools, file access, memory) is Phase 2.
Full reference — install, flags, limitations, troubleshooting:
vf-clide/README.md.
cargo build --release --manifest-path vf-clide/Cargo.toml # → ./vf-clide/target/release/vf-clideTerminal 1 — start the server (it stays in the foreground; see Usage):
vulkanforge serve --model ~/models/Qwen_Qwen3-14B-Q4_K_M.gguf --port 8080Terminal 2 — the client:
# interactive REPL (streams live)
vf-clide --url http://localhost:8080
# headless one-shot
vf-clide --url http://localhost:8080 -p "Capital of Japan? One word."/clear (drop history) · /model <name> (label) · /max-tokens <N> · /think · /no-think ·
/quit (/q, /exit).
| Flag | Default | Purpose |
|---|---|---|
-p, --prompt
|
— | Ask one question, print the answer, exit (headless) |
--url |
http://localhost:8080 |
Server address |
--model |
Qwen3-14B-Q4_K_M |
Label sent in the request only (the server decides which model answers) |
--max-tokens |
6144 |
Token budget; generous so thinking models have room |
--no-think |
off | Append /no_think → answer without the reasoning block |
--no-stream |
off | Full answer instead of streaming |
--temperature |
0.0 |
Sampling temperature |
--project |
— | Project scope (placeholder; no effect yet) |
-
One model per server.
--modelis only a label — what answers is whatever the server loaded. Switch model = restart the server. -
Visible markers, not silent failures. If an answer is cut off at the token limit it prints
[truncated at the token limit (N) …]on stderr; if a thinking model produces only a<think>block with no visible answer it prints[empty answer — the budget was likely consumed by the <think> block …]. stdout stays clean. -
The REPL needs a real terminal (TTY) — not pipe-able. For scripting use the headless
-pmode. - Validated through the client across gemma (QAT / Q3 @KV-FP8), Qwen3 (14B / 8B), Llama-3.1-8B, Mistral-7B and DeepSeek-R1-Distill.
See Usage for the server side and Troubleshooting for empty/truncated answers and the context-size ceiling.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 ·
Repository · Releases