-
Notifications
You must be signed in to change notification settings - Fork 1
vf clide
vf-clide is a lean command-line client for the VulkanForge server, shipped alongside the engine.
It is its own crate (GPL-3.0) with no engine dependencies — it talks only to the
OpenAI-compatible API over HTTP, so it builds and runs independently of the Vulkan stack.
As of v0.9.0 it is both a chat client (streaming/non-streaming, REPL + headless) and an
agentic coding client: in --agent mode the model calls tools (read/write/search/shell) in a
loop, gated by a tiered permission model and confined to a workspace. (Chat-only was v0.8.0.)
Full reference — install, flags, limitations, troubleshooting:
vf-clide/README.md.
cargo build --release --manifest-path vf-clide/Cargo.toml # → ./vf-clide/target/release/vf-clideTerminal 1 — start the server (it stays in the foreground; see Usage):
vulkanforge serve --model ~/models/Qwen_Qwen3-14B-Q4_K_M.gguf --port 8080(gemma-4-26B models require VULKANFORGE_KV_FP8=1 — see Supported-Models.)
Terminal 2 — the client:
# interactive chat REPL (streams live)
vf-clide --url http://localhost:8080
# headless one-shot chat
vf-clide --url http://localhost:8080 -p "Capital of Japan? One word."In agent mode the model may request tools; vf-clide runs the roundtrip (model → permission gate → tool → result back → continue) for up to 8 iterations. Default coder = Qwen3-14B-Q4 (JSON tool arguments).
# headless, read-only tools auto-approved:
vf-clide --url http://localhost:8080 --agent --yes --workspace ~/code/myproj \
-p "Search for 'TODO' and summarise the open items."
# REPL agent mode (prompts y/N per tool call):
vf-clide --url http://localhost:8080 --agent --workspace ~/code/myproj| Tool | Does | Risk tier | Confined? |
|---|---|---|---|
read_file |
read a file (256 KB cap) | ReadOnly | yes (workspace) |
search |
substring search, file:line hits (cap 100 / 64 KB) |
ReadOnly | yes (workspace) |
write_file |
create/overwrite a file (parent dirs in-root) | Mutating | yes (workspace) |
shell |
run a command (cwd = workspace, output cap 256 KB, 30 s timeout) | Exec | no (see below) |
Each tool carries a risk tier. Headless auto-approval is opt-in and cumulative (a higher flag implies the lower tiers):
| Flag | auto-approves |
|---|---|
| (none) | nothing — every tool call is denied |
--yes |
ReadOnly (read_file, search) |
--allow-mutating |
ReadOnly + Mutating (write_file) — implies --yes
|
--allow-shell |
ReadOnly + Mutating + Exec (shell) — implies --allow-mutating
|
--yes alone therefore never approves a write or a shell command. In the REPL every call is
confirmed interactively with y/N instead (a louder warning for mutating/exec tools), so the flags
aren't needed there.
| Flag | Default | Purpose |
|---|---|---|
--agent |
off | enable the agent loop (otherwise plain chat) |
--workspace <path> |
current dir | root for the file tools; canonicalized once |
--yes / --allow-mutating / --allow-shell
|
off | auto-approval tiers (above) |
--system <file> |
— | replace the built-in system prompt (constitution) entirely |
--no-system |
off | send no system prompt |
Without --system/--no-system the agent gets a concise built-in system prompt (role, tool use,
permission respect). An AGENTS.md in the workspace root is appended (project-specific
instructions) — read confined, so an AGENTS.md symlinked out of the workspace is ignored.
/clear (drop history) · /model <name> (label) · /max-tokens <N> · /think · /no-think ·
/quit (/q, /exit).
| Flag | Default | Purpose |
|---|---|---|
-p, --prompt
|
— | Ask one question, print, exit (headless) |
--url |
http://localhost:8080 |
Server address |
--model |
Qwen3-14B-Q4_K_M |
Label sent in the request only (the server decides which model answers) |
--max-tokens |
6144 |
Token budget; generous so thinking models have room |
--no-think |
off | Append /no_think → answer without the reasoning block |
--no-stream |
off | Full answer instead of streaming (chat headless) |
--temperature |
0.0 |
Sampling temperature |
-
shellis NOT confined. cwd is the workspace root, but a command can read anywhere (cat ~/.ssh/id_rsaignores cwd). Its guard is the Exec tier —--allow-shellis the deliberate, loudly-named opt-in (or an interactiveyin the REPL). Use it consciously. - No persistence / no memory. The agent loop and history live only in the session; nothing is written to disk (the memory seam is wired but empty).
-
searchis substring-based (no regex);.git/target/node_modules/… are skipped; results capped at 100 hits / 64 KB. Like the other file tools it is workspace-confined — its recursive walk skips symlinks rather than following them out of the workspace (security fix in v0.9.1; earlier 0.9.0/0.2.0 builds could read through an escaping symlink — update recommended). -
One model per server (
--modelis just a label); context ceiling 16384 tokens on RDNA4/gfx1201; gemma-QAT is VRAM-tight (~2.5k ctx) → Qwen3-14B-Q4 is the better coder. - gemma tool-calling is validated for simple arguments; code-carrying arguments follow.
-
Visible markers, not silent failures —
[truncated …]at the token limit,[empty answer …]for a think-only response; permission decisions are logged to stderr; stdout stays clean. - The REPL needs a real terminal (TTY); for scripting use headless
-p.
See Usage for the server side, Supported-Models for the gemma KV-FP8 requirement, and Troubleshooting for empty/truncated answers and the context ceiling.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 ·
Repository · Releases