maeddesg edited this page Jun 13, 2026 · 8 revisions

vf-clide — the CLI chat & agentic coding client

vf-clide is a lean command-line client for the VulkanForge server, shipped alongside the engine. It is its own crate (GPL-3.0) with no engine dependencies — it talks only to the OpenAI-compatible API over HTTP, so it builds and runs independently of the Vulkan stack.

As of v0.9.0 it is both a chat client (streaming/non-streaming, REPL + headless) and an agentic coding client: in --agent mode the model calls tools (read/write/search/shell) in a loop, gated by a tiered permission model and confined to a workspace. (Chat-only was v0.8.0.)

Full reference — install, flags, limitations, troubleshooting: vf-clide/README.md.

Build

cargo build --release --manifest-path vf-clide/Cargo.toml   # → ./vf-clide/target/release/vf-clide

Two terminals: server, then client

Terminal 1 — start the server (it stays in the foreground; see Usage):

vulkanforge serve --model ~/models/Qwen_Qwen3-14B-Q4_K_M.gguf --port 8080

(gemma-4-26B models require VULKANFORGE_KV_FP8=1 — see Supported-Models.)

Terminal 2 — the client:

# interactive chat REPL (streams live)
vf-clide --url http://localhost:8080

# headless one-shot chat
vf-clide --url http://localhost:8080 -p "Capital of Japan? One word."

Agent mode (--agent)

In agent mode the model may request tools; vf-clide runs the roundtrip (model → permission gate → tool → result back → continue) for up to 8 iterations. Default coder = Qwen3-14B-Q4 (JSON tool arguments).
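The roundtrip described above can be sketched in a few lines of Rust. This is an illustration, not vf-clide's actual code: `ModelTurn`, `agent_loop` and the closure parameters are hypothetical names, and reporting a denial back to the model as a tool result is an assumption. Only the order of the steps and the cap of 8 iterations come from the text.

```rust
/// What the model asks for on each turn (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum ModelTurn {
    /// The model wants a tool run with the given raw arguments.
    ToolCall { tool: String, args: String },
    /// The model produced its final answer.
    Answer(String),
}

/// Iteration cap taken from the text above.
const MAX_ITERATIONS: usize = 8;

/// Drive the roundtrip: model -> permission gate -> tool -> result back -> continue.
/// `model` sees the transcript so far, `permit` is the gate, `run_tool` executes.
fn agent_loop(
    mut model: impl FnMut(&[String]) -> ModelTurn,
    mut permit: impl FnMut(&str) -> bool,
    mut run_tool: impl FnMut(&str, &str) -> String,
    prompt: &str,
) -> Option<String> {
    let mut transcript = vec![format!("user: {prompt}")];
    for _ in 0..MAX_ITERATIONS {
        match model(&transcript) {
            ModelTurn::Answer(text) => return Some(text),
            ModelTurn::ToolCall { tool, args } => {
                let result = if permit(&tool) {
                    run_tool(&tool, &args)
                } else {
                    // Assumption: a denial is fed back so the model can adapt.
                    format!("permission denied for tool '{tool}'")
                };
                transcript.push(format!("tool {tool}: {result}"));
            }
        }
    }
    None // iteration budget exhausted without a final answer
}
```

In this sketch a model that keeps requesting tools never gets more than 8 turns; the caller sees `None` and can report that no final answer was produced.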

# headless, read-only tools auto-approved:
vf-clide --url http://localhost:8080 --agent --yes --workspace ~/code/myproj \
  -p "Search for 'TODO' and summarise the open items."

# REPL agent mode (prompts y/N per tool call):
vf-clide --url http://localhost:8080 --agent --workspace ~/code/myproj

The four tools

| Tool | Does | Risk tier | Confined? |
| --- | --- | --- | --- |
| read_file | read a file (256 KB cap) | ReadOnly | yes (workspace) |
| search | substring search, file:line hits (cap 100 / 64 KB) | ReadOnly | yes (workspace) |
| write_file | create/overwrite a file (parent dirs in-root) | Mutating | yes (workspace) |
| shell | run a command (cwd = workspace, output cap 256 KB, 30 s timeout) | Exec | no (see below) |
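To make the search row concrete, here is a minimal sketch of a capped substring search. It works on in-memory `(file name, contents)` pairs rather than walking a directory, and the exact `file:line: text` hit format is an assumption; only the substring matching and the 100 hit / 64 KB caps come from the table.

```rust
/// Caps taken from the tool table above.
const MAX_HITS: usize = 100;
const MAX_BYTES: usize = 64 * 1024;

/// Plain substring search over (file name, contents) pairs.
/// Returns `file:line: text` hits with 1-based line numbers and stops
/// as soon as either cap would be exceeded.
fn search(files: &[(&str, &str)], needle: &str) -> Vec<String> {
    let mut hits = Vec::new();
    let mut bytes = 0;
    for (name, contents) in files {
        for (idx, line) in contents.lines().enumerate() {
            if !line.contains(needle) {
                continue;
            }
            let hit = format!("{name}:{}: {}", idx + 1, line.trim());
            if bytes + hit.len() > MAX_BYTES {
                return hits; // byte cap reached
            }
            bytes += hit.len();
            hits.push(hit);
            if hits.len() == MAX_HITS {
                return hits; // hit cap reached
            }
        }
    }
    hits
}
```

Because the match is a plain `contains`, a needle such as `TODO` also matches `TODOS`; there is no regex or word-boundary handling in this sketch.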

Permission model — three tiers, cumulative

Each tool carries a risk tier. Headless auto-approval is opt-in and cumulative (a higher flag implies the lower tiers):

| Flag | Auto-approves |
| --- | --- |
| (none) | nothing: every tool call is denied |
| `--yes` | ReadOnly (read_file, search) |
| `--allow-mutating` | ReadOnly + Mutating (write_file); implies `--yes` |
| `--allow-shell` | ReadOnly + Mutating + Exec (shell); implies `--allow-mutating` |

--yes alone therefore never approves a write or a shell command. In the REPL every call is confirmed interactively with y/N instead (a louder warning for mutating/exec tools), so the flags aren't needed there.
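One way to model the cumulative tiers is an ordered enum plus a "ceiling" derived from the flags. The names `Tier`, `approval_ceiling`, `tier_of` and `auto_approved` are hypothetical; the sketch only encodes the flag-to-tier mapping stated above and is not taken from the vf-clide source.

```rust
/// Risk tiers, ordered from least to most dangerous.
/// The derived ordering follows declaration order: ReadOnly < Mutating < Exec.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    ReadOnly,
    Mutating,
    Exec,
}

/// Highest tier auto-approved in headless mode; `None` means no flag was given.
fn approval_ceiling(yes: bool, allow_mutating: bool, allow_shell: bool) -> Option<Tier> {
    if allow_shell {
        Some(Tier::Exec)
    } else if allow_mutating {
        Some(Tier::Mutating)
    } else if yes {
        Some(Tier::ReadOnly)
    } else {
        None
    }
}

/// Tier of each of the four tools; unknown tool names have no tier.
fn tier_of(tool: &str) -> Option<Tier> {
    match tool {
        "read_file" | "search" => Some(Tier::ReadOnly),
        "write_file" => Some(Tier::Mutating),
        "shell" => Some(Tier::Exec),
        _ => None,
    }
}

/// A call is auto-approved only if its tier is at or below the ceiling.
fn auto_approved(tool: &str, ceiling: Option<Tier>) -> bool {
    match (tier_of(tool), ceiling) {
        (Some(tier), Some(max)) => tier <= max,
        _ => false,
    }
}
```

With this shape, "a higher flag implies the lower tiers" falls out of the single `tier <= max` comparison instead of needing a separate check per flag.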

Workspace & constitution

| Flag | Default | Purpose |
| --- | --- | --- |
| `--agent` | off | enable the agent loop (otherwise plain chat) |
| `--workspace <path>` | current dir | root for the file tools; canonicalized once |
| `--yes` / `--allow-mutating` / `--allow-shell` | off | auto-approval tiers (above) |
| `--system <file>` | | replace the built-in system prompt (constitution) entirely |
| `--no-system` | off | send no system prompt |

Without --system/--no-system the agent gets a concise built-in system prompt (role, tool use, permission respect). An AGENTS.md in the workspace root is appended as project-specific instructions. It is read under the same workspace confinement as the file tools, so an AGENTS.md symlinked out of the workspace is ignored.
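Workspace confinement of this kind is commonly done by canonicalizing the requested path and checking that it still lies under the canonical root. The sketch below shows that general technique with the standard library; `resolve_in_workspace` is a hypothetical helper, and whether vf-clide implements the check exactly this way is an assumption.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `requested` against the workspace root and reject anything that
/// ends up outside it. `root` must already be canonical (canonicalized once).
fn resolve_in_workspace(root: &Path, requested: &str) -> io::Result<PathBuf> {
    // canonicalize() resolves `..` and follows symlinks; the path must exist.
    let resolved = root.join(requested).canonicalize()?;
    if resolved.starts_with(root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the workspace",
        ))
    }
}
```

Because the check runs on the resolved path, a symlink inside the workspace that points outside it is rejected just like a `../` escape. A file that does not exist yet cannot be canonicalized, so a write path would need its parent directory checked instead.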

REPL commands

/clear (drop history) · /model <name> (label) · /max-tokens <N> · /think · /no-think · /quit (/q, /exit).

Common flags (chat + agent)

| Flag | Default | Purpose |
| --- | --- | --- |
| `-p`, `--prompt` | | Ask one question, print, exit (headless) |
| `--url` | http://localhost:8080 | Server address |
| `--model` | Qwen3-14B-Q4_K_M | Label sent in the request only (the server decides which model answers) |
| `--max-tokens` | 6144 | Token budget; generous so thinking models have room |
| `--no-think` | off | Append /no_think → answer without the reasoning block |
| `--no-stream` | off | Full answer instead of streaming (chat headless) |
| `--temperature` | 0.0 | Sampling temperature |

Limitations

  • shell is NOT confined. cwd is the workspace root, but a command can read anywhere (cat ~/.ssh/id_rsa ignores cwd). Its guard is the Exec tier: --allow-shell is the deliberate, loudly named opt-in (or an interactive y in the REPL). Use it consciously.
  • No persistence / no memory. The agent loop and history live only in the session; nothing is written to disk (the memory seam is wired but empty).
  • search is substring-based (no regex); .git/target/node_modules/… are skipped; results capped at 100 hits / 64 KB. Like the other file tools it is workspace-confined — its recursive walk skips symlinks rather than following them out of the workspace (security fix in v0.9.1; earlier 0.9.0/0.2.0 builds could read through an escaping symlink — update recommended).
  • One model per server (--model is just a label); context ceiling 16384 tokens on RDNA4/gfx1201; gemma-QAT is VRAM-tight (~2.5k ctx) → Qwen3-14B-Q4 is the better coder.
  • gemma tool-calling is validated for simple arguments only; support for code-carrying arguments is still to come.
  • Visible markers, not silent failures: [truncated …] at the token limit, [empty answer …] for a think-only response; permission decisions are logged to stderr; stdout stays clean.
  • The REPL needs a real terminal (TTY); for scripting use headless -p.
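The first limitation is easy to reproduce outside vf-clide. The sketch below runs a command through `sh -c` with the workspace as working directory, which is an assumption about how such a tool is typically built, not a description of vf-clide's code. It assumes a Unix-like system with `sh` on the PATH, and `run_in_workspace` is a hypothetical helper without the timeout or output cap.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Run `command` through `sh -c` with the workspace as working directory
/// and return its stdout. The working directory only decides where relative
/// paths start; it does not restrict what the command may read.
fn run_in_workspace(workspace: &Path, command: &str) -> io::Result<String> {
    let output = Command::new("sh")
        .arg("-c")
        .arg(command)
        .current_dir(workspace)
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Calling `run_in_workspace(ws, "cat ../outside.txt")` returns the contents of a file one level above the workspace, which is why the Exec tier, and not the working directory, is the only guard for shell.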

See Usage for the server side, Supported-Models for the gemma KV-FP8 requirement, and Troubleshooting for empty/truncated answers and the context ceiling.
