Skip to content
maeddesg edited this page Jun 20, 2026 · 8 revisions

vf-clide — the CLI chat & agentic coding client

vf-clide is a lean command-line client for the VulkanForge server, shipped alongside the engine. It is its own crate (GPL-3.0) with no engine dependencies — it talks only to the OpenAI-compatible API over HTTP, so it builds and runs independently of the Vulkan stack.

As of v0.9.0 it is both a chat client (streaming/non-streaming, REPL + headless) and an agentic coding client: in --agent mode the model calls tools (read/write/search/shell) in a loop, gated by a tiered permission model and confined to a workspace. (Chat-only was v0.8.0.) Since v0.9.2 the REPL also shows a pinned status line with a live token meter and the current action (see below); headless -p output is unaffected.

Full reference — install, flags, limitations, troubleshooting: vf-clide/README.md.

Build

cargo build --release --manifest-path vf-clide/Cargo.toml   # → ./vf-clide/target/release/vf-clide

Two terminals: server, then client

Terminal 1 — start the server (it stays in the foreground; see Usage):

vulkanforge serve --model ~/models/Qwen_Qwen3-14B-Q4_K_M.gguf --port 8080

(gemma-4-26B models require VULKANFORGE_KV_FP8=1 — see Supported-Models.)

Terminal 2 — the client:

# interactive chat REPL (streams live)
vf-clide --url http://localhost:8080

# headless one-shot chat
vf-clide --url http://localhost:8080 -p "Capital of Japan? One word."

Agent mode (--agent)

In agent mode the model may request tools; vf-clide runs the roundtrip (model → permission gate → tool → result back → continue) for up to 8 iterations. Default coder = Qwen3-14B-Q4 (JSON tool arguments).

# headless, read-only tools auto-approved:
vf-clide --url http://localhost:8080 --agent --yes --workspace ~/code/myproj \
  -p "Search for 'TODO' and summarise the open items."

# REPL agent mode (prompts y/N per tool call):
vf-clide --url http://localhost:8080 --agent --workspace ~/code/myproj

The tools

The four file/shell tools are always present; with the server started --memory, three memory tools are added (recall, remember, archive).

Tool Does Risk tier Confined?
read_file read a file (256 KB cap) ReadOnly yes (workspace)
search substring search, file:line hits (cap 100 / 64 KB) ReadOnly yes (workspace)
write_file create/overwrite a file (parent dirs in-root) Mutating yes (workspace)
shell run a command (cwd = workspace, output cap 256 KB, 30 s timeout) Exec no (see below)
recall semantic recall from project memory (serve --memory) — (own axis) n/a (server memory)
remember write a note to project memory (serve --memory) — (own axis) n/a (server memory)
archive archive a note recalled this session — drops it from recall, reversible via /unarchive (v1.0.3) — (own axis, confirm-always) n/a (server memory)

Permission model — three tiers, cumulative

Each tool carries a risk tier. Auto-approval is opt-in and cumulative (a higher flag implies the lower tiers), and the resulting ceiling applies in both modes (REPL and headless):

Flag auto-approves (ceiling)
(none) nothing
--yes ReadOnly (read_file, search)
--allow-mutating ReadOnly + Mutating (write_file) — implies --yes
--allow-shell ReadOnly + Mutating + Exec (shell) — implies --allow-mutating

--yes alone therefore never approves a write or a shell command. A call at or below the ceiling is auto-approved (and still printed, so you see every tool that ran); a call above it is handled per mode:

  • REPL — prompts y/N (a louder warning for mutating/exec tools). So --agent with no flags prompts for everything; --agent --yes auto-runs reads and only prompts for write/shell; and so on.
  • Headless (-p) — denied (not prompted), so a scripted run never blocks on input.

Changed in v0.9.4: earlier versions prompted for every call in the REPL and honored the flags headless-only. The ceiling now applies in the REPL too — consistent with headless, not laxer (workspace confinement still bounds the file tools independently).

Memory tools are a separate axis (v1.0.2–v1.0.3). With the server on --memory, the agent gets recall / remember / archive. These call the server's /memory/* over HTTP — they touch neither files nor the shell — so they are not bound by the file/shell ceiling: they run (and print a visible marker) whenever memory is enabled, regardless of --yes / --allow-mutating. archive (v1.0.3) sits on this axis too, but is confirm-always — even --allow-shell doesn't auto-approve it, headless denies it, and the prompt shows the note's real stored text (never the model's claim). The recovery/irreversible commands /unarchive and /forget are user-only REPL commands — the agent never restores or deletes.

Workspace & constitution

Flag Default Purpose
--agent off enable the agent loop (otherwise plain chat)
--workspace <path> current dir root for the file tools; canonicalized once
--yes / --allow-mutating / --allow-shell off auto-approval tiers (above)
--system <file> replace the built-in system prompt (constitution) entirely
--no-system off send no system prompt

Without --system/--no-system the agent gets a concise built-in system prompt (role, tool use, permission respect). An AGENTS.md in the workspace root is appended (project-specific instructions) — read confined, so an AGENTS.md symlinked out of the workspace is ignored.

Memory (v1.0.2–v1.0.5)

With the server started --memory, the agent gains three tools — recall, remember, and archive — and the REPL gains the full memory surface: /project, /recall (with --explain / --type / --include-superseded / --frontier), /remember (with --type), the curation commands /archive <id>, /unarchive <id>, /forget <id>, /retype <id> <T>, and the connection layer /supersede / /unsupersede / /derive / /underive / /why / /contradict / /uncontradict. The three agent tools run on their own axis (direct /memory/* calls, not the file/shell gate), so they're available and visible whenever memory is on.

The agent tool set stayed three across v1.0.4–v1.0.5: the recall diagnostics (--explain), note typing, and the edges (SUPERSEDES / DERIVES_FROM + /why in v1.0.4; the symmetric CONTRADICTS and the opt-in --frontier retrieval in v1.0.5) are user curation — REPL-only, never agent tools. The agent contributes notes; the user draws the structure. Full reference: Memory.

Curation split (v1.0.3). The agent may archive a note it recalled this session — but only behind an always-on confirmation that renders the note's real stored text (never the model's claim) plus a required reason; even --allow-shell doesn't auto-approve it, and headless denies it. Archiving is reversible/unarchive <id> restores a note to recall. The recovery and the irreversible delete — /unarchive and /forget — are user-only; the agent has no un-archive or delete tool and is told so.

The agent is also given an accurate self-state: its real tools, its live permission ceiling (built from the actual gate, not guessed — shell is described as un-confined, write_file as confirm-gated without --allow-mutating), and its memory boundaries (it may archive a recalled note, with confirmation; it cannot delete or un-archive). So it recalls a remembered fact instead of file-searching for it, cites the real note id, and never offers a permission it lacks or invents a way to delete. Recall stays an explicit, visible call — nothing is auto-injected. See Memory.

REPL commands

/clear (drop history) · /model <name> (label) · /max-tokens <N> · /think · /no-think · /quit (/q, /exit).

Memory (serve --memory): /project [key] · /recall <query> [--explain] [--frontier] [--type <T>] [--include-superseded] · /remember [--type <T>] <text> · /retype <id> <T> · /supersede <new> <old> · /unsupersede <new> <old> · /derive <A> from <B…> · /underive <A> from <B> · /why <id> · /contradict <id> <id> · /uncontradict <id> <id> · /archive <id> · /unarchive <id> · /forget <id>.

Status line & token meter (v0.9.2)

In an interactive terminal the REPL pins a status line to the bottom row (a raw ANSI scroll region — no TUI framework, no extra dependency beyond terminal-size detection). It shows:

tokens: 1234↑ 5678↓ (6912) · session 23.4k · generating…
  • Token meter prompt tokens, completion tokens, (total) for the last turn, plus a running session total. These are the server's real usage counts (chat, the agent tool-calling loop, and streaming all report usage), not a local estimate.
  • Action — what the client is doing right now: idle at the prompt, generating… while a chat reply streams, and in agent mode thinking… before each model call and running <tool>(…) while a tool runs.

It updates only at turn boundaries (no background repaint, so it never corrupts what you're typing), and it is a no-op when stdout isn't a TTY — so headless -p output stays byte-for-byte unchanged and fully scriptable. Ctrl+C / /quit restore the terminal cleanly.

Common flags (chat + agent)

Flag Default Purpose
-p, --prompt Ask one question, print, exit (headless)
--url http://localhost:8080 Server address
--model Qwen3-14B-Q4_K_M Label sent in the request only (the server decides which model answers)
--max-tokens 6144 Token budget; generous so thinking models have room
--no-think off Append /no_think → answer without the reasoning block
--no-stream off Full answer instead of streaming (chat headless)
--temperature 0.0 Sampling temperature

Limitations

  • shell is NOT confined. cwd is the workspace root, but a command can read anywhere (cat ~/.ssh/id_rsa ignores cwd). Its guard is the Exec tier--allow-shell is the deliberate, loudly-named opt-in (or an interactive y in the REPL). Use it consciously.
  • Chat history is session-only — project memory persists. The agent loop and chat history live only in the session, but the project memory (server --memory) survives restarts and model swaps, reached through the recall/remember tools and the REPL memory commands. See Memory.
  • search is substring-based (no regex); .git/target/node_modules/… are skipped; results capped at 100 hits / 64 KB. Like the other file tools it is workspace-confined — its recursive walk skips symlinks rather than following them out of the workspace (security fix in v0.9.1; earlier 0.9.0/0.2.0 builds could read through an escaping symlink — update recommended).
  • One model per server (--model is just a label); context ceiling 16384 tokens on RDNA4/gfx1201; gemma-QAT is VRAM-tight (~2.5k ctx) → Qwen3-14B-Q4 is the better coder.
  • gemma tool-calling is validated for simple arguments; code-carrying arguments follow.
  • Visible markers, not silent failures[truncated …] at the token limit, [empty answer …] for a think-only response; permission decisions are logged to stderr; stdout stays clean.
  • The REPL needs a real terminal (TTY); for scripting use headless -p.

See Usage for the server side, Supported-Models for the gemma KV-FP8 requirement, and Troubleshooting for empty/truncated answers and the context ceiling.

Clone this wiki locally