-
Notifications
You must be signed in to change notification settings - Fork 1
vf clide
vf-clide is a lean command-line client for the VulkanForge server, shipped alongside the engine.
It is its own crate (GPL-3.0) with no engine dependencies — it talks only to the
OpenAI-compatible API over HTTP, so it builds and runs independently of the Vulkan stack.
As of v0.9.0 it is both a chat client (streaming/non-streaming, REPL + headless) and an
agentic coding client: in --agent mode the model calls tools (read/write/search/shell) in a
loop, gated by a tiered permission model and confined to a workspace. (Chat-only was v0.8.0.)
Since v0.9.2 the REPL also shows a pinned status line with a live token meter and the
current action (see below); headless -p output is unaffected.
Full reference — install, flags, limitations, troubleshooting:
vf-clide/README.md.
cargo build --release --manifest-path vf-clide/Cargo.toml # → ./vf-clide/target/release/vf-clideTerminal 1 — start the server (it stays in the foreground; see Usage):
vulkanforge serve --model ~/models/Qwen_Qwen3-14B-Q4_K_M.gguf --port 8080(gemma-4-26B models require VULKANFORGE_KV_FP8=1 — see Supported-Models.)
Terminal 2 — the client:
# interactive chat REPL (streams live)
vf-clide --url http://localhost:8080
# headless one-shot chat
vf-clide --url http://localhost:8080 -p "Capital of Japan? One word."In agent mode the model may request tools; vf-clide runs the roundtrip (model → permission gate → tool → result back → continue) for up to 8 iterations. Default coder = Qwen3-14B-Q4 (JSON tool arguments).
# headless, read-only tools auto-approved:
vf-clide --url http://localhost:8080 --agent --yes --workspace ~/code/myproj \
-p "Search for 'TODO' and summarise the open items."
# REPL agent mode (prompts y/N per tool call):
vf-clide --url http://localhost:8080 --agent --workspace ~/code/myprojThe four file/shell tools are always present; with the server started --memory, three memory tools are added
(recall, remember, archive).
| Tool | Does | Risk tier | Confined? |
|---|---|---|---|
read_file |
read a file (256 KB cap) | ReadOnly | yes (workspace) |
search |
substring search, file:line hits (cap 100 / 64 KB) |
ReadOnly | yes (workspace) |
write_file |
create/overwrite a file (parent dirs in-root) | Mutating | yes (workspace) |
shell |
run a command (cwd = workspace, output cap 256 KB, 30 s timeout) | Exec | no (see below) |
recall |
semantic recall from project memory (serve --memory) |
— (own axis) | n/a (server memory) |
remember |
write a note to project memory (serve --memory) |
— (own axis) | n/a (server memory) |
archive |
archive a note recalled this session — drops it from recall, reversible via /unarchive (v1.0.3) |
— (own axis, confirm-always) | n/a (server memory) |
Each tool carries a risk tier. Auto-approval is opt-in and cumulative (a higher flag implies the lower tiers), and the resulting ceiling applies in both modes (REPL and headless):
| Flag | auto-approves (ceiling) |
|---|---|
| (none) | nothing |
--yes |
ReadOnly (read_file, search) |
--allow-mutating |
ReadOnly + Mutating (write_file) — implies --yes
|
--allow-shell |
ReadOnly + Mutating + Exec (shell) — implies --allow-mutating
|
--yes alone therefore never approves a write or a shell command. A call at or below the
ceiling is auto-approved (and still printed, so you see every tool that ran); a call above it is
handled per mode:
-
REPL — prompts
y/N(a louder warning for mutating/exec tools). So--agentwith no flags prompts for everything;--agent --yesauto-runs reads and only prompts for write/shell; and so on. -
Headless (
-p) — denied (not prompted), so a scripted run never blocks on input.
Changed in v0.9.4: earlier versions prompted for every call in the REPL and honored the flags headless-only. The ceiling now applies in the REPL too — consistent with headless, not laxer (workspace confinement still bounds the file tools independently).
Memory tools are a separate axis (v1.0.2–v1.0.3). With the server on
--memory, the agent getsrecall/remember/archive. These call the server's/memory/*over HTTP — they touch neither files nor the shell — so they are not bound by the file/shell ceiling: they run (and print a visible marker) whenever memory is enabled, regardless of--yes/--allow-mutating.archive(v1.0.3) sits on this axis too, but is confirm-always — even--allow-shelldoesn't auto-approve it, headless denies it, and the prompt shows the note's real stored text (never the model's claim). The recovery/irreversible commands/unarchiveand/forgetare user-only REPL commands — the agent never restores or deletes.
| Flag | Default | Purpose |
|---|---|---|
--agent |
off | enable the agent loop (otherwise plain chat) |
--workspace <path> |
current dir | root for the file tools; canonicalized once |
--yes / --allow-mutating / --allow-shell
|
off | auto-approval tiers (above) |
--system <file> |
— | replace the built-in system prompt (constitution) entirely |
--no-system |
off | send no system prompt |
Without --system/--no-system the agent gets a concise built-in system prompt (role, tool use,
permission respect). An AGENTS.md in the workspace root is appended (project-specific
instructions) — read confined, so an AGENTS.md symlinked out of the workspace is ignored.
With the server started --memory, the agent gains three tools — recall, remember, and archive — and the REPL
gains the full memory surface: /project, /recall (with --explain / --type / --include-superseded /
--frontier), /remember (with --type), the curation commands /archive <id>, /unarchive <id>, /forget <id>,
/retype <id> <T>, and the connection layer /supersede / /unsupersede / /derive / /underive / /why /
/contradict / /uncontradict. The three agent tools run on their own axis (direct /memory/* calls, not the
file/shell gate), so they're available and visible whenever memory is on.
The agent tool set stayed three across v1.0.4–v1.0.5: the recall diagnostics (--explain), note typing, and the
edges (SUPERSEDES / DERIVES_FROM + /why in v1.0.4; the symmetric CONTRADICTS and the opt-in --frontier
retrieval in v1.0.5) are user curation — REPL-only, never agent tools. The agent contributes notes; the user
draws the structure. Full reference: Memory.
Curation split (v1.0.3). The agent may archive a note it recalled this session — but only behind an
always-on confirmation that renders the note's real stored text (never the model's claim) plus a required
reason; even --allow-shell doesn't auto-approve it, and headless denies it. Archiving is reversible —
/unarchive <id> restores a note to recall. The recovery and the irreversible delete — /unarchive and /forget —
are user-only; the agent has no un-archive or delete tool and is told so.
The agent is also given an accurate self-state: its real tools, its live permission ceiling (built from the
actual gate, not guessed — shell is described as un-confined, write_file as confirm-gated without
--allow-mutating), and its memory boundaries (it may archive a recalled note, with confirmation; it cannot delete
or un-archive). So it recalls a remembered fact instead of file-searching for it, cites the real note id, and
never offers a permission it lacks or invents a way to delete. Recall stays an explicit, visible call — nothing is
auto-injected. See Memory.
/clear (drop history) · /model <name> (label) · /max-tokens <N> · /think · /no-think ·
/quit (/q, /exit).
Memory (serve --memory):
/project [key] · /recall <query> [--explain] [--frontier] [--type <T>] [--include-superseded] ·
/remember [--type <T>] <text> · /retype <id> <T> · /supersede <new> <old> · /unsupersede <new> <old> ·
/derive <A> from <B…> · /underive <A> from <B> · /why <id> · /contradict <id> <id> ·
/uncontradict <id> <id> · /archive <id> · /unarchive <id> · /forget <id>.
In an interactive terminal the REPL pins a status line to the bottom row (a raw ANSI scroll region — no TUI framework, no extra dependency beyond terminal-size detection). It shows:
tokens: 1234↑ 5678↓ (6912) · session 23.4k · generating…
-
Token meter —
↑prompt tokens,↓completion tokens,(total)for the last turn, plus a running session total. These are the server's real usage counts (chat, the agent tool-calling loop, and streaming all reportusage), not a local estimate. -
Action — what the client is doing right now:
idleat the prompt,generating…while a chat reply streams, and in agent modethinking…before each model call andrunning <tool>(…)while a tool runs.
It updates only at turn boundaries (no background repaint, so it never corrupts what you're typing),
and it is a no-op when stdout isn't a TTY — so headless -p output stays byte-for-byte
unchanged and fully scriptable. Ctrl+C / /quit restore the terminal cleanly.
| Flag | Default | Purpose |
|---|---|---|
-p, --prompt
|
— | Ask one question, print, exit (headless) |
--url |
http://localhost:8080 |
Server address |
--model |
Qwen3-14B-Q4_K_M |
Label sent in the request only (the server decides which model answers) |
--max-tokens |
6144 |
Token budget; generous so thinking models have room |
--no-think |
off | Append /no_think → answer without the reasoning block |
--no-stream |
off | Full answer instead of streaming (chat headless) |
--temperature |
0.0 |
Sampling temperature |
-
shellis NOT confined. cwd is the workspace root, but a command can read anywhere (cat ~/.ssh/id_rsaignores cwd). Its guard is the Exec tier —--allow-shellis the deliberate, loudly-named opt-in (or an interactiveyin the REPL). Use it consciously. -
Chat history is session-only — project memory persists. The agent loop and chat history live only in the
session, but the project memory (server
--memory) survives restarts and model swaps, reached through therecall/remembertools and the REPL memory commands. See Memory. -
searchis substring-based (no regex);.git/target/node_modules/… are skipped; results capped at 100 hits / 64 KB. Like the other file tools it is workspace-confined — its recursive walk skips symlinks rather than following them out of the workspace (security fix in v0.9.1; earlier 0.9.0/0.2.0 builds could read through an escaping symlink — update recommended). -
One model per server (
--modelis just a label); context ceiling 16384 tokens on RDNA4/gfx1201; gemma-QAT is VRAM-tight (~2.5k ctx) → Qwen3-14B-Q4 is the better coder. - gemma tool-calling is validated for simple arguments; code-carrying arguments follow.
-
Visible markers, not silent failures —
[truncated …]at the token limit,[empty answer …]for a think-only response; permission decisions are logged to stderr; stdout stays clean. - The REPL needs a real terminal (TTY); for scripting use headless
-p.
See Usage for the server side, Supported-Models for the gemma KV-FP8 requirement, and Troubleshooting for empty/truncated answers and the context ceiling.
VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 ·
Repository · Releases