vf clide

vf-clide — the CLI chat & agentic coding client

vf-clide is a lean command-line client for the VulkanForge server, shipped alongside the engine. It is its own crate (GPL-3.0) with no engine dependencies — it talks only to the OpenAI-compatible API over HTTP, so it builds and runs independently of the Vulkan stack.

As of v0.9.0 it is both a chat client (streaming/non-streaming, REPL + headless) and an agentic coding client: in --agent mode the model calls tools (read/write/search/shell) in a loop, gated by a tiered permission model and confined to a workspace. (Chat-only was v0.8.0.) Since v0.9.2 the REPL also shows a pinned status line with a live token meter and the current action (see below); headless -p output is unaffected.

Full reference — install, flags, limitations, troubleshooting: vf-clide/README.md.

Build

cargo build --release --manifest-path vf-clide/Cargo.toml   # → ./vf-clide/target/release/vf-clide

Two terminals: server, then client

Terminal 1 — start the server (it stays in the foreground; see Usage):

vulkanforge serve --model ~/models/Qwen_Qwen3-14B-Q4_K_M.gguf --port 8080

(gemma-4-26B models require VULKANFORGE_KV_FP8=1 — see Supported-Models.)

Terminal 2 — the client:

# interactive chat REPL (streams live)
vf-clide --url http://localhost:8080

# headless one-shot chat
vf-clide --url http://localhost:8080 -p "Capital of Japan? One word."

Agent mode (`--agent`)

In agent mode the model may request tools; vf-clide runs the roundtrip (model → permission gate → tool → result back → continue) for up to 8 iterations. Default coder = Qwen3-14B-Q4 (JSON tool arguments).

# headless, read-only tools auto-approved:
vf-clide --url http://localhost:8080 --agent --yes --workspace ~/code/myproj \
  -p "Search for 'TODO' and summarise the open items."

# REPL agent mode (prompts y/N per tool call):
vf-clide --url http://localhost:8080 --agent --workspace ~/code/myproj

The tools

The four file/shell tools are always present; with the server started --memory, three memory tools are added (recall, remember, archive).

Tool	Does	Risk tier	Confined?
`read_file`	read a file (256 KB cap)	ReadOnly	yes (workspace)
`search`	substring search, `file:line` hits (cap 100 / 64 KB)	ReadOnly	yes (workspace)
`write_file`	create/overwrite a file (parent dirs in-root)	Mutating	yes (workspace)
`shell`	run a command (cwd = workspace, output cap 256 KB, 30 s timeout)	Exec	no (see below)
`recall`	semantic recall from project memory (`serve --memory`)	— (own axis)	n/a (server memory)
`remember`	write a note to project memory (`serve --memory`)	— (own axis)	n/a (server memory)
`archive`	archive a note recalled this session — drops it from recall, reversible via `/unarchive` (v1.0.3)	— (own axis, confirm-always)	n/a (server memory)

Permission model — three tiers, cumulative

Each tool carries a risk tier. Auto-approval is opt-in and cumulative (a higher flag implies the lower tiers), and the resulting ceiling applies in both modes (REPL and headless):

Flag	auto-approves (ceiling)
(none)	nothing
`--yes`	ReadOnly (`read_file`, `search`)
`--allow-mutating`	ReadOnly + Mutating (`write_file`) — implies `--yes`
`--allow-shell`	ReadOnly + Mutating + Exec (`shell`) — implies `--allow-mutating`

--yes alone therefore never approves a write or a shell command. A call at or below the ceiling is auto-approved (and still printed, so you see every tool that ran); a call above it is handled per mode:

REPL — prompts y/N (a louder warning for mutating/exec tools). So --agent with no flags prompts for everything; --agent --yes auto-runs reads and only prompts for write/shell; and so on.
Headless (-p) — denied (not prompted), so a scripted run never blocks on input.

Changed in v0.9.4: earlier versions prompted for every call in the REPL and honored the flags headless-only. The ceiling now applies in the REPL too — consistent with headless, not laxer (workspace confinement still bounds the file tools independently).

Memory tools are a separate axis (v1.0.2–v1.0.3). With the server on --memory, the agent gets recall / remember / archive. These call the server's /memory/* over HTTP — they touch neither files nor the shell — so they are not bound by the file/shell ceiling: they run (and print a visible marker) whenever memory is enabled, regardless of --yes / --allow-mutating. archive (v1.0.3) sits on this axis too, but is confirm-always — even --allow-shell doesn't auto-approve it, headless denies it, and the prompt shows the note's real stored text (never the model's claim). The recovery/irreversible commands /unarchive and /forget are user-only REPL commands — the agent never restores or deletes.

Workspace & constitution

Flag	Default	Purpose
`--agent`	off	enable the agent loop (otherwise plain chat)
`--workspace <path>`	current dir	root for the file tools; canonicalized once
`--yes` / `--allow-mutating` / `--allow-shell`	off	auto-approval tiers (above)
`--system <file>`	—	replace the built-in system prompt (constitution) entirely
`--no-system`	off	send no system prompt

Without --system/--no-system the agent gets a concise built-in system prompt (role, tool use, permission respect). An AGENTS.md in the workspace root is appended (project-specific instructions) — read confined, so an AGENTS.md symlinked out of the workspace is ignored.

Memory (v1.0.2–v1.0.5)

With the server started --memory, the agent gains three tools — recall, remember, and archive — and the REPL gains the full memory surface: /project, /recall (with --explain / --type / --include-superseded / --frontier), /remember (with --type), the curation commands /archive <id>, /unarchive <id>, /forget <id>, /retype <id> <T>, and the connection layer /supersede / /unsupersede / /derive / /underive / /why / /contradict / /uncontradict. The three agent tools run on their own axis (direct /memory/* calls, not the file/shell gate), so they're available and visible whenever memory is on.

The agent tool set stayed three across v1.0.4–v1.0.5: the recall diagnostics (--explain), note typing, and the edges (SUPERSEDES / DERIVES_FROM + /why in v1.0.4; the symmetric CONTRADICTS and the opt-in --frontier retrieval in v1.0.5) are user curation — REPL-only, never agent tools. The agent contributes notes; the user draws the structure. Full reference: Memory.

Curation split (v1.0.3). The agent may archive a note it recalled this session — but only behind an always-on confirmation that renders the note's real stored text (never the model's claim) plus a required reason; even --allow-shell doesn't auto-approve it, and headless denies it. Archiving is reversible — /unarchive <id> restores a note to recall. The recovery and the irreversible delete — /unarchive and /forget — are user-only; the agent has no un-archive or delete tool and is told so.

The agent is also given an accurate self-state: its real tools, its live permission ceiling (built from the actual gate, not guessed — shell is described as un-confined, write_file as confirm-gated without --allow-mutating), and its memory boundaries (it may archive a recalled note, with confirmation; it cannot delete or un-archive). So it recalls a remembered fact instead of file-searching for it, cites the real note id, and never offers a permission it lacks or invents a way to delete. Recall stays an explicit, visible call — nothing is auto-injected. See Memory.

REPL commands

/clear (drop history) · /model <name> (label) · /max-tokens <N> · /think · /no-think · /quit (/q, /exit).

Memory (serve --memory): /project [key] · /recall <query> [--explain] [--frontier] [--type <T>] [--include-superseded] · /remember [--type <T>] <text> · /retype <id> <T> · /supersede <new> <old> · /unsupersede <new> <old> · /derive <A> from <B…> · /underive <A> from <B> · /why <id> · /contradict <id> <id> · /uncontradict <id> <id> · /archive <id> · /unarchive <id> · /forget <id>.

Status line & token meter (v0.9.2)

In an interactive terminal the REPL pins a status line to the bottom row (a raw ANSI scroll region — no TUI framework, no extra dependency beyond terminal-size detection). It shows:

tokens: 1234↑ 5678↓ (6912) · session 23.4k · generating…

Token meter — ↑ prompt tokens, ↓ completion tokens, (total) for the last turn, plus a running session total. These are the server's real usage counts (chat, the agent tool-calling loop, and streaming all report usage), not a local estimate.
Action — what the client is doing right now: idle at the prompt, generating… while a chat reply streams, and in agent mode thinking… before each model call and running <tool>(…) while a tool runs.

It updates only at turn boundaries (no background repaint, so it never corrupts what you're typing), and it is a no-op when stdout isn't a TTY — so headless -p output stays byte-for-byte unchanged and fully scriptable. Ctrl+C / /quit restore the terminal cleanly.

Common flags (chat + agent)

Flag	Default	Purpose
`-p`, `--prompt`	—	Ask one question, print, exit (headless)
`--url`	`http://localhost:8080`	Server address
`--model`	`Qwen3-14B-Q4_K_M`	Label sent in the request only (the server decides which model answers)
`--max-tokens`	`6144`	Token budget; generous so thinking models have room
`--no-think`	off	Append `/no_think` → answer without the reasoning block
`--no-stream`	off	Full answer instead of streaming (chat headless)
`--temperature`	`0.0`	Sampling temperature

Limitations

shell is NOT confined. cwd is the workspace root, but a command can read anywhere (cat ~/.ssh/id_rsa ignores cwd). Its guard is the Exec tier — --allow-shell is the deliberate, loudly-named opt-in (or an interactive y in the REPL). Use it consciously.
Chat history is session-only — project memory persists. The agent loop and chat history live only in the session, but the project memory (server --memory) survives restarts and model swaps, reached through the recall/remember tools and the REPL memory commands. See Memory.
search is substring-based (no regex); .git/target/node_modules/… are skipped; results capped at 100 hits / 64 KB. Like the other file tools it is workspace-confined — its recursive walk skips symlinks rather than following them out of the workspace (security fix in v0.9.1; earlier 0.9.0/0.2.0 builds could read through an escaping symlink — update recommended).
One model per server (--model is just a label); context ceiling 16384 tokens on RDNA4/gfx1201; gemma-QAT is VRAM-tight (~2.5k ctx) → Qwen3-14B-Q4 is the better coder.
gemma tool-calling is validated for simple arguments; code-carrying arguments follow.
Visible markers, not silent failures — [truncated …] at the token limit, [empty answer …] for a think-only response; permission decisions are logged to stderr; stdout stays clean.
The REPL needs a real terminal (TTY); for scripting use headless -p.

See Usage for the server side, Supported-Models for the gemma KV-FP8 requirement, and Troubleshooting for empty/truncated answers and the context ceiling.

VulkanForge v1.0.4 · single-user RDNA 4 / gfx1201 Vulkan inference · GPL-3.0 · Repository · Releases

VulkanForge Wiki

Get Started

Use VulkanForge

Reference

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vf clide

vf-clide — the CLI chat & agentic coding client

Build

Two terminals: server, then client

Agent mode (`--agent`)

The tools

Permission model — three tiers, cumulative

Workspace & constitution

Memory (v1.0.2–v1.0.5)

REPL commands

Status line & token meter (v0.9.2)

Common flags (chat + agent)

Limitations

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

VulkanForge Wiki

Clone this wiki locally

vf clide

vf-clide — the CLI chat & agentic coding client

Build

Two terminals: server, then client

Agent mode (--agent)

The tools

Permission model — three tiers, cumulative

Workspace & constitution

Memory (v1.0.2–v1.0.5)

REPL commands

Status line & token meter (v0.9.2)

Common flags (chat + agent)

Limitations

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

VulkanForge Wiki

Clone this wiki locally

Agent mode (`--agent`)