v0.9.2 — vf-clide token meter + clean server shutdown
Feature + bugfix release. vf-clide gains live token accounting and a pinned status line; the engine's vulkanforge serve now shuts down cleanly. Supported-config inference output is unchanged — no decode/prefill/behavior change.
vf-clide 0.3.0 — token meter + pinned status line (feature)
- Token accounting. The client surfaces real token usage on every path: the non-streaming response, the tool-calling loop, and the streaming path (via stream_options.include_usage, which the server emits as a final usage chunk). No local tokenizer, no estimation: the numbers are the server's own counts.
- Pinned status line. The REPL pins a bottom status line (raw ANSI scroll region, no TUI framework) with a token meter, ↑prompt ↓completion (total) · session …, and the current action (idle/generating…/thinking…/running <tool>(…)). It is a no-op when stdout isn't a TTY, so headless -p output stays byte-for-byte unchanged and fully scriptable.
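The two bullets above can be sketched roughly as follows. This is an illustrative Python model, not vf-clide's code: it assumes the stream arrives as OpenAI-style SSE lines (`data: {...}`, terminated by `data: [DONE]`) whose final chunk carries a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The names `usage_from_stream`, `TokenMeter`, and `pin_status` are hypothetical, and the session part of the meter is left out of `render` because the notes elide its format.

```python
import io
import json


def usage_from_stream(lines):
    """Return the usage dict from the final usage chunk, or None if absent."""
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # server-reported counts, never estimated
    return usage


class TokenMeter:
    """Accumulates server-reported usage across the turns of one session."""

    def __init__(self):
        self.session_total = 0
        self.last = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}

    def record(self, usage):
        if usage is None:
            return  # no usage reported: leave the meter alone rather than guess
        self.last = usage
        self.session_total += usage["total_tokens"]

    def render(self):
        u = self.last
        return f"↑{u['prompt_tokens']} ↓{u['completion_tokens']} ({u['total_tokens']})"


def pin_status(text, rows, out):
    """Draw `text` on the last terminal row; do nothing if `out` is not a TTY."""
    if not out.isatty():
        return  # piped or redirected output stays byte-for-byte unchanged
    out.write(
        "\x1b7"                  # save cursor (setting the region homes it)
        f"\x1b[1;{rows - 1}r"    # DECSTBM: scroll region = every row but the last
        f"\x1b[{rows};1H"        # move to the reserved bottom row
        "\x1b[2K"                # erase that row
        f"{text}"
        "\x1b8"                  # restore cursor into the scrolling area
    )
    out.flush()
```

In a real client, `rows` would come from something like `shutil.get_terminal_size().lines`, and the scroll region would need resetting (`ESC [ r`) on exit; both are omitted here to keep the sketch short.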
Engine 0.9.2 — clean serve shutdown (bugfix)
Ctrl+C / SIGTERM on vulkanforge serve previously left the GPU objects undestroyed (the validation layer reported hundreds of leaked objects) and then freed memory against an already-destroyed device → SIGSEGV. The shutdown path now:
- waits for the device to go idle (device_wait_idle),
- runs the explicit resource-teardown chain in order while the device is still alive, and
- drops the memory allocator before the device.
Result: 0 leaked objects, a clean exit, and no crash on either Ctrl+C or SIGTERM. Shutdown-path only; steady-state decode is untouched.
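The ordering described above can be modeled with a small, language-neutral sketch. It is written in Python purely for illustration and uses stand-in classes (`Device`, `Resource`, `Allocator`, all hypothetical) rather than real Vulkan handles; the engine's actual code is not shown in these notes. The stand-ins raise if anything is released after the device is gone, which is the failure the fix removes.

```python
import signal
import threading

stop = threading.Event()


def install_handlers():
    """Turn Ctrl+C (SIGINT) and SIGTERM into a flag the serve loop can poll."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda signum, frame: stop.set())


class Device:
    """Stand-in for the logical GPU device; records calls in a shared log."""

    def __init__(self, log):
        self.log = log
        self.alive = True

    def wait_idle(self):
        self.log.append("device.wait_idle")

    def destroy(self):
        self.alive = False
        self.log.append("device.destroy")


class Resource:
    """Stand-in for any GPU object that must be destroyed via its device."""

    def __init__(self, name, device):
        self.name = name
        self.device = device

    def destroy(self):
        if not self.device.alive:
            raise RuntimeError(f"{self.name} destroyed after its device")
        self.device.log.append(f"{self.name}.destroy")


class Allocator(Resource):
    """Stand-in for the memory allocator; it frees memory through the device."""

    def free_all(self):
        if not self.device.alive:
            raise RuntimeError("memory freed against a destroyed device")
        self.device.log.append("allocator.free_all")


def shutdown(device, teardown_chain, allocator):
    device.wait_idle()            # 1. no GPU work may still be in flight
    for res in teardown_chain:    # 2. explicit teardown while the device lives
        res.destroy()
    allocator.free_all()          # 3. allocator goes before the device
    device.destroy()              # 4. the device goes last
```

A serve loop would call `install_handlers()` once from the main thread, run until `stop.is_set()`, and then call `shutdown(...)`. Swapping steps 3 and 4 makes the model raise, mirroring the free-after-destroy crash described above.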
Versions: engine 0.9.0 → 0.9.2, vf-clide 0.2.1 → 0.3.0. (v0.9.1 was a vf-clide-only search-confinement security patch; the engine stayed at 0.9.0 through it.) Validated on AMD RX 9070 XT (RADV/gfx1201), Mesa 26.1.2.