Releases: PythonLuvr/openwar
v0.13.0
Use OpenWar from any tool that speaks the OpenAI API.
v0.13.0 ships openwar serve --openai-compat, an OpenAI Chat Completions HTTP server in front of OpenWar's runtime. Any tool that speaks OpenAI's API (Aider, Continue, Cline, the OpenAI SDKs, homegrown wrappers) points at the proxy with one env-var change and runs through OpenWar's phase-gated, traced, detector-enforced execution. The tool does not know OpenWar exists.
This is the MVP cut. Plain-text streaming and non-streaming chat completions work end-to-end against any upstream adapter. Tool round-trip and PermissionBridge negotiation land in v0.13.1.
What ships
openwar serve --openai-compat
A new CLI subcommand. Hand-rolled node:http server with a hand-rolled SSE encoder. Zero new dependencies.
Endpoints:
- POST /v1/chat/completions: streaming SSE and non-streaming JSON.
- GET /v1/models: returns the configured upstream as a single model entry.
- GET /healthz: liveness probe.
- All other paths return an OpenAI-shaped 404.
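A minimal sketch of what a hand-rolled SSE encoder for streamed chat completions can look like. The function names (encodeSseChunk, encodeSseDone) and the trimmed chunk shape are assumptions for illustration, not OpenWar's internals; the data-line framing and the closing [DONE] sentinel follow the OpenAI streaming convention.

```typescript
// Hypothetical, trimmed chunk shape; real OpenAI chunks carry more fields.
interface ChatChunk {
  id: string;
  object: "chat.completion.chunk";
  choices: {
    index: number;
    delta: { content?: string };
    finish_reason: string | null;
  }[];
}

// One SSE event: a single "data:" line terminated by a blank line.
// JSON.stringify never emits raw newlines, so one data line is enough.
function encodeSseChunk(chunk: ChatChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Sentinel that OpenAI-style clients treat as end of stream.
function encodeSseDone(): string {
  return "data: [DONE]\n\n";
}

const frame = encodeSseChunk({
  id: "chatcmpl-example",
  object: "chat.completion.chunk",
  choices: [{ index: 0, delta: { content: "hi" }, finish_reason: null }],
});
```

With node:http, each frame would be passed to res.write() after sending a Content-Type: text/event-stream header.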
Security defaults
- Localhost-default bind (127.0.0.1). Binding to 0.0.0.0 requires explicit intent and warns at startup.
- Bearer-token auth with constant-time comparison. OpenAI-shaped 401 on failure. --no-auth exists for local dev and warns every startup.
- Conservative authorized_costs default (filesystem_readonly). The startup banner explains the expansion pattern for agentic clients.
- --max-concurrent gate (default 4). Excess requests get an OpenAI-shaped rate_limit_error 429.
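The constant-time bearer check can be sketched with Node's own crypto module. This is an illustration under assumptions, not the shipped code: isAuthorized and sha256 are hypothetical names, and hashing both sides first is one common way to satisfy timingSafeEqual's equal-length requirement.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash both sides so the buffers always have equal length;
// timingSafeEqual throws when the lengths differ.
function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Returns true only when the Authorization header carries the expected token.
// (Hypothetical helper; the real proxy's function names are not documented here.)
function isAuthorized(header: string | undefined, expectedToken: string): boolean {
  if (header === undefined || !header.startsWith("Bearer ")) return false;
  const presented = header.slice("Bearer ".length);
  return timingSafeEqual(sha256(presented), sha256(expectedToken));
}
```

A plain === comparison can return early at the first differing character, which leaks timing information; comparing fixed-length digests avoids that.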
Observability
- Every proxied request writes a trace at ~/.openwar/sessions/proxy-<uuid>.trace.ndjson.
- Every response carries an X-OpenWar-Trace-Id header for correlation. Standard OpenAI clients ignore it; OpenWar-aware tooling can read it and run openwar inspect.
- Two new trace event types: proxy_request and proxy_response. TRACE_SCHEMA_VERSION bumps from 4 to 5. Additive.
Phase machine in proxy mode
Each request synthesizes an in-memory brief (mode: auto, scope_locked: true). Phase 0 and Phase 4 run without operator prompts (the proxy cannot block on stdin). Phase 3 still fires; v0.13.0 ships the denial path (a refusal-text completion with finish_reason: content_filter when a destructive action is blocked). Full PermissionBridge negotiation lands in v0.13.1.
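From the client's side, the denial path looks like an ordinary completion whose finish_reason is content_filter. A rough sketch of that shape follows; the interface is trimmed, and buildDenialCompletion plus the id, model, and refusal text are hypothetical values chosen for illustration.

```typescript
// Trimmed, assumed shape of a non-streaming chat completion response.
interface ChatCompletion {
  id: string;
  object: "chat.completion";
  model: string;
  choices: {
    index: number;
    message: { role: "assistant"; content: string };
    finish_reason: "stop" | "length" | "content_filter";
  }[];
}

// Build the refusal a client would see when Phase 3 blocks a destructive action.
function buildDenialCompletion(id: string, model: string, refusal: string): ChatCompletion {
  return {
    id,
    object: "chat.completion",
    model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: refusal },
        finish_reason: "content_filter",
      },
    ],
  };
}

const denial = buildDenialCompletion(
  "chatcmpl-example",
  "placeholder-model",
  "Blocked: this action is destructive and was not authorized.",
);
```

Clients that already handle content_filter need no OpenWar-specific code to surface the refusal.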
Upstream composition
The proxy routes the actual completion to any configured upstream adapter: Anthropic, OpenAI, Gemini, Grok, openai-compat, or cli-bridge. When the upstream is cli-bridge, the proxy warns about per-request CLI cold-start cost and recommends --max-concurrent 1.
Stats
- 909 tests (up from 849). 60 new across tests/serve/.
- Coverage gates green.
- Zero new runtime dependencies. node:http, hand-rolled SSE, node:crypto.randomUUID.
Deferred to v0.13.1
- Tool-call round-trip (request tools to response tool_calls). v0.13.0 records the tool count in the trace but does not dispatch tools.
- PermissionBridge negotiation via openwar:request_permission tool calls.
- Comprehensive Continue / Cline / Cursor integration examples.
Upgrade notes
Drop-in. npm update @pythonluvr/openwar from v0.12.1 picks this up automatically. No existing CLI command, library export, or trace event shape changes. The openwar serve subcommand and the two new trace events are additive.
To use the proxy: run openwar serve --openai-compat --auth-token <token>, then point your OpenAI-API tool at http://127.0.0.1:1234/v1 with the token as its API key. Full setup, the security model, and the supported request surface are documented in docs/openai-proxy.md.
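For a homegrown wrapper, pointing at the proxy amounts to building a standard Chat Completions request against the local base URL. A sketch, assuming the base URL from the note above; buildChatRequest and the model name are placeholders, and the real model id is whatever GET /v1/models reports.

```typescript
interface ProxyRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Assemble a fetch-compatible request for the proxy's chat completions endpoint.
function buildChatRequest(baseUrl: string, token: string, prompt: string): ProxyRequest {
  return {
    // Strip trailing slashes so ".../v1/" and ".../v1" behave the same.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({
        model: "placeholder-model", // assumption: substitute the id from /v1/models
        stream: false,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const req = buildChatRequest("http://127.0.0.1:1234/v1", "my-token", "Say hello");
// To send it: const res = await fetch(req.url, req.init);
// res.headers.get("X-OpenWar-Trace-Id") then identifies the trace for this call.
```

Tools built on the OpenAI SDKs get the same effect by overriding the base URL and API key in their configuration.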
v0.12.1
The runtime now sees what the bridged CLI sees.
v0.12.1 wires up Squire 1.1.0's structured event surface. The runtime captures every tool the bridged Claude Code or Gemini CLI invokes, the arguments, the result, the thinking-mode tokens, and the usage report. What used to disappear into a stdout text blob now flows through OpenWar's trace, inspect, and cost ledger.
What ships
Four new StreamEvent variants (additive, backwards compatible)
- bridged_tool_call: a tool fired inside the bridged CLI's own run. Carries tool_name, arguments, call_id, and the bridged binary.
- bridged_tool_result: the result of a bridged tool call. Carries call_id, result, is_error, and the bridged binary.
- bridged_thinking_delta: thinking-mode tokens emitted by the bridged CLI. Kept distinct from text_delta so observers can filter them.
- bridged_usage: token counts from the bridged CLI. Input, output, cache read, cache write fields.
The bridged_ prefix is load-bearing. It distinguishes events that came from inside a bridged CLI's run from events OpenWar's own runtime fired.
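A sketch of how a consumer might model the four variants and use the prefix to separate sources. Field names follow the notes where they are given; binary, text, and the usage field names are assumptions, and NativeEvent is a stand-in for the pre-existing StreamEvent variants.

```typescript
// Assumed field names are marked; the published types may differ.
type BridgedEvent =
  | { type: "bridged_tool_call"; tool_name: string; arguments: unknown; call_id: string; binary: string }
  | { type: "bridged_tool_result"; call_id: string; result: unknown; is_error: boolean; binary: string }
  | { type: "bridged_thinking_delta"; text: string } // "text" is an assumption
  | { type: "bridged_usage"; input: number; output: number; cache_read: number; cache_write: number };

// Stand-in for OpenWar's existing native variants.
type NativeEvent = { type: "text_delta"; text: string };

type StreamEvent = BridgedEvent | NativeEvent;

// The prefix alone tells the two sources apart.
function isBridged(event: StreamEvent): event is BridgedEvent {
  return event.type.startsWith("bridged_");
}

const sampleEvents: StreamEvent[] = [
  { type: "text_delta", text: "Working on it." },
  { type: "bridged_tool_call", tool_name: "Read", arguments: { path: "a.ts" }, call_id: "c1", binary: "claude" },
  { type: "bridged_tool_result", call_id: "c1", result: "ok", is_error: false, binary: "claude" },
];

const bridgedTypes = sampleEvents.filter(isBridged).map((e) => e.type);
```

Existing consumers that switch on type and ignore unknown variants keep working, which is what makes the change additive.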
Four new trace event types
Same four shapes flow into trace.ndjson. TRACE_SCHEMA_VERSION bumps from 3 to 4. Additive; old readers ignore unknown event types.
Shared bridged-event router
New convergence point at src/runtime/bridged-events.ts. Single-agent (execute.ts) and multi-agent (driver.ts) dispatch paths route bridged events identically: trace event always, cost-ledger feed when a ledger exists. Closes the single-agent observability gap (sessions without a coordinator still get the trace).
Cost tracker integration
addBridgedUsage(usage, u) helper feeds bridged-CLI tokens into the cost ledger. Input + output count toward tokens_used for budget arithmetic. Cache reads and writes are recorded on dedicated bridged_tokens_cache_read / bridged_tokens_cache_write counters but excluded from tokens_used. Cache reads bill at a fraction of normal input rate; inflating the budget total with them would trip --max-tokens gates prematurely.
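The budget arithmetic can be sketched as follows. Only the helper name and the three counter names come from the notes; the parameter types, the field names on the bridged usage report, and the choice to return a new object rather than mutate are assumptions.

```typescript
// Assumed shape of the ledger's running totals.
interface UsageTotals {
  tokens_used: number;
  bridged_tokens_cache_read: number;
  bridged_tokens_cache_write: number;
}

// Assumed field names for a bridged_usage report.
interface BridgedUsage {
  input: number;
  output: number;
  cache_read: number;
  cache_write: number;
}

// Input and output count toward the budget; cache traffic is tracked separately.
function addBridgedUsage(usage: UsageTotals, u: BridgedUsage): UsageTotals {
  return {
    tokens_used: usage.tokens_used + u.input + u.output,
    bridged_tokens_cache_read: usage.bridged_tokens_cache_read + u.cache_read,
    bridged_tokens_cache_write: usage.bridged_tokens_cache_write + u.cache_write,
  };
}

const before: UsageTotals = { tokens_used: 1000, bridged_tokens_cache_read: 0, bridged_tokens_cache_write: 0 };
const after = addBridgedUsage(before, { input: 200, output: 50, cache_read: 5000, cache_write: 300 });
// after.tokens_used is 1250: the 5000 cached tokens do not count against --max-tokens.
```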
openwar inspect --tools grouping
Output now groups by source:
Native tool calls (OpenWar runtime)
...
Bridged CLI tool calls (from inside the bridged CLI's run)
...
Native first when both are present. Chronological within each section.
Test fixtures
Snapshot copy of Squire 1.1.0's real Claude Code and Gemini CLI fixtures at tests/fixtures/squire-snapshot/. README documents Squire 1.1.0 as the version snapshot point; future Squire releases that touch structured-event shapes will require a re-sync.
Stats
- 849 tests (up from 819). 30 new across tests/adapters/, tests/state/, tests/coordinator/, tests/cli/.
- Coverage gates: every tracked dir above 85% line coverage. src/state/trace.ts at 100%.
- Zero new runtime dependencies. Squire stays pinned at ^1.1.0.
Upgrade notes
Drop-in. npm update @pythonluvr/openwar from v0.12.0 picks this up automatically. Existing StreamEvent consumers continue to work unchanged; the four new variants are additive. Existing cost-tracker behavior is unchanged for native-adapter runs. The discipline layer does not change. The observability layer does.
Operators running bridged Claude Code or Gemini CLI sessions before this release saw stdout text in their traces; after upgrading, they see structured tool calls. No code change required on the operator side.
v0.12.0
PermissionBridge turns Phase 3 into a conversation.
v0.12 makes the destructive-action gate articulate. Bridged CLIs (and any tool-calling agent) can now call request_permission before a destructive action with a structured payload (action, scope, reasoning, fallback). The operator answers at one of three scopes: this_call covers one upcoming destructive tool call, this_session lasts until the run ends, persistent survives across sessions via a per-project JSONL store. Phase 3 honors matching grants on the subsequent destructive call without re-prompting, emits permission_grant_consumed for the audit trail, and still fires when no grant matches.
PermissionBridge does not relax the gate. It makes the gate articulate.
What ships
New native tool: request_permission
Exposed via MCP-server-mode as openwar:request_permission. Default-allowed (requesting permission is itself never destructive). Input takes action, scope (this_call / this_session / persistent), reasoning, and optional fallback and category. Output returns granted, scope_granted, operator_note, and grant_id.
Grant ledger
Per-session GrantLedger lives on the active Session. Persistent grants serialize to ~/.openwar/projects/<slug>/permission_grants.jsonl (append-only). Persistent grants are seeded into the in-memory ledger at session start for the active project_slug.
Phase 3 integration
Phase 3's destructive-detector decision path checks the grant ledger before firing the operator prompt. Matching grants emit permission_grant_consumed and a synthesized auth-allow event, skipping the prompt. Non-matching destructive calls fire the existing Phase 3 halt path. Category-only matching; the most recent unconsumed this_call grant wins over this_session / persistent.
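The matching rule can be sketched as a small lookup over the ledger. The Grant shape (granted_at, consumed, revoked) and the consumeMatchingGrant name are assumptions; only category-only matching and the precedence of the most recent unconsumed this_call grant come from the notes.

```typescript
type GrantScope = "this_call" | "this_session" | "persistent";

// Assumed grant record; the real ledger entry may carry more fields.
interface Grant {
  grant_id: string;
  category: string;
  scope: GrantScope;
  granted_at: number; // monotonically increasing timestamp
  consumed: boolean;  // only meaningful for this_call grants
  revoked: boolean;
}

// Returns the grant that covers a destructive call, or undefined when
// Phase 3 should fall through to the operator prompt.
function consumeMatchingGrant(ledger: Grant[], category: string): Grant | undefined {
  const live = ledger.filter((g) => g.category === category && !g.revoked);

  // Most recent unconsumed this_call grant wins and is used up.
  const oneShot = live
    .filter((g) => g.scope === "this_call" && !g.consumed)
    .sort((a, b) => b.granted_at - a.granted_at)[0];
  if (oneShot !== undefined) {
    oneShot.consumed = true;
    return oneShot;
  }

  // Otherwise fall back to a session-wide or persistent grant, which is reusable.
  return live.find((g) => g.scope !== "this_call");
}

const grantLedger: Grant[] = [
  { grant_id: "g1", category: "filesystem_write", scope: "this_session", granted_at: 1, consumed: false, revoked: false },
  { grant_id: "g2", category: "filesystem_write", scope: "this_call", granted_at: 2, consumed: false, revoked: false },
  { grant_id: "g3", category: "filesystem_write", scope: "this_call", granted_at: 3, consumed: false, revoked: false },
];
```

Consuming one-shot grants first keeps the broader grants untouched for later calls and makes each permission_grant_consumed event point at the narrowest grant that applied.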
Chat REPL prompt
Permission request from agent:
ACTION Delete the file src/legacy.ts
REASON File is unreferenced; cleaning up before the refactor.
FALLBACK Skip the cleanup; refactor proceeds with the file present.
CATEGORY filesystem_write
REQUESTED SCOPE this_call
Approve at what scope?
y grant at requested scope (this_call)
s grant for the rest of this session
p grant persistently (saved to project memory)
n deny
n: <msg> deny with a note for the agent
>
Slash commands: /grants lists active grants, /revoke <grant_id> invalidates a grant mid-session.
Public API additions (all additive, backwards compatible)
- Session.listActiveGrants(): readonly Grant[]
- Session.revokeGrant(grant_id: string): boolean
- SandboxContext.io?, SandboxContext.grantLedger?, SandboxContext.tracer? (optional fields; existing tools ignore)
- Five new trace event variants: permission_requested, permission_granted, permission_denied, permission_grant_consumed, permission_revoked
- TRACE_SCHEMA_VERSION bumped 2 → 3 (additive; old readers ignore unknown event types)
- openwar inspect <brief_id> --permissions shows a per-grant audit row
- New library export: formatPermissions
Stats
- 819 tests (up from 769). 50 new tests across tests/tools/request-permission.test.ts, tests/runner/grants.test.ts, tests/runner/permission-grant-consumption.test.ts, tests/cli/chat-permission-prompt.test.ts, tests/state/permission-trace-events.test.ts, tests/cli/inspect-permissions.test.ts.
- Coverage gates: every tracked dir above 85% line coverage. src/state/trace.ts and src/state/heuristics.ts at 100%.
- Zero new runtime dependencies. AbortController, node:crypto.randomUUID, and the existing MCP server infrastructure cover everything.
Upgrade notes
Drop-in. npm update @pythonluvr/openwar from v0.11.2 picks this up automatically. Existing programs see no behavior change; the Session interface gains additive methods, SandboxContext gains optional fields, and the new native tool is opt-in. Phase 3 with no active grants behaves exactly as in v0.11.2. Existing trace consumers continue to work; new event types are additive.
Persistent grants live until explicitly revoked. No TTL, no expiration. Operators auditing past behavior can run openwar inspect <brief_id> --permissions for a full grant history per session.
v0.11.2
Patch release. Restores TypeScript build compatibility with the just-shipped Squire v1.1.0.
What broke
Squire v1.1.0 added four new SquireEvent variants (tool_call, tool_result, thinking_delta, usage) for its new vendor-aware adapters. OpenWar's cli-bridge.ts does exhaustive narrowing on SquireEvent.type via a never assignment. The new variants caused a compile-time error for anyone building OpenWar from source once npm install resolved ^1.0.0 to 1.1.0. End users running the precompiled dist/ were unaffected; contributors, forks, and fresh CI clones were not.
What v0.11.2 does
Explicit no-op case arms for the four new Squire variants in the cli-bridge.ts event translator. The exhaustive never check is preserved so any future additive Squire release surfaces the same way (compile-time call to add an arm). The Squire dep range moves from ^1.0.0 to ^1.1.0 so the lockfile pins against the version that introduced the variants.
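The pattern looks roughly like this. The SquireEvent union below is a trimmed stand-in written for this example (the published type has more variants and fields), and translate is a hypothetical name for the event translator.

```typescript
// Trimmed stand-in for Squire's event union, for illustration only.
type SquireEvent =
  | { type: "text_delta"; text: string }
  | { type: "error"; message: string }
  | { type: "tool_call" }
  | { type: "tool_result" }
  | { type: "thinking_delta" }
  | { type: "usage" };

function translate(event: SquireEvent): string | null {
  switch (event.type) {
    case "text_delta":
      return event.text;
    case "error":
      return `error: ${event.message}`;
    // Explicit no-op arms: acknowledged, not yet translated.
    case "tool_call":
    case "tool_result":
    case "thinking_delta":
    case "usage":
      return null;
    default: {
      // If a future release adds a variant, `event` is no longer `never`
      // here and this assignment fails to compile.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

Removing any of the four no-op arms reproduces the class of compile error described above, which is the point of keeping the never check.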
OpenWar does not yet translate the new structured events (tool_call, tool_result) into its own StreamEvent surface. Adoption of the richer tool-call surface is a separate decision tracked for v0.12 or later.
Stats
- 769 tests green against @pythonluvr/squire@1.1.0.
- Cross-platform CI matrix (Ubuntu / macOS / Windows × Node 20 / 22): green.
- Sanity-regex gate, em-dash gate, coverage gates: green.
Upgrade notes
Drop-in. npm update @pythonluvr/openwar from v0.11.1 picks this up automatically and resolves Squire to 1.1.0. No code changes required on the consumer side.
v0.11.1
Talk to your agent. The runtime keeps the phases honest, the destructives gated, and the trace intact.
Two honest gaps closed without expanding scope, plus the published-Squire adoption that unblocks Squire's first real-world consumer claim on npm.
Mid-tool-call cancellation
v0.11.1 makes the runtime survive a slow tool call. Ctrl-C in the chat REPL aborts the in-flight tool, surfaces a structured cancelled tool-result to the model, and writes a tool_cancelled event to the trace. A second Ctrl-C within 2 seconds exits cleanly with the chat log saved. shell_exec gets a 3-second SIGTERM grace before SIGKILL. apply_patch rolls back already-written files. MCP servers that ignore the abort get 5 seconds before the runtime synthesizes the cancellation locally. The phase machine continues; the agent decides whether to retry or switch approach. Zero new runtime deps; everything is AbortController / AbortSignal from Node stdlib.
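The two-stage Ctrl-C behavior can be sketched as a small state machine. InterruptHandler, its method names, and the inclusive 2000 ms boundary are assumptions made for this example; the AbortController wiring mirrors the stdlib-only approach the notes describe.

```typescript
const EXIT_WINDOW_MS = 2000;

type InterruptAction = "cancel_tool" | "exit";

// Hypothetical helper: the first Ctrl-C cancels the in-flight tool, and a
// second one inside the window asks the REPL to exit.
class InterruptHandler {
  private lastInterruptAt: number | null = null;
  private readonly cancelCurrentTool: () => void;

  constructor(cancelCurrentTool: () => void) {
    this.cancelCurrentTool = cancelCurrentTool;
  }

  onInterrupt(nowMs: number): InterruptAction {
    const isSecondPress =
      this.lastInterruptAt !== null && nowMs - this.lastInterruptAt <= EXIT_WINDOW_MS;
    this.lastInterruptAt = nowMs;
    if (isSecondPress) return "exit";
    this.cancelCurrentTool();
    return "cancel_tool";
  }
}

// Wiring: the tool call receives controller.signal and stops when it aborts.
const toolController = new AbortController();
const interrupts = new InterruptHandler(() => toolController.abort());
```

In a REPL this would hang off process.on("SIGINT", () => interrupts.onInterrupt(Date.now())), with a fresh AbortController created for each tool call.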
README hero rewrite
The README now leads with openwar chat and the conversational front door, ordered before the discipline-not-intelligence framing. The "Just talk to it" section is renamed "Start with a conversation" and shows a one-input / one-output sample turn. WarBit chaos imagery moves out of the hero into the contrast section below.
Published Squire adoption
@pythonluvr/squire@1.0.0 is now consumed from the npm registry instead of a local workspace link. No source change required (the import path was already correct); the lockfile resolves from the registry. OpenWar is officially Squire's first real-world consumer on npm.
Public API additions (all additive, backwards compatible)
- RunOptions.signal?: AbortSignal
- RunOptions.onSession?: (session: Session) => void
- Session.cancelCurrentToolCall(): Promise<boolean>
- SandboxContext.signal?: AbortSignal for custom-tool authors
- tool_cancelled trace event variant
- TRACE_SCHEMA_VERSION bumped from 1 to 2 (old readers ignore unknown event types)
Stats
- 32 new tests across tests/tools/cancellation.test.ts, tests/runner/cancel.test.ts, tests/cli/chat-cancel.test.ts, tests/state/tool-cancelled-event.test.ts. Total: 769 (up from 737).
- Coverage gates: every tracked dir above 85% line coverage. src/state/trace.ts and src/state/heuristics.ts at 100%.
- Sanity-regex gate, em-dash gate, cross-platform CI matrix: green.
Upgrade notes
Drop-in. npm update @pythonluvr/openwar picks this up from v0.11.0 automatically. No code changes required to opt out of cancellation; the new RunOptions fields are optional. The trace schema bump is additive: existing trace consumers continue to work; new ones can read the tool_cancelled event directly.
v0.11.0 — cli-bridge powered by @pythonluvr/squire
cli-bridge becomes a thin wrapper over @pythonluvr/squire, a new standalone npm package extracted from this codebase. Public behavior is unchanged: openwar run brief.md produces identical traces, identical phase events, identical tool-call shapes before and after the split. The architectural change is for discoverability. Squire ships its own README, its own front door, and lets developers searching "run Claude Code from Node.js" or "orchestrate multiple CLI agents" land on a focused tool with a clean API instead of a buried module path inside OpenWar.
Originally scoped as one ship covering structured event streaming for Claude Code / Codex / Gemini CLI. Phase 0 review caught that the brief's claimed SquireEvent surface (tool_call, tool_result, message_start/stop) does not match the code that exists today (text-stream only, no per-CLI parsers). Shipping the full surface would have required building three new vendor JSON-stream parsers from scratch and snapshot fixtures from real CLI runs. Split into v0.11.0 (foundations + honest event union) and v0.11.1 (per-CLI parsers and the richer event union) on the same pattern as v0.7 / v0.9 / v0.10 splits. Four for four.
Added (Squire side, separate package)
- @pythonluvr/squire v1.0.0. General-purpose runtime for spawning CLI AI agents (Claude Code, Codex, Gemini CLI) as subprocesses. MIT-licensed. Public API frozen at v1.0.0; additive changes only on the v1.x line. Zero runtime dependencies (Node stdlib only). Cross-platform (Windows / macOS / Linux). Bundled TypeScript types.
- Squire class with start(prompt) / send(followup) / stop({graceful}) lifecycle plus stdout / stderr / event / exit events.
- SquireEvent discriminated union: stdout, stderr, text_delta, message_start, message_stop, error. Honest about v1.0 scope; per-CLI adapters with tool_call / tool_result events arrive in v1.x.
- MCP forwarding: pass mcp.servers (inline) or mcp.configPath (pre-built) and Squire wires --mcp-config <path> (or a configurable flag) for the child CLI. Temp config files are cleaned up on stop.
- Claude Code permission auto-setup: autoSetup.claudeCode merges allowedTools patterns into ~/.claude/settings.json atomically while preserving everything else. Idempotent.
- SquireAdapter interface exported as v1.0 contract for custom per-CLI parsers.
- registerSquireAdapter(adapter) for process-global registration.
- Cross-platform spawn: Windows .cmd / .bat / extensionless-binary handling baked in. needsShell exported for callers that want to share the auto-detection logic.
- Typed error surface (SquireError, SquireAutoSetupError) with stable code strings.
Added (OpenWar side)
- @pythonluvr/squire as a runtime dependency. During local dev this is file:../squire; the publisher swaps to ^1.0.0 before npm publish.
- docs/adapters.md "Powered by" section linking to the Squire repo and explaining that the dependency split is purely architectural.
- README "Powered by" footer linking to Squire as the underlying CLI-agent runtime.
Changed
- src/adapters/cli-bridge.ts rewritten as a thin wrapper over Squire. Down from 330 lines of subprocess plumbing to ~210 lines of translation + serialization. The Windows quirks, timeout machinery, abort-signal handling, EPIPE recovery, and stderr buffering all move into Squire. addExtraArgs is preserved for the v0.7 MCP wiring.
- OpenWar's StreamEvent shape at the public surface is unchanged. Existing cli-bridge tests that read events at OpenWar's boundary remain green.
Design notes (Phase 0 deviations approved)
- Split into v0.11.0 + v0.11.1. Spawn + MCP forwarding + auto-setup ship as Squire v1.0.0 with an honest event union now; per-CLI vendor JSON-stream parsers (Claude Code, Codex, Gemini CLI) and the richer SquireEvent variants land in v1.1.0 as a minor additive bump once real CLI snapshot fixtures are captured.
- Decoupling is enforced by import discipline. Squire's src/ has zero import statements referencing OpenWar. The standalone positioning fails if Squire's API can't be used without OpenWar's types; this gate makes the failure visible.
- OpenWar's MCP server is NOT moving to Squire. Squire knows how to pass --mcp-config <path> to a child; building the config file (with project-memory tools, brief-aware authorization, registry-specific format quirks for Gemini's .gemini/settings.json) stays in OpenWar. That logic is genuinely OpenWar-specific.
Test count
OpenWar: no net change (existing cli-bridge tests stay green against the wrapper). Squire: ~40 new (spawn, events, adapters, MCP, autosetup, lifecycle). Target ~60 lands with v1.1.0 per-CLI adapter tests.
Publisher queue
Squire must publish first (OpenWar depends on it). Order:
1. github.com/PythonLuvr/squire: push, tag v1.0.0, GitHub release, npm publish via WebAuthn.
2. Swap OpenWar's package.json dep from file:../squire to ^1.0.0.
3. github.com/PythonLuvr/openwar: push, tag v0.11.0, GitHub release, npm publish via WebAuthn.
v0.10.0 — openwar chat (the front door)
openwar chat: the front door. (Patch applied post-initial-commit to close three corners caught during honest self-review: Windows readline handling, adversarial agent-drift coverage at the session level, and full emission of all three chat trace events. See "Post-commit fixes" section below.) The runtime is the same; the entry point is new. A non-developer describes what they want in plain English, OpenWar asks clarifying questions if needed, proposes a plan, gets approval, executes through the existing phase machine, and surfaces destructive prompts as plain English questions instead of y/n flags. The audit trail underneath (trace.ndjson, phase events, detector log, learned profile) all still exists. Power users keep writing briefs by hand. Everyone else just talks.
Originally scoped as one ship. Split during Phase 0 review into v0.10.0 (functional chat layer) and v0.10.1 (positioning + UX refinements based on real adoption signal) on the same pattern as the v0.7 reorder and v0.9 split. Three for three on this pattern.
Added
- openwar chat subcommand. Interactive readline session. Default conversation-agent adapter precedence: ANTHROPIC_API_KEY > OPENAI_API_KEY > GEMINI_API_KEY (or GOOGLE_API_KEY) > XAI_API_KEY > OPENAI_COMPAT_API_KEY. Hard error with install hint if none are set; cli-bridge stays fully supported for hand-authored briefs as the BYOK-free escape hatch. Flags: --resume <id|last>, --adapter, --model, --exec-adapter, --exec-binary, --project, --no-save.
- Per-role adapter split for chat sessions. Conversation-agent adapter (must support tool calls) is separate from execution adapter. Default: same. Override execution to cli-bridge via --exec-adapter cli-bridge --exec-binary claude for free local execution on an existing Claude Code subscription while keeping intent extraction deterministic via a BYOK key.
- Structured tool-call intent contract (src/chat/intent.ts). Conversation agent declares intent via four tool calls (ask_clarification, propose_plan, start_execution, summarize_result), not free text. Adversarial fixtures in tests/chat/intent.test.ts pin every failure mode (no_tool_call, multiple_tool_calls, unknown_tool, invalid_args, fabricated_approval). Drift counter falls back to a deterministic user question after 3 failed turns; hard-fails the session after 5 with a save-and-resume pointer.
- Conservative-authorization compiler (src/chat/compile.ts). LOAD-BEARING INVARIANT: destructive categories (filesystem_delete, shell_exec, http_fetch, paid_api_call, git_write, git_push, deploy, external_message) are NEVER auto-granted. They route through Phase 3 at execution time so the user sees them as natural-language confirms instead of silent grants in a skimmed plan. SAFE_AUTOGRANT is {filesystem_read, filesystem_write} only; expanding it is a P0 regression. Adversarial fixtures pin each destructive category individually.
- Plain-English plan presenter (src/chat/plan.ts). Three sections: Plan (bulleted), Authorized (plain-language descriptions of each cost category), Not authorized (explicit list with consequence sentences and a reassuring "I'll ask you in plain English first" note).
- Phase event renderer (src/chat/render.ts) + destructive phrase templates (src/chat/destructive-phrases.ts). Translates runtime trace events to chat output. git push becomes "publish this change to your repository; that will push your local commit to the remote." Tool-call debouncing avoids spamming on rapid sequential calls. Per-subtype templates cover every destructive subtype the runtime emits; missing-template would fail CI.
- Chat session manager (src/chat/session.ts). Orchestrates the full clarify -> propose -> approve -> execute -> summarize -> save loop. Threads detector sensitivity (from learned profile) through to the runtime. Routes destructive prompts to the user and back to the runtime gate. Save-brief writes a v0.x-compatible YAML-frontmatter markdown file with the source conversation as a blockquote.
- Chat store (src/state/chat-store.ts). NDJSON append-only at ~/.openwar/chats/<chat_id>.ndjson. First line is a chat_session_started header with schema_version: 1. Mismatch on resume raises a typed error with remediation. Same shape contract as v0.8 trace files.
- Slash commands (src/chat/commands.ts). /help, /save, /inspect, /history, /resume, /abort, /quit. Path-vs-command heuristic: /index.html is treated as text, not as an unknown command, so users can naturally reference paths mid-conversation.
- Project memory + learned profile integration (src/chat/context.ts). At session start, loads recent project memory entries and the learned profile (if any) and surfaces them to the conversation agent. When a learned profile exists, the chat session stamps learned_profile: <slug> into the compiled brief's frontmatter so the runtime applies the profile at execution time (per v0.9.1 contract).
- Three new trace event types: chat_session_compiled (emitted into the brief's trace at session start when a run is chat-originated, via the new RunOptions.chatId field), chat_session_resumed and chat_brief_saved (defined for forward-compat; primary persistence is the chat-store NDJSON).
- 133 new tests across tests/chat/*.test.ts, tests/cli/chat-cli.test.ts, and the headline tests/integration/chat-full-cycle.test.ts (full chat -> propose -> approve -> execute -> save -> replay loop). Total 713 (was 580 at v0.9.1). Right in the brief's 700-720 target range.
- docs/chat.md (new). Operator guide. Walkthrough, flag surface, conservative-auth invariant, slash commands, saved-brief replay semantics, project memory + learned profile integration, intent contract.
- README "New in v0.10" section. The chat path is added to the quickstart without touching the existing hero pitch.
- Library exports for integrators: intent contract types, compiler, plan presenter, renderer, session manager, chat-store reader/writer.
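The conservative-authorization invariant from the list above can be sketched as an allowlist split. SAFE_AUTOGRANT and the category names come from the notes; compileAuthorization, the result shape, and the choice to defer unknown categories as well are assumptions for illustration.

```typescript
// Only these two categories may be granted without a Phase 3 prompt.
const SAFE_AUTOGRANT: ReadonlySet<string> = new Set(["filesystem_read", "filesystem_write"]);

interface CompiledAuthorization {
  authorized: string[];       // auto-granted in the compiled brief
  deferredToPhase3: string[]; // confirmed with the user at execution time
}

// Allowlist, not denylist: anything outside SAFE_AUTOGRANT is deferred,
// including categories this sketch has never heard of.
function compileAuthorization(requested: string[]): CompiledAuthorization {
  const authorized: string[] = [];
  const deferredToPhase3: string[] = [];
  for (const category of requested) {
    if (SAFE_AUTOGRANT.has(category)) {
      authorized.push(category);
    } else {
      deferredToPhase3.push(category);
    }
  }
  return { authorized, deferredToPhase3 };
}

const compiled = compileAuthorization(["filesystem_read", "git_push", "shell_exec", "filesystem_write"]);
```

Writing the check as an allowlist means a newly introduced destructive category is deferred by default rather than silently granted.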
Design notes (Phase 0 deviations approved)
- Split into v0.10.0 + v0.10.1. Functional chat layer ships now; README hero rewrite and mid-tool-call cancellation wait for adoption signal. Same pattern as v0.7 / v0.9 splits.
- Tool-call intent extraction, not free-text classification. Free-text would drift at scale. The four-tool contract is testable from day one with adversarial fixtures.
- cli-bridge incompatible with conversation agent (architectural). cli-bridge does not surface tool-call events to OpenWar; the bridged binary's stdout is free text. We picked option (c) hybrid: the conversation agent uses a tool-call-capable BYOK adapter; the execution adapter can still be cli-bridge for free local Claude Code use. Stronger than the brief's binary (a)/(b) options.
- Conservative authorization is load-bearing. Adversarial fixtures pin every destructive category individually. The plan presenter explicitly lists "Not authorized" with consequence sentences so the user sees what is excluded as visibly as what is included.
- Off-topic mid-conversation: single-task focus (option D, not in brief). Agent says "I'm focused on X right now. After it's done I can help with Y. Want me to remember it for after?" rather than compiling a second brief in the same session.
- Polite abort only in v0.10.0. Mid-tool-call cancellation deferred to v0.10.1.
- Save-brief replay semantics explicitly documented. "Replays deliverables on the named project; if repo state has drifted, the agent may need different actions." Header in the file + paragraph in docs/chat.md.
- Determinism scoping honest. Compiler is pure; conversation feeding it is stochastic. Docs say so.
- Path-vs-command heuristic (caught during integration testing). Real slash commands are single-word [a-z]+; paths like /index.html or /usr/local/bin are treated as user text and routed to the conversation agent.
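The path-vs-command heuristic reduces to one regular expression on the first token. In this sketch, parseInput and the ParsedInput shape are hypothetical; only the single-word [a-z]+ rule comes from the notes.

```typescript
// A slash followed by lowercase letters only, and nothing else in the token.
const COMMAND_PATTERN = /^\/[a-z]+$/;

type ParsedInput =
  | { kind: "command"; name: string; args: string }
  | { kind: "text"; text: string };

function parseInput(line: string): ParsedInput {
  const trimmed = line.trim();
  const [head = "", ...rest] = trimmed.split(/\s+/);
  if (COMMAND_PATTERN.test(head)) {
    // Unknown names would be reported as unknown commands by the caller.
    return { kind: "command", name: head.slice(1), args: rest.join(" ") };
  }
  // Dots, extra slashes, digits, or uppercase mean "this is probably a path".
  return { kind: "text", text: trimmed };
}
```

One limitation of this sketch: a bare single-segment path such as /usr still matches the command pattern, so it would surface as an unknown command rather than as text.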
Out of scope (deferred to v0.10.1)
- README hero rewrite (positioning change pending adoption signal).
- Mid-tool-call cancellation (/abort is polite in v0.10.0).
- Multi-channel chat surfaces (Discord, Slack, Telegram).
- Streaming responses during agent turns.
Notes for forkers and War Room integrators
- v0.10.0 is fully backward compatible with v0.9.x and all earlier versions. Existing briefs run identically. Existing scripts wrapping openwar run are unaffected. The new openwar chat subcommand is additive.
- The chat store at ~/.openwar/chats/<chat_id>.ndjson is a separate persistence stream from ~/.openwar/sessions/<brief_id>.trace.ndjson. Library consumers can ingest both via readChat() and readTrace() exports.
- A chat-originated brief's trace stamps chat_session_compiled with the originating chat id; openwar inspect <brief_id> --trace shows the correlation. chat_brief_saved and chat_session_resumed also mirror into the most recent brief's trace when one is active, so all three chat-correlation events surface via --trace.
Post-commit fixes
After the initial commit, three corners were caught during honest self-review and patched before publisher push:
- Windows readline handling. The brief budgeted explicit work for Windows-specific readline behavior. Initial commit shipped without it. Patched: SIGINT handler that closes readline cleanly so Ctrl-C routes through the /quit path with a saved-session banner; EOF on piped stdin treated identically to /quit; CRLF line endings parsed correctly; runChatCommand now accepts optional stdin / stdout overrides for embedders and programmatic test harnesses. Five new tests in tests/cli/chat-readline.test.ts pin the cross-platform behavior (EOF routing, /quit on pipes, /help on pipes, CRLF input, 250-entry history volume).
- Adversarial agent-drift coverage at the session level. Initial commit had parser-level adversarial f...
v0.9.1 — adaptive autonomy plumbing (conservative defaults)
Adaptive autonomy plumbing with conservative defaults. The plumbing is the deliverable; the thresholds are the patch-release dial.
v0.9.0 deferred the prescriptive layer of adaptive autonomy because the data foundation did not yet exist. v0.9.1 reframes the question: the threshold values needed real distributions to calibrate against, but the structural work (profile schema, runner integration, detector sensitivity wiring, new trace events) did not. v0.9.1 ships the plumbing with thresholds set high enough that the system is effectively a no-op for the first nine runs against any project. The first usable recommendation arrives around run 10. v0.9.2+ patch releases tune the constants in src/state/heuristics.ts once real distributions surface.
Added
- openwar learn <slug> subcommand (src/cli/learn.ts). Reuses the v0.9.0 history aggregator, applies heuristic recommendation generators with conservative thresholds, prints a candidate learned.json (default) or writes it to disk (--apply). Flags:
  - openwar learn <slug> (dry run)
  - openwar learn <slug> --apply (write)
  - openwar learn <slug> --reset (delete existing profile)
  - openwar learn <slug> --since <ISO> (filter trace window)
  - openwar learn <slug> --min-samples <N> (floor 5; default 10)
  - openwar learn <slug> --emit-frontmatter (print paste-into-brief YAML)
- src/state/learned-profile.ts: profile schema (schema_version: 1), atomic save (tmp+rename), schema-version check that raises a typed LearnedProfileSchemaError on MISSING_VERSION / VERSION_MISMATCH / PARSE / SHAPE rather than silently defaulting.
- src/state/heuristics.ts: conservative-threshold constants with one paragraph each explaining what would justify lowering them. v0.9.1 values: DETECTOR_LOOSE_FIRE_RATE_BAR=0.85, DETECTOR_LOOSE_MIN_SAMPLES=10, DETECTOR_DISABLED_FIRE_RATE_BAR=0.95, DETECTOR_DISABLED_MIN_SAMPLES=20, PHASE_BUDGET_MIN_SAMPLES=10, PHASE_BUDGET_FORMULA="p90+5", DEAD_TOOL_MIN_SAMPLES=10. Pinned by tests/state/heuristics.test.ts so accidental tuning during refactors fails CI.
- DETECTOR_SAFETY registry: blocker, destructive, completion, confirmation are safety_critical: true; banned_phrases and phase_marker are false. disabled is blocked for safety-critical detectors. The consultation record still surfaces the attempted override.
- Detector sensitivity refactor: each detector's exported function gains an optional sensitivity: Sensitivity parameter (defaults to "default"; current behavior). loose requires stricter signal per detector (e.g., explicit Phase 2 marker for blocker; banned_phrases count >= 2; explicit Phase 4 for completion; explicit Confirmation Summary marker for confirmation; imminent-action marker for destructive). strict is a TODO marker; treated as default until v0.9.2+. The snapshotWithConsultations() dispatcher centralizes the disabled + safety-critical gate and produces a DetectorConsultation audit list.
- Brief frontmatter: optional learned_profile: <slug> field. Explicit-only loading; the runner does NOT auto-discover a profile from the project slug even if one exists on disk.
- Runner integration: when learned_profile: is set, the runner loads the profile, threads a DetectorSensitivityMap through runExecute, applies the execute-phase budget to maxSteps, and emits learned_profile_applied once at session start. Missing profile is a soft warning; schema mismatch is a hard remediation message.
- Three new trace event types (additive to the v0.8 union):
  - learned_profile_applied: once per session at profile load. Counts detector overrides, phase budgets, and dead-tool callouts.
  - learned_sensitivity_consulted: per detector consultation with non-default sensitivity. Records sensitivity value and whether the detector fired or was suppressed.
  - learned_budget_consulted: at execute-phase enter. Carries recommended budget, applied value, and source (learned / brief / default).
- openwar inspect <brief_id> --learned: brief-scoped view that combines the on-disk profile for the brief's project slug with consultation history from the brief's trace events. Renders detector overrides + phase budgets + tool usage + consultation summary in column-pinned tables.
- 66 new tests (tests/state/heuristics.test.ts, tests/state/learned-profile.test.ts, tests/detectors/sensitivity.test.ts, tests/cli/learn.test.ts, tests/runner/learned-profile-apply.test.ts, tests/cli/inspect-learned.test.ts). Total 580 (was 514 at v0.9.0). Right in the brief's 560-580 target.
- Library exports for integrators: heuristics constants, profile load/save, sensitivity map projection, learn subcommand, inspect-learned formatter.
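The safety-critical gate described in the list above can be sketched roughly as follows. This is an illustration, not the code in src/state: the type shapes, the resolveSensitivity helper, and the choice to fall back to "default" when a disabled override is blocked are all assumptions. Only the registry values and the rule that strict is treated as default come from these notes.

```typescript
// Hypothetical shapes; names echo the release notes, fields are assumed.
type Sensitivity = "default" | "loose" | "strict" | "disabled";
type DetectorName =
  | "blocker"
  | "destructive"
  | "completion"
  | "confirmation"
  | "banned_phrases"
  | "phase_marker";

// Registry values as listed in the v0.9.1 notes.
const DETECTOR_SAFETY: Record<DetectorName, { safety_critical: boolean }> = {
  blocker: { safety_critical: true },
  destructive: { safety_critical: true },
  completion: { safety_critical: true },
  confirmation: { safety_critical: true },
  banned_phrases: { safety_critical: false },
  phase_marker: { safety_critical: false },
};

// Assumed audit-record shape: keeps the attempted override visible.
interface DetectorConsultation {
  detector: DetectorName;
  requested: Sensitivity;
  applied: Sensitivity;
  override_blocked: boolean;
}

// Hypothetical helper: resolves one detector's effective sensitivity.
function resolveSensitivity(
  detector: DetectorName,
  requested: Sensitivity,
): DetectorConsultation {
  const blocked =
    requested === "disabled" && DETECTOR_SAFETY[detector].safety_critical;
  // "strict" is accepted but treated as default in v0.9.1.
  const normalized: Sensitivity = requested === "strict" ? "default" : requested;
  return {
    detector,
    requested,
    // Assumption: a blocked override falls back to default behavior.
    applied: blocked ? "default" : normalized,
    override_blocked: blocked,
  };
}
```

Under these assumptions, resolveSensitivity("destructive", "disabled") keeps the detector active and records the blocked override, while resolveSensitivity("banned_phrases", "disabled") applies the override as requested.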
Design notes (Phase 0 picks)
- Reframed scope from "adaptive autonomy" to "plumbing with conservative defaults". The previous v0.9.0 deferral applied to threshold values, not to runtime plumbing. Building the plumbing now lets v0.9.2+ become a thresholds-only patch.
- No threshold constants were tuned during Phase 1 development. Every value matches the brief.
- Determinism: profile saves are deterministic via stringifyDeterministic; same trace inputs produce byte-identical files (modulo generated_at). source_runs sorted lexicographically by buildLearnedProfile.
- Detector refactor is fully backward-compatible: sensitivity parameter defaults to "default" so all existing detector callers and the v0.8 / v0.9.0 test suite pass unchanged. The 514 prior tests stay green.
- Multi-agent coordinator does not consume learned phase budgets in v0.9.1 (different budget primitives; revisited in v0.9.2+). Detector sensitivities still apply via the coordinator's executor path.
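A minimal sketch of why an optional parameter keeps old callers working, using banned_phrases as the example. The phrase list, the function name, and the default threshold of one hit are assumptions for illustration. Only the "count >= 2 under loose" rule and the "strict is treated as default" rule come from these notes.

```typescript
type Sensitivity = "default" | "loose" | "strict";

// Hypothetical phrase list, for illustration only.
const BANNED_PHRASES = ["as an ai", "i cannot help"];

// Counts how many distinct banned phrases appear in the text.
function countBannedPhrases(text: string): number {
  const lower = text.toLowerCase();
  return BANNED_PHRASES.filter((phrase) => lower.includes(phrase)).length;
}

// Hypothetical detector. Existing callers pass only `text` and get the
// pre-refactor behavior because `sensitivity` defaults to "default".
function detectBannedPhrases(
  text: string,
  sensitivity: Sensitivity = "default",
): boolean {
  const hits = countBannedPhrases(text);
  // "loose" requires a stronger signal (count >= 2, per the notes).
  // "strict" falls through to default behavior in v0.9.1.
  // Assumption: default fires on a single hit.
  return sensitivity === "loose" ? hits >= 2 : hits >= 1;
}
```

A call such as detectBannedPhrases(turnText) compiles and behaves the same before and after the refactor, which is the property that lets the prior test suite pass unchanged.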
Out of scope (deferred to v0.9.2+)
- Threshold tuning against observed real-world distributions.
- Per-detector strict semantics (parameter accepted, treated as default).
- Multi-agent coordinator budget integration.
- Auto-recommendation expiry / age-off.
- A/B harness for sensitivity tuning.
- OpenTelemetry exporter for the three new event types.
Notes for forkers and War Room integrators
- v0.9.1 is fully backward compatible with v0.9.0 and v0.8.x. Briefs without learned_profile: behave identically. The detector refactor preserves default-sensitivity behavior bit-for-bit.
- War Room integrators can read learned profiles via loadLearnedProfile(slug) from the library entry point and consume the three new trace events through readTrace(brief_id).
- Operators on a fresh install will see "this profile is effectively a no-op at current sample size" until they accumulate ~10 runs against a project slug. This is intentional; expect v0.9.2 to tune the constants once real distributions surface.
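To make the no-op behavior concrete, here is a sketch of a phase-budget recommendation that returns nothing below the sample floor and applies the "p90+5" formula above it. The function names and the nearest-rank quantile method are assumptions; the notes give only the formula and PHASE_BUDGET_MIN_SAMPLES=10.

```typescript
// Value taken from the v0.9.1 notes.
const PHASE_BUDGET_MIN_SAMPLES = 10;

// Nearest-rank quantile over an ascending array.
// Assumption: the real aggregator may interpolate differently.
function nearestRankQuantile(sortedAsc: number[], q: number): number {
  const rank = Math.ceil(q * sortedAsc.length);
  return sortedAsc[Math.max(0, rank - 1)];
}

// Hypothetical helper: recommends an execute-phase tool-call budget,
// or null when there are too few runs to say anything.
function recommendPhaseBudget(toolCallsPerRun: number[]): number | null {
  if (toolCallsPerRun.length < PHASE_BUDGET_MIN_SAMPLES) {
    return null; // thin sample: the profile stays a no-op
  }
  const sortedAsc = [...toolCallsPerRun].sort((a, b) => a - b);
  return nearestRankQuantile(sortedAsc, 0.9) + 5; // "p90+5"
}
```

With nine recorded runs the helper returns null. With ten runs of 1 through 10 tool calls, the nearest-rank P90 is 9 and the recommended budget is 14.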
v0.9.0 — openwar history (descriptive analytics)
0.9.0
openwar history: descriptive analytics over accumulated v0.8 traces. Read-only by design.
Originally scoped as "adaptive autonomy" with detector sensitivity overrides, recommended phase budgets, and a runtime-applied learned_profile. The brief was split during Phase 0 review on 2026-05-18 because the data foundation did not yet exist: v0.8.0 had landed three hours earlier and zero real traces had accumulated against any project. Adapting against synthetic or thin samples would have shipped wrong-shaped defaults baked into runtime behavior. The original brief's own anti-gaming clause warned against this exact failure mode.
v0.9.0 ships the inspection layer. v0.9.1 will ship the prescriptive layer once one to two release cycles of v0.9.0 have accumulated real traces and we can calibrate heuristics against actual distributions.
Added
- openwar history <project_slug> subcommand. Reads every trace.ndjson whose session metadata carries the slug, computes:
  - Per-tool call counts + last-used timestamps + "dead" flag (zero calls when sample >= 3).
  - Per-phase tool-call P50 / P90 / max, summed across runs. Tool-call attribution uses a most-recent-phase_enter walker so calls fall into the right bucket.
  - Per-detector total fires + fires-per-run + runs-with-fire.
  - Per-phase total + average duration_ms from phase_exit events.
  - Operator-readable notes: thin-sample warnings, dead-tool callouts, corrupted-line totals, v0.9.0-is-descriptive-only banner.
  - --since <ISO> filter, --min-samples N threshold (>= 2), --json deterministic output.
- openwar inspect <brief_id> --history. Brief-scoped surface: looks up the session's project slug and renders the same history report.
- docs/learning.md (new). Locks per-detector false-positive semantics for v0.9.1 even though v0.9.0 does not use them. Half a day of design work now while the question is fresh is cheaper than rebuilding the analysis in v0.9.1 against muscle-memory assumptions. Also documents the v0.9.0 vs v0.9.1 scope split and the safety-critical flag plan.
- 24 new tests (tests/state/history.test.ts, tests/cli/history.test.ts, tests/cli/inspect-history.test.ts). Total 514 (was 490 at v0.8.0). Math correctness, determinism guarantees, filter semantics, schema_version anchoring, traceless-session reporting, brief-to-project lookup.
- Library exports (src/index.ts): summarizeRun, aggregateRuns, buildHistoryReport, runHistory, formatHistoryReport, quantile, stringifyDeterministic, plus the RunSummary / HistoryReport / row types. Integrators (War Room, etc.) can build their own reporting layers on top.
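The most-recent-phase_enter attribution mentioned above can be sketched as a single pass over the event list. The event shape here is a simplified assumption (in particular the tool_call type name and the phase and tool fields), and attributeToolCalls is a hypothetical helper, not one of the library exports.

```typescript
// Assumed minimal event shape; real trace events carry more fields.
interface TraceEvent {
  type: string;
  phase?: string;
  tool?: string;
}

// Walks events in order and credits each tool call to the phase named by
// the most recent phase_enter event seen so far.
function attributeToolCalls(events: TraceEvent[]): Map<string, number> {
  const callsPerPhase = new Map<string, number>();
  // Assumption: calls seen before any phase_enter land in this bucket.
  let currentPhase = "unattributed";
  for (const event of events) {
    if (event.type === "phase_enter" && event.phase !== undefined) {
      currentPhase = event.phase;
    } else if (event.type === "tool_call") {
      callsPerPhase.set(currentPhase, (callsPerPhase.get(currentPhase) ?? 0) + 1);
    }
  }
  return callsPerPhase;
}
```

Per-run counts produced this way are what a P50 / P90 / max summary would then be computed over.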
Design notes (Phase 0 deviations approved)
- Renamed from "adaptive autonomy" to "history". A capability whose first impression is "tells you what your runs look like" should not ship under a name that promises runtime behavior change. v0.9.1 reclaims "adaptive autonomy" when it actually adapts.
- No learned_profile schema, no runner integration, no detector sensitivity refactor, no new trace events. All deferred to v0.9.1. v0.9.0 carries no forward-compat stubs in the schema either; cleaner to add fields in v0.9.1 with real data informing their shape.
- The only confident heuristic in v0.9.0 is dead-tool detection. Everything else is descriptive math (counts, quantiles, sums) with no thresholds attached. P50 + 1.5 * IQR for phase budgets is deferred because the IQR shape on real long-tail distributions is unknown.
- Phase-attribution walker built now, inherited by v0.9.1. Tool calls credit to the most-recent phase_enter. v0.9.1's budget math reuses the same walker.
- Determinism is load-bearing. source_runs arrays sort lexicographically. JSON output goes through stringifyDeterministic with sorted object keys. Same trace inputs produce the same report bit-for-bit (modulo generated_at timestamp). Tested in tests/state/history.test.ts.
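For readers unfamiliar with the sorted-keys technique, a minimal sketch follows. stringifySorted is a hypothetical stand-in written for this note. It is not the library's stringifyDeterministic, whose exact behavior (indentation, handling of special values) is not described here.

```typescript
// Recursively rebuilds a JSON-compatible value with object keys in sorted
// order, so serialization no longer depends on key insertion order.
function sortKeysDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Array order is data, so it is preserved; callers sort arrays such as
    // source_runs themselves before serializing.
    return value.map(sortKeysDeep);
  }
  if (value !== null && typeof value === "object") {
    const input = value as Record<string, unknown>;
    const output: Record<string, unknown> = {};
    for (const key of Object.keys(input).sort()) {
      output[key] = sortKeysDeep(input[key]);
    }
    return output;
  }
  return value;
}

// Hypothetical stand-in for a deterministic serializer.
function stringifySorted(value: unknown): string {
  return JSON.stringify(sortKeysDeep(value));
}
```

Two reports with the same content but different key insertion order serialize to the same string, which is what makes a byte-for-byte comparison meaningful once generated_at is excluded.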
Out of scope (deferred to v0.9.1 or later)
- openwar learn subcommand and the learned.json profile schema.
- learned_profile: brief frontmatter field.
- Detector sensitivity overrides (loose / strict / disabled).
- Recommended phase budgets.
- Runner-side application of any of the above.
- The three planned trace events (learned_profile_applied, learned_sensitivity_consulted, learned_budget_consulted).
- Recommendation expiry, A/B harness for sensitivity tuning, cross-project learning.
Notes for forkers and War Room integrators
- v0.9.0 is fully backwards compatible with v0.8.x. No new brief frontmatter fields. No runtime behavior changes. Existing sessions inspect identically; the new --history surface is purely additive.
- Operators on v0.8.x can upgrade to v0.9.0 with no migration cost. Accumulated v0.8 traces are immediately usable as history input.
- v0.9.1 (when it ships) will use the same trace format and the same phase-attribution walker; profiles will read this v0.9.0 history data plus locked FP semantics from docs/learning.md.
v0.8.0 — Observability and tracing
0.8.0
Observability and tracing. The first version that gives operators (and integrators like War Room) the structured data they need to actually understand what their agents are doing. Everything before v0.8 was about getting the runtime to behave correctly. v0.8 makes its behavior visible.
This release was scoped against two real Windows live tests on 2026-05-17 and 2026-05-18 that surfaced five observability gaps: invisible MCP call lifecycle, ambiguous permission-layer source on failure, invisible MCP server liveness, invisible phase timing, silent settings-merge failure modes. Each is closed by a specific event type in the new trace stream.
Added
- Structured trace event stream at ~/.openwar/sessions/<brief_id>.trace.ndjson. One JSONL event per line, append-only, schema-versioned via a trace_version header event on the first line. 19 event types covering phase transitions, tool calls, auth decisions, detector fires, role invocations, budget thresholds, sub-task state, coordinator state, MCP server lifecycle (started, shutdown, dispatched, completed; mcp_call_pending type defined, real-time emission lands in v0.8.x), settings-merge attempts and outcomes, and errors.
- openwar inspect extensions: --trace, --trace --tail N, --trace --full, --timing, --cost, --cost --dollar-per-1k <rate>, --detectors, --tools, --mcp. Each prints a focused table. The dashboard reuses the same formatters so column shape stays in sync between CLI and web view.
- openwar replay <brief_id> subcommand. Re-runs recorded assistant turns through CURRENT detector code, emits [replay]-prefixed output, halts at Phase 2 markers in the transcript (same shape as the original run), exits 1 when current detectors disagree with the recorded trace (drift). Useful for detector-regression CI gates and for demonstrating runs without paying for compute.
- openwar dashboard subcommand. Opt-in local HTTP dashboard, default port 8780, bound to the IPv4 literal 127.0.0.1 (avoids Windows IPv6 resolution surprises). Zero outbound network calls. Zero third-party dependencies. Vanilla HTML over a single CSS block. Per-session views for summary, timing, cost, detectors, tools, mcp, and the raw trace.
- OPENWAR_SESSIONS_DIR environment variable. Overrides the default <OPENWAR_HOME>/sessions location wholesale. Lets integrators relocate the session store and gives tests a clean way to point at a tmpdir.
- docs/observability.md. Operator guide. Event reference, inspect modes, replay semantics, dashboard, file layout.
- 40 new tests (tests/state/trace.test.ts, tests/state/trace-seams.test.ts, tests/cli/inspect.test.ts, tests/cli/replay.test.ts, tests/dashboard/server.test.ts). Total now 490 (was 450 at v0.7.3). Every event type has a round-trip case. Inspect formatters pin column shape. Dashboard verified to bind 127.0.0.1 only and make zero outbound network calls.
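A rough sketch of what an append-only NDJSON trace with a header event looks like from a consumer's side. The helper names and the event field names are assumptions for illustration; integrators should use the library's own Tracer and readTrace rather than this code. The reader counts lines that fail to parse instead of throwing, so a torn final write does not invalidate the complete events before it.

```typescript
import * as fs from "node:fs";

type TraceRecord = Record<string, unknown>;

// Appends one event as a single JSON line. Each call writes a complete
// line, so any complete line in the file is a complete event.
function appendEvent(file: string, event: TraceRecord): void {
  fs.appendFileSync(file, JSON.stringify(event) + "\n");
}

// Reads the file back, skipping blank lines and counting unparseable ones.
function readEvents(file: string): { events: TraceRecord[]; corrupted: number } {
  const events: TraceRecord[] = [];
  let corrupted = 0;
  for (const line of fs.readFileSync(file, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line) as TraceRecord);
    } catch {
      corrupted += 1; // e.g. a partially written trailing line
    }
  }
  return { events, corrupted };
}

// Hypothetical check that the first event is the version header.
// Assumption: the header event's type field is "trace_version".
function hasVersionHeader(events: TraceRecord[]): boolean {
  return events.length > 0 && events[0].type === "trace_version";
}
```

A consumer would typically call readEvents, confirm hasVersionHeader, and only then interpret the remaining events against the schema version it understands.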
Design notes (Phase 0 deviations approved)
- NDJSON appends use fs.appendFileSync per event, not tmp+rename. The original brief specced "same atomicity as the transcript (tmp+rename per append)." That conflated transcript atomicity (low-frequency message persistence) with trace atomicity (high-frequency event log). Right invariant is "any complete line is a complete event"; appendFileSync gives that and scales O(1) per event.
- trace_version header event is the first line of every trace file. v0.9 will add fields; without a schema version marker, replay would silently misinterpret old traces.
- call_id threaded through mcp_call_* events. Concurrent MCP calls would otherwise be uncorrelatable in the trace.
- Replay re-runs detectors against the recorded transcript. Not playback of recorded detector results. Recorded trace is reference data for drift comparison, not the script. This is what makes replay useful for detector regression testing.
- Dashboard = inspect-as-HTML. Single source of truth across the on-disk text view and the web view. Four renderers collapse to one; bug fixes land once.
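The replay drift comparison described in the notes above can be illustrated with a small set difference. The DetectorFire shape, the per-turn keying, and both function names are assumptions. The only behavior taken from these notes is that replay exits 1 when current detectors disagree with the recorded trace.

```typescript
// Assumed shape: which detector fired on which assistant turn.
interface DetectorFire {
  turn: number;
  detector: string;
}

interface Drift {
  missing: string[]; // recorded in the trace, not produced by current code
  unexpected: string[]; // produced by current code, absent from the trace
}

// Compares recorded fires (reference data) with fires recomputed by
// running current detector code over the recorded transcript.
function findDrift(recorded: DetectorFire[], current: DetectorFire[]): Drift {
  const key = (fire: DetectorFire): string => `${fire.turn}:${fire.detector}`;
  const recordedKeys = new Set(recorded.map(key));
  const currentKeys = new Set(current.map(key));
  return {
    missing: Array.from(recordedKeys).filter((k) => !currentKeys.has(k)).sort(),
    unexpected: Array.from(currentKeys).filter((k) => !recordedKeys.has(k)).sort(),
  };
}

// Exit code for a CI gate: 1 on any disagreement, 0 otherwise.
function replayExitCode(recorded: DetectorFire[], current: DetectorFire[]): number {
  const drift = findDrift(recorded, current);
  return drift.missing.length + drift.unexpected.length > 0 ? 1 : 0;
}
```

Because the recorded fires are only compared against and never replayed, a detector change that alters behavior shows up as a non-empty missing or unexpected list.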
Out of scope (per the brief)
- Remote telemetry / cloud aggregation. Local-first.
- OpenTelemetry adapter. v0.8.x stretch if real demand.
- Real-time streaming dashboard. Files-on-demand. WebSocket live updates wait until at least v0.8.x.
- Real-time mcp_call_pending emission. Requires subprocess-side tracing wired into openwar mcp-serve; the event type is defined so consumers can code against it now. Emission lands in v0.8.x.
- Multi-user dashboard authentication. Single operator, localhost-bound.
- Auto-pruning of old trace files. Operator manages disk usage manually.
Notes for forkers and War Room integrators
- The trace file lives sibling to the transcript and session-state files. Existing v0.7.x sessions (no trace) inspect gracefully: openwar inspect <id> --trace prints a "no trace events; sessions written before v0.8 are transcript-only" notice and exits 0.
- War Room integrators consuming the OpenWar library can import { Tracer, readTrace } from "@pythonluvr/openwar" and pump trace data into their own observability stack. OpenWar itself stays silent on the wire.