Releases: PythonLuvr/openwar

v0.13.0

21 May 01:41

Use OpenWar from any tool that speaks the OpenAI API.

v0.13.0 ships openwar serve --openai-compat, an OpenAI Chat Completions HTTP server in front of OpenWar's runtime. Any tool that speaks OpenAI's API (Aider, Continue, Cline, the OpenAI SDKs, homegrown wrappers) points at the proxy with one env-var change and runs through OpenWar's phase-gated, traced, detector-enforced execution. The tool does not know OpenWar exists.

This is the MVP cut. Plain-text streaming and non-streaming chat completions work end-to-end against any upstream adapter. Tool round-trip and PermissionBridge negotiation land in v0.13.1.

What ships

openwar serve --openai-compat

A new CLI subcommand. Hand-rolled node:http server with a hand-rolled SSE encoder. Zero new dependencies.
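
A hand-rolled SSE encoder for this endpoint can be very small. The sketch below is not OpenWar's source: the helper names (`encodeSseFrame`, `encodeTextStream`) and the trimmed-down `ChatChunk` type are hypothetical, and only the `data: <json>` framing with a final `data: [DONE]` sentinel follows the OpenAI streaming convention.

```typescript
// Hypothetical helper names; only the wire framing mirrors OpenAI-style streaming.
type ChatChunk = {
  id: string;
  object: "chat.completion.chunk";
  choices: { index: number; delta: { content?: string }; finish_reason: string | null }[];
};

/** Encode one chunk as a Server-Sent Events frame: "data: <json>" plus a blank line. */
function encodeSseFrame(chunk: ChatChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

/** Sentinel frame that OpenAI-style clients treat as end of stream. */
const SSE_DONE = "data: [DONE]\n\n";

/** Turn a list of text pieces into a complete SSE body: deltas, a stop chunk, then [DONE]. */
function encodeTextStream(id: string, pieces: string[]): string {
  const frames = pieces.map((content) =>
    encodeSseFrame({
      id,
      object: "chat.completion.chunk",
      choices: [{ index: 0, delta: { content }, finish_reason: null }],
    }),
  );
  frames.push(
    encodeSseFrame({
      id,
      object: "chat.completion.chunk",
      choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
    }),
  );
  frames.push(SSE_DONE);
  return frames.join("");
}
```

In a real node:http handler each frame would be written to the response as it is produced rather than joined into one string.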

Endpoints:

  • POST /v1/chat/completions: streaming SSE and non-streaming JSON.
  • GET /v1/models: returns the configured upstream as a single model entry.
  • GET /healthz: liveness probe.
  • All other paths return an OpenAI-shaped 404.
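
The routing table above is small enough to express as a pure function. This is an illustrative sketch only: `matchRoute`, `openAiError`, and `notFound` are hypothetical names, and the `invalid_request_error` type and message text used for the 404 are assumptions. The `{ error: { message, type, param, code } }` envelope is the shape OpenAI clients expect.

```typescript
// Hypothetical names; not taken from OpenWar's source.
type Route = "chat_completions" | "models" | "healthz";

type OpenAiErrorBody = {
  error: { message: string; type: string; param: string | null; code: string | null };
};

/** Map a method and path onto one of the three documented endpoints, or null. */
function matchRoute(method: string, path: string): Route | null {
  if (method === "POST" && path === "/v1/chat/completions") return "chat_completions";
  if (method === "GET" && path === "/v1/models") return "models";
  if (method === "GET" && path === "/healthz") return "healthz";
  return null;
}

/** Build an error payload in the envelope OpenAI clients already know how to parse. */
function openAiError(
  status: number,
  message: string,
  type: string,
  code: string | null = null,
): { status: number; body: OpenAiErrorBody } {
  return { status, body: { error: { message, type, param: null, code } } };
}

/** Assumed wording and error type for unknown paths. */
function notFound(method: string, path: string): { status: number; body: OpenAiErrorBody } {
  return openAiError(404, `Unknown request URL: ${method} ${path}`, "invalid_request_error");
}
```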

Security defaults

  • Localhost-default bind (127.0.0.1). Binding to 0.0.0.0 requires explicit intent and warns at startup.
  • Bearer-token auth with constant-time comparison. OpenAI-shaped 401 on failure. --no-auth exists for local dev and warns on every startup.
  • Conservative authorized_costs default (filesystem_read only). The startup banner explains the expansion pattern for agentic clients.
  • --max-concurrent gate (default 4). Excess requests get an OpenAI-shaped rate_limit_error 429.
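
For readers unfamiliar with constant-time comparison, here is one common way to do it with Node's standard library. This is a sketch, not OpenWar's implementation: the function names are hypothetical, and hashing both sides first is a design choice made here so that `timingSafeEqual` (which throws on inputs of different length) always receives equal-length buffers.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

/** Compare two secrets without leaking where they first differ. */
function tokensMatch(presented: string, expected: string): boolean {
  // SHA-256 digests are always 32 bytes, so timingSafeEqual never throws on length.
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

/** Extract the token from an "Authorization: Bearer <token>" header value. */
function bearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer (.+)$/.exec(header);
  return match?.[1] ?? null;
}

/** True only when a bearer token is present and matches the configured one. */
function isAuthorized(header: string | undefined, expected: string): boolean {
  const token = bearerToken(header);
  return token !== null && tokensMatch(token, expected);
}
```

A plain `===` on the raw strings can return early at the first mismatching character, which is the timing signal this pattern avoids.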

Observability

  • Every proxied request writes a trace at ~/.openwar/sessions/proxy-<uuid>.trace.ndjson.
  • Every response carries an X-OpenWar-Trace-Id header for correlation. Standard OpenAI clients ignore it; OpenWar-aware tooling can read it and run openwar inspect.
  • Two new trace event types: proxy_request and proxy_response. TRACE_SCHEMA_VERSION bumps from 4 to 5. Additive.
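
Because the trace is newline-delimited JSON, tooling can pick out the two proxy events with a few lines of parsing. The sketch below assumes each line is an object with a `type` discriminator; that field name, the helper names, and the sample fields in the usage note are assumptions, not the documented trace schema.

```typescript
// Assumes a `type` field on every trace line; the real schema may differ.
type TraceEvent = { type: string; [key: string]: unknown };

/** Parse NDJSON text into events, skipping blank lines. */
function parseNdjson(text: string): TraceEvent[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as TraceEvent);
}

/** Keep only the proxy events; unknown event types are ignored, as an additive schema expects. */
function proxyEvents(events: TraceEvent[]): TraceEvent[] {
  return events.filter((e) => e.type === "proxy_request" || e.type === "proxy_response");
}
```

Feeding this the contents of a `proxy-<uuid>.trace.ndjson` file would return only the request and response records, regardless of what other event types later schema versions add.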

Phase machine in proxy mode

Each request synthesizes an in-memory brief (mode: auto, scope_locked: true). Phase 0 and Phase 4 run without operator prompts (the proxy cannot block on stdin). Phase 3 still fires; v0.13.0 ships the denial path (a refusal-text completion with finish_reason: content_filter when a destructive action is blocked). Full PermissionBridge negotiation lands in v0.13.1.
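
From the client's side, the denial path looks like an ordinary chat completion whose `finish_reason` is `content_filter`. The following sketch shows what such a non-streaming payload could look like; `denialCompletion` is a hypothetical helper and the refusal wording is invented for illustration.

```typescript
type ChatCompletion = {
  id: string;
  object: "chat.completion";
  created: number; // Unix seconds
  model: string;
  choices: {
    index: number;
    message: { role: "assistant"; content: string };
    finish_reason: "stop" | "length" | "content_filter";
  }[];
};

/** Hypothetical helper: wrap a refusal in a standard chat.completion envelope. */
function denialCompletion(
  id: string,
  model: string,
  blockedAction: string,
  nowMs: number = Date.now(),
): ChatCompletion {
  return {
    id,
    object: "chat.completion",
    created: Math.floor(nowMs / 1000),
    model,
    choices: [
      {
        index: 0,
        message: {
          role: "assistant",
          // Illustrative wording only.
          content: `Blocked destructive action: ${blockedAction}`,
        },
        finish_reason: "content_filter",
      },
    ],
  };
}
```

Clients that already handle `content_filter` need no special casing to surface the refusal.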

Upstream composition

The proxy routes the actual completion to any configured upstream adapter: Anthropic, OpenAI, Gemini, Grok, openai-compat, or cli-bridge. When the upstream is cli-bridge, the proxy warns about per-request CLI cold-start cost and recommends --max-concurrent 1.

Stats

  • 909 tests (up from 849). 60 new across tests/serve/.
  • Coverage gates green.
  • Zero new runtime dependencies. node:http, hand-rolled SSE, node:crypto.randomUUID.

Deferred to v0.13.1

  • Tool-call round-trip (request tools to response tool_calls). v0.13.0 records the tool count in the trace but does not dispatch tools.
  • PermissionBridge negotiation via openwar:request_permission tool calls.
  • Comprehensive Continue / Cline / Cursor integration examples.

Upgrade notes

Drop-in. npm update @pythonluvr/openwar from v0.12.1 picks this up automatically. No existing CLI command, library export, or trace event shape changes. The openwar serve subcommand and the two new trace events are additive.

To use the proxy: run openwar serve --openai-compat --auth-token <token>, then point your OpenAI-API tool at http://127.0.0.1:1234/v1 with the token as its API key. Full setup, the security model, and the supported request surface are documented in docs/openai-proxy.md.
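
For a homegrown wrapper, the request is a plain OpenAI-style POST. A minimal sketch, assuming the base URL shown above: `buildChatRequest` is a hypothetical helper, and the model name in the usage comment is a placeholder rather than a value OpenWar defines.

```typescript
type ChatRequest = {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
};

/** Hypothetical helper: build a non-streaming chat completion request for the proxy. */
function buildChatRequest(baseUrl: string, token: string, model: string, prompt: string): ChatRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`, // the --auth-token value doubles as the API key
      },
      body: JSON.stringify({
        model,
        stream: false,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (placeholder model name; GET /v1/models reports the configured upstream):
//   const { url, init } = buildChatRequest("http://127.0.0.1:1234/v1", token, "placeholder-model", "hello");
//   const res = await fetch(url, init);
//   const traceId = res.headers.get("X-OpenWar-Trace-Id");
```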

v0.12.1

20 May 13:58

The runtime now sees what the bridged CLI sees.

v0.12.1 wires up Squire 1.1.0's structured event surface. The runtime captures every tool the bridged Claude Code or Gemini CLI invokes, the arguments, the result, the thinking-mode tokens, and the usage report. What used to disappear into a stdout text blob now flows through OpenWar's trace, inspect, and cost ledger.

What ships

Four new StreamEvent variants (additive, backwards compatible)

  • bridged_tool_call: a tool fired inside the bridged CLI's own run. Carries tool_name, arguments, call_id, and the bridged binary.
  • bridged_tool_result: the result of a bridged tool call. Carries call_id, result, is_error, and the bridged binary.
  • bridged_thinking_delta: thinking-mode tokens emitted by the bridged CLI. Kept distinct from text_delta so observers can filter them.
  • bridged_usage: token counts from the bridged CLI. Input, output, cache read, cache write fields.

The bridged_ prefix is load-bearing. It distinguishes events that came from inside a bridged CLI's run from events OpenWar's own runtime fired.
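
A consumer can lean on that prefix to separate the two sources. The types below are an approximation built from the field lists above: the `binary`, `text`, and usage field names are assumptions, and `NativeEvent` is reduced to a single variant for brevity.

```typescript
// Approximate shapes; fields marked "assumed" are not confirmed by the release notes.
type BridgedEvent =
  | { type: "bridged_tool_call"; tool_name: string; arguments: unknown; call_id: string; binary: string /* assumed name */ }
  | { type: "bridged_tool_result"; call_id: string; result: unknown; is_error: boolean; binary: string /* assumed name */ }
  | { type: "bridged_thinking_delta"; text: string /* assumed name */ }
  | { type: "bridged_usage"; input: number; output: number; cache_read: number; cache_write: number /* assumed names */ };

type NativeEvent = { type: "text_delta"; text: string };
type StreamEvent = BridgedEvent | NativeEvent;

/** Type guard keyed on the load-bearing prefix. */
function isBridged(e: StreamEvent): e is BridgedEvent {
  return e.type.startsWith("bridged_");
}

/** Concatenate user-visible text, leaving thinking-mode tokens out. */
function visibleText(events: StreamEvent[]): string {
  return events
    .filter((e): e is NativeEvent => e.type === "text_delta")
    .map((e) => e.text)
    .join("");
}
```

Because the new variants are additive, a consumer that only switches on `text_delta` keeps working and simply never sees the bridged events.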

Four new trace event types

Same four shapes flow into trace.ndjson. TRACE_SCHEMA_VERSION bumps from 3 to 4. Additive; old readers ignore unknown event types.

Shared bridged-event router

New convergence point at src/runtime/bridged-events.ts. Single-agent (execute.ts) and multi-agent (driver.ts) dispatch paths route bridged events identically: trace event always, cost-ledger feed when a ledger exists. Closes the single-agent observability gap (sessions without a coordinator still get the trace).

Cost tracker integration

addBridgedUsage(usage, u) helper feeds bridged-CLI tokens into the cost ledger. Input + output count toward tokens_used for budget arithmetic. Cache reads and writes are recorded on dedicated bridged_tokens_cache_read / bridged_tokens_cache_write counters but excluded from tokens_used. Cache reads bill at a fraction of normal input rate; inflating the budget total with them would trip --max-tokens gates prematurely.
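
The budget arithmetic can be illustrated with a small pure function. This is a sketch of the rule described above, not the real helper: `applyBridgedUsage`, `overBudget`, and the `BridgedUsage` field names are hypothetical, while the three ledger counter names come from the paragraph above.

```typescript
// Usage field names are assumed; ledger counter names follow the release notes.
type BridgedUsage = { input: number; output: number; cache_read: number; cache_write: number };

type CostLedger = {
  tokens_used: number;
  bridged_tokens_cache_read: number;
  bridged_tokens_cache_write: number;
};

/** Return a new ledger with one bridged usage report folded in. */
function applyBridgedUsage(ledger: CostLedger, u: BridgedUsage): CostLedger {
  return {
    // Only input + output count toward the budget total.
    tokens_used: ledger.tokens_used + u.input + u.output,
    // Cache traffic is tracked separately and never added to tokens_used.
    bridged_tokens_cache_read: ledger.bridged_tokens_cache_read + u.cache_read,
    bridged_tokens_cache_write: ledger.bridged_tokens_cache_write + u.cache_write,
  };
}

/** Hypothetical budget check in the spirit of a --max-tokens gate. */
function overBudget(ledger: CostLedger, maxTokens: number): boolean {
  return ledger.tokens_used > maxTokens;
}
```

With 100 input, 50 output, and 10,000 cache-read tokens, `tokens_used` rises by 150, so a 1,000-token budget is not tripped by the cache traffic.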

openwar inspect --tools grouping

Output now groups by source:

Native tool calls (OpenWar runtime)
  ...

Bridged CLI tool calls (from inside the bridged CLI's run)
  ...

Native first when both are present. Chronological within each section.

Test fixtures

Snapshot copy of Squire 1.1.0's real Claude Code and Gemini CLI fixtures at tests/fixtures/squire-snapshot/. README documents Squire 1.1.0 as the version snapshot point; future Squire releases that touch structured-event shapes will require a re-sync.

Stats

  • 849 tests (up from 819). 30 new across tests/adapters/, tests/state/, tests/coordinator/, tests/cli/.
  • Coverage gates: every tracked dir above 85% line coverage. src/state/trace.ts at 100%.
  • Zero new runtime dependencies. Squire stays pinned at ^1.1.0.

Upgrade notes

Drop-in. npm update @pythonluvr/openwar from v0.12.0 picks this up automatically. Existing StreamEvent consumers continue to work unchanged; the four new variants are additive. Existing cost-tracker behavior is unchanged for native-adapter runs. The discipline layer does not change. The observability layer does.

Operators running bridged Claude Code or Gemini CLI sessions before this release saw stdout text in their traces; after upgrading, they see structured tool calls. No code change required on the operator side.

v0.12.0

20 May 12:21

PermissionBridge turns Phase 3 into a conversation.

v0.12 makes the destructive-action gate articulate. Bridged CLIs (and any tool-calling agent) can now call request_permission before a destructive action with a structured payload (action, scope, reasoning, fallback). The operator answers at one of three scopes: this_call covers one upcoming destructive tool call, this_session lasts until the run ends, persistent survives across sessions via a per-project JSONL store. Phase 3 honors matching grants on the subsequent destructive call without re-prompting, emits permission_grant_consumed for the audit trail, and still fires when no grant matches.

PermissionBridge does not relax the gate. It makes the gate articulate.

What ships

New native tool: request_permission

Exposed via MCP-server-mode as openwar:request_permission. Default-allowed (requesting permission is itself never destructive). Input takes action, scope (this_call / this_session / persistent), reasoning, and optional fallback and category. Output returns granted, scope_granted, operator_note, and grant_id.

Grant ledger

Per-session GrantLedger lives on the active Session. Persistent grants serialize to ~/.openwar/projects/<slug>/permission_grants.jsonl (append-only). Persistent grants are seeded into the in-memory ledger at session start for the active project_slug.

Phase 3 integration

Phase 3's destructive-detector decision path checks the grant ledger before firing the operator prompt. Matching grants emit permission_grant_consumed and a synthesized auth-allow event, skipping the prompt. Non-matching destructive calls fire the existing Phase 3 halt path. Category-only matching; the most recent unconsumed this_call grant wins over this_session / persistent.
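
The matching rule can be sketched as a lookup over an in-memory ledger. Everything here is illustrative: the `Grant` fields other than `grant_id` and the scope names are assumptions, and falling back to the newest session or persistent grant is a choice made for this sketch, since the notes only specify that a recent unconsumed this_call grant takes priority.

```typescript
type GrantScope = "this_call" | "this_session" | "persistent";

// Assumed shape; only grant_id and the scope names appear in the release notes.
type Grant = {
  grant_id: string;
  category: string;
  scope: GrantScope;
  granted_at: number;
  consumed: boolean;
  revoked: boolean;
};

/** Category-only match: newest unconsumed this_call grant first, then longer-lived grants. */
function findMatchingGrant(ledger: readonly Grant[], category: string): Grant | null {
  const live = ledger.filter((g) => g.category === category && !g.revoked);
  const byNewest = (a: Grant, b: Grant) => b.granted_at - a.granted_at;
  const oneShot = live.filter((g) => g.scope === "this_call" && !g.consumed).sort(byNewest);
  if (oneShot[0]) return oneShot[0];
  const longLived = live.filter((g) => g.scope !== "this_call").sort(byNewest);
  return longLived[0] ?? null;
}

/** A this_call grant is spent after one use; session and persistent grants stay live. */
function consumeGrant(ledger: readonly Grant[], grantId: string): Grant[] {
  return ledger.map((g) =>
    g.grant_id === grantId && g.scope === "this_call" ? { ...g, consumed: true } : g,
  );
}
```

A `null` result corresponds to the unchanged behavior: no grant matches, so the existing Phase 3 halt path fires.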

Chat REPL prompt

Permission request from agent:
  ACTION    Delete the file src/legacy.ts
  REASON    File is unreferenced; cleaning up before the refactor.
  FALLBACK  Skip the cleanup; refactor proceeds with the file present.
  CATEGORY  filesystem_write
  REQUESTED SCOPE  this_call

Approve at what scope?
  y         grant at requested scope (this_call)
  s         grant for the rest of this session
  p         grant persistently (saved to project memory)
  n         deny
  n: <msg>  deny with a note for the agent
>

Slash commands: /grants lists active grants, /revoke <grant_id> invalidates a grant mid-session.
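
The approval prompt above maps a handful of replies onto the tool's output fields. A sketch of that mapping follows; `parseReply` is a hypothetical function, and returning `null` for unrecognized input (so the caller can re-prompt) is an assumption about behavior the notes do not describe.

```typescript
type GrantScope = "this_call" | "this_session" | "persistent";

type Decision =
  | { granted: true; scope_granted: GrantScope }
  | { granted: false; operator_note?: string };

/** Hypothetical parser for the y / s / p / n / "n: <msg>" replies shown in the prompt. */
function parseReply(input: string, requested: GrantScope): Decision | null {
  const text = input.trim();
  if (text === "y") return { granted: true, scope_granted: requested };
  if (text === "s") return { granted: true, scope_granted: "this_session" };
  if (text === "p") return { granted: true, scope_granted: "persistent" };
  if (text === "n") return { granted: false };
  const note = /^n:\s*(.+)$/.exec(text)?.[1];
  if (note !== undefined) return { granted: false, operator_note: note };
  return null; // unrecognized reply; assumed to trigger a re-prompt
}
```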

Public API additions (all additive, backwards compatible)

  • Session.listActiveGrants(): readonly Grant[]
  • Session.revokeGrant(grant_id: string): boolean
  • SandboxContext.io?, SandboxContext.grantLedger?, SandboxContext.tracer? (optional fields; existing tools ignore)
  • Five new trace event variants: permission_requested, permission_granted, permission_denied, permission_grant_consumed, permission_revoked
  • TRACE_SCHEMA_VERSION bumped 2 → 3 (additive; old readers ignore unknown event types)
  • openwar inspect <brief_id> --permissions shows a per-grant audit row
  • New library export: formatPermissions

Stats

  • 819 tests (up from 769). 50 new tests across tests/tools/request-permission.test.ts, tests/runner/grants.test.ts, tests/runner/permission-grant-consumption.test.ts, tests/cli/chat-permission-prompt.test.ts, tests/state/permission-trace-events.test.ts, tests/cli/inspect-permissions.test.ts.
  • Coverage gates: every tracked dir above 85% line coverage. src/state/trace.ts and src/state/heuristics.ts at 100%.
  • Zero new runtime dependencies. AbortController, node:crypto.randomUUID, and the existing MCP server infrastructure cover everything.

Upgrade notes

Drop-in. npm update @pythonluvr/openwar from v0.11.2 picks this up automatically. Existing programs see no behavior change; the Session interface gains additive methods, SandboxContext gains optional fields, and the new native tool is opt-in. Phase 3 with no active grants behaves exactly as in v0.11.2. Existing trace consumers continue to work; new event types are additive.

Persistent grants live until explicitly revoked. No TTL, no expiration. Operators auditing past behavior can run openwar inspect <brief_id> --permissions for a full grant history per session.

v0.11.2

19 May 07:18

Patch release. Restores TypeScript build compatibility with the just-shipped Squire v1.1.0.

What broke

Squire v1.1.0 added four new SquireEvent variants (tool_call, tool_result, thinking_delta, usage) for its new vendor-aware adapters. OpenWar's cli-bridge.ts does exhaustive narrowing on SquireEvent.type via a never assignment. The new variants caused a compile-time error for anyone building OpenWar from source once npm install resolved ^1.0.0 to 1.1.0. End users running the precompiled dist/ were unaffected; contributors, forks, and fresh CI clones were not.

What v0.11.2 does

Explicit no-op case arms for the four new Squire variants in the cli-bridge.ts event translator. The exhaustive never check is preserved so any future additive Squire release surfaces the same way (compile-time call to add an arm). The Squire dep range moves from ^1.0.0 to ^1.1.0 so the lockfile pins against the version that introduced the variants.
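
The pattern in question is the standard TypeScript exhaustiveness check. The sketch below reproduces it with a stand-in union: the variant names come from these notes, but the payload fields and the `translate` function are hypothetical and do not mirror cli-bridge.ts.

```typescript
// Stand-in union; variant names are from the release notes, payload fields are assumed.
type SquireEvent =
  | { type: "stdout"; data: string }
  | { type: "stderr"; data: string }
  | { type: "text_delta"; text: string }
  | { type: "message_start" }
  | { type: "message_stop" }
  | { type: "error"; message: string }
  | { type: "tool_call" }
  | { type: "tool_result" }
  | { type: "thinking_delta" }
  | { type: "usage" };

/** Return forwarded text for an event, or null when the event is deliberately ignored. */
function translate(e: SquireEvent): string | null {
  switch (e.type) {
    case "text_delta":
      return e.text;
    case "stdout":
    case "stderr":
    case "message_start":
    case "message_stop":
    case "error":
      return null;
    // Explicit no-op arms: acknowledged, not yet translated.
    case "tool_call":
    case "tool_result":
    case "thinking_delta":
    case "usage":
      return null;
    default: {
      // If the union gains a variant, `e` is no longer `never` and this line fails to compile.
      const unreachable: never = e;
      throw new Error(`Unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Deleting the four no-op arms reproduces the class of compile error described under "What broke".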

OpenWar does not yet translate the new structured events (tool_call, tool_result) into its own StreamEvent surface. Adoption of the richer tool-call surface is a separate decision tracked for v0.12 or later.

Stats

  • 769 tests green against @pythonluvr/squire@1.1.0.
  • Cross-platform CI matrix (Ubuntu / macOS / Windows × Node 20 / 22): green.
  • Sanity-regex gate, em-dash gate, coverage gates: green.

Upgrade notes

Drop-in. npm update @pythonluvr/openwar from v0.11.1 picks this up automatically and resolves Squire to 1.1.0. No code changes required on the consumer side.

v0.11.1

18 May 17:34

Talk to your agent. The runtime keeps the phases honest, the destructives gated, and the trace intact.

Two honest gaps closed without expanding scope, plus the published-Squire adoption that unblocks Squire's first real-world consumer claim on npm.

Mid-tool-call cancellation

v0.11.1 makes the runtime survive a slow tool call. Ctrl-C in the chat REPL aborts the in-flight tool, surfaces a structured cancelled tool-result to the model, and writes a tool_cancelled event to the trace. A second Ctrl-C within 2 seconds exits cleanly with the chat log saved. shell_exec gets a 3-second SIGTERM grace before SIGKILL. apply_patch rolls back already-written files. MCP servers that ignore the abort get 5 seconds before the runtime synthesizes the cancellation locally. The phase machine continues; the agent decides whether to retry or switch approach. Zero new runtime deps; everything is AbortController / AbortSignal from Node stdlib.
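
The core of this behavior is racing a tool's work against an `AbortSignal`. The sketch below shows that idea in isolation. `runCancellable`, `ToolResult`, and `isSecondInterrupt` are hypothetical names, and the SIGTERM/SIGKILL escalation, patch rollback, and MCP timeout are not modeled here.

```typescript
type ToolResult =
  | { status: "ok"; output: string }
  | { status: "cancelled"; reason: string }
  | { status: "error"; message: string };

/** Run a tool, resolving with a structured "cancelled" result if the signal aborts first. */
function runCancellable(
  work: (signal: AbortSignal) => Promise<string>,
  signal: AbortSignal,
): Promise<ToolResult> {
  if (signal.aborted) {
    return Promise.resolve({ status: "cancelled", reason: "aborted before start" });
  }
  return new Promise<ToolResult>((resolve) => {
    const onAbort = () => resolve({ status: "cancelled", reason: "aborted by operator" });
    signal.addEventListener("abort", onAbort, { once: true });
    work(signal).then(
      (output) => {
        signal.removeEventListener("abort", onAbort);
        resolve({ status: "ok", output });
      },
      (err: unknown) => {
        signal.removeEventListener("abort", onAbort);
        resolve({ status: "error", message: String(err) });
      },
    );
  });
}

/** True when a second interrupt lands inside the exit window (2 seconds in the notes). */
function isSecondInterrupt(previousMs: number | null, nowMs: number, windowMs = 2000): boolean {
  return previousMs !== null && nowMs - previousMs <= windowMs;
}
```

Resolving with a `cancelled` result instead of rejecting is what lets the caller hand the model a structured tool-result and keep the phase machine running.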

README hero rewrite

The README now leads with openwar chat and the conversational front door, ordered before the discipline-not-intelligence framing. The "Just talk to it" section is renamed "Start with a conversation" and shows a one-input / one-output sample turn. WarBit chaos imagery moves out of the hero into the contrast section below.

Published Squire adoption

@pythonluvr/squire@1.0.0 is now consumed from the npm registry instead of a local workspace link. No source change required (the import path was already correct); the lockfile resolves from the registry. OpenWar is officially Squire's first real-world consumer on npm.

Public API additions (all additive, backwards compatible)

  • RunOptions.signal?: AbortSignal
  • RunOptions.onSession?: (session: Session) => void
  • Session.cancelCurrentToolCall(): Promise<boolean>
  • SandboxContext.signal?: AbortSignal for custom-tool authors
  • tool_cancelled trace event variant
  • TRACE_SCHEMA_VERSION bumped from 1 to 2 (old readers ignore unknown event types)

Stats

  • 32 new tests across tests/tools/cancellation.test.ts, tests/runner/cancel.test.ts, tests/cli/chat-cancel.test.ts, tests/state/tool-cancelled-event.test.ts. Total: 769 (up from 737).
  • Coverage gates: every tracked dir above 85% line coverage. src/state/trace.ts and src/state/heuristics.ts at 100%.
  • Sanity-regex gate, em-dash gate, cross-platform CI matrix: green.

Upgrade notes

Drop-in. npm update @pythonluvr/openwar picks this up from v0.11.0 automatically. No code changes required to opt out of cancellation; the new RunOptions fields are optional. The trace schema bump is additive: existing trace consumers continue to work; new ones can read the tool_cancelled event directly.

v0.11.0 — cli-bridge powered by @pythonluvr/squire

18 May 14:55

cli-bridge becomes a thin wrapper over @pythonluvr/squire, a new standalone npm package extracted from this codebase. Public behavior is unchanged: openwar run brief.md produces identical traces, identical phase events, identical tool-call shapes before and after the split. The architectural change is for discoverability. Squire ships its own README, its own front door, and lets developers searching "run Claude Code from Node.js" or "orchestrate multiple CLI agents" land on a focused tool with a clean API instead of a buried module path inside OpenWar.

Originally scoped as one ship covering structured event streaming for Claude Code / Codex / Gemini CLI. Phase 0 review caught that the brief's claimed SquireEvent surface (tool_call, tool_result, message_start/stop) does not match the code that exists today (text-stream only, no per-CLI parsers). Shipping the full surface would have required building three new vendor JSON-stream parsers from scratch and snapshot fixtures from real CLI runs. Split into v0.11.0 (foundations + honest event union) and v0.11.1 (per-CLI parsers and the richer event union) on the same pattern as v0.7 / v0.9 / v0.10 splits. Four for four.

Added (Squire side, separate package)

  • @pythonluvr/squire v1.0.0. General-purpose runtime for spawning CLI AI agents (Claude Code, Codex, Gemini CLI) as subprocesses. MIT-licensed. Public API frozen at v1.0.0; additive changes only on the v1.x line. Zero runtime dependencies (Node stdlib only). Cross-platform (Windows / macOS / Linux). Bundled TypeScript types.
  • Squire class with start(prompt) / send(followup) / stop({graceful}) lifecycle plus stdout / stderr / event / exit events.
  • SquireEvent discriminated union: stdout, stderr, text_delta, message_start, message_stop, error. Honest about v1.0 scope; per-CLI adapters with tool_call / tool_result events arrive in v1.x.
  • MCP forwarding: pass mcp.servers (inline) or mcp.configPath (pre-built) and Squire wires --mcp-config <path> (or a configurable flag) for the child CLI. Temp config files are cleaned up on stop.
  • Claude Code permission auto-setup: autoSetup.claudeCode merges allowedTools patterns into ~/.claude/settings.json atomically while preserving everything else. Idempotent.
  • SquireAdapter interface exported as v1.0 contract for custom per-CLI parsers. registerSquireAdapter(adapter) for process-global registration.
  • Cross-platform spawn: Windows .cmd / .bat / extensionless-binary handling baked in. needsShell exported for callers that want to share the auto-detection logic.
  • Typed error surface (SquireError, SquireAutoSetupError) with stable code strings.

Added (OpenWar side)

  • @pythonluvr/squire as a runtime dependency. During local dev this is file:../squire; the publisher swaps to ^1.0.0 before npm publish.
  • docs/adapters.md "Powered by" section linking to the Squire repo and explaining the dependency split is purely architectural.
  • README "Powered by" footer linking to Squire as the underlying CLI-agent runtime.

Changed

  • src/adapters/cli-bridge.ts rewritten as a thin wrapper over Squire. Down from 330 lines of subprocess plumbing to ~210 lines of translation + serialization. The Windows quirks, timeout machinery, abort-signal handling, EPIPE recovery, and stderr buffering all move into Squire. addExtraArgs is preserved for the v0.7 MCP wiring.
  • OpenWar's StreamEvent shape at the public surface is unchanged. Existing cli-bridge tests that read events at OpenWar's boundary remain green.

Design notes (Phase 0 deviations approved)

  • Split into v0.11.0 + v0.11.1. Spawn + MCP forwarding + auto-setup ship as Squire v1.0.0 with an honest event union now; per-CLI vendor JSON-stream parsers (Claude Code, Codex, Gemini CLI) and the richer SquireEvent variants land in v1.1.0 as a minor additive bump once real CLI snapshot fixtures are captured.
  • Decoupling is enforced by import discipline. Squire's src/ has zero import statements referencing OpenWar. The standalone positioning fails if Squire's API can't be used without OpenWar's types; this gate makes the failure visible.
  • OpenWar's MCP server is NOT moving to Squire. Squire knows how to pass --mcp-config <path> to a child; building the config file (with project-memory tools, brief-aware authorization, registry-specific format quirks for Gemini's .gemini/settings.json) stays in OpenWar. That logic is genuinely OpenWar-specific.

Test count

OpenWar: no net change (existing cli-bridge tests stay green against the wrapper). Squire: ~40 new (spawn, events, adapters, MCP, autosetup, lifecycle). Target ~60 lands with v1.1.0 per-CLI adapter tests.

Publisher queue

Squire must publish first (OpenWar depends on it). Order:

  1. github.com/PythonLuvr/squire: push, tag v1.0.0, GitHub release, npm publish via WebAuthn.
  2. Swap OpenWar's package.json dep from file:../squire to ^1.0.0.
  3. github.com/PythonLuvr/openwar: push, tag v0.11.0, GitHub release, npm publish via WebAuthn.

v0.10.0 — openwar chat (the front door)

18 May 12:53

openwar chat: the front door. (Patch applied post-initial-commit to close three corners caught during honest self-review: Windows readline handling, adversarial agent-drift coverage at the session level, and full emission of all three chat trace events. See "Post-commit fixes" section below.) The runtime is the same; the entry point is new. A non-developer describes what they want in plain English, OpenWar asks clarifying questions if needed, proposes a plan, gets approval, executes through the existing phase machine, and surfaces destructive prompts as plain English questions instead of y/n flags. The audit trail underneath (trace.ndjson, phase events, detector log, learned profile) all still exists. Power users keep writing briefs by hand. Everyone else just talks.

Originally scoped as one ship. Split during Phase 0 review into v0.10.0 (functional chat layer) and v0.10.1 (positioning + UX refinements based on real adoption signal) on the same pattern as the v0.7 reorder and v0.9 split. Three for three on this pattern.

Added

  • openwar chat subcommand. Interactive readline session. Default conversation-agent adapter precedence: ANTHROPIC_API_KEY > OPENAI_API_KEY > GEMINI_API_KEY (or GOOGLE_API_KEY) > XAI_API_KEY > OPENAI_COMPAT_API_KEY. Hard error with install hint if none are set; cli-bridge stays fully supported for hand-authored briefs as the BYOK-free escape hatch. Flags: --resume <id|last>, --adapter, --model, --exec-adapter, --exec-binary, --project, --no-save.
  • Per-role adapter split for chat sessions. Conversation-agent adapter (must support tool calls) is separate from execution adapter. Default: same. Override execution to cli-bridge via --exec-adapter cli-bridge --exec-binary claude for free local execution on an existing Claude Code subscription while keeping intent extraction deterministic via a BYOK key.
  • Structured tool-call intent contract (src/chat/intent.ts). Conversation agent declares intent via four tool calls (ask_clarification, propose_plan, start_execution, summarize_result), not free text. Adversarial fixtures in tests/chat/intent.test.ts pin every failure mode (no_tool_call, multiple_tool_calls, unknown_tool, invalid_args, fabricated_approval). Drift counter falls back to a deterministic user question after 3 failed turns; hard-fails the session after 5 with a save-and-resume pointer.
  • Conservative-authorization compiler (src/chat/compile.ts). LOAD-BEARING INVARIANT: destructive categories (filesystem_delete, shell_exec, http_fetch, paid_api_call, git_write, git_push, deploy, external_message) are NEVER auto-granted. They route through Phase 3 at execution time so the user sees them as natural-language confirms instead of silent grants in a skimmed plan. SAFE_AUTOGRANT is {filesystem_read, filesystem_write} only; expanding it is a P0 regression. Adversarial fixtures pin each destructive category individually.
  • Plain-English plan presenter (src/chat/plan.ts). Three sections: Plan (bulleted), Authorized (plain-language descriptions of each cost category), Not authorized (explicit list with consequence sentences and a reassuring "I'll ask you in plain English first" note).
  • Phase event renderer (src/chat/render.ts) + destructive phrase templates (src/chat/destructive-phrases.ts). Translates runtime trace events to chat output. git push becomes "publish this change to your repository; that will push your local commit to the remote." Tool-call debouncing avoids spamming on rapid sequential calls. Per-subtype templates cover every destructive subtype the runtime emits; missing-template would fail CI.
  • Chat session manager (src/chat/session.ts). Orchestrates the full clarify -> propose -> approve -> execute -> summarize -> save loop. Threads detector sensitivity (from learned profile) through to the runtime. Routes destructive prompts to the user and back to the runtime gate. Save-brief writes a v0.x-compatible YAML-frontmatter markdown file with the source conversation as a blockquote.
  • Chat store (src/state/chat-store.ts). NDJSON append-only at ~/.openwar/chats/<chat_id>.ndjson. First line is a chat_session_started header with schema_version: 1. Mismatch on resume raises a typed error with remediation. Same shape contract as v0.8 trace files.
  • Slash commands (src/chat/commands.ts). /help, /save, /inspect, /history, /resume, /abort, /quit. Path-vs-command heuristic: /index.html is treated as text, not as an unknown command, so users can naturally reference paths mid-conversation.
  • Project memory + learned profile integration (src/chat/context.ts). At session start, loads recent project memory entries and the learned profile (if any) and surfaces them to the conversation agent. When a learned profile exists, the chat session stamps learned_profile: <slug> into the compiled brief's frontmatter so the runtime applies the profile at execution time (per v0.9.1 contract).
  • Three new trace event types: chat_session_compiled (emitted into the brief's trace at session start when a run is chat-originated, via the new RunOptions.chatId field), chat_session_resumed and chat_brief_saved (defined for forward-compat; primary persistence is the chat-store NDJSON).
  • 133 new tests across tests/chat/*.test.ts, tests/cli/chat-cli.test.ts, and the headline tests/integration/chat-full-cycle.test.ts (full chat -> propose -> approve -> execute -> save -> replay loop). Total 713 (was 580 at v0.9.1). Right in the brief's 700-720 target range.
  • docs/chat.md (new). Operator guide. Walkthrough, flag surface, conservative-auth invariant, slash commands, saved-brief replay semantics, project memory + learned profile integration, intent contract.
  • README "New in v0.10" section. The chat path is added to the quickstart without touching the existing hero pitch.
  • Library exports for integrators: intent contract types, compiler, plan presenter, renderer, session manager, chat-store reader/writer.
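
The conservative-authorization invariant described in the list above reduces to an allowlist split. A minimal sketch, assuming a compiler that receives the categories a plan would touch: `compileAuthorization` and its return shape are hypothetical, and treating unknown categories as not auto-granted is a choice made for this sketch. `SAFE_AUTOGRANT` and the category names are taken from the notes.

```typescript
// Allowlist from the release notes; anything outside it is never auto-granted.
const SAFE_AUTOGRANT: ReadonlySet<string> = new Set(["filesystem_read", "filesystem_write"]);

// Destructive categories listed in the release notes.
const DESTRUCTIVE_CATEGORIES = [
  "filesystem_delete",
  "shell_exec",
  "http_fetch",
  "paid_api_call",
  "git_write",
  "git_push",
  "deploy",
  "external_message",
] as const;

type CompiledAuthorization = { authorized: string[]; deferredToPhase3: string[] };

/** Hypothetical compiler step: auto-grant only allowlisted categories, defer the rest. */
function compileAuthorization(requested: readonly string[]): CompiledAuthorization {
  const unique = [...new Set(requested)];
  return {
    authorized: unique.filter((c) => SAFE_AUTOGRANT.has(c)),
    // Includes unknown categories: when in doubt, ask at execution time.
    deferredToPhase3: unique.filter((c) => !SAFE_AUTOGRANT.has(c)),
  };
}
```

Using an allowlist rather than a denylist means a newly introduced category defaults to being gated.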

Design notes (Phase 0 deviations approved)

  • Split into v0.10.0 + v0.10.1. Functional chat layer ships now; README hero rewrite and mid-tool-call cancellation wait for adoption signal. Same pattern as v0.7 / v0.9 splits.
  • Tool-call intent extraction, not free-text classification. Free-text would drift at scale. The four-tool contract is testable from day one with adversarial fixtures.
  • cli-bridge incompatible with conversation agent (architectural). cli-bridge does not surface tool-call events to OpenWar; the bridged binary's stdout is free text. We picked option (c) hybrid: the conversation agent uses a tool-call-capable BYOK adapter; the execution adapter can still be cli-bridge for free local Claude Code use. Stronger than the brief's binary (a)/(b) options.
  • Conservative authorization is load-bearing. Adversarial fixtures pin every destructive category individually. The plan presenter explicitly lists "Not authorized" with consequence sentences so the user sees what is excluded as visibly as what is included.
  • Off-topic mid-conversation: single-task focus (option D, not in brief). Agent says "I'm focused on X right now. After it's done I can help with Y. Want me to remember it for after?" rather than compiling a second brief in the same session.
  • Polite abort only in v0.10.0. Mid-tool-call cancellation deferred to v0.10.1.
  • Save-brief replay semantics explicitly documented. "Replays deliverables on the named project; if repo state has drifted, the agent may need different actions." Header in the file + paragraph in docs/chat.md.
  • Determinism scoping honest. Compiler is pure; conversation feeding it is stochastic. Docs say so.
  • Path-vs-command heuristic (caught during integration testing). Real slash commands are single-word [a-z]+; paths like /index.html or /usr/local/bin are treated as user text and routed to the conversation agent.
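
The path-vs-command rule is easy to state as code. This sketch assumes the check applies to the first whitespace-delimited token of the input; `parseInput` and the `Parsed` type are hypothetical, and the `[a-z]+` rule is the one quoted above.

```typescript
type Parsed =
  | { kind: "command"; name: string; args: string[] }
  | { kind: "text"; text: string };

/** Treat input as a slash command only when its first token is "/" plus lowercase letters. */
function parseInput(line: string): Parsed {
  const trimmed = line.trim(); // also strips a trailing CR from CRLF input
  const [head = "", ...args] = trimmed.split(/\s+/);
  const name = /^\/([a-z]+)$/.exec(head)?.[1];
  if (name === undefined) return { kind: "text", text: trimmed };
  return { kind: "command", name, args };
}
```

Under this rule `/help` is a command, while `/index.html` (contains a dot) and `/usr/local/bin` (contains further slashes) fall through to the conversation agent as ordinary text.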

Out of scope (deferred to v0.10.1)

  • README hero rewrite (positioning change pending adoption signal).
  • Mid-tool-call cancellation (/abort is polite in v0.10.0).
  • Multi-channel chat surfaces (Discord, Slack, Telegram).
  • Streaming responses during agent turns.

Notes for forkers and War Room integrators

  • v0.10.0 is fully backward compatible with v0.9.x and all earlier versions. Existing briefs run identically. Existing scripts wrapping openwar run are unaffected. The new openwar chat subcommand is additive.
  • The chat store at ~/.openwar/chats/<chat_id>.ndjson is a separate persistence stream from ~/.openwar/sessions/<brief_id>.trace.ndjson. Library consumers can ingest both via readChat() and readTrace() exports.
  • A chat-originated brief's trace stamps chat_session_compiled with the originating chat id; openwar inspect <brief_id> --trace shows the correlation. chat_brief_saved and chat_session_resumed also mirror into the most recent brief's trace when one is active, so all three chat-correlation events surface via --trace.

Post-commit fixes

After the initial commit, three corners were caught during honest self-review and patched before publisher push:

  1. Windows readline handling. The brief budgeted explicit work for Windows-specific readline behavior. Initial commit shipped without it. Patched: SIGINT handler that closes readline cleanly so Ctrl-C routes through the /quit path with a saved-session banner; EOF on piped stdin treated identically to /quit; CRLF line endings parsed correctly; runChatCommand now accepts optional stdin / stdout overrides for embedders and programmatic test harnesses. Five new tests in tests/cli/chat-readline.test.ts pin the cross-platform behavior (EOF routing, /quit on pipes, /help on pipes, CRLF input, 250-entry history volume).
  2. Adversarial agent-drift coverage at the session level. Initial commit had parser-level adversarial f...

v0.9.1 — adaptive autonomy plumbing (conservative defaults)

18 May 09:37

Adaptive autonomy plumbing with conservative defaults. The plumbing is the deliverable; the thresholds are the patch-release dial.

v0.9.0 deferred the prescriptive layer of adaptive autonomy because the data foundation did not yet exist. v0.9.1 reframes the question: the threshold values needed real distributions to calibrate against, but the structural work (profile schema, runner integration, detector sensitivity wiring, new trace events) did not. v0.9.1 ships the plumbing with thresholds set high enough that the system is effectively a no-op for the first nine runs against any project. The first usable recommendation arrives around run 10. v0.9.2+ patch releases tune the constants in src/state/heuristics.ts once real distributions surface.

Added

  • openwar learn <slug> subcommand (src/cli/learn.ts). Reuses the v0.9.0 history aggregator, applies heuristic recommendation generators with conservative thresholds, prints a candidate learned.json (default) or writes it to disk (--apply). Flags:
    • openwar learn <slug> (dry run)
    • openwar learn <slug> --apply (write)
    • openwar learn <slug> --reset (delete existing profile)
    • openwar learn <slug> --since <ISO> (filter trace window)
    • openwar learn <slug> --min-samples <N> (floor 5; default 10)
    • openwar learn <slug> --emit-frontmatter (print paste-into-brief YAML)
  • src/state/learned-profile.ts: profile schema (schema_version: 1), atomic save (tmp+rename), schema-version check that raises a typed LearnedProfileSchemaError on MISSING_VERSION / VERSION_MISMATCH / PARSE / SHAPE rather than silently defaulting.
  • src/state/heuristics.ts: conservative-threshold constants with one paragraph each explaining what would justify lowering them. v0.9.1 values: DETECTOR_LOOSE_FIRE_RATE_BAR=0.85, DETECTOR_LOOSE_MIN_SAMPLES=10, DETECTOR_DISABLED_FIRE_RATE_BAR=0.95, DETECTOR_DISABLED_MIN_SAMPLES=20, PHASE_BUDGET_MIN_SAMPLES=10, PHASE_BUDGET_FORMULA="p90+5", DEAD_TOOL_MIN_SAMPLES=10. Pinned by tests/state/heuristics.test.ts so accidental tuning during refactors fails CI.
  • DETECTOR_SAFETY registry: blocker, destructive, completion, confirmation are safety_critical: true; banned_phrases and phase_marker are false. disabled is blocked for safety-critical detectors. The consultation record still surfaces the attempted override.
  • Detector sensitivity refactor: each detector's exported function gains an optional sensitivity: Sensitivity parameter (defaults to "default"; current behavior). loose requires stricter signal per detector (e.g., explicit Phase 2 marker for blocker; banned_phrases count >= 2; explicit Phase 4 for completion; explicit Confirmation Summary marker for confirmation; imminent-action marker for destructive). strict is a TODO marker; treated as default until v0.9.2+. The snapshotWithConsultations() dispatcher centralizes the disabled + safety-critical gate and produces a DetectorConsultation audit list.
  • Brief frontmatter: optional learned_profile: <slug> field. Explicit-only loading; the runner does NOT auto-discover a profile from the project slug even if one exists on disk.
  • Runner integration: when learned_profile: is set, the runner loads the profile, threads a DetectorSensitivityMap through runExecute, applies the execute-phase budget to maxSteps, and emits learned_profile_applied once at session start. Missing profile is a soft warning; schema mismatch is a hard remediation message.
  • Three new trace event types (additive to the v0.8 union):
    • learned_profile_applied: once per session at profile load. Counts detector overrides, phase budgets, and dead-tool callouts.
    • learned_sensitivity_consulted: per detector consultation with non-default sensitivity. Records sensitivity value and whether the detector fired or was suppressed.
    • learned_budget_consulted: at execute-phase enter. Carries recommended budget, applied value, and source (learned / brief / default).
  • openwar inspect <brief_id> --learned: brief-scoped view that combines the on-disk profile for the brief's project slug with consultation history from the brief's trace events. Renders detector overrides + phase budgets + tool usage + consultation summary in column-pinned tables.
  • 66 new tests (tests/state/heuristics.test.ts, tests/state/learned-profile.test.ts, tests/detectors/sensitivity.test.ts, tests/cli/learn.test.ts, tests/runner/learned-profile-apply.test.ts, tests/cli/inspect-learned.test.ts). Total 580 (was 514 at v0.9.0), at the top of the brief's 560-580 target.
  • Library exports for integrators: heuristics constants, profile load/save, sensitivity map projection, learn subcommand, inspect-learned formatter.
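
The threshold constants and the safety gate above can be read as a small decision function. The sketch below is illustrative only: the constant values and safety flags are copied from these notes, but the input shape (DetectorStats), the function names, the reading of "fire rate" as the share of sampled runs in which a detector fired, the >= comparisons, and the nearest-rank P90 are all assumptions, not the contents of src/state/heuristics.ts.

```typescript
// Constant values copied from the v0.9.1 notes; everything else is hypothetical.
const DETECTOR_LOOSE_FIRE_RATE_BAR = 0.85;
const DETECTOR_LOOSE_MIN_SAMPLES = 10;
const DETECTOR_DISABLED_FIRE_RATE_BAR = 0.95;
const DETECTOR_DISABLED_MIN_SAMPLES = 20;
const PHASE_BUDGET_MIN_SAMPLES = 10;

type Sensitivity = "default" | "loose" | "strict" | "disabled";

// Safety flags as listed in the DETECTOR_SAFETY bullet.
const SAFETY_CRITICAL: Record<string, boolean> = {
  blocker: true,
  destructive: true,
  completion: true,
  confirmation: true,
  banned_phrases: false,
  phase_marker: false,
};

// Hypothetical input shape: sampled runs and runs in which the detector fired.
interface DetectorStats {
  detector: string;
  runs: number;
  runsWithFire: number;
}

function recommendSensitivity(stats: DetectorStats): Sensitivity {
  // Below the sample floor the recommendation is a no-op.
  if (stats.runs < DETECTOR_LOOSE_MIN_SAMPLES) return "default";
  const fireRate = stats.runsWithFire / stats.runs;
  // Unknown detectors are treated as safety-critical in this sketch.
  const safetyCritical = SAFETY_CRITICAL[stats.detector] ?? true;
  if (
    !safetyCritical &&
    stats.runs >= DETECTOR_DISABLED_MIN_SAMPLES &&
    fireRate >= DETECTOR_DISABLED_FIRE_RATE_BAR
  ) {
    return "disabled";
  }
  if (fireRate >= DETECTOR_LOOSE_FIRE_RATE_BAR) return "loose";
  return "default";
}

// "p90+5": nearest-rank P90 of per-run tool calls plus a margin of 5.
function recommendPhaseBudget(toolCallsPerRun: number[]): number | null {
  if (toolCallsPerRun.length < PHASE_BUDGET_MIN_SAMPLES) return null;
  const sorted = [...toolCallsPerRun].sort((a, b) => a - b);
  const rank = Math.ceil((90 * sorted.length) / 100);
  const p90 = sorted[rank - 1] ?? 0;
  return p90 + 5;
}
```

With nine or fewer sampled runs both functions return the no-op answer ("default" or null), which matches the "effectively a no-op for the first nine runs" framing above.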

Design notes (Phase 0 picks)

  • Reframed scope from "adaptive autonomy" to "plumbing with conservative defaults". The previous v0.9.0 deferral applied to threshold values, not to runtime plumbing. Building the plumbing now lets v0.9.2+ become a thresholds-only patch.
  • No threshold constants were tuned during Phase 1 development. Every value matches the brief.
  • Determinism: profile saves are deterministic via stringifyDeterministic; same trace inputs produce byte-identical files (modulo generated_at). source_runs sorted lexicographically by buildLearnedProfile.
  • Detector refactor is fully backward-compatible: sensitivity parameter defaults to "default" so all existing detector callers and the v0.8 / v0.9.0 test suite pass unchanged. The 514 prior tests stay green.
  • Multi-agent coordinator does not consume learned phase budgets in v0.9.1 (different budget primitives; revisited in v0.9.2+). Detector sensitivities still apply via the coordinator's executor path.
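
The determinism note can be illustrated with a key-sorting serializer. This is a minimal sketch, assuming plain JSON-compatible data; stableStringify and comparable are hypothetical stand-ins, and the library's actual stringifyDeterministic may be implemented differently.

```typescript
// Hypothetical stand-in for stringifyDeterministic. Object keys are sorted
// recursively so insertion order never leaks into the serialized output.
// Assumes plain JSON-compatible data (no Date, Map, or class instances).
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeys(source[key]);
    }
    return sorted;
  }
  return value;
}

function stableStringify(value: unknown, indent = 2): string {
  return JSON.stringify(sortKeys(value), null, indent);
}

// Serializes a profile-like object with generated_at removed, so two saves
// built from the same trace inputs can be compared byte for byte.
function comparable(profile: Record<string, unknown>): string {
  const copy: Record<string, unknown> = { ...profile };
  delete copy["generated_at"];
  return stableStringify(copy);
}
```

Array order is preserved on purpose: the notes say source_runs is sorted by the profile builder, so the serializer itself only needs to normalize object keys.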

Out of scope (deferred to v0.9.2+)

  • Threshold tuning against observed real-world distributions.
  • Per-detector strict semantics (parameter accepted, treated as default).
  • Multi-agent coordinator budget integration.
  • Auto-recommendation expiry / age-off.
  • A/B harness for sensitivity tuning.
  • OpenTelemetry exporter for the three new event types.

Notes for forkers and War Room integrators

  • v0.9.1 is fully backward compatible with v0.9.0 and v0.8.x. Briefs without learned_profile: behave identically. The detector refactor preserves default-sensitivity behavior bit-for-bit.
  • War Room integrators can read learned profiles via loadLearnedProfile(slug) from the library entry point and consume the three new trace events through readTrace(brief_id).
  • Operators on a fresh install will see "this profile is effectively a no-op at current sample size" until they accumulate ~10 runs against a project slug. This is intentional; expect v0.9.2 to tune the constants once real distributions surface.
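
For integrators who parse profile files themselves, the four failure codes named under Added can be handled along these lines. This is a sketch under stated assumptions: ProfileSchemaError, parseProfile, and classify are hypothetical stand-ins, and the order in which the real loader checks PARSE, SHAPE, MISSING_VERSION, and VERSION_MISMATCH is not stated in these notes.

```typescript
type SchemaErrorCode = "MISSING_VERSION" | "VERSION_MISMATCH" | "PARSE" | "SHAPE";

// Hypothetical stand-in for LearnedProfileSchemaError; the real class's
// fields are not shown in these notes.
class ProfileSchemaError extends Error {
  readonly code: SchemaErrorCode;
  constructor(code: SchemaErrorCode, message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on older targets
    this.name = "ProfileSchemaError";
    this.code = code;
  }
}

const SUPPORTED_SCHEMA_VERSION = 1; // schema_version: 1 per the notes

// Parses raw profile text and refuses to default silently on any mismatch.
function parseProfile(raw: string): Record<string, unknown> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new ProfileSchemaError("PARSE", "profile is not valid JSON");
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new ProfileSchemaError("SHAPE", "profile must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  const version = record["schema_version"];
  if (version === undefined) {
    throw new ProfileSchemaError("MISSING_VERSION", "schema_version is absent");
  }
  if (version !== SUPPORTED_SCHEMA_VERSION) {
    throw new ProfileSchemaError("VERSION_MISMATCH", `unsupported schema_version ${String(version)}`);
  }
  return record;
}

// Maps a raw profile to "OK" or the code of the typed error it raises.
function classify(raw: string): SchemaErrorCode | "OK" {
  try {
    parseProfile(raw);
    return "OK";
  } catch (err) {
    if (err instanceof ProfileSchemaError) return err.code;
    throw err;
  }
}
```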

v0.9.0 — openwar history (descriptive analytics)

18 May 08:49

0.9.0

openwar history: descriptive analytics over accumulated v0.8 traces. Read-only by design.

Originally scoped as "adaptive autonomy" with detector sensitivity overrides, recommended phase budgets, and a runtime-applied learned_profile. The brief was split during Phase 0 review on 2026-05-18 because the data foundation did not yet exist: v0.8.0 had landed three hours earlier and zero real traces had accumulated against any project. Adapting against synthetic or thin samples would have shipped wrong-shaped defaults baked into runtime behavior. The original brief's own anti-gaming clause warned against this exact failure mode.

v0.9.0 ships the inspection layer. v0.9.1 will ship the prescriptive layer once one to two release cycles of v0.9.0 have accumulated real traces and we can calibrate heuristics against actual distributions.

Added

  • openwar history <project_slug> subcommand. Reads every trace.ndjson whose session metadata carries the slug, computes:
    • Per-tool call counts + last-used timestamps + "dead" flag (zero calls when sample >= 3).
    • Per-phase tool-call P50 / P90 / max, summed across runs. Tool-call attribution uses a most-recent-phase_enter walker so calls fall into the right bucket.
    • Per-detector total fires + fires-per-run + runs-with-fire.
    • Per-phase total + average duration_ms from phase_exit events.
    • Operator-readable notes: thin-sample warnings, dead-tool callouts, corrupted-line totals, v0.9.0-is-descriptive-only banner.
    • --since <ISO> filter, --min-samples N threshold (>= 2), --json deterministic output.
  • openwar inspect <brief_id> --history. Brief-scoped surface: looks up the session's project slug and renders the same history report.
  • docs/learning.md (new). Locks per-detector false-positive semantics for v0.9.1 even though v0.9.0 does not use them. Half a day of design work now while the question is fresh is cheaper than rebuilding the analysis in v0.9.1 against muscle-memory assumptions. Also documents the v0.9.0 vs v0.9.1 scope split and the safety-critical flag plan.
  • 24 new tests (tests/state/history.test.ts, tests/cli/history.test.ts, tests/cli/inspect-history.test.ts). Total 514 (was 490 at v0.8.0). Math correctness, determinism guarantees, filter semantics, schema_version anchoring, traceless-session reporting, brief-to-project lookup.
  • Library exports (src/index.ts): summarizeRun, aggregateRuns, buildHistoryReport, runHistory, formatHistoryReport, quantile, stringifyDeterministic, plus the RunSummary / HistoryReport / row types. Integrators (War Room, etc.) can build their own reporting layers on top.
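
The descriptive math above is simple enough to sketch. The example below assumes a nearest-rank quantile (the notes do not say which quantile method the exported quantile uses) and reads the dead-tool rule as "zero calls across at least three sampled runs". nearestRank, toolRows, and the tool names in the usage note are hypothetical.

```typescript
// Nearest-rank quantile: the smallest sample value with at least q of the
// sample at or below it. An assumed method, not necessarily the library's.
function nearestRank(values: number[], q: number): number {
  if (values.length === 0) throw new Error("empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil(q * sorted.length)));
  return sorted[rank - 1] ?? 0;
}

interface ToolRow {
  tool: string;
  calls: number;
  dead: boolean;
}

// Sums per-run call counts for each known tool and flags a tool as dead
// when it has zero calls and the sample holds at least three runs.
function toolRows(knownTools: string[], callsPerRun: Array<Record<string, number>>): ToolRow[] {
  const rows = knownTools.map((tool) => {
    const calls = callsPerRun.reduce((sum, run) => sum + (run[tool] ?? 0), 0);
    return { tool, calls, dead: calls === 0 && callsPerRun.length >= 3 };
  });
  // Plain code-unit ordering keeps the row order independent of locale.
  return rows.sort((a, b) => (a.tool < b.tool ? -1 : a.tool > b.tool ? 1 : 0));
}
```

For a sample of per-run tool-call counts [2, 4, 4, 7, 30], nearestRank gives 4 at P50 and 30 at P90. A tool such as write_file with no calls is reported as dead after three runs but not after two.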

Design notes (Phase 0 deviations approved)

  • Renamed from "adaptive autonomy" to "history". A capability whose first impression is "tells you what your runs look like" should not ship under a name that promises runtime behavior change. v0.9.1 reclaims "adaptive autonomy" when it actually adapts.
  • No learned_profile schema, no runner integration, no detector sensitivity refactor, no new trace events. All deferred to v0.9.1. v0.9.0 carries no forward-compat stubs in the schema either; cleaner to add fields in v0.9.1 with real data informing their shape.
  • The only confident heuristic in v0.9.0 is dead-tool detection. Everything else is descriptive math (counts, quantiles, sums) with no thresholds attached. P50 + 1.5 * IQR for phase budgets is deferred because the IQR shape on real long-tail distributions is unknown.
  • Phase-attribution walker built now, inherited by v0.9.1. Tool calls credit to the most-recent phase_enter. v0.9.1's budget math reuses the same walker.
  • Determinism is load-bearing. source_runs arrays sort lexicographically. JSON output goes through stringifyDeterministic with sorted object keys. Same trace inputs produce the same report bit-for-bit (modulo generated_at timestamp). Tested in tests/state/history.test.ts.
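
The phase-attribution walker described above amounts to a single pass that remembers the last phase entered. A minimal sketch, assuming a simplified event shape: phase_enter and phase_exit are named in these notes, while the tool_call type name, the field names, and the "unattributed" bucket are assumptions made for illustration.

```typescript
// Simplified, assumed event shapes; the real trace events carry more fields.
type TraceEvent =
  | { type: "phase_enter"; phase: string }
  | { type: "tool_call"; tool: string }
  | { type: "phase_exit"; phase: string; duration_ms: number };

// Credits each tool call to the most recent phase_enter seen before it.
function countToolCallsByPhase(events: TraceEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  let currentPhase = "unattributed"; // calls seen before any phase_enter
  for (const event of events) {
    if (event.type === "phase_enter") {
      currentPhase = event.phase;
    } else if (event.type === "tool_call") {
      counts[currentPhase] = (counts[currentPhase] ?? 0) + 1;
    }
  }
  return counts;
}
```

Because phase_exit does not reset the current phase in this sketch, a call logged between one phase's exit and the next phase's enter still lands in the earlier phase. Whether the real walker behaves the same way is not stated here.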

Out of scope (deferred to v0.9.1 or later)

  • openwar learn subcommand and the learned.json profile schema.
  • learned_profile: brief frontmatter field.
  • Detector sensitivity overrides (loose / strict / disabled).
  • Recommended phase budgets.
  • Runner-side application of any of the above.
  • The three planned trace events (learned_profile_applied, learned_sensitivity_consulted, learned_budget_consulted).
  • Recommendation expiry, A/B harness for sensitivity tuning, cross-project learning.

Notes for forkers and War Room integrators

  • v0.9.0 is fully backward compatible with v0.8.x. No new brief frontmatter fields. No runtime behavior changes. Existing sessions inspect identically; the new --history surface is purely additive.
  • Operators on v0.8.x can upgrade to v0.9.0 with no migration cost. Accumulated v0.8 traces are immediately usable as history input.
  • v0.9.1 (when it ships) will use the same trace format and the same phase-attribution walker; profiles will read this v0.9.0 history data plus locked FP semantics from docs/learning.md.

v0.8.0 — Observability and tracing

18 May 08:17

0.8.0

Observability and tracing. The first version that gives operators (and integrators like War Room) the structured data they need to actually understand what their agents are doing. Everything before v0.8 was about getting the runtime to behave correctly. v0.8 makes its behavior visible.

This release was scoped against two real Windows live tests on 2026-05-17 and 2026-05-18 that surfaced five observability gaps: invisible MCP call lifecycle, ambiguous permission-layer source on failure, invisible MCP server liveness, invisible phase timing, silent settings-merge failure modes. Each is closed by a specific event type in the new trace stream.

Added

  • Structured trace event stream at ~/.openwar/sessions/<brief_id>.trace.ndjson. One JSON event per line (NDJSON), append-only, schema-versioned via a trace_version header event on the first line. 19 event types covering phase transitions, tool calls, auth decisions, detector fires, role invocations, budget thresholds, sub-task state, coordinator state, MCP server lifecycle (started, shutdown, dispatched, completed; mcp_call_pending type defined, real-time emission lands in v0.8.x), settings-merge attempts and outcomes, and errors.
  • openwar inspect extensions: --trace, --trace --tail N, --trace --full, --timing, --cost, --cost --dollar-per-1k <rate>, --detectors, --tools, --mcp. Each prints a focused table. The dashboard reuses the same formatters so column shape stays in sync between CLI and web view.
  • openwar replay <brief_id> subcommand. Re-runs recorded assistant turns through CURRENT detector code, emits [replay]-prefixed output, halts at Phase 2 markers in the transcript (same shape as the original run), exits 1 when current detectors disagree with the recorded trace (drift). Useful for detector-regression CI gates and for demonstrating runs without paying for compute.
  • openwar dashboard subcommand. Opt-in local HTTP dashboard, default port 8780, bound to the IPv4 literal 127.0.0.1 (avoids Windows IPv6 resolution surprises). Zero outbound network calls. Zero third-party dependencies. Vanilla HTML over a single CSS block. Per-session views for summary, timing, cost, detectors, tools, mcp, and the raw trace.
  • OPENWAR_SESSIONS_DIR environment variable. Overrides the default <OPENWAR_HOME>/sessions location wholesale. Lets integrators relocate the session store and gives tests a clean way to point at a tmpdir.
  • docs/observability.md. Operator guide. Event reference, inspect modes, replay semantics, dashboard, file layout.
  • 40 new tests (tests/state/trace.test.ts, tests/state/trace-seams.test.ts, tests/cli/inspect.test.ts, tests/cli/replay.test.ts, tests/dashboard/server.test.ts). Total now 490 (was 450 at v0.7.3). Every event type has a round-trip case. Inspect formatters pin column shape. Dashboard verified to bind 127.0.0.1 only and make zero outbound network calls.
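
The drift check behind openwar replay can be pictured as re-running detectors over recorded turns and diffing the result against what the original run recorded. The sketch below is a simplified illustration: the Detector signature, the RecordedTurn shape, and both function names are assumptions, and it omits the Phase 2 halt and the [replay]-prefixed output.

```typescript
// Hypothetical detector signature: true when the detector fires on a turn.
type Detector = (turnText: string) => boolean;

// Hypothetical recorded turn: assistant text plus the names of the detectors
// that fired in the original run.
interface RecordedTurn {
  text: string;
  fired: string[];
}

interface Drift {
  turn: number;
  detector: string;
  recorded: boolean;
  current: boolean;
}

// Runs the current detectors over each recorded turn and reports every
// disagreement with the recorded trace.
function findDrift(turns: RecordedTurn[], detectors: Record<string, Detector>): Drift[] {
  const drift: Drift[] = [];
  turns.forEach((turn, index) => {
    for (const name of Object.keys(detectors).sort()) {
      const detect = detectors[name];
      if (!detect) continue;
      const current = detect(turn.text);
      const recorded = turn.fired.includes(name);
      if (current !== recorded) {
        drift.push({ turn: index, detector: name, recorded, current });
      }
    }
  });
  return drift;
}

// Mirrors the documented behavior: exit 1 when current detectors disagree.
function replayExitCode(drift: Drift[]): number {
  return drift.length > 0 ? 1 : 0;
}
```

In a CI gate, a non-empty drift list would fail the job, which is how a detector regression would surface without paying for a fresh model run.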

Design notes (Phase 0 deviations approved)

  • NDJSON appends use fs.appendFileSync per event, not tmp+rename. The original brief specced "same atomicity as the transcript (tmp+rename per append)." That conflated transcript atomicity (low-frequency message persistence) with trace atomicity (high-frequency event log). Right invariant is "any complete line is a complete event"; appendFileSync gives that and scales O(1) per event.
  • trace_version header event is the first line of every trace file. v0.9 will add fields; without a schema version marker, replay would silently misinterpret old traces.
  • call_id threaded through mcp_call_* events. Concurrent MCP calls would otherwise be uncorrelatable in the trace.
  • Replay re-runs detectors against the recorded transcript. Not playback of recorded detector results. Recorded trace is reference data for drift comparison, not the script. This is what makes replay useful for detector regression testing.
  • Dashboard = inspect-as-HTML. Single source of truth across the on-disk text view and the web view. Four renderers collapse to one; bug fixes land once.
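
The "any complete line is a complete event" invariant can be sketched with fs.appendFileSync on the write side and a reader that skips a torn final line. This is an illustration, not the project's Tracer: appendEvent, readEvents, the header's field names, the version value, and the demo file are all assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const SKETCH_TRACE_VERSION = 1; // placeholder value for this sketch only

// Appends one event as a single JSON line. The first write to a new file is
// preceded by a trace_version header line.
function appendEvent(file: string, event: Record<string, unknown>): void {
  if (!fs.existsSync(file)) {
    const header = { type: "trace_version", version: SKETCH_TRACE_VERSION };
    fs.appendFileSync(file, JSON.stringify(header) + "\n");
  }
  fs.appendFileSync(file, JSON.stringify(event) + "\n");
}

// Reads every complete line as an event and counts lines that fail to parse,
// such as a final line cut short by a crash mid-write.
function readEvents(file: string): { events: Record<string, unknown>[]; skipped: number } {
  const events: Record<string, unknown>[] = [];
  let skipped = 0;
  for (const line of fs.readFileSync(file, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line) as Record<string, unknown>);
    } catch {
      skipped += 1;
    }
  }
  return { events, skipped };
}

// Demo against a throwaway temp file: two events, then a simulated torn write.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "trace-sketch-"));
const demoFile = path.join(demoDir, "demo.trace.ndjson");
appendEvent(demoFile, { type: "phase_enter", phase: "execute" });
appendEvent(demoFile, { type: "phase_exit", phase: "execute", duration_ms: 42 });
fs.appendFileSync(demoFile, '{"type":"tool_ca');
const demoResult = readEvents(demoFile);
```

Each append touches only the end of the file, so the cost per event stays constant as the trace grows, whereas a tmp+rename scheme would rewrite the whole file on every event.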

Out of scope (per the brief)

  • Remote telemetry / cloud aggregation. Local-first.
  • OpenTelemetry adapter. v0.8.x stretch if real demand.
  • Real-time streaming dashboard. Files-on-demand. WebSocket live updates wait until at least v0.8.x.
  • Real-time mcp_call_pending emission. Requires subprocess-side tracing wired into openwar mcp-serve; the event type is defined so consumers can code against it now. Emission lands in v0.8.x.
  • Multi-user dashboard authentication. Single operator, localhost-bound.
  • Auto-pruning of old trace files. Operator manages disk usage manually.

Notes for forkers and War Room integrators

  • The trace file lives sibling to the transcript and session-state files. Existing v0.7.x sessions (no trace) inspect gracefully: openwar inspect <id> --trace prints a "no trace events; sessions written before v0.8 are transcript-only" notice and exits 0.
  • War Room integrators consuming the OpenWar library can import { Tracer, readTrace } from "@pythonluvr/openwar" and pump trace data into their own observability stack. OpenWar itself stays silent on the wire.