You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Vulnerability discovery workflow — the marquee 0.11.0 feature: an orchestrator-driven, hub-and-spoke FSM that hunts memory-safety and logic bugs in native code under a user-supplied threat model. Run it from the web UI (ironcurtain daemon --web-ui → Workflows → New run): the visual state-machine graph, per-state agent-message timeline, gate review panel, artifact browser, and live escalation modal are the intended way to follow a multi-hour discovery run — agent sessions span hours and produce many artifacts, which the CLI is not equipped to surface comfortably. The orchestrator routes between a structural analyze state, a tiered harness pipeline (harness_design → reviewer loop → harness_build → harness_validate), differential validation, discover/triage for hypothesis confirmation, an LLM review pass, and a final human report_review gate; each agent state ships per-hypothesis directives written into a persistent investigation journal.md. Harness build/validate uses a Tier 1 (isolated function) / Tier 2 (multi-component) / Tier 3 (full build) ladder picked mechanically from hypothesis scope, with libFuzzer/AFL++ coverage-feedback gating to catch the common failure mode where instrumentation never reaches the target. Domain content is factored into reusable skills — memory-safety-c-cpp (bug-class taxonomy with Class A/B/C delegate-library realism), harness-design-fuzzing, vulnerability-triage — loaded per-state, while ordering stays in the FSM. Conclude writes a report.md index plus one report.h<N>.md per hypothesis. Quota exhaustion (HTTP 429) and upstream stalls preserve the checkpoint so the run resumes where it stopped. CLI access via ironcurtain workflow start vuln-discovery "<task>" remains for scripting and debugging (#169, #175, #199, #198, #241, #229).
Workflow web UI — the intended way to run workflows. Opt-in Svelte 5 dashboard launched by ironcurtain daemon --web-ui (default http://localhost:7400, bearer-token auth). The Workflows panel hosts the full lifecycle: start a new run (workflow picker, model selection, workspace path, task description), watch active runs through a live state-machine graph (dagre + SVG) with per-state agent-message timeline and markdown rendering, review gates with a workspace + artifact browser, and respond to escalations through an overlay modal with inline indicators. A Past section lists completed/failed/aborted runs by scanning ~/.ironcurtain/workflow-runs/ (checkpoints are now retained on success, and historical checkpoint-less runs are reconstructed from the message log via the shared discoverWorkflowRuns utility). The same daemon also serves the Sessions, Escalations, and Jobs views. Vuln-discovery and every other multi-agent workflow is meant to be driven from here — CLI workflow commands exist primarily for scripting and debugging (#154, #157, #163, #200).
Multi-agent workflow engine — the orchestration layer that powers the web UI: an XState v5 state machine with typed events, guards, agent/gate/deterministic states, declarative when: verdict conditions on transitions, per-state maxVisits caps with bounded-loop escalation, transition actions: (including resetVisitCounts), freshSession control, artifact versioning that snapshots .v<N-1> backups on re-entry, and crash-resume via on-disk checkpoints. Workflow definitions are now packaged as directories (<name>/workflow.yaml) with YAML preferred over JSON, required description fields surfaced in web UI tooltips and CLI inspect output, and arbitrary verdict strings for direct routing. The same engine runs identically behind the web UI (the intended interface), mux, and the ironcurtain workflow start|resume|inspect|list CLI (provided for scripting and debugging) (#159, #163, #165, #169, #188, #190).
Workflow shared-container mode — opt in via settings.sharedContainer: true in a workflow YAML. One Docker container and one ToolCallCoordinator serve every agent state in the run; the orchestrator hot-swaps the active PolicyEngine between states via POST /__ironcurtain/policy/load over a per-run Unix domain control socket, and the coordinator swaps under callMutex → policyMutex. Audit entries are tagged with persona and written to a single audit.jsonl per run. Run artifacts consolidate under ~/.ironcurtain/workflow-runs/<id>/ with bundle/, states/<stateId>.<visitCount>/, audit.jsonl, and messages.jsonl; nothing lands under ~/.ironcurtain/sessions/ for a workflow run. The new containerScope primitive lets workflows split states across multiple bundles when isolation is needed. Implements Steps 4–5 of docs/designs/workflow-container-lifecycle.md (#184, #186, #187, #191).
Agent skills (SKILL.md) — drop SKILL.md packages under ~/.ironcurtain/skills/<name>/ to make purpose-specific guidance available to every Docker session; the merged set is staged so each agent's native skill discovery picks it up. Claude Code is pointed at the staging dir via --add-dir <parent>; Goose scans ~/.config/goose/skills/<name>/SKILL.md. Each agent adapter declares its own skillsContainerPath and the Docker infrastructure issues a dedicated read-only bind mount at that path. Workflows ship per-state skills via <workflow-pkg>/skills/<name>/SKILL.md and an optional skills: [...] field on agent states (omit = all workflow skills, none sentinel = clean slate). Persona skills (~/.ironcurtain/personas/<name>/skills/) apply to standalone sessions. See WORKFLOWS.md (#227).
ironcurtain doctor command — on-demand setup diagnostics that runs every health check independently and surfaces a single punch list (Node version, V8 sandbox viability, Docker availability with categorized errors, config parse, compiled policy / constitution / annotation drift, Anthropic OAuth/API-key presence and expiry, per-MCP-server env vars, and live tools/list against each configured server). Opt-in --check-api adds a 1-token round-trip and OAuth refresh validation (#206).
Pre-flight checks for sandbox / Docker / OAuth — start, daemon, bot, and workflow now spawn a child Node process that imports @utcp/code-mode to validate the V8 sandbox before doing anything; SIGSEGV/SIGILL maps to "use Node 22–24", NODE_MODULE_VERSION to "npm rebuild", and missing packages to "npm install". Success is cached at ~/.ironcurtain/.preflight-ok. Docker availability returns a tagged union with targeted messages for ENOENT, permission denied, and "Cannot connect to the Docker daemon." OAuth-only-without-Docker now fails fast with a remediation hint instead of silently dropping into builtin and later 401-ing. Mux runs the same preflight in the parent before entering fullscreen so failures surface cleanly (#203, #213, #244).
Configurable Docker container resources with auto-clamp and probe — new dockerResources: { memoryMb, cpus } field in ~/.ironcurtain/config.json (each independently nullable; null = no limit), editable via ironcurtain config → Docker Agent → Container resources. Values are auto-clamped against os.cpus() / os.totalmem() before reaching Docker. ironcurtain doctor and the first-start wizard run a real docker run --rm --cpus N --memory Mm <image> /usr/bin/true probe, parse Docker's stderr for rejection patterns, and suggest concrete lowered values. Necessary on small hosts (2-vCPU VMs) and on macOS Docker Desktop where os.cpus() over-reports vs. the VM (#247).
Idle-timeout watchdog for Docker pull / build — replaces hard wall-clock timeouts on docker pull and docker build with a progress-aware idle watchdog (spawnWithIdleTimeout); legitimate multi-hour pulls of large base images (e.g. devcontainers/universal) now succeed, only true silence kills the child. A TTY-aware progress sink collapses the per-layer / per-step chunk flood into a single in-place updating status line (docker pull 4/12 layers 2 downloading); non-TTY stderr passes the raw transcript through (#250, #251, #260).
Linux UID/GID remap for agent containers — on Linux hosts where the user's UID/GID isn't 1000, the agent container now starts as root (--user 0:0) with host UID/GID passed via env; the entrypoint runs usermod/groupmod/chown to renumber the baked codespace user to match the host, then drops privileges via exec runuser -u codespace. Fixes the "Not logged in / Please run /login" symptom on Kali, NixOS, and other non-default-UID environments where bind-mounted credential files were unwritable. macOS is unchanged. Centralized in buildAgentUidRemap() and shared by batch- and PTY-mode container creation (#245).
Per-persona / per-job memory opt-in — PersonaDefinition.memory?: { enabled: boolean } and JobDefinition.memory?: { enabled: boolean } let users disable the memory MCP server for individual personas / cron jobs (default on). The global userConfig.memory.enabled kill switch still wins. MEMORY_SERVER_NAME was removed from the persona resolver's always-included set; the workflow orchestrator now spawns the memory relay only when at least one persona in scope opts in, closing a shared-container gap where the prompt advertised memory but the relay was absent. CLI surfaces the toggle via persona create/edit, daemon add-job/edit-job, and ironcurtain config -> Memory (#215).
Real-time LLM token stream observation — the MITM proxy taps Anthropic/OpenAI SSE and JSON responses inside Docker agent sessions and publishes structured TokenStreamEvents on a shared pub/sub bus. New ironcurtain observe command renders a Matrix-style data rain panel alongside formatted tool calls, results, thinking text, and assistant output by subscribing through the daemon's WebSocket. Workflow summary totalTokens is now accumulated from message_end events across all the workflow's agent sessions and displayed in the UI (#178, #211).
Matrix rain login page — web UI auth screen renders a Canvas 2D Matrix rain that assembles into the "IronCurtain" wordmark (#180).
IRONCURTAIN_MITM_ALLOW_ALL_HOSTS escape hatch — opt-in env var makes the MITM proxy treat every unknown host as a passthrough TCP tunnel, bypassing the CONNECT allowlist for HTTPS, plain HTTP, and WebSocket upgrades. Provider/registry traffic is unchanged (TLS termination and credential swap still apply); only unknown hosts get the wildcard. Logs a WARN on proxy startup so the posture downgrade is visible in the audit trail (#249).
Sudo and apt-get inside agent containers — Linux capabilities (SETUID, SETGID, CHOWN, FOWNER, DAC_OVERRIDE, AUDIT_WRITE) added so sudo works despite --cap-drop=ALL; python3-pip baked into both base images; /etc/apt/apt.conf.d/90-ironcurtain-proxy written into containers so apt-get routes through the MITM proxy; MITM plain-HTTP forwarding fixed for Debian registry hosts (#164).
Annotation drift warnings — three drift-detection warnings between configured MCP servers and tool-annotations.json (policy-load time, MCP-connect time, and stale-on-disk). Previously such mismatches surfaced as silent per-call default-denies via the policy engine's structural-unknown-tool fallback (#193).
Ollama-style model IDs in workflow YAML — settings.model and per-state model fields now accept opaque name:tag forms (e.g. glm-5.1:cloud) via a new looseModelId schema, while ~/.ironcurtain/config.json slots stay on strict qualifiedModelId (#194).
Behavior changes
Silent fallback to the builtin agent is gone.UserConfig now carries preferredMode: 'docker' | 'builtin' (default 'docker'); if preferredMode: 'docker' and Docker is unavailable, the session refuses to start with a remediation hint instead of silently dropping into builtin. --agent CLI flag continues to win over config. ironcurtain doctor treats declared-but-unmet preferences as failures (exit 1). Borderline-breaking for users who relied on the implicit downgrade — set preferredMode: 'builtin' explicitly, or pass --agent builtin, to keep the old behavior. Closes feedback where testers ran in builtin mode for hours without noticing (#225).
MCP relays only spawn for servers the policy actually references.extractRequiredServers(policy) walks rule.if.server in the active compiled policy; unreferenced servers are dropped from mcpServers before relay subprocesses start. Default-deny would have rejected every call to them anyway. The workflow factory passes the per-scope union across all personas with the same containerScope (#208).
Spawn only MCP servers that successfully connect. When every backend of an MCP proxy subprocess fails to connect (missing env var, missing OAuth, etc.), the proxy now exits non-zero instead of staying alive with an empty tool list and causing every annotated tool to be flagged by the drift check (#259).
Workflow definitions are directories. Bundled workflows ship as <name>/workflow.yaml (with optional sibling skills/) rather than a single JSON file; YAML is preferred for new authoring. Custom .json user workflows continue to work (#169, #227).
Fixes
ironcurtain doctor exit code + preflight cwd resolution — doctor now exits non-zero when a declared preferredMode cannot be honored; the sandbox-viability preflight resolves @utcp/code-mode from the parent via createRequire().resolve() so running ironcurtain from outside the install tree (e.g. after npm install -g, from ~ or /) no longer falsely reports a missing dependency (#266).
Workflow agents wedged by ScheduleWakeup — the schedule built-in skill's tools are stripped from /v1/messages request bodies for workflow agents, and conversation-history references to those tools (Claude Code's ToolSearch deferred-tool fetcher surfaces them as tool_reference entries and <function>{...}</function> schema blocks) are scrubbed so Anthropic doesn't 400 the next request. Closes a workflow-abort failure mode reproduced in multi-hour runs (#258, #263).
Per-leg state directories — workflow forensic dirs (states/{stateId}.{N}/) are now keyed on a disk scan via nextStateSlug() rather than the FSM visitCounts, so resume legs of a single visit land in fresh dirs instead of overwriting the original leg's session.log and session-metadata.json (#264).
Per-hypothesis discovery/triage files — discover and triage states write per-hypothesis files instead of overwriting a single shared artifact, preserving evidence across hypotheses (#264).
Status-block reprompt anchored on final response — buildStatusBlockReprompt now tells the agent to emit agent_status on the last response only, fixing a harness_build failure where multi-checkpoint runs emitted intermediate blocks and a prose-only final turn that Claude Code's -p JSON output dropped (#254).
Rotate agent conversation id on upstream stall — Claude Code mid-stream kill (exit 143 + empty output) now retries the original prompt up to 2 times with a freshly-minted agent conversation id, instead of sending a "missing status block" reprompt against a consumed session id (#195).
429 quota-exhaustion resilience — workflow runs detect upstream 429s, short-circuit retries, preserve the checkpoint, and exit as aborted (not completed) so workflow resume accepts the run once the quota window reopens. Sustained upstream stalls (exit=0 + usage.output_tokens === 0 + stop_reason === null) are now classified as resumable transient failures via AgentResponse.transientFailure (#198, #210).
MITM token routing in shared-container workflows — the proxy's routing sessionId is now mutable and flipped by the orchestrator around each executeAgentState; per-response sidAtAttach/sidForToolResults snapshots prevent a mid-stream flip from splitting a single SSE response across two ids (#211).
Deterministic-state failures forward to the next agent — orchestrator now propagates deterministic-state errors as agent_status notes so downstream agents see the failure (#242).
MCP isError surfaces to Code Mode — sandbox now throws on MCP isError so Code Mode LLMs see tool failures instead of silently consuming an error envelope (#185).
Docker batch mode multi-turn context — switched from claude -p --continue (silently no-ops in non-interactive print mode) to --session-id <uuid> on first turn + --resume <uuid> thereafter, fixing context loss in web UI, cron, and workflow Docker sessions (#177).
Mux PTY session MITM bridge — mount the bundle sockets dir so the MITM bridge is reachable (#209).
Web UI bad-auth-token recovery — UI no longer wedges on a malformed bearer token; surfaces a re-auth flow (#182).
Web UI stuck at waiting_human after gate rejection — phase tracking refreshes after a gate verdict (#173).
Web UI Matrix rain review fixes — accessibility and performance polish to the login Canvas (#183).
Web UI markdown rendering — use prose-markdown class for consistent typography (#170).
Web UI e2e tests — repair after mobile drawer + gate auto-fetch changes (#212).
Goose PTY readiness probe — stop spawning a doomed agent process during the readiness check (#226).
Shared validatePolicyDir helper — src/config/validate-policy-dir.ts realpath-resolves candidate policy directories and enforces containment under the IronCurtain home or the package config dir; CLI flags, the loadPolicy RPC, and session creation all funnel through it (#207).
Shared applyAllowedDirectoryToMcpArgs helper — single source of truth in src/config/index.ts for keeping mcpServers.filesystem.args in sync with the active allowedDirectory; fixes stale paths in shared-container workflow runs.
Audit stream errors latch — AuditLog now remembers stream errors and surfaces them synchronously on the next log() instead of silently dropping entries.
Workflow run directory hardened at 0o700 — chmodSync enforces the mode so the control socket and audit log are protected by filesystem permissions.
Zero-constraint whitelist no longer blanket-approves add_proxy_domain (#151).
Conversation state directory mounted in Docker batch mode (#152).
Base-URL env vars validated — ANTHROPIC_BASE_URL, OPENAI_BASE_URL, GOOGLE_API_BASE_URL are now z.url()-validated at config load (#257).
npm test -- <file> forwards the filter to vitest.
Claude Code v2 telemetry and MCP marketplace endpoints allowed through the MITM proxy.
Sanitize agent output by escaping NUL bytes and truncating to 32KB.
Hash workflow artifact metadata, not contents — avoids re-hashing huge artifacts on each transition.
PolicyEngine + AuditLog centralized into ToolCallCoordinator — the security pipeline (tool-call-pipeline.ts) now owns all policy/audit/circuit-breaker/whitelist state; MCP proxy server subprocesses became pure relays (#179).
DockerInfrastructure bundle with explicit lifecycle — reframes the implicit "bag of stuff the session owns" into a typed handle with paired createDockerInfrastructure / destroyDockerInfrastructure and an ownsInfra flag on DockerAgentSession. Pure refactor, prerequisite for shared-container mode (#184).
Module layering rules — guidance added to per-directory CLAUDE.md files; src/pipeline/ is offline tooling that live-session runtime must not value-import. URL normalizers, list matcher, session error hierarchy, server listing, llm-logger, and misplaced constants moved to their correct layers; event bus generified; WorkflowManager moved out of src/web-ui/ (#216, #217, #218, #219, #220, #221, #222, #223, #224).
TokenStreamBus migrated to module singleton (#189).