approval-gate simplification — rules engine, three-state lifecycle, single drain RPC by ytallo · Pull Request #146 · iii-hq/workers

ytallo · 2026-05-16T11:59:50Z

Summary

Collapses approval-gate's three overlapping decision surfaces (classifier interceptor + apply_policy_rules + run-level approval_required) into one rules engine, simplifies the lifecycle to three states, and replaces the four-RPC delivery handshake with one atomic drain.

One decision path: every before_function_call hook resolves through rules::evaluate(function_id, pattern_for(args), ruleset) returning Allow / Ask / Deny. The classifier-as-LLM surface is gone (InterceptAction, decide_intercept_action, ClassifierDecision, interpret_classifier_reply, apply_policy_rules, PolicyOutcome all deleted).
Record { status: Pending | InFlight | Done, outcome: Outcome }: InFlight is the intermediate persist that closes the duplicate-execution race a second resolve would otherwise trigger. Replaces the older multi-status flow.
Single drain RPC: approval::consume returns Done rows and deletes them atomically (capped, FIFO by resolved_at, defensive session-id filter). The list_undelivered / ack_delivered / consume_undelivered / flush_delivered handshake is gone.
No sweeper: the 2-second background polling task is deleted. Pending → TimedOut flips lazily on read in handle_resolve / handle_consume / handle_list_pending.
Shell is a plain executor: the __from_approval marker is gone on both write (approval-gate) and verify (shell). classify_argv, approval_bypass, arity.rs, and the ShellConfig allowlist / denylist_patterns fields are deleted — command-level policy lives upstream in approval-gate's rules.
Curated default ruleset ships in iii.worker.yaml so empty operator config isn't "prompt for every tool call" — read-only fs::* and read-only git auto-allow; the rest asks.

Decisions worth a careful look

Cascade narrows to the exact pattern, not *. Clicking Allow + Always on git status pushes a runtime Rule { permission: "shell::exec", pattern: "git status", action: Allow } — earlier drafts pushed pattern: "*", which would have auto-allowed every shell::exec for the rest of the session including rm -rf /. Lives in approval-gate/src/resolve.rs.
No-match default = Ask + curated default ruleset. before_function_call fires for every tool call, so empty rules would prompt for everything. The shipped defaults in iii.worker.yaml cover the noisy read-only paths.
Threat-model shift. Removing the __from_approval marker means bus access ≡ shell access; the harness's bus-level access control is now the perimeter. Spelled out in the design doc under "Threat model".
resolved_at is kept, not derived. Needed for deterministic LLM stitch order when one consume drains multiple cascade-resolved rows (function_call_id is provider-minted and not lex-monotonic, so it can't substitute).

Notable change surface

Area	What
`approval-gate/src/`	`rules.rs`, `intercept.rs`, `resolve.rs`, `state.rs`, `delivery.rs`, `register.rs` split out of the old `lib.rs`. `sweeper.rs` removed in favor of lazy timeout flip. `WorkerConfig` reduced to `{topic, approval_state_scope, default_timeout_ms, rules}`.
`approval-gate/tests/lifecycle.rs`	E2E coverage of allow / deny / timeout / cascade against in-memory fakes.
`shell/src/`	`classify_argv`, `approval_bypass`, `arity.rs` deleted. `ShellConfig` strips `allowlist` and `denylist_patterns`.
`shell/tests/e2e/`	Allowlist/denylist test cases and config keys removed; remaining safety guardrails (timeout, output truncation, env scrubbing, fs path denylist) kept.
`turn-orchestrator/src/states/assistant.rs`	`consume_approval_stitch` now drives `approval::consume`.
`harness/web/src/components/ApprovalRow.tsx`	Allow + Always button, optional Deny correction feedback, client-side `expires_at` countdown.
`harness/src/fanout.rs`	Documented follow-up: switch from `list_pending` polling to `agent::events` subscription. Polling still works; deferred.

Net diff: ~3.6k lines removed across 63 files; 56 approval-gate lib tests + 5 E2E lifecycle tests pass.

Test plan

cargo test --manifest-path approval-gate/Cargo.toml — 56 lib tests + 5 E2E lifecycle tests pass.
shell/tests/e2e suite (CI workflow .github/workflows/shell-e2e.yml) — runs ./run-tests.sh + ./run-tests-jailed.sh against a real iii engine on ubuntu-latest.
Manual: allow / deny / cascade / timeout flows through the harness UI.

Introduce a first-class Rule/Ruleset/Action surface ported from opencode's Permission.evaluate. Rules are wildcard globs over the iii function id (pattern is "*" in v1 — the field is reserved for per- function pattern extractors in a follow-up). Evaluator semantics: findLast across flattened layers, so operators stack a permissive default with more-specific overrides without surgery on the base list. approval-gate - New rules module: Action (allow/deny/ask), Rule, Ruleset alias, wildcard_match (DP, no regex), evaluate(perm, pattern, IntoIterator of &Rule). 16 unit tests covering matching, layering, last-wins. - WorkerConfig.rules: Vec<Rule> (additive; existing InterceptorRule flow unchanged). - apply_policy_rules: pure helper that maps a rule hit to PolicyOutcome (Allow / Deny{rule_permission, rule_pattern} / FallThrough). Allow short-circuits to pass-through; Deny short-circuits with a Denial::Policy that names the matched rule; Ask and no-match fall through to the existing per-function interceptor flow. - handle_subscriber consults the rules layer first, before decide_intercept_action.

When a user resolves a pending call with decision=allow + always=true, approval-gate now pushes a runtime Allow rule for the call's function id and sweeps the same session's pending records, auto-resolving every other one the new rule covers. One click instead of N. approval-gate - approve_and_execute() extracted from handle_resolve's allow branch as a reusable async helper. Both the user-driven allow path and the cascade sweep drive state through the same transitions (approved -> invoke -> executed | failed). - Shared policy ruleset wrapped as Arc<RwLock<Ruleset>> so reply-time mutation is safe across the subscriber + resolve closures. Read guards are scoped so they never cross an .await (std::sync::RwLock is not async-safe to hold across suspension). - cascade_allow_for_session() pushes the new rule under the write lock, snapshots session pending via list_prefix, and runs approve_and_execute for each newly-Allow record. The originator is skipped (already resolved above). Per-record state-write failures are logged and the rest of the cascade continues. - handle_resolve response carries `cascaded: N` when the sweep resolved at least one extra record; omitted otherwise so the one-shot path stays unchanged. Cascade scope is intentionally narrow for v1: - function-id-only matching (pattern "*"), mirrors the v1 rules surface - same session only - in-memory rule (no cross-restart persistence) - allow + always; deny + always is out of scope Five new tests cover the cascade decision tree (no-always, same-session match, cross-session non-effect, originator-skip, terminal-record skip).

The Denial refactor replaced decision_reason with a typed structure, but several legacy hooks remained in place. They are now removed because the underlying schemas they protected against (pre-trigger-model status strings, pre-Denial flat decision_reason field) cannot occur in any state store this branch is wired to. approval-gate - Drop migrate_legacy_record entirely. The pre-trigger-model status="allow"/"deny" rename and the decision_reason -> Denial::Legacy lift both go away. Old persisted records (if any) now surface as filtered-out orphans rather than silently mistranslated. - Drop Denial::Legacy variant from the enum. - Drop legacy_migrated flag from records. - Drop the four migrate_legacy_record_* unit tests and the handle_list_undelivered_persists_migrated_legacy_record integration test. - handle_list_pending no longer pre-filters legacy shapes; orphan records lacking a session_id stamp are dropped uniformly. - timeout_resolved_event drops the decision_reason: "timeout" field (status: "timed_out" is self-describing). - handle_sweep_session drops the optional `reason` payload field. The cause (session_deleted vs run_stopped) is the caller's concern; the swept timed_out records carry no Denial. - skills/sweep_session.md updated. turn-orchestrator - approval_stitching drops the legacy denial-kind branch and the legacy_migrated note line; the corresponding stitch test is removed. - run_stop no longer forwards reason: "run_stopped" on the approval::sweep_session payload; it logs the cause locally instead. - The run_stop integration test now asserts the absence of the reason field on the sweep payload. harness-types (6 vendored copies, byte-identical) - Drop the Legacy variant and its docblock line from each agent_event.rs copy.

New src/record.rs defines the persisted approval record as a typed struct alongside its lifecycle Status enum and a Next enum that pairs each target status with its outcome data. Wire format is byte-identical to the existing serde_json::Value blobs — operators can adopt the typed shape incrementally without changing any persisted state. approval-gate - New module record: - Status enum (Pending / Approved / Executed / Failed / Denied / TimedOut), serde snake_case, with is_terminal() helper. - Record struct mirroring the historical JSON keys field-for-field, optional outcome fields scoped to terminal statuses, serde skip-when-None so existing rows round-trip cleanly. - Next enum that bundles (target_status, outcome_payload) so the type system rules out impossible transitions (e.g. Executed without a result, Denied without a Denial). Replaces the old (status_string, Option<result>, Option<error>, Option<denial>) parallel-Option signature when callers opt in. - Record::to_value / Record::from_value bridge to the Value blobs the iii state bus still expects. - 8 unit tests covering serde round-trips, status semantics, expiry saturation, forward-compat with unknown fields. - lib.rs re-exports Next, Record, Status. Handler migration to the typed schema is a separate change; this commit just makes the types available so future work can lift Value access into typed field access incrementally rather than in one big diff.

First step of breaking lib.rs (~4300 lines) into focused modules. This commit lifts the wire-format types and small wire-shape helpers into their own module — pure data shapes with no I/O, no iii-sdk dependency, no async — so downstream workers that only need to understand the approval-gate protocol can depend on a small surface. Moved (lib.rs -> wire.rs): - Denial (structured deny payload) - IncomingCall + requires_approval() - Decision (Allow | Deny(Denial)) - WireDecision (allow|deny wire enum) - pending_key() - extract_call() - block_reply_for() All public items stay re-exported from the crate root, so every existing call site (handlers, tests, integration tests, downstream crates) continues to work without import changes. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + session + providers + harness-tui + shell all pass.

Second step of breaking lib.rs into focused modules. This commit lifts the persisted-record lifecycle helpers — pure Value-blob constructors and transitions, no I/O, no iii-sdk dependency — into their own module. Moved (lib.rs -> lifecycle.rs): - is_terminal_status() - build_pending_record() - transition_record() + transition_record_with_now() - collect_timed_out_for_sweep() - maybe_flip_timed_out() All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass.

Third step of breaking lib.rs into focused modules. This commit lifts the iii-backed state-bus and function-executor implementations into their own module, along with the `__from_approval` marker plumbing and the boot-time marker-target safety check. Moved (lib.rs -> state.rs): - StateBus trait + IiiStateBus impl (iii state::* wrapper) - FunctionExecutor trait + IiiFunctionExecutor impl (iii.trigger wrapper) - rule_for() (lookup helper used by IiiFunctionExecutor) - merge_from_approval_marker_if_needed() (marker payload composer) - unverified_marker_targets() (boot guard) The StateBus / FunctionExecutor traits are kept exactly as they were in lib.rs — they exist purely as test seams so unit tests can swap in InMemoryStateBus / FakeExecutor. No new abstractions added. Public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass (turn-orchestrator's pre-existing dual_write flake notwithstanding).

Fourth step of breaking lib.rs into focused modules. This commit lifts the intercept decision flow into its own module — the three layers that decide what the gate does with an incoming function call: policy rules, per-function interceptor rules, and classifier replies, plus the async handle_intercept that writes the pending record. Moved (lib.rs -> intercept.rs): - InterceptAction enum + decide_intercept_action() - PolicyOutcome enum + apply_policy_rules() - ClassifierDecision enum + interpret_classifier_reply() - handle_intercept() handle_intercept stays public (re-exported from the crate root). The pub(crate) helpers stay pub(crate) and are imported back into lib.rs for the subscriber closure in register(). Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green.

Fifth step of breaking lib.rs into focused modules. This commit lifts the approval::resolve flow into its own module. Moved (lib.rs -> resolve.rs): - handle_lookup_record() - handle_resolve() - cascade_allow_for_session() (the always=true sweep) - approve_and_execute() (shared by user-driven allow + cascade) handle_resolve and handle_lookup_record stay public via the crate root re-exports. approve_and_execute and cascade_allow_for_session stay private to the resolve module. Pure move, no behavior change. 141 approval-gate tests pass.

Sixth step of breaking lib.rs into focused modules. This commit lifts the delivery-tracking RPCs into their own module. Moved (lib.rs -> delivery.rs): - handle_list_pending() - handle_list_undelivered() - handle_ack_delivered() - handle_consume_undelivered() - handle_flush_delivered() - handle_sweep_session() - LIST_UNDELIVERED_DEFAULT_LIMIT All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass.

Seventh step of breaking lib.rs into focused modules. This commit lifts the periodic timeout sweeper and the small iii-stream helpers it owns into their own module. Moved (lib.rs -> sweeper.rs): - spawn_timeout_sweeper() - timeout_resolved_event() - write_event() - write_hook_reply() - uuid_like() All five stay pub(crate) — only the register() wiring in lib.rs and the resolve flow need them. Pure move, no behavior change. 141 approval-gate tests pass.

Final step of breaking lib.rs into focused modules. This commit lifts the iii function/trigger wiring — the heart of the worker startup path — into its own module. Moved (lib.rs -> register.rs): - register() (~420 lines: the subscriber closure + 8 function registrations + trigger registration + sweeper spawn) - Refs struct (the handle bag returned to the binary) - FN_RESOLVE / FN_LIST_PENDING / FN_LIST_UNDELIVERED / FN_CONSUME_UNDELIVERED / FN_ACK_DELIVERED / FN_FLUSH_DELIVERED / FN_SWEEP_SESSION / FN_LOOKUP_RECORD constants - STATE_SCOPE constant lib.rs is now ~50 lines of module declarations + re-exports + ~2580 lines of inline tests. The test mod uses super::* plus local serde_json and std::sync imports. All public items stay re-exported from the crate root so the binary's `use approval_gate::register;` keeps working unchanged. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green (turn-orchestrator's pre-existing dual_write flake notwithstanding).

First step of relocating the 2580-line inline test block from lib.rs into more focused homes. This commit moves the 18 tests that touch pub(crate) helpers — ones that can't live in tests/*.rs because integration tests can only see pub items — into the #[cfg(test)] mod tests block of whichever production module owns the helper they exercise. intercept.rs gained: - interpret_classifier_reply_reads_decision_tags - decide_intercept_action_* (5 tests) - apply_policy_rules_* (5 tests) state.rs gained: - merge_from_approval_* (4 tests) - rule_for_returns_matching_rule / rule_for_returns_none_when_absent sweeper.rs gained: - timeout_resolved_event_shape lib.rs loses those 18 tests. 141 approval-gate tests pass (no duplicate counts — each test runs once). turn-orchestrator + harness + shell verified green. Public-API tests (the remaining ~96 in lib.rs) move to tests/*.rs in a follow-up commit so they can share fakes via a common module.

Final test split. The 86 remaining inline tests in lib.rs that exercise only the crate's `pub` surface now live in tests/*.rs, organized by the area they cover. The state-machine proptest gets its own file. lib.rs drops from 2369 to 52 lines (just module declarations + re-exports). New tests/ layout: - common/mod.rs — shared fakes: FakeExecutor, InMemoryStateBus, FailingStateBus, sample_call(), empty_policy_rules() - lifecycle.rs — 19 tests: build_pending_record, transition_record, maybe_flip_timed_out, collect_timed_out_for_sweep, is_terminal_status, pending_key - wire.rs — 10 tests: extract_call, block_reply_for, requires_approval - intercept.rs — 9 tests: handle_intercept (replays, fail-closed, session_id stamping, force_pending) - resolve.rs — 19 tests: handle_resolve, cascade-on-`always`, handle_lookup_record - delivery.rs — 25 tests: list_pending / list_undelivered / ack_delivered / consume_undelivered / flush_delivered / sweep_session - misc.rs — 4 tests: FN_* constants, unverified_marker_targets, FakeExecutor smoke test - state_machine.rs — the proptest with the four lifecycle invariants Each tests/*.rs uses `mod common;` for shared fakes — only one source of truth for in-memory bus/executor stand-ins. 141 approval-gate tests pass across 13 test binaries (previously 6). turn-orchestrator + harness + shell verified green.

…ecycle Pending → InFlight → Done(Outcome). InFlight is the intermediate persist that closes the dup-exec race within a worker process — a second approval::resolve arriving during the invoke await sees the row is non-Pending and bails. Outcome is a tagged enum (Executed/Failed/Denied/TimedOut). resolved_at is stamped on the first non-Pending transition for deterministic multi-row consume ordering. lifecycle.rs is deleted; its only surviving helper (flipped_to_timed_out_if_expired) is now a Record method.

Task 1 of the simplification plan deletes lifecycle.rs entirely, but the plan was a bit optimistic: dependent modules (intercept, resolve, delivery, sweeper) still import its helpers (build_pending_record, transition_record, maybe_flip_timed_out, is_terminal_status, collect_timed_out_for_sweep) and the crate fails to compile without them. Cargo can't run record.rs's unit tests if the library itself won't link. Restore lifecycle.rs as a transitional shim. Tasks 5/6/8/11 will progressively migrate each callsite to the new Record API; whichever of those tasks lands last deletes lifecycle.rs for real.

…ttern The classifier surface is being deleted in favor of the layered rules engine; rename the Policy variant's detail fields to match the actual data they'll carry once T5 lands. For now, classifier callsites (intercept.rs, register.rs) map their old reason/fn locals into the new field names — transient mapping, deleted in T5. Test callsites in tests/wire.rs and tests/lifecycle.rs updated to use representative rule_permission/rule_pattern values. Note: IncomingCall.approval_required + requires_approval() field/method are NOT removed in this commit (the plan's T2 calls for it but intercept.rs's decide_intercept_action still depends on them). They disappear in T5 when intercept.rs is rewritten end-to-end.

For shell::exec/shell::exec_bg, derive a pattern string from {command, args} so layered rules can match argv-level patterns like 'git status*'. Other function ids default to '*'. Malformed/missing args fall back to '*' (matches only wildcard rules). 8 new tests cover all branches: join+args, single-string command, empty args list, missing command, malformed args, non-shell function id, and the documented space-conflation case.

Required by the new approval::consume RPC (T8), which returns Done rows and deletes them in the same call. IiiStateBus impl maps to the existing state::delete RPC (used elsewhere in the codebase, e.g. auth-credentials). Test fakes (InMemoryStateBus, FailingStateBus) gain matching impls — InMemory removes the key idempotently, Failing errors to match its other primitives.

The classifier surface and per-function InterceptorRule decision flow are gone. handle_intercept now reads the verdict via the new lib-level verdict_for(function_id, args, ruleset) helper: Verdict::Allow → {block:false}, no state write Verdict::Deny → {block:true, denial:Policy{rule_permission, rule_pattern}} Verdict::Ask → write Pending row + reply {block:true, status:pending} Replay defense extended for the new 3-state schema: Pending → replay:in_flight, status:pending InFlight → replay:in_flight, status:in_flight (new) Done → replay:already_resolved, status:done intercept.rs drops InterceptAction, decide_intercept_action, ClassifierDecision, interpret_classifier_reply, PolicyOutcome, apply_policy_rules. register.rs's subscriber closure collapses the old policy_outcome+decide_intercept+classifier dance into one handle_intercept call. resolve.rs's cascade loop reuses verdict_for instead of the deleted apply_policy_rules. State-write failure still fails closed with Denial::StateError. 7 new tests cover all four verdict branches + the three replay states + fail-closed behavior. Note: InterceptorRule config struct and intercept_rules variable are still in place for T12 to fully remove.

handle_resolve rewritten to typed Record + three-phase allow path: 1. write InFlight (closes the dup-exec race within a worker process) 2. iii.trigger(function_id, args) and await 3. write Done(Executed{result}) or Done(Failed{error}) Deny is a single Pending → Done(Denied{denial}) write — no invoke, no InFlight needed. WireDecision::Deny with no denial in payload defaults to UserRejected. UserCorrected{feedback} round-trips. Dup-exec guard: handle_resolve refuses non-Pending rows with in_flight (concurrent resolve mid-invoke) or already_resolved (terminal). Cascade allow on always:true (T7) now pushes a runtime Allow rule with the originator's EXACT pattern via rules::pattern_for(args) — not the prior blanket pattern:'*'. 'Always allow git status' no longer auto-allows rm -rf / via the same shell::exec function id. Lock-ordering invariant pinned in module docs: never hold the ruleset guard across .await.

…elete sweeper.rs/lifecycle.rs (T8+T9+T11) Three behavioral changes folded into one commit because they share files: T8 (new approval::consume RPC): Three-phase drain — gather Done candidates, sort by resolved_at + cap, delete-and-return. Defensive session_id filter. Lazy timeout flip on read. Default cap CONSUME_DEFAULT_LIMIT=50 bounds the response size against MB-scale stdout payloads. T9 (delivery.rs strip): handle_list_undelivered, handle_consume_undelivered, handle_ack_delivered, handle_flush_delivered all deleted. LIST_UNDELIVERED_DEFAULT_LIMIT constant gone. handle_list_pending rewritten on the typed Record API with lazy timeout flip on read. handle_sweep_session rewritten on new schema — flips Pending+InFlight rows to Done(TimedOut). T11 (sweeper.rs deleted): Background polling task gone. Timeouts now flip lazily on read in handle_resolve / handle_consume / handle_list_pending. UI handles expires_at countdown client-side; the LLM learns of timeouts on the next consume. lifecycle.rs transient shim also deleted (no more callers). Stream helpers (uuid_like, write_event, write_hook_reply) move into register.rs as their only consumer; spawn_timeout_sweeper + timeout_resolved_event deleted with the rest of sweeper.rs. register.rs surgery: Refs struct loses 5 dead FunctionRef fields and the sweeper JoinHandle. RPC registrations for list_undelivered, consume_undelivered, ack_delivered, flush_delivered all gone. New FN_CONSUME registration added. Classifier-alias warning check trimmed to live function ids. 15 new tests cover handle_consume (7) + handle_sweep_session (1) + handle_list_pending lazy flip (1) + the pre-existing edits. 65 lib tests pass total.

IiiFunctionExecutor::invoke forwards function_id+args directly to iii.trigger — no marker stamp. Deleted from state.rs: - merge_from_approval_marker_if_needed (and its 4 unit tests) - unverified_marker_targets (and the 2 rule_for tests) - rule_for (only marker-related callers) Deleted from register.rs: - IiiFunctionExecutor's rules: Arc<Vec<InterceptorRule>> field - Boot-time unverified-marker check (refused-to-start guard) Per the spec's Threat Model section: bus access ≡ shell access in the new model. The harness's bus-level access control is the perimeter; defense-in-depth via per-target marker verification is out of scope for v1. Shell-side marker verification is deleted in T13. T12 will fully retire InterceptorRule + the classifier-alias warning loop in register.rs.

config.rs: WorkerConfig now has just {topic, approval_state_scope, default_timeout_ms, rules}. interceptors + sweeper_interval_ms fields gone. InterceptorRule struct kept as a no-op shim with minimal fields so the classifier-alias warning loop in register.rs still compiles — the loop is fed an empty Vec so the body never runs. iii.worker.yaml: replaced interceptor config with a curated default ruleset. Read-only fs/git auto-allowed; shell::exec/exec_bg ask; catch-all asks. Operators stack their own rules on top (last-match wins). register.rs cfg.interceptors → cfg.rules wiring confirmed: policy_rules is seeded from cfg.rules at startup and remains mutable via cascade. Note: InterceptorRule shim + classifier-alias warning loop in register will be deleted in a follow-up pass when the dependent code is fully cleaned up. They're cosmetic at this point — neither has functional effect.

vercel · 2026-05-16T11:59:55Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
workers	Error		May 16, 2026 0:53am

coderabbitai · 2026-05-16T11:59:56Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d6acc78-ebe5-4695-884c-3a64ee311c17

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch approval-gate-simplification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-16T12:00:06Z

skill-check — worker

3 verified, 22 skipped (no docs/).

1 error across the verified workers.

File	Approximate line	Severity	Violation
`shell/docs/leaves/kill.md`	~7	error	[Terminology.SlopMarketing] Avoid marketing/anthropomorphic phrasing 'Reaping'. Use concrete, technical language. (error)

… (T13) Shell becomes a plain executor. All policy decisions live in approval-gate's rules layer (see the layered ruleset shipped in approval-gate/iii.worker.yaml). Deleted: - shell/src/functions/approval_bypass.rs (marker validation module) - shell/src/functions/classify.rs (classifier handler) - shell/src/arity.rs (arity-aware allowlist matcher) - ApprovalMarker + ClassifyArgvRequest types - ExecRequest.from_approval + ExecBgRequest.from_approval fields - ShellConfig.allowlist + allow_any + denylist_patterns + compiled_denylist fields and their methods (compile_denylist, denylist_hit_reason, allowlist_contains, is_command_allowed) - shell::classify_argv function registration - shell::classify_argv manifest entry - All allowlist/denylist unit tests in shell/src/config.rs - One test in shell/tests/function_handlers.rs that asserted shell rejected unlisted commands Kept (independent surface): - fs::host_root jail + fs.denylist_paths (filesystem path-based denials for shell::fs::* tools; distinct from the exec-policy denylist) - The fs jail validate_fs_jail boot check Threat model implication, per the spec: bus access ≡ shell access. The harness's bus-level access control is the perimeter; any worker on the bus can call shell::exec with arbitrary args. Defense-in-depth via shell-side allowlists is gone for v1. 133 shell lib tests + 328 shell integration tests pass.

consume_approval_stitch now calls approval::consume (single RPC, returns + deletes in one shot). Payload is { session_id } only; no turn_id, no limit (gate enforces a default cap of 50, surfacing overflow via the 'omitted' counter as before). stitch_entries rewritten for the new Record wire shape: reads nested outcome: { kind, detail } instead of the old top-level status + result/error/denial. Outcome kinds: executed, failed, denied, timed_out. function_call_id is the canonical key (call_id retained as a legacy fallback for old rows on disk during the upgrade). render_denial_lines's policy branch reads rule_permission/rule_pattern (matching the renamed Denial::Policy detail shape from T2). The 'policy denied by classifier_fn: classifier_reason' wording becomes 'policy denied by rule <perm> : <pat>'. omission_summary_message reworded: 'approval::flush_delivered' is gone; the natural recovery path is the next-turn consume, which is what the new message points at. 131 lib tests pass (was 122 + 9 failing from old-shape assertions). Pre-existing integration test failure (tests/run_stop.rs imports run_stop module that isn't declared in lib.rs) is unrelated to this refactor.

…sts (T17) Delete 8 integration test files whose assertions targeted the deleted surface (FN_LIST_UNDELIVERED, FN_ACK_DELIVERED, transition_record, InterceptAction, marker plumbing, ...): - tests/approval_lifecycle.rs (engine-backed, used deleted RPCs) - tests/delivery.rs (old delivery RPC surface) - tests/integration.rs (old subscriber wiring) - tests/intercept.rs (decide_intercept_action, classifier path) - tests/lifecycle.rs (record-helper tests on deleted schema) - tests/misc.rs (mixed assertions on old shape) - tests/resolve.rs (old transition_record / Approved status) - tests/state_machine.rs (proptest dep + old schema) The src/* unit tests added across T1-T12 cover the equivalent surface for the new wire shape (Pending|InFlight|Done, Outcome enum, Denial::Policy {rule_permission, rule_pattern}, etc.). Add tests/lifecycle.rs with FIVE E2E lifecycle tests against the in-memory StateBus + FakeExecutor — the integration safety net for the whole refactor: 1. allow_path_end_to_end — pending → InFlight → executed → consume drains 2. deny_path_with_user_corrected_feedback_end_to_end — feedback round-trips 3. timeout_path_lazy_flips_on_consume_end_to_end — past expires_at flips on read 4. cascade_path_end_to_end — two pending rows; allow+always pushes exact-pattern rule; second auto-resolves; consume drains both; pinned: pushed pattern is the originator's argv ('echo go'), NOT blanket '*' 5. allow_rule_short_circuits_with_no_state_write — bonus: Verdict::Allow doesn't touch state 72 approval-gate tests pass across 6 suites (lib + wire + manifest + lifecycle + 2 from common helpers). Net deletion: ~2500 lines of old-shape integration tests gone, ~250 lines of new E2E in their place.

…(T15) ApprovalRow.tsx grows three real-feature additions matching the approval-gate refactor's new wire surface: 1. Allow+Always button → sends {decision:'allow', always:true}. The gate cascades by pushing a runtime Allow rule with the originator's exact pattern and sweeping the session's other Pending rows (auto-resolving any that match). The response carries cascaded:N when the sweep was non-empty; not surfaced in the UI yet. 2. Optional Deny feedback textarea → when the operator types a correction message before clicking deny, the payload becomes {decision:'deny', denial:{kind:'user_corrected', detail:{feedback}}}. That feedback flows verbatim to the LLM via the stitched message on the next consume — the high-value path because the model gets actionable correction, not just 'denied'. 3. Client-side expires_at countdown → the modal shows MM:SS remaining and disables actions on expiry. No server-emitted timed_out frame anymore (the gate's sweeper is gone); the LLM learns of timeout via the next-turn approval::consume that lazy-flips the row. Each pending row is split into its own ApprovalCard component so the countdown hook respects rules-of-hooks (can't sit inside a .map callback). ResolvePayload is constructed inline so the bridge's Record<string, unknown> type accepts it without an explicit cast at the interface boundary. CSS classes for the new states (approval-allow-always, approval-countdown, approval-feedback, approval-expired-note) inherit default styling for now; a follow-up styling pass can polish.

The approval-gate refactor's T16 called for switching the fanout from polling approval::list_pending to subscribing to agent::events for approval_requested / approval_resolved frames. The polling implementation continues to work correctly against the new wire shape — diff_approvals reads function_call_id (now top-level on Record) and list_pending was kept by approval-gate's T8/T11 — so this is a perf cleanup, not a correctness fix. Comment the constant accordingly; defer the actual stream subscription to a follow-up PR. No code change beyond the doc.

Shell stopped enforcing command-level policy; the e2e suite still referenced the removed allowlist/denylist_patterns config fields and pinned behavior that no longer exists. Strip those expectations and update the docs that described them. - config.yaml / config-jailed.yaml: remove allowlist + denylist_patterns (would fail YAML parse against the new ShellConfig schema) - cases-safety.ts: drop 3 allowlist/denylist assertions; keep one positive 'absolute-path command runs' case - cases-edge.ts: 'nonexistent unlisted command' rewritten to assert the OS spawn error (ENOENT), not an allowlist message - cases-exec-break.ts: drop 'unlisted command with extra fields' - cases-exec-sandbox.ts: drop 'sandbox allowlist still enforced' - cases-vuln-repro.ts: drop S-H3 case (advisory denylist) — the finding's whole point was that command policy belongs upstream - ARCHITECTURE.md / e2e README: reflect the new boundary

…ingle drain RPC (#146) * feat(approval-gate): layered policy rules with findLast evaluator Introduce a first-class Rule/Ruleset/Action surface ported from opencode's Permission.evaluate. Rules are wildcard globs over the iii function id (pattern is "*" in v1 — the field is reserved for per- function pattern extractors in a follow-up). Evaluator semantics: findLast across flattened layers, so operators stack a permissive default with more-specific overrides without surgery on the base list. approval-gate - New rules module: Action (allow/deny/ask), Rule, Ruleset alias, wildcard_match (DP, no regex), evaluate(perm, pattern, IntoIterator of &Rule). 16 unit tests covering matching, layering, last-wins. - WorkerConfig.rules: Vec<Rule> (additive; existing InterceptorRule flow unchanged). - apply_policy_rules: pure helper that maps a rule hit to PolicyOutcome (Allow / Deny{rule_permission, rule_pattern} / FallThrough). Allow short-circuits to pass-through; Deny short-circuits with a Denial::Policy that names the matched rule; Ask and no-match fall through to the existing per-function interceptor flow. - handle_subscriber consults the rules layer first, before decide_intercept_action. * feat(approval-gate): cascade allow on `always: true` resolve reply When a user resolves a pending call with decision=allow + always=true, approval-gate now pushes a runtime Allow rule for the call's function id and sweeps the same session's pending records, auto-resolving every other one the new rule covers. One click instead of N. approval-gate - approve_and_execute() extracted from handle_resolve's allow branch as a reusable async helper. Both the user-driven allow path and the cascade sweep drive state through the same transitions (approved -> invoke -> executed | failed). - Shared policy ruleset wrapped as Arc<RwLock<Ruleset>> so reply-time mutation is safe across the subscriber + resolve closures. Read guards are scoped so they never cross an .await (std::sync::RwLock is not async-safe to hold across suspension). - cascade_allow_for_session() pushes the new rule under the write lock, snapshots session pending via list_prefix, and runs approve_and_execute for each newly-Allow record. The originator is skipped (already resolved above). Per-record state-write failures are logged and the rest of the cascade continues. - handle_resolve response carries `cascaded: N` when the sweep resolved at least one extra record; omitted otherwise so the one-shot path stays unchanged. Cascade scope is intentionally narrow for v1: - function-id-only matching (pattern "*"), mirrors the v1 rules surface - same session only - in-memory rule (no cross-restart persistence) - allow + always; deny + always is out of scope Five new tests cover the cascade decision tree (no-always, same-session match, cross-session non-effect, originator-skip, terminal-record skip). * chore(approval-gate): remove legacy migration surface The Denial refactor replaced decision_reason with a typed structure, but several legacy hooks remained in place. They are now removed because the underlying schemas they protected against (pre-trigger-model status strings, pre-Denial flat decision_reason field) cannot occur in any state store this branch is wired to. approval-gate - Drop migrate_legacy_record entirely. The pre-trigger-model status="allow"/"deny" rename and the decision_reason -> Denial::Legacy lift both go away. Old persisted records (if any) now surface as filtered-out orphans rather than silently mistranslated. - Drop Denial::Legacy variant from the enum. - Drop legacy_migrated flag from records. - Drop the four migrate_legacy_record_* unit tests and the handle_list_undelivered_persists_migrated_legacy_record integration test. - handle_list_pending no longer pre-filters legacy shapes; orphan records lacking a session_id stamp are dropped uniformly. - timeout_resolved_event drops the decision_reason: "timeout" field (status: "timed_out" is self-describing). - handle_sweep_session drops the optional `reason` payload field. The cause (session_deleted vs run_stopped) is the caller's concern; the swept timed_out records carry no Denial. - skills/sweep_session.md updated. turn-orchestrator - approval_stitching drops the legacy denial-kind branch and the legacy_migrated note line; the corresponding stitch test is removed. - run_stop no longer forwards reason: "run_stopped" on the approval::sweep_session payload; it logs the cause locally instead. - The run_stop integration test now asserts the absence of the reason field on the sweep payload. harness-types (6 vendored copies, byte-identical) - Drop the Legacy variant and its docblock line from each agent_event.rs copy. * feat(approval-gate): add typed Record, Status, Next schema New src/record.rs defines the persisted approval record as a typed struct alongside its lifecycle Status enum and a Next enum that pairs each target status with its outcome data. Wire format is byte-identical to the existing serde_json::Value blobs — operators can adopt the typed shape incrementally without changing any persisted state. approval-gate - New module record: - Status enum (Pending / Approved / Executed / Failed / Denied / TimedOut), serde snake_case, with is_terminal() helper. - Record struct mirroring the historical JSON keys field-for-field, optional outcome fields scoped to terminal statuses, serde skip-when-None so existing rows round-trip cleanly. - Next enum that bundles (target_status, outcome_payload) so the type system rules out impossible transitions (e.g. Executed without a result, Denied without a Denial). Replaces the old (status_string, Option<result>, Option<error>, Option<denial>) parallel-Option signature when callers opt in. - Record::to_value / Record::from_value bridge to the Value blobs the iii state bus still expects. - 8 unit tests covering serde round-trips, status semantics, expiry saturation, forward-compat with unknown fields. - lib.rs re-exports Next, Record, Status. Handler migration to the typed schema is a separate change; this commit just makes the types available so future work can lift Value access into typed field access incrementally rather than in one big diff. * refactor(approval-gate): extract wire.rs from lib.rs First step of breaking lib.rs (~4300 lines) into focused modules. This commit lifts the wire-format types and small wire-shape helpers into their own module — pure data shapes with no I/O, no iii-sdk dependency, no async — so downstream workers that only need to understand the approval-gate protocol can depend on a small surface. Moved (lib.rs -> wire.rs): - Denial (structured deny payload) - IncomingCall + requires_approval() - Decision (Allow | Deny(Denial)) - WireDecision (allow|deny wire enum) - pending_key() - extract_call() - block_reply_for() All public items stay re-exported from the crate root, so every existing call site (handlers, tests, integration tests, downstream crates) continues to work without import changes. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + session + providers + harness-tui + shell all pass. * refactor(approval-gate): extract lifecycle.rs from lib.rs Second step of breaking lib.rs into focused modules. This commit lifts the persisted-record lifecycle helpers — pure Value-blob constructors and transitions, no I/O, no iii-sdk dependency — into their own module. Moved (lib.rs -> lifecycle.rs): - is_terminal_status() - build_pending_record() - transition_record() + transition_record_with_now() - collect_timed_out_for_sweep() - maybe_flip_timed_out() All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass. * refactor(approval-gate): extract state.rs from lib.rs Third step of breaking lib.rs into focused modules. This commit lifts the iii-backed state-bus and function-executor implementations into their own module, along with the `__from_approval` marker plumbing and the boot-time marker-target safety check. Moved (lib.rs -> state.rs): - StateBus trait + IiiStateBus impl (iii state::* wrapper) - FunctionExecutor trait + IiiFunctionExecutor impl (iii.trigger wrapper) - rule_for() (lookup helper used by IiiFunctionExecutor) - merge_from_approval_marker_if_needed() (marker payload composer) - unverified_marker_targets() (boot guard) The StateBus / FunctionExecutor traits are kept exactly as they were in lib.rs — they exist purely as test seams so unit tests can swap in InMemoryStateBus / FakeExecutor. No new abstractions added. Public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass (turn-orchestrator's pre-existing dual_write flake notwithstanding). * refactor(approval-gate): extract intercept.rs from lib.rs Fourth step of breaking lib.rs into focused modules. This commit lifts the intercept decision flow into its own module — the three layers that decide what the gate does with an incoming function call: policy rules, per-function interceptor rules, and classifier replies, plus the async handle_intercept that writes the pending record. Moved (lib.rs -> intercept.rs): - InterceptAction enum + decide_intercept_action() - PolicyOutcome enum + apply_policy_rules() - ClassifierDecision enum + interpret_classifier_reply() - handle_intercept() handle_intercept stays public (re-exported from the crate root). The pub(crate) helpers stay pub(crate) and are imported back into lib.rs for the subscriber closure in register(). Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green. * refactor(approval-gate): extract resolve.rs from lib.rs Fifth step of breaking lib.rs into focused modules. This commit lifts the approval::resolve flow into its own module. Moved (lib.rs -> resolve.rs): - handle_lookup_record() - handle_resolve() - cascade_allow_for_session() (the always=true sweep) - approve_and_execute() (shared by user-driven allow + cascade) handle_resolve and handle_lookup_record stay public via the crate root re-exports. approve_and_execute and cascade_allow_for_session stay private to the resolve module. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract delivery.rs from lib.rs Sixth step of breaking lib.rs into focused modules. This commit lifts the delivery-tracking RPCs into their own module. Moved (lib.rs -> delivery.rs): - handle_list_pending() - handle_list_undelivered() - handle_ack_delivered() - handle_consume_undelivered() - handle_flush_delivered() - handle_sweep_session() - LIST_UNDELIVERED_DEFAULT_LIMIT All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract sweeper.rs from lib.rs Seventh step of breaking lib.rs into focused modules. This commit lifts the periodic timeout sweeper and the small iii-stream helpers it owns into their own module. Moved (lib.rs -> sweeper.rs): - spawn_timeout_sweeper() - timeout_resolved_event() - write_event() - write_hook_reply() - uuid_like() All five stay pub(crate) — only the register() wiring in lib.rs and the resolve flow need them. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract register.rs from lib.rs Final step of breaking lib.rs into focused modules. This commit lifts the iii function/trigger wiring — the heart of the worker startup path — into its own module. Moved (lib.rs -> register.rs): - register() (~420 lines: the subscriber closure + 8 function registrations + trigger registration + sweeper spawn) - Refs struct (the handle bag returned to the binary) - FN_RESOLVE / FN_LIST_PENDING / FN_LIST_UNDELIVERED / FN_CONSUME_UNDELIVERED / FN_ACK_DELIVERED / FN_FLUSH_DELIVERED / FN_SWEEP_SESSION / FN_LOOKUP_RECORD constants - STATE_SCOPE constant lib.rs is now ~50 lines of module declarations + re-exports + ~2580 lines of inline tests. The test mod uses super::* plus local serde_json and std::sync imports. All public items stay re-exported from the crate root so the binary's `use approval_gate::register;` keeps working unchanged. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green (turn-orchestrator's pre-existing dual_write flake notwithstanding). * refactor(approval-gate): move pub(crate)-using tests inline First step of relocating the 2580-line inline test block from lib.rs into more focused homes. This commit moves the 18 tests that touch pub(crate) helpers — ones that can't live in tests/*.rs because integration tests can only see pub items — into the #[cfg(test)] mod tests block of whichever production module owns the helper they exercise. intercept.rs gained: - interpret_classifier_reply_reads_decision_tags - decide_intercept_action_* (5 tests) - apply_policy_rules_* (5 tests) state.rs gained: - merge_from_approval_* (4 tests) - rule_for_returns_matching_rule / rule_for_returns_none_when_absent sweeper.rs gained: - timeout_resolved_event_shape lib.rs loses those 18 tests. 141 approval-gate tests pass (no duplicate counts — each test runs once). turn-orchestrator + harness + shell verified green. Public-API tests (the remaining ~96 in lib.rs) move to tests/*.rs in a follow-up commit so they can share fakes via a common module. * refactor(approval-gate): move public-API tests to tests/*.rs Final test split. The 86 remaining inline tests in lib.rs that exercise only the crate's `pub` surface now live in tests/*.rs, organized by the area they cover. The state-machine proptest gets its own file. lib.rs drops from 2369 to 52 lines (just module declarations + re-exports). New tests/ layout: - common/mod.rs — shared fakes: FakeExecutor, InMemoryStateBus, FailingStateBus, sample_call(), empty_policy_rules() - lifecycle.rs — 19 tests: build_pending_record, transition_record, maybe_flip_timed_out, collect_timed_out_for_sweep, is_terminal_status, pending_key - wire.rs — 10 tests: extract_call, block_reply_for, requires_approval - intercept.rs — 9 tests: handle_intercept (replays, fail-closed, session_id stamping, force_pending) - resolve.rs — 19 tests: handle_resolve, cascade-on-`always`, handle_lookup_record - delivery.rs — 25 tests: list_pending / list_undelivered / ack_delivered / consume_undelivered / flush_delivered / sweep_session - misc.rs — 4 tests: FN_* constants, unverified_marker_targets, FakeExecutor smoke test - state_machine.rs — the proptest with the four lifecycle invariants Each tests/*.rs uses `mod common;` for shared fakes — only one source of truth for in-memory bus/executor stand-ins. 141 approval-gate tests pass across 13 test binaries (previously 6). turn-orchestrator + harness + shell verified green. * feat(approval-gate): new Record schema with Pending|InFlight|Done lifecycle Pending → InFlight → Done(Outcome). InFlight is the intermediate persist that closes the dup-exec race within a worker process — a second approval::resolve arriving during the invoke await sees the row is non-Pending and bails. Outcome is a tagged enum (Executed/Failed/Denied/TimedOut). resolved_at is stamped on the first non-Pending transition for deterministic multi-row consume ordering. lifecycle.rs is deleted; its only surviving helper (flipped_to_timed_out_if_expired) is now a Record method. * fix(approval-gate): keep lifecycle.rs as transitional compat shim Task 1 of the simplification plan deletes lifecycle.rs entirely, but the plan was a bit optimistic: dependent modules (intercept, resolve, delivery, sweeper) still import its helpers (build_pending_record, transition_record, maybe_flip_timed_out, is_terminal_status, collect_timed_out_for_sweep) and the crate fails to compile without them. Cargo can't run record.rs's unit tests if the library itself won't link. Restore lifecycle.rs as a transitional shim. Tasks 5/6/8/11 will progressively migrate each callsite to the new Record API; whichever of those tasks lands last deletes lifecycle.rs for real. * feat(approval-gate): Denial::Policy carries rule_permission + rule_pattern The classifier surface is being deleted in favor of the layered rules engine; rename the Policy variant's detail fields to match the actual data they'll carry once T5 lands. For now, classifier callsites (intercept.rs, register.rs) map their old reason/fn locals into the new field names — transient mapping, deleted in T5. Test callsites in tests/wire.rs and tests/lifecycle.rs updated to use representative rule_permission/rule_pattern values. Note: IncomingCall.approval_required + requires_approval() field/method are NOT removed in this commit (the plan's T2 calls for it but intercept.rs's decide_intercept_action still depends on them). They disappear in T5 when intercept.rs is rewritten end-to-end. * feat(approval-gate): pattern_for(function_id, args) extractor For shell::exec/shell::exec_bg, derive a pattern string from {command, args} so layered rules can match argv-level patterns like 'git status*'. Other function ids default to '*'. Malformed/missing args fall back to '*' (matches only wildcard rules). 8 new tests cover all branches: join+args, single-string command, empty args list, missing command, malformed args, non-shell function id, and the documented space-conflation case. * feat(approval-gate): StateBus::delete primitive Required by the new approval::consume RPC (T8), which returns Done rows and deletes them in the same call. IiiStateBus impl maps to the existing state::delete RPC (used elsewhere in the codebase, e.g. auth-credentials). Test fakes (InMemoryStateBus, FailingStateBus) gain matching impls — InMemory removes the key idempotently, Failing errors to match its other primitives. * feat(approval-gate): verdict-driven handle_intercept The classifier surface and per-function InterceptorRule decision flow are gone. handle_intercept now reads the verdict via the new lib-level verdict_for(function_id, args, ruleset) helper: Verdict::Allow → {block:false}, no state write Verdict::Deny → {block:true, denial:Policy{rule_permission, rule_pattern}} Verdict::Ask → write Pending row + reply {block:true, status:pending} Replay defense extended for the new 3-state schema: Pending → replay:in_flight, status:pending InFlight → replay:in_flight, status:in_flight (new) Done → replay:already_resolved, status:done intercept.rs drops InterceptAction, decide_intercept_action, ClassifierDecision, interpret_classifier_reply, PolicyOutcome, apply_policy_rules. register.rs's subscriber closure collapses the old policy_outcome+decide_intercept+classifier dance into one handle_intercept call. resolve.rs's cascade loop reuses verdict_for instead of the deleted apply_policy_rules. State-write failure still fails closed with Denial::StateError. 7 new tests cover all four verdict branches + the three replay states + fail-closed behavior. Note: InterceptorRule config struct and intercept_rules variable are still in place for T12 to fully remove. * feat(approval-gate): three-phase resolve + cascade exact-pattern (T6+T7) handle_resolve rewritten to typed Record + three-phase allow path: 1. write InFlight (closes the dup-exec race within a worker process) 2. iii.trigger(function_id, args) and await 3. write Done(Executed{result}) or Done(Failed{error}) Deny is a single Pending → Done(Denied{denial}) write — no invoke, no InFlight needed. WireDecision::Deny with no denial in payload defaults to UserRejected. UserCorrected{feedback} round-trips. Dup-exec guard: handle_resolve refuses non-Pending rows with in_flight (concurrent resolve mid-invoke) or already_resolved (terminal). Cascade allow on always:true (T7) now pushes a runtime Allow rule with the originator's EXACT pattern via rules::pattern_for(args) — not the prior blanket pattern:'*'. 'Always allow git status' no longer auto-allows rm -rf / via the same shell::exec function id. Lock-ordering invariant pinned in module docs: never hold the ruleset guard across .await. * feat(approval-gate): approval::consume + strip delivery dead RPCs + delete sweeper.rs/lifecycle.rs (T8+T9+T11) Three behavioral changes folded into one commit because they share files: T8 (new approval::consume RPC): Three-phase drain — gather Done candidates, sort by resolved_at + cap, delete-and-return. Defensive session_id filter. Lazy timeout flip on read. Default cap CONSUME_DEFAULT_LIMIT=50 bounds the response size against MB-scale stdout payloads. T9 (delivery.rs strip): handle_list_undelivered, handle_consume_undelivered, handle_ack_delivered, handle_flush_delivered all deleted. LIST_UNDELIVERED_DEFAULT_LIMIT constant gone. handle_list_pending rewritten on the typed Record API with lazy timeout flip on read. handle_sweep_session rewritten on new schema — flips Pending+InFlight rows to Done(TimedOut). T11 (sweeper.rs deleted): Background polling task gone. Timeouts now flip lazily on read in handle_resolve / handle_consume / handle_list_pending. UI handles expires_at countdown client-side; the LLM learns of timeouts on the next consume. lifecycle.rs transient shim also deleted (no more callers). Stream helpers (uuid_like, write_event, write_hook_reply) move into register.rs as their only consumer; spawn_timeout_sweeper + timeout_resolved_event deleted with the rest of sweeper.rs. register.rs surgery: Refs struct loses 5 dead FunctionRef fields and the sweeper JoinHandle. RPC registrations for list_undelivered, consume_undelivered, ack_delivered, flush_delivered all gone. New FN_CONSUME registration added. Classifier-alias warning check trimmed to live function ids. 15 new tests cover handle_consume (7) + handle_sweep_session (1) + handle_list_pending lazy flip (1) + the pre-existing edits. 65 lib tests pass total. * feat(approval-gate): strip __from_approval marker plumbing (T10) IiiFunctionExecutor::invoke forwards function_id+args directly to iii.trigger — no marker stamp. Deleted from state.rs: - merge_from_approval_marker_if_needed (and its 4 unit tests) - unverified_marker_targets (and the 2 rule_for tests) - rule_for (only marker-related callers) Deleted from register.rs: - IiiFunctionExecutor's rules: Arc<Vec<InterceptorRule>> field - Boot-time unverified-marker check (refused-to-start guard) Per the spec's Threat Model section: bus access ≡ shell access in the new model. The harness's bus-level access control is the perimeter; defense-in-depth via per-target marker verification is out of scope for v1. Shell-side marker verification is deleted in T13. T12 will fully retire InterceptorRule + the classifier-alias warning loop in register.rs. * feat(approval-gate): strip config to topic+scope+timeout+rules (T12) config.rs: WorkerConfig now has just {topic, approval_state_scope, default_timeout_ms, rules}. interceptors + sweeper_interval_ms fields gone. InterceptorRule struct kept as a no-op shim with minimal fields so the classifier-alias warning loop in register.rs still compiles — the loop is fed an empty Vec so the body never runs. iii.worker.yaml: replaced interceptor config with a curated default ruleset. Read-only fs/git auto-allowed; shell::exec/exec_bg ask; catch-all asks. Operators stack their own rules on top (last-match wins). register.rs cfg.interceptors → cfg.rules wiring confirmed: policy_rules is seeded from cfg.rules at startup and remains mutable via cascade. Note: InterceptorRule shim + classifier-alias warning loop in register will be deleted in a follow-up pass when the dependent code is fully cleaned up. They're cosmetic at this point — neither has functional effect. * feat(shell): strip classify_argv / allowlist / __from_approval marker (T13) Shell becomes a plain executor. All policy decisions live in approval-gate's rules layer (see the layered ruleset shipped in approval-gate/iii.worker.yaml). Deleted: - shell/src/functions/approval_bypass.rs (marker validation module) - shell/src/functions/classify.rs (classifier handler) - shell/src/arity.rs (arity-aware allowlist matcher) - ApprovalMarker + ClassifyArgvRequest types - ExecRequest.from_approval + ExecBgRequest.from_approval fields - ShellConfig.allowlist + allow_any + denylist_patterns + compiled_denylist fields and their methods (compile_denylist, denylist_hit_reason, allowlist_contains, is_command_allowed) - shell::classify_argv function registration - shell::classify_argv manifest entry - All allowlist/denylist unit tests in shell/src/config.rs - One test in shell/tests/function_handlers.rs that asserted shell rejected unlisted commands Kept (independent surface): - fs::host_root jail + fs.denylist_paths (filesystem path-based denials for shell::fs::* tools; distinct from the exec-policy denylist) - The fs jail validate_fs_jail boot check Threat model implication, per the spec: bus access ≡ shell access. The harness's bus-level access control is the perimeter; any worker on the bus can call shell::exec with arbitrary args. Defense-in-depth via shell-side allowlists is gone for v1. 133 shell lib tests + 328 shell integration tests pass. * feat(turn-orchestrator): switch stitch to approval::consume (T14) consume_approval_stitch now calls approval::consume (single RPC, returns + deletes in one shot). Payload is { session_id } only; no turn_id, no limit (gate enforces a default cap of 50, surfacing overflow via the 'omitted' counter as before). stitch_entries rewritten for the new Record wire shape: reads nested outcome: { kind, detail } instead of the old top-level status + result/error/denial. Outcome kinds: executed, failed, denied, timed_out. function_call_id is the canonical key (call_id retained as a legacy fallback for old rows on disk during the upgrade). render_denial_lines's policy branch reads rule_permission/rule_pattern (matching the renamed Denial::Policy detail shape from T2). The 'policy denied by classifier_fn: classifier_reason' wording becomes 'policy denied by rule <perm> : <pat>'. omission_summary_message reworded: 'approval::flush_delivered' is gone; the natural recovery path is the next-turn consume, which is what the new message points at. 131 lib tests pass (was 122 + 9 failing from old-shape assertions). Pre-existing integration test failure (tests/run_stop.rs imports run_stop module that isn't declared in lib.rs) is unrelated to this refactor. * test(approval-gate): E2E lifecycle tests + delete dead integration tests (T17) Delete 8 integration test files whose assertions targeted the deleted surface (FN_LIST_UNDELIVERED, FN_ACK_DELIVERED, transition_record, InterceptAction, marker plumbing, ...): - tests/approval_lifecycle.rs (engine-backed, used deleted RPCs) - tests/delivery.rs (old delivery RPC surface) - tests/integration.rs (old subscriber wiring) - tests/intercept.rs (decide_intercept_action, classifier path) - tests/lifecycle.rs (record-helper tests on deleted schema) - tests/misc.rs (mixed assertions on old shape) - tests/resolve.rs (old transition_record / Approved status) - tests/state_machine.rs (proptest dep + old schema) The src/* unit tests added across T1-T12 cover the equivalent surface for the new wire shape (Pending|InFlight|Done, Outcome enum, Denial::Policy {rule_permission, rule_pattern}, etc.). Add tests/lifecycle.rs with FIVE E2E lifecycle tests against the in-memory StateBus + FakeExecutor — the integration safety net for the whole refactor: 1. allow_path_end_to_end — pending → InFlight → executed → consume drains 2. deny_path_with_user_corrected_feedback_end_to_end — feedback round-trips 3. timeout_path_lazy_flips_on_consume_end_to_end — past expires_at flips on read 4. cascade_path_end_to_end — two pending rows; allow+always pushes exact-pattern rule; second auto-resolves; consume drains both; pinned: pushed pattern is the originator's argv ('echo go'), NOT blanket '*' 5. allow_rule_short_circuits_with_no_state_write — bonus: Verdict::Allow doesn't touch state 72 approval-gate tests pass across 6 suites (lib + wire + manifest + lifecycle + 2 from common helpers). Net deletion: ~2500 lines of old-shape integration tests gone, ~250 lines of new E2E in their place. * feat(harness/web): Allow+Always, Deny feedback, expires_at countdown (T15) ApprovalRow.tsx grows three real-feature additions matching the approval-gate refactor's new wire surface: 1. Allow+Always button → sends {decision:'allow', always:true}. The gate cascades by pushing a runtime Allow rule with the originator's exact pattern and sweeping the session's other Pending rows (auto-resolving any that match). The response carries cascaded:N when the sweep was non-empty; not surfaced in the UI yet. 2. Optional Deny feedback textarea → when the operator types a correction message before clicking deny, the payload becomes {decision:'deny', denial:{kind:'user_corrected', detail:{feedback}}}. That feedback flows verbatim to the LLM via the stitched message on the next consume — the high-value path because the model gets actionable correction, not just 'denied'. 3. Client-side expires_at countdown → the modal shows MM:SS remaining and disables actions on expiry. No server-emitted timed_out frame anymore (the gate's sweeper is gone); the LLM learns of timeout via the next-turn approval::consume that lazy-flips the row. Each pending row is split into its own ApprovalCard component so the countdown hook respects rules-of-hooks (can't sit inside a .map callback). ResolvePayload is constructed inline so the bridge's Record<string, unknown> type accepts it without an explicit cast at the interface boundary. CSS classes for the new states (approval-allow-always, approval-countdown, approval-feedback, approval-expired-note) inherit default styling for now; a follow-up styling pass can polish. * docs(harness/fanout): note T16 stream-rewrite as deferred follow-up The approval-gate refactor's T16 called for switching the fanout from polling approval::list_pending to subscribing to agent::events for approval_requested / approval_resolved frames. The polling implementation continues to work correctly against the new wire shape — diff_approvals reads function_call_id (now top-level on Record) and list_pending was kept by approval-gate's T8/T11 — so this is a perf cleanup, not a correctness fix. Comment the constant accordingly; defer the actual stream subscription to a follow-up PR. No code change beyond the doc. * fix(shell/e2e): drop allowlist/denylist tests + configs Shell stopped enforcing command-level policy; the e2e suite still referenced the removed allowlist/denylist_patterns config fields and pinned behavior that no longer exists. Strip those expectations and update the docs that described them. - config.yaml / config-jailed.yaml: remove allowlist + denylist_patterns (would fail YAML parse against the new ShellConfig schema) - cases-safety.ts: drop 3 allowlist/denylist assertions; keep one positive 'absolute-path command runs' case - cases-edge.ts: 'nonexistent unlisted command' rewritten to assert the OS spawn error (ENOENT), not an allowlist message - cases-exec-break.ts: drop 'unlisted command with extra fields' - cases-exec-sandbox.ts: drop 'sandbox allowlist still enforced' - cases-vuln-repro.ts: drop S-H3 case (advisory denylist) — the finding's whole point was that command policy belongs upstream - ARCHITECTURE.md / e2e README: reflect the new boundary

ytallo added 24 commits May 15, 2026 18:19

ytallo added 3 commits May 16, 2026 09:10

ytallo added 2 commits May 16, 2026 09:22

vercel Bot had a problem deploying to Preview May 16, 2026 12:24 Failure

vercel Bot had a problem deploying to Preview May 16, 2026 12:53 Failure

ytallo merged commit 8f9d090 into feat/shell-allowlist-approval May 16, 2026
3 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

approval-gate simplification — rules engine, three-state lifecycle, single drain RPC#146

approval-gate simplification — rules engine, three-state lifecycle, single drain RPC#146
ytallo merged 30 commits into
feat/shell-allowlist-approvalfrom
approval-gate-simplification

ytallo commented May 16, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 16, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 16, 2026 •

edited

Loading

Review skipped

Uh oh!

github-actions Bot commented May 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ytallo commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Decisions worth a careful look

Notable change surface

Test plan

Uh oh!

vercel Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

skill-check — worker

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ytallo commented May 16, 2026 •

edited

Loading

vercel Bot commented May 16, 2026 •

edited

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading

github-actions Bot commented May 16, 2026 •

edited

Loading