approval-gate simplification — rules engine, three-state lifecycle, single drain RPC#146
Conversation
Introduce a first-class Rule/Ruleset/Action surface ported from
opencode's Permission.evaluate. Rules are wildcard globs over the iii
function id (pattern is "*" in v1 — the field is reserved for per-
function pattern extractors in a follow-up). Evaluator semantics:
findLast across flattened layers, so operators stack a permissive
default with more-specific overrides without surgery on the base list.
approval-gate
- New rules module: Action (allow/deny/ask), Rule, Ruleset alias,
wildcard_match (DP, no regex), evaluate(perm, pattern, IntoIterator
of &Rule). 16 unit tests covering matching, layering, last-wins.
- WorkerConfig.rules: Vec<Rule> (additive; existing InterceptorRule
flow unchanged).
- apply_policy_rules: pure helper that maps a rule hit to PolicyOutcome
(Allow / Deny{rule_permission, rule_pattern} / FallThrough).
Allow short-circuits to pass-through; Deny short-circuits with a
Denial::Policy that names the matched rule; Ask and no-match fall
through to the existing per-function interceptor flow.
- handle_subscriber consults the rules layer first, before
decide_intercept_action.
When a user resolves a pending call with decision=allow + always=true, approval-gate now pushes a runtime Allow rule for the call's function id and sweeps the same session's pending records, auto-resolving every other one the new rule covers. One click instead of N. approval-gate - approve_and_execute() extracted from handle_resolve's allow branch as a reusable async helper. Both the user-driven allow path and the cascade sweep drive state through the same transitions (approved -> invoke -> executed | failed). - Shared policy ruleset wrapped as Arc<RwLock<Ruleset>> so reply-time mutation is safe across the subscriber + resolve closures. Read guards are scoped so they never cross an .await (std::sync::RwLock is not async-safe to hold across suspension). - cascade_allow_for_session() pushes the new rule under the write lock, snapshots session pending via list_prefix, and runs approve_and_execute for each newly-Allow record. The originator is skipped (already resolved above). Per-record state-write failures are logged and the rest of the cascade continues. - handle_resolve response carries `cascaded: N` when the sweep resolved at least one extra record; omitted otherwise so the one-shot path stays unchanged. Cascade scope is intentionally narrow for v1: - function-id-only matching (pattern "*"), mirrors the v1 rules surface - same session only - in-memory rule (no cross-restart persistence) - allow + always; deny + always is out of scope Five new tests cover the cascade decision tree (no-always, same-session match, cross-session non-effect, originator-skip, terminal-record skip).
The Denial refactor replaced decision_reason with a typed structure, but several legacy hooks remained in place. They are now removed because the underlying schemas they protected against (pre-trigger-model status strings, pre-Denial flat decision_reason field) cannot occur in any state store this branch is wired to. approval-gate - Drop migrate_legacy_record entirely. The pre-trigger-model status="allow"/"deny" rename and the decision_reason -> Denial::Legacy lift both go away. Old persisted records (if any) now surface as filtered-out orphans rather than silently mistranslated. - Drop Denial::Legacy variant from the enum. - Drop legacy_migrated flag from records. - Drop the four migrate_legacy_record_* unit tests and the handle_list_undelivered_persists_migrated_legacy_record integration test. - handle_list_pending no longer pre-filters legacy shapes; orphan records lacking a session_id stamp are dropped uniformly. - timeout_resolved_event drops the decision_reason: "timeout" field (status: "timed_out" is self-describing). - handle_sweep_session drops the optional `reason` payload field. The cause (session_deleted vs run_stopped) is the caller's concern; the swept timed_out records carry no Denial. - skills/sweep_session.md updated. turn-orchestrator - approval_stitching drops the legacy denial-kind branch and the legacy_migrated note line; the corresponding stitch test is removed. - run_stop no longer forwards reason: "run_stopped" on the approval::sweep_session payload; it logs the cause locally instead. - The run_stop integration test now asserts the absence of the reason field on the sweep payload. harness-types (6 vendored copies, byte-identical) - Drop the Legacy variant and its docblock line from each agent_event.rs copy.
New src/record.rs defines the persisted approval record as a typed
struct alongside its lifecycle Status enum and a Next enum that pairs
each target status with its outcome data. Wire format is byte-identical
to the existing serde_json::Value blobs — operators can adopt the
typed shape incrementally without changing any persisted state.
approval-gate
- New module record:
- Status enum (Pending / Approved / Executed / Failed / Denied /
TimedOut), serde snake_case, with is_terminal() helper.
- Record struct mirroring the historical JSON keys field-for-field,
optional outcome fields scoped to terminal statuses, serde
skip-when-None so existing rows round-trip cleanly.
- Next enum that bundles (target_status, outcome_payload) so the
type system rules out impossible transitions (e.g. Executed without
a result, Denied without a Denial). Replaces the old
(status_string, Option<result>, Option<error>, Option<denial>)
parallel-Option signature when callers opt in.
- Record::to_value / Record::from_value bridge to the Value blobs the
iii state bus still expects.
- 8 unit tests covering serde round-trips, status semantics, expiry
saturation, forward-compat with unknown fields.
- lib.rs re-exports Next, Record, Status.
Handler migration to the typed schema is a separate change; this commit
just makes the types available so future work can lift Value access
into typed field access incrementally rather than in one big diff.
First step of breaking lib.rs (~4300 lines) into focused modules. This commit lifts the wire-format types and small wire-shape helpers into their own module — pure data shapes with no I/O, no iii-sdk dependency, no async — so downstream workers that only need to understand the approval-gate protocol can depend on a small surface. Moved (lib.rs -> wire.rs): - Denial (structured deny payload) - IncomingCall + requires_approval() - Decision (Allow | Deny(Denial)) - WireDecision (allow|deny wire enum) - pending_key() - extract_call() - block_reply_for() All public items stay re-exported from the crate root, so every existing call site (handlers, tests, integration tests, downstream crates) continues to work without import changes. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + session + providers + harness-tui + shell all pass.
Second step of breaking lib.rs into focused modules. This commit lifts the persisted-record lifecycle helpers — pure Value-blob constructors and transitions, no I/O, no iii-sdk dependency — into their own module. Moved (lib.rs -> lifecycle.rs): - is_terminal_status() - build_pending_record() - transition_record() + transition_record_with_now() - collect_timed_out_for_sweep() - maybe_flip_timed_out() All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass.
Third step of breaking lib.rs into focused modules. This commit lifts the iii-backed state-bus and function-executor implementations into their own module, along with the `__from_approval` marker plumbing and the boot-time marker-target safety check. Moved (lib.rs -> state.rs): - StateBus trait + IiiStateBus impl (iii state::* wrapper) - FunctionExecutor trait + IiiFunctionExecutor impl (iii.trigger wrapper) - rule_for() (lookup helper used by IiiFunctionExecutor) - merge_from_approval_marker_if_needed() (marker payload composer) - unverified_marker_targets() (boot guard) The StateBus / FunctionExecutor traits are kept exactly as they were in lib.rs — they exist purely as test seams so unit tests can swap in InMemoryStateBus / FakeExecutor. No new abstractions added. Public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass (turn-orchestrator's pre-existing dual_write flake notwithstanding).
Fourth step of breaking lib.rs into focused modules. This commit lifts the intercept decision flow into its own module — the three layers that decide what the gate does with an incoming function call: policy rules, per-function interceptor rules, and classifier replies, plus the async handle_intercept that writes the pending record. Moved (lib.rs -> intercept.rs): - InterceptAction enum + decide_intercept_action() - PolicyOutcome enum + apply_policy_rules() - ClassifierDecision enum + interpret_classifier_reply() - handle_intercept() handle_intercept stays public (re-exported from the crate root). The pub(crate) helpers stay pub(crate) and are imported back into lib.rs for the subscriber closure in register(). Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green.
Fifth step of breaking lib.rs into focused modules. This commit lifts the approval::resolve flow into its own module. Moved (lib.rs -> resolve.rs): - handle_lookup_record() - handle_resolve() - cascade_allow_for_session() (the always=true sweep) - approve_and_execute() (shared by user-driven allow + cascade) handle_resolve and handle_lookup_record stay public via the crate root re-exports. approve_and_execute and cascade_allow_for_session stay private to the resolve module. Pure move, no behavior change. 141 approval-gate tests pass.
Sixth step of breaking lib.rs into focused modules. This commit lifts the delivery-tracking RPCs into their own module. Moved (lib.rs -> delivery.rs): - handle_list_pending() - handle_list_undelivered() - handle_ack_delivered() - handle_consume_undelivered() - handle_flush_delivered() - handle_sweep_session() - LIST_UNDELIVERED_DEFAULT_LIMIT All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass.
Seventh step of breaking lib.rs into focused modules. This commit lifts the periodic timeout sweeper and the small iii-stream helpers it owns into their own module. Moved (lib.rs -> sweeper.rs): - spawn_timeout_sweeper() - timeout_resolved_event() - write_event() - write_hook_reply() - uuid_like() All five stay pub(crate) — only the register() wiring in lib.rs and the resolve flow need them. Pure move, no behavior change. 141 approval-gate tests pass.
Final step of breaking lib.rs into focused modules. This commit lifts the iii function/trigger wiring — the heart of the worker startup path — into its own module. Moved (lib.rs -> register.rs): - register() (~420 lines: the subscriber closure + 8 function registrations + trigger registration + sweeper spawn) - Refs struct (the handle bag returned to the binary) - FN_RESOLVE / FN_LIST_PENDING / FN_LIST_UNDELIVERED / FN_CONSUME_UNDELIVERED / FN_ACK_DELIVERED / FN_FLUSH_DELIVERED / FN_SWEEP_SESSION / FN_LOOKUP_RECORD constants - STATE_SCOPE constant lib.rs is now ~50 lines of module declarations + re-exports + ~2580 lines of inline tests. The test mod uses super::* plus local serde_json and std::sync imports. All public items stay re-exported from the crate root so the binary's `use approval_gate::register;` keeps working unchanged. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green (turn-orchestrator's pre-existing dual_write flake notwithstanding).
First step of relocating the 2580-line inline test block from lib.rs into more focused homes. This commit moves the 18 tests that touch pub(crate) helpers — ones that can't live in tests/*.rs because integration tests can only see pub items — into the #[cfg(test)] mod tests block of whichever production module owns the helper they exercise. intercept.rs gained: - interpret_classifier_reply_reads_decision_tags - decide_intercept_action_* (5 tests) - apply_policy_rules_* (5 tests) state.rs gained: - merge_from_approval_* (4 tests) - rule_for_returns_matching_rule / rule_for_returns_none_when_absent sweeper.rs gained: - timeout_resolved_event_shape lib.rs loses those 18 tests. 141 approval-gate tests pass (no duplicate counts — each test runs once). turn-orchestrator + harness + shell verified green. Public-API tests (the remaining ~96 in lib.rs) move to tests/*.rs in a follow-up commit so they can share fakes via a common module.
Final test split. The 86 remaining inline tests in lib.rs that
exercise only the crate's `pub` surface now live in tests/*.rs,
organized by the area they cover. The state-machine proptest gets
its own file. lib.rs drops from 2369 to 52 lines (just module
declarations + re-exports).
New tests/ layout:
- common/mod.rs — shared fakes: FakeExecutor, InMemoryStateBus,
FailingStateBus, sample_call(), empty_policy_rules()
- lifecycle.rs — 19 tests: build_pending_record, transition_record,
maybe_flip_timed_out, collect_timed_out_for_sweep,
is_terminal_status, pending_key
- wire.rs — 10 tests: extract_call, block_reply_for,
requires_approval
- intercept.rs — 9 tests: handle_intercept (replays, fail-closed,
session_id stamping, force_pending)
- resolve.rs — 19 tests: handle_resolve, cascade-on-`always`,
handle_lookup_record
- delivery.rs — 25 tests: list_pending / list_undelivered /
ack_delivered / consume_undelivered /
flush_delivered / sweep_session
- misc.rs — 4 tests: FN_* constants, unverified_marker_targets,
FakeExecutor smoke test
- state_machine.rs — the proptest with the four lifecycle invariants
Each tests/*.rs uses `mod common;` for shared fakes — only one source
of truth for in-memory bus/executor stand-ins.
141 approval-gate tests pass across 13 test binaries (previously 6).
turn-orchestrator + harness + shell verified green.
…ecycle Pending → InFlight → Done(Outcome). InFlight is the intermediate persist that closes the dup-exec race within a worker process — a second approval::resolve arriving during the invoke await sees the row is non-Pending and bails. Outcome is a tagged enum (Executed/Failed/Denied/TimedOut). resolved_at is stamped on the first non-Pending transition for deterministic multi-row consume ordering. lifecycle.rs is deleted; its only surviving helper (flipped_to_timed_out_if_expired) is now a Record method.
Task 1 of the simplification plan deletes lifecycle.rs entirely, but the plan was a bit optimistic: dependent modules (intercept, resolve, delivery, sweeper) still import its helpers (build_pending_record, transition_record, maybe_flip_timed_out, is_terminal_status, collect_timed_out_for_sweep) and the crate fails to compile without them. Cargo can't run record.rs's unit tests if the library itself won't link. Restore lifecycle.rs as a transitional shim. Tasks 5/6/8/11 will progressively migrate each callsite to the new Record API; whichever of those tasks lands last deletes lifecycle.rs for real.
…ttern The classifier surface is being deleted in favor of the layered rules engine; rename the Policy variant's detail fields to match the actual data they'll carry once T5 lands. For now, classifier callsites (intercept.rs, register.rs) map their old reason/fn locals into the new field names — transient mapping, deleted in T5. Test callsites in tests/wire.rs and tests/lifecycle.rs updated to use representative rule_permission/rule_pattern values. Note: IncomingCall.approval_required + requires_approval() field/method are NOT removed in this commit (the plan's T2 calls for it but intercept.rs's decide_intercept_action still depends on them). They disappear in T5 when intercept.rs is rewritten end-to-end.
For shell::exec/shell::exec_bg, derive a pattern string from
{command, args} so layered rules can match argv-level patterns like
'git status*'. Other function ids default to '*'. Malformed/missing
args fall back to '*' (matches only wildcard rules).
8 new tests cover all branches: join+args, single-string command,
empty args list, missing command, malformed args, non-shell function
id, and the documented space-conflation case.
Required by the new approval::consume RPC (T8), which returns Done rows and deletes them in the same call. IiiStateBus impl maps to the existing state::delete RPC (used elsewhere in the codebase, e.g. auth-credentials). Test fakes (InMemoryStateBus, FailingStateBus) gain matching impls — InMemory removes the key idempotently, Failing errors to match its other primitives.
The classifier surface and per-function InterceptorRule decision flow
are gone. handle_intercept now reads the verdict via the new lib-level
verdict_for(function_id, args, ruleset) helper:
Verdict::Allow → {block:false}, no state write
Verdict::Deny → {block:true, denial:Policy{rule_permission, rule_pattern}}
Verdict::Ask → write Pending row + reply {block:true, status:pending}
Replay defense extended for the new 3-state schema:
Pending → replay:in_flight, status:pending
InFlight → replay:in_flight, status:in_flight (new)
Done → replay:already_resolved, status:done
intercept.rs drops InterceptAction, decide_intercept_action,
ClassifierDecision, interpret_classifier_reply, PolicyOutcome,
apply_policy_rules. register.rs's subscriber closure collapses the
old policy_outcome+decide_intercept+classifier dance into one
handle_intercept call. resolve.rs's cascade loop reuses verdict_for
instead of the deleted apply_policy_rules.
State-write failure still fails closed with Denial::StateError.
7 new tests cover all four verdict branches + the three replay
states + fail-closed behavior.
Note: InterceptorRule config struct and intercept_rules variable
are still in place for T12 to fully remove.
handle_resolve rewritten to typed Record + three-phase allow path:
1. write InFlight (closes the dup-exec race within a worker process)
2. iii.trigger(function_id, args) and await
3. write Done(Executed{result}) or Done(Failed{error})
Deny is a single Pending → Done(Denied{denial}) write — no invoke,
no InFlight needed. WireDecision::Deny with no denial in payload
defaults to UserRejected. UserCorrected{feedback} round-trips.
Dup-exec guard: handle_resolve refuses non-Pending rows with
in_flight (concurrent resolve mid-invoke) or
already_resolved (terminal).
Cascade allow on always:true (T7) now pushes a runtime Allow rule with
the originator's EXACT pattern via rules::pattern_for(args) — not the
prior blanket pattern:'*'. 'Always allow git status' no longer
auto-allows rm -rf / via the same shell::exec function id.
Lock-ordering invariant pinned in module docs: never hold the ruleset
guard across .await.
…elete sweeper.rs/lifecycle.rs (T8+T9+T11) Three behavioral changes folded into one commit because they share files: T8 (new approval::consume RPC): Three-phase drain — gather Done candidates, sort by resolved_at + cap, delete-and-return. Defensive session_id filter. Lazy timeout flip on read. Default cap CONSUME_DEFAULT_LIMIT=50 bounds the response size against MB-scale stdout payloads. T9 (delivery.rs strip): handle_list_undelivered, handle_consume_undelivered, handle_ack_delivered, handle_flush_delivered all deleted. LIST_UNDELIVERED_DEFAULT_LIMIT constant gone. handle_list_pending rewritten on the typed Record API with lazy timeout flip on read. handle_sweep_session rewritten on new schema — flips Pending+InFlight rows to Done(TimedOut). T11 (sweeper.rs deleted): Background polling task gone. Timeouts now flip lazily on read in handle_resolve / handle_consume / handle_list_pending. UI handles expires_at countdown client-side; the LLM learns of timeouts on the next consume. lifecycle.rs transient shim also deleted (no more callers). Stream helpers (uuid_like, write_event, write_hook_reply) move into register.rs as their only consumer; spawn_timeout_sweeper + timeout_resolved_event deleted with the rest of sweeper.rs. register.rs surgery: Refs struct loses 5 dead FunctionRef fields and the sweeper JoinHandle. RPC registrations for list_undelivered, consume_undelivered, ack_delivered, flush_delivered all gone. New FN_CONSUME registration added. Classifier-alias warning check trimmed to live function ids. 15 new tests cover handle_consume (7) + handle_sweep_session (1) + handle_list_pending lazy flip (1) + the pre-existing edits. 65 lib tests pass total.
IiiFunctionExecutor::invoke forwards function_id+args directly to iii.trigger — no marker stamp. Deleted from state.rs: - merge_from_approval_marker_if_needed (and its 4 unit tests) - unverified_marker_targets (and the 2 rule_for tests) - rule_for (only marker-related callers) Deleted from register.rs: - IiiFunctionExecutor's rules: Arc<Vec<InterceptorRule>> field - Boot-time unverified-marker check (refused-to-start guard) Per the spec's Threat Model section: bus access ≡ shell access in the new model. The harness's bus-level access control is the perimeter; defense-in-depth via per-target marker verification is out of scope for v1. Shell-side marker verification is deleted in T13. T12 will fully retire InterceptorRule + the classifier-alias warning loop in register.rs.
config.rs: WorkerConfig now has just {topic, approval_state_scope,
default_timeout_ms, rules}. interceptors + sweeper_interval_ms fields
gone. InterceptorRule struct kept as a no-op shim with minimal fields
so the classifier-alias warning loop in register.rs still compiles —
the loop is fed an empty Vec so the body never runs.
iii.worker.yaml: replaced interceptor config with a curated default
ruleset. Read-only fs/git auto-allowed; shell::exec/exec_bg ask;
catch-all asks. Operators stack their own rules on top (last-match
wins).
register.rs cfg.interceptors → cfg.rules wiring confirmed: policy_rules
is seeded from cfg.rules at startup and remains mutable via cascade.
Note: InterceptorRule shim + classifier-alias warning loop in register
will be deleted in a follow-up pass when the dependent code is fully
cleaned up. They're cosmetic at this point — neither has functional
effect.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
skill-check — worker3 verified, 22 skipped (no docs/). 1 error across the verified workers.
|
… (T13) Shell becomes a plain executor. All policy decisions live in approval-gate's rules layer (see the layered ruleset shipped in approval-gate/iii.worker.yaml). Deleted: - shell/src/functions/approval_bypass.rs (marker validation module) - shell/src/functions/classify.rs (classifier handler) - shell/src/arity.rs (arity-aware allowlist matcher) - ApprovalMarker + ClassifyArgvRequest types - ExecRequest.from_approval + ExecBgRequest.from_approval fields - ShellConfig.allowlist + allow_any + denylist_patterns + compiled_denylist fields and their methods (compile_denylist, denylist_hit_reason, allowlist_contains, is_command_allowed) - shell::classify_argv function registration - shell::classify_argv manifest entry - All allowlist/denylist unit tests in shell/src/config.rs - One test in shell/tests/function_handlers.rs that asserted shell rejected unlisted commands Kept (independent surface): - fs::host_root jail + fs.denylist_paths (filesystem path-based denials for shell::fs::* tools; distinct from the exec-policy denylist) - The fs jail validate_fs_jail boot check Threat model implication, per the spec: bus access ≡ shell access. The harness's bus-level access control is the perimeter; any worker on the bus can call shell::exec with arbitrary args. Defense-in-depth via shell-side allowlists is gone for v1. 133 shell lib tests + 328 shell integration tests pass.
consume_approval_stitch now calls approval::consume (single RPC,
returns + deletes in one shot). Payload is { session_id } only; no
turn_id, no limit (gate enforces a default cap of 50, surfacing
overflow via the 'omitted' counter as before).
stitch_entries rewritten for the new Record wire shape: reads nested
outcome: { kind, detail } instead of the old top-level status +
result/error/denial. Outcome kinds: executed, failed, denied,
timed_out. function_call_id is the canonical key (call_id retained
as a legacy fallback for old rows on disk during the upgrade).
render_denial_lines's policy branch reads rule_permission/rule_pattern
(matching the renamed Denial::Policy detail shape from T2). The
'policy denied by classifier_fn: classifier_reason' wording becomes
'policy denied by rule <perm> : <pat>'.
omission_summary_message reworded: 'approval::flush_delivered' is gone;
the natural recovery path is the next-turn consume, which is what the
new message points at.
131 lib tests pass (was 122 + 9 failing from old-shape assertions).
Pre-existing integration test failure (tests/run_stop.rs imports
run_stop module that isn't declared in lib.rs) is unrelated to this
refactor.
…sts (T17)
Delete 8 integration test files whose assertions targeted the deleted
surface (FN_LIST_UNDELIVERED, FN_ACK_DELIVERED, transition_record,
InterceptAction, marker plumbing, ...):
- tests/approval_lifecycle.rs (engine-backed, used deleted RPCs)
- tests/delivery.rs (old delivery RPC surface)
- tests/integration.rs (old subscriber wiring)
- tests/intercept.rs (decide_intercept_action, classifier path)
- tests/lifecycle.rs (record-helper tests on deleted schema)
- tests/misc.rs (mixed assertions on old shape)
- tests/resolve.rs (old transition_record / Approved status)
- tests/state_machine.rs (proptest dep + old schema)
The src/* unit tests added across T1-T12 cover the equivalent surface
for the new wire shape (Pending|InFlight|Done, Outcome enum,
Denial::Policy {rule_permission, rule_pattern}, etc.).
Add tests/lifecycle.rs with FIVE E2E lifecycle tests against the
in-memory StateBus + FakeExecutor — the integration safety net for
the whole refactor:
1. allow_path_end_to_end — pending → InFlight → executed → consume drains
2. deny_path_with_user_corrected_feedback_end_to_end — feedback round-trips
3. timeout_path_lazy_flips_on_consume_end_to_end — past expires_at flips on read
4. cascade_path_end_to_end — two pending rows; allow+always pushes exact-pattern
rule; second auto-resolves; consume drains both; pinned: pushed pattern is
the originator's argv ('echo go'), NOT blanket '*'
5. allow_rule_short_circuits_with_no_state_write — bonus: Verdict::Allow
doesn't touch state
72 approval-gate tests pass across 6 suites (lib + wire + manifest +
lifecycle + 2 from common helpers). Net deletion: ~2500 lines of
old-shape integration tests gone, ~250 lines of new E2E in their place.
…(T15)
ApprovalRow.tsx grows three real-feature additions matching the
approval-gate refactor's new wire surface:
1. Allow+Always button → sends {decision:'allow', always:true}. The
gate cascades by pushing a runtime Allow rule with the originator's
exact pattern and sweeping the session's other Pending rows
(auto-resolving any that match). The response carries cascaded:N
when the sweep was non-empty; not surfaced in the UI yet.
2. Optional Deny feedback textarea → when the operator types a
correction message before clicking deny, the payload becomes
{decision:'deny', denial:{kind:'user_corrected', detail:{feedback}}}.
That feedback flows verbatim to the LLM via the stitched message
on the next consume — the high-value path because the model gets
actionable correction, not just 'denied'.
3. Client-side expires_at countdown → the modal shows MM:SS remaining
and disables actions on expiry. No server-emitted timed_out frame
anymore (the gate's sweeper is gone); the LLM learns of timeout
via the next-turn approval::consume that lazy-flips the row.
Each pending row is split into its own ApprovalCard component so the
countdown hook respects rules-of-hooks (can't sit inside a .map
callback). ResolvePayload is constructed inline so the bridge's
Record<string, unknown> type accepts it without an explicit cast at
the interface boundary.
CSS classes for the new states (approval-allow-always,
approval-countdown, approval-feedback, approval-expired-note) inherit
default styling for now; a follow-up styling pass can polish.
The approval-gate refactor's T16 called for switching the fanout from polling approval::list_pending to subscribing to agent::events for approval_requested / approval_resolved frames. The polling implementation continues to work correctly against the new wire shape — diff_approvals reads function_call_id (now top-level on Record) and list_pending was kept by approval-gate's T8/T11 — so this is a perf cleanup, not a correctness fix. Comment the constant accordingly; defer the actual stream subscription to a follow-up PR. No code change beyond the doc.
Shell stopped enforcing command-level policy; the e2e suite still referenced the removed allowlist/denylist_patterns config fields and pinned behavior that no longer exists. Strip those expectations and update the docs that described them. - config.yaml / config-jailed.yaml: remove allowlist + denylist_patterns (would fail YAML parse against the new ShellConfig schema) - cases-safety.ts: drop 3 allowlist/denylist assertions; keep one positive 'absolute-path command runs' case - cases-edge.ts: 'nonexistent unlisted command' rewritten to assert the OS spawn error (ENOENT), not an allowlist message - cases-exec-break.ts: drop 'unlisted command with extra fields' - cases-exec-sandbox.ts: drop 'sandbox allowlist still enforced' - cases-vuln-repro.ts: drop S-H3 case (advisory denylist) — the finding's whole point was that command policy belongs upstream - ARCHITECTURE.md / e2e README: reflect the new boundary
…ingle drain RPC (#146) * feat(approval-gate): layered policy rules with findLast evaluator Introduce a first-class Rule/Ruleset/Action surface ported from opencode's Permission.evaluate. Rules are wildcard globs over the iii function id (pattern is "*" in v1 — the field is reserved for per- function pattern extractors in a follow-up). Evaluator semantics: findLast across flattened layers, so operators stack a permissive default with more-specific overrides without surgery on the base list. approval-gate - New rules module: Action (allow/deny/ask), Rule, Ruleset alias, wildcard_match (DP, no regex), evaluate(perm, pattern, IntoIterator of &Rule). 16 unit tests covering matching, layering, last-wins. - WorkerConfig.rules: Vec<Rule> (additive; existing InterceptorRule flow unchanged). - apply_policy_rules: pure helper that maps a rule hit to PolicyOutcome (Allow / Deny{rule_permission, rule_pattern} / FallThrough). Allow short-circuits to pass-through; Deny short-circuits with a Denial::Policy that names the matched rule; Ask and no-match fall through to the existing per-function interceptor flow. - handle_subscriber consults the rules layer first, before decide_intercept_action. * feat(approval-gate): cascade allow on `always: true` resolve reply When a user resolves a pending call with decision=allow + always=true, approval-gate now pushes a runtime Allow rule for the call's function id and sweeps the same session's pending records, auto-resolving every other one the new rule covers. One click instead of N. approval-gate - approve_and_execute() extracted from handle_resolve's allow branch as a reusable async helper. Both the user-driven allow path and the cascade sweep drive state through the same transitions (approved -> invoke -> executed | failed). - Shared policy ruleset wrapped as Arc<RwLock<Ruleset>> so reply-time mutation is safe across the subscriber + resolve closures. Read guards are scoped so they never cross an .await (std::sync::RwLock is not async-safe to hold across suspension). - cascade_allow_for_session() pushes the new rule under the write lock, snapshots session pending via list_prefix, and runs approve_and_execute for each newly-Allow record. The originator is skipped (already resolved above). Per-record state-write failures are logged and the rest of the cascade continues. - handle_resolve response carries `cascaded: N` when the sweep resolved at least one extra record; omitted otherwise so the one-shot path stays unchanged. Cascade scope is intentionally narrow for v1: - function-id-only matching (pattern "*"), mirrors the v1 rules surface - same session only - in-memory rule (no cross-restart persistence) - allow + always; deny + always is out of scope Five new tests cover the cascade decision tree (no-always, same-session match, cross-session non-effect, originator-skip, terminal-record skip). * chore(approval-gate): remove legacy migration surface The Denial refactor replaced decision_reason with a typed structure, but several legacy hooks remained in place. They are now removed because the underlying schemas they protected against (pre-trigger-model status strings, pre-Denial flat decision_reason field) cannot occur in any state store this branch is wired to. approval-gate - Drop migrate_legacy_record entirely. The pre-trigger-model status="allow"/"deny" rename and the decision_reason -> Denial::Legacy lift both go away. Old persisted records (if any) now surface as filtered-out orphans rather than silently mistranslated. - Drop Denial::Legacy variant from the enum. - Drop legacy_migrated flag from records. - Drop the four migrate_legacy_record_* unit tests and the handle_list_undelivered_persists_migrated_legacy_record integration test. - handle_list_pending no longer pre-filters legacy shapes; orphan records lacking a session_id stamp are dropped uniformly. - timeout_resolved_event drops the decision_reason: "timeout" field (status: "timed_out" is self-describing). - handle_sweep_session drops the optional `reason` payload field. The cause (session_deleted vs run_stopped) is the caller's concern; the swept timed_out records carry no Denial. - skills/sweep_session.md updated. turn-orchestrator - approval_stitching drops the legacy denial-kind branch and the legacy_migrated note line; the corresponding stitch test is removed. - run_stop no longer forwards reason: "run_stopped" on the approval::sweep_session payload; it logs the cause locally instead. - The run_stop integration test now asserts the absence of the reason field on the sweep payload. harness-types (6 vendored copies, byte-identical) - Drop the Legacy variant and its docblock line from each agent_event.rs copy. * feat(approval-gate): add typed Record, Status, Next schema New src/record.rs defines the persisted approval record as a typed struct alongside its lifecycle Status enum and a Next enum that pairs each target status with its outcome data. Wire format is byte-identical to the existing serde_json::Value blobs — operators can adopt the typed shape incrementally without changing any persisted state. approval-gate - New module record: - Status enum (Pending / Approved / Executed / Failed / Denied / TimedOut), serde snake_case, with is_terminal() helper. - Record struct mirroring the historical JSON keys field-for-field, optional outcome fields scoped to terminal statuses, serde skip-when-None so existing rows round-trip cleanly. - Next enum that bundles (target_status, outcome_payload) so the type system rules out impossible transitions (e.g. Executed without a result, Denied without a Denial). Replaces the old (status_string, Option<result>, Option<error>, Option<denial>) parallel-Option signature when callers opt in. - Record::to_value / Record::from_value bridge to the Value blobs the iii state bus still expects. - 8 unit tests covering serde round-trips, status semantics, expiry saturation, forward-compat with unknown fields. - lib.rs re-exports Next, Record, Status. Handler migration to the typed schema is a separate change; this commit just makes the types available so future work can lift Value access into typed field access incrementally rather than in one big diff. * refactor(approval-gate): extract wire.rs from lib.rs First step of breaking lib.rs (~4300 lines) into focused modules. This commit lifts the wire-format types and small wire-shape helpers into their own module — pure data shapes with no I/O, no iii-sdk dependency, no async — so downstream workers that only need to understand the approval-gate protocol can depend on a small surface. Moved (lib.rs -> wire.rs): - Denial (structured deny payload) - IncomingCall + requires_approval() - Decision (Allow | Deny(Denial)) - WireDecision (allow|deny wire enum) - pending_key() - extract_call() - block_reply_for() All public items stay re-exported from the crate root, so every existing call site (handlers, tests, integration tests, downstream crates) continues to work without import changes. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + session + providers + harness-tui + shell all pass. * refactor(approval-gate): extract lifecycle.rs from lib.rs Second step of breaking lib.rs into focused modules. This commit lifts the persisted-record lifecycle helpers — pure Value-blob constructors and transitions, no I/O, no iii-sdk dependency — into their own module. Moved (lib.rs -> lifecycle.rs): - is_terminal_status() - build_pending_record() - transition_record() + transition_record_with_now() - collect_timed_out_for_sweep() - maybe_flip_timed_out() All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass. * refactor(approval-gate): extract state.rs from lib.rs Third step of breaking lib.rs into focused modules. This commit lifts the iii-backed state-bus and function-executor implementations into their own module, along with the `__from_approval` marker plumbing and the boot-time marker-target safety check. Moved (lib.rs -> state.rs): - StateBus trait + IiiStateBus impl (iii state::* wrapper) - FunctionExecutor trait + IiiFunctionExecutor impl (iii.trigger wrapper) - rule_for() (lookup helper used by IiiFunctionExecutor) - merge_from_approval_marker_if_needed() (marker payload composer) - unverified_marker_targets() (boot guard) The StateBus / FunctionExecutor traits are kept exactly as they were in lib.rs — they exist purely as test seams so unit tests can swap in InMemoryStateBus / FakeExecutor. No new abstractions added. Public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass (turn-orchestrator's pre-existing dual_write flake notwithstanding). * refactor(approval-gate): extract intercept.rs from lib.rs Fourth step of breaking lib.rs into focused modules. This commit lifts the intercept decision flow into its own module — the three layers that decide what the gate does with an incoming function call: policy rules, per-function interceptor rules, and classifier replies, plus the async handle_intercept that writes the pending record. Moved (lib.rs -> intercept.rs): - InterceptAction enum + decide_intercept_action() - PolicyOutcome enum + apply_policy_rules() - ClassifierDecision enum + interpret_classifier_reply() - handle_intercept() handle_intercept stays public (re-exported from the crate root). The pub(crate) helpers stay pub(crate) and are imported back into lib.rs for the subscriber closure in register(). Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green. * refactor(approval-gate): extract resolve.rs from lib.rs Fifth step of breaking lib.rs into focused modules. This commit lifts the approval::resolve flow into its own module. Moved (lib.rs -> resolve.rs): - handle_lookup_record() - handle_resolve() - cascade_allow_for_session() (the always=true sweep) - approve_and_execute() (shared by user-driven allow + cascade) handle_resolve and handle_lookup_record stay public via the crate root re-exports. approve_and_execute and cascade_allow_for_session stay private to the resolve module. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract delivery.rs from lib.rs Sixth step of breaking lib.rs into focused modules. This commit lifts the delivery-tracking RPCs into their own module. Moved (lib.rs -> delivery.rs): - handle_list_pending() - handle_list_undelivered() - handle_ack_delivered() - handle_consume_undelivered() - handle_flush_delivered() - handle_sweep_session() - LIST_UNDELIVERED_DEFAULT_LIMIT All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract sweeper.rs from lib.rs Seventh step of breaking lib.rs into focused modules. This commit lifts the periodic timeout sweeper and the small iii-stream helpers it owns into their own module. Moved (lib.rs -> sweeper.rs): - spawn_timeout_sweeper() - timeout_resolved_event() - write_event() - write_hook_reply() - uuid_like() All five stay pub(crate) — only the register() wiring in lib.rs and the resolve flow need them. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract register.rs from lib.rs Final step of breaking lib.rs into focused modules. This commit lifts the iii function/trigger wiring — the heart of the worker startup path — into its own module. Moved (lib.rs -> register.rs): - register() (~420 lines: the subscriber closure + 8 function registrations + trigger registration + sweeper spawn) - Refs struct (the handle bag returned to the binary) - FN_RESOLVE / FN_LIST_PENDING / FN_LIST_UNDELIVERED / FN_CONSUME_UNDELIVERED / FN_ACK_DELIVERED / FN_FLUSH_DELIVERED / FN_SWEEP_SESSION / FN_LOOKUP_RECORD constants - STATE_SCOPE constant lib.rs is now ~50 lines of module declarations + re-exports + ~2580 lines of inline tests. The test mod uses super::* plus local serde_json and std::sync imports. All public items stay re-exported from the crate root so the binary's `use approval_gate::register;` keeps working unchanged. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green (turn-orchestrator's pre-existing dual_write flake notwithstanding). * refactor(approval-gate): move pub(crate)-using tests inline First step of relocating the 2580-line inline test block from lib.rs into more focused homes. This commit moves the 18 tests that touch pub(crate) helpers — ones that can't live in tests/*.rs because integration tests can only see pub items — into the #[cfg(test)] mod tests block of whichever production module owns the helper they exercise. intercept.rs gained: - interpret_classifier_reply_reads_decision_tags - decide_intercept_action_* (5 tests) - apply_policy_rules_* (5 tests) state.rs gained: - merge_from_approval_* (4 tests) - rule_for_returns_matching_rule / rule_for_returns_none_when_absent sweeper.rs gained: - timeout_resolved_event_shape lib.rs loses those 18 tests. 141 approval-gate tests pass (no duplicate counts — each test runs once). turn-orchestrator + harness + shell verified green. Public-API tests (the remaining ~96 in lib.rs) move to tests/*.rs in a follow-up commit so they can share fakes via a common module. * refactor(approval-gate): move public-API tests to tests/*.rs Final test split. The 86 remaining inline tests in lib.rs that exercise only the crate's `pub` surface now live in tests/*.rs, organized by the area they cover. The state-machine proptest gets its own file. lib.rs drops from 2369 to 52 lines (just module declarations + re-exports). New tests/ layout: - common/mod.rs — shared fakes: FakeExecutor, InMemoryStateBus, FailingStateBus, sample_call(), empty_policy_rules() - lifecycle.rs — 19 tests: build_pending_record, transition_record, maybe_flip_timed_out, collect_timed_out_for_sweep, is_terminal_status, pending_key - wire.rs — 10 tests: extract_call, block_reply_for, requires_approval - intercept.rs — 9 tests: handle_intercept (replays, fail-closed, session_id stamping, force_pending) - resolve.rs — 19 tests: handle_resolve, cascade-on-`always`, handle_lookup_record - delivery.rs — 25 tests: list_pending / list_undelivered / ack_delivered / consume_undelivered / flush_delivered / sweep_session - misc.rs — 4 tests: FN_* constants, unverified_marker_targets, FakeExecutor smoke test - state_machine.rs — the proptest with the four lifecycle invariants Each tests/*.rs uses `mod common;` for shared fakes — only one source of truth for in-memory bus/executor stand-ins. 141 approval-gate tests pass across 13 test binaries (previously 6). turn-orchestrator + harness + shell verified green. * feat(approval-gate): new Record schema with Pending|InFlight|Done lifecycle Pending → InFlight → Done(Outcome). InFlight is the intermediate persist that closes the dup-exec race within a worker process — a second approval::resolve arriving during the invoke await sees the row is non-Pending and bails. Outcome is a tagged enum (Executed/Failed/Denied/TimedOut). resolved_at is stamped on the first non-Pending transition for deterministic multi-row consume ordering. lifecycle.rs is deleted; its only surviving helper (flipped_to_timed_out_if_expired) is now a Record method. * fix(approval-gate): keep lifecycle.rs as transitional compat shim Task 1 of the simplification plan deletes lifecycle.rs entirely, but the plan was a bit optimistic: dependent modules (intercept, resolve, delivery, sweeper) still import its helpers (build_pending_record, transition_record, maybe_flip_timed_out, is_terminal_status, collect_timed_out_for_sweep) and the crate fails to compile without them. Cargo can't run record.rs's unit tests if the library itself won't link. Restore lifecycle.rs as a transitional shim. Tasks 5/6/8/11 will progressively migrate each callsite to the new Record API; whichever of those tasks lands last deletes lifecycle.rs for real. * feat(approval-gate): Denial::Policy carries rule_permission + rule_pattern The classifier surface is being deleted in favor of the layered rules engine; rename the Policy variant's detail fields to match the actual data they'll carry once T5 lands. For now, classifier callsites (intercept.rs, register.rs) map their old reason/fn locals into the new field names — transient mapping, deleted in T5. Test callsites in tests/wire.rs and tests/lifecycle.rs updated to use representative rule_permission/rule_pattern values. Note: IncomingCall.approval_required + requires_approval() field/method are NOT removed in this commit (the plan's T2 calls for it but intercept.rs's decide_intercept_action still depends on them). They disappear in T5 when intercept.rs is rewritten end-to-end. * feat(approval-gate): pattern_for(function_id, args) extractor For shell::exec/shell::exec_bg, derive a pattern string from {command, args} so layered rules can match argv-level patterns like 'git status*'. Other function ids default to '*'. Malformed/missing args fall back to '*' (matches only wildcard rules). 8 new tests cover all branches: join+args, single-string command, empty args list, missing command, malformed args, non-shell function id, and the documented space-conflation case. * feat(approval-gate): StateBus::delete primitive Required by the new approval::consume RPC (T8), which returns Done rows and deletes them in the same call. IiiStateBus impl maps to the existing state::delete RPC (used elsewhere in the codebase, e.g. auth-credentials). Test fakes (InMemoryStateBus, FailingStateBus) gain matching impls — InMemory removes the key idempotently, Failing errors to match its other primitives. * feat(approval-gate): verdict-driven handle_intercept The classifier surface and per-function InterceptorRule decision flow are gone. handle_intercept now reads the verdict via the new lib-level verdict_for(function_id, args, ruleset) helper: Verdict::Allow → {block:false}, no state write Verdict::Deny → {block:true, denial:Policy{rule_permission, rule_pattern}} Verdict::Ask → write Pending row + reply {block:true, status:pending} Replay defense extended for the new 3-state schema: Pending → replay:in_flight, status:pending InFlight → replay:in_flight, status:in_flight (new) Done → replay:already_resolved, status:done intercept.rs drops InterceptAction, decide_intercept_action, ClassifierDecision, interpret_classifier_reply, PolicyOutcome, apply_policy_rules. register.rs's subscriber closure collapses the old policy_outcome+decide_intercept+classifier dance into one handle_intercept call. resolve.rs's cascade loop reuses verdict_for instead of the deleted apply_policy_rules. State-write failure still fails closed with Denial::StateError. 7 new tests cover all four verdict branches + the three replay states + fail-closed behavior. Note: InterceptorRule config struct and intercept_rules variable are still in place for T12 to fully remove. * feat(approval-gate): three-phase resolve + cascade exact-pattern (T6+T7) handle_resolve rewritten to typed Record + three-phase allow path: 1. write InFlight (closes the dup-exec race within a worker process) 2. iii.trigger(function_id, args) and await 3. write Done(Executed{result}) or Done(Failed{error}) Deny is a single Pending → Done(Denied{denial}) write — no invoke, no InFlight needed. WireDecision::Deny with no denial in payload defaults to UserRejected. UserCorrected{feedback} round-trips. Dup-exec guard: handle_resolve refuses non-Pending rows with in_flight (concurrent resolve mid-invoke) or already_resolved (terminal). Cascade allow on always:true (T7) now pushes a runtime Allow rule with the originator's EXACT pattern via rules::pattern_for(args) — not the prior blanket pattern:'*'. 'Always allow git status' no longer auto-allows rm -rf / via the same shell::exec function id. Lock-ordering invariant pinned in module docs: never hold the ruleset guard across .await. * feat(approval-gate): approval::consume + strip delivery dead RPCs + delete sweeper.rs/lifecycle.rs (T8+T9+T11) Three behavioral changes folded into one commit because they share files: T8 (new approval::consume RPC): Three-phase drain — gather Done candidates, sort by resolved_at + cap, delete-and-return. Defensive session_id filter. Lazy timeout flip on read. Default cap CONSUME_DEFAULT_LIMIT=50 bounds the response size against MB-scale stdout payloads. T9 (delivery.rs strip): handle_list_undelivered, handle_consume_undelivered, handle_ack_delivered, handle_flush_delivered all deleted. LIST_UNDELIVERED_DEFAULT_LIMIT constant gone. handle_list_pending rewritten on the typed Record API with lazy timeout flip on read. handle_sweep_session rewritten on new schema — flips Pending+InFlight rows to Done(TimedOut). T11 (sweeper.rs deleted): Background polling task gone. Timeouts now flip lazily on read in handle_resolve / handle_consume / handle_list_pending. UI handles expires_at countdown client-side; the LLM learns of timeouts on the next consume. lifecycle.rs transient shim also deleted (no more callers). Stream helpers (uuid_like, write_event, write_hook_reply) move into register.rs as their only consumer; spawn_timeout_sweeper + timeout_resolved_event deleted with the rest of sweeper.rs. register.rs surgery: Refs struct loses 5 dead FunctionRef fields and the sweeper JoinHandle. RPC registrations for list_undelivered, consume_undelivered, ack_delivered, flush_delivered all gone. New FN_CONSUME registration added. Classifier-alias warning check trimmed to live function ids. 15 new tests cover handle_consume (7) + handle_sweep_session (1) + handle_list_pending lazy flip (1) + the pre-existing edits. 65 lib tests pass total. * feat(approval-gate): strip __from_approval marker plumbing (T10) IiiFunctionExecutor::invoke forwards function_id+args directly to iii.trigger — no marker stamp. Deleted from state.rs: - merge_from_approval_marker_if_needed (and its 4 unit tests) - unverified_marker_targets (and the 2 rule_for tests) - rule_for (only marker-related callers) Deleted from register.rs: - IiiFunctionExecutor's rules: Arc<Vec<InterceptorRule>> field - Boot-time unverified-marker check (refused-to-start guard) Per the spec's Threat Model section: bus access ≡ shell access in the new model. The harness's bus-level access control is the perimeter; defense-in-depth via per-target marker verification is out of scope for v1. Shell-side marker verification is deleted in T13. T12 will fully retire InterceptorRule + the classifier-alias warning loop in register.rs. * feat(approval-gate): strip config to topic+scope+timeout+rules (T12) config.rs: WorkerConfig now has just {topic, approval_state_scope, default_timeout_ms, rules}. interceptors + sweeper_interval_ms fields gone. InterceptorRule struct kept as a no-op shim with minimal fields so the classifier-alias warning loop in register.rs still compiles — the loop is fed an empty Vec so the body never runs. iii.worker.yaml: replaced interceptor config with a curated default ruleset. Read-only fs/git auto-allowed; shell::exec/exec_bg ask; catch-all asks. Operators stack their own rules on top (last-match wins). register.rs cfg.interceptors → cfg.rules wiring confirmed: policy_rules is seeded from cfg.rules at startup and remains mutable via cascade. Note: InterceptorRule shim + classifier-alias warning loop in register will be deleted in a follow-up pass when the dependent code is fully cleaned up. They're cosmetic at this point — neither has functional effect. * feat(shell): strip classify_argv / allowlist / __from_approval marker (T13) Shell becomes a plain executor. All policy decisions live in approval-gate's rules layer (see the layered ruleset shipped in approval-gate/iii.worker.yaml). Deleted: - shell/src/functions/approval_bypass.rs (marker validation module) - shell/src/functions/classify.rs (classifier handler) - shell/src/arity.rs (arity-aware allowlist matcher) - ApprovalMarker + ClassifyArgvRequest types - ExecRequest.from_approval + ExecBgRequest.from_approval fields - ShellConfig.allowlist + allow_any + denylist_patterns + compiled_denylist fields and their methods (compile_denylist, denylist_hit_reason, allowlist_contains, is_command_allowed) - shell::classify_argv function registration - shell::classify_argv manifest entry - All allowlist/denylist unit tests in shell/src/config.rs - One test in shell/tests/function_handlers.rs that asserted shell rejected unlisted commands Kept (independent surface): - fs::host_root jail + fs.denylist_paths (filesystem path-based denials for shell::fs::* tools; distinct from the exec-policy denylist) - The fs jail validate_fs_jail boot check Threat model implication, per the spec: bus access ≡ shell access. The harness's bus-level access control is the perimeter; any worker on the bus can call shell::exec with arbitrary args. Defense-in-depth via shell-side allowlists is gone for v1. 133 shell lib tests + 328 shell integration tests pass. * feat(turn-orchestrator): switch stitch to approval::consume (T14) consume_approval_stitch now calls approval::consume (single RPC, returns + deletes in one shot). Payload is { session_id } only; no turn_id, no limit (gate enforces a default cap of 50, surfacing overflow via the 'omitted' counter as before). stitch_entries rewritten for the new Record wire shape: reads nested outcome: { kind, detail } instead of the old top-level status + result/error/denial. Outcome kinds: executed, failed, denied, timed_out. function_call_id is the canonical key (call_id retained as a legacy fallback for old rows on disk during the upgrade). render_denial_lines's policy branch reads rule_permission/rule_pattern (matching the renamed Denial::Policy detail shape from T2). The 'policy denied by classifier_fn: classifier_reason' wording becomes 'policy denied by rule <perm> : <pat>'. omission_summary_message reworded: 'approval::flush_delivered' is gone; the natural recovery path is the next-turn consume, which is what the new message points at. 131 lib tests pass (was 122 + 9 failing from old-shape assertions). Pre-existing integration test failure (tests/run_stop.rs imports run_stop module that isn't declared in lib.rs) is unrelated to this refactor. * test(approval-gate): E2E lifecycle tests + delete dead integration tests (T17) Delete 8 integration test files whose assertions targeted the deleted surface (FN_LIST_UNDELIVERED, FN_ACK_DELIVERED, transition_record, InterceptAction, marker plumbing, ...): - tests/approval_lifecycle.rs (engine-backed, used deleted RPCs) - tests/delivery.rs (old delivery RPC surface) - tests/integration.rs (old subscriber wiring) - tests/intercept.rs (decide_intercept_action, classifier path) - tests/lifecycle.rs (record-helper tests on deleted schema) - tests/misc.rs (mixed assertions on old shape) - tests/resolve.rs (old transition_record / Approved status) - tests/state_machine.rs (proptest dep + old schema) The src/* unit tests added across T1-T12 cover the equivalent surface for the new wire shape (Pending|InFlight|Done, Outcome enum, Denial::Policy {rule_permission, rule_pattern}, etc.). Add tests/lifecycle.rs with FIVE E2E lifecycle tests against the in-memory StateBus + FakeExecutor — the integration safety net for the whole refactor: 1. allow_path_end_to_end — pending → InFlight → executed → consume drains 2. deny_path_with_user_corrected_feedback_end_to_end — feedback round-trips 3. timeout_path_lazy_flips_on_consume_end_to_end — past expires_at flips on read 4. cascade_path_end_to_end — two pending rows; allow+always pushes exact-pattern rule; second auto-resolves; consume drains both; pinned: pushed pattern is the originator's argv ('echo go'), NOT blanket '*' 5. allow_rule_short_circuits_with_no_state_write — bonus: Verdict::Allow doesn't touch state 72 approval-gate tests pass across 6 suites (lib + wire + manifest + lifecycle + 2 from common helpers). Net deletion: ~2500 lines of old-shape integration tests gone, ~250 lines of new E2E in their place. * feat(harness/web): Allow+Always, Deny feedback, expires_at countdown (T15) ApprovalRow.tsx grows three real-feature additions matching the approval-gate refactor's new wire surface: 1. Allow+Always button → sends {decision:'allow', always:true}. The gate cascades by pushing a runtime Allow rule with the originator's exact pattern and sweeping the session's other Pending rows (auto-resolving any that match). The response carries cascaded:N when the sweep was non-empty; not surfaced in the UI yet. 2. Optional Deny feedback textarea → when the operator types a correction message before clicking deny, the payload becomes {decision:'deny', denial:{kind:'user_corrected', detail:{feedback}}}. That feedback flows verbatim to the LLM via the stitched message on the next consume — the high-value path because the model gets actionable correction, not just 'denied'. 3. Client-side expires_at countdown → the modal shows MM:SS remaining and disables actions on expiry. No server-emitted timed_out frame anymore (the gate's sweeper is gone); the LLM learns of timeout via the next-turn approval::consume that lazy-flips the row. Each pending row is split into its own ApprovalCard component so the countdown hook respects rules-of-hooks (can't sit inside a .map callback). ResolvePayload is constructed inline so the bridge's Record<string, unknown> type accepts it without an explicit cast at the interface boundary. CSS classes for the new states (approval-allow-always, approval-countdown, approval-feedback, approval-expired-note) inherit default styling for now; a follow-up styling pass can polish. * docs(harness/fanout): note T16 stream-rewrite as deferred follow-up The approval-gate refactor's T16 called for switching the fanout from polling approval::list_pending to subscribing to agent::events for approval_requested / approval_resolved frames. The polling implementation continues to work correctly against the new wire shape — diff_approvals reads function_call_id (now top-level on Record) and list_pending was kept by approval-gate's T8/T11 — so this is a perf cleanup, not a correctness fix. Comment the constant accordingly; defer the actual stream subscription to a follow-up PR. No code change beyond the doc. * fix(shell/e2e): drop allowlist/denylist tests + configs Shell stopped enforcing command-level policy; the e2e suite still referenced the removed allowlist/denylist_patterns config fields and pinned behavior that no longer exists. Strip those expectations and update the docs that described them. - config.yaml / config-jailed.yaml: remove allowlist + denylist_patterns (would fail YAML parse against the new ShellConfig schema) - cases-safety.ts: drop 3 allowlist/denylist assertions; keep one positive 'absolute-path command runs' case - cases-edge.ts: 'nonexistent unlisted command' rewritten to assert the OS spawn error (ENOENT), not an allowlist message - cases-exec-break.ts: drop 'unlisted command with extra fields' - cases-exec-sandbox.ts: drop 'sandbox allowlist still enforced' - cases-vuln-repro.ts: drop S-H3 case (advisory denylist) — the finding's whole point was that command policy belongs upstream - ARCHITECTURE.md / e2e README: reflect the new boundary
Summary
Collapses approval-gate's three overlapping decision surfaces (classifier interceptor +
apply_policy_rules+ run-levelapproval_required) into one rules engine, simplifies the lifecycle to three states, and replaces the four-RPC delivery handshake with one atomic drain.before_function_callhook resolves throughrules::evaluate(function_id, pattern_for(args), ruleset)returningAllow/Ask/Deny. The classifier-as-LLM surface is gone (InterceptAction,decide_intercept_action,ClassifierDecision,interpret_classifier_reply,apply_policy_rules,PolicyOutcomeall deleted).Record { status: Pending | InFlight | Done, outcome: Outcome }:InFlightis the intermediate persist that closes the duplicate-execution race a second resolve would otherwise trigger. Replaces the older multi-status flow.approval::consumereturns Done rows and deletes them atomically (capped, FIFO byresolved_at, defensive session-id filter). Thelist_undelivered/ack_delivered/consume_undelivered/flush_deliveredhandshake is gone.handle_resolve/handle_consume/handle_list_pending.__from_approvalmarker is gone on both write (approval-gate) and verify (shell).classify_argv,approval_bypass,arity.rs, and theShellConfigallowlist/denylist_patternsfields are deleted — command-level policy lives upstream in approval-gate's rules.iii.worker.yamlso empty operator config isn't "prompt for every tool call" — read-onlyfs::*and read-onlygitauto-allow; the rest asks.Decisions worth a careful look
*. Clicking Allow + Always ongit statuspushes a runtimeRule { permission: "shell::exec", pattern: "git status", action: Allow }— earlier drafts pushedpattern: "*", which would have auto-allowed everyshell::execfor the rest of the session includingrm -rf /. Lives inapproval-gate/src/resolve.rs.Ask+ curated default ruleset.before_function_callfires for every tool call, so empty rules would prompt for everything. The shipped defaults iniii.worker.yamlcover the noisy read-only paths.__from_approvalmarker means bus access ≡ shell access; the harness's bus-level access control is now the perimeter. Spelled out in the design doc under "Threat model".resolved_atis kept, not derived. Needed for deterministic LLM stitch order when one consume drains multiple cascade-resolved rows (function_call_idis provider-minted and not lex-monotonic, so it can't substitute).Notable change surface
approval-gate/src/rules.rs,intercept.rs,resolve.rs,state.rs,delivery.rs,register.rssplit out of the oldlib.rs.sweeper.rsremoved in favor of lazy timeout flip.WorkerConfigreduced to{topic, approval_state_scope, default_timeout_ms, rules}.approval-gate/tests/lifecycle.rsshell/src/classify_argv,approval_bypass,arity.rsdeleted.ShellConfigstripsallowlistanddenylist_patterns.shell/tests/e2e/turn-orchestrator/src/states/assistant.rsconsume_approval_stitchnow drivesapproval::consume.harness/web/src/components/ApprovalRow.tsxexpires_atcountdown.harness/src/fanout.rslist_pendingpolling toagent::eventssubscription. Polling still works; deferred.Net diff: ~3.6k lines removed across 63 files; 56 approval-gate lib tests + 5 E2E lifecycle tests pass.
Test plan
cargo test --manifest-path approval-gate/Cargo.toml— 56 lib tests + 5 E2E lifecycle tests pass.shell/tests/e2esuite (CI workflow.github/workflows/shell-e2e.yml) — runs./run-tests.sh+./run-tests-jailed.shagainst a real iii engine on ubuntu-latest.