Skip to content

approval-gate simplification — rules engine, three-state lifecycle, single drain RPC#146

Merged
ytallo merged 30 commits into
feat/shell-allowlist-approvalfrom
approval-gate-simplification
May 16, 2026
Merged

approval-gate simplification — rules engine, three-state lifecycle, single drain RPC#146
ytallo merged 30 commits into
feat/shell-allowlist-approvalfrom
approval-gate-simplification

Conversation

@ytallo
Copy link
Copy Markdown
Contributor

@ytallo ytallo commented May 16, 2026

Summary

Collapses approval-gate's three overlapping decision surfaces (classifier interceptor + apply_policy_rules + run-level approval_required) into one rules engine, simplifies the lifecycle to three states, and replaces the four-RPC delivery handshake with one atomic drain.

  • One decision path: every before_function_call hook resolves through rules::evaluate(function_id, pattern_for(args), ruleset) returning Allow / Ask / Deny. The classifier-as-LLM surface is gone (InterceptAction, decide_intercept_action, ClassifierDecision, interpret_classifier_reply, apply_policy_rules, PolicyOutcome all deleted).
  • Record { status: Pending | InFlight | Done, outcome: Outcome }: InFlight is the intermediate persist that closes the duplicate-execution race a second resolve would otherwise trigger. Replaces the older multi-status flow.
  • Single drain RPC: approval::consume returns Done rows and deletes them atomically (capped, FIFO by resolved_at, defensive session-id filter). The list_undelivered / ack_delivered / consume_undelivered / flush_delivered handshake is gone.
  • No sweeper: the 2-second background polling task is deleted. Pending → TimedOut flips lazily on read in handle_resolve / handle_consume / handle_list_pending.
  • Shell is a plain executor: the __from_approval marker is gone on both write (approval-gate) and verify (shell). classify_argv, approval_bypass, arity.rs, and the ShellConfig allowlist / denylist_patterns fields are deleted — command-level policy lives upstream in approval-gate's rules.
  • Curated default ruleset ships in iii.worker.yaml so empty operator config isn't "prompt for every tool call" — read-only fs::* and read-only git auto-allow; the rest asks.

Decisions worth a careful look

  • Cascade narrows to the exact pattern, not *. Clicking Allow + Always on git status pushes a runtime Rule { permission: "shell::exec", pattern: "git status", action: Allow } — earlier drafts pushed pattern: "*", which would have auto-allowed every shell::exec for the rest of the session including rm -rf /. Lives in approval-gate/src/resolve.rs.
  • No-match default = Ask + curated default ruleset. before_function_call fires for every tool call, so empty rules would prompt for everything. The shipped defaults in iii.worker.yaml cover the noisy read-only paths.
  • Threat-model shift. Removing the __from_approval marker means bus access ≡ shell access; the harness's bus-level access control is now the perimeter. Spelled out in the design doc under "Threat model".
  • resolved_at is kept, not derived. Needed for deterministic LLM stitch order when one consume drains multiple cascade-resolved rows (function_call_id is provider-minted and not lex-monotonic, so it can't substitute).

Notable change surface

Area What
approval-gate/src/ rules.rs, intercept.rs, resolve.rs, state.rs, delivery.rs, register.rs split out of the old lib.rs. sweeper.rs removed in favor of lazy timeout flip. WorkerConfig reduced to {topic, approval_state_scope, default_timeout_ms, rules}.
approval-gate/tests/lifecycle.rs E2E coverage of allow / deny / timeout / cascade against in-memory fakes.
shell/src/ classify_argv, approval_bypass, arity.rs deleted. ShellConfig strips allowlist and denylist_patterns.
shell/tests/e2e/ Allowlist/denylist test cases and config keys removed; remaining safety guardrails (timeout, output truncation, env scrubbing, fs path denylist) kept.
turn-orchestrator/src/states/assistant.rs consume_approval_stitch now drives approval::consume.
harness/web/src/components/ApprovalRow.tsx Allow + Always button, optional Deny correction feedback, client-side expires_at countdown.
harness/src/fanout.rs Documented follow-up: switch from list_pending polling to agent::events subscription. Polling still works; deferred.

Net diff: ~3.6k lines removed across 63 files; 56 approval-gate lib tests + 5 E2E lifecycle tests pass.

Test plan

  • cargo test --manifest-path approval-gate/Cargo.toml — 56 lib tests + 5 E2E lifecycle tests pass.
  • shell/tests/e2e suite (CI workflow .github/workflows/shell-e2e.yml) — runs ./run-tests.sh + ./run-tests-jailed.sh against a real iii engine on ubuntu-latest.
  • Manual: allow / deny / cascade / timeout flows through the harness UI.

ytallo added 24 commits May 15, 2026 18:19
Introduce a first-class Rule/Ruleset/Action surface ported from
opencode's Permission.evaluate. Rules are wildcard globs over the iii
function id (pattern is "*" in v1 — the field is reserved for per-
function pattern extractors in a follow-up). Evaluator semantics:
findLast across flattened layers, so operators stack a permissive
default with more-specific overrides without surgery on the base list.

approval-gate
- New rules module: Action (allow/deny/ask), Rule, Ruleset alias,
  wildcard_match (DP, no regex), evaluate(perm, pattern, IntoIterator
  of &Rule). 16 unit tests covering matching, layering, last-wins.
- WorkerConfig.rules: Vec<Rule> (additive; existing InterceptorRule
  flow unchanged).
- apply_policy_rules: pure helper that maps a rule hit to PolicyOutcome
  (Allow / Deny{rule_permission, rule_pattern} / FallThrough).
  Allow short-circuits to pass-through; Deny short-circuits with a
  Denial::Policy that names the matched rule; Ask and no-match fall
  through to the existing per-function interceptor flow.
- handle_subscriber consults the rules layer first, before
  decide_intercept_action.
When a user resolves a pending call with decision=allow + always=true,
approval-gate now pushes a runtime Allow rule for the call's function
id and sweeps the same session's pending records, auto-resolving every
other one the new rule covers. One click instead of N.

approval-gate
- approve_and_execute() extracted from handle_resolve's allow branch as
  a reusable async helper. Both the user-driven allow path and the
  cascade sweep drive state through the same transitions
  (approved -> invoke -> executed | failed).
- Shared policy ruleset wrapped as Arc<RwLock<Ruleset>> so reply-time
  mutation is safe across the subscriber + resolve closures. Read
  guards are scoped so they never cross an .await (std::sync::RwLock
  is not async-safe to hold across suspension).
- cascade_allow_for_session() pushes the new rule under the write
  lock, snapshots session pending via list_prefix, and runs
  approve_and_execute for each newly-Allow record. The originator is
  skipped (already resolved above). Per-record state-write failures
  are logged and the rest of the cascade continues.
- handle_resolve response carries `cascaded: N` when the sweep
  resolved at least one extra record; omitted otherwise so the
  one-shot path stays unchanged.

Cascade scope is intentionally narrow for v1:
- function-id-only matching (pattern "*"), mirrors the v1 rules surface
- same session only
- in-memory rule (no cross-restart persistence)
- allow + always; deny + always is out of scope

Five new tests cover the cascade decision tree (no-always, same-session
match, cross-session non-effect, originator-skip, terminal-record skip).
The Denial refactor replaced decision_reason with a typed structure, but
several legacy hooks remained in place. They are now removed because the
underlying schemas they protected against (pre-trigger-model status
strings, pre-Denial flat decision_reason field) cannot occur in any
state store this branch is wired to.

approval-gate
- Drop migrate_legacy_record entirely. The pre-trigger-model
  status="allow"/"deny" rename and the decision_reason -> Denial::Legacy
  lift both go away. Old persisted records (if any) now surface as
  filtered-out orphans rather than silently mistranslated.
- Drop Denial::Legacy variant from the enum.
- Drop legacy_migrated flag from records.
- Drop the four migrate_legacy_record_* unit tests and the
  handle_list_undelivered_persists_migrated_legacy_record integration
  test.
- handle_list_pending no longer pre-filters legacy shapes; orphan
  records lacking a session_id stamp are dropped uniformly.
- timeout_resolved_event drops the decision_reason: "timeout" field
  (status: "timed_out" is self-describing).
- handle_sweep_session drops the optional `reason` payload field. The
  cause (session_deleted vs run_stopped) is the caller's concern; the
  swept timed_out records carry no Denial.
- skills/sweep_session.md updated.

turn-orchestrator
- approval_stitching drops the legacy denial-kind branch and the
  legacy_migrated note line; the corresponding stitch test is removed.
- run_stop no longer forwards reason: "run_stopped" on the
  approval::sweep_session payload; it logs the cause locally instead.
- The run_stop integration test now asserts the absence of the reason
  field on the sweep payload.

harness-types (6 vendored copies, byte-identical)
- Drop the Legacy variant and its docblock line from each agent_event.rs
  copy.
New src/record.rs defines the persisted approval record as a typed
struct alongside its lifecycle Status enum and a Next enum that pairs
each target status with its outcome data. Wire format is byte-identical
to the existing serde_json::Value blobs — operators can adopt the
typed shape incrementally without changing any persisted state.

approval-gate
- New module record:
  - Status enum (Pending / Approved / Executed / Failed / Denied /
    TimedOut), serde snake_case, with is_terminal() helper.
  - Record struct mirroring the historical JSON keys field-for-field,
    optional outcome fields scoped to terminal statuses, serde
    skip-when-None so existing rows round-trip cleanly.
  - Next enum that bundles (target_status, outcome_payload) so the
    type system rules out impossible transitions (e.g. Executed without
    a result, Denied without a Denial). Replaces the old
    (status_string, Option<result>, Option<error>, Option<denial>)
    parallel-Option signature when callers opt in.
  - Record::to_value / Record::from_value bridge to the Value blobs the
    iii state bus still expects.
  - 8 unit tests covering serde round-trips, status semantics, expiry
    saturation, forward-compat with unknown fields.
- lib.rs re-exports Next, Record, Status.

Handler migration to the typed schema is a separate change; this commit
just makes the types available so future work can lift Value access
into typed field access incrementally rather than in one big diff.
First step of breaking lib.rs (~4300 lines) into focused modules.
This commit lifts the wire-format types and small wire-shape helpers
into their own module — pure data shapes with no I/O, no iii-sdk
dependency, no async — so downstream workers that only need to
understand the approval-gate protocol can depend on a small surface.

Moved (lib.rs -> wire.rs):
- Denial (structured deny payload)
- IncomingCall + requires_approval()
- Decision (Allow | Deny(Denial))
- WireDecision (allow|deny wire enum)
- pending_key()
- extract_call()
- block_reply_for()

All public items stay re-exported from the crate root, so every
existing call site (handlers, tests, integration tests, downstream
crates) continues to work without import changes.

Pure move, no behavior change. 141 approval-gate tests pass;
turn-orchestrator + harness + session + providers + harness-tui +
shell all pass.
Second step of breaking lib.rs into focused modules. This commit
lifts the persisted-record lifecycle helpers — pure Value-blob
constructors and transitions, no I/O, no iii-sdk dependency — into
their own module.

Moved (lib.rs -> lifecycle.rs):
- is_terminal_status()
- build_pending_record()
- transition_record() + transition_record_with_now()
- collect_timed_out_for_sweep()
- maybe_flip_timed_out()

All public items stay re-exported from the crate root.
Pure move, no behavior change. 141 approval-gate tests pass;
all 8 downstream worker crates pass.
Third step of breaking lib.rs into focused modules. This commit lifts
the iii-backed state-bus and function-executor implementations into
their own module, along with the `__from_approval` marker plumbing
and the boot-time marker-target safety check.

Moved (lib.rs -> state.rs):
- StateBus trait + IiiStateBus impl  (iii state::* wrapper)
- FunctionExecutor trait + IiiFunctionExecutor impl  (iii.trigger wrapper)
- rule_for()  (lookup helper used by IiiFunctionExecutor)
- merge_from_approval_marker_if_needed()  (marker payload composer)
- unverified_marker_targets()  (boot guard)

The StateBus / FunctionExecutor traits are kept exactly as they were
in lib.rs — they exist purely as test seams so unit tests can swap in
InMemoryStateBus / FakeExecutor. No new abstractions added.

Public items stay re-exported from the crate root.
Pure move, no behavior change. 141 approval-gate tests pass;
all 8 downstream worker crates pass (turn-orchestrator's pre-existing
dual_write flake notwithstanding).
Fourth step of breaking lib.rs into focused modules. This commit
lifts the intercept decision flow into its own module — the three
layers that decide what the gate does with an incoming function
call: policy rules, per-function interceptor rules, and classifier
replies, plus the async handle_intercept that writes the pending
record.

Moved (lib.rs -> intercept.rs):
- InterceptAction enum + decide_intercept_action()
- PolicyOutcome enum + apply_policy_rules()
- ClassifierDecision enum + interpret_classifier_reply()
- handle_intercept()

handle_intercept stays public (re-exported from the crate root).
The pub(crate) helpers stay pub(crate) and are imported back into
lib.rs for the subscriber closure in register().

Pure move, no behavior change. 141 approval-gate tests pass;
turn-orchestrator + harness + shell verified green.
Fifth step of breaking lib.rs into focused modules. This commit
lifts the approval::resolve flow into its own module.

Moved (lib.rs -> resolve.rs):
- handle_lookup_record()
- handle_resolve()
- cascade_allow_for_session() (the always=true sweep)
- approve_and_execute() (shared by user-driven allow + cascade)

handle_resolve and handle_lookup_record stay public via the crate
root re-exports. approve_and_execute and cascade_allow_for_session
stay private to the resolve module.

Pure move, no behavior change. 141 approval-gate tests pass.
Sixth step of breaking lib.rs into focused modules. This commit
lifts the delivery-tracking RPCs into their own module.

Moved (lib.rs -> delivery.rs):
- handle_list_pending()
- handle_list_undelivered()
- handle_ack_delivered()
- handle_consume_undelivered()
- handle_flush_delivered()
- handle_sweep_session()
- LIST_UNDELIVERED_DEFAULT_LIMIT

All public items stay re-exported from the crate root.
Pure move, no behavior change. 141 approval-gate tests pass.
Seventh step of breaking lib.rs into focused modules. This commit
lifts the periodic timeout sweeper and the small iii-stream helpers
it owns into their own module.

Moved (lib.rs -> sweeper.rs):
- spawn_timeout_sweeper()
- timeout_resolved_event()
- write_event()
- write_hook_reply()
- uuid_like()

All five stay pub(crate) — only the register() wiring in lib.rs
and the resolve flow need them. Pure move, no behavior change.
141 approval-gate tests pass.
Final step of breaking lib.rs into focused modules. This commit
lifts the iii function/trigger wiring — the heart of the worker
startup path — into its own module.

Moved (lib.rs -> register.rs):
- register() (~420 lines: the subscriber closure + 8 function
  registrations + trigger registration + sweeper spawn)
- Refs struct (the handle bag returned to the binary)
- FN_RESOLVE / FN_LIST_PENDING / FN_LIST_UNDELIVERED /
  FN_CONSUME_UNDELIVERED / FN_ACK_DELIVERED / FN_FLUSH_DELIVERED /
  FN_SWEEP_SESSION / FN_LOOKUP_RECORD constants
- STATE_SCOPE constant

lib.rs is now ~50 lines of module declarations + re-exports +
~2580 lines of inline tests. The test mod uses super::* plus
local serde_json and std::sync imports.

All public items stay re-exported from the crate root so the
binary's `use approval_gate::register;` keeps working unchanged.
Pure move, no behavior change.

141 approval-gate tests pass; turn-orchestrator + harness + shell
verified green (turn-orchestrator's pre-existing dual_write flake
notwithstanding).
First step of relocating the 2580-line inline test block from
lib.rs into more focused homes. This commit moves the 18 tests
that touch pub(crate) helpers — ones that can't live in tests/*.rs
because integration tests can only see pub items — into the
#[cfg(test)] mod tests block of whichever production module owns
the helper they exercise.

intercept.rs gained:
- interpret_classifier_reply_reads_decision_tags
- decide_intercept_action_* (5 tests)
- apply_policy_rules_* (5 tests)

state.rs gained:
- merge_from_approval_* (4 tests)
- rule_for_returns_matching_rule / rule_for_returns_none_when_absent

sweeper.rs gained:
- timeout_resolved_event_shape

lib.rs loses those 18 tests. 141 approval-gate tests pass (no
duplicate counts — each test runs once). turn-orchestrator + harness
+ shell verified green.

Public-API tests (the remaining ~96 in lib.rs) move to tests/*.rs
in a follow-up commit so they can share fakes via a common module.
Final test split. The 86 remaining inline tests in lib.rs that
exercise only the crate's `pub` surface now live in tests/*.rs,
organized by the area they cover. The state-machine proptest gets
its own file. lib.rs drops from 2369 to 52 lines (just module
declarations + re-exports).

New tests/ layout:
- common/mod.rs    — shared fakes: FakeExecutor, InMemoryStateBus,
                     FailingStateBus, sample_call(), empty_policy_rules()
- lifecycle.rs     — 19 tests: build_pending_record, transition_record,
                     maybe_flip_timed_out, collect_timed_out_for_sweep,
                     is_terminal_status, pending_key
- wire.rs          — 10 tests: extract_call, block_reply_for,
                     requires_approval
- intercept.rs     — 9 tests: handle_intercept (replays, fail-closed,
                     session_id stamping, force_pending)
- resolve.rs       — 19 tests: handle_resolve, cascade-on-`always`,
                     handle_lookup_record
- delivery.rs      — 25 tests: list_pending / list_undelivered /
                     ack_delivered / consume_undelivered /
                     flush_delivered / sweep_session
- misc.rs          — 4 tests: FN_* constants, unverified_marker_targets,
                     FakeExecutor smoke test
- state_machine.rs — the proptest with the four lifecycle invariants

Each tests/*.rs uses `mod common;` for shared fakes — only one source
of truth for in-memory bus/executor stand-ins.

141 approval-gate tests pass across 13 test binaries (previously 6).
turn-orchestrator + harness + shell verified green.
…ecycle

Pending → InFlight → Done(Outcome). InFlight is the intermediate
persist that closes the dup-exec race within a worker process — a
second approval::resolve arriving during the invoke await sees the
row is non-Pending and bails.

Outcome is a tagged enum (Executed/Failed/Denied/TimedOut). resolved_at
is stamped on the first non-Pending transition for deterministic
multi-row consume ordering.

lifecycle.rs is deleted; its only surviving helper
(flipped_to_timed_out_if_expired) is now a Record method.
Task 1 of the simplification plan deletes lifecycle.rs entirely, but
the plan was a bit optimistic: dependent modules (intercept, resolve,
delivery, sweeper) still import its helpers (build_pending_record,
transition_record, maybe_flip_timed_out, is_terminal_status,
collect_timed_out_for_sweep) and the crate fails to compile without
them. Cargo can't run record.rs's unit tests if the library itself
won't link.

Restore lifecycle.rs as a transitional shim. Tasks 5/6/8/11 will
progressively migrate each callsite to the new Record API; whichever
of those tasks lands last deletes lifecycle.rs for real.
…ttern

The classifier surface is being deleted in favor of the layered rules
engine; rename the Policy variant's detail fields to match the actual
data they'll carry once T5 lands. For now, classifier callsites
(intercept.rs, register.rs) map their old reason/fn locals into the
new field names — transient mapping, deleted in T5.

Test callsites in tests/wire.rs and tests/lifecycle.rs updated to use
representative rule_permission/rule_pattern values.

Note: IncomingCall.approval_required + requires_approval() field/method
are NOT removed in this commit (the plan's T2 calls for it but
intercept.rs's decide_intercept_action still depends on them). They
disappear in T5 when intercept.rs is rewritten end-to-end.
For shell::exec/shell::exec_bg, derive a pattern string from
{command, args} so layered rules can match argv-level patterns like
'git status*'. Other function ids default to '*'. Malformed/missing
args fall back to '*' (matches only wildcard rules).

8 new tests cover all branches: join+args, single-string command,
empty args list, missing command, malformed args, non-shell function
id, and the documented space-conflation case.
Required by the new approval::consume RPC (T8), which returns Done
rows and deletes them in the same call. IiiStateBus impl maps to the
existing state::delete RPC (used elsewhere in the codebase, e.g.
auth-credentials). Test fakes (InMemoryStateBus, FailingStateBus) gain
matching impls — InMemory removes the key idempotently, Failing errors
to match its other primitives.
The classifier surface and per-function InterceptorRule decision flow
are gone. handle_intercept now reads the verdict via the new lib-level
verdict_for(function_id, args, ruleset) helper:

  Verdict::Allow → {block:false}, no state write
  Verdict::Deny  → {block:true, denial:Policy{rule_permission, rule_pattern}}
  Verdict::Ask   → write Pending row + reply {block:true, status:pending}

Replay defense extended for the new 3-state schema:
  Pending  → replay:in_flight, status:pending
  InFlight → replay:in_flight, status:in_flight  (new)
  Done     → replay:already_resolved, status:done

intercept.rs drops InterceptAction, decide_intercept_action,
ClassifierDecision, interpret_classifier_reply, PolicyOutcome,
apply_policy_rules. register.rs's subscriber closure collapses the
old policy_outcome+decide_intercept+classifier dance into one
handle_intercept call. resolve.rs's cascade loop reuses verdict_for
instead of the deleted apply_policy_rules.

State-write failure still fails closed with Denial::StateError.
7 new tests cover all four verdict branches + the three replay
states + fail-closed behavior.

Note: InterceptorRule config struct and intercept_rules variable
are still in place for T12 to fully remove.
handle_resolve rewritten to typed Record + three-phase allow path:
  1. write InFlight (closes the dup-exec race within a worker process)
  2. iii.trigger(function_id, args) and await
  3. write Done(Executed{result}) or Done(Failed{error})

Deny is a single Pending → Done(Denied{denial}) write — no invoke,
no InFlight needed. WireDecision::Deny with no denial in payload
defaults to UserRejected. UserCorrected{feedback} round-trips.

Dup-exec guard: handle_resolve refuses non-Pending rows with
  in_flight (concurrent resolve mid-invoke) or
  already_resolved (terminal).

Cascade allow on always:true (T7) now pushes a runtime Allow rule with
the originator's EXACT pattern via rules::pattern_for(args) — not the
prior blanket pattern:'*'. 'Always allow git status' no longer
auto-allows rm -rf / via the same shell::exec function id.
Lock-ordering invariant pinned in module docs: never hold the ruleset
guard across .await.
…elete sweeper.rs/lifecycle.rs (T8+T9+T11)

Three behavioral changes folded into one commit because they share files:

T8 (new approval::consume RPC):
  Three-phase drain — gather Done candidates, sort by resolved_at + cap,
  delete-and-return. Defensive session_id filter. Lazy timeout flip on
  read. Default cap CONSUME_DEFAULT_LIMIT=50 bounds the response size
  against MB-scale stdout payloads.

T9 (delivery.rs strip):
  handle_list_undelivered, handle_consume_undelivered, handle_ack_delivered,
  handle_flush_delivered all deleted. LIST_UNDELIVERED_DEFAULT_LIMIT
  constant gone. handle_list_pending rewritten on the typed Record API
  with lazy timeout flip on read. handle_sweep_session rewritten on new
  schema — flips Pending+InFlight rows to Done(TimedOut).

T11 (sweeper.rs deleted):
  Background polling task gone. Timeouts now flip lazily on read in
  handle_resolve / handle_consume / handle_list_pending. UI handles
  expires_at countdown client-side; the LLM learns of timeouts on the
  next consume.
  lifecycle.rs transient shim also deleted (no more callers).
  Stream helpers (uuid_like, write_event, write_hook_reply) move into
  register.rs as their only consumer; spawn_timeout_sweeper +
  timeout_resolved_event deleted with the rest of sweeper.rs.

register.rs surgery:
  Refs struct loses 5 dead FunctionRef fields and the sweeper JoinHandle.
  RPC registrations for list_undelivered, consume_undelivered,
  ack_delivered, flush_delivered all gone. New FN_CONSUME registration
  added. Classifier-alias warning check trimmed to live function ids.

15 new tests cover handle_consume (7) + handle_sweep_session (1) +
handle_list_pending lazy flip (1) + the pre-existing edits. 65 lib
tests pass total.
IiiFunctionExecutor::invoke forwards function_id+args directly to
iii.trigger — no marker stamp.

Deleted from state.rs:
- merge_from_approval_marker_if_needed (and its 4 unit tests)
- unverified_marker_targets (and the 2 rule_for tests)
- rule_for (only marker-related callers)

Deleted from register.rs:
- IiiFunctionExecutor's rules: Arc<Vec<InterceptorRule>> field
- Boot-time unverified-marker check (refused-to-start guard)

Per the spec's Threat Model section: bus access ≡ shell access in the
new model. The harness's bus-level access control is the perimeter;
defense-in-depth via per-target marker verification is out of scope
for v1. Shell-side marker verification is deleted in T13.

T12 will fully retire InterceptorRule + the classifier-alias warning
loop in register.rs.
config.rs: WorkerConfig now has just {topic, approval_state_scope,
default_timeout_ms, rules}. interceptors + sweeper_interval_ms fields
gone. InterceptorRule struct kept as a no-op shim with minimal fields
so the classifier-alias warning loop in register.rs still compiles —
the loop is fed an empty Vec so the body never runs.

iii.worker.yaml: replaced interceptor config with a curated default
ruleset. Read-only fs/git auto-allowed; shell::exec/exec_bg ask;
catch-all asks. Operators stack their own rules on top (last-match
wins).

register.rs cfg.interceptors → cfg.rules wiring confirmed: policy_rules
is seeded from cfg.rules at startup and remains mutable via cascade.

Note: InterceptorRule shim + classifier-alias warning loop in register
will be deleted in a follow-up pass when the dependent code is fully
cleaned up. They're cosmetic at this point — neither has functional
effect.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
workers Error Error May 16, 2026 0:53am

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d6acc78-ebe5-4695-884c-3a64ee311c17

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch approval-gate-simplification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

skill-check — worker

3 verified, 22 skipped (no docs/).

1 error across the verified workers.

File Approximate line Severity Violation
shell/docs/leaves/kill.md ~7 error [Terminology.SlopMarketing] Avoid marketing/anthropomorphic phrasing 'Reaping'. Use concrete, technical language. (error)

ytallo added 3 commits May 16, 2026 09:10
… (T13)

Shell becomes a plain executor. All policy decisions live in
approval-gate's rules layer (see the layered ruleset shipped in
approval-gate/iii.worker.yaml).

Deleted:
- shell/src/functions/approval_bypass.rs (marker validation module)
- shell/src/functions/classify.rs (classifier handler)
- shell/src/arity.rs (arity-aware allowlist matcher)
- ApprovalMarker + ClassifyArgvRequest types
- ExecRequest.from_approval + ExecBgRequest.from_approval fields
- ShellConfig.allowlist + allow_any + denylist_patterns +
  compiled_denylist fields and their methods (compile_denylist,
  denylist_hit_reason, allowlist_contains, is_command_allowed)
- shell::classify_argv function registration
- shell::classify_argv manifest entry
- All allowlist/denylist unit tests in shell/src/config.rs
- One test in shell/tests/function_handlers.rs that asserted shell
  rejected unlisted commands

Kept (independent surface):
- fs::host_root jail + fs.denylist_paths (filesystem path-based denials
  for shell::fs::* tools; distinct from the exec-policy denylist)
- The fs jail validate_fs_jail boot check

Threat model implication, per the spec: bus access ≡ shell access.
The harness's bus-level access control is the perimeter; any worker on
the bus can call shell::exec with arbitrary args. Defense-in-depth via
shell-side allowlists is gone for v1.

133 shell lib tests + 328 shell integration tests pass.
consume_approval_stitch now calls approval::consume (single RPC,
returns + deletes in one shot). Payload is { session_id } only; no
turn_id, no limit (gate enforces a default cap of 50, surfacing
overflow via the 'omitted' counter as before).

stitch_entries rewritten for the new Record wire shape: reads nested
outcome: { kind, detail } instead of the old top-level status +
result/error/denial. Outcome kinds: executed, failed, denied,
timed_out. function_call_id is the canonical key (call_id retained
as a legacy fallback for old rows on disk during the upgrade).

render_denial_lines's policy branch reads rule_permission/rule_pattern
(matching the renamed Denial::Policy detail shape from T2). The
'policy denied by classifier_fn: classifier_reason' wording becomes
'policy denied by rule <perm> : <pat>'.

omission_summary_message reworded: 'approval::flush_delivered' is gone;
the natural recovery path is the next-turn consume, which is what the
new message points at.

131 lib tests pass (was 122 + 9 failing from old-shape assertions).
Pre-existing integration test failure (tests/run_stop.rs imports
run_stop module that isn't declared in lib.rs) is unrelated to this
refactor.
…sts (T17)

Delete 8 integration test files whose assertions targeted the deleted
surface (FN_LIST_UNDELIVERED, FN_ACK_DELIVERED, transition_record,
InterceptAction, marker plumbing, ...):
- tests/approval_lifecycle.rs (engine-backed, used deleted RPCs)
- tests/delivery.rs (old delivery RPC surface)
- tests/integration.rs (old subscriber wiring)
- tests/intercept.rs (decide_intercept_action, classifier path)
- tests/lifecycle.rs (record-helper tests on deleted schema)
- tests/misc.rs (mixed assertions on old shape)
- tests/resolve.rs (old transition_record / Approved status)
- tests/state_machine.rs (proptest dep + old schema)

The src/* unit tests added across T1-T12 cover the equivalent surface
for the new wire shape (Pending|InFlight|Done, Outcome enum,
Denial::Policy {rule_permission, rule_pattern}, etc.).

Add tests/lifecycle.rs with FIVE E2E lifecycle tests against the
in-memory StateBus + FakeExecutor — the integration safety net for
the whole refactor:
1. allow_path_end_to_end — pending → InFlight → executed → consume drains
2. deny_path_with_user_corrected_feedback_end_to_end — feedback round-trips
3. timeout_path_lazy_flips_on_consume_end_to_end — past expires_at flips on read
4. cascade_path_end_to_end — two pending rows; allow+always pushes exact-pattern
   rule; second auto-resolves; consume drains both; pinned: pushed pattern is
   the originator's argv ('echo go'), NOT blanket '*'
5. allow_rule_short_circuits_with_no_state_write — bonus: Verdict::Allow
   doesn't touch state

72 approval-gate tests pass across 6 suites (lib + wire + manifest +
lifecycle + 2 from common helpers). Net deletion: ~2500 lines of
old-shape integration tests gone, ~250 lines of new E2E in their place.
ytallo added 2 commits May 16, 2026 09:22
…(T15)

ApprovalRow.tsx grows three real-feature additions matching the
approval-gate refactor's new wire surface:

1. Allow+Always button → sends {decision:'allow', always:true}. The
   gate cascades by pushing a runtime Allow rule with the originator's
   exact pattern and sweeping the session's other Pending rows
   (auto-resolving any that match). The response carries cascaded:N
   when the sweep was non-empty; not surfaced in the UI yet.

2. Optional Deny feedback textarea → when the operator types a
   correction message before clicking deny, the payload becomes
   {decision:'deny', denial:{kind:'user_corrected', detail:{feedback}}}.
   That feedback flows verbatim to the LLM via the stitched message
   on the next consume — the high-value path because the model gets
   actionable correction, not just 'denied'.

3. Client-side expires_at countdown → the modal shows MM:SS remaining
   and disables actions on expiry. No server-emitted timed_out frame
   anymore (the gate's sweeper is gone); the LLM learns of timeout
   via the next-turn approval::consume that lazy-flips the row.

Each pending row is split into its own ApprovalCard component so the
countdown hook respects rules-of-hooks (can't sit inside a .map
callback). ResolvePayload is constructed inline so the bridge's
Record<string, unknown> type accepts it without an explicit cast at
the interface boundary.

CSS classes for the new states (approval-allow-always,
approval-countdown, approval-feedback, approval-expired-note) inherit
default styling for now; a follow-up styling pass can polish.
The approval-gate refactor's T16 called for switching the fanout from
polling approval::list_pending to subscribing to agent::events for
approval_requested / approval_resolved frames. The polling
implementation continues to work correctly against the new wire shape
— diff_approvals reads function_call_id (now top-level on Record) and
list_pending was kept by approval-gate's T8/T11 — so this is a perf
cleanup, not a correctness fix.

Comment the constant accordingly; defer the actual stream subscription
to a follow-up PR. No code change beyond the doc.
Shell stopped enforcing command-level policy; the e2e suite still
referenced the removed allowlist/denylist_patterns config fields and
pinned behavior that no longer exists. Strip those expectations and
update the docs that described them.

- config.yaml / config-jailed.yaml: remove allowlist + denylist_patterns
  (would fail YAML parse against the new ShellConfig schema)
- cases-safety.ts: drop 3 allowlist/denylist assertions; keep one
  positive 'absolute-path command runs' case
- cases-edge.ts: 'nonexistent unlisted command' rewritten to assert
  the OS spawn error (ENOENT), not an allowlist message
- cases-exec-break.ts: drop 'unlisted command with extra fields'
- cases-exec-sandbox.ts: drop 'sandbox allowlist still enforced'
- cases-vuln-repro.ts: drop S-H3 case (advisory denylist) — the
  finding's whole point was that command policy belongs upstream
- ARCHITECTURE.md / e2e README: reflect the new boundary
@ytallo ytallo merged commit 8f9d090 into feat/shell-allowlist-approval May 16, 2026
3 of 5 checks passed
ytallo added a commit that referenced this pull request May 16, 2026
…ingle drain RPC (#146)

* feat(approval-gate): layered policy rules with findLast evaluator

Introduce a first-class Rule/Ruleset/Action surface ported from
opencode's Permission.evaluate. Rules are wildcard globs over the iii
function id (pattern is "*" in v1 — the field is reserved for per-
function pattern extractors in a follow-up). Evaluator semantics:
findLast across flattened layers, so operators stack a permissive
default with more-specific overrides without surgery on the base list.

approval-gate
- New rules module: Action (allow/deny/ask), Rule, Ruleset alias,
  wildcard_match (DP, no regex), evaluate(perm, pattern, IntoIterator
  of &Rule). 16 unit tests covering matching, layering, last-wins.
- WorkerConfig.rules: Vec<Rule> (additive; existing InterceptorRule
  flow unchanged).
- apply_policy_rules: pure helper that maps a rule hit to PolicyOutcome
  (Allow / Deny{rule_permission, rule_pattern} / FallThrough).
  Allow short-circuits to pass-through; Deny short-circuits with a
  Denial::Policy that names the matched rule; Ask and no-match fall
  through to the existing per-function interceptor flow.
- handle_subscriber consults the rules layer first, before
  decide_intercept_action.

* feat(approval-gate): cascade allow on `always: true` resolve reply

When a user resolves a pending call with decision=allow + always=true,
approval-gate now pushes a runtime Allow rule for the call's function
id and sweeps the same session's pending records, auto-resolving every
other one the new rule covers. One click instead of N.

approval-gate
- approve_and_execute() extracted from handle_resolve's allow branch as
  a reusable async helper. Both the user-driven allow path and the
  cascade sweep drive state through the same transitions
  (approved -> invoke -> executed | failed).
- Shared policy ruleset wrapped as Arc<RwLock<Ruleset>> so reply-time
  mutation is safe across the subscriber + resolve closures. Read
  guards are scoped so they never cross an .await (std::sync::RwLock
  is not async-safe to hold across suspension).
- cascade_allow_for_session() pushes the new rule under the write
  lock, snapshots session pending via list_prefix, and runs
  approve_and_execute for each newly-Allow record. The originator is
  skipped (already resolved above). Per-record state-write failures
  are logged and the rest of the cascade continues.
- handle_resolve response carries `cascaded: N` when the sweep
  resolved at least one extra record; omitted otherwise so the
  one-shot path stays unchanged.

Cascade scope is intentionally narrow for v1:
- function-id-only matching (pattern "*"), mirrors the v1 rules surface
- same session only
- in-memory rule (no cross-restart persistence)
- allow + always; deny + always is out of scope

Five new tests cover the cascade decision tree (no-always, same-session
match, cross-session non-effect, originator-skip, terminal-record skip).

* chore(approval-gate): remove legacy migration surface

The Denial refactor replaced decision_reason with a typed structure, but
several legacy hooks remained in place. They are now removed because the
underlying schemas they protected against (pre-trigger-model status
strings, pre-Denial flat decision_reason field) cannot occur in any
state store this branch is wired to.

approval-gate
- Drop migrate_legacy_record entirely. The pre-trigger-model
  status="allow"/"deny" rename and the decision_reason -> Denial::Legacy
  lift both go away. Old persisted records (if any) now surface as
  filtered-out orphans rather than silently mistranslated.
- Drop Denial::Legacy variant from the enum.
- Drop legacy_migrated flag from records.
- Drop the four migrate_legacy_record_* unit tests and the
  handle_list_undelivered_persists_migrated_legacy_record integration
  test.
- handle_list_pending no longer pre-filters legacy shapes; orphan
  records lacking a session_id stamp are dropped uniformly.
- timeout_resolved_event drops the decision_reason: "timeout" field
  (status: "timed_out" is self-describing).
- handle_sweep_session drops the optional `reason` payload field. The
  cause (session_deleted vs run_stopped) is the caller's concern; the
  swept timed_out records carry no Denial.
- skills/sweep_session.md updated.

turn-orchestrator
- approval_stitching drops the legacy denial-kind branch and the
  legacy_migrated note line; the corresponding stitch test is removed.
- run_stop no longer forwards reason: "run_stopped" on the
  approval::sweep_session payload; it logs the cause locally instead.
- The run_stop integration test now asserts the absence of the reason
  field on the sweep payload.

harness-types (6 vendored copies, byte-identical)
- Drop the Legacy variant and its docblock line from each agent_event.rs
  copy.

* feat(approval-gate): add typed Record, Status, Next schema

New src/record.rs defines the persisted approval record as a typed
struct alongside its lifecycle Status enum and a Next enum that pairs
each target status with its outcome data. Wire format is byte-identical
to the existing serde_json::Value blobs — operators can adopt the
typed shape incrementally without changing any persisted state.

approval-gate
- New module record:
  - Status enum (Pending / Approved / Executed / Failed / Denied /
    TimedOut), serde snake_case, with is_terminal() helper.
  - Record struct mirroring the historical JSON keys field-for-field,
    optional outcome fields scoped to terminal statuses, serde
    skip-when-None so existing rows round-trip cleanly.
  - Next enum that bundles (target_status, outcome_payload) so the
    type system rules out impossible transitions (e.g. Executed without
    a result, Denied without a Denial). Replaces the old
    (status_string, Option<result>, Option<error>, Option<denial>)
    parallel-Option signature when callers opt in.
  - Record::to_value / Record::from_value bridge to the Value blobs the
    iii state bus still expects.
  - 8 unit tests covering serde round-trips, status semantics, expiry
    saturation, forward-compat with unknown fields.
- lib.rs re-exports Next, Record, Status.

Handler migration to the typed schema is a separate change; this commit
just makes the types available so future work can lift Value access
into typed field access incrementally rather than in one big diff.

* refactor(approval-gate): extract wire.rs from lib.rs

First step of breaking lib.rs (~4300 lines) into focused modules.
This commit lifts the wire-format types and small wire-shape helpers
into their own module — pure data shapes with no I/O, no iii-sdk
dependency, no async — so downstream workers that only need to
understand the approval-gate protocol can depend on a small surface.

Moved (lib.rs -> wire.rs):
- Denial (structured deny payload)
- IncomingCall + requires_approval()
- Decision (Allow | Deny(Denial))
- WireDecision (allow|deny wire enum)
- pending_key()
- extract_call()
- block_reply_for()

All public items stay re-exported from the crate root, so every
existing call site (handlers, tests, integration tests, downstream
crates) continues to work without import changes.

Pure move, no behavior change. 141 approval-gate tests pass;
turn-orchestrator + harness + session + providers + harness-tui +
shell all pass.

* refactor(approval-gate): extract lifecycle.rs from lib.rs

Second step of breaking lib.rs into focused modules. This commit
lifts the persisted-record lifecycle helpers — pure Value-blob
constructors and transitions, no I/O, no iii-sdk dependency — into
their own module.

Moved (lib.rs -> lifecycle.rs):
- is_terminal_status()
- build_pending_record()
- transition_record() + transition_record_with_now()
- collect_timed_out_for_sweep()
- maybe_flip_timed_out()

All public items stay re-exported from the crate root.
Pure move, no behavior change. 141 approval-gate tests pass;
all 8 downstream worker crates pass.

* refactor(approval-gate): extract state.rs from lib.rs

Third step of breaking lib.rs into focused modules. This commit lifts
the iii-backed state-bus and function-executor implementations into
their own module, along with the `__from_approval` marker plumbing
and the boot-time marker-target safety check.

Moved (lib.rs -> state.rs):
- StateBus trait + IiiStateBus impl  (iii state::* wrapper)
- FunctionExecutor trait + IiiFunctionExecutor impl  (iii.trigger wrapper)
- rule_for()  (lookup helper used by IiiFunctionExecutor)
- merge_from_approval_marker_if_needed()  (marker payload composer)
- unverified_marker_targets()  (boot guard)

The StateBus / FunctionExecutor traits are kept exactly as they were
in lib.rs — they exist purely as test seams so unit tests can swap in
InMemoryStateBus / FakeExecutor. No new abstractions added.

Public items stay re-exported from the crate root.
Pure move, no behavior change. 141 approval-gate tests pass;
all 8 downstream worker crates pass (turn-orchestrator's pre-existing
dual_write flake notwithstanding).

* refactor(approval-gate): extract intercept.rs from lib.rs

Fourth step of breaking lib.rs into focused modules. This commit
lifts the intercept decision flow into its own module — the three
layers that decide what the gate does with an incoming function
call: policy rules, per-function interceptor rules, and classifier
replies, plus the async handle_intercept that writes the pending
record.

Moved (lib.rs -> intercept.rs):
- InterceptAction enum + decide_intercept_action()
- PolicyOutcome enum + apply_policy_rules()
- ClassifierDecision enum + interpret_classifier_reply()
- handle_intercept()

handle_intercept stays public (re-exported from the crate root).
The pub(crate) helpers stay pub(crate) and are imported back into
lib.rs for the subscriber closure in register().

Pure move, no behavior change. 141 approval-gate tests pass;
turn-orchestrator + harness + shell verified green.

* refactor(approval-gate): extract resolve.rs from lib.rs

Fifth step of breaking lib.rs into focused modules. This commit
lifts the approval::resolve flow into its own module.

Moved (lib.rs -> resolve.rs):
- handle_lookup_record()
- handle_resolve()
- cascade_allow_for_session() (the always=true sweep)
- approve_and_execute() (shared by user-driven allow + cascade)

handle_resolve and handle_lookup_record stay public via the crate
root re-exports. approve_and_execute and cascade_allow_for_session
stay private to the resolve module.

Pure move, no behavior change. 141 approval-gate tests pass.

* refactor(approval-gate): extract delivery.rs from lib.rs

Sixth step of breaking lib.rs into focused modules. This commit
lifts the delivery-tracking RPCs into their own module.

Moved (lib.rs -> delivery.rs):
- handle_list_pending()
- handle_list_undelivered()
- handle_ack_delivered()
- handle_consume_undelivered()
- handle_flush_delivered()
- handle_sweep_session()
- LIST_UNDELIVERED_DEFAULT_LIMIT

All public items stay re-exported from the crate root.
Pure move, no behavior change. 141 approval-gate tests pass.

* refactor(approval-gate): extract sweeper.rs from lib.rs

Seventh step of breaking lib.rs into focused modules. This commit
lifts the periodic timeout sweeper and the small iii-stream helpers
it owns into their own module.

Moved (lib.rs -> sweeper.rs):
- spawn_timeout_sweeper()
- timeout_resolved_event()
- write_event()
- write_hook_reply()
- uuid_like()

All five stay pub(crate) — only the register() wiring in lib.rs
and the resolve flow need them. Pure move, no behavior change.
141 approval-gate tests pass.

* refactor(approval-gate): extract register.rs from lib.rs

Final step of breaking lib.rs into focused modules. This commit
lifts the iii function/trigger wiring — the heart of the worker
startup path — into its own module.

Moved (lib.rs -> register.rs):
- register() (~420 lines: the subscriber closure + 8 function
  registrations + trigger registration + sweeper spawn)
- Refs struct (the handle bag returned to the binary)
- FN_RESOLVE / FN_LIST_PENDING / FN_LIST_UNDELIVERED /
  FN_CONSUME_UNDELIVERED / FN_ACK_DELIVERED / FN_FLUSH_DELIVERED /
  FN_SWEEP_SESSION / FN_LOOKUP_RECORD constants
- STATE_SCOPE constant

lib.rs is now ~50 lines of module declarations + re-exports +
~2580 lines of inline tests. The test mod uses super::* plus
local serde_json and std::sync imports.

All public items stay re-exported from the crate root so the
binary's `use approval_gate::register;` keeps working unchanged.
Pure move, no behavior change.

141 approval-gate tests pass; turn-orchestrator + harness + shell
verified green (turn-orchestrator's pre-existing dual_write flake
notwithstanding).

* refactor(approval-gate): move pub(crate)-using tests inline

First step of relocating the 2580-line inline test block from
lib.rs into more focused homes. This commit moves the 18 tests
that touch pub(crate) helpers — ones that can't live in tests/*.rs
because integration tests can only see pub items — into the
#[cfg(test)] mod tests block of whichever production module owns
the helper they exercise.

intercept.rs gained:
- interpret_classifier_reply_reads_decision_tags
- decide_intercept_action_* (5 tests)
- apply_policy_rules_* (5 tests)

state.rs gained:
- merge_from_approval_* (4 tests)
- rule_for_returns_matching_rule / rule_for_returns_none_when_absent

sweeper.rs gained:
- timeout_resolved_event_shape

lib.rs loses those 18 tests. 141 approval-gate tests pass (no
duplicate counts — each test runs once). turn-orchestrator + harness
+ shell verified green.

Public-API tests (the remaining ~96 in lib.rs) move to tests/*.rs
in a follow-up commit so they can share fakes via a common module.

* refactor(approval-gate): move public-API tests to tests/*.rs

Final test split. The 86 remaining inline tests in lib.rs that
exercise only the crate's `pub` surface now live in tests/*.rs,
organized by the area they cover. The state-machine proptest gets
its own file. lib.rs drops from 2369 to 52 lines (just module
declarations + re-exports).

New tests/ layout:
- common/mod.rs    — shared fakes: FakeExecutor, InMemoryStateBus,
                     FailingStateBus, sample_call(), empty_policy_rules()
- lifecycle.rs     — 19 tests: build_pending_record, transition_record,
                     maybe_flip_timed_out, collect_timed_out_for_sweep,
                     is_terminal_status, pending_key
- wire.rs          — 10 tests: extract_call, block_reply_for,
                     requires_approval
- intercept.rs     — 9 tests: handle_intercept (replays, fail-closed,
                     session_id stamping, force_pending)
- resolve.rs       — 19 tests: handle_resolve, cascade-on-`always`,
                     handle_lookup_record
- delivery.rs      — 25 tests: list_pending / list_undelivered /
                     ack_delivered / consume_undelivered /
                     flush_delivered / sweep_session
- misc.rs          — 4 tests: FN_* constants, unverified_marker_targets,
                     FakeExecutor smoke test
- state_machine.rs — the proptest with the four lifecycle invariants

Each tests/*.rs uses `mod common;` for shared fakes — only one source
of truth for in-memory bus/executor stand-ins.

141 approval-gate tests pass across 13 test binaries (previously 6).
turn-orchestrator + harness + shell verified green.

* feat(approval-gate): new Record schema with Pending|InFlight|Done lifecycle

Pending → InFlight → Done(Outcome). InFlight is the intermediate
persist that closes the dup-exec race within a worker process — a
second approval::resolve arriving during the invoke await sees the
row is non-Pending and bails.

Outcome is a tagged enum (Executed/Failed/Denied/TimedOut). resolved_at
is stamped on the first non-Pending transition for deterministic
multi-row consume ordering.

lifecycle.rs is deleted; its only surviving helper
(flipped_to_timed_out_if_expired) is now a Record method.

* fix(approval-gate): keep lifecycle.rs as transitional compat shim

Task 1 of the simplification plan deletes lifecycle.rs entirely, but
the plan was a bit optimistic: dependent modules (intercept, resolve,
delivery, sweeper) still import its helpers (build_pending_record,
transition_record, maybe_flip_timed_out, is_terminal_status,
collect_timed_out_for_sweep) and the crate fails to compile without
them. Cargo can't run record.rs's unit tests if the library itself
won't link.

Restore lifecycle.rs as a transitional shim. Tasks 5/6/8/11 will
progressively migrate each callsite to the new Record API; whichever
of those tasks lands last deletes lifecycle.rs for real.

* feat(approval-gate): Denial::Policy carries rule_permission + rule_pattern

The classifier surface is being deleted in favor of the layered rules
engine; rename the Policy variant's detail fields to match the actual
data they'll carry once T5 lands. For now, classifier callsites
(intercept.rs, register.rs) map their old reason/fn locals into the
new field names — transient mapping, deleted in T5.

Test callsites in tests/wire.rs and tests/lifecycle.rs updated to use
representative rule_permission/rule_pattern values.

Note: IncomingCall.approval_required + requires_approval() field/method
are NOT removed in this commit (the plan's T2 calls for it but
intercept.rs's decide_intercept_action still depends on them). They
disappear in T5 when intercept.rs is rewritten end-to-end.

* feat(approval-gate): pattern_for(function_id, args) extractor

For shell::exec/shell::exec_bg, derive a pattern string from
{command, args} so layered rules can match argv-level patterns like
'git status*'. Other function ids default to '*'. Malformed/missing
args fall back to '*' (matches only wildcard rules).

8 new tests cover all branches: join+args, single-string command,
empty args list, missing command, malformed args, non-shell function
id, and the documented space-conflation case.

* feat(approval-gate): StateBus::delete primitive

Required by the new approval::consume RPC (T8), which returns Done
rows and deletes them in the same call. IiiStateBus impl maps to the
existing state::delete RPC (used elsewhere in the codebase, e.g.
auth-credentials). Test fakes (InMemoryStateBus, FailingStateBus) gain
matching impls — InMemory removes the key idempotently, Failing errors
to match its other primitives.

* feat(approval-gate): verdict-driven handle_intercept

The classifier surface and per-function InterceptorRule decision flow
are gone. handle_intercept now reads the verdict via the new lib-level
verdict_for(function_id, args, ruleset) helper:

  Verdict::Allow → {block:false}, no state write
  Verdict::Deny  → {block:true, denial:Policy{rule_permission, rule_pattern}}
  Verdict::Ask   → write Pending row + reply {block:true, status:pending}

Replay defense extended for the new 3-state schema:
  Pending  → replay:in_flight, status:pending
  InFlight → replay:in_flight, status:in_flight  (new)
  Done     → replay:already_resolved, status:done

intercept.rs drops InterceptAction, decide_intercept_action,
ClassifierDecision, interpret_classifier_reply, PolicyOutcome,
apply_policy_rules. register.rs's subscriber closure collapses the
old policy_outcome+decide_intercept+classifier dance into one
handle_intercept call. resolve.rs's cascade loop reuses verdict_for
instead of the deleted apply_policy_rules.

State-write failure still fails closed with Denial::StateError.
7 new tests cover all four verdict branches + the three replay
states + fail-closed behavior.

Note: InterceptorRule config struct and intercept_rules variable
are still in place for T12 to fully remove.

* feat(approval-gate): three-phase resolve + cascade exact-pattern (T6+T7)

handle_resolve rewritten to typed Record + three-phase allow path:
  1. write InFlight (closes the dup-exec race within a worker process)
  2. iii.trigger(function_id, args) and await
  3. write Done(Executed{result}) or Done(Failed{error})

Deny is a single Pending → Done(Denied{denial}) write — no invoke,
no InFlight needed. WireDecision::Deny with no denial in payload
defaults to UserRejected. UserCorrected{feedback} round-trips.

Dup-exec guard: handle_resolve refuses non-Pending rows with
  in_flight (concurrent resolve mid-invoke) or
  already_resolved (terminal).

Cascade allow on always:true (T7) now pushes a runtime Allow rule with
the originator's EXACT pattern via rules::pattern_for(args) — not the
prior blanket pattern:'*'. 'Always allow git status' no longer
auto-allows rm -rf / via the same shell::exec function id.
Lock-ordering invariant pinned in module docs: never hold the ruleset
guard across .await.

* feat(approval-gate): approval::consume + strip delivery dead RPCs + delete sweeper.rs/lifecycle.rs (T8+T9+T11)

Three behavioral changes folded into one commit because they share files:

T8 (new approval::consume RPC):
  Three-phase drain — gather Done candidates, sort by resolved_at + cap,
  delete-and-return. Defensive session_id filter. Lazy timeout flip on
  read. Default cap CONSUME_DEFAULT_LIMIT=50 bounds the response size
  against MB-scale stdout payloads.

T9 (delivery.rs strip):
  handle_list_undelivered, handle_consume_undelivered, handle_ack_delivered,
  handle_flush_delivered all deleted. LIST_UNDELIVERED_DEFAULT_LIMIT
  constant gone. handle_list_pending rewritten on the typed Record API
  with lazy timeout flip on read. handle_sweep_session rewritten on new
  schema — flips Pending+InFlight rows to Done(TimedOut).

T11 (sweeper.rs deleted):
  Background polling task gone. Timeouts now flip lazily on read in
  handle_resolve / handle_consume / handle_list_pending. UI handles
  expires_at countdown client-side; the LLM learns of timeouts on the
  next consume.
  lifecycle.rs transient shim also deleted (no more callers).
  Stream helpers (uuid_like, write_event, write_hook_reply) move into
  register.rs as their only consumer; spawn_timeout_sweeper +
  timeout_resolved_event deleted with the rest of sweeper.rs.

register.rs surgery:
  Refs struct loses 5 dead FunctionRef fields and the sweeper JoinHandle.
  RPC registrations for list_undelivered, consume_undelivered,
  ack_delivered, flush_delivered all gone. New FN_CONSUME registration
  added. Classifier-alias warning check trimmed to live function ids.

15 new tests cover handle_consume (7) + handle_sweep_session (1) +
handle_list_pending lazy flip (1) + the pre-existing edits. 65 lib
tests pass total.

* feat(approval-gate): strip __from_approval marker plumbing (T10)

IiiFunctionExecutor::invoke forwards function_id+args directly to
iii.trigger — no marker stamp.

Deleted from state.rs:
- merge_from_approval_marker_if_needed (and its 4 unit tests)
- unverified_marker_targets (and the 2 rule_for tests)
- rule_for (only marker-related callers)

Deleted from register.rs:
- IiiFunctionExecutor's rules: Arc<Vec<InterceptorRule>> field
- Boot-time unverified-marker check (refused-to-start guard)

Per the spec's Threat Model section: bus access ≡ shell access in the
new model. The harness's bus-level access control is the perimeter;
defense-in-depth via per-target marker verification is out of scope
for v1. Shell-side marker verification is deleted in T13.

T12 will fully retire InterceptorRule + the classifier-alias warning
loop in register.rs.

* feat(approval-gate): strip config to topic+scope+timeout+rules (T12)

config.rs: WorkerConfig now has just {topic, approval_state_scope,
default_timeout_ms, rules}. interceptors + sweeper_interval_ms fields
gone. InterceptorRule struct kept as a no-op shim with minimal fields
so the classifier-alias warning loop in register.rs still compiles —
the loop is fed an empty Vec so the body never runs.

iii.worker.yaml: replaced interceptor config with a curated default
ruleset. Read-only fs/git auto-allowed; shell::exec/exec_bg ask;
catch-all asks. Operators stack their own rules on top (last-match
wins).

register.rs cfg.interceptors → cfg.rules wiring confirmed: policy_rules
is seeded from cfg.rules at startup and remains mutable via cascade.

Note: InterceptorRule shim + classifier-alias warning loop in register
will be deleted in a follow-up pass when the dependent code is fully
cleaned up. They're cosmetic at this point — neither has functional
effect.

* feat(shell): strip classify_argv / allowlist / __from_approval marker (T13)

Shell becomes a plain executor. All policy decisions live in
approval-gate's rules layer (see the layered ruleset shipped in
approval-gate/iii.worker.yaml).

Deleted:
- shell/src/functions/approval_bypass.rs (marker validation module)
- shell/src/functions/classify.rs (classifier handler)
- shell/src/arity.rs (arity-aware allowlist matcher)
- ApprovalMarker + ClassifyArgvRequest types
- ExecRequest.from_approval + ExecBgRequest.from_approval fields
- ShellConfig.allowlist + allow_any + denylist_patterns +
  compiled_denylist fields and their methods (compile_denylist,
  denylist_hit_reason, allowlist_contains, is_command_allowed)
- shell::classify_argv function registration
- shell::classify_argv manifest entry
- All allowlist/denylist unit tests in shell/src/config.rs
- One test in shell/tests/function_handlers.rs that asserted shell
  rejected unlisted commands

Kept (independent surface):
- fs::host_root jail + fs.denylist_paths (filesystem path-based denials
  for shell::fs::* tools; distinct from the exec-policy denylist)
- The fs jail validate_fs_jail boot check

Threat model implication, per the spec: bus access ≡ shell access.
The harness's bus-level access control is the perimeter; any worker on
the bus can call shell::exec with arbitrary args. Defense-in-depth via
shell-side allowlists is gone for v1.

133 shell lib tests + 328 shell integration tests pass.

* feat(turn-orchestrator): switch stitch to approval::consume (T14)

consume_approval_stitch now calls approval::consume (single RPC,
returns + deletes in one shot). Payload is { session_id } only; no
turn_id, no limit (gate enforces a default cap of 50, surfacing
overflow via the 'omitted' counter as before).

stitch_entries rewritten for the new Record wire shape: reads nested
outcome: { kind, detail } instead of the old top-level status +
result/error/denial. Outcome kinds: executed, failed, denied,
timed_out. function_call_id is the canonical key (call_id retained
as a legacy fallback for old rows on disk during the upgrade).

render_denial_lines's policy branch reads rule_permission/rule_pattern
(matching the renamed Denial::Policy detail shape from T2). The
'policy denied by classifier_fn: classifier_reason' wording becomes
'policy denied by rule <perm> : <pat>'.

omission_summary_message reworded: 'approval::flush_delivered' is gone;
the natural recovery path is the next-turn consume, which is what the
new message points at.

131 lib tests pass (was 122 + 9 failing from old-shape assertions).
Pre-existing integration test failure (tests/run_stop.rs imports
run_stop module that isn't declared in lib.rs) is unrelated to this
refactor.

* test(approval-gate): E2E lifecycle tests + delete dead integration tests (T17)

Delete 8 integration test files whose assertions targeted the deleted
surface (FN_LIST_UNDELIVERED, FN_ACK_DELIVERED, transition_record,
InterceptAction, marker plumbing, ...):
- tests/approval_lifecycle.rs (engine-backed, used deleted RPCs)
- tests/delivery.rs (old delivery RPC surface)
- tests/integration.rs (old subscriber wiring)
- tests/intercept.rs (decide_intercept_action, classifier path)
- tests/lifecycle.rs (record-helper tests on deleted schema)
- tests/misc.rs (mixed assertions on old shape)
- tests/resolve.rs (old transition_record / Approved status)
- tests/state_machine.rs (proptest dep + old schema)

The src/* unit tests added across T1-T12 cover the equivalent surface
for the new wire shape (Pending|InFlight|Done, Outcome enum,
Denial::Policy {rule_permission, rule_pattern}, etc.).

Add tests/lifecycle.rs with FIVE E2E lifecycle tests against the
in-memory StateBus + FakeExecutor — the integration safety net for
the whole refactor:
1. allow_path_end_to_end — pending → InFlight → executed → consume drains
2. deny_path_with_user_corrected_feedback_end_to_end — feedback round-trips
3. timeout_path_lazy_flips_on_consume_end_to_end — past expires_at flips on read
4. cascade_path_end_to_end — two pending rows; allow+always pushes exact-pattern
   rule; second auto-resolves; consume drains both; pinned: pushed pattern is
   the originator's argv ('echo go'), NOT blanket '*'
5. allow_rule_short_circuits_with_no_state_write — bonus: Verdict::Allow
   doesn't touch state

72 approval-gate tests pass across 6 suites (lib + wire + manifest +
lifecycle + 2 from common helpers). Net deletion: ~2500 lines of
old-shape integration tests gone, ~250 lines of new E2E in their place.

* feat(harness/web): Allow+Always, Deny feedback, expires_at countdown (T15)

ApprovalRow.tsx grows three real-feature additions matching the
approval-gate refactor's new wire surface:

1. Allow+Always button → sends {decision:'allow', always:true}. The
   gate cascades by pushing a runtime Allow rule with the originator's
   exact pattern and sweeping the session's other Pending rows
   (auto-resolving any that match). The response carries cascaded:N
   when the sweep was non-empty; not surfaced in the UI yet.

2. Optional Deny feedback textarea → when the operator types a
   correction message before clicking deny, the payload becomes
   {decision:'deny', denial:{kind:'user_corrected', detail:{feedback}}}.
   That feedback flows verbatim to the LLM via the stitched message
   on the next consume — the high-value path because the model gets
   actionable correction, not just 'denied'.

3. Client-side expires_at countdown → the modal shows MM:SS remaining
   and disables actions on expiry. No server-emitted timed_out frame
   anymore (the gate's sweeper is gone); the LLM learns of timeout
   via the next-turn approval::consume that lazy-flips the row.

Each pending row is split into its own ApprovalCard component so the
countdown hook respects rules-of-hooks (can't sit inside a .map
callback). ResolvePayload is constructed inline so the bridge's
Record<string, unknown> type accepts it without an explicit cast at
the interface boundary.

CSS classes for the new states (approval-allow-always,
approval-countdown, approval-feedback, approval-expired-note) inherit
default styling for now; a follow-up styling pass can polish.

* docs(harness/fanout): note T16 stream-rewrite as deferred follow-up

The approval-gate refactor's T16 called for switching the fanout from
polling approval::list_pending to subscribing to agent::events for
approval_requested / approval_resolved frames. The polling
implementation continues to work correctly against the new wire shape
— diff_approvals reads function_call_id (now top-level on Record) and
list_pending was kept by approval-gate's T8/T11 — so this is a perf
cleanup, not a correctness fix.

Comment the constant accordingly; defer the actual stream subscription
to a follow-up PR. No code change beyond the doc.

* fix(shell/e2e): drop allowlist/denylist tests + configs

Shell stopped enforcing command-level policy; the e2e suite still
referenced the removed allowlist/denylist_patterns config fields and
pinned behavior that no longer exists. Strip those expectations and
update the docs that described them.

- config.yaml / config-jailed.yaml: remove allowlist + denylist_patterns
  (would fail YAML parse against the new ShellConfig schema)
- cases-safety.ts: drop 3 allowlist/denylist assertions; keep one
  positive 'absolute-path command runs' case
- cases-edge.ts: 'nonexistent unlisted command' rewritten to assert
  the OS spawn error (ENOENT), not an allowlist message
- cases-exec-break.ts: drop 'unlisted command with extra fields'
- cases-exec-sandbox.ts: drop 'sandbox allowlist still enforced'
- cases-vuln-repro.ts: drop S-H3 case (advisory denylist) — the
  finding's whole point was that command policy belongs upstream
- ARCHITECTURE.md / e2e README: reflect the new boundary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant