docs: update CLAUDE.md with test count, CI/CD, and security invariants#1
Merged
Merged
Conversation
…ity invariants - Update test count from 440+ to 501+ to reflect actual suite size - Add clippy and fmt commands to build section - Add Repository section with GitHub URL and CI/CD description - Add path traversal prevention to Critical Invariants Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
escapeboy
added a commit
that referenced
this pull request
Apr 25, 2026
…al-gate UX Adds three new CLI subcommands closing the operator UX deferred from 0.3-S2b. After a workflow pauses at an approval gate, operators can discover paused runs (`boruna workflow list`), record a decision (`boruna workflow approve <run-id> <step-id>` or `reject ... --reason`), and resume the run, which honors the sentinel and either advances past the gate (synthetic empty-record output) or halts as Failed. Persistence additions: `get_run_metadata`, `update_run_metadata`, `compare_and_swap_metadata` (atomicity primitive for the approve flow's read-validate-write cycle), `list_runs`. Plus 9 new typed error variants on `WorkflowRunError` covering wrong-state, wrong-kind, already-decided, not-resumable, etc. — project-conventions §1. Runner changes: `PersistedRunMetadata` carries an `approvals` map (BTreeMap<step_id, ApprovalDecision>); each decision records who decided when (operational only — does not feed any audit hash). Resume's sentinel-handling pass: AwaitingApproval steps with an Approved sentinel get a Completed checkpoint with a synthetic empty-record output; Rejected sentinels halt the run as Failed. Defense-in-depth re-validates StepKind::ApprovalGate at resume time. Adversarial review found 7 HIGH issues across 2 reviewers; consensus on the biggest one (TOCTOU race in record_approval_decision). All 5 actionable fixes applied: - Race in record_approval_decision (consensus #1+H1): prior 3-tx read+validate+write let two concurrent operators both pass the in-memory prior-decision check and silently overwrite each other. Now uses a CAS retry loop via the new compare_and_swap_metadata primitive. 4-thread regression test asserts exactly 1 ok + 3 already-decided. - halt_with_failed_step overwrite (#2): rejected sentinel was overwriting an earlier independent step failure as the halt cause. Now uses get_or_insert to preserve the FIRST failure. - Silent no-op on non-awaiting sentinel (#3): now emits a warning so operators see when their approval doesn't apply. - Synthetic output hash (H2): now locked by a regression test against a golden hex computed externally (`{"Map":{}}` → f4242fc8...). - Defense-in-depth StepKind check (H3): refuses to apply a sentinel to a non-gate step even if validation slipped past. Documented but deferred: H4 (concurrent resume torn-write on store_output) — the documented assumption is single-writer-by-process; revisit in a future hardening sprint with flock or atomic-rename. Tests: 682 workspace tests pass (was 659). Persistence tests 41 (was 35). Runner tests 33 (was 16) — +13 for approval-gate UX, +4 for review-driven regressions. Smoke test on customer_support_triage end-to-end (run → list → approve → resume → list completed) verified. Anchored docs: docs/{design,architecture,test-plan}-0.3-s2c-approval-gate-completion.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
escapeboy
added a commit
that referenced
this pull request
Apr 25, 2026
Wave-based scheduler that fans out steps with no inter-dependencies
across `std::thread::spawn`'d workers. Operators opt in via
`--concurrency <N>`; default `1` = sequential (preserves prior
behavior). Persistent path only; ephemeral `run` stays single-
threaded. No tokio dependency.
End-to-end determinism verified: per-step `output_hash` is
bit-identical across concurrency levels for successful runs.
Document_processing fan-out workflow (5 steps, ingest →
{classify, extract, summarize} → merge) confirmed identical hashes
at concurrency=1 vs concurrency=4.
Implementation:
- `WorkflowValidator::topological_levels(def)` partitions the DAG
into Vec<Vec<String>> waves. Within each wave, steps are sorted
alphabetically for deterministic dispatch order.
- `WorkflowRunner::execute_steps_concurrent` processes one wave at
a time; gates inline (sequential, pause-the-run); source steps
fan out chunked by max_concurrency.
- `compile_and_run_step` extracted helper: pure compile+run path
returning a Value. Workers call this; coordinator owns all
DataStore and SQLite mutation.
Adversarial review found 4 HIGH issues; 3 fixed in-commit, 1
documented:
- #1: panic in chunk detached sibling workers — `?` inside join
loop dropped pending JoinHandles. Now collects all join results
into a Vec before processing; guarantees no detached threads on
early-exit paths.
- #2: input failure mid-chunk left earlier siblings Running on
disk forever — interleaved validation+marking. Now two-pass:
pass 1 validates all chunk inputs (no side effects); pass 2
marks Running atomically and dispatches. Regression test
asserts no Running checkpoint survives a halted run.
- #3: panic handler only caught `&'static str` payloads (missing
the common `panic!("...{}...", var)` shape) and lost step_id.
Now: tries `String` first, carries step_id alongside each
JoinHandle, persists a Failed checkpoint with the panic message.
- #4 (documented): failed runs at concurrency > 1 may produce
more Completed step_results than a sequential run (siblings
that started before the failure was detected complete normally).
This is honest reporting of what executed; cross-concurrency
audit replay is NOT a contract. Successful runs ARE concurrency-
invariant.
Tests: 703 workspace tests pass (was 694 after 0.3-S3). +9 net
including:
- topological_levels (4 tests in validator)
- concurrency_n_produces_identical_output_hashes_to_concurrency_1
(the headline determinism test)
- concurrent_run_persists_all_step_checkpoints
- concurrent_run_with_failure_halts_and_other_in_flight_complete
- concurrent_resume_honors_already_completed
- concurrent_input_failure_does_not_leave_siblings_running
(review-driven #2 regression)
Anchored doc: docs/design-0.3-s4-concurrent-execution.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
escapeboy
added a commit
that referenced
this pull request
Apr 25, 2026
Replaces the prior primitive "retry once regardless of max_attempts" with a proper RetryPolicy honoring `max_attempts` up to 100 attempts with `100ms × 2^N` (capped at 5s) backoff between. A shared `retry_with_backoff` helper is used by both sequential and concurrent execution paths so retry semantics don't drift. Final-attempt failure surfaces as "failed after N attempts: <reason>" in error_msg for operator triage. Adversarial review found 2 HIGH issues, both fixed: - #1: eprintln retry log polluted unit-test stderr (and any embedder capturing stderr). Now gated under cfg(not(test)) so production embedders see the log line and tests are silent. - #2: cfg(test) sleep skip is correctly scoped, but the gap surfaced by the reviewer was that NO test verified real-timing backoff. Added orchestrator/tests/retry_timing.rs which runs in a build where cfg(test) is NOT set on the orchestrator lib, exercising real sleeps. Asserts elapsed >= 250ms for a 3-attempt retry — catches future regressions that accidentally remove the sleep. Tests: 712 workspace tests pass (was 703 after 0.3-S4). +9 net: - 8 unit tests in `runner::tests::retry::` (backoff curve, success on first attempt, success after failures, exhaustion-and-wrap, on_transient=false, max_attempts<=1, no policy, end-to-end with bad step compile error) - 1 integration test in tests/retry_timing.rs (real-sleep contract) Anchored doc: docs/design-0.3-s5-retry-policies.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
escapeboy
added a commit
that referenced
this pull request
Apr 26, 2026
Closes the residual gap flagged in the 0.3-S15 review. The concurrent execution path now pauses ALL pause-steps in the same DAG level in a single pass — previously only the first was processed and remaining pauses were silently deferred to subsequent resumes. Use case: "wait for payment AND fraud-check" webhook fan-in. Multiple external_trigger or approval_gate steps depending on a shared upstream now pause in parallel, advancing independently as each decision/event arrives. The downstream step (which depends on all of them) runs exactly once after the last pause clears. Wiring: - `execute_steps_concurrent` wave loop iterates `pauses` instead of taking `pauses.first()`. Each pause persists its own checkpoint and (for ExternalTrigger) mints its own token via the existing reentrant acquire_trigger_token. - New `persist_one_pause` helper isolates per-pause logic. - Resume sentinel pass already iterates all approvals + triggers, so it picks up multiple advances per resume. Review-driven fix (1 HIGH bug from adversarial review): - **Partial-pause failure must not terminally-fail the run.** Earlier draft propagated the first per-pause error via `?` — if pause #1 committed and pause #2's acquire_trigger_token failed (transient urandom error, CAS retry exhaustion), the run became terminally Failed, stranding pause #1's token with no recovery path (resume short-circuits on terminal status; record_external_trigger refuses with RunNotResumable). Fixed by isolating per-pause errors inside the loop: log a warning, continue to next pause, mark the run Paused on what DID commit. The wave loop is idempotent (acquire_trigger_token reuses existing tokens, upsert is upsert), so the next resume retries the failed pauses cleanly. Sequential path asymmetry: `--concurrency 1` (`execute_steps`) is unchanged. It processes one step at a time and serializes parallel pauses across multiple resumes. Operators expecting AND-fan-in webhook patterns must use `--concurrency 2+`. Documented in CHANGELOG. Tests: +5 in `tests::multi_pause_per_wave` covering 2-trigger parallel, partial trigger fire, full trigger fire, mixed approval+trigger, AND a partial-pause-failure recovery test that simulates the failure on-disk shape via direct SQL and verifies resume re-pauses cleanly with pause #1's token unchanged. Workspace: orchestrator 202 → 207 tests. Clippy + fmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
cargo clippyandcargo fmtcommands to the Build & Test sectionTest plan
cargo test,cargo clippy,cargo fmt)🤖 Generated with Claude Code