docs: update CLAUDE.md with test count, CI/CD, and security invariants by escapeboy · Pull Request #1 · escapeboy/boruna

escapeboy · 2026-02-21T10:49:48Z

Summary

- Updated test count from 440+ to 501+ to reflect the actual test suite size
- Added cargo clippy and cargo fmt commands to the Build & Test section
- Added Repository section with GitHub URL and CI/CD job descriptions
- Added path traversal prevention to Critical Invariants section

Test plan

- Verify CLAUDE.md renders correctly on GitHub
- Confirm all referenced commands work (cargo test, cargo clippy, cargo fmt)
- CI passes on this branch

🤖 Generated with Claude Code

…ity invariants

- Update test count from 440+ to 501+ to reflect actual suite size
- Add clippy and fmt commands to build section
- Add Repository section with GitHub URL and CI/CD description
- Add path traversal prevention to Critical Invariants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…al-gate UX

Adds three new CLI subcommands closing the operator UX deferred from 0.3-S2b. After a workflow pauses at an approval gate, operators can discover paused runs (`boruna workflow list`), record a decision (`boruna workflow approve <run-id> <step-id>` or `reject ... --reason`), and resume the run, which honors the sentinel and either advances past the gate (synthetic empty-record output) or halts as Failed.

Persistence additions: `get_run_metadata`, `update_run_metadata`, `compare_and_swap_metadata` (atomicity primitive for the approve flow's read-validate-write cycle), `list_runs`. Plus 9 new typed error variants on `WorkflowRunError` covering wrong-state, wrong-kind, already-decided, not-resumable, etc. (project-conventions §1).

Runner changes: `PersistedRunMetadata` carries an `approvals` map (BTreeMap<step_id, ApprovalDecision>); each decision records who decided and when (operational only; does not feed any audit hash). Resume's sentinel-handling pass: AwaitingApproval steps with an Approved sentinel get a Completed checkpoint with a synthetic empty-record output; Rejected sentinels halt the run as Failed. Defense-in-depth re-validates StepKind::ApprovalGate at resume time.

Adversarial review found 7 HIGH issues across 2 reviewers; consensus on the biggest one (TOCTOU race in record_approval_decision). All 5 actionable fixes applied:

- Race in record_approval_decision (consensus #1+H1): prior 3-tx read+validate+write let two concurrent operators both pass the in-memory prior-decision check and silently overwrite each other. Now uses a CAS retry loop via the new compare_and_swap_metadata primitive. 4-thread regression test asserts exactly 1 ok + 3 already-decided.
- halt_with_failed_step overwrite (#2): rejected sentinel was overwriting an earlier independent step failure as the halt cause. Now uses get_or_insert to preserve the FIRST failure.
- Silent no-op on non-awaiting sentinel (#3): now emits a warning so operators see when their approval doesn't apply.
- Synthetic output hash (H2): now locked by a regression test against a golden hex computed externally (`{"Map":{}}` → f4242fc8...).
- Defense-in-depth StepKind check (H3): refuses to apply a sentinel to a non-gate step even if validation slipped past.

Documented but deferred: H4 (concurrent resume torn-write on store_output). The documented assumption is single-writer-by-process; revisit in a future hardening sprint with flock or atomic-rename.

Tests: 682 workspace tests pass (was 659). Persistence tests 41 (was 35). Runner tests 33 (was 16): +13 for approval-gate UX, +4 for review-driven regressions. Smoke test on customer_support_triage end-to-end (run → list → approve → resume → list completed) verified.

Anchored docs: docs/{design,architecture,test-plan}-0.3-s2c-approval-gate-completion.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
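The CAS retry loop described for `record_approval_decision` can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Store`, `Metadata`, `ApproveError`, the version counter, and the retry limit of 16 are hypothetical stand-ins, and the store is an in-memory `Mutex` rather than the SQLite-backed persistence layer the commit refers to. The point is only the shape of the loop: read, validate, attempt a conditional write, and re-read on a lost race.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the persisted run metadata.
#[derive(Clone, Debug, PartialEq)]
struct Metadata {
    version: u64,
    approvals: BTreeMap<String, String>, // step_id -> decision
}

/// Hypothetical in-memory store (the commit's real store is not shown here).
struct Store {
    inner: Mutex<Metadata>,
}

#[derive(Debug, PartialEq)]
enum ApproveError {
    AlreadyDecided,
    RetriesExhausted,
}

impl Store {
    fn new() -> Self {
        Store { inner: Mutex::new(Metadata { version: 0, approvals: BTreeMap::new() }) }
    }

    fn get(&self) -> Metadata {
        self.inner.lock().unwrap().clone()
    }

    /// Writes `next` only if the stored version still equals `expected_version`.
    fn compare_and_swap(&self, expected_version: u64, next: Metadata) -> bool {
        let mut guard = self.inner.lock().unwrap();
        if guard.version != expected_version {
            return false;
        }
        *guard = next;
        true
    }
}

/// Read, validate, then conditionally write; on a lost race, start over.
fn record_decision(store: &Store, step_id: &str, decision: &str) -> Result<(), ApproveError> {
    for _ in 0..16 {
        let current = store.get();
        if current.approvals.contains_key(step_id) {
            return Err(ApproveError::AlreadyDecided);
        }
        let mut next = current.clone();
        next.version += 1;
        next.approvals.insert(step_id.to_string(), decision.to_string());
        if store.compare_and_swap(current.version, next) {
            return Ok(());
        }
        // Another writer won; loop to re-read and re-validate.
    }
    Err(ApproveError::RetriesExhausted)
}

/// Four threads race to decide the same gate; returns (ok, already_decided).
fn race_four_operators() -> (usize, usize) {
    let store = Arc::new(Store::new());
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || record_decision(&store, "gate", &format!("operator-{i}")))
        })
        .collect();
    let results: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let ok = results.iter().filter(|r| r.is_ok()).count();
    let already = results
        .iter()
        .filter(|r| **r == Err(ApproveError::AlreadyDecided))
        .count();
    (ok, already)
}
```

In this sketch only one conditional write can succeed at version 0, and every loser re-reads and sees the recorded decision, so `race_four_operators()` yields one success and three already-decided errors, mirroring the shape of the 4-thread regression test mentioned above.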

Wave-based scheduler that fans out steps with no inter-dependencies across `std::thread::spawn`'d workers. Operators opt in via `--concurrency <N>`; default `1` = sequential (preserves prior behavior). Persistent path only; ephemeral `run` stays single-threaded. No tokio dependency.

End-to-end determinism verified: per-step `output_hash` is bit-identical across concurrency levels for successful runs. The document_processing fan-out workflow (5 steps, ingest → {classify, extract, summarize} → merge) confirmed identical hashes at concurrency=1 vs concurrency=4.

Implementation:

- `WorkflowValidator::topological_levels(def)` partitions the DAG into Vec<Vec<String>> waves. Within each wave, steps are sorted alphabetically for deterministic dispatch order.
- `WorkflowRunner::execute_steps_concurrent` processes one wave at a time; gates inline (sequential, pause-the-run); source steps fan out chunked by max_concurrency.
- `compile_and_run_step` extracted helper: pure compile+run path returning a Value. Workers call this; coordinator owns all DataStore and SQLite mutation.

Adversarial review found 4 HIGH issues; 3 fixed in-commit, 1 documented:

- #1: panic in chunk detached sibling workers: `?` inside the join loop dropped pending JoinHandles. Now collects all join results into a Vec before processing; guarantees no detached threads on early-exit paths.
- #2: input failure mid-chunk left earlier siblings Running on disk forever because validation and marking were interleaved. Now two-pass: pass 1 validates all chunk inputs (no side effects); pass 2 marks Running atomically and dispatches. Regression test asserts no Running checkpoint survives a halted run.
- #3: panic handler only caught `&'static str` payloads (missing the common `panic!("...{}...", var)` shape) and lost step_id. Now: tries `String` first, carries step_id alongside each JoinHandle, persists a Failed checkpoint with the panic message.
- #4 (documented): failed runs at concurrency > 1 may produce more Completed step_results than a sequential run (siblings that started before the failure was detected complete normally). This is honest reporting of what executed; cross-concurrency audit replay is NOT a contract. Successful runs ARE concurrency-invariant.

Tests: 703 workspace tests pass (was 694 after 0.3-S3). +9 net including:

- topological_levels (4 tests in validator)
- concurrency_n_produces_identical_output_hashes_to_concurrency_1 (the headline determinism test)
- concurrent_run_persists_all_step_checkpoints
- concurrent_run_with_failure_halts_and_other_in_flight_complete
- concurrent_resume_honors_already_completed
- concurrent_input_failure_does_not_leave_siblings_running (review-driven #2 regression)

Anchored doc: docs/design-0.3-s4-concurrent-execution.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
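The wave partitioning attributed to `topological_levels` can be sketched as a level-by-level pass over the dependency map. This is an illustrative sketch, not the project's implementation: the signature (a `BTreeMap` from step id to its dependencies, returning `Option`) and the `document_processing` helper are assumptions, and the real function takes a workflow definition whose type is not shown in this commit message. Alphabetical order within a wave falls out of `BTreeMap`'s key-ordered iteration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Partition a DAG into waves. `deps` maps each step id to the ids it depends on.
/// Returns None when no step is ready (a cycle or an unknown dependency).
/// Hypothetical signature, chosen for the sketch.
fn topological_levels(deps: &BTreeMap<String, Vec<String>>) -> Option<Vec<Vec<String>>> {
    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut levels: Vec<Vec<String>> = Vec::new();
    while done.len() < deps.len() {
        // BTreeMap iterates in key order, so each wave is already sorted alphabetically.
        let wave: Vec<&str> = deps
            .iter()
            .filter(|(id, d)| {
                !done.contains(id.as_str()) && d.iter().all(|x| done.contains(x.as_str()))
            })
            .map(|(id, _)| id.as_str())
            .collect();
        if wave.is_empty() {
            return None;
        }
        done.extend(wave.iter().copied());
        levels.push(wave.into_iter().map(String::from).collect());
    }
    Some(levels)
}

/// The fan-out shape named in the commit: ingest → {classify, extract, summarize} → merge.
fn document_processing() -> BTreeMap<String, Vec<String>> {
    let edges: [(&str, &[&str]); 5] = [
        ("ingest", &[]),
        ("classify", &["ingest"]),
        ("extract", &["ingest"]),
        ("summarize", &["ingest"]),
        ("merge", &["classify", "extract", "summarize"]),
    ];
    edges
        .iter()
        .map(|(id, ds)| (id.to_string(), ds.iter().map(|d| d.to_string()).collect()))
        .collect()
}
```

For the five-step workflow this produces three waves: `ingest` alone, then `classify`, `extract`, `summarize` together, then `merge`. Only the middle wave offers any parallelism, which is where a `--concurrency` value above 1 would matter.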

Replaces the prior primitive "retry once regardless of max_attempts" with a proper RetryPolicy honoring `max_attempts` up to 100 attempts with `100ms × 2^N` (capped at 5s) backoff between. A shared `retry_with_backoff` helper is used by both sequential and concurrent execution paths so retry semantics don't drift. Final-attempt failure surfaces as "failed after N attempts: <reason>" in error_msg for operator triage.

Adversarial review found 2 HIGH issues, both fixed:

- #1: eprintln retry log polluted unit-test stderr (and any embedder capturing stderr). Now gated under cfg(not(test)) so production embedders see the log line and tests are silent.
- #2: cfg(test) sleep skip is correctly scoped, but the gap surfaced by the reviewer was that NO test verified real-timing backoff. Added orchestrator/tests/retry_timing.rs which runs in a build where cfg(test) is NOT set on the orchestrator lib, exercising real sleeps. Asserts elapsed >= 250ms for a 3-attempt retry, which catches future regressions that accidentally remove the sleep.

Tests: 712 workspace tests pass (was 703 after 0.3-S4). +9 net:

- 8 unit tests in `runner::tests::retry::` (backoff curve, success on first attempt, success after failures, exhaustion-and-wrap, on_transient=false, max_attempts<=1, no policy, end-to-end with bad step compile error)
- 1 integration test in tests/retry_timing.rs (real-sleep contract)

Anchored doc: docs/design-0.3-s5-retry-policies.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
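The backoff curve and retry wrapper described above might look roughly like the following. This is a sketch under stated assumptions, not the project's `retry_with_backoff`: the signature, the `String` error type, the choice that N starts at 0 for the first retry, and the wrapping of single-attempt failures are all guesses made for illustration. Only the constants (100ms base, doubling, 5s cap, 100-attempt ceiling) and the error message shape come from the commit message.

```rust
use std::thread;
use std::time::Duration;

const BASE_DELAY_MS: u64 = 100;
const MAX_DELAY_MS: u64 = 5_000;
const MAX_ATTEMPTS_CAP: u32 = 100;

/// Delay slept after failed attempt number `n` (0-based, an assumption):
/// 100ms × 2^n, capped at 5s.
fn backoff_delay(n: u32) -> Duration {
    // checked_shl returns None once the shift would exceed the width of u64.
    let factor = 1u64.checked_shl(n).unwrap_or(u64::MAX);
    Duration::from_millis(BASE_DELAY_MS.saturating_mul(factor).min(MAX_DELAY_MS))
}

/// Runs `op` until it succeeds or the attempt budget is spent.
/// `op` receives the 0-based attempt index. Hypothetical signature.
fn retry_with_backoff<T, F>(max_attempts: u32, mut op: F) -> Result<T, String>
where
    F: FnMut(u32) -> Result<T, String>,
{
    let attempts = max_attempts.clamp(1, MAX_ATTEMPTS_CAP);
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Final attempt failed: wrap the reason for operator triage.
            Err(reason) if attempt + 1 >= attempts => {
                return Err(format!("failed after {attempts} attempts: {reason}"));
            }
            // Budget remains: sleep, then try again.
            Err(_) => {
                thread::sleep(backoff_delay(attempt));
                attempt += 1;
            }
        }
    }
}
```

Under the assumption that N starts at 0, a 3-attempt run that fails every time sleeps 100ms and then 200ms, for 300ms in total, which is consistent with the `elapsed >= 250ms` bound asserted by the timing test.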

Closes the residual gap flagged in the 0.3-S15 review. The concurrent execution path now pauses ALL pause-steps in the same DAG level in a single pass; previously only the first was processed and remaining pauses were silently deferred to subsequent resumes.

Use case: "wait for payment AND fraud-check" webhook fan-in. Multiple external_trigger or approval_gate steps depending on a shared upstream now pause in parallel, advancing independently as each decision/event arrives. The downstream step (which depends on all of them) runs exactly once after the last pause clears.

Wiring:

- `execute_steps_concurrent` wave loop iterates `pauses` instead of taking `pauses.first()`. Each pause persists its own checkpoint and (for ExternalTrigger) mints its own token via the existing reentrant acquire_trigger_token.
- New `persist_one_pause` helper isolates per-pause logic.
- Resume sentinel pass already iterates all approvals + triggers, so it picks up multiple advances per resume.

Review-driven fix (1 HIGH bug from adversarial review):

- **Partial-pause failure must not terminally-fail the run.** Earlier draft propagated the first per-pause error via `?`: if pause #1 committed and pause #2's acquire_trigger_token failed (transient urandom error, CAS retry exhaustion), the run became terminally Failed, stranding pause #1's token with no recovery path (resume short-circuits on terminal status; record_external_trigger refuses with RunNotResumable). Fixed by isolating per-pause errors inside the loop: log a warning, continue to next pause, mark the run Paused on what DID commit. The wave loop is idempotent (acquire_trigger_token reuses existing tokens, upsert is upsert), so the next resume retries the failed pauses cleanly.

Sequential path asymmetry: `--concurrency 1` (`execute_steps`) is unchanged. It processes one step at a time and serializes parallel pauses across multiple resumes. Operators expecting AND-fan-in webhook patterns must use `--concurrency 2+`. Documented in CHANGELOG.

Tests: +5 in `tests::multi_pause_per_wave` covering 2-trigger parallel, partial trigger fire, full trigger fire, mixed approval+trigger, AND a partial-pause-failure recovery test that simulates the failure on-disk shape via direct SQL and verifies resume re-pauses cleanly with pause #1's token unchanged. Workspace: orchestrator 202 → 207 tests. Clippy + fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
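The difference between propagating the first per-pause error with `?` and isolating errors inside the loop can be shown with a small sketch. The names here (`WaveOutcome`, `persist_pauses`, and the closure standing in for `persist_one_pause`) are hypothetical, and the persistence itself is abstracted away as a callback; the sketch shows only the control flow the fix describes: warn, continue to the next pause, and report what did commit.

```rust
/// Hypothetical summary of one wave's pause pass.
#[derive(Debug, PartialEq)]
struct WaveOutcome {
    /// Pauses whose checkpoint (and token, if any) were persisted.
    committed: Vec<String>,
    /// Pauses that failed this pass, with the reason; left for the next resume.
    deferred: Vec<(String, String)>,
}

/// Attempts every pause in the wave. A failure on one pause is recorded and
/// logged, never propagated, so earlier commits are not stranded.
fn persist_pauses<F>(pauses: &[&str], mut persist_one: F) -> WaveOutcome
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut outcome = WaveOutcome { committed: Vec::new(), deferred: Vec::new() };
    for &step_id in pauses {
        match persist_one(step_id) {
            Ok(()) => outcome.committed.push(step_id.to_string()),
            Err(reason) => {
                eprintln!("warning: pause {step_id} not persisted, will retry on resume: {reason}");
                outcome.deferred.push((step_id.to_string(), reason));
            }
        }
    }
    outcome
}
```

Because the callback is assumed to be idempotent, as the commit states for `acquire_trigger_token` and the checkpoint upsert, calling `persist_pauses` again on the next resume with the same wave is safe: already-committed pauses are re-applied without effect and deferred ones get another attempt.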

escapeboy merged commit fafe492 into master Feb 21, 2026
3 checks passed

escapeboy deleted the docs/update-claude-md-post-security-fixes branch February 21, 2026 10:58

escapeboy mentioned this pull request May 17, 2026

feat(cli): agent-native inspection surfaces #68

Merged

escapeboy merged 1 commit into master from docs/update-claude-md-post-security-fixes
