Skip to content

docs: update CLAUDE.md with test count, CI/CD, and security invariants#1

Merged
escapeboy merged 1 commit into
masterfrom
docs/update-claude-md-post-security-fixes
Feb 21, 2026
Merged

docs: update CLAUDE.md with test count, CI/CD, and security invariants#1
escapeboy merged 1 commit into
masterfrom
docs/update-claude-md-post-security-fixes

Conversation

@escapeboy
Copy link
Copy Markdown
Owner

Summary

  • Updated test count from 440+ to 501+ to reflect the actual test suite size
  • Added cargo clippy and cargo fmt commands to the Build & Test section
  • Added Repository section with GitHub URL and CI/CD job descriptions
  • Added path traversal prevention to Critical Invariants section

Test plan

  • Verify CLAUDE.md renders correctly on GitHub
  • Confirm all referenced commands work (cargo test, cargo clippy, cargo fmt)
  • CI passes on this branch

🤖 Generated with Claude Code

…ity invariants

- Update test count from 440+ to 501+ to reflect actual suite size
- Add clippy and fmt commands to build section
- Add Repository section with GitHub URL and CI/CD description
- Add path traversal prevention to Critical Invariants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@escapeboy escapeboy merged commit fafe492 into master Feb 21, 2026
3 checks passed
@escapeboy escapeboy deleted the docs/update-claude-md-post-security-fixes branch February 21, 2026 10:58
escapeboy added a commit that referenced this pull request Apr 25, 2026
…al-gate UX

Adds three new CLI subcommands closing the operator UX deferred from
0.3-S2b. After a workflow pauses at an approval gate, operators can
discover paused runs (`boruna workflow list`), record a decision
(`boruna workflow approve <run-id> <step-id>` or `reject ... --reason`),
and resume the run, which honors the sentinel and either advances past
the gate (synthetic empty-record output) or halts as Failed.

Persistence additions: `get_run_metadata`, `update_run_metadata`,
`compare_and_swap_metadata` (atomicity primitive for the approve flow's
read-validate-write cycle), `list_runs`. Plus 9 new typed error variants
on `WorkflowRunError` covering wrong-state, wrong-kind, already-decided,
not-resumable, etc. — project-conventions §1.

Runner changes: `PersistedRunMetadata` carries an `approvals` map
(BTreeMap<step_id, ApprovalDecision>); each decision records who decided
when (operational only — does not feed any audit hash). Resume's
sentinel-handling pass: AwaitingApproval steps with an Approved sentinel
get a Completed checkpoint with a synthetic empty-record output;
Rejected sentinels halt the run as Failed. Defense-in-depth re-validates
StepKind::ApprovalGate at resume time.

Adversarial review found 7 HIGH issues across 2 reviewers; consensus on
the biggest one (TOCTOU race in record_approval_decision). All 5
actionable fixes applied:

- Race in record_approval_decision (consensus #1+H1): prior 3-tx
  read+validate+write let two concurrent operators both pass the
  in-memory prior-decision check and silently overwrite each other. Now
  uses a CAS retry loop via the new compare_and_swap_metadata primitive.
  4-thread regression test asserts exactly 1 ok + 3 already-decided.
- halt_with_failed_step overwrite (#2): rejected sentinel was
  overwriting an earlier independent step failure as the halt cause.
  Now uses get_or_insert to preserve the FIRST failure.
- Silent no-op on non-awaiting sentinel (#3): now emits a warning so
  operators see when their approval doesn't apply.
- Synthetic output hash (H2): now locked by a regression test against a
  golden hex computed externally (`{"Map":{}}` → f4242fc8...).
- Defense-in-depth StepKind check (H3): refuses to apply a sentinel to
  a non-gate step even if validation slipped past.

Documented but deferred: H4 (concurrent resume torn-write on
store_output) — the documented assumption is single-writer-by-process;
revisit in a future hardening sprint with flock or atomic-rename.

Tests: 682 workspace tests pass (was 659). Persistence tests 41 (was
35). Runner tests 33 (was 16) — +13 for approval-gate UX, +4 for
review-driven regressions. Smoke test on customer_support_triage
end-to-end (run → list → approve → resume → list completed) verified.

Anchored docs: docs/{design,architecture,test-plan}-0.3-s2c-approval-gate-completion.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
escapeboy added a commit that referenced this pull request Apr 25, 2026
Wave-based scheduler that fans out steps with no inter-dependencies
across `std::thread::spawn`'d workers. Operators opt in via
`--concurrency <N>`; default `1` = sequential (preserves prior
behavior). Persistent path only; ephemeral `run` stays single-
threaded. No tokio dependency.

End-to-end determinism verified: per-step `output_hash` is
bit-identical across concurrency levels for successful runs.
Document_processing fan-out workflow (5 steps, ingest →
{classify, extract, summarize} → merge) confirmed identical hashes
at concurrency=1 vs concurrency=4.

Implementation:
- `WorkflowValidator::topological_levels(def)` partitions the DAG
  into Vec<Vec<String>> waves. Within each wave, steps are sorted
  alphabetically for deterministic dispatch order.
- `WorkflowRunner::execute_steps_concurrent` processes one wave at
  a time; gates inline (sequential, pause-the-run); source steps
  fan out chunked by max_concurrency.
- `compile_and_run_step` extracted helper: pure compile+run path
  returning a Value. Workers call this; coordinator owns all
  DataStore and SQLite mutation.

Adversarial review found 4 HIGH issues; 3 fixed in-commit, 1
documented:

- #1: panic in chunk detached sibling workers — `?` inside join
  loop dropped pending JoinHandles. Now collects all join results
  into a Vec before processing; guarantees no detached threads on
  early-exit paths.
- #2: input failure mid-chunk left earlier siblings Running on
  disk forever — interleaved validation+marking. Now two-pass:
  pass 1 validates all chunk inputs (no side effects); pass 2
  marks Running atomically and dispatches. Regression test
  asserts no Running checkpoint survives a halted run.
- #3: panic handler only caught `&'static str` payloads (missing
  the common `panic!("...{}...", var)` shape) and lost step_id.
  Now: tries `String` first, carries step_id alongside each
  JoinHandle, persists a Failed checkpoint with the panic message.
- #4 (documented): failed runs at concurrency > 1 may produce
  more Completed step_results than a sequential run (siblings
  that started before the failure was detected complete normally).
  This is honest reporting of what executed; cross-concurrency
  audit replay is NOT a contract. Successful runs ARE concurrency-
  invariant.

Tests: 703 workspace tests pass (was 694 after 0.3-S3). +9 net
including:
- topological_levels (4 tests in validator)
- concurrency_n_produces_identical_output_hashes_to_concurrency_1
  (the headline determinism test)
- concurrent_run_persists_all_step_checkpoints
- concurrent_run_with_failure_halts_and_other_in_flight_complete
- concurrent_resume_honors_already_completed
- concurrent_input_failure_does_not_leave_siblings_running
  (review-driven #2 regression)

Anchored doc: docs/design-0.3-s4-concurrent-execution.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
escapeboy added a commit that referenced this pull request Apr 25, 2026
Replaces the prior primitive "retry once regardless of max_attempts"
with a proper RetryPolicy honoring `max_attempts` up to 100 attempts
with `100ms × 2^N` (capped at 5s) backoff between. A shared
`retry_with_backoff` helper is used by both sequential and concurrent
execution paths so retry semantics don't drift.

Final-attempt failure surfaces as "failed after N attempts: <reason>"
in error_msg for operator triage.

Adversarial review found 2 HIGH issues, both fixed:

- #1: eprintln retry log polluted unit-test stderr (and any embedder
  capturing stderr). Now gated under cfg(not(test)) so production
  embedders see the log line and tests are silent.
- #2: cfg(test) sleep skip is correctly scoped, but the gap surfaced
  by the reviewer was that NO test verified real-timing backoff.
  Added orchestrator/tests/retry_timing.rs which runs in a build
  where cfg(test) is NOT set on the orchestrator lib, exercising
  real sleeps. Asserts elapsed >= 250ms for a 3-attempt retry —
  catches future regressions that accidentally remove the sleep.

Tests: 712 workspace tests pass (was 703 after 0.3-S4). +9 net:
- 8 unit tests in `runner::tests::retry::` (backoff curve, success
  on first attempt, success after failures, exhaustion-and-wrap,
  on_transient=false, max_attempts<=1, no policy, end-to-end with
  bad step compile error)
- 1 integration test in tests/retry_timing.rs (real-sleep contract)

Anchored doc: docs/design-0.3-s5-retry-policies.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
escapeboy added a commit that referenced this pull request Apr 26, 2026
Closes the residual gap flagged in the 0.3-S15 review. The concurrent
execution path now pauses ALL pause-steps in the same DAG level in a
single pass — previously only the first was processed and remaining
pauses were silently deferred to subsequent resumes.

Use case: "wait for payment AND fraud-check" webhook fan-in. Multiple
external_trigger or approval_gate steps depending on a shared upstream
now pause in parallel, advancing independently as each decision/event
arrives. The downstream step (which depends on all of them) runs
exactly once after the last pause clears.

Wiring:
- `execute_steps_concurrent` wave loop iterates `pauses` instead of
  taking `pauses.first()`. Each pause persists its own checkpoint and
  (for ExternalTrigger) mints its own token via the existing reentrant
  acquire_trigger_token.
- New `persist_one_pause` helper isolates per-pause logic.
- Resume sentinel pass already iterates all approvals + triggers, so
  it picks up multiple advances per resume.

Review-driven fix (1 HIGH bug from adversarial review):

- **Partial-pause failure must not terminally-fail the run.** Earlier
  draft propagated the first per-pause error via `?` — if pause #1
  committed and pause #2's acquire_trigger_token failed (transient
  urandom error, CAS retry exhaustion), the run became terminally
  Failed, stranding pause #1's token with no recovery path (resume
  short-circuits on terminal status; record_external_trigger refuses
  with RunNotResumable). Fixed by isolating per-pause errors inside
  the loop: log a warning, continue to next pause, mark the run
  Paused on what DID commit. The wave loop is idempotent
  (acquire_trigger_token reuses existing tokens, upsert is upsert),
  so the next resume retries the failed pauses cleanly.

Sequential path asymmetry: `--concurrency 1` (`execute_steps`) is
unchanged. It processes one step at a time and serializes parallel
pauses across multiple resumes. Operators expecting AND-fan-in
webhook patterns must use `--concurrency 2+`. Documented in
CHANGELOG.

Tests: +5 in `tests::multi_pause_per_wave` covering 2-trigger
parallel, partial trigger fire, full trigger fire, mixed
approval+trigger, AND a partial-pause-failure recovery test that
simulates the failure on-disk shape via direct SQL and verifies
resume re-pauses cleanly with pause #1's token unchanged.

Workspace: orchestrator 202 → 207 tests. Clippy + fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant