feat: baseline test gate, PAR evidence gate, milestone writes by egerev · Pull Request #30 · egerev/superflow

egerev · 2026-03-23T12:15:02Z

Summary

Sprint 2/6 of Supervisor Enforcement Hardening. Integrates Sprint 1 validators into execution flow.

Baseline test gate in execute_sprint() — stops before Claude invocation if tests fail
PAR evidence validation with separate retry counter (2 max, independent from Claude retries)
JSON summary validation — invalid summaries trigger retry
PR verification — 3 attempts, 5s delay, empty pr_url = failure
Milestone checkpoint writes at every transition (preserved in failure paths)
Parallel safety: queue reload after execute_parallel(), checkpoint on worker exception
Prompt rebuilt fresh each iteration (no unbounded growth)
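
The independent retry budgets and the fresh prompt per iteration can be pictured with a minimal sketch. This is not the code from this PR: `run_sprint_loop`, `invoke`, `validate_par`, `build_prompt`, and both constants are hypothetical names, and the Claude retry limit of 3 is an assumed value (only the PAR limit of 2 comes from the summary above).

```python
from typing import Callable, Optional

MAX_CLAUDE_RETRIES = 3  # assumed value; the PR does not state it
MAX_PAR_RETRIES = 2     # taken from the PR summary


def run_sprint_loop(
    invoke: Callable[[str], Optional[dict]],
    validate_par: Callable[[dict], bool],
    build_prompt: Callable[[Optional[str]], str],
) -> str:
    """Return "ok" or "failed"; the two retry budgets never share a counter."""
    claude_retries = 0
    par_retries = 0
    feedback: Optional[str] = None
    while True:
        # Rebuild the prompt from scratch so feedback never accumulates.
        prompt = build_prompt(feedback)
        summary = invoke(prompt)
        if summary is None:  # invocation failed or the summary was invalid
            claude_retries += 1
            if claude_retries > MAX_CLAUDE_RETRIES:
                return "failed"
            feedback = "previous attempt produced no valid summary"
            continue
        if not validate_par(summary):
            par_retries += 1
            if par_retries > MAX_PAR_RETRIES:
                return "failed"
            feedback = "PAR evidence missing or invalid"
            continue
        return "ok"
```

Because `build_prompt` receives only the latest feedback, a second and third PAR retry produce identical prompts rather than an ever-longer one.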

Depends on: PR #27 (Sprint 1)

Tests: 176 total (24 new), all passing.

PAR:

Claude Code Quality: REQUEST_CHANGES → 7 fixes applied
Claude Product: reviewed
Codex Code Quality: reviewed
Codex Product: NEEDS_FIXES → fixes applied

All confirmed issues fixed, re-verified 176 tests pass.

Test plan

Baseline pass → proceed (test_baseline_pass)
Baseline fail → mark_failed + checkpoint (test_baseline_fail_marks_failed)
No test runner → skip (test_baseline_no_runner_skips)
PAR retry separate from Claude retry (test_par_integration_retry_separate_counter)
PAR max retries → mark_failed (test_par_integration_max_retries_fails)
PR retry 3 times (test_pr_validation_retry_3_times)
PR hard fail after 3 (test_pr_validation_hard_fail_after_retries)
Milestones in checkpoint (test_milestone_writes_in_checkpoint)
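
The PR retry cases in the plan above (3 attempts, 5s delay, empty `pr_url` counts as failure) could be exercised against a helper shaped roughly like the one below. `verify_pr` and its `check` and `sleep` parameters are hypothetical; the PR does not show how the supervisor actually queries the pull request, so the lookup is passed in as a callable.

```python
import time
from typing import Callable


def verify_pr(
    pr_url: str,
    check: Callable[[str], bool],
    attempts: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `check` confirms the PR, False after all attempts."""
    if not pr_url:
        # An empty URL is a failure outright; there is nothing to retry.
        return False
    for attempt in range(1, attempts + 1):
        if check(pr_url):
            return True
        if attempt < attempts:
            sleep(delay_s)  # wait between attempts, not after the last one
    return False
```

Injecting `sleep` lets a test record the delays instead of waiting ten real seconds.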

🤖 Generated with Claude Code

…rompt hardening

Sprint 1 of enforcement hardening. Adds foundational validation infrastructure:

- checkpoint.py: string ID support (int|str), load_checkpoint_by_name(), load_all_checkpoints() keeps numeric-only to protect downstream consumers
- queue.py: baseline_cmd attribute with load/save roundtrip
- supervisor.py: validation constants, _validate_evidence_verdicts(), _validate_par_evidence() with type guard, _validate_sprint_summary(), preflight .worktrees gitignore check, build_prompt() frontend injection
- supervisor-sprint-prompt.md: PAR evidence instructions, pre-verified baseline section, {frontend_instructions} placeholder

20 new tests (152 total). Validators are foundations; integration into _attempt_sprint() is Sprint 2 scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
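
As an illustration of what a type guard in front of evidence validation buys, here is a minimal sketch. It does not reproduce `_validate_par_evidence()` from the commit: the function name, the evidence schema, the reviewer key spellings, and the accepted verdict vocabulary are all assumptions made for the example (the four reviewer roles are borrowed from the PAR list in the PR description).

```python
from typing import Any, List

# Hypothetical key spellings for the four reviewer roles.
REQUIRED_REVIEWERS = (
    "claude_code_quality",
    "claude_product",
    "codex_code_quality",
    "codex_product",
)
ACCEPTED_VERDICTS = {"APPROVE", "PASS"}  # hypothetical vocabulary


def validate_par_evidence(evidence: Any) -> List[str]:
    """Return a list of problems; an empty list means the evidence is valid."""
    # Type guard: model output may be a string, list, or None instead of a dict.
    if not isinstance(evidence, dict):
        return [f"evidence must be an object, got {type(evidence).__name__}"]
    problems: List[str] = []
    for reviewer in REQUIRED_REVIEWERS:
        entry = evidence.get(reviewer)
        if not isinstance(entry, dict):
            problems.append(f"{reviewer}: missing or not an object")
            continue
        verdict = entry.get("verdict")
        # Check the type first so an unhashable value cannot raise TypeError.
        if not isinstance(verdict, str) or verdict not in ACCEPTED_VERDICTS:
            problems.append(f"{reviewer}: unaccepted verdict {verdict!r}")
    return problems
```

Returning problems as data rather than raising keeps the caller free to decide whether a malformed payload triggers a retry or a hard failure.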

…idation hardening

Sprint 2 of enforcement hardening. Integrates Sprint 1 validators into execution:

- Baseline test gate: _resolve_baseline_cmd() with priority chain, run_baseline_tests(), integrated in execute_sprint() inside try block
- PAR validation: separate par_retries counter (not shared with Claude retries), max 2 PAR retries before mark_failed
- Summary validation: invalid summaries treated as json_parse_error
- PR verification: 3 retries with 5s delay, empty pr_url = failure
- Milestone writes: baseline_passed, implemented, par_validated, pr_created in checkpoints (preserved in failure paths too)
- Parallel safety: queue reload after execute_parallel, checkpoint on worker exception
- Prompt rebuilt each iteration (no unbounded growth)
- Attempt counter independent of retry counter for log filenames

24 new tests (176 total). Review fixes applied: stale queue.save, log counter, prompt growth, parallel checkpoint, empty pr_url, milestones.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
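
A baseline gate with a command priority chain might look roughly like the sketch below. The commit does not spell out the actual chain used by `_resolve_baseline_cmd()`, so the order here (explicit queue command first, then marker-file detection, then skip) and the detection rules are assumptions; `resolve_baseline_cmd` and `baseline_gate` are hypothetical names.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def resolve_baseline_cmd(queue_cmd: Optional[str], repo: Path) -> Optional[str]:
    """Pick a test command, or None when no runner can be determined."""
    # 1. An explicit command configured on the queue wins (assumed priority).
    if queue_cmd:
        return queue_cmd
    # 2. Otherwise guess from marker files (assumed detection rules).
    if (repo / "pytest.ini").exists() or (repo / "pyproject.toml").exists():
        return "python -m pytest -q"
    if (repo / "package.json").exists():
        return "npm test"
    # 3. Nothing found: the caller skips the gate.
    return None


def baseline_gate(cmd: Optional[str], repo: Path) -> str:
    """Return "skipped", "passed", or "failed" for the baseline run."""
    if cmd is None:
        return "skipped"
    try:
        result = subprocess.run(
            shlex.split(cmd), cwd=repo, capture_output=True, text=True
        )
    except FileNotFoundError:
        return "failed"  # the configured executable does not exist
    return "passed" if result.returncode == 0 else "failed"
```

A caller would stop before invoking the model on `"failed"`, proceed on `"passed"`, and proceed without a baseline on `"skipped"`, matching the three baseline cases in the test plan.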

egerev and others added 2 commits March 23, 2026 19:52

This was referenced Mar 23, 2026

feat: notifications, milestone-aware resume, CLI wiring #32

Open

feat: complete supervisor enforcement hardening scope #34

Open

egerev wants to merge 2 commits into main from feat/enforcement-hardening-sprint-2
