WIP: Harness resilience — relaxed phase-2 gate, enriched resume diagnostics, timestamped summaries#46

Merged
pruiz merged 7 commits into master from harness-resilience-summary-gating
Jun 6, 2026

@pruiz pruiz commented Jun 5, 2026

What

Implements the plan in .project/harness-resilience-summary-gating-plan.md to make the run-summary gate actionable for weak models.

Why

make phase-2 with a weak model failed (exit 2) despite producing PENDING findings, because the harness only knew the run was incomplete — it could not tell the model which artifacts were missing. The auto-resume prompt was generic, so a model that thought it was done could not self-diagnose.

Changes

Completion gate (tools/phases/completion.py)

  • check_phase_graceful_completion now returns (bool, list[str]); the list carries human-readable missing-artifact messages consumed by the resume prompt builder.
  • Phase 2 gate relaxes from pending_fresh AND summary_fresh to summary_fresh alone. A legitimate "0 new findings" outcome no longer fails. Phase 3 (counter-analysis) provides the second-line review.
  • build_phase_resume_prompt takes optional failure_details and renders a "Missing required artifacts:" block when given, otherwise keeps the existing generic reassess wording.
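
As a rough illustration of the contract described above, the sketch below shows a gate that returns `(bool, list[str])` and a prompt builder that renders the failure list when one is given. It is not the code in `tools/phases/completion.py`: the function names `check_phase2_gate`, `build_resume_prompt`, and `is_fresh`, the mtime-based freshness rule, and the exact message wording are assumptions made for the example.

```python
from __future__ import annotations

from pathlib import Path


def is_fresh(path: Path, started_at: float) -> bool:
    """Assumed freshness rule: the file was modified at or after the run start."""
    return path.is_file() and path.stat().st_mtime >= started_at


def check_phase2_gate(root: Path, started_at: float) -> tuple[bool, list[str]]:
    """Relaxed phase-2 gate: only a fresh run summary is required."""
    failures: list[str] = []
    summaries = (root / "runs").glob("phase-2-summary*.md")
    if not any(is_fresh(p, started_at) for p in summaries):
        # Hypothetical wording; the real diagnostic strings may differ.
        failures.append("no fresh run summary matching runs/phase-2-summary*.md")
    return (not failures, failures)


def build_resume_prompt(failure_details: list[str] | None = None) -> str:
    """Render a targeted block when details exist, else generic wording."""
    if failure_details:
        lines = ["Missing required artifacts:"]
        lines += [f"- {item}" for item in failure_details]
        lines.append("Fix only the items listed above.")
        return "\n".join(lines)
    return "Reassess the phase requirements and finish any incomplete work."
```

A run with zero new findings passes this gate as long as the summary file exists, which is the behavior the relaxation is meant to allow.
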

Phase prompts (7 files under prompts/)

  • All run-summary paths now use runs/phase-X[-FINDING]-summary-YYYY-MM-DD-HHMMSS.md to prevent overwrites on rerun. Globs in completion.py already use wildcards.
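
To make the naming scheme concrete, here is a small sketch of how a timestamped summary path could be built and why a wildcard glob still matches it. The helper name `summary_path` and the finding id `F-001` are hypothetical, and the glob string is only an assumed example of the wildcard patterns mentioned above.

```python
from __future__ import annotations

from datetime import datetime
from fnmatch import fnmatch


def summary_path(phase: str, finding: str | None = None,
                 now: datetime | None = None) -> str:
    """Build runs/phase-X[-FINDING]-summary-YYYY-MM-DD-HHMMSS.md."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H%M%S")
    middle = f"-{finding}" if finding else ""
    return f"runs/phase-{phase}{middle}-summary-{stamp}.md"


fixed = datetime(2026, 6, 5, 23, 37, 0)
first = summary_path("2", now=fixed)
per_finding = summary_path("4", finding="F-001", now=fixed)  # hypothetical id

# A wildcard after "summary" matches both the old static name and new names.
assert fnmatch(first, "runs/phase-2-summary*.md")
assert fnmatch("runs/phase-2-summary.md", "runs/phase-2-summary*.md")
```

Because each rerun gets a new second-resolution timestamp, an earlier summary is left in place instead of being overwritten.
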

Agent files (5 files under .opencode/agents/)

  • Removed the "run summary ... when practical" hedge. Phase prompts are now the single source of truth for required durable artifacts. A delegating note replaces each removed line.

Runtime callers (tools/codecome/harness.py, tools/codecome/phase_1.py)

  • Unpack the new tuple, preserve the short-circuit on check_phase_graceful_completion, and pass phase_failures (or None) to the resume prompt builder.
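
The caller-side pattern can be sketched roughly as follows. This is not the code in `harness.py` or `phase_1.py`; `gate_after_cutoff` and its parameters are hypothetical stand-ins. The point it illustrates is that a non-empty tuple is always truthy in Python, so a caller that used to write `if check(...)` has to unpack the result, and the gate should still be skipped when the cheaper preconditions fail.

```python
from __future__ import annotations

from typing import Callable

GateResult = tuple[bool, list[str]]


def gate_after_cutoff(mid_turn: bool, permission_error: bool,
                      gate: Callable[[], GateResult]) -> tuple[bool, list[str] | None]:
    """Return (phase_ok, failure_details) for the resume prompt builder."""
    phase_ok = False
    phase_failures: list[str] = []
    # Short-circuit preserved: the gate only runs for a mid-turn cutoff
    # that was not caused by a permission error.
    if mid_turn and not permission_error:
        phase_ok, phase_failures = gate()
    # An empty list is passed on as None so the generic wording is used.
    return phase_ok, (phase_failures or None)
```

Note that `bool((False, ["x"]))` is `True`, which is why testing the returned tuple directly would silently treat every run as complete.
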

Tests (5 test files)

  • Updated graceful-completion assertions to assert on the tuple and on the diagnostic fragments (runs/phase-2-summary, itemdb/evidence/<id>/, itemdb/reports, itemdb/notes/sandbox-plan.md, etc.) so regressions in the human-readable messages are caught.
  • Updated callers that monkeypatch check_phase_graceful_completion to return a tuple.
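
A test double for the new return shape might look like the sketch below. It uses `unittest.mock.patch.object` on a stand-in namespace rather than the project's real modules or pytest's `monkeypatch` fixture; `completion`, `run_once`, and the diagnostic string are assumptions for illustration only.

```python
from types import SimpleNamespace
from unittest.mock import patch

# Stand-in for the module that owns the gate (hypothetical).
completion = SimpleNamespace(
    check_phase_graceful_completion=lambda phase: (True, []),
)


def run_once(phase: str) -> str:
    """Toy caller that unpacks the (ok, failures) tuple."""
    ok, failures = completion.check_phase_graceful_completion(phase)
    return "complete" if ok else "resume: " + "; ".join(failures)


def test_gate_failure_is_reported() -> None:
    # The stub must return a tuple, not a bare bool, or unpacking fails.
    stub = lambda phase: (False, ["runs/phase-2-summary*.md is missing"])
    with patch.object(completion, "check_phase_graceful_completion", stub):
        assert run_once("2") == "resume: runs/phase-2-summary*.md is missing"
```

Asserting on a fragment of the message, as the updated tests do, catches regressions in the human-readable diagnostics as well as in the boolean.
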

Verification

  • make tests: 676 passed.
  • Frontmatter check: 16/16 finding files validated.
  • Phase artifact check: all implemented phases pass.

Non-goals

  • Chat mode gating — chat mode has no harness gating.
  • Phase 1 multi-subphase gating — subphase-specific artifact sets preserved; only failure reporting is enriched.
  • Models that hallucinate findings — a model-quality issue, not a harness issue.

WIP checklist for review

  • Confirm the phase-2 gate relaxation matches intent (no new findings = pass).
  • Confirm the failure-detail strings are informative enough for a weak model to act on.
  • Confirm timestamped summary paths are acceptable across all phases.
  • Confirm agent files no longer need a hedge for chat mode.

Summary by CodeRabbit

  • Documentation

    • Agent workflow guides updated with stricter artifact and validation requirements
    • Run summary output filenames now include timestamps for better tracking
  • Improvements

    • Phase completion now provides specific diagnostics identifying missing required artifacts
    • Resume operations now include targeted guidance on which artifacts must be fixed
    • Session status queries improved for better reliability

pruiz added 2 commits June 5, 2026 23:37
…, timestamped summaries

Make the run-summary gate actionable for weak models.

Completion gate
- check_phase_graceful_completion now returns (bool, list[str]); the
  list carries human-readable missing-artifact messages consumed by the
  resume prompt builder.
- Phase 2 gate relaxes from (pending_fresh AND summary_fresh) to
  summary_fresh alone. A legitimate '0 new findings' outcome no longer
  fails. Phase 3 (counter-analysis) provides the second-line review.
- build_phase_resume_prompt takes optional failure_details and renders
  a 'Missing required artifacts:' block when given, otherwise keeps
  the existing generic reassess wording.

Phase prompts
- All run-summary paths now use runs/phase-X[-FINDING]-summary-YYYY-MM-DD-HHMMSS.md
  to prevent overwrites on rerun. Globs in completion.py already use
  wildcards, so the new format is matched by the gate.

Agent files
- Removed the 'run summary ... when practical' hedge from all five
  agent files (auditor, reviewer, validator, exploiter, recon).
  Phase prompts are now the single source of truth for required
  durable artifacts. A delegating note replaces each removed line.

Runtime callers
- harness.py and phase_1.py unpack the new tuple, preserve the
  short-circuit on check_phase_graceful_completion, and pass
  phase_failures (or None) to the resume prompt builder.

Tests
- Updated graceful-completion assertions to assert on the tuple and
  to assert on the diagnostic fragments (runs/phase-2-summary,
  itemdb/evidence/<id>/, itemdb/reports, itemdb/notes/sandbox-plan.md,
  etc.) so regressions in the human-readable messages are caught.
- Updated callers that monkeypatch check_phase_graceful_completion to
  return a tuple.

All 676 tests pass; frontmatter and phase-artifact checks pass.

Records the design rationale and verification checklist for the
harness resilience changes implemented in 7cd8843. Kept on the
feature branch for PR review; the master commit itself ships only
the code changes.

coderabbitai Bot commented Jun 5, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f40018e2-1629-4e62-b2d2-7ea918dfd3d7

📥 Commits

Reviewing files that changed from the base of the PR and between 77584e5 and 5f70fde.

📒 Files selected for processing (9)
  • .project/harness-resilience-summary-gating-plan.md
  • tests/test_phase_failure_state_reset.py
  • tests/test_phase_graceful_completion_subphases.py
  • tests/test_phases_completion.py
  • tests/test_session.py
  • tools/codecome/harness.py
  • tools/codecome/phase_1.py
  • tools/codecome/session.py
  • tools/phases/completion.py

Walkthrough

This PR implements run-summary gating resilience by refactoring phase-completion gating from boolean returns to structured tuples containing success status and human-readable failure details. All phase prompts are standardized to use timestamped run-summary filenames; agent documentation is cleaned of conflicting run-summary obligations and made to defer to phase-prompt specifications; completion logic is enhanced to validate phase-specific artifacts and return exact failure reasons; resume prompts are enriched with targeted guidance for missing artifacts; and comprehensive test coverage validates the new tuple contract and failure-state reset across retry attempts. A separate session API update changes status endpoint querying.

Changes

Harness Resilience and Run-Summary Gating

| Layer | File(s) | Summary |
| --- | --- | --- |
| Harness resilience plan and design decisions | .project/harness-resilience-summary-gating-plan.md | Design document specifying the observed phase-2 gating failure, root causes (overly strict phase-2 gate, generic resume prompt, conflicting "when practical" language), and implementation plan including artifact-freshness checks, timestamped run-summary naming, relaxed gating, structured failure messaging, and verification/rollback strategies. |
| Agent documentation cleanup | .opencode/agents/auditor.md, .opencode/agents/exploiter.md, .opencode/agents/recon.md, .opencode/agents/reviewer.md, .opencode/agents/validator.md | Removes run-summary template references from agent required reading and completion checklists; adds explicit requirement that agents follow the phase prompt's durable-artifacts specification; harmonizes completion-checklist language across auditor, exploiter, recon, reviewer, and validator agents. |
| Phase prompts with standardized timestamped run-summaries | prompts/phase-1a-profile.md, prompts/phase-1b-recon.md, prompts/phase-1c-sandbox.md, prompts/phase-2-audit.md, prompts/phase-4-validate.md, prompts/phase-5-exploit.md, prompts/phase-6-report.md | Standardizes all phase prompt run-summary output paths to use timestamped filenames (YYYY-MM-DD-HHMMSS format) instead of static paths, making run-summary freshness checkable and idempotent across multiple runs. |
| Completion API refactor: tuple return with failure details | tools/phases/completion.py | Refactors check_phase_graceful_completion to return tuple (bool, list[str]) with human-readable failure messages; adds internal helpers for path display and timestamped run-summary freshness checking; implements phase-specific gating where Phase 1a/1b/1c validate subphase-required artifacts, monolith Phase 1 requires full notes plus run-summary, Phases 2/3 enforce run-summary freshness, and Phases 4/5/6 validate evidence/exploit/report freshness plus finding consistency; all paths accumulate failure details instead of silent boolean negation. |
| Resume prompt enhancement | tools/phases/completion.py | build_phase_resume_prompt now accepts optional failure_details list; when provided, generates targeted "Missing required artifacts" section listing specific gaps and instructs fixing only those items; when absent, uses previous generic reassessment guidance; improves resume/repair UX by narrowing focus to actual gaps. |
| Harness orchestration | tools/codecome/harness.py | Harness declares phase_ok and phase_failures local state variables; unpacks check_phase_graceful_completion result into these instead of discarding failures; uses phase_ok to decide graceful-forgiveness vs. incomplete-run state on mid-turn cutoff; passes phase_failures into build_phase_resume_prompt to enrich resume context with exact missing artifacts. |
| Phase 1 orchestration | tools/codecome/phase_1.py | _run_subphase destructures check_phase_graceful_completion tuple into phase_ok and phase_failures; preserves phase_failures alongside existing success side effects; passes accumulated phase_failures to build_phase_resume_prompt on mid-turn resume so diagnostic details inform the next iteration. |
| Core test updates | tests/test_phase_1_codeql_plan_repair.py, tests/test_phase_1_mid_turn_forgiveness.py, tests/test_phase_graceful_completion_subphases.py, tests/test_phases_completion.py, tests/test_render_settings_propagation.py | All existing test mocks and assertions updated to handle (ok, failures) tuple; success cases assert ok is True and failures == []; failure cases assert ok is False and validate failure-detail content includes expected artifact paths (e.g., "itemdb/notes", "sandbox-plan.md", "runs/phase-X-summary*.md"); new tests validate Phase 2 glob-string correctness, resume-prompt opener wording varies by finish reason, and failure-detail reset behavior across retries. |
| New tests: failure-detail reset | tests/test_phase_failure_state_reset.py | New test module with two tests verifying failure details from one attempt are not reused in subsequent resume attempt; one test covers harness path by patching harness/runner helpers; second test covers phase-1 subphase path; both assert failure_details argument to build_phase_resume_prompt transitions from populated list to None on resume, preventing stale diagnostics. |
| Session API update | tools/codecome/session.py, tests/test_session.py | Updates get_session_status() to query /session/status instead of /session/{session_id}, decodes session-id-keyed status map, returns "idle" for missing entries, validates response is a dictionary; tests cover busy/retry/idle resolution, missing-entry defaults, and request-failure handling. |
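
The session status resolution described in the last row could look roughly like the sketch below. It covers only the decoding step, not the HTTP request. The function name `resolve_session_status` and the assumption that each map entry is a dictionary with a `"type"` field are illustrative guesses, not confirmed details of `tools/codecome/session.py` or of the status endpoint.

```python
from __future__ import annotations


def resolve_session_status(payload: object, session_id: str) -> str | None:
    """Resolve one session's status from a session-id-keyed status map."""
    if not isinstance(payload, dict):
        # Malformed response: report "unknown" by returning None.
        return None
    entry = payload.get(session_id)
    if entry is None:
        # Sessions absent from the map are treated as idle.
        return "idle"
    if isinstance(entry, dict):
        status = entry.get("type")  # assumed field name
        return status if isinstance(status, str) else None
    return None
```

For example, under these assumptions `resolve_session_status({"s1": {"type": "busy"}}, "s2")` yields `"idle"` because `s2` has no entry in the map.
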

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • pruiz/CodeCome#43: Both PRs modify the resume/repair control flow in tools/codecome/harness.py and tools/codecome/phase_1.py—main PR adds failure_details to resume prompts after incomplete runs, while retrieved PR adds early-exit logic for resume_not_ready to prevent those prompts.
  • pruiz/CodeCome#40: Both PRs modify tools/phases/completion.py and graceful-completion gating for Phase 1 subphases, with tightly coupled tests in tests/test_phase_graceful_completion_subphases.py, so changes overlap at artifact-freshness/check logic.

🐰 From gating's gates to prompts that state,
We've traced the artifacts before too late.
With failures named, and timestamps clear,
The harness knows just what's sincere. ✨


github-actions Bot commented Jun 5, 2026

Coverage Report

| Metric | Value |
| --- | --- |
| Line Coverage | 76.5% |
| Lines Covered | 0 / 0 |

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-06T10:44:06.797Z

@pruiz pruiz left a comment

Review comments added inline. Main theme: align the new “phase prompt is the source of truth for required artifacts” rule with what the completion gates and resume checklists actually enforce.

Comment thread tools/phases/completion.py
Comment thread tools/phases/completion.py
Comment thread tools/phases/completion.py
Comment thread tools/phases/completion.py
Comment thread tools/phases/completion.py

Copilot AI left a comment

Pull request overview

This PR improves phase-mode harness resilience by (1) enriching completion-gate diagnostics to make auto-resume actionable, (2) relaxing the Phase 2 completion gate to allow “0 new findings” runs to pass as long as a run summary is produced, and (3) standardizing run-summary filenames across prompts to be timestamped to avoid overwrites.

Changes:

  • check_phase_graceful_completion now returns (ok, failure_details) and Phase 2 gating is relaxed to require only a fresh run summary.
  • Resume prompts can include an explicit “Missing required artifacts” block populated from gate diagnostics.
  • Phase prompts now instruct timestamped run-summary filenames; agent docs remove “when practical” hedges and delegate artifact requirements to phase prompts; tests updated for the tuple return.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tools/phases/completion.py | Returns (bool, list[str]) from completion gate; relaxes Phase 2 gate; adds failure-detail-driven resume prompt output; updates Phase 2 checklist. |
| tools/codecome/harness.py | Unpacks the new completion-gate tuple and forwards failure details into resume prompts. |
| tools/codecome/phase_1.py | Unpacks the new completion-gate tuple and forwards failure details into resume prompts for subphases. |
| tests/test_render_settings_propagation.py | Updates monkeypatches for the new tuple return shape. |
| tests/test_phases_completion.py | Updates assertions to validate (ok, failures) and adds coverage for Phase 2 relaxed gate + diagnostics + resume prompt rendering. |
| tests/test_phase_graceful_completion_subphases.py | Updates subphase tests for tuple return and validates failure detail presence. |
| tests/test_phase_1_mid_turn_forgiveness.py | Updates mocks to return (ok, failures) tuples. |
| tests/test_phase_1_codeql_plan_repair.py | Updates mocks to return (ok, failures) tuples. |
| prompts/phase-1a-profile.md | Switches run-summary path guidance to timestamped filename. |
| prompts/phase-1b-recon.md | Switches run-summary path guidance to timestamped filename; removes hardcoded summary path mentions. |
| prompts/phase-1c-sandbox.md | Switches run-summary path guidance to timestamped filename; removes hardcoded summary path mentions in prose. |
| prompts/phase-2-audit.md | Switches run-summary path guidance to timestamped filename. |
| prompts/phase-4-validate.md | Switches run-summary path guidance to timestamped filename and updates example. |
| prompts/phase-5-exploit.md | Switches run-summary path guidance to timestamped filename and updates example. |
| prompts/phase-6-report.md | Switches run-summary path guidance to timestamped filename. |
| .opencode/agents/auditor.md | Removes run-summary “when practical” guidance; adds delegation note pointing to phase prompt requirements. |
| .opencode/agents/reviewer.md | Removes run-summary references; adds delegation note pointing to phase prompt requirements. |
| .opencode/agents/recon.md | Removes run-summary references; adds delegation note pointing to phase prompt requirements. |
| .opencode/agents/exploiter.md | Removes run-summary references; adds delegation note pointing to phase prompt requirements. |
| .opencode/agents/validator.md | Removes run-summary references; adds delegation note pointing to phase prompt requirements. |
| .project/harness-resilience-summary-gating-plan.md | Adds the design/implementation plan document for the change set. |

Comment thread tools/codecome/phase_1.py
Comment thread tools/phases/completion.py
Comment thread tools/phases/completion.py

greptile-apps Bot commented Jun 5, 2026

Greptile Summary

This PR makes the phase harness more resilient for weak models by enriching resume diagnostics with specific missing-artifact details, relaxing the Phase 2 completion gate to require only a fresh run summary (zero new findings is a valid outcome), and switching run-summary filenames to a timestamped format to prevent overwrites on re-run.

  • completion.py: check_phase_graceful_completion now returns (bool, list[str]) carrying human-readable missing-artifact messages; Phase 2 gate drops the pending_fresh requirement; build_phase_resume_prompt accepts optional failure_details and renders a targeted "Missing required artifacts:" block.
  • harness.py / phase_1.py: callers unpack the new tuple, phase_failures/phase_ok are defensively initialised before the retry loop (resolving the previous unbound-variable concern), and failure details are forwarded to the resume prompt builder.
  • session.py: get_session_status migrated from a per-session endpoint to the global /session/status map; absence of a session_id in the response is now treated as idle.

Confidence Score: 5/5

Safe to merge; the changes are well-scoped harness retry/diagnostic improvements with 676 passing tests.

All structural changes — tuple return from the completion gate, failure-details forwarding, phase_failures/phase_ok initialisation — are correctly implemented and covered by targeted new tests. The session endpoint migration is the most externally-facing change but degrades gracefully to None on failure.

tools/phases/completion.py — the Phase 3 glob inconsistency and the dead code in _find_finding_file are minor but worth a second look before the final merge.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tools/phases/completion.py | Core gating logic refactored: check_phase_graceful_completion now returns (bool, list[str]); Phase 2 gate relaxed to summary-only; build_phase_resume_prompt gains optional failure_details. Minor: dead code in _find_finding_file and Phase 3 glob is stricter than Phase 2's backward-compatible form. |
| tools/codecome/harness.py | Properly unpacks new tuple from check_phase_graceful_completion; phase_failures and phase_ok are defensively initialised at lines 137-138 before the retry loop; failure_details forwarded to resume prompt builder. |
| tools/codecome/phase_1.py | _run_subphase now initialises phase_failures and phase_ok at lines 553-554 before the while-True loop (resolving the prior unbound-variable concern); tuple unpacking and failure_details forwarding are correct. |
| tools/codecome/session.py | get_session_status migrated from per-session GET /session/{id} to global GET /session/status; absence of session_id in the returned map is correctly treated as idle. Tests updated to match new API shape. |
| tests/test_phase_failure_state_reset.py | New test confirming that phase_failures from a previous attempt are not reused in subsequent retry loops, for both harness.run_phase_mode and phase_1._run_subphase. |
| tests/test_phases_completion.py | Comprehensive new tests for the tuple return shape, Phase 2 no-findings pass, failure-detail diagnostic strings, resume prompt rendering, and per-subphase run-summary requirements. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_run_single_attempt] --> B{returncode?}
    B -- "!= 0" --> INFRA[Infrastructure error\nfatal retry or break]
    B -- "== 0" --> C{finish_warning?}
    C -- "None" --> D{FINISH_TERMINAL_OK?}
    C -- "set" --> E{mid-turn AND\nno permission error?}
    E -- "yes" --> F[check_phase_graceful_completion\nreturns bool, failures]
    F -- "ok=True" --> G[graceful_forgiveness\nfinish_warning=None]
    F -- "ok=False" --> H[returncode=2]
    E -- "no" --> H
    D -- "yes" --> I[check_phase_graceful_completion]
    I -- "ok=True" --> J[continue to validation]
    I -- "ok=False" --> K[returncode=2\nfinish_warning set]
    D -- "no" --> J
    J --> L[frontmatter / artifact\nvalidation]
    L -- "pass" --> M[Phase complete]
    L -- "fail" --> N[resume with repair prompt]
    H --> O{retry budget?}
    K --> O
    O -- "yes" --> P[build_phase_resume_prompt\nfailure_details=phase_failures]
    O -- "no" --> Q[exit 2]
    P --> A
    G --> D
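
The decision flow in the chart above can be approximated by a single pure function. This is a simplified reading of the diagram, not the harness code: the `Attempt` fields, the outcome strings, and the use of one `gate_ok` flag for both gate calls are assumptions made to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Attempt:
    returncode: int
    finish_warning: str | None
    mid_turn: bool = False
    permission_error: bool = False
    terminal_ok: bool = False      # FINISH_TERMINAL_OK in the chart
    gate_ok: bool = False          # result of the completion gate
    validation_ok: bool = True     # frontmatter / artifact validation


def next_step(a: Attempt, retries_left: int) -> str:
    if a.returncode != 0:
        return "infrastructure_error"
    incomplete = False
    if a.finish_warning is not None:
        # Forgiven only for a mid-turn cutoff without a permission
        # error, and only when the gate passes.
        incomplete = not (a.mid_turn and not a.permission_error and a.gate_ok)
    if not incomplete and a.terminal_ok and not a.gate_ok:
        incomplete = True
    if incomplete:
        return "resume_with_failure_details" if retries_left > 0 else "exit_2"
    return "phase_complete" if a.validation_ok else "resume_with_repair_prompt"
```

For instance, `next_step(Attempt(0, None, terminal_ok=True), retries_left=0)` follows the path through the gate failure and the exhausted retry budget to `"exit_2"`.
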

Reviews (2): Last reviewed commit: "fix(completion): keep path diagnostics a..." | Re-trigger Greptile

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/codecome/phase_1.py (1)

626-643: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Initialize phase_failures before the mid-turn resume path.

If a subphase stops mid-turn with a permission error, Lines 627-645 skip the graceful-completion call, so phase_failures is never assigned. The auto-resume block still reads it at Line 770, which will raise UnboundLocalError instead of sending the resume prompt.

Suggested fix
     step_finish_count = 0
     transcript_path: Path = Path()
     finish_warning: str | None = None
     subphase_start_time = time.time()
 
@@
     while True:
         attempt_number += 1
+        phase_failures: list[str] = []
         _reset_subagent_state()
         finish_warning = None

Also applies to: 768-770

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codecome/phase_1.py` around lines 626 - 643, The variable
phase_failures can be unbound when the mid-turn resume path skips calling
check_phase_graceful_completion; to fix, initialize phase_failures to a safe
default (e.g., an empty list or None) before the block that checks
finish_warning so it is always defined for the later auto-resume logic that
reads phase_failures; update the code around the finish_warning handling
(reference symbols: phase_failures, finish_warning, last_permission_error,
check_phase_graceful_completion) to set phase_failures = [] (or None) before the
conditional or explicitly assign it in the permission-error branch so the later
auto-resume code can safely use it.
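
A minimal reproduction of the failure mode, and of the per-attempt initialization that avoids it, might look like this. The function names and the `gate_results` encoding (a failure list per attempt, or `None` when the gate was skipped, for example after a permission error) are hypothetical; the real loop in `_run_subphase` carries far more state.

```python
from __future__ import annotations

from typing import Optional


def details_buggy(gate_skipped: bool) -> list[str]:
    if not gate_skipped:
        phase_failures = ["runs/phase-1a-summary*.md is missing"]
    # Raises UnboundLocalError when the branch above did not run.
    return phase_failures


def details_per_attempt(
    gate_results: list[Optional[list[str]]],
) -> list[Optional[list[str]]]:
    """Collect what each attempt would pass to the resume prompt builder."""
    sent: list[Optional[list[str]]] = []
    for result in gate_results:
        # Reset at the start of every attempt: always bound, never stale.
        phase_failures: list[str] = []
        if result is not None:
            phase_failures = result
        sent.append(phase_failures or None)
    return sent
```

Resetting inside the loop addresses two concerns at once: the name is always bound, and diagnostics from an earlier attempt cannot leak into a later resume prompt.
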
🧹 Nitpick comments (1)
.project/harness-resilience-summary-gating-plan.md (1)

18-18: ⚡ Quick win

Add language identifiers to fenced code blocks to satisfy markdownlint.

These unlabeled fences trigger MD040. Add explicit languages (text, python, bash) so docs lint stays clean.

Also applies to: 112-112, 119-119, 146-146, 150-150, 155-155, 164-164, 178-178, 184-184, 199-199, 205-205, 220-220, 226-226, 241-241, 256-256, 276-276, 296-296

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.project/harness-resilience-summary-gating-plan.md at line 18, Update all
unlabeled fenced code blocks (``` ) in the document by adding explicit language
identifiers (e.g., ```text, ```bash, ```python) so they satisfy markdownlint
MD040; locate every triple-backtick fence in the file (including the instances
noted around lines 112, 119, 146, 150, 155, 164, 178, 184, 199, 205, 220, 226,
241, 256, 276, 296) and replace the opening fence with the appropriate language
token based on the block content.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/codecome/harness.py`:
- Around line 137-138: phase_failures is being initialized once and then reused
across retry attempts, leaking previous-attempt diagnostics into subsequent
resumes; to fix, reinitialize phase_failures = [] at the start of each retry
attempt (i.e., inside the retry loop, before any calls to
check_phase_graceful_completion() or before logic that sets/reads phase_ok) so
each attempt starts with a fresh failures list; apply the same change to the
other places where phase_failures is declared/used to ensure no stale failures
are carried between attempts.

In `@tools/phases/completion.py`:
- Around line 204-226: The code currently returns immediately after appending
the "required phase-1 notes are not all present" failure, which prevents later
checks (sandbox_state_recorded and summary_fresh) from running and leaves
failures incomplete; update the control flow in the function that builds the
failures list (the block that computes fresh_required via _path_is_fresh over
required_artifacts and appends the NOTES_ROOT/... message using
_PHASE1_REQUIRED_ARTIFACT_NAMES) to remove the early return and let execution
continue to evaluate sandbox_state_recorded and summary_fresh so all missing
items are appended to failures, then return once at the end (so
build_phase_resume_prompt receives a complete failure_details list).
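
The accumulate-then-return shape requested for `completion.py` can be sketched as follows. Only the first message is taken from the comment above; the other two strings and the function name `check_phase1` are hypothetical, and the boolean parameters stand in for the real freshness checks.

```python
def check_phase1(notes_present: bool, sandbox_state_recorded: bool,
                 summary_fresh: bool) -> tuple[bool, list[str]]:
    """Evaluate every check before returning, so no gap is hidden."""
    failures: list[str] = []
    if not notes_present:
        failures.append("required phase-1 notes are not all present")
    # No early return here: the remaining checks still run.
    if not sandbox_state_recorded:
        failures.append("sandbox state is not recorded")  # hypothetical text
    if not summary_fresh:
        failures.append("no fresh phase-1 run summary")  # hypothetical text
    return (not failures, failures)
```

With a single return at the end, a run that is missing all three items produces three entries in `failure_details` rather than only the first.
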

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ec9cf21c-483b-4578-9a23-6af534c4bfeb

📥 Commits

Reviewing files that changed from the base of the PR and between f140f4b and 77584e5.

📒 Files selected for processing (21)
  • .opencode/agents/auditor.md
  • .opencode/agents/exploiter.md
  • .opencode/agents/recon.md
  • .opencode/agents/reviewer.md
  • .opencode/agents/validator.md
  • .project/harness-resilience-summary-gating-plan.md
  • prompts/phase-1a-profile.md
  • prompts/phase-1b-recon.md
  • prompts/phase-1c-sandbox.md
  • prompts/phase-2-audit.md
  • prompts/phase-4-validate.md
  • prompts/phase-5-exploit.md
  • prompts/phase-6-report.md
  • tests/test_phase_1_codeql_plan_repair.py
  • tests/test_phase_1_mid_turn_forgiveness.py
  • tests/test_phase_graceful_completion_subphases.py
  • tests/test_phases_completion.py
  • tests/test_render_settings_propagation.py
  • tools/codecome/harness.py
  • tools/codecome/phase_1.py
  • tools/phases/completion.py

Comment thread tools/codecome/harness.py
Comment thread tools/phases/completion.py
pruiz added 3 commits June 6, 2026 00:28
…, 5, 6; align phase 2 glob; honor resume reason

Address PR #46 review feedback (owner + Copilot):

- Subphase gates (1a, 1b, 1c) now also check for a fresh
  runs/phase-X-summary*.md via the new _append_run_summary_check helper
  in tools/phases/completion.py.
- Phase 4 and 5 gates now also check for a fresh
  runs/phase-{4|5}-<finding>-summary*.md.
- Phase 6 gate now also checks for a fresh runs/phase-6-summary*.md.
- Phase 3 resume checklist now includes a run-summary line in
  phase_checklist_lines, matching the already-enforced gate check.
- Phase 2 diagnostic message and glob now use the same pattern
  (no mandatory hyphen after 'summary'): both are
  runs/phase-2-summary*.md.
- build_phase_resume_prompt opening line is no longer a universal
  'Your previous run completed'. A new _resume_opener_for_reason helper
  classifies the recorded reason (infrastructure_error, mid-turn
  cutoffs, finish failures, graceful_forgiveness, terminal-OK with
  missing artifacts, or unknown) and renders a context-specific
  opener.
- tools/codecome/phase_1.py defensively initializes phase_failures
  and phase_ok at the top of _run_subphase, mirroring the pattern
  already established in tools/codecome/harness.py. This eliminates
  the UnboundLocalError risk on the path that builds the resume
  prompt when the run was set to returncode=2 without ever entering
  the graceful-completion branch.
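
A reason-to-opener mapping of the kind described above could be sketched like this. Only `infrastructure_error` and `graceful_forgiveness` are reason names taken from this message; the other keys and every opener sentence are hypothetical, and the real `_resume_opener_for_reason` may classify reasons differently.

```python
from __future__ import annotations


def resume_opener(reason: str | None) -> str:
    """Pick a context-specific first line for the resume prompt."""
    openers = {
        "infrastructure_error":
            "Your previous run was interrupted by an infrastructure error.",
        "mid_turn_cutoff":  # hypothetical reason name
            "Your previous run was cut off mid-turn.",
        "finish_failure":  # hypothetical reason name
            "Your previous run did not finish cleanly.",
        "graceful_forgiveness":
            "Your previous run was cut off, but its artifacts looked complete.",
        "terminal_ok_missing_artifacts":  # hypothetical reason name
            "Your previous run finished, but required artifacts are missing.",
    }
    # Unknown or missing reasons fall back to a neutral opener.
    return openers.get(reason or "", "Your previous run ended for an unrecorded reason.")
```

The design point is that a model whose run crashed is no longer told that the run "completed", which would contradict what it can see in its own transcript.
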

Tests:
- 15 new tests across 5 new classes in
  tests/test_phases_completion.py:
  * TestPhase2GlobStringMatchesDiagnostic
  * TestResumePromptOpenerDistinguishesReasons
  * TestSubphaseGatesRequireRunSummary
  * TestPhase45And6GatesRequireRunSummary
  * TestPhase3ChecklistMentionsRunSummary
- Existing test_resume_prompt_with_failure_details_lists_missing_artifacts
  updated to use the unhyphenated glob fragment.
- tests/test_phase_graceful_completion_subphases.py: positive tests
  now create a fresh runs/phase-1{a,b,c}-summary.md; negative tests
  assert the new summary failure fragment.

make tests: 676 -> 691 pass; pre-existing threat-model.md
check-phase-artifacts warning is unrelated (verified with stash).

Append section 8 to .project/harness-resilience-summary-gating-plan.md
documenting the owner-review-driven expansion:

- Subphase 1a/1b/1c, phase 4, phase 5, and phase 6 gates now also
  check for a fresh run summary.
- Phase 3 resume checklist now includes a run-summary line.
- Phase 2 diagnostic and glob are now consistent (no mandatory
  hyphen).
- build_phase_resume_prompt opener is now context-specific via
  _resume_opener_for_reason.
- tools/codecome/phase_1.py has defensive init of phase_failures
  and phase_ok.

The original plan explicitly deferred subphase summary gating as a
non-goal and did not gate phases 4/5/6 on summary either; the
follow-up section records that the review pushed back and that
this commit resolves the gap.

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comment thread tools/phases/completion.py Outdated
Comment thread tools/phases/completion.py Outdated
Comment thread tools/phases/completion.py Outdated
Comment thread tools/phases/completion.py Outdated
@pruiz pruiz marked this pull request as ready for review June 6, 2026 14:26
@pruiz pruiz merged commit 3a10252 into master Jun 6, 2026
7 of 8 checks passed
pruiz added a commit that referenced this pull request Jun 6, 2026

2 participants