WIP: Harness resilience — relaxed phase-2 gate, enriched resume diagnostics, timestamped summaries by pruiz · Pull Request #46 · pruiz/CodeCome

pruiz · 2026-06-05T21:41:27Z

What

Implements the plan in .project/harness-resilience-summary-gating-plan.md to make the run-summary gate actionable for weak models.

Why

Running make phase-2 with a weak model failed (exit 2) even though the run produced PENDING findings. The harness knew only that the run was incomplete; it could not tell the model which artifacts were missing. The auto-resume prompt was generic, so a model that believed it was done had no way to self-diagnose.

Changes

Completion gate (tools/phases/completion.py)

check_phase_graceful_completion now returns (bool, list[str]); the list carries human-readable missing-artifact messages consumed by the resume prompt builder.
Phase 2 gate relaxes from pending_fresh AND summary_fresh to summary_fresh alone. A legitimate "0 new findings" outcome no longer fails. Phase 3 (counter-analysis) provides the second-line review.
build_phase_resume_prompt takes optional failure_details and renders a "Missing required artifacts:" block when given, otherwise keeps the existing generic reassess wording.
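A minimal sketch of the new gate contract follows. It assumes that "fresh" means the file's mtime is at or after the run's start time, and that run summaries live under runs/. The names is_fresh, append_run_summary_check, check_phase2_completion and demo are hypothetical illustrations, not the code in tools/phases/completion.py. The failure-message wording is also invented.

```python
import tempfile
import time
from pathlib import Path


def is_fresh(path: Path, since: float) -> bool:
    """True if `path` exists and was modified at or after `since` (epoch seconds)."""
    return path.exists() and path.stat().st_mtime >= since


def append_run_summary_check(
    root: Path, pattern: str, since: float, failures: list[str]
) -> None:
    """Append a diagnostic unless some fresh file matches runs/<pattern>."""
    if not any(is_fresh(p, since) for p in (root / "runs").glob(pattern)):
        failures.append(f"runs/{pattern}: no fresh run summary found for this run")


def check_phase2_completion(root: Path, since: float) -> tuple[bool, list[str]]:
    """Relaxed Phase 2 gate: only a fresh run summary is required.

    Fresh pending findings are no longer mandatory, so a "0 new findings"
    run passes as long as it leaves a run summary behind.
    """
    failures: list[str] = []
    append_run_summary_check(root, "phase-2-summary*.md", since, failures)
    return (not failures, failures)


def demo() -> list[tuple[bool, list[str]]]:
    """Run the gate before and after writing a timestamped summary."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "runs").mkdir()
        # Small margin: filesystem mtimes can lag slightly behind time.time().
        since = time.time() - 5
        results.append(check_phase2_completion(root, since))
        (root / "runs" / "phase-2-summary-2026-06-05-214127.md").write_text("ok\n")
        results.append(check_phase2_completion(root, since))
    return results
```

Returning the failure list alongside the boolean lets a single call drive both the pass/fail decision and the resume prompt. The caller does not need to re-derive what was missing.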

Phase prompts (7 files under prompts/)

All run-summary paths now use runs/phase-X[-FINDING]-summary-YYYY-MM-DD-HHMMSS.md to prevent overwrites on rerun. Globs in completion.py already use wildcards, so the gate matches the new format.
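The timestamped naming scheme can be sketched as follows. The sketch assumes local wall-clock time and a caller-supplied finding ID; the ID format F-001 and the helper names run_summary_name and matches_gate are hypothetical. It also shows why the existing wildcard globs keep working: phase-2-summary*.md matches both the new timestamped names and the old static name.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def _stem(phase: str, finding: str = "") -> str:
    """phase-X, or phase-X-FINDING for per-finding phases (4 and 5)."""
    return f"phase-{phase}-{finding}" if finding else f"phase-{phase}"


def run_summary_name(phase: str, when: datetime, finding: str = "") -> str:
    """Build runs/phase-X[-FINDING]-summary-YYYY-MM-DD-HHMMSS.md for one run."""
    # datetime accepts strftime codes directly in an f-string format spec.
    return f"runs/{_stem(phase, finding)}-summary-{when:%Y-%m-%d-%H%M%S}.md"


def matches_gate(path: str, phase: str, finding: str = "") -> bool:
    """Check a path against a wildcard glob like runs/phase-X-summary*.md."""
    return fnmatchcase(path, f"runs/{_stem(phase, finding)}-summary*.md")


first = run_summary_name("2", datetime(2026, 6, 5, 21, 41, 27))
rerun = run_summary_name("2", datetime(2026, 6, 5, 22, 3, 9))
print(first)  # runs/phase-2-summary-2026-06-05-214127.md
print(rerun)  # runs/phase-2-summary-2026-06-05-220309.md
print(matches_gate(first, "2"), matches_gate("runs/phase-2-summary.md", "2"))  # True True
```

Because each rerun gets a distinct second-resolution timestamp, earlier summaries stay on disk. The freshness check then only has to ask whether any matching file is newer than the run start.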

Agent files (5 files under .opencode/agents/)

Removed the "run summary ... when practical" hedge. Phase prompts are now the single source of truth for required durable artifacts. A delegating note replaces each removed line.

Runtime callers (tools/codecome/harness.py, tools/codecome/phase_1.py)

Unpack the new tuple, preserve the short-circuit on check_phase_graceful_completion, and pass phase_failures (or None) to the resume prompt builder.
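The following sketches the caller-side pattern, assuming the gate is passed in as a zero-argument callable. handle_mid_turn_stop and its return shape are hypothetical, not the signatures used in harness.py or phase_1.py. The sketch shows three properties of the change:

- The tuple is unpacked.
- The gate still runs only when the short-circuit condition holds.
- An empty failure list becomes None, so the resume prompt falls back to its generic wording.

```python
from typing import Callable, Optional

# Hypothetical gate signature: returns (ok, failure_details).
Gate = Callable[[], tuple[bool, list[str]]]


def handle_mid_turn_stop(
    permission_error: bool, gate: Gate
) -> tuple[int, Optional[list[str]]]:
    """Return (returncode, failure_details to pass to the resume-prompt builder)."""
    # Defined before branching, so no path can hit an UnboundLocalError.
    phase_ok = False
    phase_failures: list[str] = []
    # Short-circuit preserved: the gate is only consulted without a permission error.
    if not permission_error:
        phase_ok, phase_failures = gate()
    if phase_ok:
        return 0, None  # graceful forgiveness: the artifacts are already in place
    # An empty list becomes None so the resume prompt keeps its generic wording.
    return 2, (phase_failures or None)
```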

Tests (5 test files)

Updated graceful-completion assertions to assert on the tuple and on the diagnostic fragments (runs/phase-2-summary, itemdb/evidence/<id>/, itemdb/reports, itemdb/notes/sandbox-plan.md, etc.) so regressions in the human-readable messages are caught.
Updated callers that monkeypatch check_phase_graceful_completion to return a tuple.

Verification

make tests: 676 passed.
Frontmatter check: 16/16 finding files validated.
Phase artifact check: all implemented phases pass.

Non-goals

Chat mode gating — chat mode has no harness gating.
Phase 1 multi-subphase gating — subphase-specific artifact sets preserved; only failure reporting is enriched.
Models that hallucinate findings — a model-quality issue, not a harness issue.

WIP checklist for review

Confirm the phase-2 gate relaxation matches intent (no new findings = pass).
Confirm the failure-detail strings are informative enough for a weak model to act on.
Confirm timestamped summary paths are acceptable across all phases.
Confirm agent files no longer need a hedge for chat mode.

Summary by CodeRabbit

Documentation
- Agent workflow guides updated with stricter artifact and validation requirements
- Run summary output filenames now include timestamps for better tracking
Improvements
- Phase completion now provides specific diagnostics identifying missing required artifacts
- Resume operations now include targeted guidance on which artifacts must be fixed
- Session status queries improved for better reliability

…, timestamped summaries

Make the run-summary gate actionable for weak models.

Completion gate
- check_phase_graceful_completion now returns (bool, list[str]); the list carries human-readable missing-artifact messages consumed by the resume prompt builder.
- Phase 2 gate relaxes from (pending_fresh AND summary_fresh) to summary_fresh alone. A legitimate '0 new findings' outcome no longer fails. Phase 3 (counter-analysis) provides the second-line review.
- build_phase_resume_prompt takes optional failure_details and renders a 'Missing required artifacts:' block when given, otherwise keeps the existing generic reassess wording.

Phase prompts
- All run-summary paths now use runs/phase-X[-FINDING]-summary-YYYY-MM-DD-HHMMSS.md to prevent overwrites on rerun. Globs in completion.py already use wildcards, so the new format is matched by the gate.

Agent files
- Removed the 'run summary ... when practical' hedge from all five agent files (auditor, reviewer, validator, exploiter, recon). Phase prompts are now the single source of truth for required durable artifacts. A delegating note replaces each removed line.

Runtime callers
- harness.py and phase_1.py unpack the new tuple, preserve the short-circuit on check_phase_graceful_completion, and pass phase_failures (or None) to the resume prompt builder.

Tests
- Updated graceful-completion assertions to assert on the tuple and to assert on the diagnostic fragments (runs/phase-2-summary, itemdb/evidence/<id>/, itemdb/reports, itemdb/notes/sandbox-plan.md, etc.) so regressions in the human-readable messages are caught.
- Updated callers that monkeypatch check_phase_graceful_completion to return a tuple.

All 676 tests pass; frontmatter and phase-artifact checks pass.

Records the design rationale and verification checklist for the harness resilience changes implemented in 7cd8843. Kept on the feature branch for PR review; the master commit itself ships only the code changes.

coderabbitai · 2026-06-05T21:41:33Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f40018e2-1629-4e62-b2d2-7ea918dfd3d7

📥 Commits

Reviewing files that changed from the base of the PR and between 77584e5 and 5f70fde.

📒 Files selected for processing (9)

.project/harness-resilience-summary-gating-plan.md
tests/test_phase_failure_state_reset.py
tests/test_phase_graceful_completion_subphases.py
tests/test_phases_completion.py
tests/test_session.py
tools/codecome/harness.py
tools/codecome/phase_1.py
tools/codecome/session.py
tools/phases/completion.py

📝 Walkthrough

Walkthrough

This PR implements run-summary gating resilience by refactoring phase-completion gating from boolean returns to structured tuples containing success status and human-readable failure details. All phase prompts are standardized to use timestamped run-summary filenames; agent documentation is cleaned of conflicting run-summary obligations and made to defer to phase-prompt specifications; completion logic is enhanced to validate phase-specific artifacts and return exact failure reasons; resume prompts are enriched with targeted guidance for missing artifacts; and comprehensive test coverage validates the new tuple contract and failure-state reset across retry attempts. A separate session API update switches session-status polling to the `/session/status` endpoint.

Changes

Harness Resilience and Run-Summary Gating

Layer / File(s)	Summary
Harness resilience plan and design decisions `.project/harness-resilience-summary-gating-plan.md`	Design document specifying the observed phase-2 gating failure, root causes (overly strict phase-2 gate, generic resume prompt, conflicting "when practical" language), and implementation plan including artifact-freshness checks, timestamped run-summary naming, relaxed gating, structured failure messaging, and verification/rollback strategies.
Agent documentation cleanup `.opencode/agents/auditor.md`, `.opencode/agents/exploiter.md`, `.opencode/agents/recon.md`, `.opencode/agents/reviewer.md`, `.opencode/agents/validator.md`	Removes run-summary template references from agent required reading and completion checklists; adds explicit requirement that agents follow the phase prompt's durable-artifacts specification; harmonizes completion-checklist language across auditor, exploiter, recon, reviewer, and validator agents.
Phase prompts with standardized timestamped run-summaries `prompts/phase-1a-profile.md`, `prompts/phase-1b-recon.md`, `prompts/phase-1c-sandbox.md`, `prompts/phase-2-audit.md`, `prompts/phase-4-validate.md`, `prompts/phase-5-exploit.md`, `prompts/phase-6-report.md`	Standardizes all phase prompt run-summary output paths to use timestamped filenames (YYYY-MM-DD-HHMMSS format) instead of static paths, making run-summary freshness checkable and preventing reruns from overwriting earlier summaries.
Completion API refactor: tuple return with failure details `tools/phases/completion.py`	Refactors `check_phase_graceful_completion` to return tuple (bool, list[str]) with human-readable failure messages; adds internal helpers for path display and timestamped run-summary freshness checking; implements phase-specific gating where Phase 1a/1b/1c validate subphase-required artifacts, monolith Phase 1 requires full notes plus run-summary, Phases 2/3 enforce run-summary freshness, and Phases 4/5/6 validate evidence/exploit/report freshness plus finding consistency; all paths accumulate failure details instead of silent boolean negation.
Resume prompt enhancement `tools/phases/completion.py`	`build_phase_resume_prompt` now accepts optional `failure_details` list; when provided, generates targeted "Missing required artifacts" section listing specific gaps and instructs fixing only those items; when absent, uses previous generic reassessment guidance; improves resume/repair UX by narrowing focus to actual gaps.
Harness orchestration `tools/codecome/harness.py`	Harness declares `phase_ok` and `phase_failures` local state variables; unpacks `check_phase_graceful_completion` result into these instead of discarding failures; uses `phase_ok` to decide graceful-forgiveness vs. incomplete-run state on mid-turn cutoff; passes `phase_failures` into `build_phase_resume_prompt` to enrich resume context with exact missing artifacts.
Phase 1 orchestration `tools/codecome/phase_1.py`	`_run_subphase` destructures `check_phase_graceful_completion` tuple into `phase_ok` and `phase_failures`; preserves `phase_failures` alongside existing success side effects; passes accumulated `phase_failures` to `build_phase_resume_prompt` on mid-turn resume so diagnostic details inform the next iteration.
Core test updates `tests/test_phase_1_codeql_plan_repair.py`, `tests/test_phase_1_mid_turn_forgiveness.py`, `tests/test_phase_graceful_completion_subphases.py`, `tests/test_phases_completion.py`, `tests/test_render_settings_propagation.py`	All existing test mocks and assertions are updated to handle the (ok, failures) tuple; success cases assert `ok is True` and `failures == []`; failure cases assert `ok is False` and check that the failure details include the expected artifact paths (e.g., "itemdb/notes", "sandbox-plan.md", "runs/phase-X-summary*.md"); new tests cover Phase 2 glob-string correctness, resume-prompt opener wording that varies by finish reason, and failure-detail reset across retries.
New tests: failure-detail reset `tests/test_phase_failure_state_reset.py`	New test module with two tests verifying that failure details from one attempt are not reused in the subsequent resume attempt. The first test covers the harness path by patching harness/runner helpers; the second covers the phase-1 subphase path. Both assert that the `failure_details` argument to `build_phase_resume_prompt` goes from a populated list to `None` on resume, preventing stale diagnostics.
Session API update `tools/codecome/session.py`, `tests/test_session.py`	Updates `get_session_status()` to query `/session/status` instead of `/session/{session_id}`, decodes session-id-keyed status map, returns `"idle"` for missing entries, validates response is a dictionary; tests cover busy/retry/idle resolution, missing-entry defaults, and request-failure handling.
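The status-map lookup can be sketched as a pure function, kept separate from the HTTP call. The payload shape used here is an assumption for illustration, not a documented opencode contract: a map from session ID to an object with a "type" field such as "busy" or "retry". resolve_session_status is a hypothetical name.

```python
from typing import Any, Optional


def resolve_session_status(payload: Any, session_id: str) -> Optional[str]:
    """Resolve one session's status from a session-ID-keyed status map.

    Returns None if the payload is not a dict (treated as a failed query),
    "idle" if the session has no entry, and otherwise the entry's "type".
    """
    if not isinstance(payload, dict):
        return None
    entry = payload.get(session_id)
    if entry is None:
        return "idle"  # absent from the map: nothing is running for this session
    if not isinstance(entry, dict):
        return None  # unexpected entry shape: treat like a malformed response
    return str(entry.get("type", "idle"))
```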

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

pruiz/CodeCome#43: Both PRs modify the resume/repair control flow in tools/codecome/harness.py and tools/codecome/phase_1.py—main PR adds failure_details to resume prompts after incomplete runs, while retrieved PR adds early-exit logic for resume_not_ready to prevent those prompts.
pruiz/CodeCome#40: Both PRs modify tools/phases/completion.py and graceful-completion gating for Phase 1 subphases, with tightly coupled tests in tests/test_phase_graceful_completion_subphases.py, so changes overlap at artifact-freshness/check logic.

🐰 From gating's gates to prompts that state,
We've traced the artifacts before too late.
With failures named, and timestamps clear,
The harness knows just what's sincere. ✨


github-actions · 2026-06-05T21:43:17Z

Coverage Report

Metric	Value
Line Coverage	76.5%
Lines Covered	0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-06T10:44:06.797Z

pruiz

Review comments added inline. Main theme: align the new “phase prompt is the source of truth for required artifacts” rule with what the completion gates and resume checklists actually enforce.

Copilot

Pull request overview

This PR improves phase-mode harness resilience by (1) enriching completion-gate diagnostics to make auto-resume actionable, (2) relaxing the Phase 2 completion gate to allow “0 new findings” runs to pass as long as a run summary is produced, and (3) standardizing run-summary filenames across prompts to be timestamped to avoid overwrites.

Changes:

check_phase_graceful_completion now returns (ok, failure_details) and Phase 2 gating is relaxed to require only a fresh run summary.
Resume prompts can include an explicit “Missing required artifacts” block populated from gate diagnostics.
Phase prompts now instruct timestamped run-summary filenames; agent docs remove “when practical” hedges and delegate artifact requirements to phase prompts; tests updated for the tuple return.
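The following is a minimal sketch of how an optional diagnostics block can change the resume prompt. build_resume_prompt and all of its wording are hypothetical, except the "Missing required artifacts" heading quoted in the list above.

```python
from typing import Optional, Sequence


def build_resume_prompt(
    phase: str, failure_details: Optional[Sequence[str]] = None
) -> str:
    """Build a resume prompt that is targeted when gate diagnostics are available."""
    lines = [f"Your previous {phase} run did not pass the completion gate."]
    if failure_details:
        # Targeted mode: list exactly what the gate reported as missing.
        lines.append("Missing required artifacts:")
        lines.extend(f"- {detail}" for detail in failure_details)
        lines.append("Fix only the items listed above, then end the run.")
    else:
        # Generic mode: fall back to reassess-everything guidance.
        lines.append("Reassess the phase prompt and confirm every required artifact exists.")
    return "\n".join(lines)
```

Treating an empty list the same as None keeps the generic wording whenever the gate has nothing specific to report.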

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.


File	Description
tools/phases/completion.py	Returns `(bool, list[str])` from completion gate; relaxes Phase 2 gate; adds failure-detail-driven resume prompt output; updates Phase 2 checklist.
tools/codecome/harness.py	Unpacks the new completion-gate tuple and forwards failure details into resume prompts.
tools/codecome/phase_1.py	Unpacks the new completion-gate tuple and forwards failure details into resume prompts for subphases.
tests/test_render_settings_propagation.py	Updates monkeypatches for the new tuple return shape.
tests/test_phases_completion.py	Updates assertions to validate `(ok, failures)` and adds coverage for Phase 2 relaxed gate + diagnostics + resume prompt rendering.
tests/test_phase_graceful_completion_subphases.py	Updates subphase tests for tuple return and validates failure detail presence.
tests/test_phase_1_mid_turn_forgiveness.py	Updates mocks to return `(ok, failures)` tuples.
tests/test_phase_1_codeql_plan_repair.py	Updates mocks to return `(ok, failures)` tuples.
prompts/phase-1a-profile.md	Switches run-summary path guidance to timestamped filename.
prompts/phase-1b-recon.md	Switches run-summary path guidance to timestamped filename; removes hardcoded summary path mentions.
prompts/phase-1c-sandbox.md	Switches run-summary path guidance to timestamped filename; removes hardcoded summary path mentions in prose.
prompts/phase-2-audit.md	Switches run-summary path guidance to timestamped filename.
prompts/phase-4-validate.md	Switches run-summary path guidance to timestamped filename and updates example.
prompts/phase-5-exploit.md	Switches run-summary path guidance to timestamped filename and updates example.
prompts/phase-6-report.md	Switches run-summary path guidance to timestamped filename.
.opencode/agents/auditor.md	Removes run-summary “when practical” guidance; adds delegation note pointing to phase prompt requirements.
.opencode/agents/reviewer.md	Removes run-summary references; adds delegation note pointing to phase prompt requirements.
.opencode/agents/recon.md	Removes run-summary references; adds delegation note pointing to phase prompt requirements.
.opencode/agents/exploiter.md	Removes run-summary references; adds delegation note pointing to phase prompt requirements.
.opencode/agents/validator.md	Removes run-summary references; adds delegation note pointing to phase prompt requirements.
.project/harness-resilience-summary-gating-plan.md	Adds the design/implementation plan document for the change set.


greptile-apps · 2026-06-05T21:59:13Z

Greptile Summary

This PR makes the phase harness more resilient for weak models by enriching resume diagnostics with specific missing-artifact details, relaxing the Phase 2 completion gate to require only a fresh run summary (zero new findings is a valid outcome), and switching run-summary filenames to a timestamped format to prevent overwrites on re-run.

completion.py: check_phase_graceful_completion now returns (bool, list[str]) carrying human-readable missing-artifact messages; Phase 2 gate drops the pending_fresh requirement; build_phase_resume_prompt accepts optional failure_details and renders a targeted "Missing required artifacts:" block.
harness.py / phase_1.py: callers unpack the new tuple, phase_failures/phase_ok are defensively initialised before the retry loop (resolving the previous unbound-variable concern), and failure details are forwarded to the resume prompt builder.
session.py: get_session_status migrated from a per-session endpoint to the global /session/status map; absence of a session_id in the response is now treated as idle.

Confidence Score: 5/5

Safe to merge; the changes are well-scoped harness retry/diagnostic improvements with 676 passing tests.

All structural changes — tuple return from the completion gate, failure-details forwarding, phase_failures/phase_ok initialisation — are correctly implemented and covered by targeted new tests. The session endpoint migration is the most externally-facing change but degrades gracefully to None on failure.

tools/phases/completion.py — the Phase 3 glob inconsistency and the dead code in _find_finding_file are minor but worth a second look before the final merge.

Important Files Changed

Filename	Overview
tools/phases/completion.py	Core gating logic refactored: check_phase_graceful_completion now returns (bool, list[str]); Phase 2 gate relaxed to summary-only; build_phase_resume_prompt gains optional failure_details. Minor: dead code in _find_finding_file and Phase 3 glob is stricter than Phase 2's backward-compatible form.
tools/codecome/harness.py	Properly unpacks new tuple from check_phase_graceful_completion; phase_failures and phase_ok are defensively initialised at lines 137-138 before the retry loop; failure_details forwarded to resume prompt builder.
tools/codecome/phase_1.py	_run_subphase now initialises phase_failures and phase_ok at lines 553-554 before the while-True loop (resolving the prior unbound-variable concern); tuple unpacking and failure_details forwarding are correct.
tools/codecome/session.py	get_session_status migrated from per-session GET /session/{id} to global GET /session/status; absence of session_id in the returned map is correctly treated as idle. Tests updated to match new API shape.
tests/test_phase_failure_state_reset.py	New test confirming that phase_failures from a previous attempt are not reused in subsequent retry loops, for both harness.run_phase_mode and phase_1._run_subphase.
tests/test_phases_completion.py	Comprehensive new tests for the tuple return shape, Phase 2 no-findings pass, failure-detail diagnostic strings, resume prompt rendering, and per-subphase run-summary requirements.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_run_single_attempt] --> B{returncode?}
    B -- "!= 0" --> INFRA[Infrastructure error\nfatal retry or break]
    B -- "== 0" --> C{finish_warning?}
    C -- "None" --> D{FINISH_TERMINAL_OK?}
    C -- "set" --> E{mid-turn AND\nno permission error?}
    E -- "yes" --> F[check_phase_graceful_completion\nreturns bool, failures]
    F -- "ok=True" --> G[graceful_forgiveness\nfinish_warning=None]
    F -- "ok=False" --> H[returncode=2]
    E -- "no" --> H
    D -- "yes" --> I[check_phase_graceful_completion]
    I -- "ok=True" --> J[continue to validation]
    I -- "ok=False" --> K[returncode=2\nfinish_warning set]
    D -- "no" --> J
    J --> L[frontmatter / artifact\nvalidation]
    L -- "pass" --> M[Phase complete]
    L -- "fail" --> N[resume with repair prompt]
    H --> O{retry budget?}
    K --> O
    O -- "yes" --> P[build_phase_resume_prompt\nfailure_details=phase_failures]
    O -- "no" --> Q[exit 2]
    P --> A
    G --> D

Reviews (2): Last reviewed commit: "fix(completion): keep path diagnostics a..."

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tools/codecome/phase_1.py (1)

626-643: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Initialize phase_failures before the mid-turn resume path.

If a subphase stops mid-turn with a permission error, Lines 627-645 skip the graceful-completion call, so phase_failures is never assigned. The auto-resume block still reads it at Line 770, which will raise UnboundLocalError instead of sending the resume prompt.

Suggested fix

     step_finish_count = 0
     transcript_path: Path = Path()
     finish_warning: str | None = None
     subphase_start_time = time.time()
 
@@
     while True:
         attempt_number += 1
+        phase_failures: list[str] = []
         _reset_subagent_state()
         finish_warning = None

Also applies to: 768-770

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/codecome/phase_1.py` around lines 626 - 643, The variable
phase_failures can be unbound when the mid-turn resume path skips calling
check_phase_graceful_completion; to fix, initialize phase_failures to a safe
default (e.g., an empty list or None) before the block that checks
finish_warning so it is always defined for the later auto-resume logic that
reads phase_failures; update the code around the finish_warning handling
(reference symbols: phase_failures, finish_warning, last_permission_error,
check_phase_graceful_completion) to set phase_failures = [] (or None) before the
conditional or explicitly assign it in the permission-error branch so the later
auto-resume code can safely use it.
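This finding and the later harness.py comment both come down to where the per-attempt state is initialised. The hypothetical run_with_retries below takes each attempt's gate result, or None when the gate is skipped (as on the permission-error path). It records the failure_details that each resume would receive. Resetting inside the loop keeps the variables bound on every path and stops one attempt's diagnostics from leaking into the next.

```python
from typing import Optional

# An attempt's gate result, or None when the gate was skipped.
GateResult = Optional[tuple[bool, list[str]]]


def run_with_retries(attempts: list[GateResult]) -> list[Optional[list[str]]]:
    """Simulate a retry loop and return the failure_details each resume receives."""
    sent: list[Optional[list[str]]] = []
    for index, gate_result in enumerate(attempts):
        # Reset at the top of every attempt: always bound, never stale.
        phase_ok = False
        phase_failures: list[str] = []
        if gate_result is not None:
            phase_ok, phase_failures = gate_result
        if phase_ok:
            break
        if index + 1 < len(attempts):  # retry budget left: build a resume prompt
            sent.append(phase_failures or None)
    return sent
```

If the two initialisations sat above the loop instead, the second attempt in the first test case would resend the first attempt's diagnostics rather than None.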

🧹 Nitpick comments (1)

.project/harness-resilience-summary-gating-plan.md (1)
18-18: ⚡ Quick win

Add language identifiers to fenced code blocks to satisfy markdownlint.

These unlabeled fences trigger MD040. Add explicit languages (text, python, bash) so docs lint stays clean.

Also applies to: 112-112, 119-119, 146-146, 150-150, 155-155, 164-164, 178-178, 184-184, 199-199, 205-205, 220-220, 226-226, 241-241, 256-256, 276-276, 296-296
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.project/harness-resilience-summary-gating-plan.md at line 18, Update all
unlabeled fenced code blocks (``` ) in the document by adding explicit language
identifiers (e.g., ```text, ```bash, ```python) so they satisfy markdownlint
MD040; locate every triple-backtick fence in the file (including the instances
noted around lines 112, 119, 146, 150, 155, 164, 178, 184, 199, 205, 220, 226,
241, 256, 276, 296) and replace the opening fence with the appropriate language
token based on the block content.
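To locate the fences this nitpick refers to, a small scanner could list every opening fence that lacks an info string. It is a rough approximation of markdownlint's MD040 rule built on CommonMark's fence rules. An opening run of three or more backticks or tildes starts a block, and a bare run of the same character that is at least as long closes it. unlabeled_fences is a hypothetical helper, not part of markdownlint.

```python
import re

# Opening or closing fence: up to 3 spaces of indent, then 3+ backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def unlabeled_fences(text: str) -> list[int]:
    """Return 1-based line numbers of opening fences with no language (MD040)."""
    hits: list[int] = []
    open_marker = ""  # fence run of the block currently open, "" when outside
    for number, line in enumerate(text.splitlines(), start=1):
        match = FENCE.match(line)
        if match is None:
            continue
        marker, info = match.group(1), match.group(2).strip()
        if not open_marker:
            open_marker = marker
            if not info:
                hits.append(number)
        elif (
            not info
            and marker[0] == open_marker[0]
            and len(marker) >= len(open_marker)
        ):
            open_marker = ""  # a bare, matching fence closes the block
    return hits
```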

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/codecome/harness.py`:
- Around line 137-138: phase_failures is being initialized once and then reused
across retry attempts, leaking previous-attempt diagnostics into subsequent
resumes; to fix, reinitialize phase_failures = [] at the start of each retry
attempt (i.e., inside the retry loop, before any calls to
check_phase_graceful_completion() or before logic that sets/reads phase_ok) so
each attempt starts with a fresh failures list; apply the same change to the
other places where phase_failures is declared/used to ensure no stale failures
are carried between attempts.

In `@tools/phases/completion.py`:
- Around line 204-226: The code currently returns immediately after appending
the "required phase-1 notes are not all present" failure, which prevents later
checks (sandbox_state_recorded and summary_fresh) from running and leaves
failures incomplete; update the control flow in the function that builds the
failures list (the block that computes fresh_required via _path_is_fresh over
required_artifacts and appends the NOTES_ROOT/... message using
_PHASE1_REQUIRED_ARTIFACT_NAMES) to remove the early return and let execution
continue to evaluate sandbox_state_recorded and summary_fresh so all missing
items are appended to failures, then return once at the end (so
build_phase_resume_prompt receives a complete failure_details list).
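The control-flow fix suggested here can be illustrated with a hypothetical, simplified monolith Phase 1 gate that takes the three check results as booleans. The failure messages are invented for illustration. Without an early return after the first failure, a run that is missing everything reports all three gaps in one pass.

```python
def check_phase1_monolith(
    notes_present: bool, sandbox_state_recorded: bool, summary_fresh: bool
) -> tuple[bool, list[str]]:
    """Evaluate every Phase 1 check and return once, with all failures collected."""
    failures: list[str] = []
    if not notes_present:
        failures.append("itemdb/notes/: required phase-1 notes are not all present")
        # No early return here: keep checking so the resume prompt lists every gap.
    if not sandbox_state_recorded:
        failures.append("itemdb/notes/sandbox-plan.md: sandbox state not recorded")
    if not summary_fresh:
        failures.append("runs/phase-1-summary*.md: no fresh run summary")
    return (not failures, failures)
```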



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ec9cf21c-483b-4578-9a23-6af534c4bfeb

📥 Commits

Reviewing files that changed from the base of the PR and between f140f4b and 77584e5.

📒 Files selected for processing (21)

.opencode/agents/auditor.md
.opencode/agents/exploiter.md
.opencode/agents/recon.md
.opencode/agents/reviewer.md
.opencode/agents/validator.md
.project/harness-resilience-summary-gating-plan.md
prompts/phase-1a-profile.md
prompts/phase-1b-recon.md
prompts/phase-1c-sandbox.md
prompts/phase-2-audit.md
prompts/phase-4-validate.md
prompts/phase-5-exploit.md
prompts/phase-6-report.md
tests/test_phase_1_codeql_plan_repair.py
tests/test_phase_1_mid_turn_forgiveness.py
tests/test_phase_graceful_completion_subphases.py
tests/test_phases_completion.py
tests/test_render_settings_propagation.py
tools/codecome/harness.py
tools/codecome/phase_1.py
tools/phases/completion.py

…, 5, 6; align phase 2 glob; honor resume reason

Address PR #46 review feedback (owner + Copilot):
- Subphase gates (1a, 1b, 1c) now also check for a fresh runs/phase-X-summary*.md via the new _append_run_summary_check helper in tools/phases/completion.py.
- Phase 4 and 5 gates now also check for a fresh runs/phase-{4|5}-<finding>-summary*.md.
- Phase 6 gate now also checks for a fresh runs/phase-6-summary*.md.
- Phase 3 resume checklist now includes a run-summary line in phase_checklist_lines, matching the already-enforced gate check.
- Phase 2 diagnostic message and glob now use the same pattern (no mandatory hyphen after 'summary'): both are runs/phase-2-summary*.md.
- build_phase_resume_prompt opening line is no longer a universal 'Your previous run completed'. A new _resume_opener_for_reason helper classifies the recorded reason (infrastructure_error, mid-turn cutoffs, finish failures, graceful_forgiveness, terminal-OK with missing artifacts, or unknown) and renders a context-specific opener.
- tools/codecome/phase_1.py defensively initializes phase_failures and phase_ok at the top of _run_subphase, mirroring the pattern already established in tools/codecome/harness.py. This eliminates the UnboundLocalError risk on the path that builds the resume prompt when the run was set to returncode=2 without ever entering the graceful-completion branch.

Tests:
- 15 new tests across 5 new classes in tests/test_phases_completion.py:
  * TestPhase2GlobStringMatchesDiagnostic
  * TestResumePromptOpenerDistinguishesReasons
  * TestSubphaseGatesRequireRunSummary
  * TestPhase45And6GatesRequireRunSummary
  * TestPhase3ChecklistMentionsRunSummary
- Existing test_resume_prompt_with_failure_details_lists_missing_artifacts updated to use the unhyphenated glob fragment.
- tests/test_phase_graceful_completion_subphases.py: positive tests now create a fresh runs/phase-1{a,b,c}-summary.md; negative tests assert the new summary failure fragment.
make tests: 676 -> 691 pass; pre-existing threat-model.md check-phase-artifacts warning is unrelated (verified with stash).
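The reason-to-opener mapping described in this commit can be sketched as an exact-match lookup plus a few prefix rules. The commit message names only infrastructure_error and graceful_forgiveness. The other reason strings (mid_turn*, finish_*, terminal_ok_missing_artifacts) and all of the opener wording are assumptions. resume_opener_for_reason is a hypothetical stand-in for _resume_opener_for_reason.

```python
from typing import Optional

# Exact-match reasons; terminal_ok_missing_artifacts is an assumed spelling.
_EXACT = {
    "infrastructure_error": "Your previous run was interrupted by an infrastructure error.",
    "graceful_forgiveness": "Your previous run was cut off, but the completion gate accepted its artifacts.",
    "terminal_ok_missing_artifacts": "Your previous run finished, but required artifacts are missing.",
}
# Prefix rules for families of reasons (assumed naming).
_PREFIXES = (
    ("mid_turn", "Your previous run was cut off mid-turn before it could finish."),
    ("finish_", "Your previous run ended with an unexpected finish reason."),
)
_UNKNOWN = "Your previous run ended for an unknown reason."


def resume_opener_for_reason(reason: Optional[str]) -> str:
    """Pick a context-specific first line for the resume prompt."""
    if not reason:
        return _UNKNOWN
    if reason in _EXACT:
        return _EXACT[reason]
    for prefix, opener in _PREFIXES:
        if reason.startswith(prefix):
            return opener
    return _UNKNOWN
```

Falling back to a neutral opener for unrecognised reasons avoids the old problem of telling a model its run "completed" when it did not.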

Append section 8 to .project/harness-resilience-summary-gating-plan.md documenting the owner-review-driven expansion:
- Subphase 1a/1b/1c, phase 4, phase 5, and phase 6 gates now also check for a fresh run summary.
- Phase 3 resume checklist now includes a run-summary line.
- Phase 2 diagnostic and glob are now consistent (no mandatory hyphen).
- build_phase_resume_prompt opener is now context-specific via _resume_opener_for_reason.
- tools/codecome/phase_1.py has defensive init of phase_failures and phase_ok.

The original plan explicitly deferred subphase summary gating as a non-goal and did not gate phases 4/5/6 on summary either; the follow-up section records that the review pushed back and that this commit resolves the gap.

Copilot

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

…, 5, 6; align phase 2 glob; honor resume reason Address PR #46 review feedback (owner + Copilot): - Subphase gates (1a, 1b, 1c) now also check for a fresh runs/phase-X-summary*.md via the new _append_run_summary_check helper in tools/phases/completion.py. - Phase 4 and 5 gates now also check for a fresh runs/phase-{4|5}-<finding>-summary*.md. - Phase 6 gate now also checks for a fresh runs/phase-6-summary*.md. - Phase 3 resume checklist now includes a run-summary line in phase_checklist_lines, matching the already-enforced gate check. - Phase 2 diagnostic message and glob now use the same pattern (no mandatory hyphen after 'summary'): both are runs/phase-2-summary*.md. - build_phase_resume_prompt opening line is no longer a universal 'Your previous run completed'. A new _resume_opener_for_reason helper classifies the recorded reason (infrastructure_error, mid-turn cutoffs, finish failures, graceful_forgiveness, terminal-OK with missing artifacts, or unknown) and renders a context-specific opener. - tools/codecome/phase_1.py defensively initializes phase_failures and phase_ok at the top of _run_subphase, mirroring the pattern already established in tools/codecome/harness.py. This eliminates the UnboundLocalError risk on the path that builds the resume prompt when the run was set to returncode=2 without ever entering the graceful-completion branch. Tests: - 15 new tests across 5 new classes in tests/test_phases_completion.py: * TestPhase2GlobStringMatchesDiagnostic * TestResumePromptOpenerDistinguishesReasons * TestSubphaseGatesRequireRunSummary * TestPhase45And6GatesRequireRunSummary * TestPhase3ChecklistMentionsRunSummary - Existing test_resume_prompt_with_failure_details_lists_missing_artifacts updated to use the unhyphenated glob fragment. - tests/test_phase_graceful_completion_subphases.py: positive tests now create a fresh runs/phase-1{a,b,c}-summary.md; negative tests assert the new summary failure fragment. 
make tests: 676 -> 691 pass; pre-existing threat-model.md check-phase-artifacts warning is unrelated (verified with stash).
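The run-summary gating and resume-prompt changes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in tools/phases/completion.py: the function names (append_run_summary_check, check_phase_2_completion, resume_opener_for_reason, build_resume_prompt), the reason keys, the message wording, and the use of file mtime compared against a run start timestamp to decide "fresh" are all hypothetical. It shows three ideas from the commit: the glob and the diagnostic quote the same runs/phase-2-summary*.md pattern, the relaxed phase-2 gate needs only a fresh summary and returns (bool, list[str]), and the resume prompt picks its opener from the recorded reason and lists missing artifacts when it has them.

```python
from __future__ import annotations

import glob
import os
import tempfile
import time
from typing import Optional


def append_run_summary_check(
    root: str, phase: str, started_at: float, failures: list[str]
) -> None:
    """Append a diagnostic unless a runs/phase-<phase>-summary*.md file is fresh.

    Hypothetical: "fresh" means modified at or after started_at.
    """
    pattern = os.path.join(root, "runs", f"phase-{phase}-summary*.md")
    fresh = [p for p in glob.glob(pattern) if os.path.getmtime(p) >= started_at]
    if not fresh:
        # The message quotes the same glob the check uses, so the model sees
        # exactly which path shape it still has to write.
        failures.append(
            f"runs/phase-{phase}-summary*.md: no run summary written during this run"
        )


def check_phase_2_completion(root: str, started_at: float) -> tuple[bool, list[str]]:
    """Relaxed phase-2 gate: a fresh run summary alone is sufficient."""
    failures: list[str] = []
    append_run_summary_check(root, "2", started_at, failures)
    return (not failures, failures)


# Hypothetical reason keys and wording; the real classifier covers more cases.
_OPENERS = {
    "infrastructure_error": "Your previous run was interrupted by an infrastructure error.",
    "missing_artifacts": "Your previous run ended, but required artifacts are missing.",
}


def resume_opener_for_reason(reason: Optional[str]) -> str:
    # Unknown or missing reasons fall back to a neutral opener instead of
    # claiming the run "completed".
    return _OPENERS.get(reason or "", "Your previous run did not finish cleanly.")


def build_resume_prompt(
    reason: Optional[str], failure_details: Optional[list[str]] = None
) -> str:
    lines = [resume_opener_for_reason(reason)]
    if failure_details:
        lines.append("Missing required artifacts:")
        lines.extend(f"- {item}" for item in failure_details)
    else:
        # Generic wording when the gate gave no specific diagnostics.
        lines.append("Reassess whether every required artifact exists before finishing.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Demo: an empty project root has no runs/ directory, so the gate fails
    # and the resume prompt lists the missing summary.
    with tempfile.TemporaryDirectory() as root:
        ok, failures = check_phase_2_completion(root, time.time())
        print(build_resume_prompt("missing_artifacts", failures))
```

Because the pattern has no mandatory hyphen after "summary", it matches both a timestamped runs/phase-2-summary-YYYY-MM-DD-HHMMSS.md and a plain runs/phase-2-summary.md.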

pruiz added 2 commits June 5, 2026 23:37

docs: include harness-resilience plan in branch for review

77584e5

Records the design rationale and verification checklist for the harness resilience changes implemented in 7cd8843. Kept on the feature branch for PR review; the master commit itself ships only the code changes.

pruiz commented Jun 5, 2026

5 comment threads on tools/phases/completion.py

pruiz requested a review from Copilot June 5, 2026 21:53

Copilot started reviewing on behalf of pruiz June 5, 2026 21:53

Copilot AI reviewed Jun 5, 2026

Comment thread tools/codecome/phase_1.py

2 comment threads on tools/phases/completion.py

coderabbitai Bot reviewed Jun 5, 2026

Comment thread tools/codecome/harness.py

Comment thread tools/phases/completion.py

pruiz added 3 commits June 6, 2026 00:28

fix(session): poll opencode status map for resume readiness

c666446
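The commit title only says the harness now polls the opencode status map before resuming; the details are not in this thread. As a hedged sketch of the general technique, a bounded poll over a session-to-status mapping might look like this. get_status_map, wait_until_resumable, the "idle" status value, and the timeout and interval defaults are assumptions, not opencode's API. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Mapping


def wait_until_resumable(
    get_status_map: Callable[[], Mapping[str, str]],
    session_id: str,
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a (hypothetical) session-id -> status map until session_id is "idle".

    Returns True once the session reports "idle", False if timeout elapses first.
    """
    deadline = clock() + timeout
    while True:
        # Re-read the whole map on every iteration; a missing entry is treated
        # as "not ready yet" rather than as idle.
        if get_status_map().get(session_id) == "idle":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Bounding the loop by a deadline keeps a session that never goes idle from hanging the harness; the caller can then fall back to its normal failure path.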

pruiz requested a review from Copilot June 6, 2026 10:31

Copilot started reviewing on behalf of pruiz June 6, 2026 10:31

Copilot AI reviewed Jun 6, 2026

4 comment threads on tools/phases/completion.py (outdated)

pruiz added 2 commits June 6, 2026 12:39

fix(harness): reset retry diagnostics before each attempt

55c2f71
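Resetting diagnostics per attempt, together with the defensive initialization of phase_failures and phase_ok described in the review-feedback commit, can be sketched like this. It is a hypothetical retry loop, not the harness code: run_once, check_completion, build_resume_prompt, AttemptResult, and the reason strings are illustrative names. The point is that phase_ok and phase_failures are rebound at the top of every attempt, so the resume prompt never hits an unbound name (the UnboundLocalError path) or reuses stale failures from an earlier attempt.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AttemptResult:
    returncode: int
    reason: str
    failures: list[str] = field(default_factory=list)


def run_with_retries(
    run_once: Callable[[Optional[str]], int],
    check_completion: Callable[[], tuple[bool, list[str]]],
    build_resume_prompt: Callable[[str, Optional[list[str]]], str],
    max_attempts: int = 3,
) -> AttemptResult:
    """Run a phase, resuming with a diagnostic prompt until it passes its gate."""
    resume_prompt: Optional[str] = None
    result = AttemptResult(returncode=2, reason="not_started")
    for _ in range(max_attempts):
        # Reset per-attempt diagnostics: both names are bound on every path,
        # and an attempt never inherits the previous attempt's failure list.
        phase_ok = False
        phase_failures: list[str] = []

        returncode = run_once(resume_prompt)
        if returncode == 0:
            # Only consult the completion gate when the run itself exited cleanly.
            phase_ok, phase_failures = check_completion()
            if phase_ok:
                return AttemptResult(returncode=0, reason="ok")

        reason = "missing_artifacts" if returncode == 0 else "nonzero_exit"
        result = AttemptResult(returncode=2, reason=reason, failures=phase_failures)
        # Pass None rather than an empty list so the prompt builder falls back
        # to its generic wording.
        resume_prompt = build_resume_prompt(reason, phase_failures or None)
    return result
```

In this sketch, an attempt that exits non-zero after an attempt that failed the gate produces a resume prompt with no failure details, instead of repeating the earlier attempt's missing-artifact list.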

fix(completion): keep path diagnostics actionable

5f70fde

pruiz marked this pull request as ready for review June 6, 2026 14:26

pruiz merged commit 3a10252 into master Jun 6, 2026
7 of 8 checks passed

pruiz commented Jun 5, 2026 • edited by coderabbitai Bot

What

Why

Changes

Verification

Non-goals

WIP checklist for review

Summary by CodeRabbit

coderabbitai Bot commented Jun 5, 2026 • edited

Review failed

Walkthrough

Changes

Possibly Related PRs

github-actions Bot commented Jun 5, 2026 • edited

Coverage Report

pruiz left a comment

Copilot AI left a comment

Pull request overview

Reviewed changes

greptile-apps Bot commented Jun 5, 2026 • edited

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

coderabbitai Bot left a comment

Copilot AI left a comment

Pull request overview

2 participants
