Problem
When a Phase 1 subphase encounters a mid-turn model/provider cutoff (returncode == 2, finish reason in _FINISH_MID_TURN), the harness either retries the iteration (max 1) or gives up. It never runs artifact validation or auto-repair because those gates are guarded by returncode == 0.
This means valid artifacts that were already written to disk by the model before the cutoff are never validated, and the auto-repair loop (which can fix formatting issues like H1 vs H2 heading errors in threat-model.md) is never triggered.
Real example
Phase 1b produced all artifacts correctly (19.7K threat-model.md, all other notes) but the model was cut off by the provider mid-stream. The harness reported "Phase 1b did not complete cleanly (exit code 2)" and stopped. The artifacts existed on disk but were never validated. The threat-model.md had the right content but wrong heading levels (H2 instead of H1), which the auto-repair loop would have caught and fixed — if it had been allowed to run.
Proposed fix
In tools/codecome/phase_1.py::_run_subphase(), after a mid-turn cutoff:
- Check if any files were produced during the attempt (look for file.edited events or stat the expected artifact paths for freshness).
- If artifacts were produced, set returncode = 0 (fake clean finish) so the existing CodeQL plan / frontmatter / artifact validation blocks run.
- If the validation blocks find errors, the auto-repair loop resumes the session with a targeted repair prompt — exactly as if the model had finished cleanly.
- If no artifacts were produced at all, fall through to the existing iteration retry.
Implementation sketch
In _run_subphase(), modify the mid-turn handling:
    if returncode == 2 and last_finish_reason in _FINISH_MID_TURN:
        if _subphase_produced_artifacts(phase_id, run_start_time):
            # Artifacts exist: run validation + auto-repair
            returncode = 0
            # Falls through to the validation blocks below
            # (CodeQL plan → frontmatter → artifacts)
        elif iteration_retry_count < max_iteration_retries:
            iteration_retry_count += 1
            # ... resume prompt, continue ...
The _subphase_produced_artifacts() helper checks whether any expected Phase 1b artifacts were created/modified during this run (using mtime >= run_start_time and the existing REQUIRED_NOTES_1B list).
Acceptance criteria
- Phase 1b that produces artifacts but cuts off mid-turn triggers artifact validation
- Auto-repair loop fires when validation finds heading/format errors
- If no artifacts were produced at all, the existing iteration retry behavior is unchanged
- Phase 1a and 1c benefit from the same fix
- Tests cover: artifacts produced + cutoff → validation runs; no artifacts + cutoff → iteration retry
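The two test cases in the last criterion could be sketched as follows. This pulls the branching into a pure function so it can be tested without running a model session; action_after_run is a hypothetical name, not an existing harness function, and the "length" finish reason is a stand-in for whatever _FINISH_MID_TURN actually contains.

```python
# Stand-in value: the harness defines the real _FINISH_MID_TURN set.
_FINISH_MID_TURN = {"length"}


def action_after_run(returncode, finish_reason, produced_artifacts,
                     retries, max_retries=1):
    """Return 'validate', 'retry', or 'fail' for a finished subphase attempt."""
    if returncode == 0:
        return "validate"
    if returncode == 2 and finish_reason in _FINISH_MID_TURN:
        if produced_artifacts:
            return "validate"  # treat as a clean finish; validation + auto-repair run
        if retries < max_retries:
            return "retry"  # existing iteration retry behavior
    return "fail"


def test_cutoff_with_artifacts_runs_validation():
    assert action_after_run(2, "length", True, retries=0) == "validate"


def test_cutoff_without_artifacts_retries():
    assert action_after_run(2, "length", False, retries=0) == "retry"


def test_cutoff_without_artifacts_gives_up_after_retry():
    assert action_after_run(2, "length", False, retries=1) == "fail"
```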