Run artifact validation + auto-repair after mid-turn model cutoffs (not only clean finishes) #39

@pruiz

Problem

When a Phase 1 subphase encounters a mid-turn model/provider cutoff (returncode == 2, finish reason in _FINISH_MID_TURN), the harness either retries the iteration (max 1) or gives up. It never runs artifact validation or auto-repair because those gates are guarded by returncode == 0.

This means valid artifacts that were already written to disk by the model before the cutoff are never validated, and the auto-repair loop (which can fix formatting issues like H1 vs H2 heading errors in threat-model.md) is never triggered.

Real example

Phase 1b produced all artifacts correctly (19.7K threat-model.md, all other notes) but the model was cut off by the provider mid-stream. The harness reported "Phase 1b did not complete cleanly (exit code 2)" and stopped. The artifacts existed on disk but were never validated. The threat-model.md had the right content but wrong heading levels (H2 instead of H1), which the auto-repair loop would have caught and fixed — if it had been allowed to run.

Proposed fix

In tools/codecome/phase_1.py::_run_subphase(), after a mid-turn cutoff:

  1. Check if any files were produced during the attempt (look for file.edited events or stat the expected artifact paths for freshness).
  2. If artifacts were produced, set returncode = 0 (fake clean finish) so the existing CodeQL plan / frontmatter / artifact validation blocks run.
  3. If the validation blocks find errors, the auto-repair loop resumes the session with a targeted repair prompt — exactly as if the model had finished cleanly.
  4. If no artifacts were produced at all, fall through to the existing iteration retry.

Implementation sketch

In _run_subphase(), modify the mid-turn handling:

if returncode == 2 and last_finish_reason in _FINISH_MID_TURN:
    if _subphase_produced_artifacts(phase_id, run_start_time):
        # Artifacts exist — run validation + auto-repair
        returncode = 0
        # Falls through to the validation blocks below
        # (CodeQL plan → frontmatter → artifacts)
    elif iteration_retry_count < max_iteration_retries:
        iteration_retry_count += 1
        # ... resume prompt, continue ...

The _subphase_produced_artifacts() helper checks whether any expected artifact for the given subphase was created or modified during this run, by comparing each file's mtime against run_start_time (mtime >= run_start_time). For Phase 1b, the expected set is the existing REQUIRED_NOTES_1B list.
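
A minimal sketch of what such a helper could look like, to make the freshness check concrete. It is not the harness's actual code: the EXPECTED_ARTIFACTS mapping is a hypothetical stand-in for lists such as REQUIRED_NOTES_1B, and the notes_dir and expected parameters are assumptions added so the example is self-contained.

```python
from pathlib import Path

# Hypothetical stand-in for the real per-subphase lists (e.g. REQUIRED_NOTES_1B).
EXPECTED_ARTIFACTS = {
    "1b": ["threat-model.md"],
}


def subphase_produced_artifacts(phase_id, run_start_time, notes_dir,
                                expected=EXPECTED_ARTIFACTS):
    """Return True if any expected artifact was written at or after run_start_time.

    run_start_time is a POSIX timestamp (as returned by time.time()).
    notes_dir is an assumed parameter: the directory holding the artifacts.
    """
    for name in expected.get(phase_id, []):
        path = Path(notes_dir) / name
        try:
            mtime = path.stat().st_mtime
        except FileNotFoundError:
            # Artifact was never written; keep checking the others.
            continue
        if mtime >= run_start_time:
            return True
    return False
```

One caveat with this approach: some filesystems store mtime at coarse granularity, so a file written just after run_start_time can report a slightly earlier mtime. Subtracting a small tolerance from run_start_time, or also checking for file.edited events, would make the check more robust.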

Acceptance criteria

  • Phase 1b that produces artifacts but cuts off mid-turn triggers artifact validation
  • Auto-repair loop fires when validation finds heading/format errors
  • If no artifacts were produced at all, the existing iteration retry behavior is unchanged
  • Phase 1a and 1c benefit from the same fix
  • Tests cover: artifacts produced + cutoff → validation runs; no artifacts + cutoff → iteration retry
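
The two test cases in the last criterion could be sketched as follows. To stay self-contained, the sketch models the branch as a pure function instead of calling _run_subphase(); action_after_attempt and its return values are hypothetical names, and the contents of FINISH_MID_TURN are a placeholder, not the harness's real set.

```python
# Placeholder contents; the real _FINISH_MID_TURN set lives in the harness.
FINISH_MID_TURN = frozenset({"length"})


def action_after_attempt(returncode, finish_reason, produced_artifacts,
                         retry_count, max_retries=1):
    """Return 'validate', 'retry', or 'give_up' for one subphase attempt."""
    if returncode == 0:
        return "validate"
    if returncode == 2 and finish_reason in FINISH_MID_TURN:
        if produced_artifacts:
            # Treat as a clean finish so validation + auto-repair run.
            return "validate"
        if retry_count < max_retries:
            return "retry"
    return "give_up"


def test_cutoff_with_artifacts_runs_validation():
    assert action_after_attempt(2, "length", True, 0) == "validate"


def test_cutoff_without_artifacts_retries_then_gives_up():
    assert action_after_attempt(2, "length", False, 0) == "retry"
    assert action_after_attempt(2, "length", False, 1) == "give_up"
```

Tests against the real harness would likely need to stub the model session and the artifact check, but the expected outcomes per case would be the same.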
