Implement CompletionCheck trait and standard implementations

## Context

`CompletionCheck` is referenced in two loop strategies in the Harness (#3):
- `LoopStrategy::Ralph { completion_check: Box<dyn CompletionCheck> }` — the external check that determines when a Ralph continuation loop is actually done
- `LoopStrategy::SelfVerifying { verifier, evaluator_harness }` — uses the termination policy's `CompletionCheck` to determine when the build phase should stop and hand off to the evaluator

`CompletionCheck` is currently a stub in the harness module, tagged `// SPEC: full trait lives in this issue`. Until the trait is implemented, the Ralph and SelfVerifying strategies both return `HaltReason::StrategyNotYetImplemented`.

## Trait Definition

```
// Returns None if the task is complete, Some(reason) if not done yet.
// The reason is injected into the next turn's context and tells the agent
// what it still needs to do. This is what prevents premature victory.
//
// Rust note: `Box<dyn CompletionCheck>` needs a dyn-compatible form of
// `check` (e.g. one returning a boxed future), because a plain `async fn`
// in a trait cannot be called through `dyn`.
trait CompletionCheck {
  async fn check(&self, state: &SessionStateSnapshot) -> Option<String>;

  // Human-readable description of what this check evaluates.
  // Injected into agent context at session start so it understands
  // what "done" means for this task.
  fn description(&self) -> String;
}
```
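
For the Python item in the checklist, the trait could be rendered roughly as below. This is a minimal sketch, not a settled API: `SessionStateSnapshot` is not defined in this issue, so `Any` stands in for it, and the Rust `Option<String>` is mapped to `Optional[str]`. `AlwaysComplete` (described under Standard Implementations) is included as the smallest conforming implementation.

```python
import asyncio
from typing import Any, Optional, Protocol


class CompletionCheck(Protocol):
    """Python rendering of the trait above; `Any` is a placeholder for SessionStateSnapshot."""

    async def check(self, state: Any) -> Optional[str]:
        """Return None when the task is complete, else the reason it is not."""
        ...

    def description(self) -> str:
        """Describe what "done" means; injected into agent context at session start."""
        ...


class AlwaysComplete:
    """Accepts the agent's own claim: never reports outstanding work."""

    async def check(self, state: Any) -> Optional[str]:
        return None

    def description(self) -> str:
        return "The task is complete as soon as the agent says it is."
```

Because `CompletionCheck` is a `Protocol`, implementations conform structurally and do not need to inherit from it.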

## Standard Implementations

### FeatureListCheck
Reads `feature_list.json` from the workspace. Returns `Some` with the list of incomplete features if any have `passes: false`. Returns `None` when all features pass.

```
FeatureListCheck {
  path: PathBuf,   // default: "feature_list.json"
}
```
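
A rough Python sketch of this check follows. The issue does not specify the schema of `feature_list.json`, so the sketch assumes a JSON array of objects with a `name` string and a `passes` boolean. It also assumes the path resolves against the process working directory rather than a workspace root taken from the snapshot, and it treats an unreadable file as "not done"; both are illustrative choices.

```python
import asyncio
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class FeatureListCheck:
    path: Path = Path("feature_list.json")

    async def check(self, state: Any) -> Optional[str]:
        # Assumed schema: [{"name": "...", "passes": true|false}, ...]
        try:
            features = json.loads(self.path.read_text())
        except (OSError, json.JSONDecodeError) as exc:
            # Assumption: an unreadable list counts as "not done", never as success.
            return f"Could not read {self.path}: {exc}"
        incomplete = [
            f.get("name", "<unnamed>") for f in features if not f.get("passes", False)
        ]
        if not incomplete:
            return None
        return "Features still failing: " + ", ".join(incomplete)

    def description(self) -> str:
        return f"Every feature in {self.path} must have passes: true."
```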

### TestSuiteCheck
Runs the test suite and returns `Some(failure_summary)` if any tests fail. Returns `None` when the full suite passes.

```
TestSuiteCheck {
  command: String,          // e.g. "npm test", "cargo test", "pytest"
  working_dir: PathBuf,
  timeout: Duration,
  sandbox: Arc<dyn SandboxProvider>,
}
```
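
The pass/fail logic might look like the Python sketch below. It deliberately leaves out the `sandbox` field, since the `SandboxProvider` interface is not defined in this issue, and runs the command through the local shell instead. It also assumes that exit code 0 means the suite passed, which holds for the example commands listed above. `timeout_secs` is a float stand-in for `timeout: Duration`.

```python
import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class TestSuiteCheck:
    command: str
    working_dir: Path
    timeout_secs: float

    async def check(self, state: Any) -> Optional[str]:
        # Assumption: no sandbox; the command runs in the local shell.
        proc = await asyncio.create_subprocess_shell(
            self.command,
            cwd=self.working_dir,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        try:
            output, _ = await asyncio.wait_for(proc.communicate(), self.timeout_secs)
        except asyncio.TimeoutError:
            proc.kill()  # kills the shell; grandchild processes may need extra cleanup
            await proc.wait()
            return f"Test command timed out after {self.timeout_secs}s: {self.command}"
        if proc.returncode == 0:
            return None
        # Keep only the tail so the injected reason stays small.
        tail = output.decode(errors="replace")[-2000:]
        return f"Tests failed (exit code {proc.returncode}):\n{tail}"

    def description(self) -> str:
        return f"`{self.command}` must exit with status 0."
```

Truncating the output to its last 2000 characters is an arbitrary choice for the sketch; a real `failure_summary` would probably parse the runner's report.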

### QuestionAnsweredCheck
LLM-as-judge: evaluates whether the agent's final response actually answered the user's question. Used for RAG and conversational agents.

```
QuestionAnsweredCheck {
  judge_model: ModelConfig,
  original_question: String,
  rubric: Option<String>,    // custom evaluation criteria
}
```
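
One possible shape for the judge flow is sketched below in Python. Several names here are hypothetical: `ModelConfig` is not defined in this issue, so the sketch injects a `judge` callable that takes a prompt and returns the judge model's reply; it reads the agent's last message from an assumed `final_response` attribute on the snapshot; and the "reply YES or explain" prompt protocol is invented for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional

# Hypothetical judge interface: prompt in, raw text reply from the judge model out.
Judge = Callable[[str], Awaitable[str]]


@dataclass
class QuestionAnsweredCheck:
    judge: Judge
    original_question: str
    rubric: Optional[str] = None  # custom evaluation criteria

    async def check(self, state: Any) -> Optional[str]:
        # Assumption: the snapshot exposes the agent's last message as `final_response`.
        answer = getattr(state, "final_response", "")
        prompt = (
            f"Question: {self.original_question}\n"
            f"Answer: {answer}\n"
            f"Criteria: {self.rubric or 'The answer must fully address the question.'}\n"
            "Reply YES if the criteria are met. Otherwise explain what is missing."
        )
        verdict = (await self.judge(prompt)).strip()
        if verdict.upper().startswith("YES"):
            return None
        # The judge's explanation becomes the reason injected into the next turn.
        return verdict or "The judge returned no verdict."

    def description(self) -> str:
        return f"The final response must answer: {self.original_question}"
```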

### SqlResultCheck
Validates that the SQL result set is non-empty and structurally correct (column names match expectation). Used for NL-to-SQL agents.

```
SqlResultCheck {
  expected_columns: Option<Vec<String>>,
  min_rows: Option<usize>,
}
```
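
The validation itself is simple enough to sketch in Python. The issue does not say how the result set reaches the check, so the attribute names `sql_columns` and `sql_rows` on the snapshot are hypothetical. The sketch also assumes that an unset `min_rows` means "at least one row" (matching "non-empty" above) and that column order matters.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SqlResultCheck:
    expected_columns: Optional[List[str]] = None
    min_rows: Optional[int] = None

    async def check(self, state: Any) -> Optional[str]:
        # Assumption: the snapshot carries the last result set under these names.
        columns = list(getattr(state, "sql_columns", []))
        rows = list(getattr(state, "sql_rows", []))
        if self.expected_columns is not None and columns != self.expected_columns:
            return f"Expected columns {self.expected_columns}, got {columns}."
        # Assumption: with no explicit minimum, "non-empty" means one row or more.
        required = self.min_rows if self.min_rows is not None else 1
        if len(rows) < required:
            return f"Expected at least {required} row(s), got {len(rows)}."
        return None

    def description(self) -> str:
        return "The query must return a non-empty result set with the expected columns."
```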

### AlwaysComplete
Returns `None` immediately — task is always considered done when the model claims it is. Used for simple single-turn tasks where the model's self-assessment is sufficient.

```
AlwaysComplete
```

## Relationship to TerminationPolicy

`CompletionCheck` is injected into `TerminationPolicy` and called only when `agent_claims_done: true`. The `TerminationPolicy` evaluates budget limits first (unconditionally), then sensor results, then calls `CompletionCheck`. This is the mechanism that prevents premature victory — the agent claims done, the check says "not yet, here's what's missing", and the harness injects that reason into the next turn.
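
The evaluation order described above can be sketched as follows. This is illustrative Python, not the `TerminationPolicy` API from #13: the `Decision` type, the `evaluate` function, and its keyword parameters are all hypothetical, and the sketch assumes a sensor can request a halt by supplying a reason string.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Decision:
    halt: bool
    reason: Optional[str] = None  # halt reason, or feedback to inject into the next turn


async def evaluate(
    check: Any,
    state: Any,
    *,
    budget_exhausted: bool,
    sensor_halt: Optional[str],
    agent_claims_done: bool,
) -> Decision:
    # 1. Budget limits come first, unconditionally.
    if budget_exhausted:
        return Decision(True, "budget exhausted")
    # 2. Sensor results (assumption: a non-None value is a halt request).
    if sensor_halt is not None:
        return Decision(True, sensor_halt)
    # 3. CompletionCheck runs only when the agent claims it is done.
    if not agent_claims_done:
        return Decision(False)
    missing = await check.check(state)
    if missing is None:
        return Decision(True, "complete")
    # Not done yet: keep looping and hand the reason to the next turn.
    return Decision(False, missing)
```

Under this ordering a check is never invoked for a session that is already over budget, which matters for expensive checks such as `TestSuiteCheck`.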

## Checklist

- [ ] Rust: `CompletionCheck` trait + all standard implementations
- [ ] TypeScript: `CompletionCheck` trait + all standard implementations
- [ ] Python: `CompletionCheck` trait + all standard implementations
- [ ] Go: `CompletionCheck` trait + all standard implementations
- [ ] Unit tests: each implementation returns correct `None` / `Some(reason)` for its domain
- [ ] Harness (#3) stubs replaced with real `CompletionCheck` in Ralph and TerminationPolicy
- [ ] `TerminationPolicy` (#13) updated to call `CompletionCheck` correctly
- [ ] Fixture: `fixtures/completion_checks/feature_list_complete.jsonl`

## Related Issues

- #3 Harness runtime loop (Ralph and SelfVerifying strategies depend on this)
- #13 TerminationPolicy (calls CompletionCheck when agent claims done)
- #23 MetricEvaluator (sibling — HillClimbing's equivalent of CompletionCheck)
