Implement Verifier trait and standard implementations (SelfVerifying loop strategy)

## Context

`Verifier` is referenced in `LoopStrategy::SelfVerifying` in the Harness (#3):

```
SelfVerifying {
  verifier: Box<dyn Verifier>,
  evaluator_harness: Arc<dyn Harness>,
}
```

The `Verifier` is distinct from `CompletionCheck` (#43). `CompletionCheck` answers "is the task done?" `Verifier` answers "is what was produced correct?" It is the oracle the `SelfVerifying` loop consults to decide whether the evaluator's verdict halts the build loop or sends it through another cycle.

`Verifier` is currently a stub in the harness module, tagged `// SPEC: full trait lives in this issue`. The `SelfVerifying` strategy returns `HaltReason::StrategyNotYetImplemented` until both this trait and #43 are implemented.

## The SelfVerifying Loop Pattern

```
SelfVerifying loop:

  // Build phase — standard ReAct loop until agent claims done
  run_standard_loop(context) → build_result

  // Evaluate phase — separate evaluator harness
  // Read-only sandbox, fresh session, explicit evaluator role chunk
  // Default-FAIL contract: evaluator cannot be biased by watching the build
  eval_result = evaluator_harness.run(eval_task)

  // Verifier decides what to do with the evaluator's verdict
  match verifier.verify(VerifierInput { build_result, eval_result, workspace, iteration }):
    Passed             → HaltSuccess
    Failed { reason }  → inject reason into build context, continue build loop
                         (give up after verifier.max_iterations() cycles)
```

The `Verifier` sits between the evaluator harness output and the build loop decision. It translates the evaluator's `RunResult` into an actionable verdict.
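
A minimal Rust sketch of the cycle above, under several assumptions: the build, evaluate, and verify steps are passed in as plain synchronous closures instead of harness and `Verifier` objects, results are reduced to strings, iterations are counted from 1, and `LoopOutcome` with its `MaxIterationsReached` variant is a hypothetical name, since this issue does not say which halt reason applies when the cycles run out.

```rust
// Simplified stand-in for the verdict type defined in this issue.
#[derive(Debug, Clone, PartialEq)]
pub enum VerifierVerdict {
    Passed,
    Failed { reason: String },
}

// Hypothetical outcome type; the real harness would use its own halt reasons.
#[derive(Debug, PartialEq)]
pub enum LoopOutcome {
    HaltSuccess { iterations: u32 },
    MaxIterationsReached { last_reason: String },
}

/// Runs build -> evaluate -> verify until the verifier passes or
/// `max_iterations` cycles have been used.
pub fn self_verifying_loop(
    max_iterations: u32,
    // Receives the previous failure reason (None on the first cycle).
    mut build: impl FnMut(Option<&str>) -> String,
    // Receives the build result and returns the evaluator's output.
    mut evaluate: impl FnMut(&str) -> String,
    // Receives build result, evaluator output, and the 1-based cycle number.
    mut verify: impl FnMut(&str, &str, u32) -> VerifierVerdict,
) -> LoopOutcome {
    let mut feedback: Option<String> = None;
    for iteration in 1..=max_iterations {
        let build_result = build(feedback.as_deref());
        let eval_result = evaluate(&build_result);
        match verify(&build_result, &eval_result, iteration) {
            VerifierVerdict::Passed => {
                return LoopOutcome::HaltSuccess { iterations: iteration };
            }
            // The reason is carried into the next build cycle.
            VerifierVerdict::Failed { reason } => feedback = Some(reason),
        }
    }
    LoopOutcome::MaxIterationsReached {
        last_reason: feedback.unwrap_or_default(),
    }
}
```

Keeping the iteration cap in the loop driver rather than in each verifier means a verifier that always fails still cannot keep the build loop alive.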

## Trait Definition

```
VerifierVerdict {
  Passed,
  Failed { reason: String },   // injected into build context next turn
}

// Input to the verifier — what the build produced and what the evaluator said
VerifierInput {
  build_result: RunResult,
  eval_result: RunResult,
  workspace: PathBuf,
  iteration: u32,              // which build-evaluate cycle this is
}

trait Verifier {
  async fn verify(&self, input: &VerifierInput) -> VerifierVerdict

  // Maximum number of build-evaluate cycles before giving up.
  // Prevents infinite build loops when the evaluator always finds problems.
  fn max_iterations(&self) -> u32   // default: 3
}
```
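
Because `SelfVerifying` holds a `Box<dyn Verifier>`, the Rust version needs a trait that can be used as a trait object. One possible shape, shown here as a sketch rather than the final API, returns a boxed future from `verify` instead of using `async fn` directly. The `RunResult::Failure` variant, the `VerdictFuture` alias, and the `AlwaysPass` implementation are hypothetical and exist only to make the example compile on its own.

```rust
use std::future::Future;
use std::path::PathBuf;
use std::pin::Pin;

// Placeholder for the harness's RunResult. Only `Success { output }` is
// named in this issue; `Failure` is an assumed variant.
#[derive(Debug, Clone, PartialEq)]
pub enum RunResult {
    Success { output: String },
    Failure { error: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum VerifierVerdict {
    Passed,
    Failed { reason: String }, // injected into build context next turn
}

pub struct VerifierInput {
    pub build_result: RunResult,
    pub eval_result: RunResult,
    pub workspace: PathBuf,
    pub iteration: u32, // which build-evaluate cycle this is
}

// Boxed future so the trait stays usable behind `Box<dyn Verifier>`.
pub type VerdictFuture<'a> = Pin<Box<dyn Future<Output = VerifierVerdict> + Send + 'a>>;

pub trait Verifier: Send + Sync {
    fn verify<'a>(&'a self, input: &'a VerifierInput) -> VerdictFuture<'a>;

    // Default cap on build-evaluate cycles.
    fn max_iterations(&self) -> u32 {
        3
    }
}

// Hypothetical trivial implementation, useful only as a test double.
pub struct AlwaysPass;

impl Verifier for AlwaysPass {
    fn verify<'a>(&'a self, _input: &'a VerifierInput) -> VerdictFuture<'a> {
        Box::pin(async { VerifierVerdict::Passed })
    }
}
```

An implementation that overrides nothing else inherits the default of 3 cycles, matching the `// default: 3` note in the definition above.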

## Standard Implementations

### EvaluatorResponseVerifier
Parses the evaluator harness's `RunResult::Success { output }` for pass/fail signals. This is the simplest verifier: it trusts the evaluator's final text response.

```
EvaluatorResponseVerifier {
  pass_pattern: String,    // regex: if output matches this, Passed
  fail_pattern: String,    // regex: if output matches this, extract reason
  max_iterations: u32,
}
```
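
The sketch below illustrates the matching logic under loudly stated assumptions. It uses plain substring markers (`pass_marker`, `fail_marker`) in place of the regex patterns so that it compiles with the standard library alone; a real implementation would compile `pass_pattern` and `fail_pattern` as regexes. It also assumes that the fail signal is checked first, that output matching neither signal counts as a failure (in line with the default-FAIL contract), and that `RunResult` has a `Failure` variant, none of which this issue specifies.

```rust
// Placeholder types; `Failure` is an assumed variant.
#[derive(Debug, Clone, PartialEq)]
pub enum RunResult {
    Success { output: String },
    Failure { error: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum VerifierVerdict {
    Passed,
    Failed { reason: String },
}

pub struct EvaluatorResponseVerifier {
    pub pass_marker: String, // stands in for `pass_pattern`
    pub fail_marker: String, // stands in for `fail_pattern`
    pub max_iterations: u32,
}

impl EvaluatorResponseVerifier {
    // Synchronous for brevity; the trait in this issue is async.
    pub fn verify(&self, eval_result: &RunResult) -> VerifierVerdict {
        let output = match eval_result {
            RunResult::Success { output } => output,
            RunResult::Failure { error } => {
                return VerifierVerdict::Failed {
                    reason: format!("evaluator did not complete: {}", error),
                };
            }
        };
        // Fail marker wins if both markers appear (assumed precedence).
        if let Some(idx) = output.find(&self.fail_marker) {
            // Everything after the marker is treated as the reason.
            let reason = output[idx + self.fail_marker.len()..].trim().to_string();
            return VerifierVerdict::Failed { reason };
        }
        if output.contains(&self.pass_marker) {
            return VerifierVerdict::Passed;
        }
        // Default-FAIL: no explicit pass signal means no pass.
        VerifierVerdict::Failed {
            reason: "evaluator output matched neither pattern".to_string(),
        }
    }
}
```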

### TestSuiteVerifier
Runs the test suite after the evaluator completes and uses the result as the verdict. It ignores the evaluator's text output; the tests are the ground truth.

```
TestSuiteVerifier {
  command: String,
  working_dir: PathBuf,
  timeout: Duration,
  sandbox: Arc<dyn SandboxProvider>,
  max_iterations: u32,
}
```
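
A rough sketch of the exit-status-to-verdict mapping, assuming a Unix-like host with `sh` available. It deliberately omits the `timeout` and `sandbox` fields and runs the command directly through `std::process::Command`; the real verifier would execute inside the `SandboxProvider` and enforce the timeout. The wording of the failure reasons is illustrative only.

```rust
use std::path::PathBuf;
use std::process::Command;

#[derive(Debug, Clone, PartialEq)]
pub enum VerifierVerdict {
    Passed,
    Failed { reason: String },
}

// Reduced version of the struct above: no timeout, no sandbox (assumption).
pub struct TestSuiteVerifier {
    pub command: String,
    pub working_dir: PathBuf,
    pub max_iterations: u32,
}

impl TestSuiteVerifier {
    // Synchronous for brevity; the trait in this issue is async.
    pub fn verify(&self) -> VerifierVerdict {
        let result = Command::new("sh")
            .arg("-c")
            .arg(&self.command)
            .current_dir(&self.working_dir)
            .output();
        match result {
            // Exit status 0 is the only passing outcome.
            Ok(out) if out.status.success() => VerifierVerdict::Passed,
            Ok(out) => {
                // Captured output becomes the reason fed back to the build loop.
                let stdout = String::from_utf8_lossy(&out.stdout);
                let stderr = String::from_utf8_lossy(&out.stderr);
                VerifierVerdict::Failed {
                    reason: format!("tests failed ({}):\n{}{}", out.status, stdout, stderr),
                }
            }
            // A command that cannot be started also counts as a failure.
            Err(e) => VerifierVerdict::Failed {
                reason: format!("could not run test command: {}", e),
            },
        }
    }
}
```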

### CompositeVerifier
Passes only when all child verifiers pass.

```
CompositeVerifier {
  verifiers: Vec<Box<dyn Verifier>>,
  max_iterations: u32,
}
```
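
The all-must-pass rule could look like the following sketch. It assumes a simplified synchronous `Verifier` trait that only sees the evaluator's output as a string, and it collects every child's failure reason instead of stopping at the first one; both are illustrative choices, not decisions made in this issue. `Contains` is a hypothetical child verifier used only to exercise the composite.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum VerifierVerdict {
    Passed,
    Failed { reason: String },
}

// Simplified, synchronous stand-in for the trait defined in this issue.
pub trait Verifier {
    fn verify(&self, eval_output: &str) -> VerifierVerdict;
}

pub struct CompositeVerifier {
    pub verifiers: Vec<Box<dyn Verifier>>,
    pub max_iterations: u32,
}

impl Verifier for CompositeVerifier {
    fn verify(&self, eval_output: &str) -> VerifierVerdict {
        // Gather the reason from every failing child, in order.
        let reasons: Vec<String> = self
            .verifiers
            .iter()
            .filter_map(|v| match v.verify(eval_output) {
                VerifierVerdict::Passed => None,
                VerifierVerdict::Failed { reason } => Some(reason),
            })
            .collect();
        if reasons.is_empty() {
            VerifierVerdict::Passed
        } else {
            VerifierVerdict::Failed { reason: reasons.join("\n") }
        }
    }
}

// Hypothetical child: passes when the output contains the given text.
pub struct Contains(pub String);

impl Verifier for Contains {
    fn verify(&self, eval_output: &str) -> VerifierVerdict {
        if eval_output.contains(&self.0) {
            VerifierVerdict::Passed
        } else {
            VerifierVerdict::Failed { reason: format!("missing: {}", self.0) }
        }
    }
}
```

Note that in this sketch a composite with no children passes vacuously; whether that should instead be rejected at construction time is left open here.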

## Evaluator Harness Constraints

The `evaluator_harness` in `SelfVerifying` must be constructed with:
- **Read-only sandbox** — `SandboxProvider::read_only(workspace)`. No write or edit tools.
- **Fresh session** — always a new `SessionId`, never shares with the build harness.
- **Evaluator role chunk** — `"role-evaluator"` from `PromptChunkRegistry`. This chunk must be registered in the standard chunk library before `SelfVerifying` is usable.
- **`Mode::AlwaysAsk`** — evaluator never acts, only reports.

`SubagentTool::new()` already enforces no nested subagents. The evaluator harness is a peer harness, not a subagent — it is constructed directly by the caller and injected, not spawned by the build harness.
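
Since the evaluator harness is injected by the caller, the four constraints above are easy to get wrong silently. One way to guard against that, sketched here with entirely hypothetical types (`EvaluatorHarnessConfig`, `check_evaluator_config`, and the `Mode::Auto` placeholder variant are not part of any existing API), is a check that reports every violated constraint before `SelfVerifying` starts.

```rust
// Only `AlwaysAsk` is named in this issue; `Auto` is a placeholder variant.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    AlwaysAsk,
    Auto,
}

// Hypothetical summary of how the evaluator harness was constructed.
pub struct EvaluatorHarnessConfig {
    pub sandbox_read_only: bool,
    pub session_id: String,
    pub role_chunk: String,
    pub mode: Mode,
}

/// Returns one message per violated constraint; an empty list means the
/// configuration satisfies all four constraints.
pub fn check_evaluator_config(
    cfg: &EvaluatorHarnessConfig,
    build_session_id: &str,
) -> Vec<&'static str> {
    let mut violations = Vec::new();
    if !cfg.sandbox_read_only {
        violations.push("sandbox must be read-only");
    }
    if cfg.session_id == build_session_id {
        violations.push("evaluator must not share the build session");
    }
    if cfg.role_chunk != "role-evaluator" {
        violations.push("evaluator must use the role-evaluator chunk");
    }
    if cfg.mode != Mode::AlwaysAsk {
        violations.push("evaluator must run in Mode::AlwaysAsk");
    }
    violations
}
```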

## Checklist

- [ ] Rust: `Verifier` trait + `VerifierVerdict` + `VerifierInput` + all standard implementations
- [ ] TypeScript: same
- [ ] Python: same
- [ ] Go: same
- [ ] Unit tests: each implementation returns correct verdict for pass and fail cases
- [ ] `max_iterations` enforcement tested — loop halts after N cycles even without Passed verdict
- [ ] Harness (#3) SelfVerifying stub replaced with real execution using `Verifier`
- [ ] `"role-evaluator"` chunk registered in standard chunk library (#24)
- [ ] Fixtures: `fixtures/verifier/evaluator_pass.jsonl`, `fixtures/verifier/evaluator_fail.jsonl`

## Related Issues

- #3 Harness runtime loop (its `SelfVerifying` strategy depends on this issue)
- #43 CompletionCheck (sibling — Ralph strategy's equivalent)
- #24 PromptChunkRegistry (role-evaluator chunk must exist)
- #6 SandboxProvider (read-only mode required for evaluator harness)
