feat: add Mode E (Harness) — build-and-verify with multi-role evaluation#1

Closed
bohemianpan wants to merge 4 commits into iamtouchskyer:main from bohemianpan:feature/mode-e-harness

@bohemianpan
Contributor

Summary

  • Adds Mode E (Harness) to OPC: the harness builds the code, and OPC specialist roles evaluate it
  • Only 6 lines are added to skill.md: a triage row plus a pointer to harness/mode-e.md
  • The harness lives in harness/ as independent files, so it can be updated by replacing them
  • harness/mode-e.md is the bridge: it defines how OPC roles replace the harness's single evaluator

How it works

  1. Implementer builds the code (harness pipeline)
  2. OPC roles evaluate it from specialist angles (security, tester, PM, etc.)
  3. Coordinator synthesizes the role evaluations into a PASS/ITERATE/FAIL verdict
  4. Fix-and-retest loop repeats until quality passes (capped at 10 rounds)
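
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `RoleReport`, `synthesize`, and `runHarness` are hypothetical, and the synthesis rule (any FAIL fails the round, any ITERATE requests another round, otherwise PASS) and the choice to stop on FAIL are assumptions. Only the three verdict values and the 10-round cap come from the description.

```typescript
// Hypothetical types; the PR description does not define these names.
type Verdict = "PASS" | "ITERATE" | "FAIL";

interface RoleReport {
  role: string; // e.g. "security", "tester", "PM"
  verdict: Verdict;
}

interface HarnessResult {
  verdict: Verdict;
  rounds: number;
}

const MAX_ROUNDS = 10; // the 10-round cap from the PR description

// Assumed synthesis rule: the most severe role verdict wins.
function synthesize(reports: RoleReport[]): Verdict {
  if (reports.some((r) => r.verdict === "FAIL")) return "FAIL";
  if (reports.some((r) => r.verdict === "ITERATE")) return "ITERATE";
  return "PASS";
}

// Build, evaluate with every role, synthesize, and repeat while the
// verdict is ITERATE. Stopping on FAIL is an assumption of this sketch.
function runHarness(
  build: (round: number) => void,
  evaluate: (round: number) => RoleReport[],
): HarnessResult {
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    build(round);
    const verdict = synthesize(evaluate(round));
    if (verdict !== "ITERATE") return { verdict, rounds: round };
  }
  // Cap reached without a PASS or FAIL.
  return { verdict: "ITERATE", rounds: MAX_ROUNDS };
}
```

With this shape, a run whose security role only passes on the third round returns `{ verdict: "PASS", rounds: 3 }`, and a run that never passes stops after round 10 with the last ITERATE verdict.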

Files

harness/
├── SKILL.md                 # Standard harness (update independently)
├── implementer-prompt.md    # Build/Fix/Polish modes
├── evaluator-prompt.md      # Standalone evaluator (used as reference)
├── handoff-template.md      # Structured handoff between build and eval
└── mode-e.md                # OPC bridge: multi-role eval + verdict synthesis

Usage

/opc harness <task>

Test plan

  • /opc harness triggers Mode E triage
  • Implementer builds code and writes .harness/ state
  • OPC roles receive Mode E evaluator template with their persona
  • Coordinator synthesizes PASS/ITERATE/FAIL correctly
  • Existing modes A-D unaffected (only 6 lines changed in skill.md)

Dazhen Pan added 4 commits March 31, 2026 13:39
Integrates the harness build-and-verify pipeline as OPC's 5th mode.
The implementer builds code, then OPC specialist roles evaluate it
from multiple angles. Coordinator synthesizes into PASS/ITERATE/FAIL
and drives fix-and-retest iteration until quality passes.

- Mode E triage with signal keywords
- Two-phase dispatch: Build then Multi-Role Evaluation
- Mode E evaluator template: OPC role persona + harness evaluation framework
- Verdict synthesis with iteration loop (10-round cap)
- Per-role report format with iteration history
- JSON schema extensions for harness state
- Implementer and handoff prompt templates in prompts/

Adds harness build-and-verify pipeline as OPC Mode E. The harness
skill lives in harness/ as independent files that can be updated
by replacing them. Only 6 lines added to skill.md (triage row +
pointer to harness/mode-e.md).

harness/mode-e.md bridges the two systems: harness implementer
builds code, OPC roles evaluate from multiple specialist angles,
coordinator synthesizes into PASS/ITERATE/FAIL.

iamtouchskyer pushed a commit that referenced this pull request Apr 15, 2026
…dd role tag check

- Escalate identical heading from warning to error (same heading = same agent)
- Add role/agent/reviewer tag extraction and comparison
- Eval files with identical Role: tags now produce hard error

Gap #1 of 10: review independence enforcement
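
As a rough illustration of the role tag check described in this commit, the sketch below extracts a tag from each eval file and reports a hard error when two files carry the same one. The `EvalFile` shape, the function names, the error wording, and the exact tag syntax (a line such as `Role: security`, with `Agent:` and `Reviewer:` treated the same way) are all assumptions; the commit message only states that identical `Role:` tags now produce an error.

```typescript
// Hypothetical shape for an evaluation file; not taken from the repository.
interface EvalFile {
  path: string;
  content: string;
}

// Assumed tag syntax: "Role: <name>", "Agent: <name>" or "Reviewer: <name>"
// on a line of its own, matched case-insensitively.
const TAG_PATTERN = /^[ \t]*(?:role|agent|reviewer)[ \t]*:[ \t]*(.+?)[ \t]*$/im;

// Returns the first tag found in the file, lowercased, or null if none.
function extractRoleTag(content: string): string | null {
  const match = TAG_PATTERN.exec(content);
  return match?.[1]?.toLowerCase() ?? null;
}

// Returns one error message per file whose tag was already used by an
// earlier file. An empty array means every tagged file is distinct.
function findDuplicateRoles(files: EvalFile[]): string[] {
  const firstSeen = new Map<string, string>(); // tag -> path of first file
  const errors: string[] = [];
  for (const file of files) {
    const tag = extractRoleTag(file.content);
    if (tag === null) continue; // untagged files are ignored in this sketch
    const earlier = firstSeen.get(tag);
    if (earlier !== undefined) {
      errors.push(`ERROR: ${file.path} and ${earlier} share role tag "${tag}"`);
    } else {
      firstSeen.set(tag, file.path);
    }
  }
  return errors;
}
```

Comparing lowercased tags means `Role: Security` and `role: security` count as the same reviewer, which seems to match the intent of enforcing review independence.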

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

iamtouchskyer added a commit that referenced this pull request Apr 19, 2026
Addresses 3 ITERATE findings from U1.6r contract + semantics reviewers:

1. fireArtifactEmit: recordSuccess was unconditionally resetting _failStreak
   after per-item write failures, so circuit-breaker would never trip on
   persistent write failures. Track anyItemFailed and only call recordSuccess
   when every item in the call succeeded. (semantics F1, contract #2)

2. fireArtifactEmit: accept ArrayBufferView (Uint8Array, DataView) in addition
   to string / Buffer. Modern APIs (crypto.subtle, TextEncoder, Playwright)
   commonly return Uint8Array — tight Buffer.isBuffer check was silently
   dropping them with a misleading WARN. (semantics F2)

3. cmdExtensionArtifact: add nodeCapabilities to stdout JSON for consistency
   with cmdExtensionVerdict. (contract #1)

4. CONTRIBUTING.md: document executeRun + artifactEmit hooks with sample
   skeleton + hook surface summary table. (contract #3)
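
The fix in item 1 can be illustrated with a small sketch. It is not the repository's `fireArtifactEmit`: the `Breaker` class, the `emitAll` function, the per-item failure counting, and the threshold are hypothetical. Only the idea of tracking `anyItemFailed` and calling `recordSuccess` solely when every item in the call succeeded comes from the commit message.

```typescript
// Hypothetical circuit breaker; the threshold value is an assumption.
class Breaker {
  failStreak = 0;
  readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  recordFailure(): void {
    this.failStreak += 1;
  }

  recordSuccess(): void {
    this.failStreak = 0;
  }

  get tripped(): boolean {
    return this.failStreak >= this.threshold;
  }
}

// Writes every item, counting each failed write. The streak is reset only
// when no item in this call failed, so persistent write failures can still
// trip the breaker. Returns true when all items were written.
function emitAll(
  breaker: Breaker,
  items: string[],
  write: (item: string) => void,
): boolean {
  if (breaker.tripped) return false; // skip writes once tripped
  let anyItemFailed = false;
  for (const item of items) {
    try {
      write(item);
    } catch {
      anyItemFailed = true;
      breaker.recordFailure();
    }
  }
  if (!anyItemFailed) breaker.recordSuccess();
  return !anyItemFailed;
}
```

Under the earlier behaviour described in the commit message, the reset ran unconditionally at the end of the call, which is why the streak could never accumulate.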

Regression tests: 4 new tests (Uint8Array accepted, _failStreak persists
across calls, success reset is all-or-nothing, CLI JSON includes
nodeCapabilities). Total 118/118 extension tests, 22/22 suite files green.
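
For item 2, one way to accept any `ArrayBufferView` is the standard `ArrayBuffer.isView` check. The sketch below is an illustration under assumptions: `normalizePayload` is a hypothetical name, and returning `null` for unsupported input stands in for whatever the real code does when it drops an item.

```typescript
// Hypothetical helper: accepts a string or any ArrayBufferView (Uint8Array,
// DataView, other typed arrays) and returns a string or a Uint8Array over
// the same bytes. Node's Buffer is a Uint8Array subclass, so this check
// covers it as well. Anything else yields null.
function normalizePayload(payload: unknown): string | Uint8Array | null {
  if (typeof payload === "string") return payload;
  if (ArrayBuffer.isView(payload)) {
    // Respect byteOffset/byteLength so a view over part of a larger
    // buffer yields only its own bytes. No copy is made.
    return new Uint8Array(payload.buffer, payload.byteOffset, payload.byteLength);
  }
  return null;
}
```

A `DataView` covering 4 bytes of an 8-byte buffer therefore normalizes to a 4-byte `Uint8Array`, while a plain number or object is rejected.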
@iamtouchskyer
Owner

Closing — superseded by #2 (already merged). This PR also has merge conflicts.
