feat: add Mode E (Harness) — build-and-verify with multi-role evaluation #1
Closed
bohemianpan wants to merge 4 commits into iamtouchskyer:main
added 4 commits on March 31, 2026, 13:39
Integrates the harness build-and-verify pipeline as OPC's 5th mode. The implementer builds code, then OPC specialist roles evaluate it from multiple angles. The coordinator synthesizes the results into PASS/ITERATE/FAIL and drives fix-and-retest iteration until quality passes.

- Mode E triage with signal keywords
- Two-phase dispatch: Build, then Multi-Role Evaluation
- Mode E evaluator template: OPC role persona + harness evaluation framework
- Verdict synthesis with iteration loop (10-round cap)
- Per-role report format with iteration history
- JSON schema extensions for harness state
- Implementer and handoff prompt templates in prompts/
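The verdict synthesis and capped iteration loop described in this commit message can be pictured with a small sketch. This is not the PR's code: the function names `synthesize` and `runHarness`, the rule that any FAIL outranks ITERATE, and the choice to report FAIL when the cap is reached are all assumptions made for illustration. Only the three verdict values and the 10-round cap come from the commit message.

```javascript
// Hypothetical sketch; names and the synthesis rule are assumptions, not the PR's code.
const MAX_ROUNDS = 10; // the 10-round cap mentioned in the commit message

// Assumed rule: any FAIL wins, otherwise any ITERATE wins, otherwise PASS.
function synthesize(roleVerdicts) {
  if (roleVerdicts.length === 0) return "FAIL"; // no evaluation, no evidence of quality
  if (roleVerdicts.includes("FAIL")) return "FAIL";
  if (roleVerdicts.includes("ITERATE")) return "ITERATE";
  return "PASS";
}

// build(round) returns an artifact; each evaluator maps an artifact to a verdict string.
function runHarness(build, evaluators) {
  const history = [];
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const artifact = build(round);
    const verdict = synthesize(evaluators.map((evaluate) => evaluate(artifact)));
    history.push({ round, verdict });
    // PASS and FAIL are terminal; only ITERATE triggers another build round.
    if (verdict !== "ITERATE") return { verdict, rounds: round, history };
  }
  // Assumption: still iterating when the cap is hit is reported as FAIL.
  return { verdict: "FAIL", rounds: MAX_ROUNDS, history };
}
```

In this sketch each evaluator stands in for one OPC role, and `history` plays the part of the per-role iteration history, reduced to one synthesized verdict per round.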
Adds harness build-and-verify pipeline as OPC Mode E. The harness skill lives in harness/ as independent files that can be updated by replacing them. Only 6 lines added to skill.md (triage row + pointer to harness/mode-e.md). harness/mode-e.md bridges the two systems: harness implementer builds code, OPC roles evaluate from multiple specialist angles, coordinator synthesizes into PASS/ITERATE/FAIL.
iamtouchskyer pushed a commit that referenced this pull request on Apr 15, 2026
…dd role tag check

- Escalate identical heading from warning to error (same heading = same agent)
- Add role/agent/reviewer tag extraction and comparison
- Eval files with identical Role: tags now produce hard error

Gap #1 of 10: review independence enforcement

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
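A rough sketch of the role tag check this commit describes follows. The function names, the exact regular expression, and the shape of the `evalFiles` input are assumptions for illustration; the commit message only states that role/agent/reviewer tags are extracted and compared, and that identical tags produce a hard error.

```javascript
// Hypothetical sketch; the tag syntax and data shapes are assumptions.

// Return the first "Role:", "Agent:" or "Reviewer:" tag value, normalized, or null.
function extractRoleTag(text) {
  const match = /^\s*(?:role|agent|reviewer)\s*:\s*(.+?)\s*$/im.exec(text);
  return match ? match[1].toLowerCase() : null;
}

// evalFiles: array of { name, text }. Returns a list of error strings.
function checkRoleIndependence(evalFiles) {
  const firstSeen = new Map(); // role tag -> name of the first file that used it
  const errors = [];
  for (const file of evalFiles) {
    const role = extractRoleTag(file.text);
    if (role === null) continue; // untagged files are not compared in this sketch
    if (firstSeen.has(role)) {
      errors.push(`${file.name} repeats role "${role}" already used by ${firstSeen.get(role)}`);
    } else {
      firstSeen.set(role, file.name);
    }
  }
  return errors;
}
```

Normalizing case and trailing whitespace before comparing is a design choice of this sketch, so that `Role: Security` and `role: security` count as the same reviewer.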
iamtouchskyer added a commit that referenced this pull request on Apr 19, 2026
Addresses 3 ITERATE findings from U1.6r contract + semantics reviewers:

1. fireArtifactEmit: recordSuccess was unconditionally resetting _failStreak after per-item write failures, so the circuit breaker would never trip on persistent write failures. Track anyItemFailed and only call recordSuccess when every item in the call succeeded. (semantics F1, contract #2)
2. fireArtifactEmit: accept ArrayBufferView (Uint8Array, DataView) in addition to string / Buffer. Modern APIs (crypto.subtle, TextEncoder, Playwright) commonly return Uint8Array; the tight Buffer.isBuffer check was silently dropping them with a misleading WARN. (semantics F2)
3. cmdExtensionArtifact: add nodeCapabilities to stdout JSON for consistency with cmdExtensionVerdict. (contract #1)
4. CONTRIBUTING.md: document executeRun + artifactEmit hooks with sample skeleton + hook surface summary table. (contract #3)

Regression tests: 4 new tests (Uint8Array accepted, _failStreak persists across calls, success reset is all-or-nothing, CLI JSON includes nodeCapabilities). Total 118/118 extension tests, 22/22 suite files green.
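The first two fixes can be illustrated with a minimal sketch. It is not the project's `fireArtifactEmit`: the `CircuitBreaker` class, `toBytes`, `emitArtifacts`, and the `writeItem` callback are hypothetical stand-ins, and skipping unsupported content types without counting them as failures is an assumption. The sketch only shows the two ideas from the commit message: reset the failure streak only when every item in a call succeeds, and accept any `ArrayBufferView` alongside strings and Buffers.

```javascript
// Hypothetical sketch; class and function names are assumptions, not the project's API.
class CircuitBreaker {
  constructor(threshold) {
    this.threshold = threshold;
    this._failStreak = 0;
  }
  recordSuccess() { this._failStreak = 0; }
  recordFailure() { this._failStreak += 1; }
  get open() { return this._failStreak >= this.threshold; }
}

// Accept string, Buffer, or any ArrayBufferView (Uint8Array, DataView, ...).
function toBytes(content) {
  if (typeof content === "string") return Buffer.from(content, "utf8");
  if (Buffer.isBuffer(content)) return content;
  if (ArrayBuffer.isView(content)) {
    // Wrap the same memory, respecting the view's offset and length.
    return Buffer.from(content.buffer, content.byteOffset, content.byteLength);
  }
  return null; // unsupported type
}

// items: array of { name, content }; writeItem(name, bytes) may throw.
function emitArtifacts(items, writeItem, breaker) {
  let anyItemFailed = false;
  const skipped = [];
  for (const item of items) {
    const bytes = toBytes(item.content);
    if (bytes === null) { skipped.push(item.name); continue; } // assumption: skip, do not fail
    try {
      writeItem(item.name, bytes);
    } catch (err) {
      anyItemFailed = true; // keep going, but remember the failure
    }
  }
  // All-or-nothing: the streak resets only if every write in this call succeeded.
  if (anyItemFailed) breaker.recordFailure(); else breaker.recordSuccess();
  return { ok: !anyItemFailed, skipped };
}
```

With this shape, a call in which one item writes successfully and another fails still increments the streak, so persistent write failures can eventually open the breaker.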
Owner
Closing: superseded by #2 (already merged). This PR also has merge conflicts.
Summary
- skill.md gains only a triage row and a pointer to harness/mode-e.md
- The harness skill lives in harness/ as independent files; update by replacing them
- harness/mode-e.md is the bridge: it defines how OPC roles replace harness's single evaluator

How it works
Files
Usage
Test plan
- /opc harness triggers Mode E triage
- .harness/state