docs: AW harness design plan (Pi SDK + extensions) #29667
Design plan for the 'aw' engine harness, a Pi SDK-based replacement for copilot_harness.cjs that adds multi-step DAG orchestration, multi-agent/multi-model support, context engineering, cost tracking, steering, session repair, and observability.

Key design decisions:
- Built on Pi agent ecosystem (pi-coding-agent, pi-agent-core, pi-ai)
- All gh-aw capabilities implemented as Pi extensions (ExtensionAPI)
- Model resolution via existing api-proxy (no provider inference)
- TypeScript compiled for Node 24, bundled as single .cjs
- New opt-in engine (engine: aw); existing engines untouched

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a design document for a proposed aw engine harness that would replace the current single-CLI retry wrapper with a Pi SDK-based orchestration layer for multi-step, multi-agent workflows.
Changes:
- Introduces the proposed AW harness architecture, including Pi extension-based capabilities such as MCP bridging, safe outputs, cost tracking, repair, checkpointing, and observability.
- Defines a draft workflow/frontmatter format for `engine: aw`, step extraction from markdown headings, and DAG-based execution.
- Documents an implementation/build/testing plan for packaging a bundled `aw_harness.cjs` into `actions/setup/js/`.
Show a summary per file
| File | Description |
|---|---|
| specs/aw-harness.md | New end-to-end design spec for the proposed AW engine, covering architecture, workflow format, extension model, runtime orchestration, and rollout/build plan. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 9

    pi.on("agent_end", async (event, ctx) => {
      await finalizeSafeOutputManifest();


    for (const stepGroup of dag.executionOrder()) {
      await Promise.all(stepGroup.map(async (step) => {
        const { session } = await createAgentSession({
          sessionManager: SessionManager.inMemory(),
          extensions,
          model: resolveModel(step.agent?.model || workflow.defaultModel),
          systemPrompt: buildSystemPrompt(step),
        });

        const prompt = buildStepPrompt(step, transcripts);
        await session.prompt(prompt);

        transcripts[step.name] = captureTranscript(session);
        session.dispose();
      }));

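
The loop in the excerpt above relies on `dag.executionOrder()` returning groups of steps that are safe to run concurrently, but the excerpt does not show how those groups are computed. One plausible approach is level-by-level topological layering. The sketch below assumes each step carries a `depends` list; `StepSpec` and this standalone `executionOrder` function are hypothetical names for illustration, not the spec's API.

```typescript
// Hypothetical step shape: the excerpt only shows `step.name` and `step.agent`;
// the `depends` field is an assumption based on the "depends" keyword in the spec.
interface StepSpec {
  name: string;
  depends: string[];
}

// Group steps into levels. Every step in a level depends only on steps from
// earlier levels, so each level could be passed to Promise.all.
function executionOrder(steps: StepSpec[]): string[][] {
  const known = new Set(steps.map((s) => s.name));
  for (const s of steps) {
    for (const d of s.depends) {
      if (!known.has(d)) {
        throw new Error(`step "${s.name}" depends on unknown step "${d}"`);
      }
    }
  }

  const done = new Set<string>();
  let remaining = steps.slice();
  const levels: string[][] = [];

  while (remaining.length > 0) {
    // A step is ready once all of its dependencies have completed.
    const ready = remaining.filter((s) => s.depends.every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can make progress: the remaining steps form a cycle.
      throw new Error(
        "dependency cycle among: " + remaining.map((s) => s.name).join(", "),
      );
    }
    levels.push(ready.map((s) => s.name));
    for (const s of ready) done.add(s.name);
    remaining = remaining.filter((s) => !done.has(s.name));
  }
  return levels;
}

const order = executionOrder([
  { name: "synthesize", depends: ["security", "style"] },
  { name: "security", depends: [] },
  { name: "style", depends: [] },
]);
console.log(JSON.stringify(order)); // [["security","style"],["synthesize"]]
```

Layering trades some parallelism for simplicity: a step waits for its whole predecessor level even if its own dependencies finished earlier. A scheduler that starts each step as soon as its dependencies resolve would avoid that, at the cost of more bookkeeping.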

    // index.ts: entry point
    import { createAgentSession, SessionManager } from "@mariozechner/pi-coding-agent";

    async function main() {
      const workflow = parseWorkflow(process.argv[2]);


    export default function(pi: ExtensionAPI) {
      const budget = loadBudgetConfig();
      let totalCost = 0;

      pi.on("turn_end", async (event, ctx) => {
        totalCost += extractCostFromTurn(event);


    1. **Scaffold project**: Initialize TypeScript project in `aw-harness/`. Configure package.json with Pi SDK deps (`@mariozechner/pi-coding-agent`, `pi-agent-core`, `pi-ai`). Set up tsconfig for ES2024/Node 24. Configure esbuild bundle → `dist/aw_harness.cjs`.

    2. **Implement api-proxy provider extension**: Pi extension that registers the api-proxy as a custom provider via `pi.registerProvider()`. Async factory fetches available models at startup. All model requests route through the proxy.


    **Step extraction**: Each `## Heading` or `### Heading` is a potential step. Linked to `harness.steps` via `<!-- harness-step: name -->` HTML comments.

    **Implicit behavior**:

    - No `<!-- harness-step -->` → sequential by default (document order)
    - With annotation → follows `harness.steps` config (parallel, depends, agent)

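
The extraction rule in the excerpt above can be sketched as a small line-oriented parser. This is only an illustration under assumptions: it treats the first `<!-- harness-step: name -->` comment found between a heading and the next heading as that section's annotation, and the names `ExtractedStep` and `extractSteps` are hypothetical.

```typescript
interface ExtractedStep {
  heading: string;
  name: string | null; // null when the section carries no harness-step annotation
  body: string;
}

// Level-2 and level-3 headings are step candidates, per the excerpt.
const HEADING = /^#{2,3}\s+(.*)$/;
// Assumed annotation syntax: <!-- harness-step: some-name -->
const ANNOTATION = /<!--\s*harness-step:\s*([\w-]+)\s*-->/;

function extractSteps(markdown: string): ExtractedStep[] {
  const steps: ExtractedStep[] = [];
  let current: { heading: string; name: string | null; lines: string[] } | null = null;

  const flush = () => {
    if (current) {
      steps.push({
        heading: current.heading,
        name: current.name,
        body: current.lines.join("\n").trim(),
      });
    }
  };

  for (const line of markdown.split("\n")) {
    const h = HEADING.exec(line);
    if (h) {
      flush(); // close the previous section before opening a new one
      current = { heading: h[1].trim(), name: null, lines: [] };
      continue;
    }
    if (!current) continue; // text before the first heading is not a step
    const a = ANNOTATION.exec(line);
    if (a && current.name === null) {
      current.name = a[1];
      continue;
    }
    current.lines.push(line);
  }
  flush();
  return steps;
}

// With no annotations at all, the excerpt's implicit rule applies:
// run the sections sequentially in document order.
function isImplicitSequential(steps: ExtractedStep[]): boolean {
  return steps.every((s) => s.name === null);
}
```

Returning `name: null` for unannotated sections keeps the "implicit sequential" decision out of the parser, so the caller can decide how to treat documents that mix annotated and unannotated headings.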

    for (const stepGroup of dag.executionOrder()) {
      await Promise.all(stepGroup.map(async (step) => {
        const { session } = await createAgentSession({
          sessionManager: SessionManager.inMemory(),


    | `engine: copilot` (existing) | Uses current `copilot_harness.cjs`, unchanged |
    | `engine: claude` (existing) | Uses current Claude Code flow, unchanged |
    | `engine: codex` (existing) | Uses current Codex flow, unchanged |
    | `engine: aw` without `harness:` block | Single-step: entire body = one Pi session prompt |
    | `engine: aw` with `harness:` block | Multi-step orchestration mode |
    | `engine: aw` with `harness.steps` | Explicit DAG (parallel, depends, agent assignment) |
    | `engine: aw` without `harness.agents` | All steps use `engine.model` |

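
Read as a decision procedure, the rows of the compatibility table above amount to a small mode selector. The sketch below is an assumption-laden illustration: `WorkflowConfig`, `HarnessMode`, and `selectMode` are hypothetical names, and the config shape is inferred from the keys named in the table rather than taken from the spec's schema.

```typescript
type HarnessMode = "legacy-engine" | "single-step" | "multi-step" | "explicit-dag";

// Assumed shape: only the keys mentioned in the table are modelled.
interface WorkflowConfig {
  engine: string;
  harness?: { steps?: unknown[]; agents?: unknown };
}

function selectMode(cfg: WorkflowConfig): HarnessMode {
  // copilot, claude, and codex keep their existing flows untouched.
  if (cfg.engine !== "aw") return "legacy-engine";
  // No harness block: the whole body becomes one session prompt.
  if (!cfg.harness) return "single-step";
  // Explicit steps: the author has declared the DAG.
  if (cfg.harness.steps && cfg.harness.steps.length > 0) return "explicit-dag";
  return "multi-step";
}

console.log(selectMode({ engine: "copilot" }));
console.log(selectMode({ engine: "aw" }));
console.log(selectMode({ engine: "aw", harness: { steps: [{}] } }));
```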

    apiKey: "AW_PROXY_API_KEY", // env var name; Pi resolves it
    api: "openai-completions", // api-proxy speaks OpenAI protocol

@copilot when this harness is selected, GH-proxy and CLI proxy features are always enabled. They cannot be disabled because the harness does not support MCP natively.
… native MCP) Agent-Logs-Url: https://github.com/github/gh-aw/sessions/01e7b2ba-9d7a-4706-a21a-f8b576ecb523 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Updated in 370d10c. The spec now documents in three places that
@copilot rewrite harness spec using the W3c-spec-writer
@copilot the harness has no predefined context; all ambient imported files or skills must be declared
- Add W3C-style title block, Abstract, Status of This Document
- Add RFC 2119 Conformance section with keyword table
- Add formal Terminology and Definitions section for all key terms
- Restructure Architecture as normative section with numbered Design Principles
- Add Harness Invocation Contract (entry point, env vars, exit codes, streams)
- Restructure Workflow Definition with formal sub-schemas for each harness.* key
- Add normative Overrides and Fixed Settings table (Section 6.2)
- Add formal DAG Execution Model with normative algorithm (Section 7)
- Restructure all 8 extensions with Requirements + non-normative examples
- Move Model Resolution to its own normative section (Section 9)
- Move Build/Deployment and Implementation Plan to informative section (Section 10)
- Preserve full backwards-compatibility table including gh-proxy override rows
- Add Security Considerations and Privacy Considerations sections
- Add Normative and Informative References section
- Fix duplicate Design Principle number (two items numbered 5)
- Fix env var inconsistency: use AWF_API_PROXY_TOKEN consistently throughout
- Make gh-proxy/cli-proxy always-on constraint a MUST NOT normative requirement
- Frame all code examples as non-normative with >[!NOTE] callouts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Clarify MUST NOT wording: 'MUST NOT honor attempts to disable'
- Add normative requirement to emit a stderr warning when cli-proxy or tools.github.mode overrides are applied, aiding author debugging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 678615c. The spec has been fully rewritten in W3C specification format with:
…ports must be declared Agent-Logs-Url: https://github.com/github/gh-aw/sessions/8df5106a-c84d-4c72-b43d-280243d2ebae Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Added §6.4 Initial Prompt Context in fe1e2a2. The new normative section states:

    **MCP bridge**
    : The `mcp-bridge` Pi extension (Extension 2) that translates MCP gateway tool definitions into Pi `AgentTool` instances, making them available to agent sessions without native MCP support in the Pi SDK.

@copilot not needed: MCP tools are already wrapped as CLIs. No MCP support is needed.
Done in 945545f. Removed Extension 2 (MCP Gateway Bridge) entirely — the spec now states that MCP tools are available to agent sessions as CLI commands on PATH via cli-proxy, so no Pi-level bridge is needed. Updated the architecture diagram, Design Principle 5, §7.3 execution summary, §10 project structure, and implementation plan accordingly. The extension count dropped from eight to seven.

    : The complete message history of a completed `AgentSession`, optionally summarized, passed as context to downstream steps.

    **workflow document**
    : A Markdown file with YAML frontmatter that declares an `engine: aw` workflow. The frontmatter **MUST** conform to the schema in [Section 6](#6-workflow-definition).

The front matter is pre-parsed by the gh-aw compiler; the aw harness will instead receive a config.json, a prompt.txt, and agent files.
Done in 945545f. Updated the spec throughout to reflect that the gh-aw compiler pre-processes the workflow markdown and provides the harness with config.json (parsed harness config), prompt.txt (extracted prompt body), and agent files. Key changes:
- §3 "workflow document" definition updated
- §5.1 Entry Point rewritten: `node aw_harness.cjs --config <config-path> --prompt <prompt-path> [--agent-file <agent-path>]`
- Normative requirement added: the harness MUST NOT read or parse workflow Markdown files directly
- §6.1 intro updated to clarify the compiler emits `config.json`; the harness consumes it at runtime
- §6.3 Step Extraction Algorithm updated to describe reading from `config.json` + `prompt.txt`
- §7.2 normative requirements and code example updated accordingly
…onfig.json+prompt.txt Agent-Logs-Url: https://github.com/github/gh-aw/sessions/c4564ad2-de2d-4e3d-8a0f-da65187abf59 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

    where:
    - `<config-path>` is the path to the compiler-generated `config.json` file containing the parsed harness configuration.
    - `<prompt-path>` is the path to the compiler-generated `prompt.txt` file containing the extracted prompt body.
    - `<agent-path>` is the optional path to a compiled agent file (`.md` with frontmatter stripped), provided when the workflow references an agent via `imports:`.

@copilot this information is in the config file, not needed
Done in 3844198. Removed --agent-file from the invocation command and its description. The config.json already contains resolved agent file paths, so no separate flag is needed.
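
After this change the invocation reduces to `node aw_harness.cjs --config <config-path> --prompt <prompt-path>`. A minimal sketch of validating those two flags follows. It assumes both flags are required and that any other argument is an error; the thread does not state either point, and `HarnessArgs` and `parseHarnessArgs` are hypothetical names.

```typescript
interface HarnessArgs {
  config: string; // path to the compiler-generated config.json
  prompt: string; // path to the compiler-generated prompt.txt
}

// Parses the argument list that follows the script name,
// e.g. process.argv.slice(2) when running under Node.
function parseHarnessArgs(argv: string[]): HarnessArgs {
  const values: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== "--config" && flag !== "--prompt") {
      // Assumption: unknown flags (including the removed --agent-file) are rejected.
      throw new Error(`unknown argument: ${flag}`);
    }
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("--")) {
      throw new Error(`missing value for ${flag}`);
    }
    values[flag.slice(2)] = value;
    i++; // skip the value we just consumed
  }
  if (!values.config) throw new Error("--config <config-path> is required");
  if (!values.prompt) throw new Error("--prompt <prompt-path> is required");
  return { config: values.config, prompt: values.prompt };
}

const args = parseHarnessArgs(["--config", "/tmp/config.json", "--prompt", "/tmp/prompt.txt"]);
console.log(args.config, args.prompt);
```

Rejecting unknown flags up front means a stale compiler that still passes `--agent-file` fails loudly instead of being silently ignored.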

    | `AWF_API_PROXY_URL` | Base URL of the api-proxy OpenAI-compatible endpoint | `http://localhost:8080/v1` |
    | `AWF_API_PROXY_TOKEN` | Bearer token for api-proxy authentication | *(required; no default)* |
    | `AWF_MODEL_ALIASES` | JSON string containing model alias configuration | *(empty; aliases resolved by proxy)* |

@copilot awf uses the environment from the various providers and must connect with those directly
Done in 3844198. Updated throughout to reflect that AWF injects provider-specific credentials (e.g., ANTHROPIC_API_KEY, OPENAI_API_KEY, GITHUB_TOKEN) into the container and the harness reads them directly. Changes: §5.2 env vars table replaced with a note about provider env vars; Design Principle 3 updated to "Direct provider connections"; Extension 1 renamed from "api-proxy Provider" to "Provider Setup"; §9 Model Resolution, §11 Security, §12 Privacy, and the architecture diagram all updated to remove the api-proxy middleman.

    > engine:
    > id: aw
    > model: sonnet # Model alias; api-proxy resolves
    > harness: aw_harness.cjs


    > You synthesize multiple review perspectives into a single,
    > prioritized action list.
    >
    > steps:

@copilot "steps" is confusing because it is a GitHub Actions idiom. Use different terminology, related to processes.
Done in 3844198. Renamed all harness execution unit references from "step/steps" to "task/tasks" throughout the spec: harness.steps → harness.tasks, <!-- harness-step: --> → <!-- harness-task: -->, "harness step" → "harness task", "step annotation" → "task annotation", section titles "Step Extraction Algorithm" → "Task Extraction Algorithm", "Step Execution Summary" → "Task Execution Summary", exit code table, observability events, checkpoint extension, backwards compatibility table, implementation plan, and project structure.
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e542825e-8ae2-4bb9-9736-f9ae2df187a5 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 355543b. Removed all subagent/multi-task machinery from the spec:
@copilot add a section on context observability.
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/44933735-2676-4d52-a7c8-b2d7b7628f90 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 9be73be. Added four normative sub-sections to §8.6 and updated §5.4:
- §8.6.1 JSONL Event Stream: unchanged existing JSONL + OTel requirements.
- §8.6.2 Context Provenance File: the extension MUST produce
- §8.6.3 GitHub Actions Step Summary: when
- §8.6.4 Per-Turn Token Consumption Output: the extension MUST emit a human-readable line to stderr after each

Also added a privacy note about the provenance file (metadata only, no raw prompt text) and expanded implementation plan item 10 to enumerate all four sub-requirements.
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d0716110-540a-4ed3-b2a2-39304ecdfa79 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in cbc260c. Added §4.3 Pi SDK and OpenClaw Extension Format Compatibility (non-normative):
- §4.3.1 API Surface Comparison: side-by-side table mapping Pi
- §4.3.2 Manifest Requirements: documents that OpenClaw plugins require
- §4.3.3 Compatibility Requirements: normative SHOULDs for writing Pi extensions in a way that eases future portability: isolate Pi SDK dependencies to the entry boundary, express tool definitions as plain objects, treat event source names as a configuration layer, avoid Pi-internal APIs with no OpenClaw equivalent.
- §4.3.4

Also updated the ToC, the §8 intro cross-reference, and added
…ensions Agent-Logs-Url: https://github.com/github/gh-aw/sessions/fee78d07-2ea8-4402-8baf-7257b063450c Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 4abef9b:
- §4.3 removed: the OpenClaw compatibility section, its ToC entry, §8 cross-reference, and informative reference are all gone.
- §6.1.4
- §7 updated: execution code example now shows
- §11 Security Considerations updated: added user extension isolation paragraph noting extensions run in-process, with guidance on auditing third-party npm packages.
- §10.8 Implementation Plan updated: new item 5a documents the user extension loader implementation.

    The `harness.budget` key is **OPTIONAL**. When present, it **MUST** contain:

    - `max-cost-usd` (number): Maximum total cost in USD for the run. The cost-tracker extension **MUST** abort the current session if this limit is exceeded.

@copilot use max-effective-tokens instead, as cost is not always known

    The `harness.budget` key is **OPTIONAL**. When present, it **MUST** contain:

    - `max-cost-usd` (number): Maximum total cost in USD for the run. The cost-tracker extension **MUST** abort the current session if this limit is exceeded.
    - `warn-at-percent` (number, 0–100): Percentage of `max-cost-usd` at which a steering warning **MUST** be injected.

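
The two budget keys in the excerpt above describe a threshold check that works the same way whether the unit is USD, as written there, or effective tokens, as the review comment requests. The sketch below is unit-agnostic and hypothetical throughout: `BudgetTracker`, `record`, and the status strings are illustrative names, and the choice to warn only once is an assumption the excerpt does not state.

```typescript
type BudgetStatus = "ok" | "warn" | "exceeded";

class BudgetTracker {
  private used = 0;
  private warned = false;
  private readonly limit: number;
  private readonly warnAtPercent: number;

  constructor(limit: number, warnAtPercent: number) {
    if (!(limit > 0)) throw new Error("limit must be positive");
    if (warnAtPercent < 0 || warnAtPercent > 100) {
      throw new Error("warnAtPercent must be within 0-100");
    }
    this.limit = limit;
    this.warnAtPercent = warnAtPercent;
  }

  // Record consumption for one turn and report what the caller should do:
  // "warn" means inject a steering warning, "exceeded" means abort the session.
  record(amount: number): BudgetStatus {
    this.used += amount;
    if (this.used > this.limit) return "exceeded";
    const threshold = (this.limit * this.warnAtPercent) / 100;
    if (!this.warned && this.used >= threshold) {
      this.warned = true; // assumption: the warning is injected only once
      return "warn";
    }
    return "ok";
  }

  get total(): number {
    return this.used;
  }
}

const tracker = new BudgetTracker(100, 80);
console.log(tracker.record(50)); // ok
console.log(tracker.record(30)); // warn (80 of 100 reached)
console.log(tracker.record(10)); // ok (already warned)
console.log(tracker.record(20)); // exceeded (110 > 100)
```

In a Pi extension this check would presumably run from the `turn_end` handler shown in the earlier excerpt, with the returned status deciding whether to steer or abort.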

    A conforming implementation **MUST** source every item included in the session's initial prompt from one of the following explicitly declared origins:

    - The Markdown body from `prompt.txt` (loaded per [Section 6.3](#63-prompt-loading)).
    - The `harness.system` prompt if declared in the `harness:` frontmatter block.


    **Requirements:**

    - The extension **MUST** call `pi.registerProvider()` for each LLM provider whose credentials are present in the environment (e.g., `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GITHUB_TOKEN`).

@copilot look for the base url env var as well
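
A sketch of what this request could look like in practice: scan the environment for each provider's credential and, when present, also pick up an optional base URL override. The credential variable names come from the requirement quoted above. The `*_BASE_URL` variable names are assumptions, since the thread does not say which names AWF injects, and `ProviderConfig` and `detectProviders` are hypothetical. The result is plain data; handing it to `pi.registerProvider()` is left out because that call's signature is not shown in the thread.

```typescript
interface ProviderConfig {
  id: string;
  apiKey: string;
  baseUrl?: string; // only set when an override is present in the environment
}

// Credential names are from the spec excerpt; base URL names are assumed.
const KNOWN_PROVIDERS = [
  { id: "anthropic", keyVar: "ANTHROPIC_API_KEY", baseUrlVar: "ANTHROPIC_BASE_URL" },
  { id: "openai", keyVar: "OPENAI_API_KEY", baseUrlVar: "OPENAI_BASE_URL" },
];

function detectProviders(env: Record<string, string | undefined>): ProviderConfig[] {
  const found: ProviderConfig[] = [];
  for (const p of KNOWN_PROVIDERS) {
    const apiKey = env[p.keyVar];
    if (!apiKey) continue; // no credentials: this provider is not registered
    const baseUrl = env[p.baseUrlVar];
    found.push(baseUrl ? { id: p.id, apiKey, baseUrl } : { id: p.id, apiKey });
  }
  return found;
}

const providers = detectProviders({
  ANTHROPIC_API_KEY: "sk-example",
  ANTHROPIC_BASE_URL: "http://localhost:9000",
});
console.log(providers.map((p) => `${p.id} -> ${p.baseUrl ?? "(default endpoint)"}`));
```

Taking the environment as a parameter, rather than reading `process.env` inside the function, keeps the detection logic easy to test.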

    > }
    > ```

    ### 8.2 Extension 2: Safe Outputs

@copilot safe outputs are CLI tools; no special support needed

    ### 8.6 Extension 6: Observability

    **Purpose:** Emits structured event streams to stderr, writes a context provenance file for downstream analysis, renders a Markdown step summary, and reports per-turn token consumption.

…dget/safe-outputs design Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d1715722-ea93-4860-873f-802b6fab33c5 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…mbering Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d1715722-ea93-4860-873f-802b6fab33c5 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 1af3ca4:
- OpenClaw section removed: §4.3 and its ToC entry, §8 cross-reference, and informative reference are gone.
- Pi SDK-compatible extension declaration strategy added as §6.1.4

Also fixed in the same pass per other review comments:
AW Harness Design Plan

Design plan for the `aw` engine harness, a Pi SDK-based replacement for `copilot_harness.cjs` that adds multi-step DAG orchestration, multi-agent/multi-model support, context engineering, cost tracking, steering, session repair, and observability.

Key Design Decisions
- `@mariozechner/pi-coding-agent`, `pi-agent-core`, `pi-ai` as the runtime
- `ExtensionAPI`
- `.cjs` in `actions/setup/js/`
- `engine: aw` is a new choice; existing engines (`copilot`, `claude`, `codex`) are untouched

Architecture

Files
- `specs/aw-harness.md` - Full design document

This is a design-only PR - no implementation code yet.