[subagent-optimizer] Optimize spec-enforcer — 2026-05-16 #32639

### Target Workflow

**File**: `.github/workflows/spec-enforcer.md`
**Engine**: claude
**7-day token usage**: ~4,684,153 tokens across 1 run (~4,684,153 avg/run)

### Why This Workflow

`spec-enforcer` is the highest-token Claude-engine workflow in the past 7 days that does **not** already use inline sub-agents. Its prompt body has six distinct phases (cache init, package selection, README extraction, test generation, test validation, PR/noop emission), several of which are extractive or classificatory tasks that a smaller model can handle without losing fidelity. Phase 2 also runs per package (2–3 per scheduled run, or all packages in full-sweep mode), making per-package work a strong parallelism target.

---

## Optimization 2 — Inline Sub-Agents

### LLM Expert Reasoning

- **Phase 0 PR-body parsing** is a closed-form regex extraction on a single PR body — pure pattern matching with no cross-section context required. Textbook small-model task.
- **Phase 2 README spec extraction** is structured summarization on a single file per call. The main agent only needs the extracted structure (Public API, behavioral contracts, examples, edge cases) — it does not need the full README in its own context. This work is identical across the 2–3 selected packages and can be issued in parallel.
- **Phase 4 test-output classification** parses `go build` and `go test` output to bucket each failure into a predefined category (compile error, missing symbol, signature mismatch, assertion failure, panic, other). Classification against a fixed taxonomy is the canonical small-model use case.
- Scoring dimensions that drove selection: high **haiku-adequacy** (extractive/classificatory work), high **independence** (each candidate consumes a single bounded input), and bonus **parallelism** for Phase 2 (multiple packages per run).
- Test generation (Phase 3) and PR-body authoring (Phase 5) were **not** selected — they require domain judgment and are the authoritative outputs of the workflow.

### Proposed Sub-Agents

#### 1. `rotation-state-recoverer` (`small`)

**Extracted task**: Parse a merged PR body to recover round-robin rotation state (the `last_packages` list and `last_index`) when `rotation.json` is missing.
**Why small**: Pure regex-driven extraction over a single PR body — the heuristic "extracting specific fields from structured/semi-structured text" fires directly.
**Score**: 9/10 (independence: 3, model-adequacy: 3, parallelism: 1, size: 2)
**Estimated savings**: ~3–6% of main-model tokens/run (small but cleanly separable)

<details>
<summary>Agent definition (copy-paste ready)</summary>

```markdown
## agent: `rotation-state-recoverer`
---
description: Recover spec-enforcer rotation state from a merged PR body.
model: small
---
You receive a PR body (markdown) and the current list of eligible package names.

1. Find the line matching `^- \*\*Next packages in rotation\*\*:\s*(...)$` and capture the comma-separated package list. Tolerate surrounding whitespace.
2. Split by comma, trim each entry, discard empty entries.
3. Build a map `eligible_package -> eligible_list_index`. Scan recovered packages left-to-right; keep the index of the last recovered package that exists in the eligible map. If none match, use `-1`.
4. Output JSON only — no prose — with fields: `last_packages` (array), `last_index` (number), `last_run` (string, "unknown" if not given), `total_eligible` (number, from input).

If the line is missing or unparsable, output `{"last_index": -1, "last_packages": [], "last_run": "unknown", "total_eligible": <input>}`.
```

</details>
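Because steps 1 to 4 of this agent are deterministic, its JSON output can be cross-checked against a plain reference implementation. The following is a minimal Python sketch and not part of the workflow: the helper name `recover_rotation_state` is hypothetical, and the regex is the one quoted in the original Phase 0 prompt.

```python
import re

# Pattern quoted in the original Phase 0 prompt; MULTILINE lets ^ and $
# anchor to the single matching line inside a multi-line PR body.
ROTATION_LINE = re.compile(
    r"^- \*\*Next packages in rotation\*\*:\s*"
    r"([A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*"
    r"(?:\s*,\s*[A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*)*)\s*$",
    re.MULTILINE,
)


def recover_rotation_state(pr_body, eligible):
    """Mirror steps 1-4 of the agent definition (hypothetical helper)."""
    state = {
        "last_index": -1,
        "last_packages": [],
        "last_run": "unknown",
        "total_eligible": len(eligible),
    }
    match = ROTATION_LINE.search(pr_body)
    if match is None:
        return state  # line missing or unparsable: fallback state

    # Step 2: split by comma, trim each entry, discard empty entries.
    recovered = [name.strip() for name in match.group(1).split(",") if name.strip()]
    state["last_packages"] = recovered

    # Step 3: keep the eligible-list index of the last recovered package
    # that is still eligible; the index stays -1 when none match.
    index_of = {name: i for i, name in enumerate(eligible)}
    for name in recovered:
        if name in index_of:
            state["last_index"] = index_of[name]
    return state
```

Comparing this result with the agent's JSON on a few historical PR bodies would be one way to gain confidence before relying on the small model alone.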

**Invocation change in main prompt:**

Before (inside Phase 0 "Initialize or Load", step 3):
```
3. If `rotation.json` is missing or empty, recover round-robin state from the most recently merged PR with the `pkg-specifications` label:
   - Use `gh pr list --repo ${{ github.repository }} --state merged --label pkg-specifications --limit 1 --json number,body` to find the latest merged PR in this repository
   - Parse this line from the PR body:
     - `- **Next packages in rotation**: <list>`
   - Use this matching pattern:
     - `^- \*\*Next packages in rotation\*\*:\s*([A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*(?:\s*,\s*[A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*)*)\s*$`
   ... (~40 more lines)
```

After:
```
3. If `rotation.json` is missing or empty, fetch the most recently merged PR with `gh pr list --repo ${{ github.repository }} --state merged --label pkg-specifications --limit 1 --json number,body,mergedAt`, then use the `rotation-state-recoverer` agent — pass it the PR body text and the eligible package list — to produce the rotation JSON. Write the agent's output to `rotation.json` (set `last_run` to the PR's `mergedAt` UTC date). If no matching PR exists, write the fallback state `{"last_index": -1, "last_packages": [], "last_run": "unknown", "total_eligible": N}`.
```
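The post-processing in the revised step 3 (stamping `last_run` from `mergedAt`, or writing the fallback state) can be sketched as follows. This is an illustrative Python sketch only: the helper name `finalize_rotation_state` is hypothetical, and it assumes that `gh` prints `mergedAt` as an ISO 8601 timestamp ending in `Z` and that "UTC date" means `YYYY-MM-DD`.

```python
import json
from datetime import datetime, timezone


def finalize_rotation_state(gh_output, agent_state, total_eligible):
    """Combine `gh pr list --json number,body,mergedAt` output with the agent's JSON."""
    prs = json.loads(gh_output)
    if not prs:
        # No matching merged PR: write the fallback state from step 3.
        return {
            "last_index": -1,
            "last_packages": [],
            "last_run": "unknown",
            "total_eligible": total_eligible,
        }
    # Assumption: mergedAt looks like "2026-05-15T23:10:00Z".
    merged_at = datetime.fromisoformat(prs[0]["mergedAt"].replace("Z", "+00:00"))
    state = dict(agent_state)  # copy so the agent's output is left untouched
    state["last_run"] = merged_at.astimezone(timezone.utc).date().isoformat()
    return state
```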

#### 2. `readme-spec-extractor` (`small`)

**Extracted task**: Read a single package's README.md and emit a compact structured representation of the public API contract for test generation.
**Why small**: "Summarizing a single file" + "extracting specific fields" — exactly the heuristics for small-model work. Heavy lifting (test generation) stays with the main model.
**Score**: 10/10 (independence: 3, model-adequacy: 3, parallelism: 2, size: 2)
**Estimated savings**: ~12–20% of main-model tokens/run (largest impact — runs per package and the full README never enters the main context)

<details>
<summary>Agent definition (copy-paste ready)</summary>

```markdown
## agent: `readme-spec-extractor`
---
description: Extract structured API contract from a Go package README.md.
model: small
---
You are given the full contents of a single Go package README.md.

Emit a JSON object with these fields (omit any field that the README does not document):
- `public_api`: list of `{name, kind: "func"|"type"|"const", documented_signature_or_value, behavior_summary}` items
- `behavioral_contracts`: list of short bullet strings (one obligation each)
- `usage_examples`: list of `{label, input, expected_output}` items, verbatim from the README where possible
- `design_constraints`: list of short bullet strings (thread safety, error handling, etc.)
- `edge_cases`: list of short bullet strings (documented limitations)
- `ambiguities`: list of short bullet strings — any places the spec is unclear and a test will need to make assumptions

Do not invent details that the README does not state. Output JSON only.
```

</details>
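Since the main agent treats this JSON as its source of truth, a cheap shape check before Phase 3 may be worthwhile. The sketch below is an assumption about how such a check could look, not something the workflow defines: `validate_spec` is a hypothetical helper, and it only enforces what the agent definition states (the six optional fields and the three allowed `kind` values).

```python
import json

STRING_LIST_FIELDS = {"behavioral_contracts", "design_constraints", "edge_cases", "ambiguities"}
OBJECT_LIST_FIELDS = {"public_api", "usage_examples"}
ALLOWED_KINDS = {"func", "type", "const"}


def validate_spec(raw):
    """Return a list of problems; an empty list means the shape looks right."""
    try:
        spec = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(spec, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    # Every field is optional, but nothing outside the documented set is expected.
    for key in sorted(set(spec) - STRING_LIST_FIELDS - OBJECT_LIST_FIELDS):
        problems.append(f"unexpected field: {key}")
    for key in sorted(STRING_LIST_FIELDS & set(spec)):
        value = spec[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key}: expected a list of strings")
    for key in sorted(OBJECT_LIST_FIELDS & set(spec)):
        value = spec[key]
        if not isinstance(value, list) or not all(isinstance(v, dict) for v in value):
            problems.append(f"{key}: expected a list of objects")

    api = spec.get("public_api")
    if isinstance(api, list):
        for i, item in enumerate(api):
            if isinstance(item, dict) and item.get("kind") not in ALLOWED_KINDS:
                problems.append(f"public_api[{i}]: kind must be func, type, or const")
    return problems
```

A non-empty result could be a signal to re-invoke the extractor or to fall back to reading the README directly.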

**Invocation change in main prompt:**

Before (Phase 2 "Step 1: Read the README.md" and "Step 2"):
````
### Step 1: Read the README.md

```bash
cat pkg/<package>/README.md
```

Extract from the specification:
- **Public API**: Functions, types, constants documented
- **Behavioral contracts**: What each function MUST do
- **Usage examples**: Expected input/output patterns
- **Design constraints**: Thread safety, error handling, etc.
- **Edge cases**: Documented limitations or special behavior

### Step 2: Minimal Source Code Reading
...
````

After:
```
### Step 1: Extract the specification

For each selected package, invoke the `readme-spec-extractor` agent in parallel — pass it the contents of `pkg/<package>/README.md`. Use the returned JSON as the source of truth when generating tests in Phase 3.

### Step 2: Minimal Source Code Reading
... (unchanged)
```

#### 3. `test-output-classifier` (`small`)

**Extracted task**: Parse `go build` / `go test` output and classify each failure into a fixed taxonomy.
**Why small**: Classification with a predefined category set — the heuristic "classifying items into a predefined set of categories" fires.
**Score**: 8/10 (independence: 3, model-adequacy: 3, parallelism: 1, size: 1)
**Estimated savings**: ~4–8% of main-model tokens/run (raw `go test` output never enters the main context)

<details>
<summary>Agent definition (copy-paste ready)</summary>

```markdown
## agent: `test-output-classifier`
---
description: Classify go test/go build failures into a fixed taxonomy.
model: small
---
You receive raw `go build` and `go test` output for a single package.

For each failure, emit one entry with these fields:
- `test_or_symbol`: the test function name or compile symbol
- `category`: one of `compile_error`, `missing_symbol`, `signature_mismatch`, `assertion_failure`, `panic`, `other`
- `evidence`: one verbatim line from the output that justifies the category
- `suggested_action`: one of `fix_test`, `flag_spec_mismatch`, `flag_spec_ambiguity`, `investigate`

Also emit a top-level `summary`: `{total_failures, by_category: {...}, all_passing: bool}`.

Output JSON only — no prose. If output shows all tests passing, emit `{"summary": {"total_failures": 0, "by_category": {}, "all_passing": true}, "failures": []}`.
```

</details>
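The classifier's report is internally redundant (the `summary` can be recomputed from `failures`, and each `evidence` line should appear verbatim in the raw output), which makes a quick consistency check possible. The following Python sketch is a hypothetical illustration; `check_classifier_report` is not part of the workflow, and the top-level `failures` key is taken from the all-passing example in the agent definition.

```python
from collections import Counter

CATEGORIES = {"compile_error", "missing_symbol", "signature_mismatch",
              "assertion_failure", "panic", "other"}
ACTIONS = {"fix_test", "flag_spec_mismatch", "flag_spec_ambiguity", "investigate"}


def check_classifier_report(report, raw_output):
    """Return a list of problems found in the classifier's JSON report."""
    problems = []
    failures = report.get("failures", [])
    output_lines = {line.strip() for line in raw_output.splitlines()}

    for i, failure in enumerate(failures):
        if failure.get("category") not in CATEGORIES:
            problems.append(f"failures[{i}]: unknown category")
        if failure.get("suggested_action") not in ACTIONS:
            problems.append(f"failures[{i}]: unknown suggested_action")
        # Evidence must be one verbatim line (ignoring indentation).
        if failure.get("evidence", "").strip() not in output_lines:
            problems.append(f"failures[{i}]: evidence is not a verbatim output line")

    # Recompute the summary from the failures and compare, ignoring zero counts.
    summary = report.get("summary", {})
    reported_counts = {k: v for k, v in summary.get("by_category", {}).items() if v}
    expected_counts = dict(Counter(f.get("category") for f in failures))
    if (summary.get("total_failures") != len(failures)
            or summary.get("all_passing") != (not failures)
            or reported_counts != expected_counts):
        problems.append("summary does not match the failures list")
    return problems
```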

**Invocation change in main prompt:**

Before (Phase 4):
````
After generating tests, validate they compile and pass:

```bash
go build ./pkg/<package>/...
go test -v -run "TestSpec" ./pkg/<package>/
```

If tests fail:
1. Re-read the specification section that the test maps to
2. Verify the test matches the specification (not implementation)
3. If the specification is ambiguous, add a `// SPEC_AMBIGUITY: <description>` comment in the test
4. If the implementation doesn't match the specification, add a `// SPEC_MISMATCH: <description>` comment and document it in the PR body
````

After:
```
After generating tests, run `go build ./pkg/<package>/...` and `go test -v -run "TestSpec" ./pkg/<package>/`, then pass both outputs to the `test-output-classifier` agent. Use the returned JSON to decide per failure: `fix_test` → revise the test against the spec; `flag_spec_ambiguity` → add `// SPEC_AMBIGUITY: <description>`; `flag_spec_mismatch` → add `// SPEC_MISMATCH: <description>` and document it in the PR body; `investigate` → re-read the spec section before deciding.
```
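The per-failure decision rule above amounts to a small dispatch table. A minimal Python sketch of that mapping follows; `plan_follow_ups` is a hypothetical helper, the evidence line merely stands in for `<description>`, and treating unexpected action values like `investigate` is an assumption.

```python
COMMENT_PREFIX = {
    "flag_spec_ambiguity": "// SPEC_AMBIGUITY: ",
    "flag_spec_mismatch": "// SPEC_MISMATCH: ",
}


def plan_follow_ups(report):
    """Group the classifier's failures by the follow-up each one needs."""
    plan = {"revise_tests": [], "comments": [], "pr_body_notes": [], "reread_spec": []}
    for failure in report.get("failures", []):
        name = failure["test_or_symbol"]
        action = failure["suggested_action"]
        if action == "fix_test":
            plan["revise_tests"].append(name)
        elif action in COMMENT_PREFIX:
            # The evidence line stands in for <description>; the main agent
            # would normally write a fuller description.
            plan["comments"].append((name, COMMENT_PREFIX[action] + failure["evidence"]))
            if action == "flag_spec_mismatch":
                # Mismatches are also documented in the PR body.
                plan["pr_body_notes"].append(name)
        else:
            # "investigate", plus any unexpected value (assumption):
            # re-read the spec section before deciding.
            plan["reread_spec"].append(name)
    return plan
```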

### Estimated Impact

| Metric | Before | After (estimated) |
|---|---|---|
| Avg tokens/run (main model) | ~4.68M | ~3.5–3.7M (~20–25% reduction) |
| Main-model context saved | — | ~1.0–1.2M tokens/run |
| Parallelism opportunity | None | 2–3 `readme-spec-extractor` calls in parallel per run |

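The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sanity-check sketch: the report does not say how the per-agent estimates combine, so the plain sum used here is an assumption, and the names are hypothetical. Under that assumption the three ranges add up to 19–34%, so the headline 20–25% sits toward the conservative end.

```python
BASELINE_TOKENS = 4_684_153  # 7-day usage for the single observed run

# Per-agent savings ranges from the report, in whole percent (low, high).
PER_AGENT_SAVINGS = {
    "rotation-state-recoverer": (3, 6),
    "readme-spec-extractor": (12, 20),
    "test-output-classifier": (4, 8),
}


def summed_savings():
    """Sum the low and high ends of the per-agent ranges (assumes they add)."""
    low = sum(lo for lo, _ in PER_AGENT_SAVINGS.values())
    high = sum(hi for _, hi in PER_AGENT_SAVINGS.values())
    return low, high


def tokens_after(reduction_percent):
    """Main-model tokens per run after the given percentage reduction."""
    return BASELINE_TOKENS * (100 - reduction_percent) // 100


if __name__ == "__main__":
    print(summed_savings())                    # (19, 34)
    print(tokens_after(20), tokens_after(25))  # 3747322 3513114
```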
### Implementation Steps

1. Add the three sub-agent blocks at the bottom of `.github/workflows/spec-enforcer.md`, after all workflow content.
2. Update the three prompt sections shown above to invoke the sub-agents by name.
3. Compile: `gh aw compile spec-enforcer`
4. Test: `gh workflow run spec-enforcer.lock.yml`
5. After one scheduled run, compare token usage against the 4.68M baseline.

### References

- Optimizer run: https://github.com/github/gh-aw/actions/runs/25964590030

> Generated by [⚡ Daily Sub-Agent Optimizer](https://github.com/github/gh-aw/actions/runs/25964590030) · ● 7.1M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-subagent-optimizer%22&type=issues)
> - [x] expires on May 23, 2026, 2:45 PM UTC
