[subagent-optimizer] Optimize spec-enforcer — 2026-05-16 #32639

@github-actions

Target Workflow

File: .github/workflows/spec-enforcer.md
Engine: claude
7-day token usage: ~4,684,153 tokens across 1 run (~4,684,153 avg/run)

Why This Workflow

spec-enforcer is the highest-token Claude-engine workflow in the past 7 days that does not already use inline sub-agents. Its prompt body has six distinct phases (cache init, package selection, README extraction, test generation, test validation, PR/noop emission), several of which are extractive or classificatory tasks that a smaller model can handle without losing fidelity. Phase 2 also runs per package (2–3 per scheduled run, or all packages in full-sweep mode), making per-package work a strong parallelism target.


Optimization 2 — Inline Sub-Agents

LLM Expert Reasoning

  • Phase 0 PR-body parsing is a closed-form regex extraction on a single PR body — pure pattern matching with no cross-section context required. Textbook small-model task.
  • Phase 2 README spec extraction is structured summarization on a single file per call. The main agent only needs the extracted structure (Public API, behavioral contracts, examples, edge cases) — it does not need the full README in its own context. This work is identical across the 2–3 selected packages and can be issued in parallel.
  • Phase 4 test-output classification parses go build / go test stderr/stdout to bucket each failure into a predefined category (compile error, missing symbol, signature mismatch, assertion failure, panic, other). Classification with a fixed taxonomy is the canonical small-model use case.
  • Scoring dimensions that drove selection: high model-adequacy (extractive/classificatory work that a small model such as Haiku handles well), high independence (each candidate consumes a single bounded input), and bonus parallelism for Phase 2 (multiple packages per run).
  • Test generation (Phase 3) and PR-body authoring (Phase 5) were not selected — they require domain judgment and are the authoritative outputs of the workflow.
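The Phase 2 fan-out described above can be pictured with a minimal sketch. It assumes the per-package extraction is exposed as an ordinary callable; `read_readme` and `extract_spec` are hypothetical stand-ins for reading pkg/<package>/README.md and for one sub-agent call, not part of gh-aw.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def extract_specs_in_parallel(
    packages: List[str],
    read_readme: Callable[[str], str],
    extract_spec: Callable[[str], dict],
    max_workers: int = 3,
) -> Dict[str, dict]:
    """Run one extraction per package concurrently and key results by package.

    Both callables are hypothetical: `read_readme` returns a package's README
    text, `extract_spec` represents a single sub-agent invocation.
    """
    def run_one(package: str) -> dict:
        # Each call sees only its own README, which is what makes it independent.
        return extract_spec(read_readme(package))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `packages`.
        results = list(pool.map(run_one, packages))
    return dict(zip(packages, results))
```

Because every call consumes a single bounded input, the main agent only ever holds the returned structures, never the README text itself.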

Proposed Sub-Agents

1. rotation-state-recoverer (small)

Extracted task: Parse a merged PR body to recover round-robin rotation state (last_packages list + last_index) when rotation.json is missing.
Why small: Pure regex-driven extraction over a single PR body — the heuristic "extracting specific fields from structured/semi-structured text" fires directly.
Score: 9/10 (independence: 3, model-adequacy: 3, parallelism: 1, size: 2)
Estimated savings: ~3–6% of main-model tokens/run (small but cleanly separable)

Agent definition (copy-paste ready)
## agent: `rotation-state-recoverer`
---
description: Recover spec-enforcer rotation state from a merged PR body.
model: small
---
You receive a PR body (markdown) and the current list of eligible package names.

1. Find the line matching `^- \*\*Next packages in rotation\*\*:\s*(...)$` and capture the comma-separated package list. Tolerate surrounding whitespace.
2. Split by comma, trim each entry, discard empty entries.
3. Build a map `eligible_package -> eligible_list_index`. Scan recovered packages left-to-right; keep the index of the last recovered package that exists in the eligible map. If none match, use `-1`.
4. Output JSON only — no prose — with fields: `last_packages` (array), `last_index` (number), `last_run` (string, "unknown" if not given), `total_eligible` (number, from input).

If the line is missing or unparsable, output `{"last_index": -1, "last_packages": [], "last_run": "unknown", "total_eligible": <input>}`.
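For reviewers, steps 1 to 4 and the fallback above can be sketched deterministically as follows. This is an illustration of the expected behavior, not part of the agent definition. The capture group reuses the package-name pattern quoted in the current Phase 0 prompt, restricted to spaces and tabs so a match cannot spill onto the next line; that restriction and the function name are assumptions.

```python
import re
from typing import List

# Package-name pattern taken from the existing Phase 0 prompt.
NAME = r"[A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*"
LINE_RE = re.compile(
    r"^- \*\*Next packages in rotation\*\*:[ \t]*"
    r"(" + NAME + r"(?:[ \t]*,[ \t]*" + NAME + r")*)[ \t]*$",
    re.MULTILINE,
)


def recover_rotation_state(pr_body: str, eligible: List[str], last_run: str = "unknown") -> dict:
    """Return the rotation state described in steps 1-4, or the fallback state."""
    state = {
        "last_index": -1,
        "last_packages": [],
        "last_run": last_run,
        "total_eligible": len(eligible),
    }
    # Normalize CRLF line endings so `$` matches at the end of the marker line.
    match = LINE_RE.search(pr_body.replace("\r\n", "\n"))
    if match is None:
        return state  # line missing or unparsable: fallback state

    recovered = [p.strip() for p in match.group(1).split(",") if p.strip()]
    index_of = {name: i for i, name in enumerate(eligible)}
    for name in recovered:
        if name in index_of:
            # Overwritten on every hit, so the last recovered match wins.
            state["last_index"] = index_of[name]
    state["last_packages"] = recovered
    return state
```

A sketch like this can also serve as a reference oracle when spot-checking the agent's JSON on past PR bodies.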

Invocation change in main prompt:

Before (inside Phase 0 "Initialize or Load", step 3):

3. If `rotation.json` is missing or empty, recover round-robin state from the most recently merged PR with the `pkg-specifications` label:
   - Use `gh pr list --repo ${{ github.repository }} --state merged --label pkg-specifications --limit 1 --json number,body` to find the latest merged PR in this repository
   - Parse this line from the PR body:
     - `- **Next packages in rotation**: <list>`
   - Use this matching pattern:
     - `^- \*\*Next packages in rotation\*\*:\s*([A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*(?:\s*,\s*[A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*)*)\s*$`
   ... (~40 more lines)

After:

3. If `rotation.json` is missing or empty, fetch the most recently merged PR with `gh pr list --repo ${{ github.repository }} --state merged --label pkg-specifications --limit 1 --json number,body,mergedAt`, then use the `rotation-state-recoverer` agent — pass it the PR body text and the eligible package list — to produce the rotation JSON. Write the agent's output to `rotation.json` (set `last_run` to the PR's `mergedAt` UTC date). If no matching PR exists, write the fallback state `{"last_index": -1, "last_packages": [], "last_run": "unknown", "total_eligible": N}`.
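The post-processing in the "After" step (take the agent's JSON, then stamp `last_run` from `mergedAt`) might look like the sketch below. It assumes `mergedAt` arrives as an ISO 8601 timestamp such as `2026-05-10T23:30:00Z` and that "UTC date" means `YYYY-MM-DD`; the function name and the choice to fall back when the agent output is not a JSON object are also assumptions.

```python
import json
from datetime import datetime, timezone


def finalize_rotation_state(agent_output: str, merged_at: str, total_eligible: int) -> dict:
    """Parse the sub-agent's JSON and set `last_run` to the PR's merge date in UTC."""
    fallback = {
        "last_index": -1,
        "last_packages": [],
        "last_run": "unknown",
        "total_eligible": total_eligible,
    }
    try:
        state = json.loads(agent_output)
    except json.JSONDecodeError:
        return fallback  # assumption: treat non-JSON output like a missing PR
    if not isinstance(state, dict):
        return fallback

    # Older Python versions do not accept a trailing "Z", so rewrite it as an offset.
    merged = datetime.fromisoformat(merged_at.replace("Z", "+00:00"))
    state["last_run"] = merged.astimezone(timezone.utc).date().isoformat()
    return state
```

The returned dictionary is what would be serialized to `rotation.json`.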

2. readme-spec-extractor (small)

Extracted task: Read a single package's README.md and emit a compact structured representation of the public API contract for test generation.
Why small: "Summarizing a single file" + "extracting specific fields" — exactly the heuristics for small-model work. Heavy lifting (test generation) stays with the main model.
Score: 10/10 (independence: 3, model-adequacy: 3, parallelism: 2, size: 2)
Estimated savings: ~12–20% of main-model tokens/run (largest impact — runs per package and the full README never enters the main context)

Agent definition (copy-paste ready)
## agent: `readme-spec-extractor`
---
description: Extract structured API contract from a Go package README.md.
model: small
---
You are given the full contents of a single Go package README.md.

Emit a JSON object with these fields (omit any field that the README does not document):
- `public_api`: list of `{name, kind: "func"|"type"|"const", documented_signature_or_value, behavior_summary}` items
- `behavioral_contracts`: list of short bullet strings (one obligation each)
- `usage_examples`: list of `{label, input, expected_output}` items, verbatim from the README where possible
- `design_constraints`: list of short bullet strings (thread safety, error handling, etc.)
- `edge_cases`: list of short bullet strings (documented limitations)
- `ambiguities`: list of short bullet strings — any places the spec is unclear and a test will need to make assumptions

Do not invent details that the README does not state. Output JSON only.
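Since Phase 3 treats the extractor's JSON as its source of truth, a cheap shape check before use may be worthwhile. The sketch below only checks what the definition above states (field names, list values, and the three `kind` values); `validate_spec` is a hypothetical helper, and the error strings are illustrative.

```python
import json
from typing import List

ALLOWED_FIELDS = {
    "public_api",
    "behavioral_contracts",
    "usage_examples",
    "design_constraints",
    "edge_cases",
    "ambiguities",
}
ALLOWED_KINDS = ("func", "type", "const")


def validate_spec(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    try:
        spec = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(spec, dict):
        return ["top-level value must be a JSON object"]

    # Fields may be omitted, but nothing outside the documented set is expected.
    problems = [f"unexpected field: {name}" for name in sorted(set(spec) - ALLOWED_FIELDS)]
    for name in sorted(ALLOWED_FIELDS & set(spec)):
        if not isinstance(spec[name], list):
            problems.append(f"{name}: expected a list")

    api = spec.get("public_api")
    if isinstance(api, list):
        for i, item in enumerate(api):
            if not isinstance(item, dict) or item.get("kind") not in ALLOWED_KINDS:
                problems.append(f"public_api[{i}]: kind must be func, type, or const")
    return problems
```

A non-empty result could trigger a retry or a fallback to reading the README directly; which of the two is appropriate is left open here.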

Invocation change in main prompt:

Before (Phase 2 "Step 1: Read the README.md" and "Step 2"):

Step 1: Read the README.md

```bash
cat pkg/<package>/README.md
```

Extract from the specification:

  • Public API: Functions, types, constants documented
  • Behavioral contracts: What each function MUST do
  • Usage examples: Expected input/output patterns
  • Design constraints: Thread safety, error handling, etc.
  • Edge cases: Documented limitations or special behavior

Step 2: Minimal Source Code Reading

...


After:

Step 1: Extract the specification

For each selected package, invoke the readme-spec-extractor agent in parallel — pass it the contents of pkg/<package>/README.md. Use the returned JSON as the source of truth when generating tests in Phase 3.

Step 2: Minimal Source Code Reading

... (unchanged)


3. test-output-classifier (small)

Extracted task: Parse `go build` / `go test` output and classify each failure into a fixed taxonomy.
Why small: Classification with a predefined category set, so the heuristic "classifying items into a predefined set of categories" fires.
Score: 8/10 (independence: 3, model-adequacy: 3, parallelism: 1, size: 1)
Estimated savings: ~4–8% of main-model tokens/run (raw `go test` output never enters the main context)

Agent definition (copy-paste ready)
## agent: `test-output-classifier`
---
description: Classify go test/go build failures into a fixed taxonomy.
model: small
---
You receive raw `go build` and `go test` output for a single package.

For each failure, emit one entry with these fields:
- `test_or_symbol`: the test function name or compile symbol
- `category`: one of `compile_error`, `missing_symbol`, `signature_mismatch`, `assertion_failure`, `panic`, `other`
- `evidence`: one verbatim line from the output that justifies the category
- `suggested_action`: one of `fix_test`, `flag_spec_mismatch`, `flag_spec_ambiguity`, `investigate`

Also emit a top-level `summary`: `{total_failures, by_category: {...}, all_passing: bool}`.

Output JSON only — no prose. If output shows all tests passing, emit `{"summary": {"total_failures": 0, "by_category": {}, "all_passing": true}, "failures": []}`.
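Because the main agent acts on this report without seeing the raw `go test` output, a consistency check on the classifier's JSON may be useful. The sketch below recomputes `summary` from `failures` and verifies that every `category` and `suggested_action` comes from the fixed sets above; `summarize` and `check_report` are hypothetical helper names.

```python
from collections import Counter
from typing import List

CATEGORIES = (
    "compile_error", "missing_symbol", "signature_mismatch",
    "assertion_failure", "panic", "other",
)
ACTIONS = ("fix_test", "flag_spec_mismatch", "flag_spec_ambiguity", "investigate")


def summarize(failures: List[dict]) -> dict:
    """Rebuild the top-level summary from the individual failure entries."""
    by_category = Counter(str(f.get("category")) for f in failures)
    return {
        "total_failures": len(failures),
        "by_category": dict(by_category),
        "all_passing": not failures,
    }


def check_report(report: dict) -> List[str]:
    """Return a list of problems; an empty list means the report is consistent."""
    failures = report.get("failures", [])
    if not isinstance(failures, list) or not all(isinstance(f, dict) for f in failures):
        return ["failures must be a list of objects"]

    problems = []
    for i, failure in enumerate(failures):
        if failure.get("category") not in CATEGORIES:
            problems.append(f"failures[{i}]: unknown category")
        if failure.get("suggested_action") not in ACTIONS:
            problems.append(f"failures[{i}]: unknown suggested_action")
    if report.get("summary") != summarize(failures):
        problems.append("summary does not match failures")
    return problems
```

The all-passing object shown in the definition passes this check with no problems reported.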

Invocation change in main prompt:

Before (Phase 4):

After generating tests, validate they compile and pass:

```bash
go build ./pkg/<package>/...
go test -v -run "TestSpec" ./pkg/<package>/
```

If tests fail:

  1. Re-read the specification section that the test maps to
  2. Verify the test matches the specification (not implementation)
  3. If the specification is ambiguous, add a // SPEC_AMBIGUITY: <description> comment in the test
  4. If the implementation doesn't match the specification, add a // SPEC_MISMATCH: <description> comment and document it in the PR body

After:

After generating tests, run go build ./pkg/<package>/... and go test -v -run "TestSpec" ./pkg/<package>/, then pass both outputs to the test-output-classifier agent. Use the returned JSON to decide per failure: fix_test → revise the test against the spec; flag_spec_ambiguity → add // SPEC_AMBIGUITY: <description>; flag_spec_mismatch → add // SPEC_MISMATCH: <description> and document it in the PR body; investigate → re-read the spec section before deciding.


Estimated Impact

| Metric | Before | After (estimated) |
|---|---|---|
| Avg tokens/run (main model) | ~4.68M | ~3.5–3.7M (~20–25% reduction) |
| Main-model context saved | — | ~0.9–1.2M tokens/run |
| Parallelism opportunity | None | 2–3 `readme-spec-extractor` calls in parallel per run |

Implementation Steps

1. Add the three sub-agent blocks at the bottom of `.github/workflows/spec-enforcer.md`, after all workflow content.
2. Update the three prompt sections shown above to invoke the sub-agents by name.
3. Compile: `gh aw compile spec-enforcer`
4. Test: `gh workflow run spec-enforcer.yml`
5. After one scheduled run, compare token usage against the 4.68M baseline.

References

- Optimizer run: https://github.com/github/gh-aw/actions/runs/25964590030


<!-- gh-aw-tracker-id: daily-subagent-optimizer -->




> Generated by [⚡ Daily Sub-Agent Optimizer](https://github.com/github/gh-aw/actions/runs/25964590030) · ● 7.1M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-subagent-optimizer%22&type=issues)
> - [x] expires <!-- gh-aw-expires: 2026-05-23T14:45:56.842Z --> on May 23, 2026, 2:45 PM UTC

<!-- gh-aw-agentic-workflow: Daily Sub-Agent Optimizer, gh-aw-tracker-id: daily-subagent-optimizer, engine: claude, model: auto, id: 25964590030, workflow_id: daily-subagent-optimizer, run: https://github.com/github/gh-aw/actions/runs/25964590030 -->

<!-- gh-aw-workflow-id: daily-subagent-optimizer -->
<!-- gh-aw-workflow-call-id: github/gh-aw/daily-subagent-optimizer -->
