- **Target Workflow:** `test-coverage-improver`
- **Source report:** #4199
- **Estimated cost per run:** ~$1.50 (estimated; api-proxy cost not yet wired)
- **Total tokens per run:** ~2,993K (input: 2,974K, output: 18K)
- **Cache hit rate:** 96.3% (2,863K of 2,974K input tokens served from cache)
- **LLM turns:** 1
- **Model:** claude-sonnet-4.6
Current Configuration
| Setting | Value |
|---|---|
| Tools loaded | github (`repos`, `pull_requests`), bash (`npm test`, `npm lint`, `node:*`, `jest:*`, `eslint:*`, `cat`, `cat:src/*.test.ts`, `git*`, `grep`, `head`, `ls`, ...) |
| Tools actually used | read (`container-lifecycle.ts` + test-utils), write (new test file), shell (`npm run test` ×3, `npm run lint`), safeoutputs (`create_pull_request`) |
| Network groups | github only |
| Pre-agent steps | Yes: install, build, coverage, select target, inject source/tests, list low-coverage |
| Prompt size | ~13KB template (232 lines) |
Key Finding: Pre-Step Template Injection Failed
In the analyzed run, the `TARGET_FILE`, `SOURCE_CONTENT`, `TEST_CONTENT`, `COVERAGE_MD`, and `LOW_COVERAGE` step outputs were not substituted into the prompt; all appeared as empty strings. The agent fell back to reading `container-lifecycle.ts` independently via bash tools, adding ~3 extra tool calls and ~15–25K additional conversation tokens per run.
Evidence from `aw-prompts/prompt.txt` (run 26811120662):
````
**File to improve:** ``          ← should be src/docker-manager.ts
### Source: ``
```typescript
                                  ← should contain full source file
```
````
Recommendations
1. Fix pre-step output injection reliability
Estimated savings: ~20–30K tokens/run (roughly 18–27% of the ~111K non-cached input tokens, or about 0.7–1% of total input) + 3 fewer tool calls
The `steps.target.outputs.SOURCE_CONTENT` and related outputs are empty in the prompt, forcing the agent to discover and read the target file itself. This defeats the purpose of the pre-steps.
Root cause to investigate: check whether the `$GITHUB_OUTPUT` heredoc syntax in the `steps.target` `run` block is correctly flushed to disk before the prompt is rendered. The multiline `SOURCE_CONTENT<<EOF` heredoc may not be read back correctly when the gh-aw template engine expands `${{ steps.target.outputs.SOURCE_CONTENT }}`.
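To rule out the heredoc itself, the target step could write multiline outputs with a random delimiter, in line with GitHub's guidance that the delimiter must never appear on its own line inside the value. This is a minimal sketch: `set_multiline_output` is a hypothetical helper, and it only removes delimiter collisions and missing-trailing-newline problems; it says nothing about how gh-aw later expands the expression into the prompt.

```bash
# Hypothetical helper: append a multiline value (read from a file) to
# $GITHUB_OUTPUT in the NAME<<DELIM ... DELIM format, using a random delimiter
# so no line of the file can be mistaken for the terminator.
set_multiline_output() {
  local name="$1" src_file="$2" delim
  # 16 random bytes rendered as 32 lowercase hex characters.
  delim="ghadelim_$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"
  {
    printf '%s<<%s\n' "$name" "$delim"
    # $(cat ...) drops trailing newlines, so the delimiter always starts its own line.
    printf '%s\n' "$(cat "$src_file")"
    printf '%s\n' "$delim"
  } >> "$GITHUB_OUTPUT"
}
```

Inside the target step this would be called as `set_multiline_output SOURCE_CONTENT "$TARGET_FILE"`, assuming the selected path is held in a shell variable of that name.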
Fix: add a verification step after injection:
```yaml
- name: Verify injections
  run: |
    echo "TARGET_FILE: ${{ steps.target.outputs.TARGET_FILE }}"
    [ -n "${{ steps.target.outputs.TARGET_FILE }}" ] || { echo "ERROR: TARGET_FILE empty"; exit 1; }
```
This makes the failure visible in CI logs rather than silently sending an empty prompt.
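The check above only covers `TARGET_FILE`. A slightly broader sketch maps each pre-step output into the verification step's `env:` block (for example `SOURCE_CONTENT: ${{ steps.target.outputs.SOURCE_CONTENT }}`) and tests the variables by name, which also keeps large file contents out of the script text. `require_nonempty` is a hypothetical helper name.

```bash
# Hypothetical helper: report every named variable that is empty or unset,
# and return 1 if any were. Assumes each step output was mapped to an env var
# of the same name via the step's `env:` block.
require_nonempty() {
  local name status=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "ERROR: $name is empty" >&2
      status=1
    fi
  done
  return "$status"
}
```

In the step's `run:` block this would be called as `require_nonempty TARGET_FILE SOURCE_CONTENT COVERAGE_MD LOW_COVERAGE`. `TEST_CONTENT` is left out on the assumption that it can legitimately be empty when the target module has no test file yet.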
2. Remove `pull_requests` from GitHub toolsets
Estimated savings: ~6–10K tokens/run from reduced tool schemas (benefits cold-cache runs most)
The agent creates PRs via `safeoutputs` `create_pull_request`, not via GitHub MCP tools. The `pull_requests` toolset loads ~8 tools (create PR, list PRs, get PR diff, get files, get reviews, merge, etc.) that are never called.
Change in `.github/workflows/test-coverage-improver.md`:
```yaml
# Before
tools:
  github:
    toolsets: [repos, pull_requests]

# After
tools:
  github:
    toolsets: [repos]
```
3. Restrict the `cat:src/*.test.ts` glob to avoid bulk reads
Estimated savings: ~15–50K tokens/run if the agent reads multiple test files for style reference
With 40+ test files in `src/` averaging 500–800 lines (~14–20K tokens each), an agent that reads 3 files via the glob adds roughly 40–60K tokens to the conversation context.
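Those per-file figures can be sanity-checked locally before deciding how strict to be. The sketch below uses the common rough heuristic of about four characters per token; that ratio is an assumption, and real tokenizer counts for TypeScript can differ noticeably. `estimate_tokens` is a hypothetical helper.

```bash
# Hypothetical helper: rough token estimate for one or more files (total bytes / 4).
# The 4-bytes-per-token ratio is only a heuristic, not a real tokenizer.
estimate_tokens() {
  local bytes
  bytes=$(cat "$@" | wc -c)
  echo $(( bytes / 4 ))
}

# Example, run from the repository root, largest test files first:
# for f in src/*.test.ts; do
#   printf '%7d %s\n' "$(estimate_tokens "$f")" "$f"
# done | sort -rn | head
```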
Add a note in the prompt body:
```markdown
When reading existing test files for style reference, read only the test file
for the target module (`src/<target>.test.ts`). Do not glob-read all test files.
```
Or limit the allowlist to the dynamically selected file:
```yaml
tools:
  bash:
    - "cat:src/${{ steps.target.outputs.TARGET_BASENAME }}.test.ts"
```
(requires adding a `TARGET_BASENAME` output to the target-selection step)
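One way to produce that output, sketched under the assumptions that the target-selection step holds the chosen path in a shell variable and that targets are `.ts` files; `emit_target_basename` is a hypothetical helper name. Whether gh-aw expands a `${{ steps.* }}` expression inside the `bash` allowlist at runtime is also unverified and worth checking in the compiled `.lock.yml`.

```bash
# Hypothetical helper: derive the module name from the selected target
# (src/docker-manager.ts -> docker-manager) and expose it as TARGET_BASENAME.
emit_target_basename() {
  local target_file="$1" base
  base=$(basename "$target_file" .ts)
  echo "TARGET_BASENAME=$base" >> "$GITHUB_OUTPUT"
}
```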
4. Cap `npm run test` re-run verbosity in prompt instructions
Estimated savings: ~10–20K tokens/turn on test failure runs
Add to the workflow prompt:
```markdown
For targeted test runs, always use:
`./node_modules/.bin/jest --testPathPattern=<file> --no-coverage 2>&1 | tail -60`
Run full `npm run test` at most once (final verification only).
```
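One caveat with piping into `tail`: unless `pipefail` is set, the pipeline's exit status is `tail`'s, so a failing Jest run still exits 0. A small wrapper can cap the output and keep the test runner's own status. `run_capped` is a hypothetical name. Passing the test file positionally, as in `run_capped 60 ./node_modules/.bin/jest src/<target>.test.ts --no-coverage`, also sidesteps `--testPathPattern`, which Jest 30 renamed to `--testPathPatterns`.

```bash
# Hypothetical helper: run a command, keep only the last N lines of its
# combined stdout/stderr, and return the command's exit status (not tail's).
run_capped() {
  local lines="$1"
  shift
  "$@" 2>&1 | tail -n "$lines"
  # PIPESTATUS[0] is the exit status of the first command in the pipeline.
  return "${PIPESTATUS[0]}"
}
```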
5. Surface injected content conditionally
Estimated savings: indirect; avoids empty code blocks that trigger re-reading behavior
Add a guard in the prompt so empty injections are visible rather than silent:
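```markdown
## Target File
**File:** `${{ steps.target.outputs.TARGET_FILE }}`
> ⚠️ If the file content below is empty, the pre-step failed — use `cat` to read
> `${{ steps.target.outputs.TARGET_FILE }}` directly before writing tests.
```
This fallback only helps when `TARGET_FILE` itself was substituted; if it is also empty, as in run 26811120662, the path in the warning renders as an empty string too.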
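A complementary check can look for the empty marker directly in the rendered prompt, either in a step between prompt rendering and the agent (if the workflow allows one there) or against the downloaded `aw-prompts` artifact after a run. It is a sketch: `check_rendered_prompt` is a hypothetical helper, and the prompt's location on the runner is an assumption, so the path is passed in as an argument.

```bash
# Hypothetical helper: return 1 if the rendered prompt contains an empty file
# marker such as "**File to improve:** ``" or "**File:** ``", i.e. TARGET_FILE
# was not substituted.
check_rendered_prompt() {
  local prompt_file="$1"
  if grep -qE '\*\*File( to improve)?:\*\* ``' "$prompt_file"; then
    echo "ERROR: empty target file in $prompt_file" >&2
    return 1
  fi
  return 0
}
```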
Expected Impact
| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | 2,993K | ~2,950K | ~1.4% |
| Non-cached input tokens | ~111K | ~75K | ~32% |
| Output tokens | 18K | 15K | ~17% |
| LLM turns | 1 | 1 | none |
| Extra tool calls (injection failure) | ~3 | 0 | -3 calls |
| Effective tokens | 4,252K | ~3,800K | ~11% |
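The percentages can be recomputed from the raw figures in the table (all in thousands of tokens). `pct_saved` is a hypothetical helper that uses `awk` for the floating-point division.

```bash
# Hypothetical helper: percentage reduction from current to projected, one decimal.
pct_saved() {
  awk -v cur="$1" -v proj="$2" 'BEGIN { printf "%.1f\n", (cur - proj) / cur * 100 }'
}

pct_saved 2993 2950   # total tokens/run        -> 1.4
pct_saved 111 75      # non-cached input tokens -> 32.4
pct_saved 18 15       # output tokens           -> 16.7
pct_saved 4252 3800   # effective tokens        -> 10.6
awk 'BEGIN { printf "%.1f\n", 2863 / 2974 * 100 }'   # cache hit rate -> 96.3
```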
Note: The 96.3% cache hit rate means the bulk of token cost is already minimized. The highest-value optimization is fixing the pre-step injection failure, which restores the intended zero-overhead target delivery and eliminates avoidable agent tool calls.
Implementation Checklist
- [ ] Investigate why `steps.target.outputs.TARGET_FILE` is empty in the rendered prompt (heredoc GITHUB_OUTPUT flush timing)
- [ ] Remove `pull_requests` from `toolsets` in `test-coverage-improver.md`
- [ ] Add a prompt instruction to cap `npm run test` re-runs
- [ ] Restrict `cat:src/*.test.ts` to the dynamically selected target file
- [ ] Recompile with `gh aw compile .github/workflows/test-coverage-improver.md`
- [ ] Compare `agent_usage.json` on a new run vs baseline run 26811120662

Generated by Daily Copilot Token Optimization Advisor · sonnet46 1.8M