Target Workflow: build-test.md
Source report: #1768
Estimated cost per run: $4.54 (avg), up to $7.77 (worst run)
Total tokens per run: ~2,362K avg (1,222K input + 1,134K cache read + 7K output)
Cache hit rate: 48.1%
LLM turns: ~40–80 estimated (8 ecosystems × 5–10 bash turns each)
I/O ratio: 358:1
Token variance: 3.2× across 3 runs (584K → 1,208K → 1,871K input)
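The headline ratios above can be reproduced from the raw per-run figures. The sketch below assumes that cache hit rate means cache-read tokens divided by (input + cache-read) tokens, and that token variance means the largest run's input divided by the smallest. The report does not state these definitions; they are inferred because they reproduce the reported values. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Recompute the headline ratios from the per-run figures (values in K tokens).
# Assumed definitions (inferred, not stated in the report):
#   cache hit rate = cache_read / (input + cache_read)
#   token variance = max input / min input across runs

cache_hit_rate() {  # args: input_k cache_read_k
  awk -v i="$1" -v c="$2" 'BEGIN { printf "%.1f\n", 100 * c / (i + c) }'
}

variance_ratio() {  # args: one input figure per run
  printf '%s\n' "$@" | sort -n \
    | awk 'NR == 1 { min = $1 } { max = $1 } END { printf "%.1f\n", max / min }'
}

echo "cache hit rate: $(cache_hit_rate 1222 1134)%"
echo "token variance: $(variance_ratio 584 1208 1871)x"
```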
Current Configuration
| Setting | Value |
| --- | --- |
| Tools loaded | `bash: [*]`, `github:` (no `toolsets:`, so all ~22 tools load) |
| Tools actually used | `bash` (all work), `safeoutputs-add_comment`, `safeoutputs-add_labels` |
| Network groups | defaults, github, node, go, rust, crates.io, java, dotnet, bun.sh, deno.land, jsr.io, dl.deno.land (12 groups) |
| Pre-agent steps | None; all cloning, installs, builds, and tests run in the agent |
| Prompt size | 8,478 chars (~8.5KB, 8 ecosystem tasks × 2–3 projects each) |
Root Cause Analysis
The wide token variance (3.2×) and very high I/O ratio (358:1) have two main drivers:
- The agent performs all work via bash tool calls: 8 sequential ecosystems with clone → install → build → test, each generating large stdout/stderr that accumulates in the context window. Maven (mvn compile && mvn test) and Cargo (cargo build && cargo test) are especially verbose. The 12MB agent artifact is primarily the Maven .m2/ repository cache, which is consistent with Maven being one of the heaviest steps.
- GitHub MCP tools are loaded but never used. Without toolsets:, the GitHub MCP server loads ~22 tools. The workflow uses bash exclusively for gh repo clone, builds, and tests, and safeoutputs for the PR comment and label. The GitHub MCP tool schemas add ~13,000+ tokens per LLM turn.
Recommendations
1. Move all build execution to pre-agent steps: (eliminates ~72–80% of tokens)
Estimated savings: ~1,700K–1,900K tokens/run (~72–80%)
The agent currently runs an estimated ~40–80 bash turns (8 ecosystems × clone + install + build + test). Each turn's output stays in the context window and is sent again on every later turn, so cost compounds with the number of turns. Moving the deterministic build work to pre-agent steps: cuts agent turns from roughly 50 to ~3–5.
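To make the compounding concrete, here is a toy model. It assumes every turn re-sends a fixed base (prompt plus tool schemas) together with all tool output from earlier turns, and it ignores prompt caching. The base size and per-turn output size are made-up inputs for illustration, not measurements from the analyzed runs.

```shell
#!/usr/bin/env bash
# Toy model of context growth: each turn re-sends a fixed base plus all tool
# output accumulated so far. base and per_turn are hypothetical inputs.

cumulative_input() {  # args: turns base_tokens per_turn_output_tokens
  turns="$1"; base="$2"; per_turn="$3"
  total=0
  i=0
  while [ "$i" -lt "$turns" ]; do
    # Turn i carries the base plus the output of the i earlier turns.
    total=$(( total + base + i * per_turn ))
    i=$(( i + 1 ))
  done
  echo "$total"
}

echo "50 turns: $(cumulative_input 50 15000 500) input tokens"
echo " 5 turns: $(cumulative_input 5 15000 500) input tokens"
```

Because the accumulated-output term grows with the square of the turn count, cutting turns by 10× reduces the modeled input by more than 10×.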
Add a steps: section before the prompt body:
steps:
  - name: Install Bun and Deno
    run: |
      curl -fsSL (bun.sh/redacted) | bash || true
      curl -fsSL (deno.land/redacted) | sh || true
      # A plain `export` would not outlive this step, so persist the install
      # locations for later steps via GITHUB_ENV and GITHUB_PATH.
      echo "BUN_INSTALL=$HOME/.bun" >> "$GITHUB_ENV"
      echo "DENO_INSTALL=$HOME/.deno" >> "$GITHUB_ENV"
      echo "$HOME/.bun/bin" >> "$GITHUB_PATH"
      echo "$HOME/.deno/bin" >> "$GITHUB_PATH"
  - name: Clone all test repositories
    id: clone-repos
    env:
      # Assumption: gh needs a token in the step environment to authenticate.
      GH_TOKEN: ${{ github.token }}
    run: |
      # pipefail makes the pipeline report gh's status instead of tail's,
      # so the CLONE_FAILED fallback can actually fire.
      set -o pipefail
      declare -A CLONES=(
        [bun]="Mossaka/gh-aw-firewall-test-bun"
        [cpp]="Mossaka/gh-aw-firewall-test-cpp"
        [deno]="Mossaka/gh-aw-firewall-test-deno"
        [dotnet]="Mossaka/gh-aw-firewall-test-dotnet"
        [go]="Mossaka/gh-aw-firewall-test-go"
        [java]="Mossaka/gh-aw-firewall-test-java"
        [node]="Mossaka/gh-aw-firewall-test-node"
        [rust]="Mossaka/gh-aw-firewall-test-rust"
      )
      for key in "${!CLONES[@]}"; do
        gh repo clone "${CLONES[$key]}" "/tmp/test-$key" 2>&1 | tail -5 \
          || echo "CLONE_FAILED: $key"
      done
  - name: Run all build tests
    id: build-results
    run: |
      # Configure Maven proxy
      mkdir -p ~/.m2
      cat > ~/.m2/settings.xml << 'EOF'
      <settings><proxies>
        <proxy><id>awf-http</id><active>true</active><protocol>http</protocol><host>squid-proxy</host><port>3128</port></proxy>
        <proxy><id>awf-https</id><active>true</active><protocol>https</protocol><host>squid-proxy</host><port>3128</port></proxy>
      </proxies></settings>
      EOF
      # Start with an empty results file; the agent prompt reads it later.
      : > /tmp/build-results.txt
      # Run one project, keep the last 20 lines of output, and record the
      # command's own exit code (pipefail stops tail from masking it).
      run_test() {
        local name="$1" cmd="$2" out rc=0
        out=$(set -o pipefail; eval "$cmd" 2>&1 | tail -20) || rc=$?
        printf '=== %s: exit=%s ===\n%s\n' "$name" "$rc" "$out" \
          | tee -a /tmp/build-results.txt
      }
      run_test "bun/elysia" "cd /tmp/test-bun/elysia && bun install && bun test"
      run_test "bun/hono" "cd /tmp/test-bun/hono && bun install && bun test"
      run_test "cpp/fmt" "cd /tmp/test-cpp/fmt && mkdir -p build && cd build && cmake .. && make -s"
      run_test "cpp/json" "cd /tmp/test-cpp/json && mkdir -p build && cd build && cmake .. && make -s"
      run_test "deno/oak" "cd /tmp/test-deno/oak && deno test"
      run_test "deno/std" "cd /tmp/test-deno/std && deno test"
      run_test "dotnet/hello" "cd /tmp/test-dotnet/hello-world && dotnet restore && dotnet build && dotnet run"
      run_test "dotnet/json" "cd /tmp/test-dotnet/json-parse && dotnet restore && dotnet build && dotnet run"
      run_test "go/color" "cd /tmp/test-go/color && go mod download && go test ./..."
      run_test "go/env" "cd /tmp/test-go/env && go mod download && go test ./..."
      run_test "go/uuid" "cd /tmp/test-go/uuid && go mod download && go test ./..."
      run_test "java/gson" "cd /tmp/test-java/gson && mvn -q compile && mvn -q test"
      run_test "java/caffeine" "cd /tmp/test-java/caffeine && mvn -q compile && mvn -q test"
      run_test "node/clsx" "cd /tmp/test-node/clsx && npm install --quiet && npm test"
      run_test "node/execa" "cd /tmp/test-node/execa && npm install --quiet && npm test"
      run_test "node/p-limit" "cd /tmp/test-node/p-limit && npm install --quiet && npm test"
      run_test "rust/fd" "cd /tmp/test-rust/fd && cargo build -q && cargo test -q"
      run_test "rust/zoxide" "cd /tmp/test-rust/zoxide && cargo build -q && cargo test -q"
Then rewrite the prompt body to just interpret and post results:
# Build Test Suite
The pre-agent steps have already run all builds and tests and written the results to
`/tmp/build-results.txt`, which is readable via bash in the session.
1. Run: `cat /tmp/build-results.txt` to read the results.
2. Parse exit codes and output for each project (format: `=== name: exit=N ===`).
3. Post a single PR comment with the summary table using `safeoutputs-add_comment`.
4. If ALL tests pass, add the label `build-test` using `safeoutputs-add_labels`.
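The parsing in step 2 is mechanical enough to sketch. The helper below is a hypothetical illustration, not part of the workflow: `summarize_results` is an invented name, and it assumes the results text uses exactly the `=== name: exit=N ===` header format described in the prompt. It reads the results on stdin, prints a Markdown table, and exits non-zero if any project failed.

```shell
#!/usr/bin/env bash
# Turn "=== name: exit=N ===" header lines into a Markdown summary table.
# Reads the results text on stdin; other lines (build output) are skipped.

summarize_results() {
  awk '
    BEGIN { print "| Project | Result |"; print "| --- | --- |" }
    /^=== .*: exit=[0-9]+ ===$/ {
      name = $0
      sub(/^=== /, "", name)
      sub(/: exit=[0-9]+ ===$/, "", name)
      code = $0
      sub(/^.*: exit=/, "", code)
      sub(/ ===$/, "", code)
      code = code + 0
      print "| " name " | " (code == 0 ? "pass" : "fail (exit " code ")") " |"
      if (code != 0) failed = 1
    }
    END { exit failed + 0 }
  '
}

# Example with made-up results:
printf '%s\n' '=== go/color: exit=0 ===' 'ok' \
              '=== java/gson: exit=1 ===' 'BUILD FAILURE' \
  | summarize_results || echo "at least one project failed"
```

The non-zero exit status gives a simple signal for step 4: the `build-test` label is added only when the function succeeds.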
2. Remove github: tools (no toolsets needed)
Estimated savings: ~450K tokens/run (~19%)
The workflow uses only bash for all gh CLI operations and safeoutputs for PR interaction. The GitHub MCP server tools (search_issues, create_pull_request, list_commits, etc.) are never called. Loading all ~22 tools adds ~13,000 tokens per LLM turn from tool schemas.
Change in build-test.md — remove the github: block entirely:
# Before:
tools:
  bash:
    - "*"
  github:
    github-token: "${{ secrets.GH_AW_GITHUB_MCP_SERVER_TOKEN }}"
# After:
tools:
  bash:
    - "*"
Note: The gh CLI in bash remains authenticated via the GITHUB_TOKEN env var. Cloning via gh repo clone will still work.
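The per-turn nature of the schema cost can be checked with simple arithmetic. The sketch below uses only the two figures quoted in this recommendation (~13,000 tokens per turn and ~450K tokens per run) and derives the turn count they imply. That count is a back-of-the-envelope inference, not a measured value, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Tool-schema overhead is paid on every turn: overhead = schema_tokens * turns.

schema_overhead() {  # args: schema_tokens_per_turn turns
  echo $(( $1 * $2 ))
}

# Turn count implied by a total overhead, rounded to the nearest integer.
implied_turns() {  # args: total_overhead_tokens schema_tokens_per_turn
  echo $(( ($1 + $2 / 2) / $2 ))
}

turns=$(implied_turns 450000 13000)
echo "implied turns: $turns"
echo "overhead at $turns turns: $(schema_overhead 13000 "$turns") tokens"
```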
3. Suppress verbose build output (quick wins for remaining bash turns)
Estimated savings: ~100–200K tokens/run (~5–8%)
Even after moving to pre-steps, the agent still reads outputs for formatting. Using quiet flags on build tools reduces context bloat:
| Tool | Current | Optimized |
| --- | --- | --- |
| Maven | `mvn compile && mvn test` | `mvn -q compile && mvn -q test` |
| Cargo | `cargo build && cargo test` | `cargo build -q && cargo test -q` |
| npm | `npm install` | `npm install --quiet` |
| cmake/make | `make` | `make -s` |
These flags are already included in the pre-step script in Rec 1 above.
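One way to see what a quiet flag buys is to measure output size before and after. The helper below is a rough sketch: `approx_tokens` is a hypothetical name, and the figure of about 4 bytes per token is a common rule of thumb rather than the output of a real tokenizer, so treat the result as an order-of-magnitude estimate only.

```shell
#!/usr/bin/env bash
# Rough size check for build output before it reaches the agent.
# ASSUMPTION: ~4 bytes per token, a rule of thumb, not a real tokenizer.

approx_tokens() {  # reads text on stdin, prints an estimated token count
  bytes=$(wc -c | tr -d ' ')
  echo $(( (bytes + 3) / 4 ))
}

# Compare a verbose and a quiet run of any command, for example:
#   mvn compile 2>&1 | approx_tokens
#   mvn -q compile 2>&1 | approx_tokens
seq 1 1000 | approx_tokens
```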
4. Improve prompt caching with stable prefix
Estimated savings: ~100–150K tokens/run via improved cache utilization
The current cache hit rate of 48.1% could reach 65%+ by ensuring the stable system context (tool instructions, task descriptions) appears before any dynamic content (PR number, branch name, event context). Automatic OpenAI prompt caching benefits runs that execute within ~5 minutes of each other on the same prompt prefix.
The 3 analyzed runs were on different branches — this explains cache misses since the PR context in the prompt prefix differs. However, structuring the static task instructions as the first section and the dynamic PR context at the very end would maximize prefix cache reuse across unrelated runs.
No code change needed; this benefit flows naturally from Rec 1 (shorter, more stable prompt).
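Although no workflow change is required, the ordering principle itself is easy to illustrate. In the sketch below, `build_prompt`, the PR numbers, and the branch names are all hypothetical. It only demonstrates that putting the static instructions first makes two otherwise unrelated prompts share an identical prefix; it does not model how any provider's cache actually behaves.

```shell
#!/usr/bin/env bash
# Assemble a prompt with the static instructions first and per-run context
# last, so two runs share the longest possible identical prefix.
# build_prompt and its arguments are hypothetical names for illustration.

build_prompt() {  # args: static_instructions pr_number branch
  printf '%s\n\n' "$1"                      # stable across runs
  printf 'PR: #%s\nBranch: %s\n' "$2" "$3"  # dynamic, kept at the very end
}

static='# Build Test Suite
Read /tmp/build-results.txt and post a summary comment.'

a=$(build_prompt "$static" 101 feature-a)
b=$(build_prompt "$static" 202 feature-b)

# Both prompts differ overall but begin with the same static block.
for p in "$a" "$b"; do
  case "$p" in
    "$static"*) echo "starts with the static block" ;;
    *)          echo "static block is not a prefix" ;;
  esac
done
```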
Expected Impact
| Metric | Current (avg) | Projected | Savings |
| --- | --- | --- | --- |
| Total tokens/run | ~2,362K | ~500–700K | ~70–79% |
| Input tokens/run | ~1,222K | ~150–200K | ~84% |
| Cost/run | $4.54 | $0.60–0.90 | ~80% |
| LLM turns | ~50 est. | ~5 | ~90% |
| Token variance | 3.2× | <1.5× | dramatically reduced |
| Cache hit rate | 48.1% | 60%+ | +12 pts |
Most of the savings come from Rec 1 (pre-steps) alone. Rec 2 (remove github tools) is a free 19% gain requiring only 3 lines deleted.
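The savings column follows directly from the current and projected figures. A small sketch of that arithmetic, using only numbers from the table (the function name is illustrative); for input tokens and cost, the table quotes the conservative end of the range this computes.

```shell
#!/usr/bin/env bash
# Percentage saved when a per-run figure drops from `before` to `after`.

savings_pct() {  # args: before after
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.0f\n", 100 * (b - a) / b }'
}

echo "total tokens: $(savings_pct 2362 700)% to $(savings_pct 2362 500)%"
echo "input tokens: $(savings_pct 1222 200)% to $(savings_pct 1222 150)%"
echo "cost:         $(savings_pct 4.54 0.90)% to $(savings_pct 4.54 0.60)%"
echo "LLM turns:    $(savings_pct 50 5)%"
```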
Implementation Checklist
- [ ] Add `steps:` section with install, clone, and build-results steps to `build-test.md`
- [ ] Remove `github:` block from `tools:` in `build-test.md`
- [ ] Verify `gh` CLI still works in bash (it uses `GITHUB_TOKEN` env, not MCP token)
- [ ] Run `gh aw compile .github/workflows/build-test.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Compare `agent_usage.json` artifact from new run vs baseline (expect ~$0.60–0.90 vs $4.54)
Generated by Daily Copilot Token Optimization Advisor · ● 611.6K