# Target Workflow: security-guard
Note: No claude-token-usage-report issue was found; this analysis is derived directly from pre-downloaded run logs at the time of this workflow execution.
**Source:** (redacted) — 10 runs from 2026-04-17 → 2026-04-18

- Estimated cost per run: ~$0.31
- Total tokens per run: ~317K
- Effective (non-cached) tokens/run: ~89K
- Cache hit rate: ~72% ✅ (excellent)
- LLM turns: avg 7.6 (max configured: 12)
- Total period cost: $3.12
## Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | `github` — `pull_requests`, `repos` toolsets only |
| Tools actually used | `get_pull_request`, `get_pull_request_files`, `get_file_contents` (est.) |
| Network groups | `github` only |
| Pre-agent steps | ✅ Yes — PR diff fetch + security-relevance check |
| Prompt body size | ~4,625 chars (~1,156 tokens) |
| Content after variable PR diff | ~2,547 chars (~637 tokens) — currently NOT prefix-cacheable |
## Recommendations

### 1. Reorder prompt to maximize prefix caching

**Estimated savings:** ~2,800 tokens/run (~3–4%)

**Problem:** The prompt currently injects the variable `${{ steps.pr-diff.outputs.PR_FILES }}` block in the middle of the prompt body. Everything that appears after a variable substitution cannot be prefix-cached by Anthropic (the cache is keyed on a stable prefix). The "Your Task", "Security Checks", and "Output Format" sections (~637 tokens) come after the PR diff and are therefore re-charged as new input on every turn.

**Fix:** Move the PR diff block to the very end of the prompt, after all static instruction sections. This makes the entire instruction set (~3,900 chars) cacheable as a stable prefix.
Change in `security-guard.md`:

Current order:

```text
## Repository Context
...
## Changed Files (Pre-fetched)
${{ steps.pr-diff.outputs.PR_FILES }}   ← breaks cache here
## Your Task                            ← NOT cached (637 tokens × 7.6 turns)
## Security Checks
## Output Format
```

Proposed order:

```text
## Repository Context
...
## Your Task                            ← now fully cached after turn 1
## Security Checks
## Output Format
## Changed Files (Pre-fetched)
${{ steps.pr-diff.outputs.PR_FILES }}   ← variable content at the end
```
Estimated savings calculation:
- 637 tokens × 6.6 subsequent turns × ($3.00 − $0.30)/1M = ~$0.011/run → ~$0.11 over 10 runs
- Secondary benefit: reduces cache write cost on turn 1 since the static prefix is longer and shared
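The estimate above can be reproduced with a one-liner, using the rates quoted in this report ($3.00/1M uncached input, $0.30/1M cache reads):

```shell
# Re-derive the per-run savings for Recommendation 1:
# 637 now-cacheable tokens, re-read on the 6.6 turns after turn 1,
# priced at the cache-read rate instead of the full input rate.
awk 'BEGIN {
  tokens = 637
  turns  = 7.6 - 1                    # subsequent turns after the cache write
  saved  = (3.00 - 0.30) / 1e6        # $ saved per token per cached read
  per_run = tokens * turns * saved
  printf "per-run: $%.3f, per 10 runs: $%.2f\n", per_run, per_run * 10
}'
```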
### 2. Reduce PR diff character cap from 8,000 to 5,000

**Estimated savings:** ~5,700 tokens/run (~1.8%) + reduced context-window pressure

**Problem:** The `head -c 8000` limit on the PR diff sends up to ~2,000 tokens of diff per turn. For most security-relevant PRs (touching a handful of files), 5,000 chars (~1,250 tokens) is sufficient to capture the meaningful changes. Oversized diffs add context that the agent scrolls through without acting on.
Change in the `pr-diff` step:

```shell
# Current
| head -c 8000 || true

# Proposed
| head -c 5000 || true
```
**Savings:** ~750 tokens × 7.6 turns × $3/1M = ~$0.017/run → ~$0.17 over 10 runs
Consider adding a warning comment when the diff is truncated so the agent knows to call `get_file_contents` for completeness:

```shell
| { head -c 5000; echo -e "\n[DIFF TRUNCATED at 5000 chars — use get_file_contents for full context]"; } || true
```
### 3. Skip agent execution entirely for non-security PRs

**Estimated savings:** ~$0.31/run for any PR with zero security-critical files (~1 full run avoided)

**Problem:** The security-relevance step already computes `security_files_changed`. When it is `0`, the prompt tells the agent to call `noop` immediately — but the agent still starts, loads tools, and spends at least one full LLM turn (~40K tokens) before emitting `noop`.
**Fix:** Add a workflow-level `if:` condition (if supported by the gh-aw engine) or a synthetic output that prevents the agent from receiving the task. Alternatively, add an explicit pre-step that writes a sentinel:

```yaml
- name: Skip if no security files
  id: should-run
  if: github.event.pull_request.number
  run: |
    COUNT="${{ steps.security-relevance.outputs.security_files_changed }}"
    if [ "$COUNT" = "0" ]; then
      echo "skip=true" >> "$GITHUB_OUTPUT"
    else
      echo "skip=false" >> "$GITHUB_OUTPUT"
    fi
```
Then update the prompt header to check `${{ steps.should-run.outputs.skip }}` — if the engine supports conditional agent execution, gate it there.

**If the engine does not support job-level skipping:** at minimum, move the `noop` instruction into the very first paragraph of the prompt (before Repository Context) so the agent encounters it immediately and exits on turn 1 rather than reading the full context first.
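For illustration, if the compiled workflow exposes the agent invocation as an ordinary Actions step, standard step-level gating on the sentinel would look like the sketch below (the step name and command are hypothetical placeholders; gh-aw generates the real agent step):

```yaml
# Hypothetical sketch — assumes the agent invocation is a plain Actions step.
- name: Run security-guard agent
  # Standard step-level condition: the step (and its token cost) is skipped
  # whenever the should-run pre-step wrote skip=true.
  if: steps.should-run.outputs.skip != 'true'
  run: ./agent.sh  # placeholder for the generated agent invocation
```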
### 4. Reduce `max-turns` from 12 to 10

**Estimated savings:** cost-ceiling reduction (no direct per-run savings unless the limit is hit)

Average turns are 7.6. The current ceiling of 12 allows worst-case runs to use ~58% more turns than the average. Reducing to 10 caps runaway sessions while still leaving buffer above the average.
```yaml
engine:
  id: claude
  max-turns: 10  # was 12; avg is 7.6
```
## Cache Analysis

Per-run token breakdown (averages across 10 runs):

| Metric | Tokens | % of Total |
|---|---|---|
| Total tokens/run | ~317,468 | 100% |
| Effective (new) tokens | ~88,624 | 28% |
| Cache reads | ~228,844 | 72% |
**Cache assessment:** The 72% cache hit rate is strong. The static "Repository Context" and security component descriptions are clearly being cached as a prefix. The main uncached cost drivers are the PR diff (variable per run) and the ~637 tokens of instructions that currently appear after it (Recommendation #1 addresses this).

**Cache write amortization:** Turn 1 writes the static prefix (~800 tokens × $3.75/1M = $0.003); turns 2–7.6 read it back at $0.30/1M. At 6.6 reads × 800 tokens, that is ~$0.00158 in reads, well below the write cost. The cache investment pays off around turn 2, which is favorable given the average of 7.6 turns/run.
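The break-even claim can be checked with the same numbers: caching costs a $0.75/1M premium over normal input on the write, while each cached read saves $2.70/1M versus re-sending the prefix uncached:

```shell
# Cache amortization for the ~800-token static prefix, using the rates above:
# cache write $3.75/1M (turn 1), cache read $0.30/1M, uncached input $3.00/1M.
awk 'BEGIN {
  prefix = 800
  write_premium   = prefix * (3.75 - 3.00) / 1e6  # extra cost paid on turn 1
  saving_per_read = prefix * (3.00 - 0.30) / 1e6  # saved on every later turn
  printf "write premium: $%.4f, saving per read: $%.4f\n", write_premium, saving_per_read
  printf "net after the first read (turn 2): $%.4f\n", saving_per_read - write_premium
}'
```

The first read already recoups the write premium, which is why caching pays off by turn 2.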
## Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | ~317K | ~311K | ~−2% |
| Effective tokens/run | ~89K | ~83K | ~−7% |
| Cost/run | $0.312 | ~$0.285 | ~−9% |
| LLM turns (max) | 12 | 10 | −2 ceiling |
| PR diff size | 8K chars | 5K chars | −37% diff input |
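The projected cost/run is roughly the current cost minus the per-run savings from Recommendations 1 and 2 (Recommendations 3 and 4 change ceilings or per-PR behavior, not the average run). A quick check lands within a cent of the ~$0.285 figure:

```shell
# Sanity-check the projected cost/run from the per-run estimates above.
awk 'BEGIN {
  current = 0.312
  rec1 = 0.011   # prompt reorder (prefix caching)
  rec2 = 0.017   # diff cap 8000 -> 5000 chars
  printf "projected: $%.3f (%.0f%% vs current)\n",
         current - rec1 - rec2, -100 * (rec1 + rec2) / current
}'
```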
## Implementation Checklist

- [ ] `head -c 8000` → `head -c 5000` in the `pr-diff` step (add truncation notice)
- [ ] `max-turns: 12` → `max-turns: 10` in front matter
- [ ] Add `if:` based on step outputs to skip the agent for non-security PRs
- [ ] `gh aw compile .github/workflows/security-guard.md`
- [ ] `npx tsx scripts/ci/postprocess-smoke-workflows.ts`

Generated by Daily Claude Token Optimization Advisor · ● 249.4K · ◷