⚡ Claude Token Optimization 2026-04-18: Security Guard #2083

@github-actions

Target Workflow: security-guard

Note: No claude-token-usage-report issue was found; this analysis is derived directly from pre-downloaded run logs at the time of this workflow execution.

Source: (redacted) · 10 runs from 2026-04-17 → 2026-04-18
Estimated cost per run: ~$0.31
Total tokens per run: ~317K
Effective (non-cached) tokens/run: ~89K
Cache hit rate: ~72% ✅ (excellent)
LLM turns: avg 7.6 (max configured: 12)
Total period cost: $3.12


Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | github (pull_requests, repos toolsets only) |
| Tools actually used | get_pull_request, get_pull_request_files, get_file_contents (est.) |
| Network groups | github only |
| Pre-agent steps | ✅ Yes: PR diff fetch + security-relevance check |
| Prompt body size | ~4,625 chars (~1,156 tokens) |
| Content after variable PR diff | ~2,547 chars (~637 tokens); currently NOT prefix-cacheable |

Recommendations

1. Reorder prompt to maximize prefix caching

Estimated savings: ~2,800 tokens/run (~3–4%)

Problem: The prompt currently injects the variable ${{ steps.pr-diff.outputs.PR_FILES }} block in the middle of the prompt body. Everything that appears after a variable substitution cannot be prefix-cached by Anthropic (the cache is keyed on a stable prefix). The "Your Task", "Security Checks", and "Output Format" sections (~637 tokens) come after the PR diff and are therefore re-charged as new input on every turn.

Fix: Move the PR diff block to the very end of the prompt, after all static instruction sections. This makes the entire instruction set (~3,900 chars) cacheable as a stable prefix.

Change in security-guard.md:

Current order:

## Repository Context
...
## Changed Files (Pre-fetched)
${{ steps.pr-diff.outputs.PR_FILES }}   ← breaks cache here

## Your Task                            ← NOT cached (637 tokens × 7.6 turns)
## Security Checks
## Output Format

Proposed order:

## Repository Context
...
## Your Task                            ← now fully cached after turn 1
## Security Checks
## Output Format

## Changed Files (Pre-fetched)
${{ steps.pr-diff.outputs.PR_FILES }}   ← variable content at the end
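To make the effect of the reordering concrete, the sketch below measures how many leading characters two renders of the prompt share when only the PR file list differs. The section names come from the report; the section bodies are omitted, and the helper `common_prefix_len` and the two file lists are hypothetical stand-ins, not part of the workflow.

```shell
# Length of the longest common prefix of two strings (bash).
common_prefix_len() {
  local a="$1" b="$2" i=0
  while (( i < ${#a} && i < ${#b} )) && [[ "${a:i:1}" == "${b:i:1}" ]]; do
    i=$((i + 1))
  done
  printf '%d\n' "$i"
}

# Stand-in prompt pieces; real section bodies are omitted.
head_part=$'## Repository Context\n'
task_part=$'## Your Task\n## Security Checks\n## Output Format\n'
files_hdr=$'## Changed Files (Pre-fetched)\n'
diff_a='file_a.ts'   # hypothetical PR file list for one run
diff_b='file_b.ts'   # hypothetical PR file list for another run

# Current order: the variable block sits before the task sections.
current=$(common_prefix_len "$head_part$files_hdr$diff_a"$'\n'"$task_part" \
                            "$head_part$files_hdr$diff_b"$'\n'"$task_part")
# Proposed order: all static sections first, variable block last.
proposed=$(common_prefix_len "$head_part$task_part$files_hdr$diff_a" \
                             "$head_part$task_part$files_hdr$diff_b")

echo "shared prefix, current order:  $current chars"
echo "shared prefix, proposed order: $proposed chars"
```

In this toy setup the shared prefix grows by exactly the length of the task sections, which is the portion the recommendation aims to make cacheable.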

Estimated savings calculation:

  • 637 tokens × 6.6 subsequent turns × ($3.00 − $0.30)/1M = ~$0.011/run → ~$0.11 over 10 runs
  • Possible secondary benefit: the longer static prefix can be reused by later runs that start while the cache entry is still live, so those runs read it instead of paying the cache-write rate again
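The per-run figure in the first bullet can be reproduced with a few lines of shell. This is a minimal sketch using the report's own numbers (637 tokens, 6.6 subsequent turns, $3.00/1M base input, $0.30/1M cached read); the function name `cache_reorder_saving` is a hypothetical helper.

```shell
# Saving from billing TOKENS at the cached-read rate instead of the base
# input rate on each of TURNS subsequent turns. Prices are in $ per 1M tokens.
cache_reorder_saving() {
  awk -v tokens="$1" -v turns="$2" -v base="$3" -v cached="$4" \
    'BEGIN { printf "%.4f\n", tokens * turns * (base - cached) / 1e6 }'
}

cache_reorder_saving 637 6.6 3.00 0.30   # prints 0.0114 (dollars per run)
```

Multiplying by the 10 runs in the sample gives roughly $0.11 for the period.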

2. Reduce PR diff character cap from 8,000 to 5,000

Estimated savings: ~5,700 tokens/run (~1.8%) + reduced context window pressure

Problem: The head -c 8000 limit on the PR diff sends up to ~2,000 tokens of diff per turn. For most security-relevant PRs (touching a handful of files), 5,000 chars (~1,250 tokens) is sufficient to capture meaningful changes. Oversized diffs add context that the agent scrolls through without acting on.

Change in the pr-diff step:

# Current
| head -c 8000 || true

# Proposed
| head -c 5000 || true

Savings: ~750 tokens × 7.6 turns × $3/1M = ~$0.017/run → ~$0.17 over 10 runs
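The arithmetic behind this estimate can be sketched as follows. It assumes roughly 4 characters per token, which is the ratio implied by the report's "8,000 chars ≈ 2,000 tokens"; the function name `diff_cap_saving` is hypothetical.

```shell
# Token and dollar saving from lowering the diff cap.
# Args: old cap (chars), new cap (chars), avg turns, base input price ($/1M).
diff_cap_saving() {
  awk -v old="$1" -v new_cap="$2" -v turns="$3" -v price="$4" 'BEGIN {
    per_turn = (old - new_cap) / 4        # assumption: ~4 chars per token
    per_run  = per_turn * turns
    printf "%.0f tokens/turn, %.0f tokens/run, $%.4f/run\n",
           per_turn, per_run, per_run * price / 1e6
  }'
}

diff_cap_saving 8000 5000 7.6 3.00
# prints: 750 tokens/turn, 5700 tokens/run, $0.0171/run
```

The figure is an upper bound: it only applies to PRs whose diff actually exceeds 5,000 characters.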

Consider adding a warning comment when the diff is truncated so the agent knows to call get_file_contents for completeness:

| { head -c 5000; echo -e "\n[DIFF TRUNCATED at 5000 chars — use get_file_contents for full context]"; } || true
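Note that the one-liner above prints the notice on every run, including when the diff fits within the cap. A variant that adds the notice only when something was cut off might look like the sketch below. The function name `truncate_with_notice` is hypothetical, the notice wording is adapted from the line above, and the code assumes bash (the default shell for `run:` steps on GitHub-hosted Linux runners).

```shell
# Hypothetical helper: cap stdin at LIMIT bytes and append a notice only
# when the input was actually longer than the cap.
truncate_with_notice() {
  local limit="$1" data size
  # Read one byte past the cap so truncation can be detected.
  data="$(head -c "$((limit + 1))")"
  size="$(printf '%s' "$data" | wc -c)"
  if (( size > limit )); then
    printf '%s' "$data" | head -c "$limit"
    printf '\n[DIFF TRUNCATED at %d chars: use get_file_contents for full context]\n' "$limit"
  else
    printf '%s\n' "$data"
  fi
}

# Example with a tiny cap so the effect is visible:
printf 'abcdefghij' | truncate_with_notice 5
```

In the pr-diff step it would replace the `head -c` stage, for example `... | truncate_with_notice 5000 || true`, where `...` stands for the existing diff command.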

3. Skip agent execution entirely for non-security PRs

Estimated savings: ~$0.31/run for any PR with zero security-critical files (~1 full run avoided)

Problem: The security-relevance step already computes security_files_changed. When it is 0, the prompt tells the agent to call noop immediately — but the agent still starts, loads tools, and uses at least 1 full LLM turn (~40K tokens) before emitting noop.

Fix: Add a workflow-level if: condition (if supported by the gh-aw engine) or a synthetic output that prevents the agent from receiving the task. Alternatively, add an explicit pre-step that writes a sentinel:

- name: Skip if no security files
  id: should-run
  if: github.event.pull_request.number
  run: |
    COUNT="${{ steps.security-relevance.outputs.security_files_changed }}"
    if [ "$COUNT" = "0" ]; then
      echo "skip=true" >> "$GITHUB_OUTPUT"
    else
      echo "skip=false" >> "$GITHUB_OUTPUT"
    fi

Then update the prompt header to check ${{ steps.should-run.outputs.skip }}. If the engine supports conditional agent execution, gate the agent on that output instead.

If the engine does not support job-level skipping: At minimum, restructure the noop instruction to be the very first paragraph of the prompt (before Repository Context) so the agent encounters it immediately and exits in turn 1 rather than reading the full context first.


4. Reduce max-turns from 12 to 10

Estimated savings: Cost ceiling reduction (no direct per-run savings unless limit is hit)

The average is 7.6 turns per run. The current ceiling of 12 lets worst-case runs use 58% more turns than average. Reducing it to 10 caps runaway sessions while still leaving about 32% headroom above the average.

engine:
  id: claude
  max-turns: 10   # was 12; avg is 7.6
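The headroom each ceiling leaves over the observed average can be checked with a short calculation. This is only a sketch; `headroom_pct` is a hypothetical helper name.

```shell
# Percentage by which a turn ceiling exceeds the average turn count.
headroom_pct() {
  awk -v cap="$1" -v avg="$2" 'BEGIN { printf "%.0f\n", (cap / avg - 1) * 100 }'
}

headroom_pct 12 7.6   # prints 58 (current ceiling)
headroom_pct 10 7.6   # prints 32 (proposed ceiling)
```

If individual runs in the logs already reach 10 or more turns, the lower ceiling would cut those runs short, so it is worth checking the per-run maximum before applying the change.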

Cache Analysis

Per-run token breakdown (averages across 10 runs):

| Metric | Tokens | % of Total |
|---|---|---|
| Total tokens/run | ~317,468 | 100% |
| Effective (new) tokens | ~88,624 | 28% |
| Cache reads | ~228,844 | 72% |

Cache assessment: The 72% cache hit rate is strong. The static "Repository Context" and security component descriptions are clearly being cached as a prefix. The main uncached cost drivers are the PR diff (variable per run) and the ~637 tokens of instructions that currently appear after it (Recommendation #1 addresses this).

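Cache write amortization: Turn 1 writes the static prefix (~800 tokens × $3.75/1M = $0.003); turns 2–7.6 read it back at $0.30/1M. At 6.6 reads × 800 tokens, the reads cost about $0.00158, for a cached total of roughly $0.0046. Without caching, the same 800 tokens would be billed at $3.00/1M on all 7.6 turns, about $0.018. The $0.75/1M write premium is recovered by the first cached read (each read saves $2.70/1M), so the cache investment pays off at turn 2, which is favorable given avg 7.6 turns/run.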
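The comparison can be worked through numerically. The sketch below uses the report's figures (800-token prefix, 7.6 turns, $3.00/1M base input, $3.75/1M cache write, $0.30/1M cache read) and compares one write plus subsequent reads against paying the base rate on every turn; `cache_costs` is a hypothetical helper name.

```shell
# Cost of a static prefix with and without prompt caching.
# Args: prefix tokens, avg turns, base $/1M, cache-write $/1M, cache-read $/1M.
cache_costs() {
  awk -v t="$1" -v n="$2" -v base="$3" -v wr="$4" -v rd="$5" 'BEGIN {
    cached   = (t * wr + t * (n - 1) * rd) / 1e6   # one write, then reads
    uncached = t * n * base / 1e6                  # full price every turn
    printf "cached=%.5f uncached=%.5f saved=%.5f\n",
           cached, uncached, uncached - cached
  }'
}

cache_costs 800 7.6 3.00 3.75 0.30
# prints: cached=0.00458 uncached=0.01824 saved=0.01366
```

With only two turns the cached path already costs less (one write plus one read versus two full-price inputs), which matches the break-even at turn 2.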


Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | ~317K | ~311K | ~−2% |
| Effective tokens/run | ~89K | ~83K | ~−7% |
| Cost/run | $0.312 | ~$0.285 | ~−9% |
| LLM turns (max) | 12 | 10 | −2 ceiling |
| PR diff size | 8K chars | 5K chars | −37.5% diff input |
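The percentages in the Savings column follow directly from the Current and Projected values. A small sketch for re-deriving them, for example when comparing a new run against this baseline (`pct_change` is a hypothetical helper name):

```shell
# Relative change from BEFORE to AFTER, as a percentage with one decimal.
pct_change() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f%%\n", (after - before) / before * 100 }'
}

pct_change 317 311       # total tokens/run (K)      -> -1.9%
pct_change 89 83         # effective tokens/run (K)  -> -6.7%
pct_change 0.312 0.285   # cost/run ($)              -> -8.7%
pct_change 8000 5000     # diff cap (chars)          -> -37.5%
```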

Implementation Checklist

  • Reorder prompt: move "Your Task", "Security Checks", "Output Format" sections before "Changed Files" block
  • Change head -c 8000 → head -c 5000 in the pr-diff step (add truncation notice)
  • Change max-turns: 12 → max-turns: 10 in front matter
  • Investigate whether gh-aw supports job-level if: based on step outputs to skip agent for non-security PRs
  • Recompile: gh aw compile .github/workflows/security-guard.md
  • Post-process: npx tsx scripts/ci/postprocess-smoke-workflows.ts
  • Verify CI passes on PR
  • Compare token usage on new run vs this baseline (~317K tokens/run, $0.312/run)

Generated by Daily Claude Token Optimization Advisor · ● 249.4K
