# Target Workflow: security-guard
Note: No claude-token-usage-report issue was found; this analysis is derived directly from pre-downloaded run logs at the time of this workflow execution.
**Source:** (redacted) — 10 runs from 2026-04-17 → 2026-04-18

- Estimated cost per run: ~$0.31
- Total tokens per run: ~317K
- Effective (non-cached) tokens/run: ~89K
- Cache hit rate: ~72% ✅ (excellent)
- LLM turns: avg 7.6 (max configured: 12)
- Total period cost: $3.12
## Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | `github` — `pull_requests`, `repos` toolsets only |
| Tools actually used | `get_pull_request`, `get_pull_request_files`, `get_file_contents` (est.) |
| Network groups | `github` only |
| Pre-agent steps | ✅ Yes — PR diff fetch + security-relevance check |
| Prompt body size | ~4,625 chars (~1,156 tokens) |
| Content after variable PR diff | ~2,547 chars (~637 tokens) — currently NOT prefix-cacheable |
## Recommendations

### 1. Reorder prompt to maximize prefix caching

**Estimated savings:** ~2,800 tokens/run (~3–4%)

**Problem:** The prompt currently injects the variable `${{ steps.pr-diff.outputs.PR_FILES }}` block in the middle of the prompt body. Everything that appears after a variable substitution cannot be prefix-cached by Anthropic (the cache is keyed on a stable prefix). The "Your Task", "Security Checks", and "Output Format" sections (~637 tokens) come after the PR diff and are therefore re-charged as new input on every turn.

**Fix:** Move the PR diff block to the very end of the prompt, after all static instruction sections. This makes the entire instruction set (~3,900 chars) cacheable as a stable prefix.
Change in `security-guard.md`:

Current order:

```text
## Repository Context
...
## Changed Files (Pre-fetched)
${{ steps.pr-diff.outputs.PR_FILES }}   ← breaks cache here
## Your Task                            ← NOT cached (637 tokens × 7.6 turns)
## Security Checks
## Output Format
```

Proposed order:

```text
## Repository Context
...
## Your Task                            ← now fully cached after turn 1
## Security Checks
## Output Format
## Changed Files (Pre-fetched)
${{ steps.pr-diff.outputs.PR_FILES }}   ← variable content at the end
```
Estimated savings calculation:
- 637 tokens × 6.6 subsequent turns × ($3.00 − $0.30)/1M = ~$0.011/run → ~$0.11 over 10 runs
- Secondary benefit: reduces cache write cost on turn 1 since the static prefix is longer and shared
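The estimate above can be reproduced with a one-liner, using the rates quoted in this report ($3.00/1M uncached input, $0.30/1M cache reads):

```shell
# Re-derive the per-run savings for Recommendation 1:
# 637 now-cacheable tokens, re-read on the 6.6 turns after turn 1,
# priced at the cache-read rate instead of the full input rate.
awk 'BEGIN {
  tokens = 637
  turns  = 7.6 - 1                    # subsequent turns after the cache write
  saved  = (3.00 - 0.30) / 1e6        # $ saved per token per cached read
  per_run = tokens * turns * saved
  printf "per-run: $%.3f, per 10 runs: $%.2f\n", per_run, per_run * 10
}'
```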
### 2. Reduce PR diff character cap from 8,000 to 5,000

**Estimated savings:** ~5,700 tokens/run (~1.8%) + reduced context-window pressure

**Problem:** The `head -c 8000` limit on the PR diff sends up to ~2,000 tokens of diff per turn. For most security-relevant PRs (touching a handful of files), 5,000 chars (~1,250 tokens) is sufficient to capture the meaningful changes. Oversized diffs add context that the agent scrolls through without acting on.
Change in the `pr-diff` step:

```shell
# Current
| head -c 8000 || true

# Proposed
| head -c 5000 || true
```
**Savings:** ~750 tokens × 7.6 turns × $3/1M = ~$0.017/run → ~$0.17 over 10 runs
Consider adding a warning comment when the diff is truncated so the agent knows to call `get_file_contents` for completeness:

```shell
| { head -c 5000; echo -e "\n[DIFF TRUNCATED at 5000 chars — use get_file_contents for full context]"; } || true
```
### 3. Skip agent execution entirely for non-security PRs

**Estimated savings:** ~$0.31/run for any PR with zero security-critical files (~1 full run avoided)

**Problem:** The security-relevance step already computes `security_files_changed`. When it is `0`, the prompt tells the agent to call `noop` immediately — but the agent still starts, loads tools, and spends at least one full LLM turn (~40K tokens) before emitting `noop`.
**Fix:** Add a workflow-level `if:` condition (if supported by the gh-aw engine) or a synthetic output that prevents the agent from receiving the task. Alternatively, add an explicit pre-step that writes a sentinel:

```yaml
- name: Skip if no security files
  id: should-run
  if: github.event.pull_request.number
  run: |
    COUNT="${{ steps.security-relevance.outputs.security_files_changed }}"
    if [ "$COUNT" = "0" ]; then
      echo "skip=true" >> "$GITHUB_OUTPUT"
    else
      echo "skip=false" >> "$GITHUB_OUTPUT"
    fi
```
Then update the prompt header to check `${{ steps.should-run.outputs.skip }}` — if the engine supports conditional agent execution, gate it there.

**If the engine does not support job-level skipping:** at minimum, move the `noop` instruction into the very first paragraph of the prompt (before Repository Context) so the agent encounters it immediately and exits on turn 1 rather than reading the full context first.
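For illustration, if the compiled workflow exposes the agent invocation as an ordinary Actions step, standard step-level gating on the sentinel would look like the sketch below (the step name and command are hypothetical placeholders; gh-aw generates the real agent step):

```yaml
# Hypothetical sketch — assumes the agent invocation is a plain Actions step.
- name: Run security-guard agent
  # Standard step-level condition: the step (and its token cost) is skipped
  # whenever the should-run pre-step wrote skip=true.
  if: steps.should-run.outputs.skip != 'true'
  run: ./agent.sh  # placeholder for the generated agent invocation
```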
### 4. Reduce `max-turns` from 12 to 10

**Estimated savings:** cost-ceiling reduction (no direct per-run savings unless the limit is hit)

Average turns are 7.6. The current ceiling of 12 allows worst-case runs to use ~58% more turns than the average. Reducing to 10 caps runaway sessions while still leaving buffer above the average.
```yaml
engine:
  id: claude
  max-turns: 10  # was 12; avg is 7.6
```
## Cache Analysis

Per-run token breakdown (averages across 10 runs):

| Metric | Tokens | % of Total |
|---|---|---|
| Total tokens/run | ~317,468 | 100% |
| Effective (new) tokens | ~88,624 | 28% |
| Cache reads | ~228,844 | 72% |
**Cache assessment:** The 72% cache hit rate is strong. The static "Repository Context" and security component descriptions are clearly being cached as a prefix. The main uncached cost drivers are the PR diff (variable per run) and the ~637 tokens of instructions that currently appear after it (Recommendation #1 addresses this).

**Cache write amortization:** Turn 1 writes the static prefix (~800 tokens × $3.75/1M = $0.003); turns 2–7.6 read it back at $0.30/1M. At 6.6 reads × 800 tokens, that is ~$0.00158 in reads, well below the write cost. The cache investment pays off around turn 2, which is favorable given the average of 7.6 turns/run.
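The break-even claim can be checked with the same numbers: caching costs a $0.75/1M premium over normal input on the write, while each cached read saves $2.70/1M versus re-sending the prefix uncached:

```shell
# Cache amortization for the ~800-token static prefix, using the rates above:
# cache write $3.75/1M (turn 1), cache read $0.30/1M, uncached input $3.00/1M.
awk 'BEGIN {
  prefix = 800
  write_premium   = prefix * (3.75 - 3.00) / 1e6  # extra cost paid on turn 1
  saving_per_read = prefix * (3.00 - 0.30) / 1e6  # saved on every later turn
  printf "write premium: $%.4f, saving per read: $%.4f\n", write_premium, saving_per_read
  printf "net after the first read (turn 2): $%.4f\n", saving_per_read - write_premium
}'
```

The first read already recoups the write premium, which is why caching pays off by turn 2.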
## Expected Impact

| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | ~317K | ~311K | ~−2% |
| Effective tokens/run | ~89K | ~83K | ~−7% |
| Cost/run | $0.312 | ~$0.285 | ~−9% |
| LLM turns (max) | 12 | 10 | −2 ceiling |
| PR diff size | 8K chars | 5K chars | −37% diff input |
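The projected cost/run is roughly the current cost minus the per-run savings from Recommendations 1 and 2 (Recommendations 3 and 4 change ceilings or per-PR behavior, not the average run). A quick check lands within a cent of the ~$0.285 figure:

```shell
# Sanity-check the projected cost/run from the per-run estimates above.
awk 'BEGIN {
  current = 0.312
  rec1 = 0.011   # prompt reorder (prefix caching)
  rec2 = 0.017   # diff cap 8000 -> 5000 chars
  printf "projected: $%.3f (%.0f%% vs current)\n",
         current - rec1 - rec2, -100 * (rec1 + rec2) / current
}'
```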
## Implementation Checklist

- [ ] `head -c 8000` → `head -c 5000` in the `pr-diff` step (add truncation notice)
- [ ] `max-turns: 12` → `max-turns: 10` in front matter
- [ ] Add `if:` based on step outputs to skip the agent for non-security PRs
- [ ] `gh aw compile .github/workflows/security-guard.md`
- [ ] `npx tsx scripts/ci/postprocess-smoke-workflows.ts`

Generated by Daily Claude Token Optimization Advisor · ● 249.4K · ◷