[copilot-token-optimizer] CI Cleaner deep audit: recurring failures, optimization recommendations #24622

## Overview

Deep audit of the **CI Cleaner** (`hourly-ci-cleaner`) workflow, covering all available runs from the last ~14 days plus historical data back to January 2026. The analysis is based on workflow source files, compiled lock files, failure issues, and successful PRs, because `gh aw logs` and `gh aw audit` require authentication that was not available in this context.

---

## 1. Run Inventory

### Runs identified from failure issues (last 14 days plus history back to Jan 30, 2026)

| Run ID | Date | Outcome | Failure Category |
|--------|------|---------|-----------------|
| 23984950167 | ~Apr 4–5, 2026 | Requested audit target | Unknown (no issue filed yet) |
| 23973398525 | Apr 4, 2026 | ❌ Failed | No Safe Outputs |
| 23915974830 | Apr 2, 2026 | ❌ Failed | No Safe Outputs |
| 23846\* (PR) | Apr 1, 2026 | ✅ PR created | Successful fix |
| 23505275817 | Mar 24, 2026 | ❌ Failed | No Safe Outputs |
| 23209503810 | Mar 17, 2026 | ❌ Failed | Protected Files blocked |
| 22917473293 | Mar 10, 2026 | ❌ Failed | E003: >100 files in PR |
| 22498067371 | Feb 27, 2026 | ❌ Failed | Code Push Failed |
| 22073317637 | Feb 16, 2026 | ❌ Failed | Unclassified failure |
| 21506393373 | Jan 30, 2026 | ❌ Failed | No Safe Outputs |

### Successful PRs created by CI Cleaner (all time)

| PR | Date | What was fixed |
|----|------|---------------|
| #24559 | Apr 4 | Add missing mocks to parse_mcp_gateway_log test |
| #24033 | Apr 2 | Fix JSDoc type annotation in parse_mcp_gateway_log.cjs |
| #23846 | Apr 1 | Update golden files for awf v0.25.6 + mcpg v0.2.11 |
| #23419 | Mar 29 | Update wasm golden files for v0.25.3 downgrade |
| #22624 | Mar 24 | Update wasm golden files for aw_context description change |
| #22486 | Mar 23 | Update wasm golden files for gh-aw-mcpg v0.2.1 upgrade |
| #20566 | Mar 11 | Update test expectations for download-artifact v8.0.1 |
| #18456 | Feb 26 | Fix verbose flag + update action pins count |
| #17253 | Feb 20 | Fix GitHub App token for GITHUB_MCP_SERVER_TOKEN |
| #15735 | Feb 14 | Fix Go version + golangci-lint config |
| #11830 | Jan 26 | Format package-lock.json |
| #11375 | Jan 22 | Fix linting issues and test failures |

**Overall failure rate: ~43%** (9 failure issues vs 12 successful PRs, not counting early-exit noop runs)
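
The ~43% figure follows from dividing failure issues by the sum of failure issues and successful PRs. A minimal sketch of that arithmetic, where the function name `failure_rate` is an illustrative assumption and not part of the workflow:

```bash
#!/usr/bin/env bash
# failure_rate FAILURES SUCCESSES
# Prints failures / (failures + successes) as a percentage with one decimal.
failure_rate() {
  local failures=$1 successes=$2
  awk -v f="$failures" -v s="$successes" 'BEGIN {
    total = f + s
    if (total == 0) { print "n/a"; exit }   # avoid division by zero
    printf "%.1f\n", 100 * f / total
  }'
}

# 9 failure issues and 12 successful PRs, as counted above.
failure_rate 9 12   # prints 42.9
```

Early-exit noop runs are excluded from both counts, so this is a rate over runs that actually attempted a fix, not over all scheduled runs.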

---

## 2. Workflow Configuration

### Current Setup

```yaml
engine: copilot
schedule: '15 6,18 * * *'   # Twice daily, 6am & 6pm UTC
timeout-minutes: 45
tools:
  github: { toolsets: [default] }
  bash: ["*"]
  edit: {}
sandbox:
  agent:
    mounts:
      - /usr/bin/make → make (ro)
      - /usr/bin/go → go (ro)
      - /usr/local/bin/node (ro)
      - /usr/local/bin/npm (ro)
      - /usr/local/lib/node_modules (ro)
      - /opt/hostedtoolcache/go (ro)
safe-outputs:
  create-pull-request:
    expires: 2d
    title-prefix: "[ca] "
    protected-files: fallback-to-issue
  missing-tool: {}
```

**Token budget target (from scratchpad/token-budget-guidelines.md):**
- Target: 68K–90K tokens/run
- Alert threshold: >120K
- Critical threshold: >150K
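
If per-run token counts become available, the thresholds above could be applied mechanically. The sketch below is only an illustration: the function name `classify_tokens` and the band labels (`ok`, `above-target`, `alert`, `critical`) are assumptions, and the guidelines do not say how to label the 90K–120K range.

```bash
#!/usr/bin/env bash
# classify_tokens TOKENS
# Maps a per-run token count onto the budget bands listed above.
# Band labels are hypothetical; only the numeric thresholds come from the guidelines.
classify_tokens() {
  local tokens=$1
  if   [ "$tokens" -gt 150000 ]; then echo "critical"
  elif [ "$tokens" -gt 120000 ]; then echo "alert"
  elif [ "$tokens" -gt 90000 ];  then echo "above-target"
  else echo "ok"
  fi
}

classify_tokens 85000    # prints ok
classify_tokens 130000   # prints alert
```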

**⚠️ Important constraint: Copilot engine does NOT support `max-turns`** — only `timeout-minutes: 45` limits runaway sessions.

---

## 3. Failure Pattern Analysis

### Pattern A — No Safe Outputs (4 occurrences, most critical)

**Affected runs:** 23973398525, 23915974830, 23505275817, 21506393373

The agent job completes (exit 0) but neither `noop` nor `create_pull_request` is ever called. This triggers the `[aw] CI Cleaner failed` issue automatically.

**Hypothesised root causes (unconfirmed without per-run logs):**
1. The agent encounters an unrecoverable error mid-task and exits early without reaching the exit protocol
2. The Copilot agent's context window fills up and it terminates without calling the mandatory exit tool
3. The agent calls a safe-output tool that silently fails (MCP connection drop)

**Impact:** Creates noise issues even when no change was needed; leaves CI state ambiguous.

### Pattern B — E003: Pull Request Too Large (2 occurrences)

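**Affected run:** 22917473293 (166 files in one PR). This is the only E003 run identified in the run inventory; the second occurrence counted in the heading is not listed there.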

When the agent runs `make recompile`, it regenerates **all 40+ `.lock.yml` files** across the repository, even if only 1–2 workflow `.md` files changed. Combined with Go file changes, the PR easily exceeds the 100-file hard limit.

**Example:** Run 22917473293 — fix needed ~5 files but `make recompile` regenerated 166 total.

### Pattern C — Protected Files (1 occurrence)

**Affected run:** 23209503810

The agent ran `make recompile`, which regenerated `.github/workflows/daily-safe-output-integrator.lock.yml`, a protected file. The `protected-files: fallback-to-issue` setting is present in the current configuration but may not have been in place when this run executed.

**Status:** Likely resolved by current `protected-files: fallback-to-issue` config.

### Pattern D — Code Push Failed / Permission Errors (2 occurrences)

**Affected runs:** 22498067371, 22073317637

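Run 22498067371 was reported as "Code Push Failed" and run 22073317637 as an unclassified failure. Both look like generic push or PR-creation failures, likely caused by transient token permission issues or branch protection edge cases.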

---

## 4. Configured vs. Actually Used Tools

### Configured
| Tool | Config |
|------|--------|
| `github` | `toolsets: [default]` → repos, issues, pull_requests, etc. |
| `bash` | `["*"]` → all commands |
| `edit` | file editing |
| `noop` | via safe-outputs MCP |
| `create_pull_request` | via safe-outputs MCP |
| `missing-tool` | via safe-outputs MCP |

### Most-Used (inferred from PR content)
1. **`bash`** — `make fmt`, `make lint`, `make test-unit`, `make recompile`, `git` — every run
2. **`edit`** — File edits on golden files (`.golden`), Go files, `.cjs` files — ~10 successful PRs
3. **`create_pull_request`** (safe-outputs) — 12 successful PRs
4. **`github`** — Reading CI run status, listing workflow runs — every run (via `check_ci_status` job)
5. **`noop`** (safe-outputs) — Should be called on passing CI; unknown how often

### Most Common Fix Types (from PR analysis)
| Fix Type | Count | % of successful runs |
|----------|-------|---------------------|
| Golden/wasm test file updates | 7 | 58% |
| Linting/formatting fixes | 2 | 17% |
| Test expectation updates | 1 | 8% |
| Go compatibility fixes | 1 | 8% |
| Package/dependency fixes | 1 | 8% |

---

## 5. Missing Tools / MCP Failures

From the collected failure issues:
- No `missing-tool` type failures were observed in the failure issues (the `missing-tool` safe-output type was never triggered)
- All failures were either: no safe output generated, protected files, or push errors
- The GitHub MCP toolset (`default`) appears sufficient for the workflow's needs

---

## 6. Cache Efficiency

No direct token cache data is available (it requires `gh aw audit` with auth). From the token budget guidelines:
- CI Cleaner is one of the **lowest-cost** monitored workflows (68K–90K target vs. 300K–1.59M for heavier workflows)
- Most fixes are small and deterministic (golden file updates, formatting)
- **Theoretical cache potential is high** since the repo's Go source rarely changes dramatically between twice-daily runs

---

## 7. Optimization Recommendations

### 🔴 High Priority

#### Rec 1: Fix "No Safe Outputs" — add a final fallback assertion in the prompt

The agent must always call `noop` or `create_pull_request`, but runs that end without either account for 4 of the 9 failure issues (~44%). Strengthen enforcement:

```markdown
## ABSOLUTE FINAL RULE (cannot be skipped)

Before your response ends — no matter what happened — you MUST call one of:
- `create_pull_request` if you changed any files
- `noop` if you changed nothing

**If you are about to end without calling a safe-output tool, call `noop` right now.**
```

Consider also adding a workflow-level fallback: if the agent job exits 0 with no safe-output, auto-trigger a noop issue instead of a failure issue.
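
The workflow-level fallback could be sketched as a post-agent step like the one below. This is a sketch under stated assumptions: it supposes that safe-output calls are recorded one per line in a file, and both the `SAFE_OUTPUTS_FILE` variable and the default path are hypothetical placeholders, not gh-aw's documented layout.

```bash
#!/usr/bin/env bash
# has_safe_output FILE
# Succeeds if FILE exists and contains at least one non-blank line,
# i.e. the agent recorded at least one safe-output call.
has_safe_output() {
  local file=$1
  [ -f "$file" ] && grep -q '[^[:space:]]' "$file"
}

# Hypothetical location; substitute wherever the workflow records safe-output calls.
OUTPUT_FILE="${SAFE_OUTPUTS_FILE:-/tmp/safe-outputs.jsonl}"

if has_safe_output "$OUTPUT_FILE"; then
  echo "safe output present"
else
  echo "agent exited without a safe output; treating the run as noop"
fi
```

Treating an empty result as a noop hides genuine agent crashes, so it may be worth logging the event somewhere less noisy than an issue instead of discarding it.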

#### Rec 2: Scope-limit `make recompile` to prevent E003 failures

The agent should only recompile if workflow `.md` files were modified:

```markdown
**Recompile only when necessary:**
- Run `git diff --name-only -- '.github/workflows/*.md'` to check if any workflow files changed (a bare `grep '\.md$'` would also match unrelated Markdown such as docs)
- If NO .md files changed, **SKIP `make recompile`** entirely
- If .md files changed, run `make recompile` but then check: `git diff --name-only | wc -l`
- If more than 50 files changed, something is wrong — stop and call `noop` instead of creating a 166-file PR
```
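
The same check can be expressed as a small shell helper if the guard is moved out of the prompt and into a script. A minimal sketch, assuming workflow sources live under `.github/workflows/` as `hourly-ci-cleaner.md` does; the function name `should_recompile` is hypothetical:

```bash
#!/usr/bin/env bash
# should_recompile
# Reads changed paths on stdin (one per line) and succeeds if at least one
# is a Markdown workflow source directly under .github/workflows/.
# Compiled .lock.yml files and Markdown elsewhere in the repo do not count.
should_recompile() {
  grep -q '^\.github/workflows/[^/]*\.md$'
}

# Example with a fixed list of paths instead of live git output:
if printf '%s\n' 'pkg/cli/logs.go' 'README.md' | should_recompile; then
  echo "recompile needed"
else
  echo "skip make recompile"
fi
```

In the agent's session the input would come from `git diff --name-only`, for example `git diff --name-only | should_recompile && make recompile`.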

### 🟡 Medium Priority

#### Rec 3: Add `max-turns` via engine switch to Claude for better token budget control

Since Copilot doesn't support `max-turns`, consider switching to Claude with a hard turn limit for predictable token spend:

```yaml
engine:
  id: claude
  max-turns: 20   # Enough for fmt→lint→test→recompile cycle
```

Per token-budget-guidelines, CI Cleaner's target is 68K–90K — well within Claude's economics.

#### Rec 4: Add explicit file-count check before PR creation

Add a bash step or prompt instruction to guard against oversized PRs:

```bash
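# Guard against oversized PRs before calling create_pull_request.
# `git status --porcelain` counts staged, unstaged, and untracked paths;
# `git diff --cached` alone would miss anything not yet staged.
CHANGED=$(git status --porcelain --untracked-files=all | wc -l)
if [ "$CHANGED" -gt 80 ]; then
  echo "Too many files changed ($CHANGED). Calling noop instead."
  # The agent should call the noop safe-output tool here with this explanation.
  exit 1
fi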
```

#### Rec 5: Re-verify that CI is still failing on main before starting work

The early-exit guard (`check_ci_status`) works well. To reduce token spend when CI is borderline/flapping, add a secondary check at the start of the agent prompt:

```markdown
## Verify CI status (re-check before proceeding)

Run: `gh run list --workflow=ci.yml --branch=main --limit=3 --json conclusion,status`

If the most recent 2 completed runs are both "success", call `noop` immediately — CI has self-healed.
```
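
The "two most recent completed runs" rule can also be checked in a script before the agent starts. The sketch below assumes run conclusions arrive newest first, one per line; the function name `ci_self_healed` is hypothetical.

```bash
#!/usr/bin/env bash
# ci_self_healed
# Reads run conclusions on stdin (newest first, one per line) and succeeds
# only if the two most recent ones are both "success".
ci_self_healed() {
  local first second
  read -r first  || return 1   # fewer than one run: not enough evidence
  read -r second || return 1   # fewer than two runs: not enough evidence
  [ "$first" = "success" ] && [ "$second" = "success" ]
}

# Example with fixed input instead of live `gh` output:
if printf '%s\n' success success failure | ci_self_healed; then
  echo "CI has self-healed; call noop"
fi
```

Live input could come from `gh run list --workflow=ci.yml --branch=main --limit=3 --json conclusion,status --jq '.[] | select(.status == "completed") | .conclusion'`, which filters out in-progress runs before the check.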

### 🟢 Low Priority

#### Rec 6: Reduce `make deps-dev` calls

The agent instructions mention `make agent-finish`, which takes 10–15 minutes and includes `make deps-dev`. The workflow already installs dependencies in its `steps:` section, so the agent prompt should explicitly say NOT to re-run `make deps-dev` or `make agent-finish` unless absolutely necessary.

#### Rec 7: Add run-specific context to the agent prompt

Currently the agent gets `ci_run_id` but does not automatically download the CI failure logs. Add a setup step that pre-fetches the failed job logs and injects them into the prompt context. This would reduce the number of tool-call iterations needed to diagnose the failure.
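
A pre-fetch step might look like the sketch below. It trims the log to its tail so the injected context stays small; the helper name `trim_log`, the `CI_RUN_ID` variable, the 200-line cap, and the output path are all illustrative assumptions.

```bash
#!/usr/bin/env bash
# trim_log MAX_LINES
# Keeps only the last MAX_LINES lines of stdin. The failing assertion or
# error message usually sits near the end of a job log.
trim_log() {
  tail -n "$1"
}

# Usage in a setup step (variable name and path are hypothetical):
#   gh run view "$CI_RUN_ID" --log-failed | trim_log 200 > /tmp/ci-failure.log

# Example with fixed input:
printf '%s\n' line1 line2 line3 line4 line5 | trim_log 2
```

`gh run view --log-failed` prints only the logs of failed steps, which keeps the input to `trim_log` focused before any truncation happens.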

---

## 8. Workflow Source (full content: `hourly-ci-cleaner.md`)

<details>
<summary>View full workflow source</summary>

The workflow is at `.github/workflows/hourly-ci-cleaner.md` with imports from `.github/agents/ci-cleaner.agent.md`.

**Key characteristics:**
- **Schedule:** `15 6,18 * * *` (twice daily, 6am & 6pm UTC)
- **Engine:** `copilot` (agent: `ci-cleaner`)  
- **Timeout:** 45 minutes
- **Early-exit guard:** `check_ci_status` job checks last CI run on `main`; agent only fires if `ci_needs_fix == 'true'`
- **Safe outputs:** `create-pull-request` (expires: 2d, prefix: `[ca] `, protected-files: fallback-to-issue), `missing-tool`
- **Four core tasks:** `make fmt` → `make lint` → `make test-unit` → `make recompile`

</details>

---

## 9. Summary Statistics

| Metric | Value |
|--------|-------|
| Total failure issues filed | 9 |
| Total successful PRs created | 12 |
| Estimated failure rate | ~43% |
| Most common failure | No Safe Outputs (44% of failures) |
| Most common fix | Golden file updates (58% of successes) |
| Token budget target | 68K–90K/run |
| Max-turns support | ❌ Not available (Copilot engine) |
| Hard timeout | 45 minutes |
| Scheduled frequency | 2× daily |
| Tools configured | github (default), bash (*), edit, safe-outputs |
| Missing tool reports | 0 observed |

---

*Note: Direct `gh aw audit` per-run telemetry (token counts, turn counts, tool call frequencies) was unavailable in the current environment (requires `gh auth login`). This analysis is based on workflow source inspection, failure issues, and merged PRs. For exact token/turn data, run:*
```bash
gh aw logs hourly-ci-cleaner -c 20 --start-date -14d --json
gh aw audit 23984950167
```







> Generated by [Copilot Token Usage Optimizer](https://github.com/github/gh-aw/actions/runs/23990500498/agentic_workflow) · ● 11.6M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fcopilot-token-optimizer%22&type=issues)
> - [x] expires  on Apr 12, 2026, 12:44 AM UTC
