-
Notifications
You must be signed in to change notification settings - Fork 328
[copilot-token-optimizer] CI Cleaner deep audit: recurring failures, optimization recommendations #24622
Description
Overview
Deep audit of the CI Cleaner (hourly-ci-cleaner) workflow covering all available runs across the last ~14 days (and historical data back to January 2026). This analysis was performed by inspecting workflow source files, compiled lock files, failure issues, and successful PRs — since direct gh aw logs / gh aw audit requires authentication not available in this context.
1. Run Inventory
Recent runs (last 14 days) — from failure issues
| Run ID | Date | Outcome | Failure Category |
|---|---|---|---|
| 23984950167 | ~Apr 4–5, 2026 | Requested audit target | Unknown (no issue filed yet) |
| 23973398525 | Apr 4, 2026 | ❌ Failed | No Safe Outputs |
| 23915974830 | Apr 2, 2026 | ❌ Failed | No Safe Outputs |
| 23846* (PR) | Apr 1, 2026 | ✅ PR created | Successful fix |
| 23505275817 | Mar 24, 2026 | ❌ Failed | No Safe Outputs |
| 23209503810 | Mar 17, 2026 | ❌ Failed | Protected Files blocked |
| 22917473293 | Mar 10, 2026 | ❌ Failed | E003: >100 files in PR |
| 22498067371 | Feb 27, 2026 | ❌ Failed | Code Push Failed |
| 22073317637 | Feb 16, 2026 | ❌ Failed | Unclassified failure |
| 21506393373 | Jan 30, 2026 | ❌ Failed | No Safe Outputs |
Successful PRs created by CI Cleaner (all time)
| PR | Date | What was fixed |
|---|---|---|
| #24559 | Apr 4 | Add missing mocks to parse_mcp_gateway_log test |
| #24033 | Apr 2 | Fix JSDoc type annotation in parse_mcp_gateway_log.cjs |
| #23846 | Apr 1 | Update golden files for awf v0.25.6 + mcpg v0.2.11 |
| #23419 | Mar 29 | Update wasm golden files for v0.25.3 downgrade |
| #22624 | Mar 24 | Update wasm golden files for aw_context description change |
| #22486 | Mar 23 | Update wasm golden files for gh-aw-mcpg v0.2.1 upgrade |
| #20566 | Mar 11 | Update test expectations for download-artifact v8.0.1 |
| #18456 | Feb 26 | Fix verbose flag + update action pins count |
| #17253 | Feb 20 | Fix GitHub App token for GITHUB_MCP_SERVER_TOKEN |
| #15735 | Feb 14 | Fix Go version + golangci-lint config |
| #11830 | Jan 26 | Format package-lock.json |
| #11375 | Jan 22 | Fix linting issues and test failures |
Overall failure rate: ~43% (9 failure issues vs 12 successful PRs, not counting early-exit noop runs)
2. Workflow Configuration
Current Setup
engine: copilot
schedule: '15 6,18 * * *' # Twice daily, 6am & 6pm UTC
timeout-minutes: 45
tools:
github: { toolsets: [default] }
bash: ["*"]
edit: {}
sandbox:
agent:
mounts:
- /usr/bin/make → make (ro)
- /usr/bin/go → go (ro)
- /usr/local/bin/node (ro)
- /usr/local/bin/npm (ro)
- /usr/local/lib/node_modules (ro)
- /opt/hostedtoolcache/go (ro)
safe-outputs:
create-pull-request:
expires: 2d
title-prefix: "[ca] "
protected-files: fallback-to-issue
missing-tool: {}Token budget target (from scratchpad/token-budget-guidelines.md):
- Target: 68K–90K tokens/run
- Alert threshold: >120K
- Critical threshold: >150K
max-turns — only timeout-minutes: 45 limits runaway sessions.
3. Failure Pattern Analysis
Pattern A — No Safe Outputs (4 occurrences, most critical)
Affected runs: 23973398525, 23915974830, 23505275817, 21506393373
The agent job completes (exit 0) but neither noop nor create_pull_request is ever called. This triggers the [aw] CI Cleaner failed issue automatically.
Root causes hypothesised:
- The agent encounters an unrecoverable error mid-task and exits early without reaching the exit protocol
- The Copilot agent's context window fills up and it terminates without calling the mandatory exit tool
- The agent calls a safe-output tool that silently fails (MCP connection drop)
Impact: Creates noise issues even when no change was needed; leaves CI state ambiguous.
Pattern B — E003: Pull Request Too Large (2 occurrences)
Affected run: 22917473293 (166 files in one PR!)
When the agent runs make recompile, it regenerates all 40+ .lock.yml files across the repository, even if only 1–2 workflow .md files changed. Combined with Go file changes, the PR easily exceeds the 100-file hard limit.
Example: Run 22917473293 — fix needed ~5 files but make recompile regenerated 166 total.
Pattern C — Protected Files (1 occurrence)
Affected run: 23209503810
The agent ran make recompile, which regenerated .github/workflows/daily-safe-output-integrator.lock.yml — a protected file. The fallback-to-issue config is present, but may not have been at the time of this run, or the issue was filed before the config was in place.
Status: Likely resolved by current protected-files: fallback-to-issue config.
Pattern D — Code Push Failed / Permission Errors (2 occurrences)
Affected runs: 22498067371, 22073317637
Generic push/PR creation failures — likely transient token permission issues or branch protection edge cases.
4. Configured vs. Actually Used Tools
Configured
| Tool | Config |
|---|---|
github |
toolsets: [default] → repos, issues, pull_requests, etc. |
bash |
["*"] → all commands |
edit |
file editing |
noop |
via safe-outputs MCP |
create_pull_request |
via safe-outputs MCP |
missing-tool |
via safe-outputs MCP |
Most-Used (inferred from PR content)
bash—make fmt,make lint,make test-unit,make recompile,git— every runedit— File edits on golden files (.golden), Go files,.cjsfiles — ~10 successful PRscreate_pull_request(safe-outputs) — 12 successful PRsgithub— Reading CI run status, listing workflow runs — every run (viacheck_ci_statusjob)noop(safe-outputs) — Should be called on passing CI; unknown how often
Most Common Fix Types (from PR analysis)
| Fix Type | Count | % of successful runs |
|---|---|---|
| Golden/wasm test file updates | 7 | 58% |
| Linting/formatting fixes | 2 | 17% |
| Test expectation updates | 1 | 8% |
| Go compatibility fixes | 1 | 8% |
| Package/dependency fixes | 1 | 8% |
5. Missing Tools / MCP Failures
From the collected failure issues:
- No
missing-tooltype failures were observed in the failure issues (themissing-toolsafe-output type was never triggered) - All failures were either: no safe output generated, protected files, or push errors
- The GitHub MCP toolset (
default) appears sufficient for the workflow's needs
6. Cache Efficiency
No direct token cache data available (requires gh aw audit with auth). From the token budget guidelines:
- CI Cleaner is one of the lowest-cost monitored workflows (68K–90K target vs. 300K–1.59M for heavier workflows)
- Most fixes are small and deterministic (golden file updates, formatting)
- Theoretical cache potential is high since the repo's Go source rarely changes dramatically between twice-daily runs
7. Optimization Recommendations
🔴 High Priority
Rec 1: Fix "No Safe Outputs" — add a final fallback assertion in the prompt
The agent must always call noop or create_pull_request but fails ~44% of the time. Strengthen enforcement:
## ABSOLUTE FINAL RULE (cannot be skipped)
Before your response ends — no matter what happened — you MUST call one of:
- `create_pull_request` if you changed any files
- `noop` if you changed nothing
**If you are about to end without calling a safe-output tool, call `noop` right now.**Consider also adding a workflow-level fallback: if the agent job exits 0 with no safe-output, auto-trigger a noop issue instead of a failure issue.
Rec 2: Scope-limit make recompile to prevent E003 failures
The agent should only recompile if workflow .md files were modified:
**Recompile only when necessary:**
- Run `git diff --name-only | grep '\.md$'` to check if any workflow files changed
- If NO .md files changed, **SKIP `make recompile`** entirely
- If .md files changed, run `make recompile` but then check: `git diff --name-only | wc -l`
- If more than 50 files changed, something is wrong — stop and call `noop` instead of creating a 166-file PR🟡 Medium Priority
Rec 3: Add max-turns via engine switch to Claude for better token budget control
Since Copilot doesn't support max-turns, consider switching to Claude with a hard turn limit for predictable token spend:
engine:
id: claude
max-turns: 20 # Enough for fmt→lint→test→recompile cyclePer token-budget-guidelines, CI Cleaner's target is 68K–90K — well within Claude's economics.
Rec 4: Add explicit file-count check before PR creation
Add a bash step or prompt instruction to guard against oversized PRs:
# Check before creating PR
CHANGED=$(git diff --cached --name-only | wc -l)
if [ "$CHANGED" -gt 80 ]; then
echo "Too many files changed ($CHANGED). Calling noop instead."
# call noop with explanation
fiRec 5: Only checkout main before starting, verify CI actually fails
The early-exit guard (check_ci_status) works well. To reduce token spend when CI is borderline/flapping, add a secondary check at the start of the agent prompt:
## Verify CI status (re-check before proceeding)
Run: `gh run list --workflow=ci.yml --branch=main --limit=3 --json conclusion,status`
If the most recent 2 completed runs are both "success", call `noop` immediately — CI has self-healed.🟢 Low Priority
Rec 6: Reduce make deps-dev calls
The agent instructions mention make agent-finish which takes 10–15 minutes and includes make deps-dev. The workflow already installs deps in the steps: section. Ensure the agent prompt explicitly says NOT to re-run make deps-dev or make agent-finish unless absolutely necessary.
Rec 7: Add run-specific context to the agent prompt
Currently the agent gets ci_run_id but doesn't automatically download the CI failure logs. Add a setup step that pre-fetches the failed job logs and injects them into the prompt context — this would cut tool call iterations needed to diagnose the failure.
8. Workflow Source (full content: hourly-ci-cleaner.md)
View full workflow source
The workflow is at .github/workflows/hourly-ci-cleaner.md with imports from .github/agents/ci-cleaner.agent.md.
Key characteristics:
- Schedule:
15 6,18 * * *(twice daily, 6am & 6pm UTC) - Engine:
copilot(agent:ci-cleaner) - Timeout: 45 minutes
- Early-exit guard:
check_ci_statusjob checks last CI run onmain; agent only fires ifci_needs_fix == 'true' - Safe outputs:
create-pull-request(expires: 2d, prefix:[ca], protected-files: fallback-to-issue),missing-tool - Four core tasks:
make fmt→make lint→make test-unit→make recompile
9. Summary Statistics
| Metric | Value |
|---|---|
| Total failure issues filed | 9 |
| Total successful PRs created | 12 |
| Estimated failure rate | ~43% |
| Most common failure | No Safe Outputs (44% of failures) |
| Most common fix | Golden file updates (58% of successes) |
| Token budget target | 68K–90K/run |
| Max-turns support | ❌ Not available (Copilot engine) |
| Hard timeout | 45 minutes |
| Scheduled frequency | 2× daily |
| Tools configured | github (default), bash (*), edit, safe-outputs |
| Missing tool reports | 0 observed |
Note: Direct gh aw audit per-run telemetry (token counts, turn counts, tool call frequencies) was unavailable in the current environment (requires gh auth login). This analysis is based on workflow source inspection, failure issues, and merged PRs. For exact token/turn data, run:
gh aw logs hourly-ci-cleaner -c 20 --start-date -14d --json
gh aw audit 23984950167Generated by Copilot Token Usage Optimizer · ● 11.6M · ◷
- expires on Apr 12, 2026, 12:44 AM UTC