[copilot-token-optimizer] CI Cleaner deep audit: recurring failures, optimization recommendations #24622

@github-actions

Overview

Deep audit of the CI Cleaner (hourly-ci-cleaner) workflow covering all available runs across the last ~14 days (and historical data back to January 2026). This analysis was performed by inspecting workflow source files, compiled lock files, failure issues, and successful PRs, because gh aw logs and gh aw audit require authentication that is not available in this context.


1. Run Inventory

Recent runs (last 14 days) — from failure issues

| Run ID | Date | Outcome | Failure Category |
|---|---|---|---|
| 23984950167 | ~Apr 4–5, 2026 | Requested audit target | Unknown (no issue filed yet) |
| 23973398525 | Apr 4, 2026 | ❌ Failed | No Safe Outputs |
| 23915974830 | Apr 2, 2026 | ❌ Failed | No Safe Outputs |
| 23846* (PR) | Apr 1, 2026 | ✅ PR created | Successful fix |
| 23505275817 | Mar 24, 2026 | ❌ Failed | No Safe Outputs |
| 23209503810 | Mar 17, 2026 | ❌ Failed | Protected Files blocked |
| 22917473293 | Mar 10, 2026 | ❌ Failed | E003: >100 files in PR |
| 22498067371 | Feb 27, 2026 | ❌ Failed | Code Push Failed |
| 22073317637 | Feb 16, 2026 | ❌ Failed | Unclassified failure |
| 21506393373 | Jan 30, 2026 | ❌ Failed | No Safe Outputs |

Successful PRs created by CI Cleaner (all time)

| PR | Date | What was fixed |
|---|---|---|
| #24559 | Apr 4 | Add missing mocks to parse_mcp_gateway_log test |
| #24033 | Apr 2 | Fix JSDoc type annotation in parse_mcp_gateway_log.cjs |
| #23846 | Apr 1 | Update golden files for awf v0.25.6 + mcpg v0.2.11 |
| #23419 | Mar 29 | Update wasm golden files for v0.25.3 downgrade |
| #22624 | Mar 24 | Update wasm golden files for aw_context description change |
| #22486 | Mar 23 | Update wasm golden files for gh-aw-mcpg v0.2.1 upgrade |
| #20566 | Mar 11 | Update test expectations for download-artifact v8.0.1 |
| #18456 | Feb 26 | Fix verbose flag + update action pins count |
| #17253 | Feb 20 | Fix GitHub App token for GITHUB_MCP_SERVER_TOKEN |
| #15735 | Feb 14 | Fix Go version + golangci-lint config |
| #11830 | Jan 26 | Format package-lock.json |
| #11375 | Jan 22 | Fix linting issues and test failures |

Overall failure rate: ~43% (9 failure issues vs 12 successful PRs, not counting early-exit noop runs)
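The two percentages used throughout this report have different denominators, so it may help to see the arithmetic spelled out. The sketch below uses only the counts stated in this report; the helper name `pct` is illustrative.

```shell
#!/usr/bin/env bash
# Percentage of `part` within `total`, rounded to one decimal place.
pct() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", part * 100 / total }'
}

failures=9          # failure issues filed (from this report)
successes=12        # successful PRs created (from this report)
no_safe_outputs=4   # Pattern A occurrences (from this report)

# Overall failure rate: failures / (failures + successes)
pct "$failures" "$((failures + successes))"    # prints 42.9

# Share of failures that are "No Safe Outputs": 4 of 9
pct "$no_safe_outputs" "$failures"             # prints 44.4
```

Note that the ~44% figure is a share of failure issues, not a per-run failure rate.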


2. Workflow Configuration

Current Setup

engine: copilot
schedule: '15 6,18 * * *'   # Twice daily, 06:15 & 18:15 UTC
timeout-minutes: 45
tools:
  github: { toolsets: [default] }
  bash: ["*"]
  edit: {}
sandbox:
  agent:
    mounts:
      - /usr/bin/make → make (ro)
      - /usr/bin/go → go (ro)
      - /usr/local/bin/node (ro)
      - /usr/local/bin/npm (ro)
      - /usr/local/lib/node_modules (ro)
      - /opt/hostedtoolcache/go (ro)
safe-outputs:
  create-pull-request:
    expires: 2d
    title-prefix: "[ca] "
    protected-files: fallback-to-issue
  missing-tool: {}

Token budget target (from scratchpad/token-budget-guidelines.md):

  • Target: 68K–90K tokens/run
  • Alert threshold: >120K
  • Critical threshold: >150K
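As a rough illustration of how these thresholds could be applied to a per-run token count, here is a minimal sketch. The function name, the label strings, and the "over-target" band between 90K and 120K are assumptions for illustration; the guidelines file only names the target range and the two thresholds.

```shell
#!/usr/bin/env bash
# Classify a per-run token count against the budget thresholds.
# Labels and the 90K-120K "over-target" band are assumed, not from the guidelines.
classify_tokens() {
  local tokens=$1
  if   [ "$tokens" -gt 150000 ]; then echo "critical"
  elif [ "$tokens" -gt 120000 ]; then echo "alert"
  elif [ "$tokens" -gt 90000 ];  then echo "over-target"
  elif [ "$tokens" -ge 68000 ];  then echo "on-target"
  else echo "under-target"
  fi
}
```

For example, `classify_tokens 75000` prints `on-target` and `classify_tokens 130000` prints `alert`.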

⚠️ Important constraint: Copilot engine does NOT support max-turns — only timeout-minutes: 45 limits runaway sessions.


3. Failure Pattern Analysis

Pattern A — No Safe Outputs (4 occurrences, most critical)

Affected runs: 23973398525, 23915974830, 23505275817, 21506393373

The agent job completes (exit 0) but neither noop nor create_pull_request is ever called. This triggers the [aw] CI Cleaner failed issue automatically.

Root causes hypothesised:

  1. The agent encounters an unrecoverable error mid-task and exits early without reaching the exit protocol
  2. The Copilot agent's context window fills up and it terminates without calling the mandatory exit tool
  3. The agent calls a safe-output tool that silently fails (MCP connection drop)

Impact: Creates noise issues even when no change was needed; leaves CI state ambiguous.

Pattern B — E003: Pull Request Too Large (2 occurrences)

Affected run: 22917473293 (166 files in one PR). The second occurrence is not listed in the run inventory above.

When the agent runs make recompile, it regenerates all 40+ .lock.yml files across the repository, even if only 1–2 workflow .md files changed. Combined with Go file changes, the PR easily exceeds the 100-file hard limit.

Example: Run 22917473293 — fix needed ~5 files but make recompile regenerated 166 total.

Pattern C — Protected Files (1 occurrence)

Affected run: 23209503810

The agent ran make recompile, which regenerated .github/workflows/daily-safe-output-integrator.lock.yml, a protected file. The fallback-to-issue config is present now, but may not have been in place when this run executed.

Status: Likely resolved by current protected-files: fallback-to-issue config.

Pattern D — Code Push Failed / Permission Errors (2 occurrences)

Affected runs: 22498067371, 22073317637

Generic push/PR creation failures — likely transient token permission issues or branch protection edge cases.


4. Configured vs. Actually Used Tools

Configured

| Tool | Config |
|---|---|
| github | toolsets: [default] → repos, issues, pull_requests, etc. |
| bash | ["*"] → all commands |
| edit | file editing |
| noop | via safe-outputs MCP |
| create_pull_request | via safe-outputs MCP |
| missing-tool | via safe-outputs MCP |

Most-Used (inferred from PR content)

  1. bash: make fmt, make lint, make test-unit, make recompile, git (every run)
  2. edit: file edits on golden files (.golden), Go files, .cjs files (~10 successful PRs)
  3. create_pull_request (safe-outputs): 12 successful PRs
  4. github: reading CI run status, listing workflow runs (every run, via check_ci_status job)
  5. noop (safe-outputs): should be called on passing CI; unknown how often

Most Common Fix Types (from PR analysis)

| Fix Type | Count | % of successful runs |
|---|---|---|
| Golden/wasm test file updates | 7 | 58% |
| Linting/formatting fixes | 2 | 17% |
| Test expectation updates | 1 | 8% |
| Go compatibility fixes | 1 | 8% |
| Package/dependency fixes | 1 | 8% |

5. Missing Tools / MCP Failures

From the collected failure issues:

  • No missing-tool type failures were observed in the failure issues (the missing-tool safe-output type was never triggered)
  • All failures were either: no safe output generated, protected files, or push errors
  • The GitHub MCP toolset (default) appears sufficient for the workflow's needs

6. Cache Efficiency

No direct token cache data available (requires gh aw audit with auth). From the token budget guidelines:

  • CI Cleaner is one of the lowest-cost monitored workflows (68K–90K target vs. 300K–1.59M for heavier workflows)
  • Most fixes are small and deterministic (golden file updates, formatting)
  • Theoretical cache potential is high since the repo's Go source rarely changes dramatically between twice-daily runs

7. Optimization Recommendations

🔴 High Priority

Rec 1: Fix "No Safe Outputs" — add a final fallback assertion in the prompt

The agent must always call noop or create_pull_request, yet "No Safe Outputs" accounts for ~44% of failure issues (4 of 9). Strengthen enforcement:

## ABSOLUTE FINAL RULE (cannot be skipped)

Before your response ends — no matter what happened — you MUST call one of:
- `create_pull_request` if you changed any files
- `noop` if you changed nothing

**If you are about to end without calling a safe-output tool, call `noop` right now.**

Consider also adding a workflow-level fallback: if the agent job exits 0 with no safe-output, auto-trigger a noop issue instead of a failure issue.
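The workflow-level fallback could be sketched as a small decision function run after the agent job. This is only an illustration: the function name and outcome labels are hypothetical, and how the safe-output count is obtained depends on the safe-outputs implementation, which this audit did not inspect.

```shell
#!/usr/bin/env bash
# Decide what the workflow should report once the agent job has finished.
#   $1 = agent exit code
#   $2 = number of safe-output entries the agent emitted (hypothetical input)
resolve_outcome() {
  local exit_code=$1 safe_output_count=$2
  if [ "$exit_code" -ne 0 ]; then
    echo "failure"          # genuine agent failure: keep filing the failure issue
  elif [ "$safe_output_count" -eq 0 ]; then
    echo "fallback-noop"    # clean exit but nothing emitted: treat as a noop
  else
    echo "ok"               # at least one safe output was produced
  fi
}
```

With this shape, a Pattern A run (exit 0, zero safe outputs) would resolve to `fallback-noop` rather than a failure issue, while a non-zero exit would still be reported as a failure.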

Rec 2: Scope-limit make recompile to prevent E003 failures

The agent should only recompile if workflow .md files were modified:

**Recompile only when necessary:**
- Run `git diff --name-only | grep '\.md$'` to check if any workflow files changed
- If NO .md files changed, **SKIP `make recompile`** entirely
- If .md files changed, run `make recompile` but then check: `git diff --name-only | wc -l`
- If more than 50 files changed, something is wrong — stop and call `noop` instead of creating a 166-file PR
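If the same gate were enforced in a script rather than left to the prompt, it might look like the sketch below. Both helpers read a list of changed paths on stdin. The function names are illustrative, and the path pattern assumes workflow sources live under .github/workflows/ or .github/agents/ (the two locations named in this report).

```shell
#!/usr/bin/env bash
# Succeeds if any changed path on stdin is a workflow or agent .md source.
# Assumption: only these two directories hold files that require a recompile.
needs_recompile() {
  grep -Eq '^\.github/(workflows|agents)/.*\.md$'
}

# Succeeds if the number of changed paths on stdin is at most $1.
within_file_limit() {
  local limit=$1 count
  count=$(grep -c '' || true)   # grep -c prints 0 but exits 1 on empty input
  [ "$count" -le "$limit" ]
}

# Usage inside the repository (not executed here):
#   if git diff --name-only | needs_recompile; then make recompile; fi
#   git diff --name-only | within_file_limit 50 || echo "too many files, call noop"
```

Keeping the helpers free of git calls makes them easy to exercise with a fixed list of paths.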

🟡 Medium Priority

Rec 3: Add max-turns via engine switch to Claude for better token budget control

Since Copilot doesn't support max-turns, consider switching to Claude with a hard turn limit for predictable token spend:

engine:
  id: claude
  max-turns: 20   # Enough for fmt→lint→test→recompile cycle

Per token-budget-guidelines, CI Cleaner's target is 68K–90K — well within Claude's economics.

Rec 4: Add explicit file-count check before PR creation

Add a bash step or prompt instruction to guard against oversized PRs:

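# Check before creating PR: count every changed path (staged, unstaged, untracked),
# since --cached alone reports 0 when nothing has been staged yet
CHANGED=$(git status --porcelain | wc -l)
if [ "$CHANGED" -gt 80 ]; then
  echo "Too many files changed ($CHANGED). Calling noop instead."
  # call noop with explanation
fi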

Rec 5: Re-verify that CI is still failing on main before starting

The early-exit guard (check_ci_status) works well. To reduce token spend when CI is borderline/flapping, add a secondary check at the start of the agent prompt:

## Verify CI status (re-check before proceeding)

Run: `gh run list --workflow=ci.yml --branch=main --limit=3 --json conclusion,status`

If the most recent 2 completed runs are both "success", call `noop` immediately — CI has self-healed.
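The "self-healed" test could also be done deterministically before the agent starts. The sketch below assumes the conclusions of completed runs arrive on stdin, one per line, newest first (which is the order gh run list is assumed to return). The function name is illustrative, and ci.yml is the workflow file name used in the prompt above.

```shell
#!/usr/bin/env bash
# Succeeds if the two newest completed runs (first two lines on stdin)
# both concluded with "success".
ci_self_healed() {
  local first="" second=""
  read -r first || true
  read -r second || true
  [ "$first" = "success" ] && [ "$second" = "success" ]
}

# Usage (not executed here):
#   gh run list --workflow=ci.yml --branch=main --limit=3 \
#     --json conclusion,status \
#     --jq '.[] | select(.status == "completed") | .conclusion' | ci_self_healed \
#     && echo "CI has self-healed, call noop"
```

Filtering on status inside the --jq expression keeps in-progress runs from being counted as failures.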

🟢 Low Priority

Rec 6: Reduce make deps-dev calls

The agent instructions mention make agent-finish which takes 10–15 minutes and includes make deps-dev. The workflow already installs deps in the steps: section. Ensure the agent prompt explicitly says NOT to re-run make deps-dev or make agent-finish unless absolutely necessary.

Rec 7: Add run-specific context to the agent prompt

Currently the agent gets ci_run_id but doesn't automatically download the CI failure logs. Add a setup step that pre-fetches the failed job logs and injects them into the prompt context — this would cut tool call iterations needed to diagnose the failure.
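A minimal sketch of such a pre-fetch step follows. It relies on gh run view with --log-failed to print only the logs of failed steps, then keeps the tail so the injected context stays bounded. The helper name, the 200-line default, and the output path are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Keep only the last N lines of a log read from stdin (default 200),
# so the context injected into the prompt has a fixed upper bound.
trim_log() {
  tail -n "${1:-200}"
}

# Usage in a setup step (not executed here; ci_run_id is the value the
# workflow already passes to the agent, the output path is hypothetical):
#   gh run view "$ci_run_id" --log-failed | trim_log 200 > /tmp/ci-failure.log
```

Taking the tail rather than the head is a design choice: failing test output and compiler errors usually appear near the end of a job log.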


8. Workflow Source (full content: hourly-ci-cleaner.md)

The workflow is at .github/workflows/hourly-ci-cleaner.md with imports from .github/agents/ci-cleaner.agent.md.

Key characteristics:

  • Schedule: 15 6,18 * * * (twice daily, 06:15 & 18:15 UTC)
  • Engine: copilot (agent: ci-cleaner)
  • Timeout: 45 minutes
  • Early-exit guard: check_ci_status job checks last CI run on main; agent only fires if ci_needs_fix == 'true'
  • Safe outputs: create-pull-request (expires: 2d, prefix: [ca] , protected-files: fallback-to-issue), missing-tool
  • Four core tasks: make fmt → make lint → make test-unit → make recompile

9. Summary Statistics

| Metric | Value |
|---|---|
| Total failure issues filed | 9 |
| Total successful PRs created | 12 |
| Estimated failure rate | ~43% |
| Most common failure | No Safe Outputs (44% of failures) |
| Most common fix | Golden file updates (58% of successes) |
| Token budget target | 68K–90K/run |
| Max-turns support | ❌ Not available (Copilot engine) |
| Hard timeout | 45 minutes |
| Scheduled frequency | 2× daily |
| Tools configured | github (default), bash (*), edit, safe-outputs |
| Missing tool reports | 0 observed |

Note: Direct gh aw audit per-run telemetry (token counts, turn counts, tool call frequencies) was unavailable in the current environment (requires gh auth login). This analysis is based on workflow source inspection, failure issues, and merged PRs. For exact token/turn data, run:

gh aw logs hourly-ci-cleaner -c 20 --start-date -14d --json
gh aw audit 23984950167

Generated by Copilot Token Usage Optimizer · ● 11.6M ·

  • expires on Apr 12, 2026, 12:44 AM UTC
