[aw-failures] Fix: GitHub App installation rate limit burst — 10 workflows failed in 36-minute window (2026-04-30 11:05–11:41 UTC) #29318

@github-actions

Problem Statement

On 2026-04-30 between 11:05 and 11:41 UTC, 10 agentic workflow runs failed due to GitHub App installation rate limit exhaustion. The failures were caused by a concentrated burst of scheduled workflows all starting around 11:00–11:10 UTC, collectively saturating the GitHub App API call quota within minutes.

This is the dominant failure cluster in the 6-hour window (10/13 failures traced to this cause).

Affected Workflows & Runs

| Workflow | Engine | Run ID | Failure Point |
| --- | --- | --- | --- |
| AI Moderator | N/A | §25163385305 | Lock issue |
| AI Moderator | N/A | §25163332906 | Lock issue |
| AI Moderator | N/A | §25162561867 | Lock issue |
| Instructions Janitor | Claude Code | §25162158074 | PR creation + fallback issue |
| Daily AstroStyleLite Markdown Spellcheck | Claude Code | §25162377257 | PR creation |
| Daily Community Attribution Updater | Copilot CLI | §25161944698 | PR creation + fallback issue |
| Developer Documentation Consolidator | Claude Code | §25162557165 | PR creation + fallback issue |
| Daily AW Cross-Repo Compile Check | Claude Code | §25162144631 | create_issue in safe_outputs (after 4 retries) |
| Daily Rendering Scripts Verifier | Claude Code | §25162697356 | Auto guard policy determination |
| Daily Fact About gh-aw | Codex | §25163325014 | Auto guard policy determination |

Root Cause

The GitHub App installation (used for all workflow API calls) has a rate limit that was exhausted during the burst. Representative error:

API rate limit exceeded for installation. 
Request ID: 1200:1EDC6B:765CDF5:1D4E1C81:69F34022
Timestamp: 2026-04-30 11:20:46 UTC

The rate limit hit multiple API operations:

  • POST /repos/{owner}/{repo}/issues (safe_outputs create_issue — 4 retries, all failed)
  • POST /repos/{owner}/{repo}/pulls (PR creation with fallback to issue — both failed)
  • PATCH /repos/{owner}/{repo}/issues/{number} (lock issue)
  • GitHub MCP guard policy determination (GET on installation context)
  • GET /repos/{owner}/{repo} (fetch default branch before push)
  • GraphQL signed push fallback

The burst window corresponds to the 11:00–11:10 UTC scheduled trigger time for approximately 15–20 workflows running concurrently.
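
To tell a rate-limit rejection apart from an ordinary permission error, a client can inspect the response status and headers. The sketch below assumes the documented GitHub REST behaviour (HTTP 403 or 429, `x-ratelimit-remaining: 0`, `x-ratelimit-reset` as a UTC epoch timestamp, and an optional `retry-after` header). The function name `rate_limit_wait_seconds` is hypothetical, and the headers of the failed requests in this incident are not shown in this issue.

```python
import time


def rate_limit_wait_seconds(status, headers, now=None):
    """Return how many seconds to wait if the response is a rate-limit
    rejection, or None if it is not one.

    `headers` is any mapping of header name to string value.
    `now` is an epoch timestamp; it defaults to the current time.
    """
    if status not in (403, 429):
        return None
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    # An explicit retry-after value takes precedence when present.
    if "retry-after" in h:
        return float(h["retry-after"])
    # Quota exhausted: wait until the advertised reset time.
    if h.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in h:
        now = time.time() if now is None else now
        return max(0.0, float(h["x-ratelimit-reset"]) - now)
    # A 403/429 without these headers is treated as some other failure.
    return None
```

Returning the wait time rather than sleeping keeps the decision with the caller, which can choose to retry, defer the output, or fail fast.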

Impact

  • 10 workflow failures in a 36-minute window; all work lost for those runs
  • Workflows that pushed code (Instructions Janitor, Markdown Spellcheck, etc.) could not deliver their PRs
  • The rate limit self-recovered after ~40 minutes; subsequent runs in the same hour succeeded
  • Daily AW Cross-Repo Compile Check agent completed its full analysis (33 turns, $1.75) but could not post results — work was discarded

Proposed Remediation

Option A (Recommended): Stagger scheduled workflow start times

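  • Spread the 11:00 UTC batch across a 20–30 minute window by giving each workflow its own minute offset (jitter) in its cron: expression
  • This reduces the burst from ~20 concurrent triggers to roughly one trigger per minute (15–20 workflows over 20–30 minutes)
  • Example: change 0 11 * * * schedules to distinct fixed or random minutes such as 3 11 * * *, 7 11 * * *, 12 11 * * *. Avoid */3 11 * * *, which would run the same workflow every 3 minutes throughout the 11:00 hour instead of shifting its start time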
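
One way to generate such offsets is sketched below. It assumes each workflow keeps a single daily run and only its minute changes. The function name `staggered_crons`, the 30-minute default window, and the hash-based ordering are illustrative choices, not an existing gh-aw feature.

```python
import hashlib


def staggered_crons(workflows, hour=11, window_minutes=30):
    """Map each workflow name to a cron expression with a distinct minute.

    Minutes are spread evenly across [0, window_minutes) within `hour`.
    """
    if not workflows:
        return {}
    if window_minutes > 60:
        raise ValueError("window must fit inside one hour")
    if len(workflows) > window_minutes:
        raise ValueError("more workflows than minutes in the window")
    # Order by a stable hash so the assignment is reproducible and does
    # not depend on the order in which the names were supplied.
    ordered = sorted(
        workflows, key=lambda name: hashlib.sha256(name.encode()).hexdigest()
    )
    step = window_minutes / len(ordered)  # >= 1, so minutes never collide
    return {
        name: f"{int(i * step)} {hour} * * *" for i, name in enumerate(ordered)
    }


if __name__ == "__main__":
    names = [
        "Instructions Janitor",
        "Daily Rendering Scripts Verifier",
        "Daily Fact About gh-aw",
    ]
    for name, cron in staggered_crons(names).items():
        print(f"{cron}  # {name}")
```

Deriving the minute from the workflow name keeps the schedule stable across regenerations, so adding one workflow does not require hand-editing the others, although existing offsets can shift when the list changes.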

Option B: Increase safe_outputs retry count + backoff for rate limit errors

  • Current retry: 4 attempts with ~1 minute gap — not sufficient for installation-level rate resets
  • Extend to 8 retries with exponential backoff (capped at 10 minutes per wait) specifically for rate-limit errors, which GitHub returns as HTTP 403 or 429
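
A minimal sketch of such a retry policy follows. The 30-second base delay is an assumption chosen so that the total wait exceeds the ~40-minute recovery observed in this incident. `RateLimitError` and `call_with_backoff` are hypothetical names, not the current safe_outputs implementation.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the API client raises on a
    403/429 rate-limit response."""


def backoff_delays(retries=8, base_seconds=30.0, cap_seconds=600.0):
    """Exponential delays (base, 2*base, 4*base, ...) capped at cap_seconds.

    With the defaults: 30, 60, 120, 240, 480, 600, 600, 600 seconds,
    which sum to 2730 s (about 45 minutes).
    """
    return [min(cap_seconds, base_seconds * 2 ** n) for n in range(retries)]


def call_with_backoff(operation, delays, sleep=time.sleep):
    """Run operation(); on RateLimitError wait and retry once per delay.

    Other exceptions propagate immediately. After the last delay one
    final attempt is made, and its error (if any) is not caught.
    """
    for delay in delays:
        try:
            return operation()
        except RateLimitError:
            sleep(delay)
    return operation()
```

Passing `sleep` as a parameter lets tests substitute a recorder for real waiting. In practice, adding random jitter to each delay would also help avoid all retrying workflows waking at the same moment.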

Option C: App-level investigation

  • Verify the GitHub App installation's rate limit tier is appropriate for the current number of concurrent workflows
  • Check if the App can be upgraded to a higher rate limit tier or if request batching is possible
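
As a starting point for that check, the installation's current quota can be read from the REST endpoint `GET /rate_limit`, whose response reports `limit`, `used`, `remaining`, and `reset` per resource. The sketch below only summarises an already-fetched payload. The helper name `core_headroom` is hypothetical, and the sample numbers are illustrative, not measurements from this incident.

```python
def core_headroom(payload):
    """Summarise the `core` REST quota from a GET /rate_limit response body.

    Returns the limit, the remaining calls, the fraction of the quota
    already used, and the epoch second at which the quota resets.
    """
    core = payload["resources"]["core"]
    limit = core["limit"]
    return {
        "limit": limit,
        "remaining": core["remaining"],
        "used_fraction": core["used"] / limit if limit else 0.0,
        "reset": core["reset"],
    }


if __name__ == "__main__":
    # Illustrative payload shape only; the values are made up.
    sample = {
        "resources": {
            "core": {"limit": 5000, "used": 4500, "remaining": 500, "reset": 0}
        }
    }
    print(core_headroom(sample))
```

Logging this summary at the start and end of each scheduled run would show how much of the hourly quota a single workflow consumes, which is the input needed to judge whether the tier is adequate.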

Success Criteria

  • Zero rate-limit failures in the next 6-hour monitoring window after schedule staggering is applied
  • OR: safe_outputs retry succeeds under load without workflow failure

References

Parent: #29232

Representative runs:

  • §25162158074 (Instructions Janitor — rate limit on PR + fallback)
  • §25162144631 (Daily AW Cross-Repo Compile Check — safe_outputs rate limit after 4 retries)
  • §25163385305 (AI Moderator — lock issue rate limit)

Generated by [aw] Failure Investigator (6h) · ● 464.2K ·

  • expires on May 7, 2026, 1:24 PM UTC
