
fix: ignore GitHub API rate limit errors in update and health commands #23032

Merged
pelikhan merged 1 commit into main from copilot/ignore-rate-limiting-failures
Mar 26, 2026

Contributor

Copilot AI commented Mar 26, 2026

CI jobs for update and health-smoke-copilot were failing with HTTP 403 "API rate limit exceeded" — a transient infrastructure condition that should not mark CI as broken.

Root causes

  • update job: UpdateWorkflows propagated a fatal error when all per-workflow failures were rate-limit related ("no workflows were successfully updated"), even though there was nothing actionable to fix.
  • health-smoke-copilot job: RunHealth surfaced the fetchWorkflowRuns rate-limit error directly, causing gh aw health --json to exit non-zero and produce no JSON output (breaking downstream jq parsing).

Changes

  • pkg/gitutil/gitutil.go — Added IsRateLimitError(errMsg string) bool detecting "API rate limit exceeded", "rate limit exceeded", and "secondary rate limit" patterns (case-insensitive).

  • pkg/cli/update_workflows.go — Added allFailuresAreRateLimited helper. UpdateWorkflows now returns nil with a warning instead of a fatal error when every workflow failure is rate-limit related.

  • pkg/cli/health_command.go: RunHealth detects rate-limit errors from fetchWorkflowRuns, prints a warning, and in --json mode emits a zero-run WorkflowHealth struct so callers can still parse the output cleanly.

  • pkg/gitutil/gitutil_test.go — Table-driven tests for IsRateLimitError and IsAuthError.
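
The detection described for IsRateLimitError can be sketched as a case-insensitive substring match. This is a minimal illustration built only from the three patterns listed above; the pattern slice name and the sample messages are assumptions, and the actual helper in pkg/gitutil may be written differently.

```go
package main

import (
	"fmt"
	"strings"
)

// rateLimitPatterns holds the lowercase substrings named in the PR description.
// (The variable name is hypothetical; only the patterns come from the PR.)
var rateLimitPatterns = []string{
	"api rate limit exceeded",
	"rate limit exceeded",
	"secondary rate limit",
}

// IsRateLimitError reports whether errMsg looks like a GitHub rate-limit error.
// Sketch only: the real helper in pkg/gitutil may differ.
func IsRateLimitError(errMsg string) bool {
	lower := strings.ToLower(errMsg)
	for _, p := range rateLimitPatterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

func main() {
	// Sample messages are illustrative, not captured from real API responses.
	fmt.Println(IsRateLimitError("HTTP 403: API rate limit exceeded for installation"))
	fmt.Println(IsRateLimitError("You have exceeded a secondary rate limit"))
	fmt.Println(IsRateLimitError("HTTP 404: Not Found"))
}
```

Note that "rate limit exceeded" already covers "API rate limit exceeded", so in a substring-based version the first pattern is redundant and is listed here only to mirror the description.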

@pelikhan
Collaborator

@copilot if you hit github rate limit in the rate limit test, assume it is canceled.

Contributor Author

Copilot AI commented Mar 26, 2026

> @copilot if you hit github rate limit in the rate limit test, assume it is canceled.

Already handled by commit 04eaf16. Both affected jobs now treat rate limit errors as cancelled (non-fatal):

  • update job: allFailuresAreRateLimited — if every workflow update failure was caused by rate limiting, UpdateWorkflows returns nil with a warning instead of propagating the error.
  • health-smoke-copilot job: RunHealth detects the rate limit from fetchWorkflowRuns, prints a warning, and exits 0. In --json mode it emits a valid zero-run WorkflowHealth JSON so the downstream jq structure validations still pass.
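
To illustrate why a zero-run payload keeps downstream jq checks working, here is a small sketch that marshals a zero-valued record. The workflowHealth type, its field names, and the workflow name are hypothetical stand-ins and are not copied from the WorkflowHealth struct in the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workflowHealth is a hypothetical stand-in for the WorkflowHealth struct.
// Field names are illustrative only.
type workflowHealth struct {
	WorkflowName string  `json:"workflow_name"`
	TotalRuns    int     `json:"total_runs"`
	SuccessRate  float64 `json:"success_rate"`
}

// zeroRunJSON renders a health record with no runs. Every key is still
// present, so a consumer that validates the structure keeps working.
func zeroRunJSON(name string) (string, error) {
	b, err := json.Marshal(workflowHealth{WorkflowName: name})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := zeroRunJSON("smoke-copilot")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the sketch is that numeric zero values are still emitted as keys, so structural validation sees the same shape as a normal run.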

Copilot AI requested a review from pelikhan March 26, 2026 04:10
@pelikhan pelikhan marked this pull request as ready for review March 26, 2026 04:11
Copilot AI review requested due to automatic review settings March 26, 2026 04:11
@pelikhan pelikhan merged commit 524b7f2 into main Mar 26, 2026
130 of 138 checks passed
@pelikhan pelikhan deleted the copilot/ignore-rate-limiting-failures branch March 26, 2026 04:14
Contributor

Copilot AI left a comment

Pull request overview

Adjusts CLI behavior to treat GitHub API rate limiting as a transient condition so CI jobs don’t fail or produce unusable output when the API is temporarily throttling requests.

Changes:

  • Add a shared gitutil.IsRateLimitError helper to detect common GitHub rate-limit error strings.
  • Make the update command treat workflow update failures as non-fatal when all per-workflow failures are rate-limit related.
  • Update health to warn and exit cleanly on rate limit; in detailed --json mode, emit a zero-run JSON payload.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/gitutil/gitutil.go | Adds IsRateLimitError helper for consistent rate-limit detection. |
| pkg/gitutil/gitutil_test.go | Adds table-driven unit tests for IsRateLimitError and IsAuthError. |
| pkg/cli/update_workflows.go | Treats “all failed due to rate limiting” as non-fatal; adds helper to detect that condition. |
| pkg/cli/health_command.go | Handles rate-limit errors from fetchWorkflowRuns by warning and exiting cleanly; emits detailed JSON when applicable. |

Comment on lines +143 to +145
if config.JSONOutput && config.WorkflowName != "" {
	// Emit an empty-run JSON structure so callers can still parse the output.
	return displayDetailedHealth(nil, config)

Copilot AI Mar 26, 2026

In the rate-limit branch, RunHealth only emits JSON when config.WorkflowName != "" (detailed view). If --json is used without a workflow (summary mode), this path returns nil without writing any JSON to stdout, which breaks callers that rely on --json always producing parseable output. Consider emitting an empty HealthSummary via outputHealthJSON(...) when config.JSONOutput is true, regardless of whether a workflow name was provided (while still printing the warning to stderr).

Suggested change
- if config.JSONOutput && config.WorkflowName != "" {
- 	// Emit an empty-run JSON structure so callers can still parse the output.
- 	return displayDetailedHealth(nil, config)
+ if config.JSONOutput {
+ 	if config.WorkflowName != "" {
+ 		// Emit an empty-run JSON structure for the specific workflow so callers can still parse the output.
+ 		return displayDetailedHealth(nil, config)
+ 	}
+ 	// In summary mode, emit an empty HealthSummary so JSON output is always parseable.
+ 	if err := outputHealthJSON(&HealthSummary{}); err != nil {
+ 		return err
+ 	}

Comment on lines +139 to +146
if gitutil.IsRateLimitError(err.Error()) {
	// Rate limiting is a transient infrastructure condition, not a code error.
	// Warn and exit cleanly so CI jobs are not marked as failed.
	fmt.Fprintln(os.Stderr, console.FormatWarningMessage("Skipping health check: GitHub API rate limit exceeded"))
	if config.JSONOutput && config.WorkflowName != "" {
		// Emit an empty-run JSON structure so callers can still parse the output.
		return displayDetailedHealth(nil, config)
	}

Copilot AI Mar 26, 2026

The new rate-limit handling in RunHealth is important for CI stability, but there isn’t a unit test covering the new behavior (rate-limit error => non-fatal exit, and in --json mode still emitting valid JSON). Consider adding a test that forces fetchWorkflowRuns/listWorkflowRunsWithPagination to return a rate-limit error (e.g., via dependency injection through a package-level function var) and asserts stdout contains valid JSON.
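
The dependency-injection approach suggested here can be sketched as follows. This is a reduced, standalone model rather than the project's code: fetchRuns, isRateLimit, runHealth, and checkRateLimitedJSON are hypothetical names standing in for fetchWorkflowRuns, gitutil.IsRateLimitError, RunHealth, and a unit test body, and the JSON payload shape is invented for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// fetchRuns is a package-level function variable so a test can replace it.
// Hypothetical stand-in for fetchWorkflowRuns.
var fetchRuns = func(workflow string) ([]string, error) {
	return nil, errors.New("not implemented in this sketch")
}

// isRateLimit is a simplified stand-in for gitutil.IsRateLimitError.
func isRateLimit(msg string) bool {
	return strings.Contains(strings.ToLower(msg), "rate limit exceeded")
}

// runHealth models the rate-limit branch: a rate-limit error is treated as
// zero runs and does not fail the command; other errors propagate.
func runHealth(stdout io.Writer, workflow string, jsonOut bool) error {
	runs, err := fetchRuns(workflow)
	if err != nil {
		if !isRateLimit(err.Error()) {
			return err
		}
		runs = nil // rate-limited: report zero runs
	}
	if jsonOut {
		return json.NewEncoder(stdout).Encode(map[string]any{
			"workflow":   workflow,
			"total_runs": len(runs),
		})
	}
	_, err = fmt.Fprintf(stdout, "%s: %d runs\n", workflow, len(runs))
	return err
}

// checkRateLimitedJSON swaps in a failing fetcher, runs the command, and
// verifies that the call succeeds and stdout holds valid JSON.
func checkRateLimitedJSON() error {
	orig := fetchRuns
	defer func() { fetchRuns = orig }() // restore the real fetcher
	fetchRuns = func(string) ([]string, error) {
		return nil, errors.New("HTTP 403: API rate limit exceeded")
	}

	var buf bytes.Buffer
	if err := runHealth(&buf, "smoke-copilot", true); err != nil {
		return fmt.Errorf("expected nil error, got %w", err)
	}
	if !json.Valid(buf.Bytes()) {
		return fmt.Errorf("stdout is not valid JSON: %q", buf.String())
	}
	return nil
}

func main() {
	if err := checkRateLimitedJSON(); err != nil {
		panic(err)
	}
	fmt.Println("rate-limited run produced valid JSON")
}
```

In a real test file the body of checkRateLimitedJSON would live in a TestXxx function and report through *testing.T; passing the writer explicitly is one way to capture output without redirecting os.Stdout.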

Comment on lines 67 to +90
	if len(successfulUpdates) == 0 {
		// If all failures were due to GitHub API rate limiting, treat as non-fatal.
		// Rate limiting is a transient infrastructure condition, not a code error.
		if len(failedUpdates) > 0 && allFailuresAreRateLimited(failedUpdates) {
			fmt.Fprintln(os.Stderr, console.FormatWarningMessage("All workflow updates skipped due to GitHub API rate limiting"))
			return nil
		}
		return errors.New("no workflows were successfully updated")
	}

	return nil
}

// allFailuresAreRateLimited returns true if every failed workflow update was caused
// by a GitHub API rate limit error. Used to distinguish transient rate-limiting
// (non-fatal) from genuine update failures (fatal).
func allFailuresAreRateLimited(failures []updateFailure) bool {
	for _, f := range failures {
		if !gitutil.IsRateLimitError(f.Error) {
			return false
		}
	}
	return true
}

Copilot AI Mar 26, 2026

UpdateWorkflows now returns nil when all workflow update failures are rate-limit related. Since this changes the command’s exit behavior, it would be good to add a focused unit test for allFailuresAreRateLimited (and/or the len(successfulUpdates)==0 branch) to lock in the intended semantics for “all rate-limited” vs “mixed failures”.
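
A focused table-driven check along these lines might look like the sketch below. The updateFailure type is reduced to the one field the diff reads (Error) plus a hypothetical Name field, and isRateLimitError is a simplified stand-in for gitutil.IsRateLimitError, so this models the semantics rather than reproducing the package's test.

```go
package main

import (
	"fmt"
	"strings"
)

// updateFailure is a reduced stand-in. The PR's code reads a string field
// named Error; the Name field is hypothetical.
type updateFailure struct {
	Name  string
	Error string
}

// isRateLimitError is a simplified stand-in for gitutil.IsRateLimitError.
func isRateLimitError(msg string) bool {
	lower := strings.ToLower(msg)
	return strings.Contains(lower, "rate limit exceeded") ||
		strings.Contains(lower, "secondary rate limit")
}

// allFailuresAreRateLimited follows the helper shown in the diff.
func allFailuresAreRateLimited(failures []updateFailure) bool {
	for _, f := range failures {
		if !isRateLimitError(f.Error) {
			return false
		}
	}
	return true
}

func main() {
	rl := updateFailure{Name: "a", Error: "HTTP 403: API rate limit exceeded"}
	other := updateFailure{Name: "b", Error: "HTTP 404: Not Found"}

	cases := []struct {
		name string
		in   []updateFailure
		want bool
	}{
		{"all rate-limited", []updateFailure{rl, rl}, true},
		{"mixed failures", []updateFailure{rl, other}, false},
		{"none rate-limited", []updateFailure{other}, false},
		// Vacuously true: this is why the caller also checks len(failedUpdates) > 0.
		{"empty", nil, true},
	}
	for _, c := range cases {
		got := allFailuresAreRateLimited(c.in)
		fmt.Printf("%-18s got=%v want=%v\n", c.name, got, c.want)
	}
}
```

The empty case is worth pinning down explicitly, since the helper alone returns true for no failures and relies on the caller's length guard to avoid masking a run in which nothing was attempted.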
