Honor rate-limit Retry-After across Google/Anthropic/GitHub SDKs #2565

Merged
hiroshinishio merged 1 commit into main from wes
Apr 21, 2026

hiroshinishio commented Apr 21, 2026

Summary

When the Gemini free-tier quota trips (429 RESOURCE_EXHAUSTED), the API returns a "Please retry in N.NNNs" hint in the error message body. Previously chat_with_google just raised, the exception cascaded through chat_with_model → chat_with_agent → handle_webhook_event, and every Lambda in the incident window died with an error reported to Sentry (AGENT-3K5/3K6/3K7/3K8/36M/36Q, the gitautoai/website run on 2026-04-20 16:23 UTC).

This PR generalizes rate-limit handling across every SDK we talk to.

  • New utils/error/get_rate_limit_retry_after.py dispatches per error type:
    • requests.HTTPError → GitHub headers (X-RateLimit-Remaining=0 + X-RateLimit-Reset primary, Retry-After secondary) via parse_github_rate_limit_headers, or generic Retry-After via parse_retry_after_header
    • Anthropic APIStatusError (status_code=429) → retry-after header
    • Google ClientError (code=429) → "Please retry in N.NNNs" in message body via parse_google_retry_in_message
  • handle_exceptions calls it from both the requests.HTTPError and generic Exception branches. If a delay comes back AND the existing TRANSIENT_MAX_ATTEMPTS budget allows, it sleeps the honored delay and continues the retry loop. No upper cap — should_bail at the handler layer already enforces Lambda timeout.
  • The old handle_github_rate_limit and the api_type=="github" special case in handle_http_error are gone. Both are now just input shapes for the same generic extractor.
  • Test fixture (fixtures/real_google_429.txt) is the verbatim CloudWatch log line from the gitautoai/website incident (Please retry in 59.739387544s), preserved with full details[] payload.
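For readers who want a feel for what the three parsers named above might look like, here is a minimal sketch. The function names mirror the ones in this PR, but the bodies, signatures, and the `now` parameter are assumptions made for illustration, not the merged code. It assumes Google's hint always takes the "Please retry in N.NNNs" form quoted above, that Retry-After is either delta-seconds or an HTTP-date, and that X-RateLimit-Reset is a UTC epoch timestamp in seconds.

```python
import re
import time
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Assumption: the hint is always expressed in seconds, e.g. "Please retry in 59.739387544s".
_GOOGLE_RETRY_RE = re.compile(r"Please retry in (\d+(?:\.\d+)?)s")


def parse_google_retry_in_message(message: str) -> Optional[float]:
    """Return the delay in seconds from a 'Please retry in N.NNNs' hint, or None."""
    match = _GOOGLE_RETRY_RE.search(message)
    return float(match.group(1)) if match else None


def parse_retry_after_header(value: Optional[str], now: Optional[float] = None) -> Optional[float]:
    """Parse a Retry-After value given as delta-seconds or as an HTTP-date."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Neither an integer nor a parseable date: no usable hint.
        return None
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)


def parse_github_rate_limit_headers(
    headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Primary limit: Remaining=0 plus Reset (epoch seconds). Secondary limit: Retry-After."""
    now = time.time() if now is None else now
    reset = headers.get("X-RateLimit-Reset")
    if headers.get("X-RateLimit-Remaining") == "0" and reset is not None:
        try:
            return max(0.0, float(reset) - now)
        except ValueError:
            pass  # Malformed reset value: fall through to Retry-After.
    return parse_retry_after_header(headers.get("Retry-After"), now)
```

The sketch looks headers up with exact casing on a plain mapping; `requests` exposes response headers as a case-insensitive dict, so real code would not need to care about casing. Each parser returns `None` when no hint is found, which lets a caller treat "no delay" as "not a rate-limit error worth waiting on".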

Social Media Post (GitAuto)

Rate-limit 429s from Google/Anthropic/GitHub now retry cleanly instead of crashing the Lambda

  • New helper pulls the retry-after delay from any SDK's error shape (Google's message body, Anthropic's retry-after header, GitHub's X-RateLimit-Reset)
  • handle_exceptions sleeps the honored delay and retries under the existing transient-retry budget
  • Old GitHub-specific rate-limit path folded into the generic handler, one code path for every SDK

Social Media Post (Wes)

The Gemini free-tier 429 takes a different shape than GitHub's 429, which takes a different shape than Anthropic's 429, and I had a per-SDK handler for exactly zero of them. Wrote one extractor that dispatches on error type, hooked it into the existing retry loop, honored whatever delay the server suggested. Three error shapes, one path, no more 429s in Sentry.

get_rate_limit_retry_after extracts a retry-after delay from any SDK's rate-limit error — Gemini's "Please retry in N.NNNs" message body, GitHub's X-RateLimit-Reset/Retry-After headers, Anthropic's retry-after header. handle_exceptions calls it in both the HTTPError and generic Exception paths, sleeps the honored delay, and retries under the existing TRANSIENT_MAX_ATTEMPTS budget. Per-SDK parsers live in their own files. The old handle_github_rate_limit / handle_http_error github-special-case is gone — replaced by the generic path. Test fixture is the verbatim CloudWatch log from the gitautoai/website incident on 2026-04-20 16:23 UTC.
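The retry behavior described above can be sketched as a small decorator. This is an illustration under stated assumptions, not the actual handle_exceptions: `retry_on_rate_limit` and its `get_delay` and `sleep` parameters are hypothetical names, the value of TRANSIENT_MAX_ATTEMPTS is a guess (the PR does not state it), and the should_bail timeout check is not modeled.

```python
import time
from functools import wraps
from typing import Any, Callable, Optional

TRANSIENT_MAX_ATTEMPTS = 3  # Assumed value; the PR does not state the real one.


def retry_on_rate_limit(
    get_delay: Callable[[Exception], Optional[float]],
    sleep: Callable[[float], Any] = time.sleep,
    max_attempts: int = TRANSIENT_MAX_ATTEMPTS,
):
    """Retry the wrapped call when get_delay(err) returns a server-suggested delay."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 1
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception as err:
                    delay = get_delay(err)
                    # Re-raise when there is no rate-limit hint or the budget is spent.
                    if delay is None or attempt >= max_attempts:
                        raise
                    sleep(delay)  # Honor the delay as-is; no upper cap here.
                    attempt += 1

        return wrapper

    return decorator
```

In this sketch `get_delay` stands in for get_rate_limit_retry_after: it returns a number of seconds for a recognized 429 and `None` for anything else, so unrelated exceptions propagate immediately. Injecting `sleep` keeps the loop testable without real waiting.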
hiroshinishio self-assigned this Apr 21, 2026
hiroshinishio merged commit 40b7fba into main Apr 21, 2026
1 check passed
hiroshinishio deleted the wes branch April 21, 2026 01:30