fix(preprod): correctly attribute rate limiting errors #106580

Merged
trevor-e merged 4 commits into master from telkins/status-check-rate-limit
Jan 20, 2026

@trevor-e
Member

This ultimately doesn't change any behavior, since we were already retrying on the old error type, but it should let us better attribute how often rate limiting is happening. In our backend logs you can see these come back as 403s, which triggers the ApiForbiddenError logic in our base integrations code:

File "/usr/src/sentry/src/sentry/shared_integrations/client/base.py", line 280, in _request
    raise ApiError.from_response(error_resp, url=full_url) from e
sentry.shared_integrations.exceptions.ApiForbiddenError: {"message":"API rate limit exceeded for installation ID XXX. If you reach out to GitHub Support for help, please include the request ID XXXX and timestamp XXXX UTC. For more on scraping GitHub and how it may affect your rights, please review our Terms of Service (https://docs.github.com/en/site-policy/github-terms/github-terms-of-service)","documentation_url":"https://docs.github.com/rest/overview/rate-limits-for-the-rest-api","status":"403"}"
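
For illustration, here is a minimal sketch of how a 403 like the one above could be re-attributed as a rate limit by inspecting the error message. The exception classes are local stand-ins that only borrow the names from this PR, and both `reclassify_forbidden` and the `RATE_LIMIT_MARKER` substring are assumptions; the excerpt here does not show the exact check the real code performs.

```python
class ApiError(Exception):
    """Stand-in base class (not the real sentry.shared_integrations class)."""


class ApiForbiddenError(ApiError):
    """Stand-in for an error raised on an HTTP 403 response."""


class ApiRateLimitedError(ApiError):
    """Stand-in for an error that represents rate limiting."""


# Assumed marker: taken from the message in the traceback above, lowercased.
RATE_LIMIT_MARKER = "rate limit exceeded"


def reclassify_forbidden(exc: ApiForbiddenError) -> ApiError:
    """Return a rate-limit error if the 403 body looks like a rate limit.

    Hypothetical helper: any other 403 is returned unchanged.
    """
    error_message = str(exc).lower()
    if RATE_LIMIT_MARKER in error_message:
        return ApiRateLimitedError(str(exc))
    return exc


if __name__ == "__main__":
    forbidden = ApiForbiddenError(
        '{"message":"API rate limit exceeded for installation ID XXX.","status":"403"}'
    )
    print(type(reclassify_forbidden(forbidden)).__name__)
```

Matching on message text is brittle, so a sketch like this would break if GitHub rewords the response; it is only meant to show why the 403 needs a second look before it is counted as a permissions failure.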

@trevor-e trevor-e requested a review from a team as a code owner January 20, 2026 18:25
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 20, 2026
except ApiForbiddenError as e:
    lifecycle.record_failure(e)
    error_message = str(e).lower()
    # Github uses 403 codes for some rate limiting errors which are captured as ApiForbiddenError
Member

Well that's dumb lol

Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

  namespace=preprod_tasks,
  processing_deadline_duration=30,
- retry=Retry(times=3, delay=60, ignore=(IntegrationConfigurationError,)),
+ retry=Retry(times=3, delay=60, on=(ApiRateLimitedError,)),
Contributor

Bug: The change from ignore to on in the Retry configuration prevents retries on transient network errors like timeouts, allowing retries only for ApiRateLimitedError.
Severity: HIGH

Suggested Fix

Add common transient API errors to the on tuple in the Retry configuration. For example: retry=Retry(times=3, delay=60, on=(ApiRateLimitedError, ApiTimeoutError, ApiHostError, ApiConnectionResetError, ApiRetryError)). This will restore the previous behavior of retrying on temporary network failures.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/preprod/vcs/status_checks/size/tasks.py#L72

Potential issue: The task's retry configuration was changed from a denylist approach
using `ignore=(IntegrationConfigurationError,)` to an allowlist approach with
`on=(ApiRateLimitedError,)`. This change means that transient network errors such as
`ApiTimeoutError`, `ApiHostError`, and `ApiConnectionResetError` will no longer trigger
a retry. Previously, these errors would be retried. With the new code, any temporary
network issue will cause the task to fail immediately and permanently, which could lead
to missed status checks in production.
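
The allowlist versus denylist distinction described above can be sketched as follows. This is a toy model of the semantics as the Bugbot comment describes them, not the real `Retry` class: `RetryPolicy` and `should_retry` are hypothetical names, the exception classes are local stand-ins that borrow names from this PR, and the actual implementation may behave differently.

```python
from dataclasses import dataclass


class IntegrationConfigurationError(Exception):
    """Stand-in for a non-retryable configuration error."""


class ApiRateLimitedError(Exception):
    """Stand-in for a rate-limit error."""


class ApiTimeoutError(Exception):
    """Stand-in for a transient network timeout."""


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical model of the `on` / `ignore` options."""

    on: tuple = ()      # allowlist: when non-empty, retry only these types
    ignore: tuple = ()  # denylist: never retry these types

    def should_retry(self, exc: Exception) -> bool:
        if self.ignore and isinstance(exc, self.ignore):
            return False
        if self.on:
            return isinstance(exc, self.on)
        # Assumption: with no allowlist, everything not ignored is retried.
        return True


old_policy = RetryPolicy(ignore=(IntegrationConfigurationError,))
new_policy = RetryPolicy(on=(ApiRateLimitedError,))

if __name__ == "__main__":
    timeout = ApiTimeoutError("timed out")
    print("old policy retries timeout:", old_policy.should_retry(timeout))
    print("new policy retries timeout:", new_policy.should_retry(timeout))
```

Under this model the old policy retries a timeout and the new one does not, which is the regression the comment is flagging. Whether that holds for the real task depends on how the actual `Retry` class treats an empty `on`.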

@trevor-e trevor-e merged commit 3703666 into master Jan 20, 2026
66 checks passed
@trevor-e trevor-e deleted the telkins/status-check-rate-limit branch January 20, 2026 20:38
@github-actions github-actions bot locked and limited conversation to collaborators Feb 5, 2026
Labels

Scope: Backend Automatically applied to PRs that change backend components

2 participants