ref(scm): Update GitHub Provider with Access to Raw Response Instance #111192
Conversation
This reverts commit ec30a79.
tests/sentry/scm/integration/test_github_provider_integration.py
```python
if not isinstance(response, dict) or ("data" not in response and "errors" not in response):
    raise SCMProviderException("GraphQL response is not in expected format")

errors = response.get("errors", [])
if errors and not response.get("data"):
    err_message = "\n".join(e.get("message", "") for e in errors)
    raise SCMProviderException(err_message)

return response.get("data", {})
```

```python
def catch_provider_exception(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except ApiError as e:
            raise SCMProviderException(str(e)) from e

    return wrapper
```
Bug: The graphql() method always raises an exception because it checks isinstance(response, dict) on a requests.Response object without first parsing it as JSON.
Severity: CRITICAL
Suggested Fix
The response object should be parsed into a dictionary by calling response.json() before it is used. The type and content checks should be performed on this new dictionary variable.
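A minimal sketch of the suggested fix, following the names in the snippet above (the surrounding client internals are assumed; any object with a `.json()` method stands in for `requests.Response`):

```python
class SCMProviderException(Exception):
    """Raised when the SCM provider returns an unexpected response."""


def parse_graphql_response(response) -> dict:
    """Parse a raw HTTP response into GraphQL data before any shape checks."""
    # Decode JSON first; the raw response object itself is never a dict,
    # so checking isinstance(response, dict) would always fail.
    payload = response.json()
    if not isinstance(payload, dict) or ("data" not in payload and "errors" not in payload):
        raise SCMProviderException("GraphQL response is not in expected format")
    errors = payload.get("errors", [])
    if errors and not payload.get("data"):
        raise SCMProviderException("\n".join(e.get("message", "") for e in errors))
    return payload.get("data", {})
```

The key change is that every type and content check operates on the parsed dictionary, not the transport-level response object.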
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/scm/private/providers/github.py#L231-L239
Potential issue: The `graphql()` method in `GitHubProviderApiClient` receives a
`requests.Response` object from `self.post()`, but then immediately checks if the
response is a dictionary with `isinstance(response, dict)`. Since `self.post()` is
configured to return a raw `requests.Response` object, this check will always fail. As a
result, the method will unconditionally raise an `SCMProviderException`, making it and
any feature that relies on it, such as `minimize_comment()`, non-functional at runtime.
Backend Test Failures
### Summary
- Introduces a dynamic, per-org rate limiter (`DynamicRateLimiter` + `RedisRateLimitProvider`) backed by Redis that reads GitHub's `x-ratelimit-limit`, `x-ratelimit-used`, and `x-ratelimit-reset` response headers to eagerly throttle requests before hitting provider limits.
- Adds a referrer allocation system where specific referrers (e.g. `emerge`) get a dedicated percentage of the rate-limit quota, with remaining capacity shared across all other callers.
- Removes dead code: the old `ratelimits.backend`-based `is_rate_limited`/`is_rate_limited_with_allocation_policy` helpers, the unused `REACTION_MAP`, the `catch_provider_exception` decorator, and stale `encode_ratelimit_key` utilities.

### How it works
1. `GitHubProviderApiClient.request()` makes the HTTP call and, on every response containing rate-limit headers, calls `DynamicRateLimiter.update_rate_limit_meta()` to sync Sentry's Redis state with GitHub's reported capacity/usage.
2. Before each request, `GitHubProviderApiClient.is_rate_limited(referrer)` checks the referrer's dedicated allocation first, then falls back to the shared pool. Both checks call `DynamicRateLimiter.is_rate_limited()`, which atomically increments usage counters in Redis via a pipeline.
3. The rate limiter fails open: if no limit has been cached yet (first request for an org), the request proceeds and the limit is populated from the response.

### Test plan
- Unit tests for `DynamicRateLimiter` covering: allocated/shared quota exhaustion, fail-open on missing limits, capacity caching, `update_rate_limit_meta` with matching/mismatched windows, and the shared-usage floor at zero.
- Integration tests for `RedisRateLimitProvider` verifying Redis pipeline behavior: `get_and_set_rate_limit` (INCR + TTL), `get_accounted_usage` (multi-GET sum), `set_key_values` (SET with/without expiration).
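The header-sync step in point 1 can be sketched as follows. This is illustrative only: the store class, key scheme, and `update_rate_limit_meta` signature are assumptions based on the summary, with an in-memory dict standing in for the Redis-backed `RedisRateLimitProvider`:

```python
class InMemoryRateLimitStore:
    """Stand-in for the Redis-backed rate-limit provider (illustrative only)."""

    def __init__(self) -> None:
        self.values: dict[str, int] = {}

    def set_key_values(self, values: dict[str, int]) -> None:
        self.values.update(values)


def update_rate_limit_meta(store: InMemoryRateLimitStore, org_id: int, headers: dict[str, str]) -> None:
    """Sync cached capacity/usage from GitHub's rate-limit response headers."""
    if "x-ratelimit-limit" not in headers:
        # Response carried no rate-limit info; keep the cached state untouched.
        return
    store.set_key_values(
        {
            f"github:{org_id}:limit": int(headers["x-ratelimit-limit"]),
            f"github:{org_id}:used": int(headers["x-ratelimit-used"]),
            f"github:{org_id}:reset": int(headers["x-ratelimit-reset"]),
        }
    )
```

Because GitHub reports its own view of the window on every response, the cached state self-corrects even if Sentry's local counters drift.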
```python
def is_rate_limited(self, referrer: Referrer) -> bool:
    """Return true if access to the resource has been blocked."""
    # If the referrer has allocated quota and that quota has not been exhausted we eagerly
    # exit by returning false. Otherwise we consume from the shared quota pool.
    if (
        referrer in self.rate_limiter.referrer_allocation
        and not self.rate_limiter.is_rate_limited(referrer)
    ):
        return False
    else:
        return self.rate_limiter.is_rate_limited("shared")
```
Bug: The graphql method incorrectly checks the type of the response object before parsing it as JSON, causing it to always raise an exception and fail.
Severity: CRITICAL
Suggested Fix
Move the response.json() call to before the type check. The check should be performed on the parsed JSON data, not the requests.Response object. For example: response_data = response.json() followed by if not isinstance(response_data, dict) or ....
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/sentry/scm/private/providers/github.py#L171-L181
Potential issue: In the `graphql` method, the `self.post()` call returns a
`requests.Response` object, not a dictionary. The subsequent check `isinstance(response,
dict)` will always evaluate to `False`, causing the condition `if not
isinstance(response, dict) or ...` to always be true. This unconditionally raises an
`SCMProviderException` with the message "GraphQL response is not in expected format". As
a result, the logic to parse the JSON response and handle GraphQL errors is never
reached, and any call to this method, such as from `minimize_comment`, will fail.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Refactors the GitHub SCM provider layer to make HTTP requests directly via `_request` and work with raw `requests.Response` objects instead of delegating to high-level methods on `GitHubApiClient`.

**Why:** The provider needs access to response headers (ETag, Last-Modified) to support conditional requests and pagination. The existing `GitHubApiClient` methods return parsed JSON dicts, discarding all response metadata. Rather than modifying every client method to optionally return headers, this moves the provider to use raw responses directly.

**What changed:**
- `GitHubProviderApiClient`, a thin wrapper around `GitHubApiClient._request` that returns raw `requests.Response` objects. It handles pagination params, conditional request headers (`If-None-Match`, `If-Modified-Since`), and error translation (`ApiError` → `SCMProviderException`) in one place.
- `GitHubProvider` methods now construct their own API paths and call through `GitHubProviderApiClient` instead of delegating to named methods on `GitHubApiClient` (e.g. `get_branch`, `create_git_ref`, `list_pull_requests`, etc.).
- `map_action` and new `map_paginated_action` helpers extract `ResponseMeta` (etag, last_modified) from response headers and compute `next_cursor` for pagination.
- Methods on `GitHubBaseClient` that were only used by the SCM provider (GraphQL queries, git ref/tree/blob CRUD, PR management, comment deletion, etc.) are removed. These are now inlined in the provider.
- The `MINIMIZE_COMMENT_MUTATION` GraphQL query moves to the provider since it's the only consumer.
- A `force_raise_for_status` parameter on `BaseApiClient._request` so raw responses still get status checks.
- Tests now exercise the `GitHubProviderApiClient` request boundary instead of asserting on `GitHubApiClient` method delegation. Tests now verify the HTTP method, path, and payload sent to the API rather than which client method was called.

No behavior changes for callers of the SCM provider interface.