ref(scm): Update GitHub Provider with Access to Raw Response Instance #111192

Merged
cmanallen merged 62 commits into master from cmanallen/scm-rate-limits on Mar 26, 2026
@cmanallen (Member) commented Mar 20, 2026

Refactors the GitHub SCM provider layer to make HTTP requests directly via _request and work with raw requests.Response objects instead of delegating to high-level methods on GitHubApiClient.

Why: The provider needs access to response headers (ETag, Last-Modified) to support conditional requests and pagination. The existing GitHubApiClient methods return parsed JSON dicts, discarding all response metadata. Rather than modifying every client method to optionally return headers, this moves the provider to use raw responses directly.

What changed:

  • Introduced GitHubProviderApiClient, a thin wrapper around GitHubApiClient._request that returns raw requests.Response objects. It handles pagination params, conditional request headers (If-None-Match, If-Modified-Since), and error translation (ApiError → SCMProviderException) in one place.
  • All GitHubProvider methods now construct their own API paths and call through GitHubProviderApiClient instead of delegating to named methods on GitHubApiClient (e.g. get_branch, create_git_ref, list_pull_requests, etc.).
  • map_action and new map_paginated_action helpers extract ResponseMeta (etag, last_modified) from response headers and compute next_cursor for pagination.
  • Removed ~30 methods from GitHubBaseClient that were only used by the SCM provider (GraphQL queries, git ref/tree/blob CRUD, PR management, comment deletion, etc.). These are now inlined in the provider.
  • Moved the MINIMIZE_COMMENT_MUTATION GraphQL query to the provider since it's the only consumer.
  • Added force_raise_for_status parameter to BaseApiClient._request so raw responses still get status checks.
  • Rewrote unit tests to mock at the GitHubProviderApiClient request boundary instead of asserting on GitHubApiClient method delegation. Tests now verify the HTTP method, path, and payload sent to the API rather than which client method was called.

No behavior changes for callers of the SCM provider interface.
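
To illustrate the header handling described above, here is a minimal sketch of how conditional request headers could be built and how ETag and Last-Modified could be read back from a response. The function names `conditional_headers`, `extract_meta`, and `next_cursor` are hypothetical, and the page-based cursor rule is an assumption; only the `ResponseMeta` fields (`etag`, `last_modified`) and the header names come from the description.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResponseMeta:
    # Field names follow the PR description; the class body is a sketch.
    etag: Optional[str]
    last_modified: Optional[str]


def conditional_headers(
    etag: Optional[str] = None, last_modified: Optional[str] = None
) -> dict:
    """Build If-None-Match / If-Modified-Since headers from cached metadata."""
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers


def extract_meta(headers: Mapping[str, str]) -> ResponseMeta:
    """Read ETag and Last-Modified from response headers, ignoring case."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return ResponseMeta(
        etag=lowered.get("etag"),
        last_modified=lowered.get("last-modified"),
    )


def next_cursor(page: int, per_page: int, item_count: int) -> Optional[str]:
    """Assumed rule: a full page suggests that another page may exist."""
    return str(page + 1) if item_count >= per_page else None
```

A caller would pass the stored `etag` back through `conditional_headers` on the next request; GitHub answers 304 Not Modified when the resource is unchanged.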

@github-actions bot added the Scope: Backend label Mar 20, 2026
Comment on lines +231 to +239
-def catch_provider_exception(fn):
-    @functools.wraps(fn)
-    def wrapper(*args, **kwargs):
-        try:
-            return fn(*args, **kwargs)
-        except ApiError as e:
-            raise SCMProviderException(str(e)) from e
-
-    return wrapper
+        if not isinstance(response, dict) or ("data" not in response and "errors" not in response):
+            raise SCMProviderException("GraphQL response is not in expected format")
+
+        errors = response.get("errors", [])
+        if errors and not response.get("data"):
+            err_message = "\n".join(e.get("message", "") for e in errors)
+            raise SCMProviderException(err_message)
+
+        return response.get("data", {})

Bug: The graphql() method always raises an exception because it checks isinstance(response, dict) on a requests.Response object without first parsing it as JSON.
Severity: CRITICAL

Suggested Fix

The response object should be parsed into a dictionary by calling response.json() before it is used. The type and content checks should be performed on this new dictionary variable.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/scm/private/providers/github.py#L231-L239

Potential issue: The `graphql()` method in `GitHubProviderApiClient` receives a
`requests.Response` object from `self.post()`, but then immediately checks if the
response is a dictionary with `isinstance(response, dict)`. Since `self.post()` is
configured to return a raw `requests.Response` object, this check will always fail. As a
result, the method will unconditionally raise an `SCMProviderException`, making it and
any feature that relies on it, such as `minimize_comment()`, non-functional at runtime.
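
If the report is accurate, the fix it suggests could look like the sketch below: parse the body with `response.json()` first, then run the existing checks on the parsed dictionary. `parse_graphql_response` and `FakeResponse` are hypothetical names used to keep the example self-contained, and `SCMProviderException` is a local stand-in for the real class.

```python
class SCMProviderException(Exception):
    """Local stand-in for the exception named in the PR."""


def parse_graphql_response(response):
    """Validate a GraphQL reply after decoding it from the raw response."""
    payload = response.json()  # decode first; the checks need a dict
    if not isinstance(payload, dict) or (
        "data" not in payload and "errors" not in payload
    ):
        raise SCMProviderException("GraphQL response is not in expected format")

    errors = payload.get("errors", [])
    if errors and not payload.get("data"):
        message = "\n".join(e.get("message", "") for e in errors)
        raise SCMProviderException(message)

    return payload.get("data", {})


class FakeResponse:
    """Minimal object exposing .json(), standing in for requests.Response."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


data = parse_graphql_response(FakeResponse({"data": {"ok": True}}))
```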

@github-actions
Contributor

Backend Test Failures

Failures on 103d9ac in this run:

tests/sentry/scm/integration/test_github_provider_integration.py::TestGitHubProviderIntegration::test_api_403_error_raises_scm_provider_exception
tests/sentry/scm/integration/test_github_provider_integration.py:445: in test_api_403_error_raises_scm_provider_exception
    self.provider.get_pull_request("1")
src/sentry/scm/private/providers/github.py:294: in get_pull_request
    return map_action(response, map_pull_request)
src/sentry/scm/private/providers/github.py:1039: in map_action
    "data": fn(raw),
src/sentry/scm/private/providers/github.py:1022: in map_pull_request
    id=str(raw["id"]),
E   KeyError: 'id'
tests/sentry/scm/integration/test_github_provider_integration.py::TestGitHubProviderIntegration::test_api_422_error_raises_scm_provider_exception
tests/sentry/scm/integration/test_github_provider_integration.py:458: in test_api_422_error_raises_scm_provider_exception
    self.provider.create_pull_request(
src/sentry/scm/private/providers/github.py:671: in create_pull_request
    return map_action(response, map_pull_request)
src/sentry/scm/private/providers/github.py:1039: in map_action
    "data": fn(raw),
src/sentry/scm/private/providers/github.py:1022: in map_pull_request
    id=str(raw["id"]),
E   KeyError: 'id'
tests/sentry/scm/integration/test_github_provider_integration.py::TestGitHubProviderIntegration::test_api_500_error_raises_scm_provider_exception
tests/sentry/scm/integration/test_github_provider_integration.py:432: in test_api_500_error_raises_scm_provider_exception
    self.provider.get_issue_comments("42")
src/sentry/scm/private/providers/github.py:273: in get_issue_comments
    return map_paginated_action(pagination, response, lambda r: [map_comment(c) for c in r])
src/sentry/scm/private/providers/github.py:1057: in map_paginated_action
    "data": fn(raw),
src/sentry/scm/private/providers/github.py:273: in <lambda>
    return map_paginated_action(pagination, response, lambda r: [map_comment(c) for c in r])
src/sentry/scm/private/providers/github.py:870: in map_comment
    id=str(raw["id"]),
E   TypeError: string indices must be integers, not 'str'
tests/sentry/scm/integration/test_github_provider_integration.py::TestGitHubProviderIntegration::test_api_error_raises_scm_provider_exception
tests/sentry/scm/integration/test_github_provider_integration.py:419: in test_api_error_raises_scm_provider_exception
    self.provider.get_issue_comments("42")
src/sentry/scm/private/providers/github.py:273: in get_issue_comments
    return map_paginated_action(pagination, response, lambda r: [map_comment(c) for c in r])
src/sentry/scm/private/providers/github.py:1057: in map_paginated_action
    "data": fn(raw),
src/sentry/scm/private/providers/github.py:273: in <lambda>
    return map_paginated_action(pagination, response, lambda r: [map_comment(c) for c in r])
src/sentry/scm/private/providers/github.py:870: in map_comment
    id=str(raw["id"]),
E   TypeError: string indices must be integers, not 'str'
tests/sentry/scm/integration/test_github_provider_integration.py::TestGitHubProviderIntegration::test_get_issue_comment_reactions
src/sentry/shared_integrations/client/base.py:264: in _request
    resp: Response = session.send(finalized_request, **session_settings)
.venv/lib/python3.13/site-packages/requests/sessions.py:703: in send
    r = adapter.send(request, **kwargs)
.venv/lib/python3.13/site-packages/responses/__init__.py:1104: in unbound_on_send
    return self._on_request(adapter, request, *a, **kwargs)
.venv/lib/python3.13/site-packages/responses/__init__.py:1046: in _on_request
    raise response
E   requests.exceptions.ConnectionError: Connection refused by Responses - the call doesn't match any registered mock.
E   
E   Request: 
E   - GET https://api.github.com/repos/test-org/test-repo/issues/comments/42/reactions?per_page=50&page=1
E   
E   Available matches:
E   - GET https://api.github.com/repos/test-org/test-repo/issues/comments/42/reactions?per_page=100 Query string doesn't match. {page: 1, per_page: 50} doesn't match {per_page: 100}

The above exception was the direct cause of the following exception:
tests/sentry/scm/integration/test_github_provider_integration.py:288: in test_get_issue_comment_reactions
    reactions = self.provider.get_issue_comment_reactions("1", "42")
src/sentry/scm/private/providers/github.py:326: in get_issue_comment_reactions
    response = self.client.get(
src/sentry/scm/private/providers/github.py:200: in get
    return self.request("GET", path=path, params=params, headers=headers)
src/sentry/scm/private/providers/github.py:185: in request
    return self.client._request(
src/sentry/shared_integrations/client/base.py:273: in _request
    raise ApiHostError.from_exception(e) from e
E   sentry.shared_integrations.exceptions.ApiHostError: Unable to reach host: api.github.com
tests/sentry/scm/integration/test_github_provider_integration.py::TestGitHubProviderIntegration::test_get_issue_reactions
src/sentry/shared_integrations/client/base.py:264: in _request
    resp: Response = session.send(finalized_request, **session_settings)
.venv/lib/python3.13/site-packages/requests/sessions.py:703: in send
    r = adapter.send(request, **kwargs)
.venv/lib/python3.13/site-packages/responses/__init__.py:1104: in unbound_on_send
    return self._on_request(adapter, request, *a, **kwargs)
.venv/lib/python3.13/site-packages/responses/__init__.py:1046: in _on_request
    raise response
E   requests.exceptions.ConnectionError: Connection refused by Responses - the call doesn't match any registered mock.
E   
E   Request: 
E   - GET https://api.github.com/repos/test-org/test-repo/issues/42/reactions?per_page=50&page=1
E   
E   Available matches:
E   - GET https://api.github.com/repos/test-org/test-repo/issues/42/reactions?per_page=100 Query string doesn't match. {page: 1, per_page: 50} doesn't match {per_page: 100}

The above exception was the direct cause of the following exception:
tests/sentry/scm/integration/test_github_provider_integration.py:364: in test_get_issue_reactions
    reactions = self.provider.get_issue_reactions("42")
src/sentry/scm/private/providers/github.py:376: in get_issue_reactions
    response = self.client.get(
src/sentry/scm/private/providers/github.py:200: in get
    return self.request("GET", path=path, params=params, headers=headers)
src/sentry/scm/private/providers/github.py:185: in request
    return self.client._request(
src/sentry/shared_integrations/client/base.py:273: in _request
    raise ApiHostError.from_exception(e) from e
E   sentry.shared_integrations.exceptions.ApiHostError: Unable to reach host: api.github.com
tests/sentry/scm/integration/test_github_provider_integration.py::TestGitHubProviderIntegration::test_get_pull_request_uses_conditional_request_headers
tests/sentry/scm/integration/test_github_provider_integration.py:142: in test_get_pull_request_uses_conditional_request_headers
    assert responses.calls[0].request.headers["If-None-Match"] == '"etag-123"'
.venv/lib/python3.13/site-packages/requests/structures.py:52: in __getitem__
    return self._store[key.lower()][1]
E   KeyError: 'if-none-match'
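
The `KeyError: 'id'` and `TypeError` traces above are consistent with an error body (a JSON object without an `id`) reaching the mapper functions before the status code was checked. The sketch below reproduces both failure shapes with a simplified, hypothetical `map_comment` and shows one possible guard; `map_if_ok` and the sample error body are assumptions, not code from the PR.

```python
class SCMProviderException(Exception):
    """Local stand-in for the exception named in the PR."""


def map_comment(raw):
    # Simplified mapper: the real one builds a richer object.
    return {"id": str(raw["id"])}


def failure_kind(fn):
    """Return the exception class name raised by fn, or None."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return None


error_body = {"message": "Forbidden"}  # assumed shape of an API error body

# Mapping the error object directly: there is no "id" key.
single = failure_kind(lambda: map_comment(error_body))
# Iterating the error object yields its string keys, so raw["id"] indexes a str.
listed = failure_kind(lambda: [map_comment(c) for c in error_body])


def map_if_ok(status_code, body, fn):
    """Hypothetical guard: refuse to map bodies from failed requests."""
    if status_code >= 400:
        raise SCMProviderException(f"request failed with status {status_code}")
    return fn(body)
```

With a guard like this (or a status check inside the request layer, which is what `force_raise_for_status` appears intended for), the error surfaces as `SCMProviderException` rather than as a mapping failure.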

### Summary

- Introduces a dynamic, per-org rate limiter (DynamicRateLimiter +
RedisRateLimitProvider) backed by Redis that reads GitHub's
x-ratelimit-limit, x-ratelimit-used, and x-ratelimit-reset response
headers to eagerly throttle requests before hitting provider limits.
- Adds a referrer allocation system where specific referrers (e.g.
emerge) get a dedicated percentage of the rate-limit quota, with
remaining capacity shared across all other callers.
- Removes dead code: old ratelimits.backend-based
is_rate_limited/is_rate_limited_with_allocation_policy helpers, unused
REACTION_MAP, catch_provider_exception decorator, and stale
encode_ratelimit_key utilities.
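
As a rough illustration of the first two bullets, the sketch below parses the three GitHub rate-limit headers and splits the reported limit into dedicated and shared quotas. `parse_rate_limit_headers` and `split_quota` are hypothetical helpers, and expressing allocations as integer percentages is an assumption made to keep the arithmetic exact.

```python
from typing import Mapping, Optional


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[dict]:
    """Return limit/used/reset as ints, or None if any header is unusable."""
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return {
            "limit": int(lowered["x-ratelimit-limit"]),
            "used": int(lowered["x-ratelimit-used"]),
            "reset": int(lowered["x-ratelimit-reset"]),
        }
    except (KeyError, ValueError):
        return None


def split_quota(limit: int, referrer_allocation: Mapping[str, int]):
    """Split a limit into per-referrer quotas and a shared remainder.

    referrer_allocation maps a referrer name to an integer percentage
    (an assumed representation).
    """
    dedicated = {
        referrer: limit * percent // 100
        for referrer, percent in referrer_allocation.items()
    }
    shared = max(limit - sum(dedicated.values()), 0)
    return dedicated, shared


meta = parse_rate_limit_headers(
    {
        "X-RateLimit-Limit": "5000",
        "X-RateLimit-Used": "120",
        "X-RateLimit-Reset": "1774500000",
    }
)
dedicated, shared = split_quota(meta["limit"], {"emerge": 10})
```

Here a 10% allocation for `emerge` on a limit of 5000 reserves 500 requests and leaves 4500 in the shared pool.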

### How it works

1. GitHubProviderApiClient.request() makes the HTTP call and, on every
response containing rate-limit headers, calls
DynamicRateLimiter.update_rate_limit_meta() to sync Sentry's Redis state
with GitHub's reported capacity/usage.
2. Before each request,
GitHubProviderApiClient.is_rate_limited(referrer) checks the referrer's
dedicated allocation first, then falls back to the shared pool. Both
checks call DynamicRateLimiter.is_rate_limited() which atomically
increments usage counters in Redis via a pipeline.
3. The rate limiter fails open — if no limit has been cached yet (first
request for an org), the request proceeds and the limit is populated
from the response.
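
The three steps above can be sketched with an in-memory stand-in for the Redis-backed limiter. `InMemoryRateLimiter` is hypothetical: it keeps counters in a dict instead of Redis, takes only the limit in `update_rate_limit_meta` (the real signature is not shown in this PR page), and does not model window resets or atomicity.

```python
class InMemoryRateLimiter:
    """Toy limiter: dedicated quotas per referrer plus a shared pool."""

    def __init__(self, referrer_allocation):
        # referrer -> integer percentage of the limit (assumed representation)
        self.referrer_allocation = referrer_allocation
        self.limit = None  # unknown until the first response arrives
        self.usage = {}

    def update_rate_limit_meta(self, limit):
        """Record the limit reported by the provider's response headers."""
        self.limit = limit

    def _quota(self, key):
        dedicated = {
            ref: self.limit * pct // 100
            for ref, pct in self.referrer_allocation.items()
        }
        if key == "shared":
            return max(self.limit - sum(dedicated.values()), 0)
        return dedicated[key]

    def is_rate_limited(self, key):
        if self.limit is None:
            return False  # fail open: no limit cached yet
        self.usage[key] = self.usage.get(key, 0) + 1
        return self.usage[key] > self._quota(key)


def is_rate_limited(limiter, referrer):
    """Check the referrer's dedicated quota first, then the shared pool."""
    if referrer in limiter.referrer_allocation and not limiter.is_rate_limited(
        referrer
    ):
        return False
    return limiter.is_rate_limited("shared")
```

One consequence of this ordering is that a referrer with a dedicated allocation can still draw on the shared pool once its own quota is spent.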

### Test plan

- Unit tests for DynamicRateLimiter covering: allocated/shared quota
exhaustion, fail-open on missing limits, capacity caching,
update_rate_limit_meta with matching/mismatched windows, shared usage
floor at zero.
- Integration tests for RedisRateLimitProvider verifying Redis pipeline
behavior: get_and_set_rate_limit (INCR + TTL), get_accounted_usage
(multi-GET sum), set_key_values (SET with/without expiration).
@cmanallen cmanallen requested review from a team as code owners March 25, 2026 18:10
Comment on lines +171 to +181
def is_rate_limited(self, referrer: Referrer) -> bool:
    """Return true if access to the resource has been blocked."""
    # If the referrer has allocated quota and that quota has not been exhausted we eagerly
    # exit by returning false. Otherwise we consume from the shared quota pool.
    if (
        referrer in self.rate_limiter.referrer_allocation
        and not self.rate_limiter.is_rate_limited(referrer)
    ):
        return False
    else:
        return self.rate_limiter.is_rate_limited("shared")

Bug: The graphql method incorrectly checks the type of the response object before parsing it as JSON, causing it to always raise an exception and fail.
Severity: CRITICAL

Suggested Fix

Move the response.json() call to before the type check. The check should be performed on the parsed JSON data, not the requests.Response object. For example: response_data = response.json() followed by if not isinstance(response_data, dict) or ....

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/scm/private/providers/github.py#L171-L181

Potential issue: In the `graphql` method, the `self.post()` call returns a
`requests.Response` object, not a dictionary. The subsequent check `isinstance(response,
dict)` will always evaluate to `False`, causing the condition `if not
isinstance(response, dict) or ...` to always be true. This unconditionally raises an
`SCMProviderException` with the message "GraphQL response is not in expected format". As
a result, the logic to parse the JSON response and handle GraphQL errors is never
reached, and any call to this method, such as from `minimize_comment`, will fail.

@github-actions
Contributor

Backend Test Failures

Failures on dab6c3c in this run:

tests/sentry/taskworker/test_config.py::test_all_instrumented_tasks_registered
tests/sentry/taskworker/test_config.py:120: in test_all_instrumented_tasks_registered
    raise AssertionError(
E   AssertionError: Found 1 module(s) with @instrumented_task that are NOT registered in TASKWORKER_IMPORTS.
E   These tasks will not be discovered by the taskworker in production!
E   
E   Missing modules:
E     - sentry.workflow_engine.tasks.cleanup
E   
E   Add these to TASKWORKER_IMPORTS in src/sentry/conf/server.py

@jacquev6 (Collaborator) left a comment

🚢

@cursor bot (Contributor) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cmanallen cmanallen merged commit 15ae4f1 into master Mar 26, 2026
106 checks passed
@cmanallen cmanallen deleted the cmanallen/scm-rate-limits branch March 26, 2026 14:22
3 participants