feat(integrations): Prevent fetching thousands of commits #113526
Add a configurable max commit cap for GitHub compare ranges so the fetch_commits task can bound per-commit metadata fanout in large ranges. The provider now truncates to the most recent commits before patch-set hydration and logs when truncation occurs, and tests cover cap behavior including disablement with 0. Co-Authored-By: Codex 5.3 <noreply@openai.com> Made-with: Cursor
Adjust the compare payload helper annotation so mypy accepts the nested commit dictionary structure used in the GitHub repository tests. Co-Authored-By: Codex 5.3 <noreply@openai.com> Made-with: Cursor
register(
    "github-app.fetch-commits.max-compare-commits",
    type=Int,
    default=500,
Even 500 commits is too many but we will start here.
    flags=FLAG_AUTOMATOR_MODIFIABLE,
)

register("metric_alerts.extended_max_subscriptions", default=1250, flags=FLAG_AUTOMATOR_MODIFIABLE)
Unrelated changes. Just formatting.
    self, log_info: mock.MagicMock
) -> None:
    client = mock.Mock()
    client.compare_commits.return_value = self._build_compare_commit_payload(600)
    self.repository, "xyz123", "abcdef"
)

assert len(result) == 500
installation = mock.Mock()

with (
    self.options({"github-app.fetch-commits.max-compare-commits": 2}),

assert [commit["id"] for commit in result] == [
    commits[-2]["sha"],
    commits[-1]["sha"],
Only two commits got included.
installation = mock.Mock()

with (
    self.options({"github-app.fetch-commits.max-compare-commits": 0}),
Setting it to 0 is the same as using the old uncapped approach.
)
commits = commits[-max_compare_commits:]
Bug: If max_compare_commits is set to a negative number, the commit list is sliced incorrectly (commits[1:]), removing the oldest commit instead of capping the list size.
Severity: LOW
Suggested Fix
Update the conditional check to ensure max_compare_commits is a positive integer before using it for slicing. For example: if max_compare_commits and max_compare_commits > 0 and len(commits) > max_compare_commits:.
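The slicing behavior described here can be reproduced in isolation. The following is a minimal sketch, not the code in `repository.py`: `cap_commits` is a hypothetical helper, and it assumes the commit list is ordered oldest-first and that any value of 0 or below should mean "no cap".

```python
def cap_commits(commits: list[str], max_compare_commits: int) -> list[str]:
    """Keep only the most recent commits (list is assumed oldest-first).

    Hypothetical helper: values <= 0 are treated as "no cap".
    """
    if max_compare_commits > 0 and len(commits) > max_compare_commits:
        return commits[-max_compare_commits:]
    return commits


commits = ["c1", "c2", "c3", "c4"]

# Unguarded slice with a cap of -1: -(-1) == 1, so this is commits[1:],
# which drops the first (oldest) element instead of capping the size.
unguarded = commits[-(-1):]

capped = cap_commits(commits, 2)     # keeps the two most recent commits
uncapped = cap_commits(commits, 0)   # 0 disables the cap
negative = cap_commits(commits, -1)  # negative values are ignored by the guard
```

With the `> 0` guard, a misconfigured negative value falls back to the uncapped behavior rather than silently dropping one commit.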
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/integrations/github/repository.py#L119-L120
Potential issue: The `github-app.fetch-commits.max-compare-commits` option lacks
validation to prevent negative values. If a negative value like `-1` is configured, the
check `if max_compare_commits` evaluates to true. The code then attempts to truncate the
commit list with `commits[-max_compare_commits:]`. With a value of `-1`, this becomes
`commits[-(-1):]` or `commits[1:]`, which incorrectly removes the oldest commit from the
list rather than capping the list to a maximum size. This can happen if an operator or
an automated system misconfigures the option, as there are no bounds checks.
Add a configurable cap for GitHub compare-commit ranges used by `fetch_commits`.

Large compare ranges currently fan out one `get_commit` call per commit to build patch sets, which creates expensive long-tail tasks. This change adds a `github-app.fetch-commits.max-compare-commits` option (default `500`) and applies it in `GitHubRepositoryProvider.fetch_commits_for_compare_range` before `_format_commits` so the cap reduces the actual fanout work.

I considered moving the cap into the task layer, but applying it in the GitHub provider is the safest point because it guarantees truncation happens before patch-set hydration and keeps existing cache behavior unchanged.
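Why the ordering matters can be shown with a small, self-contained sketch. It is only an illustration under stated assumptions: `hydrate_recent` and `fake_get_commit` are hypothetical names, not Sentry or GitHub client functions, and the `> 0` guard is an assumption rather than the merged condition.

```python
from typing import Callable


def hydrate_recent(
    shas: list[str],
    max_compare_commits: int,
    get_commit: Callable[[str], dict],
) -> list[dict]:
    """Truncate first, then issue one get_commit call per remaining SHA."""
    if max_compare_commits > 0 and len(shas) > max_compare_commits:
        # Keep the most recent entries; the list is assumed oldest-first.
        shas = shas[-max_compare_commits:]
    return [get_commit(sha) for sha in shas]


calls: list[str] = []


def fake_get_commit(sha: str) -> dict:
    # Stand-in for the per-commit API request; records each call.
    calls.append(sha)
    return {"id": sha, "patch_set": []}


shas = [f"sha{i}" for i in range(600)]
result = hydrate_recent(shas, 500, fake_get_commit)
```

Because truncation happens before the per-commit loop, the 600-commit range triggers 500 `get_commit` calls instead of 600. Truncating after hydration would return the same list but would not reduce the number of requests.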
Made with Cursor
You can see SENTRY-44HA for an issue fetching thousands of commits and then the task timing out.
You can open these logs to see how many fetch_commits tasks fetch more than 500 commits.