Skip to content

feat(integrations): Prevent fetching thousands of commits#113526

Merged
armenzg merged 2 commits intomasterfrom
feat/github-compare-commit-cap
Apr 21, 2026
Merged

feat(integrations): Prevent fetching thousands of commits#113526
armenzg merged 2 commits intomasterfrom
feat/github-compare-commit-cap

Conversation

@armenzg
Copy link
Copy Markdown
Member

@armenzg armenzg commented Apr 21, 2026

Add a configurable cap for GitHub compare-commit ranges used by fetch_commits.

Large compare ranges currently fan out one get_commit call per commit to build
patch sets, which creates expensive long-tail tasks. This change adds a
github-app.fetch-commits.max-compare-commits option (default 500) and applies
it in GitHubRepositoryProvider.fetch_commits_for_compare_range before
_format_commits so the cap reduces the actual fanout work.

I considered moving the cap into the task layer, but applying it in the GitHub
provider is the safest point because it guarantees truncation happens before
patch-set hydration and keeps existing cache behavior unchanged.

Made with Cursor

You can see SENTRY-44HA for an issue fetching thousands of commits and then the task timing out.

You can open these logs to see how many fetch_commits tasks fetch more than 500 commits.

armenzg and others added 2 commits April 21, 2026 07:57
Add a configurable max commit cap for GitHub compare ranges so the
fetch_commits task can bound per-commit metadata fanout in large ranges.

The provider now truncates to the most recent commits before patch-set
hydration and logs when truncation occurs, and tests cover cap behavior
including disablement with 0.

Co-Authored-By: Codex 5.3 <noreply@openai.com>
Made-with: Cursor
Adjust the compare payload helper annotation so mypy accepts the nested
commit dictionary structure used in the GitHub repository tests.

Co-Authored-By: Codex 5.3 <noreply@openai.com>
Made-with: Cursor
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 21, 2026
@armenzg armenzg self-assigned this Apr 21, 2026
@armenzg armenzg changed the title feat(github): cap compare-commits fanout with option feat(github): Prevent fetching thousands of commits Apr 21, 2026
register(
"github-app.fetch-commits.max-compare-commits",
type=Int,
default=500,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even 500 commits is too many but we will start here.

flags=FLAG_AUTOMATOR_MODIFIABLE,
)

register("metric_alerts.extended_max_subscriptions", default=1250, flags=FLAG_AUTOMATOR_MODIFIABLE)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated changes. Just formatting.

self, log_info: mock.MagicMock
) -> None:
client = mock.Mock()
client.compare_commits.return_value = self._build_compare_commit_payload(600)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be too many.

self.repository, "xyz123", "abcdef"
)

assert len(result) == 500
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It got capped.

installation = mock.Mock()

with (
self.options({"github-app.fetch-commits.max-compare-commits": 2}),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capping at two.


assert [commit["id"] for commit in result] == [
commits[-2]["sha"],
commits[-1]["sha"],
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only two commits got included.

installation = mock.Mock()

with (
self.options({"github-app.fetch-commits.max-compare-commits": 0}),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting it to 0 is the same as using the old uncapped approach.

@armenzg armenzg changed the title feat(github): Prevent fetching thousands of commits feat(integrations): Prevent fetching thousands of commits Apr 21, 2026
@armenzg armenzg marked this pull request as ready for review April 21, 2026 13:37
@armenzg armenzg requested review from a team as code owners April 21, 2026 13:37
Comment on lines +119 to +120
)
commits = commits[-max_compare_commits:]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: If max_compare_commits is set to a negative number, the commit list is sliced incorrectly (commits[1:]), removing the oldest commit instead of capping the list size.
Severity: LOW

Suggested Fix

Update the conditional check to ensure max_compare_commits is a positive integer before using it for slicing. For example: if max_compare_commits and max_compare_commits > 0 and len(commits) > max_compare_commits:.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/integrations/github/repository.py#L119-L120

Potential issue: The `github-app.fetch-commits.max-compare-commits` option lacks
validation to prevent negative values. If a negative value like `-1` is configured, the
check `if max_compare_commits` evaluates to true. The code then attempts to truncate the
commit list with `commits[-max_compare_commits:]`. With a value of `-1`, this becomes
`commits[-(-1):]` or `commits[1:]`, which incorrectly removes the oldest commit from the
list rather than capping the list to a maximum size. This can happen if an operator or
an automated system misconfigures the option, as there are no bounds checks.

Did we get this right? 👍 / 👎 to inform future reviews.

@armenzg armenzg merged commit 2857ba9 into master Apr 21, 2026
82 checks passed
@armenzg armenzg deleted the feat/github-compare-commit-cap branch April 21, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants