
fix(integrations): Eliminate N+1 query in SCM webhooks for commit authors #113926

Open
sentry[bot] wants to merge 1 commit into master from seer/fix/scm-webhook-n1-authors

Contributor

@sentry sentry Bot commented Apr 24, 2026

This PR addresses an N+1 query performance issue in both GitLab and GitHub push webhooks.

Problem:
Previously, when processing a push event with multiple commits, author.preload_users() was called for every commit inside the processing loop. For N commits this produced N RPC calls to user_service.get_many_by_email and N database queries against sentry_organizationmember, even when many commits shared the same author.

Solution:

  1. Refactor CommitAuthor.preload_users(): The method was updated to respect its internal self.users cache. It now checks if self.users is already populated and returns early if so, preventing unnecessary re-fetches.
  2. Optimize GitLab PushEventWebhook: The call to author.preload_users() in src/sentry/integrations/gitlab/webhooks.py was moved to occur only when a CommitAuthor object is first created or retrieved and added to the authors dictionary. This ensures preload_users() is called at most once per unique author within a single webhook event.
  3. Optimize GitHub PushEventWebhook: The same optimization was applied to src/sentry/integrations/github/webhook.py to fix the identical N+1 pattern.
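
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `AuthorSketch`, `process_push`, and the `lookup` callable are hypothetical stand-ins for `CommitAuthor`, the webhook loop, and the user-service call, and the `is not None` cache check is an assumption about how "already populated" is tested.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuthorSketch:
    """Hypothetical stand-in for CommitAuthor; not Sentry's actual model."""

    email: str
    users: Optional[list] = None  # None means "not loaded yet"

    def preload_users(self, lookup: Callable[[str], list]) -> list:
        # Respect the cache: return early if users were already fetched.
        # Assumption: an empty list counts as "loaded" and is not re-fetched.
        if self.users is not None:
            return self.users
        self.users = lookup(self.email)
        return self.users


def process_push(commits: list, lookup: Callable[[str], list]) -> dict:
    """Walk the commits, preloading users once per unique author email."""
    authors = {}
    for commit in commits:
        email = commit["author_email"]
        if email not in authors:
            # First time this author is seen in the event: create and preload.
            author = AuthorSketch(email)
            author.preload_users(lookup)
            authors[email] = author
        else:
            # Already seen: reuse the cached author, no further lookup.
            author = authors[email]
        # ... per-commit processing would use `author` here ...
    return authors
```

Passing a `lookup` that records its arguments makes it easy to confirm that the number of lookups equals the number of unique author emails rather than the number of commits.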

Impact:
This change significantly reduces the number of RPC calls and database queries during SCM push webhook processing. For a push event with N commits and M unique authors, the number of user_service.get_many_by_email RPC calls and sentry_organizationmember DB queries will drop from N to M, leading to improved performance and reduced load on the system.
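
To make the N-to-M claim concrete, here is a small counting sketch. The function name and the sample push are hypothetical; it only counts lookups and does not touch any Sentry API.

```python
def lookup_counts(commit_author_emails):
    """Count preload lookups before and after the change for one push event."""
    n = len(commit_author_emails)       # one lookup per commit (old behavior)
    m = len(set(commit_author_emails))  # one lookup per unique author (new behavior)
    return {"before": n, "after": m}


# Hypothetical push: 50 commits spread across 3 authors.
emails = ["a@example.com"] * 30 + ["b@example.com"] * 15 + ["c@example.com"] * 5
print(lookup_counts(emails))  # {'before': 50, 'after': 3}
```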

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

Fixes SENTRY-5BVS

@github-actions github-actions Bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Apr 24, 2026
@mrduncan mrduncan added the "Trigger: getsentry tests" label (once code is reviewed: apply label to PR to trigger getsentry tests) Apr 24, 2026
@JoshFerge JoshFerge marked this pull request as ready for review May 4, 2026 20:10
@JoshFerge JoshFerge requested review from a team as code owners May 4, 2026 20:10
@JoshFerge JoshFerge self-assigned this May 4, 2026
Comment on lines 598 to +601

        if author:
            author.preload_users()
    else:
        author = authors[author_email]
Contributor Author

Bug: The refactored logic may fail to call author.preload_users() for authors with resolved anonymous GitHub emails, leaving author.users as None.
Severity: MEDIUM

Suggested Fix

Move the author.preload_users() call to execute unconditionally for any resolved author after the if/elif/else block, mirroring the original code's behavior. For instance, add if author: author.preload_users() after the block to ensure user data is always preloaded.
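
One way the suggested fix could look, as a sketch under stated assumptions: `AuthorSketch`, `author_for_commit`, `resolve_anonymous`, and `lookup` are hypothetical names, and the branch layout is inferred from this comment rather than copied from webhook.py. The point is only that a single call after the branches covers every path, and the cache keeps repeat calls cheap.

```python
class AuthorSketch:
    """Hypothetical stand-in for CommitAuthor; not Sentry's actual model."""

    def __init__(self, email):
        self.email = email
        self.users = None  # None means "not preloaded yet"

    def preload_users(self, lookup):
        # Cached: only the first call per instance performs the lookup.
        if self.users is None:
            self.users = lookup(self.email)
        return self.users


def author_for_commit(author_email, authors, resolve_anonymous, lookup):
    """Return the author for one commit, preloaded whenever one is resolved."""
    author = None
    # Hypothetical helper: returns an author for an anonymous email, else None.
    resolved = resolve_anonymous(author_email)
    if resolved is not None:
        # Anonymous-email path: the author enters the dict without preloading.
        author = authors.setdefault(resolved.email, resolved)
    elif author_email not in authors:
        author = authors[author_email] = AuthorSketch(author_email)
    else:
        author = authors[author_email]

    # Unconditional call after the branches, as the suggested fix describes.
    if author:
        author.preload_users(lookup)
    return author
```

With this shape, an author first added through the anonymous-email path still ends up with `users` populated, and later commits by the same author reuse the cached value.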

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/integrations/github/webhook.py#L598-L601

Potential issue: In the `PushEventWebhook`, the `author.preload_users()` method is not
called for `CommitAuthor` instances when their email is resolved from an anonymous
GitHub email. The code adds the resolved author to the `authors` dictionary, which then
causes the `elif author_email not in authors:` condition (which contains the
`preload_users()` call) to be false. The `else` branch is then taken, but it does not
call `preload_users()`. This leaves `author.users` as `None`, which can lead to
unexpected behavior or errors in downstream code that expects this data to be populated,
a state that was handled in the original implementation.


Member

Should this branch also preload users?

Contributor

getsantry Bot commented May 29, 2026

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you add the label WIP, I will leave it alone unless WIP is removed ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@getsantry getsantry Bot added the Stale label May 29, 2026