fix(integrations): Eliminate N+1 query in SCM webhooks for commit authors#113926
fix(integrations): Eliminate N+1 query in SCM webhooks for commit authors#113926sentry[bot] wants to merge 1 commit into
Conversation
|
|
||
| if author: | ||
| author.preload_users() | ||
| else: | ||
| author = authors[author_email] |
There was a problem hiding this comment.
Bug: The refactored logic may fail to call author.preload_users() for authors with resolved anonymous GitHub emails, leaving author.users as None.
Severity: MEDIUM
Suggested Fix
Move the author.preload_users() call to execute unconditionally for any resolved author after the if/elif/else block, mirroring the original code's behavior. For instance, add if author: author.preload_users() after the block to ensure user data is always preloaded.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/integrations/github/webhook.py#L598-L601
Potential issue: In the `PushEventWebhook`, the `author.preload_users()` method is not
called for `CommitAuthor` instances when their email is resolved from an anonymous
GitHub email. The code adds the resolved author to the `authors` dictionary, which then
causes the `elif author_email not in authors:` condition (which contains the
`preload_users()` call) to be false. The `else` branch is then taken, but it does not
call `preload_users()`. This leaves `author.users` as `None`, which can lead to
unexpected behavior or errors in downstream code that expects this data to be populated,
a state that was handled in the original implementation.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Should this branch also preload users?
|
This pull request has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you add the label "A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀 |
This PR addresses an N+1 query performance issue in both GitLab and GitHub push webhooks.
Problem:
Previously, when processing a push event with multiple commits, the
author.preload_users()method was called for every commit within the processing loop. This led to an N+1 query pattern where for N commits, N RPC calls touser_service.get_many_by_emailand N database queries tosentry_organizationmemberwere made, even if many commits shared the same author.Solution:
CommitAuthor.preload_users(): The method was updated to respect its internalself.userscache. It now checks ifself.usersis already populated and returns early if so, preventing unnecessary re-fetches.PushEventWebhook: The call toauthor.preload_users()insrc/sentry/integrations/gitlab/webhooks.pywas moved to occur only when aCommitAuthorobject is first created or retrieved and added to theauthorsdictionary. This ensurespreload_users()is called at most once per unique author within a single webhook event.PushEventWebhook: The same optimization was applied tosrc/sentry/integrations/github/webhook.pyto fix the identical N+1 pattern.Impact:
This change significantly reduces the number of RPC calls and database queries during SCM push webhook processing. For a push event with N commits and M unique authors, the number of
user_service.get_many_by_emailRPC calls andsentry_organizationmemberDB queries will drop from N to M, leading to improved performance and reduced load on the system.Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Fixes SENTRY-5BVS