ref(seer): Add GitLab code-review web hooks#116317
Conversation
…/github.com/getsentry/sentry into cmanallen/scm-pull-request-open-close-hooks
suejungshin
left a comment
There was a problem hiding this comment.
Is there a flag we can add somewhere so we confirm this pathway works for a couple orgs before unleashing the firehose of all gitlab webhooks?
Also nit that the PR title's reference to using event stream is out of date now. Too bad that it wasn't able to work there - out of curiosity what were the blockers there?
|
|
||
| def _build_webhook_tags( | ||
| github_event: GithubWebhookType, github_event_action: str | ||
| github_event: GithubWebhookType | str, github_event_action: str |
There was a problem hiding this comment.
Is it worth renaming these arguments to be provider-agnostic? i.e. event and event_action? Or maybe just a type alias on str to indicate the possible GitLab WebhookType values?
| sections = full_name.split("/") | ||
| if len(sections) < 2: | ||
| raise ValueError(f"Invalid repository name format: {full_name}") | ||
| return sections[0], "/".join(sections[1:]) |
There was a problem hiding this comment.
Oh dang oof, I didn't realize GitLab paths can be nested arbitrarily deep, e.g. <org>/<team>/<subteam>/<repo>, where the repo name is only unique to the deepest level.
I *think that doesn't matter on the application end within seer, but I'll double check (i.e., places we use repo name in apis instead of ids and where /s may not be appropriately accounted for)
There was a problem hiding this comment.
Claude says there are 2 spots within evals only but otherwise we're fine here. Here's a ticket for low priority update the evals for gitlab (we don't have any gitlab examples in evals nor will we any time soon son this is pretty low prio)
- Production paths don't use bare name — they go through full_name (bug_prediction_step.py:247,269,390, repo_client.py:1362) or numeric
external_id (:259,309). And f"{owner}/{name}" round-trips the same regardless of where we split, so this doesn't change GitHub behavior
either way.
- Where the split actually matters is the few spots using bare name/owner: the eval _sqlite_db_path(repo.name) (:238) and
head=f"{repo_owner}:{ref}" (repo_client.py:2623). If we're splitting nested GitLab paths, do we want name to be just the project there?
- Unrelated nit: get_repo_client(repo_name=...) at :390 is being passed full_name — confusing param name.
billyvg
left a comment
There was a problem hiding this comment.
There's a few spots where we reference "github" but it's been changed to be provider agnostic, something to clean-up later on. clanker found one potential security issue but otherwise lgtm
| path = repo.config.get("path") | ||
| if not path: | ||
| raise ValueError(f"GitLab repository {repo.id} is missing config['path']") | ||
| owner, name = _split_full_name(path) |
There was a problem hiding this comment.
Security: the repo is authorized by project.id but addressed to Seer by path_with_namespace — two independent payload fields that can disagree.
The owner/name forwarded to Seer here come from repo.config["path"], which MergeEventWebhook.update_repo_data() overwrites from event["project"]["path_with_namespace"] on every delivery. But the repo was only authorized via get_repo(), which matches on external_id = "{instance}:{project_id}" (i.e. event["project"]["id"]). Those are separate, independently attacker-influenced fields in the same payload.
So a single forged delivery (anyone holding the integration's webhook secret) for a legitimately-connected repo (project.id 123 → passes the external_id check) can carry path_with_namespace: "other-group/private-repo". That value is persisted into config["path"] and then becomes the owner/name Seer operates on — using this org's integration token. It's a confused-deputy: the field that authorizes and the field that directs the outbound action are decoupled, so Seer can be pointed at (and post review output on) a project the connected-repo path never authorized, bounded only by the integration token's scope.
update_repo_data() is pre-existing, but this change is what gives config["path"] a security-sensitive sink (an outbound action via Seer under the org token), which elevates the impact of that trust.
Suggested mitigations, most robust first:
- Verify
path_with_namespaceis consistent with the authorizedproject.idbefore trusting it (or before persisting it inupdate_repo_data), rather than overwriting stored state from each delivery. - Or derive
owner/namefrom an identity bound to the authorizedexternal_id, not a free-text path field. - At minimum, document that Seer's project addressing inherits the integration token's scope and that
config["path"]is attacker-influenced.
Non-blocking if the GitLab integration token scope is known to be narrow, but worth a deliberate decision since it decouples authorization from the action target.
There was a problem hiding this comment.
I believe this is a non-issue.
- This is an authenticated endpoint. You need the secret to make a request.
- A poisoned owner/repo can only access repos within the integration's scope.
- We pass the external_id to seer which I believe in Seer is the only way we interact with the SCM.
|
Oh I think one thing we're missing is the 👀 for code review, we should make a linear ticket for that |
| if is_closed: | ||
| validated = SeerCodeReviewTaskRequestForPrClosed.parse_obj(payload) |
There was a problem hiding this comment.
Bug: GitLab MR close/merge events are silently dropped if the repository's integration_id is None in the database, due to a Pydantic validation failure.
Severity: HIGH
Suggested Fix
Make the integration_id field optional in the SeerCodeReviewRepoForPrClosed Pydantic model to align with the payload generation logic and prevent the validation error. Alternatively, add a pre-check to filter out repositories without an integration_id earlier, with proper metrics.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/seer/code_review/webhooks/merge_request.py#L363-L364
Potential issue: When a GitLab repository has a `NULL` `integration_id` in the database,
`pr-closed` events for that repository will be silently dropped. The payload generation
logic omits the `integration_id` if it's `None`, but the `SeerCodeReviewRepoForPrClosed`
Pydantic model requires it. This causes a `ValidationError` during parsing, which is
caught and logged as `INVALID_PAYLOAD`, preventing the event from being processed. This
is a functional regression for tracking GitLab MR close/merge events for affected
repositories.
Also affects:
src/sentry/seer/code_review/utils.py:398~401
| if action not in CLOSE_ACTIONS: | ||
| if ( | ||
| object_attributes.get("draft") is True | ||
| or object_attributes.get("work_in_progress") is True | ||
| ): | ||
| return |
There was a problem hiding this comment.
Bug: UPDATE events with new commits on draft GitLab MRs are silently dropped without recording record_webhook_filtered metrics, breaking the observability funnel.
Severity: MEDIUM
Suggested Fix
Before returning for a draft MR update, call record_webhook_filtered with an appropriate reason, such as DRAFT_MR. This will ensure that the dropped event is correctly tracked in metrics, providing visibility into the filtering logic.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/seer/code_review/webhooks/merge_request.py#L254-L259
Potential issue: When an `UPDATE` webhook event with new commits (`oldrev` is present)
is received for a draft GitLab merge request, the event is silently dropped. The code
checks if the MR is a draft and returns early without recording any metrics. This breaks
the observability funnel, as the `record_webhook_received` metric is fired but the
corresponding `record_webhook_filtered` metric is not. This makes it impossible to track
why these valid events are being ignored. This behavior is inconsistent with how other
filtered events are handled.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit ae51393. Configure here.
| event=event, | ||
| organization=organization, | ||
| repo=repo, | ||
| ) |
There was a problem hiding this comment.
Code review handler blocked by unrelated PR persistence failures
Medium Severity
self._handle() is placed after the PR persistence try/except block, so a KeyError on fields like merge_commit_sha, title, created_at, or source_branch — none of which the code review handler needs — prevents handle_merge_request_event from ever running. Similarly, raise Http404() on line 388 exits before _handle. The code review handler independently handles incomplete payloads gracefully (checks last_commit.id and returns), but is never given the chance to run when PR persistence encounters a missing key. This couples the code review path to the PR persistence path unnecessarily. For "close"/"merge" events, merge_commit_sha may legitimately be absent from older GitLab instances, silently blocking the pr-closed notification to Seer.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit ae51393. Configure here.


Wire GitLab merge requests into Seer code review. This uses the old web hook handler format rather than the new event listeners format. I struggled too much with the listeners. I'd rather ship the old system than deal with it. It can be ported later if/when the new system is at parity.
Slop
What changed:
WEBHOOK_EVENT_PROCESSORStuple + shared_handleloop (withEVENT_TYPEenforcement) so multiple handlers can run per event. PR persistence stays inline inMergeEventWebhook.__call__; only the code-review handler runs in the (error-isolated) processor loop.Known limitation
Code review won't fire in production until GitLab OrganizationContributors are seeded — the billing preflight currently filters every MR with ORG_CONTRIBUTOR_NOT_FOUND (GitHub seeds these via track_contributor_seat; GitLab doesn't yet). Documented in the module docstring.