Skip to content

ref(seer): Add GitLab code-review web hooks#116317

Merged
cmanallen merged 31 commits into
masterfrom
cmanallen/scm-pull-request-open-close-hooks
May 29, 2026
Merged

ref(seer): Add GitLab code-review web hooks#116317
cmanallen merged 31 commits into
masterfrom
cmanallen/scm-pull-request-open-close-hooks

Conversation

@cmanallen
Copy link
Copy Markdown
Member

@cmanallen cmanallen commented May 27, 2026

Wire GitLab merge requests into Seer code review. This uses the old web hook handler format rather than the new event listeners format. I struggled too much with the listeners. I'd rather ship the old system than deal with it. It can be ported later if/when the new system is at parity.

Slop

What changed:

  • Processor architecture on GitlabWebhook — adds a WEBHOOK_EVENT_PROCESSORS tuple + shared _handle loop (with EVENT_TYPE enforcement) so multiple handlers can run per event. PR persistence stays inline in MergeEventWebhook.__call__; only the code-review handler runs in the (error-isolated) processor loop.
  • New handle_merge_request_event handler (seer/code_review/webhooks/merge_request.py) — runs preflight checks, maps GitLab actions to triggers, and enqueues the review:
    • open / un-drafted update → ON_READY_FOR_REVIEW; update with new commits → ON_NEW_COMMIT; close / merge → PR-closed. Everything else (metadata-only edits, drafts unsupported actions) is filtered.
    • GitLab has no ready_for_review action, so un-drafting is detected from the changes payload.
    • De-dupes redeliveries (and the per-org dispatch fan-out) via a short Redis TTL key.
    • Routes to the provider-agnostic /v1/scm_code_review/{review-request,pr-closed} Seer endpoints (GitLab isn't supported by the PyGithub-based /v1/code_review/* routes).
  • Provider-aware repo definitions (seer/code_review/utils.py) — split GitHub/GitLab builders. GitLab derives owner/name from config["path"] (not the display name) and is_private from project.visibility_level.
  • Metrics generalized to accept string event names so the funnel metrics work for merge_request.

Known limitation

Code review won't fire in production until GitLab OrganizationContributors are seeded — the billing preflight currently filters every MR with ORG_CONTRIBUTOR_NOT_FOUND (GitHub seeds these via track_contributor_seat; GitLab doesn't yet). Documented in the module docstring.

@cmanallen cmanallen requested review from a team as code owners May 27, 2026 16:53
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 27, 2026
Comment thread src/sentry/seer/code_review/webhooks/listeners.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/listeners.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/handlers.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/listeners.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/listeners.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/listeners.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/listeners.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/listeners.py Outdated
Comment thread src/sentry/scm/types.py Outdated
Comment thread src/sentry/integrations/gitlab/webhooks.py
Comment thread src/sentry/integrations/gitlab/webhooks.py
Comment thread src/sentry/integrations/gitlab/webhooks.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/merge_request.py
Comment thread src/sentry/seer/code_review/utils.py
Comment thread src/sentry/seer/code_review/webhooks/merge_request.py Outdated
Copy link
Copy Markdown
Contributor

@suejungshin suejungshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a flag we can add somewhere so we confirm this pathway works for a couple orgs before unleashing the firehose of all gitlab webhooks?

Also nit that the PR title's reference to using event stream is out of date now. Too bad that it wasn't able to work there - out of curiosity what were the blockers there?

Comment thread src/sentry/seer/code_review/webhooks/merge_request.py

def _build_webhook_tags(
github_event: GithubWebhookType, github_event_action: str
github_event: GithubWebhookType | str, github_event_action: str
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth renaming these arguments to be provider-agnostic? i.e. event and event_action? Or maybe just a type alias on str to indicate the possible GitLab WebhookType values?

sections = full_name.split("/")
if len(sections) < 2:
raise ValueError(f"Invalid repository name format: {full_name}")
return sections[0], "/".join(sections[1:])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh dang oof, I didn't realize GitLab paths can be nested arbitrarily deep, e.g. <org>/<team>/<subteam>/<repo>, where the repo name is only unique to the deepest level.

I *think that doesn't matter on the application end within seer, but I'll double check (i.e., places we use repo name in apis instead of ids and where /s may not be appropriately accounted for)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude says there are 2 spots within evals only but otherwise we're fine here. Here's a ticket for low priority update the evals for gitlab (we don't have any gitlab examples in evals nor will we any time soon son this is pretty low prio)

  - Production paths don't use bare name — they go through full_name (bug_prediction_step.py:247,269,390, repo_client.py:1362) or numeric
  external_id (:259,309). And f"{owner}/{name}" round-trips the same regardless of where we split, so this doesn't change GitHub behavior
  either way.
  - Where the split actually matters is the few spots using bare name/owner: the eval _sqlite_db_path(repo.name) (:238) and
  head=f"{repo_owner}:{ref}" (repo_client.py:2623). If we're splitting nested GitLab paths, do we want name to be just the project there?
  - Unrelated nit: get_repo_client(repo_name=...) at :390 is being passed full_name — confusing param name.

Copy link
Copy Markdown
Member

@billyvg billyvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few spots where we reference "github" but it's been changed to be provider agnostic, something to clean-up later on. clanker found one potential security issue but otherwise lgtm

Comment thread src/sentry/integrations/gitlab/webhooks.py Outdated
Comment thread src/sentry/integrations/gitlab/webhooks.py Outdated
Comment thread src/sentry/seer/code_review/webhooks/merge_request.py Outdated
Comment on lines +431 to +434
path = repo.config.get("path")
if not path:
raise ValueError(f"GitLab repository {repo.id} is missing config['path']")
owner, name = _split_full_name(path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: the repo is authorized by project.id but addressed to Seer by path_with_namespace — two independent payload fields that can disagree.

The owner/name forwarded to Seer here come from repo.config["path"], which MergeEventWebhook.update_repo_data() overwrites from event["project"]["path_with_namespace"] on every delivery. But the repo was only authorized via get_repo(), which matches on external_id = "{instance}:{project_id}" (i.e. event["project"]["id"]). Those are separate, independently attacker-influenced fields in the same payload.

So a single forged delivery (anyone holding the integration's webhook secret) for a legitimately-connected repo (project.id 123 → passes the external_id check) can carry path_with_namespace: "other-group/private-repo". That value is persisted into config["path"] and then becomes the owner/name Seer operates on — using this org's integration token. It's a confused-deputy: the field that authorizes and the field that directs the outbound action are decoupled, so Seer can be pointed at (and post review output on) a project the connected-repo path never authorized, bounded only by the integration token's scope.

update_repo_data() is pre-existing, but this change is what gives config["path"] a security-sensitive sink (an outbound action via Seer under the org token), which elevates the impact of that trust.

Suggested mitigations, most robust first:

  • Verify path_with_namespace is consistent with the authorized project.id before trusting it (or before persisting it in update_repo_data), rather than overwriting stored state from each delivery.
  • Or derive owner/name from an identity bound to the authorized external_id, not a free-text path field.
  • At minimum, document that Seer's project addressing inherits the integration token's scope and that config["path"] is attacker-influenced.

Non-blocking if the GitLab integration token scope is known to be narrow, but worth a deliberate decision since it decouples authorization from the action target.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is a non-issue.

  1. This is an authenticated endpoint. You need the secret to make a request.
  2. A poisoned owner/repo can only access repos within the integration's scope.
  3. We pass the external_id to seer which I believe in Seer is the only way we interact with the SCM.

@billyvg
Copy link
Copy Markdown
Member

billyvg commented May 29, 2026

Oh I think one thing we're missing is the 👀 for code review, we should make a linear ticket for that

Comment thread src/sentry/seer/code_review/webhooks/merge_request.py
Comment thread src/sentry/seer/code_review/webhooks/merge_request.py
@cmanallen cmanallen changed the title ref(seer): Use scm event stream for pull-request events ref(seer): Add GitLab code-review web hooks May 29, 2026
Comment on lines +363 to +364
if is_closed:
validated = SeerCodeReviewTaskRequestForPrClosed.parse_obj(payload)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: GitLab MR close/merge events are silently dropped if the repository's integration_id is None in the database, due to a Pydantic validation failure.
Severity: HIGH

Suggested Fix

Make the integration_id field optional in the SeerCodeReviewRepoForPrClosed Pydantic model to align with the payload generation logic and prevent the validation error. Alternatively, add a pre-check to filter out repositories without an integration_id earlier, with proper metrics.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/seer/code_review/webhooks/merge_request.py#L363-L364

Potential issue: When a GitLab repository has a `NULL` `integration_id` in the database,
`pr-closed` events for that repository will be silently dropped. The payload generation
logic omits the `integration_id` if it's `None`, but the `SeerCodeReviewRepoForPrClosed`
Pydantic model requires it. This causes a `ValidationError` during parsing, which is
caught and logged as `INVALID_PAYLOAD`, preventing the event from being processed. This
is a functional regression for tracking GitLab MR close/merge events for affected
repositories.

Also affects:

  • src/sentry/seer/code_review/utils.py:398~401

Comment on lines +254 to +259
if action not in CLOSE_ACTIONS:
if (
object_attributes.get("draft") is True
or object_attributes.get("work_in_progress") is True
):
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: UPDATE events with new commits on draft GitLab MRs are silently dropped without recording record_webhook_filtered metrics, breaking the observability funnel.
Severity: MEDIUM

Suggested Fix

Before returning for a draft MR update, call record_webhook_filtered with an appropriate reason, such as DRAFT_MR. This will ensure that the dropped event is correctly tracked in metrics, providing visibility into the filtering logic.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/seer/code_review/webhooks/merge_request.py#L254-L259

Potential issue: When an `UPDATE` webhook event with new commits (`oldrev` is present)
is received for a draft GitLab merge request, the event is silently dropped. The code
checks if the MR is a draft and returns early without recording any metrics. This breaks
the observability funnel, as the `record_webhook_received` metric is fired but the
corresponding `record_webhook_filtered` metric is not. This makes it impossible to track
why these valid events are being ignored. This behavior is inconsistent with how other
filtered events are handled.

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ae51393. Configure here.

event=event,
organization=organization,
repo=repo,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review handler blocked by unrelated PR persistence failures

Medium Severity

self._handle() is placed after the PR persistence try/except block, so a KeyError on fields like merge_commit_sha, title, created_at, or source_branch — none of which the code review handler needs — prevents handle_merge_request_event from ever running. Similarly, raise Http404() on line 388 exits before _handle. The code review handler independently handles incomplete payloads gracefully (checks last_commit.id and returns), but is never given the chance to run when PR persistence encounters a missing key. This couples the code review path to the PR persistence path unnecessarily. For "close"/"merge" events, merge_commit_sha may legitimately be absent from older GitLab instances, silently blocking the pr-closed notification to Seer.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ae51393. Configure here.

@cmanallen cmanallen merged commit fd08d69 into master May 29, 2026
85 checks passed
@cmanallen cmanallen deleted the cmanallen/scm-pull-request-open-close-hooks branch May 29, 2026 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants