fix(code-review): Add cache to dedupe github webhook events #107734

Open
suejung-sentry wants to merge 3 commits into master from sshin/dedupe-gh-webhooks

Contributor

@suejung-sentry suejung-sentry commented Feb 5, 2026

This PR deduplicates webhook deliveries by introducing Redis idempotency keys based on the GitHub webhook id.

GitHub guarantees "at-least-once" delivery, so it may send duplicate webhooks. Anecdotally, we have seen Seer receive multiple requests for a single commit (from a pull_request.synchronize event) within 500 milliseconds of each other (redash).

It's unclear whether GitHub is delivering the webhook twice or whether something in our control-to-regional forwarding queues is causing redelivery. Either way, it seems likely that the same payload is being processed with the same GitHub webhook id, so this PR uses that id as the idempotency key.

I considered using a lock instead. The downside is that the lock would be released as soon as the function returns, which may happen sooner than the 500 milliseconds within which we currently see dupes.

So instead, in this PR, a webhook whose id matches one already delivered within the previous 20 seconds is not replayed or re-forwarded to Seer.

I chose a 20-second TTL to cover the 500-millisecond period and any errant GitHub retry and backoff behavior.

Redis should be able to handle this load: one SET per GitHub webhook that makes it past our "preflight" feature-enablement filters. A peak hour sees around 2,000 code reviews, which is roughly 0.6 requests per second, so call it 1 request per second. The keys are also small and have a short TTL.
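For reviewers who want the idea in isolation, here is a minimal sketch of the "SET if not exists, with a TTL" check described above. It is not the code in this PR: `is_first_delivery`, `InMemoryCluster`, and the key prefix value are hypothetical names made up for illustration. The only real API assumed is redis-py's `set(name, value, ex=..., nx=...)`, which returns a truthy value when the key is written and `None` when `nx=True` and the key already exists.

```python
import time
from typing import Callable, Dict, Optional

WEBHOOK_SEEN_TTL_SECONDS = 20  # value taken from this PR
# Hypothetical prefix for illustration; the PR's real prefix value is not shown here.
WEBHOOK_SEEN_KEY_PREFIX = "github-webhook-seen:"


class InMemoryCluster:
    """Stand-in that mimics only `set(name, value, ex=..., nx=...)`.

    It copies the redis-py contract: True when the key is written,
    None when nx=True and a live key already exists. The stored value
    is ignored because only key presence matters for deduplication.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def set(
        self, name: str, value: str, ex: Optional[int] = None, nx: bool = False
    ) -> Optional[bool]:
        now = self._clock()
        expires_at = self._expires_at.get(name)
        still_live = expires_at is not None and expires_at > now
        if nx and still_live:
            return None  # key already present, so SET NX does nothing
        self._expires_at[name] = (now + ex) if ex is not None else float("inf")
        return True


def is_first_delivery(cluster, delivery_id: str) -> bool:
    """Return True if this delivery id has not been seen within the TTL."""
    key = f"{WEBHOOK_SEEN_KEY_PREFIX}{delivery_id}"
    # One atomic call both checks for and records the delivery id.
    return bool(cluster.set(key, "1", ex=WEBHOOK_SEEN_TTL_SECONDS, nx=True))
```

Because the check and the write happen in a single SET, two near-simultaneous deliveries cannot both see "not seen yet", and the key outlives the handler call, which is the property the lock approach lacks.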

Closes CW-673

@github-actions github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Feb 5, 2026
@suejung-sentry suejung-sentry changed the title from "fix(code-review): Add lock to dedupe github webhook events" to "fix(code-review): Add cache to dedupe github webhook events" Feb 5, 2026


def _get_webhook_seen_cluster() -> RedisCluster[str] | StrictRedis[str]:
    return redis_clusters.get("default")
Contributor Author

I used a pattern similar to the one used here.

Member

Thank you for adding references to the patterns you use 👍🏻


logger = logging.getLogger(__name__)

WEBHOOK_SEEN_TTL_SECONDS = 20
Contributor Author

I picked 20 seconds because, per the redash, when a dupe does occur, the two requests arrive within 500 milliseconds of each other. I thought 20 seconds would comfortably cover that plus any errant GitHub redelivery behavior.

Member

Thank you for the source and speaking out your reasoning. Makes sense 👍🏻

linear bot commented Feb 6, 2026

@suejung-sentry suejung-sentry marked this pull request as ready for review February 6, 2026 03:07
@suejung-sentry suejung-sentry requested a review from a team as a code owner February 6, 2026 03:07
Contributor

@vaind vaind left a comment

While I'm not familiar with this part of the codebase, this looks reasonable to me and saves quite a few runs 👍 . Would be good to get another set of eyes on this though.

Comment on lines +1125 to +1127
if github_delivery_id is not None:
    github_delivery_id = str(github_delivery_id)
sentry_sdk.set_extra("github_delivery_id", github_delivery_id)
Contributor

Does it make sense to add to context if it's None?

Suggested change
- if github_delivery_id is not None:
-     github_delivery_id = str(github_delivery_id)
- sentry_sdk.set_extra("github_delivery_id", github_delivery_id)
+ if github_delivery_id is not None:
+     github_delivery_id = str(github_delivery_id)
+     sentry_sdk.set_extra("github_delivery_id", github_delivery_id)

Member

You could still have the filter in Sentry if you want to find events without it.

For the record, this header will never be missing (unless GitHub introduces a bug in their code).

Comment on lines +134 to +138
def test_same_delivery_id_second_seen_skipped(self) -> None:
    """
    Two deliveries with the same id, one after the other: the first marks seen and runs;
    the second is already seen (key exists), so only one handler run.
    """
Contributor

Do we need both this test and test_webhook_already_seen_handler_not_invoked?


def __call__(self, event: Mapping[str, Any], **kwargs: Any) -> None:
    github_event = kwargs["github_event"]
    github_delivery_id = kwargs.get("github_delivery_id")
Member

As reference:

X-GitHub-Delivery: A globally unique identifier (GUID) to identify the event.

delivery_id = f"already-seen-{uuid4()}"
cluster = redis_clusters.get("default")
seen_key = f"{WEBHOOK_SEEN_KEY_PREFIX}{delivery_id}"
cluster.set(seen_key, "1", ex=WEBHOOK_SEEN_TTL_SECONDS, nx=True)
Member

A more realistic test would be to simply call handle_webhook_event twice.
That way, if the code managing the cluster changes, the test would still exercise the logic within handle_webhook_event.

Perhaps delete this test and use test_same_delivery_id_second_seen_skipped, since it does what I describe.
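To make the suggestion concrete, here is a rough sketch of a test that goes through the entry point twice instead of pre-seeding the key. The real signature of handle_webhook_event is not shown in this thread, so the version below is a hypothetical stand-in, as are `FakeCluster`, `deliver_twice`, and the key prefix; only the shape of the test is the point.

```python
from typing import Callable, Dict, List, Optional

WEBHOOK_SEEN_TTL_SECONDS = 20  # value taken from this PR


class FakeCluster:
    """In-memory stand-in for the one Redis call used here (expiry not modelled)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def set(
        self, name: str, value: str, ex: Optional[int] = None, nx: bool = False
    ) -> Optional[bool]:
        if nx and name in self._store:
            return None  # mirrors redis-py: SET NX on an existing key returns None
        self._store[name] = value
        return True


def handle_webhook_event(
    cluster: FakeCluster, delivery_id: str, handler: Callable[[], None]
) -> None:
    """Hypothetical stand-in for the real entry point: run handler once per id."""
    # "webhook-seen:" is an illustrative prefix, not the PR's actual one.
    key = f"webhook-seen:{delivery_id}"
    if cluster.set(key, "1", ex=WEBHOOK_SEEN_TTL_SECONDS, nx=True):
        handler()


def deliver_twice(delivery_id: str) -> int:
    """Send the same delivery id through the entry point twice; count handler runs."""
    cluster = FakeCluster()
    runs: List[str] = []
    for _ in range(2):
        handle_webhook_event(cluster, delivery_id, lambda: runs.append(delivery_id))
    return len(runs)


def test_same_delivery_id_second_seen_skipped() -> None:
    assert deliver_twice("delivery-123") == 1
```

Since the test never touches the key or the cluster directly, it keeps passing if the storage details behind the entry point change.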
