perf(issues): improve adjacent_events query #79365
    Condition(
        Column(DATASETS[dataset][Columns.TIMESTAMP.value.alias]),
        Op.LT,
        event.datetime + timedelta(days=100),
Why 100 days?
sentry/src/sentry/eventstore/snuba/backend.py, line 577 in 55415ea:
    prev_filter.start = event.datetime - timedelta(days=100)
OK carried forward 👍🏻
Right now the `adjacent_events` query does a very large scan to find the events adjacent to a given event ID. It does this because the old query does not specify an ORDER BY that can take advantage of the primary key, `ORDER BY (project_id, toStartOfDay(timestamp), primary_hash, cityHash64(event_id))`.

This PR adds a new function, `get_adjacent_event_ids_snql`, which uses SnQL and adds an ORDER BY clause. In the Snuba admin tool, we observe that the new query takes only 2 seconds and scans hundreds of MB of data, whereas the old query scans 10+ GB on large issues.

`get_adjacent_event_ids_snql` is not general purpose, as it is only used in one place.

See the ticket below for more info:
Fixes getsentry/team-issues#42
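To illustrate why an ordered lookup is cheap, here is a minimal in-memory sketch of the same idea. It is not the SnQL query from this PR and does not touch Snuba or ClickHouse; the function name `adjacent_event_ids`, the `(timestamp, event_id)` sort key, and the half-open window bounds are assumptions made for the example. The 100-day window mirrors the `timedelta(days=100)` bound discussed in the review thread.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Assumed to match the 100-day bound carried forward in the review thread.
WINDOW = timedelta(days=100)


def adjacent_event_ids(events, current):
    """Return (previous_id, next_id) for `current`.

    `events` is a list of (timestamp, event_id) tuples sorted ascending,
    standing in for rows that the storage engine already keeps in order.
    `current` is the (timestamp, event_id) tuple of the event being viewed.
    """
    ts = current[0]
    # Restrict to [ts - WINDOW, ts + WINDOW). A 1-tuple sorts before any
    # 2-tuple with the same timestamp, so bisect_left finds the first row
    # at or after each bound.
    lo = bisect_left(events, (ts - WINDOW,))
    hi = bisect_left(events, (ts + WINDOW,))
    window = events[lo:hi]

    # Because the rows are sorted, each neighbor is one binary search away,
    # which is the in-memory analogue of "ORDER BY ... LIMIT 1".
    i = bisect_left(window, current)
    prev_id = window[i - 1][1] if i > 0 else None
    j = bisect_right(window, current)
    next_id = window[j][1] if j < len(window) else None
    return prev_id, next_id
```

Without a sort order, finding each neighbor would mean examining every row in the window; with one, the lookup can stop at the first matching row on each side.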
Fixes the adjacent-event queries introduced in #79365; the conditions needed a small adjustment. Adds a new test to ensure correct behavior.