Skip to content

Conversation

@snigdhas
Copy link
Contributor

When applying query filters from Postgres, pre-filtering the results before passing the query to Snuba can break if it hits the max size that ClickHouse is able to process (sentry issue). This also manifests as a possible bug in the serializer, though that's more of a red herring since ClickHouse won't be able to handle the query either way.

To work around this, I'm applying the pattern used in the PostgresSnubaQueryExecutor to determine whether to pre- or post-filter the results based on the number of group_ids in the filter. If the number of groups in the pre-filter exceeds the max_candidates, we'll apply the filter after the Snuba query in a post-filtering step right before the pagination happens.

The tests confirm that the is:linked and is:unlinked queries work as a pre- and post-filter dependent on the value configured in snuba.search.max-pre-snuba-candidates.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 18, 2024
@snigdhas snigdhas changed the title feat(issue-search): Add max_candidates to push Postgres to post-filtering if needed feat(issue-search): Check max_candidates to push Postgres fields to post-filtering Jul 18, 2024
@codecov
Copy link

codecov bot commented Jul 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.18%. Comparing base (b403ecc) to head (684b430).
Report is 87 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #74531       +/-   ##
===========================================
+ Coverage   56.96%   78.18%   +21.22%     
===========================================
  Files        6459     6691      +232     
  Lines      285436   299601    +14165     
  Branches    49078    51568     +2490     
===========================================
+ Hits       162597   234258    +71661     
+ Misses     118412    59019    -59393     
- Partials     4427     6324     +1897     
Files Coverage Δ
src/sentry/search/snuba/executors.py 93.98% <100.00%> (+43.50%) ⬆️

... and 2117 files with indirect coverage changes

@snigdhas snigdhas force-pushed the snigdha/post-filtering branch from 311fdec to e0738cc Compare July 19, 2024 19:00
@snigdhas snigdhas marked this pull request as ready for review July 19, 2024 20:47
@snigdhas snigdhas requested a review from a team as a code owner July 19, 2024 20:47
Copy link
Contributor

@vartec vartec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I've added some suggestion to simplify the logic a bit.

having = []
# if we need to prefetch from postgres, we add filter by the group ids
if group_ids_to_pass_to_snuba is not None:
if group_ids_to_pass_to_snuba is not None and not too_many_candidates:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going back to the comment above, if you reset group_ids_to_pass_to_snuba to None when too_many_candidates, not need to change anything here

Comment on lines 1668 to 1674
if (
group_ids_to_pass_to_snuba is not None
and len(group_ids_to_pass_to_snuba) > max_candidates
):
metrics.incr("snuba.search.too_many_candidates", skip_internal=False)
too_many_candidates = True
group_ids_to_pass_to_snuba = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (
group_ids_to_pass_to_snuba is not None
and len(group_ids_to_pass_to_snuba) > max_candidates
):
metrics.incr("snuba.search.too_many_candidates", skip_internal=False)
too_many_candidates = True
group_ids_to_pass_to_snuba = []
if (too_many_candidates := (len(group_ids_to_pass_to_snuba) > max_candidates)):
metrics.incr("snuba.search.too_many_candidates", skip_internal=False)
group_ids_to_pass_to_snuba = None

You can keep this within same if that starts in line 1663, because that's the only way you'll have any candidates.

Reseting to None when there are too many candidates will simplify some checks below.

@snigdhas snigdhas force-pushed the snigdha/post-filtering branch from ed20a65 to 684b430 Compare July 22, 2024 18:01
@snigdhas snigdhas merged commit 9255afa into master Jul 22, 2024
@snigdhas snigdhas deleted the snigdha/post-filtering branch July 22, 2024 19:03
@github-actions github-actions bot locked and limited conversation to collaborators Aug 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants