Skip to content

Conversation

@shashjar
Copy link
Member

@shashjar shashjar commented Jan 5, 2026

Adds EAP double-reading support for the query_groups_past_counts function used in escalating issue detection.

Changes:

  • Implemented _query_groups_past_counts_eap that queries hourly event counts grouped by project_id and group_id
  • Updated query_groups_past_counts to use the double-read pattern with EAPOccurrencesComparator

Ran some test queries to compare results on local devserver:

============================================================
EAP Query Groups Past Counts - Manual Test
============================================================

============================================================
Testing organization: Sentry (id=1)
============================================================

Found 19 groups to test
Group IDs: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10]...
Error groups: 19, Other escalatable groups: 19

--- Testing Snuba implementation ---
Snuba returned 4 results
Sample result: {'project_id': 3, 'group_id': 8, 'hourBucket': '2026-01-05T23:00:00+00:00', 'count()': 11}

--- Testing EAP implementation ---
23:53:57 [INFO] sentry.utils.snuba_rpc: Running a EndpointTimeSeries RPC query (rpc_query={'meta': {'organizationId': '1', 'referrer': 'sentry.issues.escalating', 'projectIds': ['1', '2', '3'], 'startTimestamp': '2025-12-29T23:00:00Z', 'endTimestamp': '2026-01-06T00:00:00Z', 'traceItemType': 'TRACE_ITEM_TYPE_OCCURRENCE', 'downsampledStorageConfig': {'mode': 'MODE_NORMAL'}}, 'filter': {'andFilter': {'filters': [{'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '19'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '18'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '17'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '12'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '11'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '10'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '9'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '8'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '16'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '15'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '14'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '13'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '2'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '7'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '6'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '5'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '4'}}}, {'orFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '3'}}}, {'comparisonFilter': {'key': {'type': 'TYPE_INT', 'name': 'group_id'}, 'op': 'OP_EQUALS', 'value': {'valInt': '1'}}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}]}}, {'andFilter': {'filters': [{'comparisonFilter': {'key': {'type': 'TYPE_DOUBLE', 'name': 'sentry.timestamp'}, 'op': 'OP_GREATER_THAN_OR_EQUALS', 'value': {'valInt': '1767052437'}}}, {'comparisonFilter': {'key': {'type': 'TYPE_DOUBLE', 'name': 'sentry.timestamp'}, 'op': 'OP_LESS_THAN', 'value': {'valInt': '1767657237'}}}]}}]}}, 'granularitySecs': '3600', 'groupBy': [{'type': 'TYPE_INT', 'name': 'sentry.project_id'}, {'type': 'TYPE_INT', 'name': 'group_id'}], 'expressions': [{'aggregation': {'aggregate': 'FUNCTION_COUNT', 'key': {'type': 'TYPE_INT', 'name': 'sentry.project_id'}, 'label': 'count()', 'extrapolationMode': 'EXTRAPOLATION_MODE_SAMPLE_WEIGHTED'}, 'label': 'count()'}]} referrer='sentry.issues.escalating' organization_id=1 trace_item_type=7)
23:53:57 [INFO] sentry.utils.snuba_rpc: Timeseries RPC query response (rpc_rows=169 organization_id=1 meta=request_id: "e080d91837454cce94a73d6b9934fe2b"
query_info {
  stats {
    progress_bytes: 1488
  }
  metadata {
  }
}
downsampled_storage_meta {
}
)
EAP returned 4 results
Sample result: {'project_id': 3, 'group_id': 8, 'hourBucket': '2026-01-05T23:00:00+00:00', 'count()': 11}

--- Comparison ---
Total Snuba results: 4
Total EAP results: 4
Matching entries: 4
Mismatching entries: 0
Only in Snuba: 0
Only in EAP: 0

--- Ordering Check ---
✅ Results are in the same order (exact match)

--- Summary ---
✅ SUCCESS: Results match perfectly (data and ordering)!

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 5, 2026
@shashjar shashjar requested review from a team and thetruecpaul January 5, 2026 23:09
@shashjar shashjar marked this pull request as ready for review January 5, 2026 23:10
@shashjar shashjar requested a review from a team as a code owner January 5, 2026 23:10
@shashjar shashjar removed the request for review from a team January 5, 2026 23:10
@shashjar
Copy link
Member Author

shashjar commented Jan 5, 2026

bugbot run

Copy link
Member

@wedamija wedamija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter much that we're adding a couple more queries here and slowing down processing? If so, it could be worth delegating the extra queries to a separate thread

Comment on lines +100 to +105
results = EAPOccurrencesComparator.check_and_choose(
snuba_results,
eap_results,
"issues.escalating.query_groups_past_counts",
is_experimental_data_a_null_result=len(eap_results) == 0,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to include a reasonable match comparator or something along those lines. I think that we're likely to have small differences between the two systems just due to the fact that we don't ingest data at the same rate. It'd be good to have some kind of similarity % to compare even better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't include a reasonable match comparator in this case (since it's a little more involved), but we have that on some of the other read paths we've migrated so far. Right now we're only issuing reads in S4S (where we've been writing to EAP for almost 3 months at this point, so we should be almost at the point where the two data stores are in parity for groups in retention).

Re increased latency by adding more queries: yep this is on my todo list. We've only migrated a handful of read paths over to double reads so far, but as we do more we'll just generally be adding latency across the app which is not ideal. I'm thinking that the best solution here is to switch to a sampling approach (where we double read only a portion of the time) rather than double reading on every query, so that we'll still get representative metrics for comparison but reduce the latency impact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the sampling approach sounds decent. It's also not too hard to delegate to threads if you want to go down that path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of data accuracy - historically it'll probably match up, but recently ingested data can be out of sync since there's no guarantee that writes to EAP are synced up with writes to snuba. It'll be a small discrepancy, but might be worth taking into account if you're seeing some queries not match up. Something you can handle later on, anyway

Base automatically changed from implement-timeseries-query-logic-for-eap-occurrences to master January 10, 2026 00:08
Copy link
Contributor

@thetruecpaul thetruecpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, one Q

Comment on lines +147 to +148
all_results += _query_groups_eap_by_org(error_groups)
all_results += _query_groups_eap_by_org(other_groups)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q here: why not make a single call with error_groups + other_groups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was only to keep parity with the Snuba implementation, since that query orders the results such that error groups come first and then issue platform groups. Of course we don't have the same distinction in EAP, so we could definitely combine the queries, but then there's no guarantee that we get results in the same order as Snuba and therefore have exact_match on the comparator be informative.

Maybe we could keep this double query for the rollout to make sure we maintain parity and I can make a ticket to eventually condense this into one query?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds good! Definitely not a blocker.

@shashjar shashjar merged commit 9e24416 into master Jan 13, 2026
66 checks passed
@shashjar shashjar deleted the double-read-query-groups-past-counts-eap branch January 13, 2026 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants