feat(dynamic-sampling): Add per-org EAP transaction volume query #115161
Add get_eap_transaction_volumes() to retrieve per-project transaction volumes from EAP spans, with optional ordering and max_transactions limit. Uses the existing run_eap_spans_table_query_in_chunks() for batched iteration and adds a new Snuba referrer for the query.
Co-Authored-By: Claude Sonnet 4 <noreply@example.com>
Co-authored-by: Simon Hellmayr <shellmayr@users.noreply.github.com>
…r-org-get-eap-transaction-volumes-v2
…ually select the root transaction
…r-org-get-eap-transaction-volumes-v2
…s.run_table_query and set default max_transactions
…r-org-get-eap-transaction-volumes-v2
…r-org-get-eap-transaction-volumes-v2
…r-org-get-eap-transaction-volumes-v2
| if (transaction := row.get(DynamicSamplingQueryFields.TRANSACTION)) is None: | ||
| continue |
Can we add this filter to the query directly? (not sure if this is exposed)
Good call - added `has:sentry.dsc.transaction` to the query string so the filter happens server-side, and dropped the corresponding `is None` check in the result loop.
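As a rough illustration of the server-side filter described in this reply, the sketch below appends the presence filter to a query string. Only the filter text `has:sentry.dsc.transaction` comes from the thread; `build_query_string` and its `base_query` parameter are hypothetical names and not the PR's actual code.

```python
# Filter text taken from the review thread; everything else is illustrative.
TRANSACTION_FILTER = "has:sentry.dsc.transaction"


def build_query_string(base_query: str = "") -> str:
    """Append the presence filter so rows without a transaction are dropped server-side."""
    # Skip empty fragments so we never emit leading or doubled spaces.
    parts = [part for part in (base_query.strip(), TRANSACTION_FILTER) if part]
    return " ".join(parts)
```

With the filter applied in the query, the result loop no longer needs to be the only line of defense against rows that lack a transaction name.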
…e transaction filter and adjust test cases
…on_volumes function
| if not more_results: | ||
| # either we run out of results or we hit the max results limit, in both cases we should stop | ||
| if not more_results or (max_results is not None and offset >= max_results): |
Unused max_results parameter adds dead code to chunking function
Low Severity
The `max_results` parameter was added to `run_eap_spans_table_query_in_chunks` along with new branching logic (a `current_chunk_size` variable, conditional chunk-size recalculation, and an extra termination condition), but `get_eap_transaction_volumes` calls `Spans.run_table_query` directly instead of using this helper. No caller in the codebase passes `max_results`, making this parameter and its associated logic untested dead code.
Reviewed by Cursor Bugbot for commit 21a0167.
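For readers following this thread, the capped-chunking behavior under discussion (a per-iteration chunk size that shrinks near the cap, plus an extra stop condition) could look roughly like the sketch below. `iter_rows_in_chunks` and `fetch_page` are hypothetical stand-ins and do not reproduce the signature or body of `run_eap_spans_table_query_in_chunks`.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]


def iter_rows_in_chunks(
    fetch_page: Callable[[int, int], List[Row]],
    chunk_size: int = 1000,
    max_results: Optional[int] = None,
) -> Iterator[Row]:
    """Yield rows page by page, stopping early once max_results rows were yielded.

    fetch_page(offset, limit) is a hypothetical stand-in for the real query call.
    """
    offset = 0
    while True:
        # Shrink the last chunk so we never request rows beyond the cap.
        current_chunk_size = chunk_size
        if max_results is not None:
            current_chunk_size = min(chunk_size, max_results - offset)
            if current_chunk_size <= 0:
                return

        rows = fetch_page(offset, current_chunk_size)
        yield from rows
        offset += len(rows)

        # A short page means the source has no more rows.
        if len(rows) < current_chunk_size:
            return
```

If no caller ever passes `max_results`, the branch that recomputes `current_chunk_size` is never exercised, which is the dead-code concern raised above.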
| project_id = _get_aggregate_int(row, DynamicSamplingQueryFields.DSC_PROJECT_ID) | ||
| project_volumes = volumes_by_project[project_id] |
Bug: The function `get_eap_transaction_volumes` lacks a guard for missing `sentry.dsc.project_id` fields, causing transaction data to be incorrectly aggregated under `project_id = 0`.
Severity: MEDIUM
Suggested Fix
Add a guard in `get_eap_transaction_volumes` to check if `dsc_project_id` is present in the row before processing it, similar to the logic in `get_eap_project_volumes`. If the ID is missing, the row should be skipped to prevent incorrect aggregation.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/dynamic_sampling/per_org/tasks/queries.py#L233-L234
Potential issue: The function `get_eap_transaction_volumes` processes query results to
aggregate transaction volumes. If a row from the query is missing the
`sentry.dsc.project_id` field, the helper `_get_aggregate_int` will default to returning
`0`. This leads to transaction data being incorrectly assigned to `project_id = 0`,
which is not a valid project ID, corrupting the final aggregation. A similar function,
`get_eap_project_volumes`, contains an explicit guard to skip rows with a missing
`dsc_project_id`, but this new function lacks that defensive check.
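A minimal sketch of the suggested guard, assuming rows arrive as plain dictionaries. The column keys and `group_counts_by_project` are hypothetical; the real field names live in `DynamicSamplingQueryFields`, and the real code uses `_get_aggregate_int`, which is not reproduced here.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical column keys, used only for this illustration.
PROJECT_ID_FIELD = "sentry.dsc.project_id"
COUNT_FIELD = "count()"


def group_counts_by_project(rows: List[Dict[str, Any]]) -> Dict[int, int]:
    """Sum counts per project, skipping rows that carry no project ID."""
    totals: Dict[int, int] = defaultdict(int)
    for row in rows:
        raw_project_id = row.get(PROJECT_ID_FIELD)
        if raw_project_id is None:
            # Without this guard, a default of 0 would create a bogus project 0 bucket.
            continue
        totals[int(raw_project_id)] += int(row.get(COUNT_FIELD) or 0)
    return dict(totals)
```

Skipping the row keeps the aggregation free of an invalid `project_id = 0` key, at the cost of silently dropping malformed rows.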
| project_id = _get_aggregate_int(row, DynamicSamplingQueryFields.DSC_PROJECT_ID) | ||
| project_volumes = volumes_by_project[project_id] | ||
| project_volumes.transaction_counts.append((str(transaction), total)) |
Bug: A missing or null `sentry.dsc.transaction` field is converted to the string `"None"` and stored as a transaction name, leading to incorrect data.
Severity: MEDIUM
Suggested Fix
Before converting the `transaction` variable to a string, add a check to ensure it is not `None`. If `transaction` is `None`, the row should be skipped to avoid storing `"None"` as a transaction name. This adds a defensive guard that is missing.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/dynamic_sampling/per_org/tasks/queries.py#L236
Potential issue: In `get_eap_transaction_volumes`, if a row is returned from the query
where the `sentry.dsc.transaction` field is missing or null, `row.get(...)` will return
`None`. The code then calls `str(transaction)`, which converts `None` into the literal
string `"None"`. This string is then stored as a valid transaction name, leading to
incorrect data. While a `has:sentry.dsc.transaction` filter in the query is intended to
prevent this, the code lacks a defensive check to handle cases where a null value might
still be returned, a pattern that is present in similar functions.
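The failure mode described here comes from `str(None)` evaluating to the literal string `"None"`. A minimal sketch of the defensive check follows, assuming dictionary rows; the column keys and `collect_transaction_counts` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical column keys, used only for this illustration.
TRANSACTION_FIELD = "sentry.dsc.transaction"
COUNT_FIELD = "count()"


def collect_transaction_counts(rows: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Return (transaction name, count) pairs, skipping rows with no transaction."""
    counts: List[Tuple[str, int]] = []
    for row in rows:
        transaction = row.get(TRANSACTION_FIELD)
        if transaction is None:
            # str(None) would be stored as the name "None", so skip the row instead.
            continue
        counts.append((str(transaction), int(row.get(COUNT_FIELD) or 0)))
    return counts
```

Whether the guard is needed in addition to the `has:sentry.dsc.transaction` filter depends on whether the query can still return null values, which this thread does not settle.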
| if not get_eap_transaction_volumes(config): | ||
| return DynamicSamplingStatus.NO_TRANSACTION_VOLUMES |
Bug: The scheduler prematurely stops for organizations with no transaction volume, incorrectly marking the task as failed and preventing dynamic sampling from running.
Severity: HIGH
Suggested Fix
The early return based on an empty result from `get_eap_transaction_volumes` should be removed or made conditional. If this data is not yet used, the check is premature. If it is required for specific configurations, it should be guarded by a relevant config flag, similar to how `get_eap_project_volumes` is handled.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/dynamic_sampling/per_org/tasks/scheduler.py#L120-L121
Potential issue: The scheduler task `run_calculations_per_org_task` unconditionally
calls `get_eap_transaction_volumes`. If this function returns an empty list (e.g., for
an organization with no recent transactions), the scheduler immediately returns
`DynamicSamplingStatus.NO_TRANSACTION_VOLUMES` and stops processing. This incorrectly
prevents the dynamic sampling logic from completing successfully for valid organizations
that simply have no transaction volume in the query window. Unlike the conditional check
for project volumes, this check is always active, blocking the success path (`return
None`) for any organization without transactions.
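One way to make the early return conditional, as the suggested fix proposes, is sketched below. Everything here is hypothetical: `Status`, `Config`, the `require_transaction_volumes` flag, and `run_calculations` are stand-ins invented for illustration, not the scheduler's real types or options.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    # Stand-in for DynamicSamplingStatus.NO_TRANSACTION_VOLUMES.
    NO_TRANSACTION_VOLUMES = "no_transaction_volumes"


@dataclass
class Config:
    # Hypothetical flag: only treat empty volumes as a stop condition when set.
    require_transaction_volumes: bool = False


def run_calculations(
    config: Config,
    get_volumes: Callable[[Config], List[object]],
) -> Optional[Status]:
    """Return a status on early exit, or None on the success path."""
    # Short-circuit: the volume query only runs when the flag is enabled.
    if config.require_transaction_volumes and not get_volumes(config):
        return Status.NO_TRANSACTION_VOLUMES
    return None
```

With the flag off, an organization without transactions in the query window still reaches the success path (`None`) instead of being stopped.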
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 7eb1d00.


Add `get_eap_transaction_volumes()` to retrieve per-project transaction volumes from EAP spans, with optional ordering (`order_by_volume`) and `max_transactions` limit. Uses the existing `run_eap_spans_table_query_in_chunks()` for batched iteration.

Replaces #115047 (which got corrupted during a rebase).
Closes https://linear.app/getsentry/issue/TET-2306/create-transaction-volume-query-for-eap