Skip to content

feat(dynamic-sampling): Add per-org EAP transaction volume query#115161

Merged
constantinius merged 22 commits into
masterfrom
constantinius/feat/dynamic-sampling/add-per-org-get-eap-transaction-volumes-v2
May 28, 2026
Merged

feat(dynamic-sampling): Add per-org EAP transaction volume query#115161
constantinius merged 22 commits into
masterfrom
constantinius/feat/dynamic-sampling/add-per-org-get-eap-transaction-volumes-v2

Conversation

@constantinius
Copy link
Copy Markdown
Contributor

@constantinius constantinius commented May 8, 2026

Add get_eap_transaction_volumes() to retrieve per-project transaction volumes from EAP spans, with optional ordering (order_by_volume) and max_transactions limit. Uses the existing run_eap_spans_table_query_in_chunks() for batched iteration.

Replaces #115047 (which got corrupted during a rebase).

Closes https://linear.app/getsentry/issue/TET-2306/create-transaction-volume-query-for-eap

Add get_eap_transaction_volumes() to retrieve per-project transaction
volumes from EAP spans, with optional ordering and max_transactions limit.
Uses the existing run_eap_spans_table_query_in_chunks() for batched
iteration and adds a new Snuba referrer for the query.

Co-Authored-By: Claude Sonnet 4 <noreply@example.com>
@constantinius constantinius requested review from a team as code owners May 8, 2026 07:53
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 8, 2026
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 8, 2026

TET-2306

@constantinius constantinius requested a review from shellmayr May 8, 2026 08:00
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
constantinius and others added 2 commits May 8, 2026 10:12
Co-authored-by: Simon Hellmayr <shellmayr@users.noreply.github.com>
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
@constantinius constantinius requested a review from shellmayr May 12, 2026 10:04
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
…s.run_table_query and set default max_transactions
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
@constantinius constantinius requested a review from shellmayr May 21, 2026 12:54
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
Comment on lines +237 to +238
if (transaction := row.get(DynamicSamplingQueryFields.TRANSACTION)) is None:
continue
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this filter to the query directly? (not sure if this is exposed)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - added has:sentry.dsc.transaction to the query string so the filter happens server-side, and dropped the corresponding is None check in the result loop.

Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py Outdated
Comment thread tests/sentry/dynamic_sampling/per_org/tasks/test_queries.py
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py
Comment thread src/sentry/dynamic_sampling/per_org/tasks/queries.py
Copy link
Copy Markdown
Member

@shellmayr shellmayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍


if not more_results:
# either we run out of results or we hit the max results limit, in both cases we should stop
if not more_results or (max_results is not None and offset >= max_results):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused max_results parameter adds dead code to chunking function

Low Severity

The max_results parameter was added to run_eap_spans_table_query_in_chunks along with new branching logic (current_chunk_size variable, conditional chunk-size recalculation, extra termination condition), but get_eap_transaction_volumes calls Spans.run_table_query directly instead of using this helper. No caller in the codebase passes max_results, making this parameter and its associated logic untested dead code.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 21a0167. Configure here.

@constantinius constantinius enabled auto-merge (squash) May 28, 2026 07:34
Comment on lines +233 to +234
project_id = _get_aggregate_int(row, DynamicSamplingQueryFields.DSC_PROJECT_ID)
project_volumes = volumes_by_project[project_id]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The function get_eap_transaction_volumes lacks a guard for missing sentry.dsc.project_id fields, causing transaction data to be incorrectly aggregated under project_id = 0.
Severity: MEDIUM

Suggested Fix

Add a guard in get_eap_transaction_volumes to check if dsc_project_id is present in the row before processing it, similar to the logic in get_eap_project_volumes. If the ID is missing, the row should be skipped to prevent incorrect aggregation.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/dynamic_sampling/per_org/tasks/queries.py#L233-L234

Potential issue: The function `get_eap_transaction_volumes` processes query results to
aggregate transaction volumes. If a row from the query is missing the
`sentry.dsc.project_id` field, the helper `_get_aggregate_int` will default to returning
`0`. This leads to transaction data being incorrectly assigned to `project_id = 0`,
which is not a valid project ID, corrupting the final aggregation. A similar function,
`get_eap_project_volumes`, contains an explicit guard to skip rows with a missing
`dsc_project_id`, but this new function lacks that defensive check.

project_id = _get_aggregate_int(row, DynamicSamplingQueryFields.DSC_PROJECT_ID)
project_volumes = volumes_by_project[project_id]

project_volumes.transaction_counts.append((str(transaction), total))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: A missing or null sentry.dsc.transaction field is converted to the string "None" and stored as a transaction name, leading to incorrect data.
Severity: MEDIUM

Suggested Fix

Before converting the transaction variable to a string, add a check to ensure it is not None. If transaction is None, the row should be skipped to avoid storing "None" as a transaction name. This adds a defensive guard that is missing.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/dynamic_sampling/per_org/tasks/queries.py#L236

Potential issue: In `get_eap_transaction_volumes`, if a row is returned from the query
where the `sentry.dsc.transaction` field is missing or null, `row.get(...)` will return
`None`. The code then calls `str(transaction)`, which converts `None` into the literal
string `"None"`. This string is then stored as a valid transaction name, leading to
incorrect data. While a `has:sentry.dsc.transaction` filter in the query is intended to
prevent this, the code lacks a defensive check to handle cases where a null value might
still be returned, a pattern that is present in similar functions.

Comment on lines +120 to +121
if not get_eap_transaction_volumes(config):
return DynamicSamplingStatus.NO_TRANSACTION_VOLUMES
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The scheduler prematurely stops for organizations with no transaction volume, incorrectly marking the task as failed and preventing dynamic sampling from running.
Severity: HIGH

Suggested Fix

The early return based on an empty result from get_eap_transaction_volumes should be removed or made conditional. If this data is not yet used, the check is premature. If it is required for specific configurations, it should be guarded by a relevant config flag, similar to how get_eap_project_volumes is handled.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/dynamic_sampling/per_org/tasks/scheduler.py#L120-L121

Potential issue: The scheduler task `run_calculations_per_org_task` unconditionally
calls `get_eap_transaction_volumes`. If this function returns an empty list (e.g., for
an organization with no recent transactions), the scheduler immediately returns
`DynamicSamplingStatus.NO_TRANSACTION_VOLUMES` and stops processing. This incorrectly
prevents the dynamic sampling logic from completing successfully for valid organizations
that simply have no transaction volume in the query window. Unlike the conditional check
for project volumes, this check is always active, blocking the success path (`return
None`) for any organization without transactions.

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7eb1d00. Configure here.

Comment thread src/sentry/dynamic_sampling/per_org/tasks/scheduler.py
@constantinius constantinius merged commit 2cab4fe into master May 28, 2026
84 checks passed
@constantinius constantinius deleted the constantinius/feat/dynamic-sampling/add-per-org-get-eap-transaction-volumes-v2 branch May 28, 2026 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants