feat(dynamic-sampling): add recalibration logic to the new per-org pipeline #115786

Open
shellmayr wants to merge 9 commits into master from shellmayr/feat/dynamic-sampling-per-org-recalibration

Conversation

@shellmayr
Member

  • Add the query needed for recalibration (total & indexed org volume over the last 5 minutes) on EAP
  • Calculate the recalibration factor and log the old and new pipeline results (on cache retrieval)
  • Add a property to the configs that indicates whether recalibration should be applied (project mode does not have this feature)
  • Add a logging system so we can tune both the sample rates & which logs we want to see using options

Closes TET-2233

@linear-code

linear-code Bot commented May 19, 2026

TET-2233

@github-actions github-actions Bot added the Scope: Backend label May 19, 2026
@shellmayr shellmayr marked this pull request as ready for review May 19, 2026 13:37
@shellmayr shellmayr requested review from a team as code owners May 19, 2026 13:37
Comment thread src/sentry/dynamic_sampling/per_org/tasks/calculations.py Outdated
@shellmayr shellmayr requested a review from constantinius May 20, 2026 13:12
Comment on lines +66 to +69
    @property
    def needs_recalibration(self) -> bool:
        return False

Contributor

We could just use needs_recalibration: bool = False if this never depends on dynamic data
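
A minimal sketch of that suggestion, assuming the configs are plain Python classes. The class names `BaseSamplingConfig`, `ProjectModeConfig`, and `OrgModeConfig` are hypothetical and are not taken from the PR.

```python
class BaseSamplingConfig:
    # Class-level default: no recalibration unless a subclass opts in.
    needs_recalibration: bool = False


class ProjectModeConfig(BaseSamplingConfig):
    # Hypothetical project-mode config: inherits the default of False.
    pass


class OrgModeConfig(BaseSamplingConfig):
    # Hypothetical org-mode config: opts in by overriding the attribute.
    needs_recalibration: bool = True
```

If the config classes turn out to be dataclasses, an annotated attribute like this becomes an init field; annotating it as `ClassVar[bool]` keeps it a class-level constant.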

Comment on lines +11 to +20
def should_log(location: str) -> bool:
    if location not in (options.get(LOGGING_LOCATIONS_OPTION) or []):
        return False

    sample_rate = float(options.get(LOGGING_SAMPLE_RATE_OPTION))
    if sample_rate <= 0.0:
        return False
    if sample_rate >= 1.0:
        return True
    return random.random() < sample_rate
Contributor

I think this is a bit convoluted. We could separate our concerns better by using a logging filter, for example by reusing or adapting the SamplingFilter.
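
One way the filter-based approach could look, as a sketch built only on the standard library. `SampledLocationFilter` is a hypothetical class and not the existing SamplingFilter; the two getter callables stand in for the `options.get(...)` lookups in `should_log`, and the `location` record attribute is an assumed convention set through `extra=`.

```python
import logging
import random


class SampledLocationFilter(logging.Filter):
    """Hypothetical filter: pass records for enabled locations at a sample rate."""

    def __init__(self, get_locations, get_sample_rate, rng=random.random):
        super().__init__()
        # Callables so that option changes are picked up on every record.
        self._get_locations = get_locations
        self._get_sample_rate = get_sample_rate
        self._rng = rng

    def filter(self, record: logging.LogRecord) -> bool:
        # "location" is expected to be set by the caller via extra={...}.
        location = getattr(record, "location", None)
        if location not in (self._get_locations() or []):
            return False
        rate = float(self._get_sample_rate())
        if rate <= 0.0:
            return False
        if rate >= 1.0:
            return True
        return self._rng() < rate
```

Call sites would then log unconditionally, for example `logger.info("recalibration", extra={"location": "recalibration"})`, and the decision to emit would live in the filter attached with `logger.addFilter(...)`.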

    if org_volume_1_hour is None:
        return DynamicSamplingStatus.NO_VOLUME

    calculate_recalibration_factor(config, org_volume_5_minutes)
Contributor

Bug: The result of calculate_recalibration_factor is discarded, so the computed recalibration factor is never stored in the cache, rendering the feature ineffective.
Severity: HIGH

Suggested Fix

Store the return value of calculate_recalibration_factor in the cache. This will likely require adding a cache-writing function to the new cache.py module and then calling it from scheduler.py with the computed factor.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/dynamic_sampling/per_org/tasks/scheduler.py#L122

Potential issue: The function `calculate_recalibration_factor` is called, but its return
value is discarded and never stored in the cache. Consequently, any subsequent attempt
to retrieve the factor, such as with
`per_org_recalibration_cache.get_adjusted_factor()`, will always receive the default
value of `1.0`. This renders the per-organization recalibration feature ineffective and
makes the associated diagnostic logging meaningless, as it will always compare the old
factor to a 'new' factor of `1.0`.
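
A rough sketch of the suggested fix, under stated assumptions: `RecalibrationCache`, `set_adjusted_factor`, and `recalibrate_org` are hypothetical names, the cache is a plain in-memory dict rather than the real cache backend, and the `get_adjusted_factor` signature here is assumed. Only the default of `1.0` comes from the report above.

```python
from typing import Callable, Optional


class RecalibrationCache:
    """Hypothetical in-memory stand-in for the per-org recalibration cache."""

    DEFAULT_FACTOR = 1.0

    def __init__(self) -> None:
        self._factors: dict[int, float] = {}

    def set_adjusted_factor(self, org_id: int, factor: float) -> None:
        self._factors[org_id] = factor

    def get_adjusted_factor(self, org_id: int) -> float:
        # Falls back to the default when nothing was ever stored.
        return self._factors.get(org_id, self.DEFAULT_FACTOR)


def recalibrate_org(
    org_id: int,
    compute_factor: Callable[[], Optional[float]],
    cache: RecalibrationCache,
) -> Optional[float]:
    # Keep the return value instead of discarding it, then persist it.
    factor = compute_factor()
    if factor is not None:
        cache.set_adjusted_factor(org_id, factor)
    return factor
```

With the write in place, a later read returns the computed factor rather than the default, so the old-versus-new comparison in the diagnostic logs becomes meaningful.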

Comment thread src/sentry/dynamic_sampling/per_org/tasks/calculations.py Outdated
Comment thread src/sentry/dynamic_sampling/per_org/scheduler.py
Comment thread src/sentry/dynamic_sampling/per_org/calculations.py
Comment on lines +43 to +44
    if org_volume.indexed is None or org_volume.total == 0:
        raise DynamicSamplingInvalidOrgVolumes
Contributor

Bug: The check to raise DynamicSamplingInvalidOrgVolumes is unreachable because the conditions are already guaranteed to be false by the preceding is_valid_for_recalibration() call, making it dead code.
Severity: LOW

Suggested Fix

Remove the unreachable code block that raises DynamicSamplingInvalidOrgVolumes. The preceding is_valid_for_recalibration() check is sufficient. The DynamicSamplingInvalidOrgVolumes exception class, which becomes unused, can also be removed.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/dynamic_sampling/per_org/calculations.py#L43-L44

Potential issue: In the `calculate_recalibration_factor` function, a check for
`org_volume.is_valid_for_recalibration()` is performed. If this check passes, it
guarantees that `org_volume.total > 0` and `org_volume.indexed` is not `None`. However,
the subsequent lines of code check for `org_volume.indexed is None or org_volume.total
== 0` and attempt to raise a `DynamicSamplingInvalidOrgVolumes` exception. Due to the
prior validation, these conditions can never be met, rendering the exception-raising
code unreachable. This dead code fails to implement the intended safeguard against
impossible data states.

@github-actions
Contributor

Backend Test Failures

Failures on f8614bd in this run:

tests/sentry/dynamic_sampling/per_org/test_configuration.py::DynamicSamplingOrgConfigurationTest::test_project_mode_custom_dynamic_sampling_stores_project_sample_rates
[gw1] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3

Contributor

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ef0371a.

    if org_volume is None or not org_volume.is_valid_for_recalibration():
        return None
    if org_volume.indexed is None or org_volume.total == 0:
        raise DynamicSamplingInvalidOrgVolumes
Contributor

Unreachable exception makes guard logic dead code

Low Severity

The check if org_volume.indexed is None or org_volume.total == 0 on line 43 is unreachable. The preceding guard on line 41 calls org_volume.is_valid_for_recalibration(), which returns self.total > 0 and self.indexed is not None and self.indexed > 0. If that method returns True (required to reach line 43), then indexed is guaranteed non-None and total is guaranteed positive. The DynamicSamplingInvalidOrgVolumes exception can therefore never be raised, and the exception class itself is unused dead code.
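
The reachability argument can be checked mechanically. The sketch below assumes a small `OrgVolume` dataclass (a hypothetical stand-in for the PR's type) whose `is_valid_for_recalibration()` mirrors the condition quoted above, and enumerates boundary values to confirm that the second check never fires once the first guard has passed.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class OrgVolume:
    total: int
    indexed: Optional[int]

    def is_valid_for_recalibration(self) -> bool:
        # Mirrors the condition quoted in the review comment.
        return self.total > 0 and self.indexed is not None and self.indexed > 0


def second_guard_can_fire(volume: OrgVolume) -> bool:
    """True if the second check would raise after the first guard passed."""
    if not volume.is_valid_for_recalibration():
        return False  # the function would already have returned None
    return volume.indexed is None or volume.total == 0


# Boundary values for total and indexed, including None and negatives.
candidates = [
    OrgVolume(total, indexed)
    for total, indexed in product([-1, 0, 1, 100], [None, -1, 0, 1, 100])
]
reachable = [v for v in candidates if second_guard_can_fire(v)]
```

Here `reachable` ends up empty, which matches the comment: any volume that passes the first guard has a positive `total` and a non-None `indexed`.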
