feat(issue-detection): Add org-level scheduling for AI issue detection #113060

roggenkemper wants to merge 13 commits into master from
Conversation
Backend Test Failures
…ssue detection

Replace the hourly per-project dispatch with a 15-minute bucketed dispatcher that spreads org dispatches across time slots using md5 hashing. Each org is assigned to a deterministic slot and dispatched once per full cycle.

- Add `organizations:ai-issue-detection` feature flag (FlagPole)
- Rewrite dispatcher to iterate active orgs with RangeQuerySetWrapper
- Add `detect_llm_issues_for_org` task: picks a random project, sends 1 trace
- Remove legacy `detect_llm_issues_for_project` and the project allowlist path
- Change Celery Beat from hourly to every 15 minutes
- NUM_DISPATCH_SLOTS=10 (~2.5h cycle); increase toward 67 as org count grows

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
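The deterministic slot assignment described in the commit message can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function names are assumptions, and only the md5-bucketing scheme and `NUM_DISPATCH_SLOTS=10` come from the commit message.

```python
import hashlib

NUM_DISPATCH_SLOTS = 10  # ~2.5h full cycle at a 15-minute beat interval


def dispatch_slot_for_org(org_id: int) -> int:
    """Map an org to a stable slot in [0, NUM_DISPATCH_SLOTS) via md5.

    The same org always hashes to the same slot, so each org is
    dispatched exactly once per full cycle.
    """
    digest = hashlib.md5(str(org_id).encode()).hexdigest()
    return int(digest, 16) % NUM_DISPATCH_SLOTS


def should_dispatch(org_id: int, current_slot: int) -> bool:
    """True when this beat tick's slot matches the org's assigned slot."""
    return dispatch_slot_for_org(org_id) == current_slot
```

Because the slot is derived from the org ID alone, any dispatcher instance computes the same assignment without coordination.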
Force-pushed from 173d79f to b12c4d1
… detection

Respect the user's project-level setting to avoid wasting Snuba queries and Seer calls when AI detection is disabled for the selected project.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dition

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bugbot run
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 5f1e8ae.
Scan all orgs and check feature flags once per cycle (slot 0), store the eligible org IDs in Redis. Subsequent ticks read from cache instead of scanning the DB. Cache TTL is 2x the cycle length as a safety margin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
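The slot-0 scan-and-cache behaviour, with the 2x-cycle TTL and the expiry fallback, could look roughly like this. It's a sketch under stated assumptions: the cache key name and function names are made up, and the `cluster` object stands in for the Redis client; only the slot-0 scan, TTL sizing, and rebuild-on-expiry behaviour come from the commit messages above.

```python
import json

BEAT_INTERVAL = 15 * 60                      # Celery Beat runs every 15 minutes
NUM_DISPATCH_SLOTS = 10
CYCLE_SECONDS = NUM_DISPATCH_SLOTS * BEAT_INTERVAL   # one full cycle (~2.5h)
CACHE_TTL = 2 * CYCLE_SECONDS                # 2x the cycle length as a safety margin
CACHE_KEY = "llm-detection:eligible-orgs"    # assumed key name


def run_tick(cluster, slot: int, scan_eligible_orgs) -> list[int]:
    """On slot 0, scan the DB/feature flags and cache the result with a TTL;
    other ticks read the cached IDs, rebuilding only if the entry expired."""
    if slot == 0:
        org_ids = scan_eligible_orgs()
        cluster.set(CACHE_KEY, json.dumps(org_ids), ex=CACHE_TTL)
        return org_ids
    cached = cluster.get(CACHE_KEY)
    if cached is None:
        # Fallback: the entry expired mid-cycle, so rebuild it.
        org_ids = scan_eligible_orgs()
        cluster.set(CACHE_KEY, json.dumps(org_ids), ex=CACHE_TTL)
        return org_ids
    return json.loads(cached)
```

Non-zero slots never touch the DB while the cache entry is live, which is the point of the change.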
The fallback in _get_eligible_org_ids rebuilds the cache if it expires.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
500 per slot × 10 slots = 5k max, too low for 17k orgs. 2000 × 10 = 20k covers the current enrollment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wedamija left a comment:
I'm wondering if CursoredScheduler does what you need here? You can run it like:

sentry/src/sentry/integrations/source_code_management/sync_repos.py, lines 396 to 408 at 7340e27

It doesn't do the caching of all the orgs in the batch, but we could expand it that way if it's helpful. I think it'd be good to try to genericise this kind of logic, so let me know if you want to integrate with it; I'm happy to either work on it or review any of your changes.
| """Read cached eligible org IDs, or rebuild if missing.""" | ||
| cluster = redis_clusters.get("default") | ||
| cached = cluster.get(ELIGIBLE_ORGS_CACHE_KEY) | ||
| if cached: |
Edge case that'll probably never get hit, but you may want to check `is not None` instead of truthiness, in case the eligible orgs are cached as zero orgs.
```python
if not has_access:
    return

projects = list(
```
nit: `project_ids` might be a better name
```diff
 traces_to_send: list[TraceMetadataWithSpanCount] = [
     t for t in evidence_traces if t.trace_id in unprocessed_ids
-][:NUM_TRANSACTIONS_TO_PROCESS]
+][:1]
```
Any chance of this changing in the future? Should we consider leaving it as a constant?

There's a chance, though it's hard to say right now how likely it is; it depends on some data we'll gather as we LA.
```python
    Returns the allowlist from system options.
    """
    return options.get("issue-detection.llm-detection.projects-allowlist")
```
Should/can we delete the registration for this option?

Yes, will do that in future PRs!
```python
dispatched = 0
for org_id in eligible_org_ids:
    if dispatched >= MAX_ORGS_PER_CYCLE:
```
Should we log or emit a metric when we drop some orgs?

This won't be a problem during the LA/EA, and as we get closer to GA the number of orgs will increase to the point where I doubt we will actually hit this, but having a metric could be good.
Replace md5 hash bucketing + Redis org cache with the built-in CursoredScheduler framework. It handles cursor-based batching, distributed locking, and cycle metrics out of the box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```python
perf_settings = project.get_option("sentry:performance_issue_settings", default={})
if not perf_settings.get("ai_issue_detection_enabled", True):
```
Bug: The task detect_llm_issues_for_org randomly selects one project. If that project has ai_issue_detection_enabled=False, the entire organization is silently skipped for the cycle.
Severity: MEDIUM
Suggested Fix
Instead of selecting one random project and exiting if it's ineligible, iterate through the organization's projects until an eligible one is found. Alternatively, filter the initial project list to only include those with ai_issue_detection_enabled=True before making a random selection. This ensures that organizations with at least one eligible project are always processed.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/sentry/tasks/llm_issue_detection/detection.py#L327-L328
Potential issue: The `detect_llm_issues_for_org` task processes an organization by
selecting a single random project to check for eligibility. If this randomly chosen
project has the `ai_issue_detection_enabled` setting disabled, the function returns
early. This causes the entire organization to be silently skipped for the current
detection cycle, even if other projects within the same organization have the feature
enabled. This behavior is a functional regression from the previous implementation,
which processed each eligible project individually, and leads to non-deterministic and
reduced feature coverage for multi-project organizations.
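The suggested fix (filter the project list before the random selection) could be sketched like this. The function name and the stubbed project interface are assumptions; only the option key, the `True` default, and the filter-then-pick idea come from the diff and the Bugbot report above.

```python
import random


def pick_project_for_detection(projects, rng=random):
    """Pick a random project among those with AI detection enabled.

    Filtering first guarantees an org with at least one eligible project
    is never silently skipped just because the random draw landed on a
    project that opted out.
    """
    eligible = [
        p for p in projects
        if p.get_option("sentry:performance_issue_settings", default={}).get(
            "ai_issue_detection_enabled", True
        )
    ]
    if not eligible:
        return None  # genuinely nothing to process for this org
    return rng.choice(eligible)
```

Filtering is O(n) in the org's project count, which is cheap relative to the Snuba query and Seer call the task goes on to make.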
Replace the hourly per-project dispatch with a 15-minute bucketed dispatcher that spreads org dispatches across time slots using hashing. Each org is assigned to a deterministic slot and dispatched once per full cycle. Org results are cached so each run doesn't need to re-scan for eligible organizations.