Skip to content

fix(repos): Have repo sync batch up the work#113131

Open
wedamija wants to merge 1 commit intomasterfrom
danf/repo-sync-batch
Open

fix(repos): Have repo sync batch up the work#113131
wedamija wants to merge 1 commit intomasterfrom
danf/repo-sync-batch

Conversation

@wedamija
Copy link
Copy Markdown
Member

We've had a few task timeouts with the sync, caused by extremely large batches. Switching this over to limit batch sizes to 100, and to just fire parallel tasks instead.

This pr just creates the tasks and call them directly rather than scheduling them. This is so that we can be sure the tasks are deployed before we start firing them off. There will be a follow up pr to actually schedule them.

@wedamija wedamija requested a review from a team April 15, 2026 22:55
@wedamija wedamija requested review from a team as code owners April 15, 2026 22:55
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 15, 2026
wedamija added a commit that referenced this pull request Apr 15, 2026
Follow up to #113131. This fires off the tasks rather than calling them directly.
Comment thread src/sentry/integrations/source_code_management/sync_repos.py Outdated
We've had a few task timeouts with the sync, caused by extremely large batches. Switching this over to limit batch sizes to 100, and to just fire parallel tasks instead.

This pr just creates the tasks and call them directly rather than scheduling them. This is so that we can be sure the tasks are deployed before we start firing them off. There will be a follow up pr to actually schedule them.
@wedamija wedamija force-pushed the danf/repo-sync-batch branch from e70900c to a6ce51e Compare April 15, 2026 23:01
wedamija added a commit that referenced this pull request Apr 15, 2026
Follow up to #113131. This fires off the tasks rather than calling them directly.
providers=[provider],
)
restore_set = set(external_ids)
for repo in all_repos:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The refactored repo sync tasks can create duplicate REPO_ENABLED and REPO_DISABLED audit logs on retry because they don't filter repositories by their current status before logging.
Severity: MEDIUM

Suggested Fix

In restore_repos_batch, filter the initial repository query to only fetch repos with status=ObjectStatus.DISABLED. In disable_repos_batch, ensure the audit logging logic only considers repos that were actually transitioned from an ACTIVE state in the current task execution, rather than re-fetching all repos without a status filter.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/sentry/integrations/source_code_management/sync_repos.py#L373

Potential issue: The refactored `restore_repos_batch` and `disable_repos_batch` tasks
are vulnerable to creating duplicate audit log entries upon retry. The original code
performed these operations atomically, but the new, separate tasks can be retried. In
`restore_repos_batch`, the function fetches all repos without filtering by status,
causing already-restored repos to be processed again, generating a duplicate
`REPO_ENABLED` log. Similarly, in `disable_repos_batch`, the audit logging logic fetches
repos without a status filter, leading to duplicate `REPO_DISABLED` logs for
already-disabled repos if a retry occurs. This results in an inaccurate and misleading
audit trail.

Copy link
Copy Markdown
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit a6ce51e. Configure here.

organization_id=rpc_org.id,
integration_id=integration.id,
providers=[provider],
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore_repos_batch missing disabled status filter on query

Medium Severity

restore_repos_batch fetches all repos regardless of status, while the old code only iterated disabled_repos (filtered to ObjectStatus.DISABLED). The get_repositories call here doesn't pass the available status parameter. When these tasks are moved to async execution (the stated follow-up), a repo that transitioned from DISABLED to PENDING_DELETION between the parent task and this batch task would be incorrectly set back to ACTIVE, overriding the deletion. Even synchronously, this needlessly updates already-active repos and emits false REPO_ENABLED audit log entries if duplicate external_id values exist.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a6ce51e. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant