
Fix #108372: Unmerge: "Already being unmerged" message appears but unmerge never completes #108858

Open
danielalanbates wants to merge 1 commit into getsentry:master from danielalanbates:fix/issue-108372

Conversation

@danielalanbates

Fixes #108372

Summary

This PR fixes: Unmerge: "Already being unmerged" message appears but unmerge never completes

Changes

src/sentry/tasks/unmerge.py | 177 +++++++++++++++++++++++---------------------
 1 file changed, 93 insertions(+), 84 deletions(-)

Testing

Please review the changes carefully. The fix was verified against the existing test suite.


This PR was created with the assistance of Claude Sonnet 4.6 by Anthropic | effort: low. Happy to make any adjustments!

By submitting this pull request, I confirm that my contribution is made under the terms of the project's license (contributor license agreement).

If the unmerge Celery task raises an exception (e.g. due to a worker
crash, timeout, or transient error), GroupHash records are left
permanently in the LOCKED_IN_MIGRATION state. Any subsequent unmerge
attempt from the UI then finds all requested hashes locked and returns
409 "Already being unmerged", making it impossible to retry.

Wrap the task body in a try/except so that hashes are unlocked
(reset to UNLOCKED state) when the task fails, allowing users to
retry the unmerge operation.

Fixes getsentry#108372
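The retry-blocking failure mode described above can be sketched as a small state machine. This is a simplified illustration: `HashState`, the dict-based hashes, and `run_unmerge` are stand-ins, not Sentry's actual models or task plumbing.

```python
from enum import Enum


# Hypothetical simplification of the GroupHash lock state; the real model
# stores this as a field on the GroupHash Django model.
class HashState(Enum):
    UNLOCKED = 0
    LOCKED_IN_MIGRATION = 1


def lock_hashes(hashes):
    for h in hashes:
        h["state"] = HashState.LOCKED_IN_MIGRATION


def unlock_hashes(hashes):
    for h in hashes:
        if h["state"] is HashState.LOCKED_IN_MIGRATION:
            h["state"] = HashState.UNLOCKED


def run_unmerge(hashes, task_body):
    # Any hash already locked means an unmerge is (apparently) in progress:
    if any(h["state"] is HashState.LOCKED_IN_MIGRATION for h in hashes):
        return 409  # "Already being unmerged"
    lock_hashes(hashes)
    try:
        task_body()
    except Exception:
        # The fix: release the locks on failure so a retry is possible.
        # Without this branch, a crashed task leaves the hashes locked and
        # every subsequent attempt returns 409.
        unlock_hashes(hashes)
        raise
    unlock_hashes(hashes)
    return 200
```

Without the `except` branch, the first failed run would leave every hash in `LOCKED_IN_MIGRATION` and the retry would hit the 409 guard forever.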
@danielalanbates danielalanbates requested a review from a team as a code owner February 22, 2026 02:31
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Feb 22, 2026
Comment on lines +619 to +626
except Exception:
    # If the task fails for any reason, unlock the hashes so the user can
    # retry the unmerge. Without this, hashes remain in LOCKED_IN_MIGRATION
    # state permanently, causing "Already being unmerged" on every subsequent
    # attempt even though no unmerge is actually in progress.
    unlock_hashes(args.project_id, locked_primary_hashes)
    logger.exception("unmerge.failed", extra=extra)
    raise
Bug: The exception handler in the unmerge task fails to call stop_snuba_replacement(), leaving Snuba unmerge transactions open if an error occurs after they are started.
Severity: MEDIUM

Suggested Fix

Update the except block to iterate through the destinations in the task arguments and call stop_snuba_replacement() for any existing eventstream_state objects. This will ensure that Snuba unmerge transactions are properly closed, mirroring the cleanup logic that already exists for locked hashes.
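The suggested cleanup could look roughly like the following. This is a sketch of the review's proposal, not the actual Sentry API: the `destinations` mapping, the `eventstream_state` attribute, and the helper name are assumptions taken from the review's wording.

```python
def cleanup_snuba_replacements(args, replacement, logger):
    # Hypothetical helper: close any Snuba unmerge transactions that were
    # started before the task failed, mirroring the unlock_hashes cleanup.
    for destination in args.destinations.values():
        eventstream_state = getattr(destination, "eventstream_state", None)
        if eventstream_state is not None:
            try:
                replacement.stop_snuba_replacement(eventstream_state)
            except Exception:
                # Best-effort: log and keep closing the remaining states.
                logger.exception("unmerge.replacement_cleanup_failed")
```

Calling this from the `except` block alongside `unlock_hashes` would make resource release symmetric with resource acquisition.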

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/tasks/unmerge.py#L619-L626

Potential issue: If an exception occurs in the `unmerge` task after `migrate_events()`
has initiated a Snuba replacement via `start_snuba_replacement()`, the `except` block at
lines 619-626 will execute. This handler correctly unlocks primary hashes but fails to
clean up the started Snuba replacement by calling `stop_snuba_replacement()`. This
leaves an incomplete `unmerge` transaction in Snuba. This can happen if functions like
`repair_denormalizations()` or the `unmerge.delay()` call fail. This creates an
asymmetrical resource management pattern, where resources are allocated in the `try`
block but not deallocated in the corresponding `except` block.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


source_events.append(event)

source_fields_reset = isinstance(args, SuccessiveUnmergeArgs) and args.source_fields_reset
try:
truncate_denormalizations failure leaves hashes permanently locked

High Severity

The truncate_denormalizations call on the InitialUnmergeArgs path sits between lock_hashes (which sets hashes to LOCKED_IN_MIGRATION) and the try block. If truncate_denormalizations raises, the exception handler never runs and the hashes remain permanently locked — the exact scenario this PR aims to fix. The try block needs to also cover truncate_denormalizations.
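The ordering problem Bugbot describes can be illustrated generically. The callables below are placeholders standing in for `lock_hashes`, `unlock_hashes`, `truncate_denormalizations`, and the rest of the task body; the real control flow in `unmerge.py` is more involved.

```python
def run_initial_unmerge(lock, unlock, truncate, body):
    # Every step that can raise after locking must sit inside the try, so
    # the except handler can release the lock on failure.
    lock()
    try:
        truncate()  # previously outside the try: a failure here leaked the lock
        body()
    except Exception:
        unlock()
        raise
    unlock()
```

With `truncate()` outside the `try`, a failure there would skip the handler entirely and leave the hashes locked, reproducing the original bug on a different code path.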


)
if eventstream_state:
args.replacement.stop_snuba_replacement(eventstream_state)
return

Double unlock on completion path creates race condition

Low Severity

When the not events completion path runs, unlock_hashes is called at line 544. If stop_snuba_replacement subsequently throws, the except handler calls unlock_hashes again. While unlock_hashes is idempotent (it filters by LOCKED_IN_MIGRATION state), there's a race window: another unmerge could lock the same hashes between the two calls, and the second unlock_hashes would incorrectly unlock the new operation's hashes.
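One way to close that window, sketched here with placeholder callables rather than the real task code, is to record whether the completion path already unlocked, so the `except` handler never unlocks a second time (and so cannot release hashes that a newer unmerge has since locked):

```python
def run_with_single_unlock(unlock, stop_replacement, complete):
    # Track whether the completion path already released the locks; the
    # exception handler only unlocks if that has not happened yet.
    unlocked = False
    try:
        if complete:
            unlock()
            unlocked = True
            stop_replacement()  # may still raise after the unlock
    except Exception:
        if not unlocked:
            unlock()
        raise
```

This keeps exactly one `unlock()` call per run, regardless of where the failure occurs.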



Development

Successfully merging this pull request may close these issues.

Unmerge: "Already being unmerged" message appears but unmerge never completes

1 participant