Fix #108372: Unmerge: "Already being unmerged" message appears but unmerge never completes #108858
danielalanbates wants to merge 1 commit into getsentry:master from
Conversation
If the unmerge Celery task raises an exception (e.g. due to a worker crash, timeout, or transient error), GroupHash records were left permanently in the LOCKED_IN_MIGRATION state. Any subsequent unmerge attempt from the UI would find all requested hashes locked and return 409 "Already being unmerged", making it impossible to retry. This change wraps the task body in a try/except so that hashes are unlocked (reset to the UNLOCKED state) when the task fails, allowing users to retry the unmerge operation. Fixes getsentry#108372
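The fix described above amounts to the following lock/unlock pattern. This is a minimal self-contained sketch, not Sentry's actual code: the state constants, the `hash_states` store, and the helper signatures are simplified stand-ins for the real `GroupHash` model and helpers.

```python
# Sketch of the retry-safe unmerge pattern described in this PR.
# States and helpers are stand-ins, not Sentry's real API.
UNLOCKED, LOCKED_IN_MIGRATION = 0, 1
hash_states = {}  # hash value -> state


def lock_hashes(project_id, hashes):
    for h in hashes:
        hash_states[h] = LOCKED_IN_MIGRATION


def unlock_hashes(project_id, hashes):
    # Only flips hashes that are currently locked for migration.
    for h in hashes:
        if hash_states.get(h) == LOCKED_IN_MIGRATION:
            hash_states[h] = UNLOCKED


def unmerge(project_id, hashes, do_work):
    locked = list(hashes)
    lock_hashes(project_id, locked)
    try:
        do_work()
    except Exception:
        # Without this cleanup, hashes stay LOCKED_IN_MIGRATION forever
        # and every retry returns 409 "Already being unmerged".
        unlock_hashes(project_id, locked)
        raise
    unlock_hashes(project_id, locked)
```

On failure the exception still propagates (so Celery sees the task as failed), but the hashes are released first, which is what makes a retry from the UI possible.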
```python
except Exception:
    # If the task fails for any reason, unlock the hashes so the user can
    # retry the unmerge. Without this, hashes remain in LOCKED_IN_MIGRATION
    # state permanently, causing "Already being unmerged" on every subsequent
    # attempt even though no unmerge is actually in progress.
    unlock_hashes(args.project_id, locked_primary_hashes)
    logger.exception("unmerge.failed", extra=extra)
    raise
```
Bug: The exception handler in the unmerge task fails to call stop_snuba_replacement(), leaving Snuba unmerge transactions open if an error occurs after they are started.
Severity: MEDIUM
Suggested Fix
Update the except block to iterate through the destinations in the task arguments and call stop_snuba_replacement() for any existing eventstream_state objects. This will ensure that Snuba unmerge transactions are properly closed, mirroring the cleanup logic that already exists for locked hashes.
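A hedged sketch of that suggested cleanup follows. The shape of `destinations` (a mapping whose values carry an `eventstream_state`) and the `stop_snuba_replacement` call are assumptions based on the comment's description, not the real signatures in `src/sentry/tasks/unmerge.py`.

```python
class FakeReplacement:
    """Stand-in for the task's replacement object (assumption)."""

    def __init__(self):
        self.stopped = []

    def stop_snuba_replacement(self, eventstream_state):
        self.stopped.append(eventstream_state)


def cleanup_on_failure(replacement, destinations, project_id, locked_hashes, unlock_hashes):
    # Close any Snuba unmerge transactions opened before the failure,
    # mirroring the hash cleanup the except block already performs.
    for eventstream_state in destinations.values():
        if eventstream_state:
            replacement.stop_snuba_replacement(eventstream_state)
    unlock_hashes(project_id, locked_hashes)
```

The point is symmetry: every resource acquired inside the try block (locked hashes, open Snuba replacements) gets a matching release on the failure path.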
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.
Location: src/sentry/tasks/unmerge.py#L619-L626
Potential issue: If an exception occurs in the `unmerge` task after `migrate_events()` has initiated a Snuba replacement via `start_snuba_replacement()`, the `except` block at lines 619-626 will execute. This handler correctly unlocks primary hashes but fails to clean up the started Snuba replacement by calling `stop_snuba_replacement()`. This leaves an incomplete `unmerge` transaction in Snuba. This can happen if functions like `repair_denormalizations()` or the `unmerge.delay()` call fail. This creates an asymmetrical resource management pattern, where resources are allocated in the `try` block but not deallocated in the corresponding `except` block.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```python
source_events.append(event)

source_fields_reset = isinstance(args, SuccessiveUnmergeArgs) and args.source_fields_reset
try:
```
truncate_denormalizations failure leaves hashes permanently locked
High Severity
The truncate_denormalizations call on the InitialUnmergeArgs path sits between lock_hashes (which sets hashes to LOCKED_IN_MIGRATION) and the try block. If truncate_denormalizations raises, the exception handler never runs and the hashes remain permanently locked, which is the exact scenario this PR aims to fix. The try block needs to also cover truncate_denormalizations.
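In outline, the fix this comment asks for is to pull the pre-try call inside the protected region. This is a sketch under stated assumptions: `lock_hashes`, `unlock_hashes`, `truncate_denormalizations`, and the migration step are passed in as stand-in callables rather than the real Sentry functions.

```python
def run_initial_unmerge(args, lock_hashes, unlock_hashes, truncate_denormalizations, do_migration):
    # Lock first, then keep *everything* that can fail -- including
    # truncate_denormalizations -- inside the try so the handler always runs.
    locked = lock_hashes(args)
    try:
        truncate_denormalizations(args)
        do_migration(args)
    except Exception:
        unlock_hashes(args, locked)
        raise
```

With this ordering, a failure in truncation is handled the same way as a failure anywhere else in the task body: the hashes are released before the exception propagates.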
```python
)
if eventstream_state:
    args.replacement.stop_snuba_replacement(eventstream_state)
return
```
Double unlock on completion path creates race condition
Low Severity
When the not events completion path runs, unlock_hashes is called at line 544. If stop_snuba_replacement subsequently throws, the except handler calls unlock_hashes again. While unlock_hashes is idempotent (it filters by LOCKED_IN_MIGRATION state), there's a race window: another unmerge could lock the same hashes between the two calls, and the second unlock_hashes would incorrectly unlock the new operation's hashes.
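One way to close that window (a sketch, not the PR's code) is to record whether the completion path already unlocked the hashes, and have the exception handler skip the redundant unlock. The helper names below are illustrative stand-ins.

```python
def finish_unmerge(locked_hashes, unlock_hashes, stop_snuba_replacement):
    already_unlocked = False
    try:
        # Completion path: unlock first, then close the Snuba transaction.
        unlock_hashes(locked_hashes)
        already_unlocked = True
        stop_snuba_replacement()  # may still raise after the unlock
    except Exception:
        # Skip the second unlock: another unmerge may have re-locked these
        # hashes in the meantime, and unlocking again would release *its* lock.
        if not already_unlocked:
            unlock_hashes(locked_hashes)
        raise
```

The flag turns "unlock at most once per operation" into an explicit invariant instead of relying on the two calls never interleaving with a concurrent unmerge.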
Fixes #108372
Summary
This PR fixes: Unmerge: "Already being unmerged" message appears but unmerge never completes
Changes
Testing
Please review the changes carefully. The fix was verified against the existing test suite.
This PR was created with the assistance of Claude Sonnet 4.6 by Anthropic | effort: low. Happy to make any adjustments!
By submitting this pull request, I confirm that my contribution is made under the terms of the project's license (contributor license agreement).