LSM: fix locking on restart #670

Merged: 1 commit merged on Sep 22, 2022

@mkemel (Member) commented Sep 22, 2022

This patch fixes an issue where, if the engine is restarted during LSM (live storage migration), the command gets stuck.

The cause was that the EngineLocks were passed to LiveMigrateDiskCommand, and from there to CreateSnapshotCommand in the executeCommand() stage. There the locks are released, and they are reacquired later, before the snapshot remove phase. If the engine was restarted during the CreateSnapshot phase, both the CreateSnapshot and LiveMigrateDisk commands resumed and ran reacquireLocks. Then, before the snapshot remove phase, LiveMigrateDiskCommand attempted to acquire the locks again and got stuck.
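A minimal sketch of the failure mode, assuming a non-reentrant lock manager in which a key that is already held cannot be acquired again, even by the command holding it. `ToyLockManager`, `LsmRestartScenario` and the lock key are hypothetical and do not reproduce the engine's LockManager API. The sketch models only LiveMigrateDiskCommand's double acquisition (the CreateSnapshot side is omitted), and uses a non-blocking `tryAcquire` that returns `false` at the point where the real command got stuck.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model: NOT the engine's LockManager API.
// Locks are non-reentrant: a key that is already held cannot be taken
// again, even by the owner that holds it.
class ToyLockManager {
    private final Map<String, String> holders = new HashMap<>();

    // Returns false where a real waiting acquire would never return.
    boolean tryAcquire(String key, String owner) {
        if (holders.containsKey(key)) {
            return false;
        }
        holders.put(key, owner);
        return true;
    }
}

class LsmRestartScenario {
    // Replays the restart sequence from the description above.
    // Returns true if LiveMigrateDisk can lock before the snapshot remove phase.
    static boolean snapshotRemoveLockSucceeds(boolean lsmReacquiresOnRestart) {
        ToyLockManager lockManager = new ToyLockManager();
        String diskLock = "DISK:example-disk"; // hypothetical lock key

        // Engine restart: the resumed LiveMigrateDisk command runs reacquireLocks().
        if (lsmReacquiresOnRestart) {
            lockManager.tryAcquire(diskLock, "LiveMigrateDisk");
        }

        // Later, before the snapshot remove phase, the same command locks again.
        return lockManager.tryAcquire(diskLock, "LiveMigrateDisk");
    }

    public static void main(String[] args) {
        System.out.println("reacquire on restart: " + snapshotRemoveLockSucceeds(true));
        System.out.println("no-op reacquire:      " + snapshotRemoveLockSucceeds(false));
    }
}
```

With reacquisition on restart, the second acquire of the same key fails; with a no-op reacquire, it succeeds.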

In this patch we override the reacquireLocks method in LiveMigrateDiskCommand so that it does nothing; the locks on the disk and the VM are instead acquired separately before the snapshot remove phase.
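A rough sketch of the shape of this fix, under stated assumptions: `SimpleLockTable`, `SimpleCommand`, `SimpleLiveMigrateDisk`, `lockForSnapshotRemove` and the `DISK:`/`VM:` key format are hypothetical, and the boolean return type of `reacquireLocks()` is chosen only for illustration. It shows the pattern (a no-op override of the restart hook plus a dedicated lock taken right before snapshot remove), not the engine's actual code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the fix; names and signatures are illustrative and
// do not reproduce the engine's CommandBase or LockManager.
class SimpleLockTable {
    private final Set<String> held = new HashSet<>();

    // All-or-nothing acquisition of a group of keys.
    boolean tryAcquire(Set<String> keys) {
        for (String key : keys) {
            if (held.contains(key)) {
                return false;
            }
        }
        held.addAll(keys);
        return true;
    }
}

class SimpleCommand {
    protected final SimpleLockTable lockTable;
    protected final Set<String> commandLocks;

    SimpleCommand(SimpleLockTable lockTable, Set<String> commandLocks) {
        this.lockTable = lockTable;
        this.commandLocks = commandLocks;
    }

    // Default behavior when a command is resumed after an engine restart.
    boolean reacquireLocks() {
        return lockTable.tryAcquire(commandLocks);
    }
}

class SimpleLiveMigrateDisk extends SimpleCommand {
    SimpleLiveMigrateDisk(SimpleLockTable lockTable, Set<String> commandLocks) {
        super(lockTable, commandLocks);
    }

    // The fix: resuming after a restart takes no locks here.
    @Override
    boolean reacquireLocks() {
        return true;
    }

    // Disk and VM locks are acquired separately, right before snapshot remove.
    boolean lockForSnapshotRemove(String diskId, String vmId) {
        return lockTable.tryAcquire(Set.of("DISK:" + diskId, "VM:" + vmId));
    }
}
```

Because the overridden hook takes nothing, the restart path and the pre-remove path no longer compete for the same keys.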

Bug-Url: https://bugzilla.redhat.com/2110186

@ahadas (Member) commented Sep 22, 2022

/ost

@ahadas (Member) commented Sep 22, 2022

Unrelated failure in OST.

@ahadas ahadas merged commit 0455a66 into oVirt:master Sep 22, 2022

A member left a review comment on these lines:

```java
private EngineLock createEngineLockForSnapshotRemove() {
    return new EngineLock(
            getExclusiveLocksForSnapshotRemove(),
            // ...
```

I don't think this is right. We don't need the disk lock; what we actually need is an exclusive lock on the VM, but a shared lock is taken.

(To be clear, this was an existing problem introduced in mkemel@b5d5499)
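The reviewer's distinction between shared and exclusive VM locks can be illustrated with a toy lock table. This is a sketch under assumptions: `SharedExclusiveLockTable` and the `VM:` key format are hypothetical and do not model the engine's real locking groups; the point is only that a shared lock still admits a second operation on the same VM, while an exclusive lock rejects it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical shared/exclusive lock table; illustrative only, not the
// engine's real locking API.
class SharedExclusiveLockTable {
    private final Map<String, Integer> sharedCounts = new HashMap<>();
    private final Set<String> exclusiveKeys = new HashSet<>();

    // A shared lock is granted unless someone holds the key exclusively.
    boolean tryShared(String key) {
        if (exclusiveKeys.contains(key)) {
            return false;
        }
        sharedCounts.merge(key, 1, Integer::sum);
        return true;
    }

    // An exclusive lock requires that nobody holds the key in any mode.
    boolean tryExclusive(String key) {
        if (exclusiveKeys.contains(key) || sharedCounts.getOrDefault(key, 0) > 0) {
            return false;
        }
        exclusiveKeys.add(key);
        return true;
    }
}
```

If snapshot remove takes only `tryShared("VM:v1")`, another operation's `tryShared("VM:v1")` still succeeds. After `tryExclusive("VM:v1")`, both `tryShared` and `tryExclusive` on that key return `false`.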

@mkemel deleted the 2110186_lsm_stuck_fix branch November 27, 2022 10:09