Prevent migration operations running before previous finalization completes #14834

Merged

@mdogan (Member) commented Apr 3, 2019

Normally, finalization is scheduled when either a `PublishCompletedMigrationsOperation`
or a migration operation is executed.

However, there is a small window of time in which a `MigrationOperation` can arrive
and start just after `PublishCompletedMigrationsOperation` starts executing.

In this case, suppose the completed migrations include a previous migration of the
same partition as the `MigrationOperation`, and the local member was the source of
that previous migration. If the `MigrationOperation` starts executing before the
`FinalizeMigrationOperation` is put into the partition operation thread's queue,
then the `FinalizeMigrationOperation` can run after the `MigrationOperation`
and remove the data replicated by it.

To fix that, `MigrationOperation` is retried if it cannot set the `migrating` flag
of a partition. The `migrating` flag is set by migration operations and cleared by
`FinalizeMigrationOperation`. So, if the `migrating` flag is already set when a
`MigrationOperation` executes, the former `FinalizeMigrationOperation` has not been
executed yet.
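
The retry rule described above can be illustrated with a minimal, single-threaded sketch. This is not the Hazelcast implementation: `MigratingFlagSketch`, `Partition`, `Op`, `trySetMigrating`, `resetMigrating` and `simulate` are hypothetical names chosen for illustration, and the real operations run on partition operation threads rather than a simple queue. The sketch only shows the ordering guarantee: a migration that finds the flag still set is put back in the queue, so it can only proceed after the pending finalization has cleared the flag.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class MigratingFlagSketch {

    // Hypothetical stand-in for per-partition state; not a Hazelcast class.
    public static final class Partition {
        private final AtomicBoolean migrating = new AtomicBoolean(false);

        // Succeeds only if no earlier migration is still awaiting finalization.
        boolean trySetMigrating() {
            return migrating.compareAndSet(false, true);
        }

        // Called by finalization once the previous migration is cleaned up.
        void resetMigrating() {
            migrating.set(false);
        }
    }

    // An operation returns false when it must be retried later.
    public interface Op {
        boolean run(Partition partition, List<String> log);
    }

    public static List<String> simulate() {
        Partition partition = new Partition();
        List<String> log = new ArrayList<>();

        // A previous migration has set the flag and is not finalized yet.
        partition.trySetMigrating();

        Op migration = (p, out) -> {
            if (!p.trySetMigrating()) {
                out.add("migration: retry"); // flag still set, finalization pending
                return false;
            }
            out.add("migration: run");
            return true;
        };
        Op finalization = (p, out) -> {
            p.resetMigrating();
            out.add("finalize: run");
            return true;
        };

        // The problematic order: the new migration is queued before finalization.
        Deque<Op> queue = new ArrayDeque<>();
        queue.add(migration);
        queue.add(finalization);

        while (!queue.isEmpty()) {
            Op op = queue.poll();
            if (!op.run(partition, log)) {
                queue.add(op); // retry after whatever is already queued
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [migration: retry, finalize: run, migration: run]
        System.out.println(simulate());
    }
}
```

Without the flag check, the migration in this sketch would run first and the later finalization would then act on its freshly replicated state, which is the race this change prevents.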

Fixes #14809

Backport of #14832

Prevent migration operations running before previous finalization completes

(cherry picked from commit 5060616)

@mdogan force-pushed the mdogan:migration-finalization-race-fix-z branch from 90f9555 to 3969c8d Apr 4, 2019

@mdogan merged commit b718ba0 into hazelcast:maintenance-3.x Apr 4, 2019

1 check passed

default Test PASSed.

@mdogan deleted the mdogan:migration-finalization-race-fix-z branch Apr 4, 2019
