Prevent migration operations running before previous finalization completes #14832

Merged
merged 1 commit into hazelcast:master from mdogan:migration-finalization-race-fix Apr 4, 2019


mdogan commented Apr 3, 2019

Normally, finalization is scheduled when either a `PublishCompletedMigrationsOperation`
or a migration operation is executed.

However, there is a small window of time in which a `MigrationOperation` can arrive
and start just after `PublishCompletedMigrationsOperation` starts executing.

In this case, `FinalizeMigrationOperation` can run after the `MigrationOperation`
and remove the data replicated by it, if all of the following hold: the completed
migrations include a previous migration for the same partition as the
`MigrationOperation`, the local member was the source of that previous migration,
and the `MigrationOperation` starts executing before the `FinalizeMigrationOperation`
is put into the partition operation thread's queue.
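
To make the ordering concrete, here is a minimal single-threaded sketch of the problematic sequence. It is not Hazelcast code: `FinalizationRaceSketch`, `partitionData` and `replayRace` are hypothetical names, a plain queue stands in for the partition operation thread, and finalization on the old source is assumed to simply clear the partition's data.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class FinalizationRaceSketch {
    // Hypothetical stand-in for the replica data held for one partition.
    static final Map<String, String> partitionData = new HashMap<>();

    /** Replays the problematic ordering and returns the number of entries left. */
    static int replayRace() {
        partitionData.clear();
        partitionData.put("old-key", "left over from the completed migration");

        // Stand-in for the queue of a single partition operation thread.
        Queue<Runnable> partitionQueue = new ArrayDeque<>();

        // 1. The new migration operation reaches the queue first and replicates data.
        partitionQueue.add(() -> partitionData.put("new-key", "replicated by new migration"));

        // 2. Finalization of the previous migration is enqueued only afterwards.
        //    Assumption: on the old source, finalization clears the partition's data.
        partitionQueue.add(partitionData::clear);

        // The partition thread runs operations strictly in queue order.
        while (!partitionQueue.isEmpty()) {
            partitionQueue.poll().run();
        }
        return partitionData.size();
    }

    public static void main(String[] args) {
        System.out.println("entries left after race: " + replayRace());
    }
}
```

In this sketch the late finalization wipes the freshly replicated entry along with the stale one, which is the data loss described above.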

To fix that, `MigrationOperation` is now retried if it cannot set the `migrating` flag
of a partition. The `migrating` flag is set by migration operations and cleared by
`FinalizeMigrationOperation`. So, if the `migrating` flag is already set when a
`MigrationOperation` executes, the previous `FinalizeMigrationOperation` has not run yet.
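
The guard can be sketched roughly as follows. This is an illustration under assumptions, not the actual change: `MigratingFlagSketch`, `Result` and both method names are hypothetical, and an `AtomicBoolean` with compare-and-set stands in for however the real partition state tracks the `migrating` flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class MigratingFlagSketch {
    enum Result { EXECUTED, RETRY }

    // Hypothetical per-partition flag: set by migration operations,
    // cleared by finalization.
    static final AtomicBoolean migrating = new AtomicBoolean(false);

    /** Stand-in for a migration operation executing on the partition thread. */
    static Result runMigrationOperation() {
        // If the flag is already set, the previous finalization has not run yet,
        // so the operation backs off and asks to be retried.
        if (!migrating.compareAndSet(false, true)) {
            return Result.RETRY;
        }
        // ... replicate partition data here ...
        return Result.EXECUTED;
    }

    /** Stand-in for finalization of a completed migration. */
    static void runFinalizeMigrationOperation() {
        // ... commit or roll back the previous migration here ...
        migrating.set(false);
    }

    public static void main(String[] args) {
        System.out.println(runMigrationOperation()); // first migration sets the flag
        System.out.println(runMigrationOperation()); // finalization still pending
        runFinalizeMigrationOperation();             // clears the flag
        System.out.println(runMigrationOperation()); // retry now succeeds
    }
}
```

The point of the design is that a retried operation can only proceed once finalization has cleared the flag, so finalization can no longer run after the new migration and remove its data.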

Fixes #14809

@metanet
metanet approved these changes Apr 3, 2019
@mdogan mdogan force-pushed the mdogan:migration-finalization-race-fix branch from 5c46a66 to 5060616 Apr 4, 2019

mmedenjak left a comment

💯

@mdogan mdogan merged commit 44e5e2c into hazelcast:master Apr 4, 2019
1 check passed
default Test PASSed.
@mdogan mdogan deleted the mdogan:migration-finalization-race-fix branch Apr 4, 2019
@mmedenjak mmedenjak added this to the 4.0 milestone Apr 17, 2019