Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lock eviction scheduling on migration when lock expired #9842

Merged
merged 1 commit into from Feb 9, 2017

Conversation

mmedenjak
Copy link
Contributor

Previously on lock migration the lock was scheduled for eviction with difference between the current cluster clock and the expiration time. If the migration lasted long enough and lock has already expired it could schedule expiration with a negative value. This would cause the lock to be expired after a period which is equal to the time elapsed from the actual expiration time - the more the migration is delayed, the more will the lease be extended.

Added some javadoc for lock classes, fixed some generics and simplified a couple of if-statements.

@mmedenjak mmedenjak added this to the 3.8 milestone Feb 9, 2017
@mmedenjak mmedenjak self-assigned this Feb 9, 2017
*/
@RunWith(HazelcastSerialClassRunner.class)
@Category(SlowTest.class)
public class LockBounceMemberTest {
Copy link
Contributor Author

@mmedenjak mmedenjak Feb 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful test and simulates a scenario similar to the one we are experiencing issues with - https://hazelcast-l337.ci.cloudbees.com/view/rollup/job/rollup-lease-lock/. The flip side is that it actually didn't uncover the cause of the issue (lease never expiring) so I'm not sure if we should keep it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename the test to make it clear lock with lease is being tested? Something like LockWithLeaseBounceMemberTest, LeaseLockBounceMemberTest ... etc

@mmedenjak mmedenjak requested a review from mdogan February 9, 2017 08:44
Copy link
Contributor

@mdogan mdogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work.

boolean lock(Data key, String caller, long threadId, long referenceId, long leaseTime);

/**
* Lock a specific key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method marks the lock as transactional. It's the main difference between this and above plain lock().

*/
@RunWith(HazelcastSerialClassRunner.class)
@Category(SlowTest.class)
public class LockBounceMemberTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename the test to make it clear lock with lease is being tested? Something like LockWithLeaseBounceMemberTest, LeaseLockBounceMemberTest ... etc

public class LockBounceMemberTest {
private static final int MEMBER_COUNT = 4;
private static final int DRIVER_COUNT = 4;
private static final int LEASE_DURATION_SECONDS = 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reducing lease duration to something like 1sec or a few 100ms helps to reveal the issue?

Previously on lock migration the lock was scheduled for eviction with difference between the current cluster clock and the expiration time. If the migration lasted long enough and lock has already expired it could schedule expiration with a negative value. This would cause the lock to be expired after a period which is equal to the time elapsed from the actual expiration time - the more the migration is delayed, the more will the lease be extended.

Added some javadoc for lock classes, fixed some generics and simplified a couple of if-statements.
driver.doAssert();
}
}
}, MILLISECONDS.toSeconds(LEASE_DURATION_MILLIS) + 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think timeout here is too small, it can cause accidental failures. Also note that, assertTrueEventually(...) runs AssertTask in 200ms intervals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but that's the lease duration in seconds (floored) + 10 seconds. Isn't that enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry. You're right. I misinterpreted that.
(What I read in my mind is MILLISECONDS.toSeconds(LEASE_DURATION_MILLIS + 10))

@devOpsHazelcast
Copy link
Collaborator

Test PASSed.

@mdogan mdogan merged commit a15ce2b into hazelcast:master Feb 9, 2017
@mmedenjak mmedenjak deleted the lock-lease-migration-fix branch February 9, 2017 11:49
@mmedenjak mmedenjak added the Source: Internal PR or issue was opened by an employee label Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Source: Internal PR or issue was opened by an employee Team: Core Type: Defect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants