New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix lock eviction scheduling on migration when lock expired #9842
Conversation
*/ | ||
@RunWith(HazelcastSerialClassRunner.class) | ||
@Category(SlowTest.class) | ||
public class LockBounceMemberTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a useful test and simulates a scenario similar to the one we are experiencing issues with - https://hazelcast-l337.ci.cloudbees.com/view/rollup/job/rollup-lease-lock/. The flip side is that it actually didn't uncover the cause of the issue (lease never expiring) so I'm not sure if we should keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename the test to make it clear lock with lease
is being tested? Something like LockWithLeaseBounceMemberTest
, LeaseLockBounceMemberTest
... etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good work.
boolean lock(Data key, String caller, long threadId, long referenceId, long leaseTime); | ||
|
||
/** | ||
* Lock a specific key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method marks the lock as transactional. It's the main difference between this and above plain lock()
.
*/ | ||
@RunWith(HazelcastSerialClassRunner.class) | ||
@Category(SlowTest.class) | ||
public class LockBounceMemberTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename the test to make it clear lock with lease
is being tested? Something like LockWithLeaseBounceMemberTest
, LeaseLockBounceMemberTest
... etc
public class LockBounceMemberTest { | ||
private static final int MEMBER_COUNT = 4; | ||
private static final int DRIVER_COUNT = 4; | ||
private static final int LEASE_DURATION_SECONDS = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reducing lease duration to something like 1sec or a few 100ms helps to reveal the issue?
Previously on lock migration the lock was scheduled for eviction with difference between the current cluster clock and the expiration time. If the migration lasted long enough and lock has already expired it could schedule expiration with a negative value. This would cause the lock to be expired after a period which is equal to the time elapsed from the actual expiration time - the more the migration is delayed, the more will the lease be extended. Added some javadoc for lock classes, fixed some generics and simplified a couple of if-statements.
5beb909
to
b478421
Compare
driver.doAssert(); | ||
} | ||
} | ||
}, MILLISECONDS.toSeconds(LEASE_DURATION_MILLIS) + 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think timeout here is too small, it can cause accidental failures. Also note that, assertTrueEventually(...)
runs AssertTask
in 200ms intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but that's the lease duration in seconds (floored) + 10 seconds. Isn't that enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry. You're right. I misinterpreted that.
(What I read in my mind is MILLISECONDS.toSeconds(LEASE_DURATION_MILLIS + 10)
)
Test PASSed. |
Previously on lock migration the lock was scheduled for eviction with difference between the current cluster clock and the expiration time. If the migration lasted long enough and lock has already expired it could schedule expiration with a negative value. This would cause the lock to be expired after a period which is equal to the time elapsed from the actual expiration time - the more the migration is delayed, the more will the lease be extended.
Added some javadoc for lock classes, fixed some generics and simplified a couple of if-statements.