Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locks return without checking whether the lease is still held #11456

Closed
aphyr opened this issue Dec 16, 2019 · 0 comments
Closed

Locks return without checking whether the lease is still held #11456

aphyr opened this issue Dec 16, 2019 · 0 comments

Comments

@aphyr
Copy link

@aphyr aphyr commented Dec 16, 2019

In Jepsen testing, we found that updating shared state protected by a lock could result in lost updates or other anomalies. This was due in part to the fact that distributed locks are intrinsically unsafe, but was exacerbated by an issue where a server failed to check whether a lease was valid after blocking for a lock to be released, and simply told the client they held the lock instead.

As @xiang90 wrote:

It seems the root cause is the failure of maintaining the lease via keepalive (not sure why this happened though...) of process 0. We can know this since the explicit release of the lease failed after the acquire operation.

And there is a "bug" inside etcd server side. While waiting for the previous lock holder to release the lock, the client does not check the current lease status. So when the lock operation returns, the lease might be already expired. Then the etcd client will falsely think it still holds the lock.

We can mitigate the issue by checking the lease status before the return of lock operation. However, it wont completely solve the problem. If the response RPC takes long enough, the lease might still expire in between...

Basically, as we discussed, the lock holder have to checked the sequence number (in etcd the revision of the lock key) and the remaining lease timeout before doing anything...

More concretely, we should check if the owner key is still exist here before return

if len(resp.Kvs) == 0 {

This was addressed by #11408; I'm creating this issue as a reference point for later readers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant