Deadlock with read/write lock recipe #649

Closed
westphahl opened this issue Jul 30, 2021 · 0 comments · Fixed by #650
westphahl commented Jul 30, 2021

Expected Behavior

Read/write lock recipe never deadlocks.

Actual Behavior

The read/write lock recipe can deadlock under certain conditions. The deadlock happens as follows:

  1. A write lock is acquired.
  2. Another component tries to acquire the read lock and blocks, watching the write lock as its predecessor.
  3. Yet another component tries to acquire the write lock and also blocks, watching the read lock as its predecessor.
  4. The first write lock is released, which wakes up the read lock.
  5. The read lock starts watching the second write lock as its predecessor 💥

We now have a deadlock: the read lock and the second write lock are each waiting for the other to be released (see the znode layout below).
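
The debug log below shows exactly this state. Just before the outer lock is released, the lock directory looks roughly like this (sequence numbers taken from the log, layout illustrative):

/lock/2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048   <- outer write lock (deleted in step 4)
/lock/1e0d745a23fb459f918b4d247151525a__rlock__0000000049  <- read lock, watching ...__lock__0000000050
/lock/ed26a149ee2947d9bc18df4df51359f4__lock__0000000050   <- second write lock, watching ...__rlock__0000000049

Once znode ...0000000048 is gone, the remaining two znodes watch each other and neither lock can ever be acquired.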

There can be a variation of this when the second write lock's znode is created before the read lock calls:

predecessor = self._get_predecessor(node)

This leads to the same deadlock condition, with the read lock and the second write lock watching each other independently of the currently held write lock.
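
A minimal sketch of predecessor selection that avoids this (illustrative only, not the actual kazoo implementation; the helper names and node-name parsing are assumptions): a read lock should only treat write-lock znodes with a sequence number lower than its own as potential predecessors, so a znode created after it can never become its watch target.

def sequence_number(node_name):
    """Sequential znodes end in a 10-digit sequence number."""
    return int(node_name[-10:])


def find_read_lock_predecessor(children, own_node):
    """Return the closest write-lock znode with a lower sequence number than
    our own, or None if the read lock can be acquired right away."""
    own_seq = sequence_number(own_node)
    # Only write locks block a read lock, and only those created before us.
    blockers = [
        child for child in children
        if "__lock__" in child and sequence_number(child) < own_seq
    ]
    if not blockers:
        return None
    # Watch only the closest predecessor to avoid a thundering herd on release.
    return max(blockers, key=sequence_number)

With the znodes from the log below, the read lock (sequence 49) would watch only the outer write lock (48) and ignore the second write lock (50), so it is acquired as soon as znode 48 is deleted.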

Snippet to Reproduce the Problem

import logging
logging.basicConfig(format="%(asctime)s %(levelname)s:%(message)s", level=logging.DEBUG)

import threading
import time

from kazoo.client import KazooClient


k = KazooClient()
k.start()


lock_path = "/lock"
writer_running = threading.Event()
reader_running = threading.Event()


def writer():
    logging.info("Writer running")
    writer_running.set()
    with k.WriteLock(lock_path):
        logging.info("Got write lock")


def reader():
    logging.info("Reader running")
    reader_running.set()
    with k.ReadLock(lock_path):
        logging.info("Got read lock")


writer_thread = threading.Thread(target=writer)
reader_thread = threading.Thread(target=reader)

# Hold the outer write lock while the reader and writer threads queue up behind it
with k.WriteLock(lock_path):
    logging.info("Starting reader")
    reader_thread.start()
    reader_running.wait()
    logging.info("Starting writer")
    writer_thread.start()
    writer_running.wait()
    logging.info("Waiting for locks to block")
    time.sleep(5)

logging.info("Outer lock released")
logging.info("Joining reader/writer threads ...")
reader_thread.join()
writer_thread.join()
logging.error("This will never show up")

Logs with logging in DEBUG mode

2021-07-30 12:06:08,586 INFO:Connecting to localhost(127.0.0.1):2281, use_ssl: True
2021-07-30 12:06:08,599 DEBUG:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=10000, session_id=0, passwd=b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
2021-07-30 12:06:08,600 INFO:Zookeeper connection established, state: CONNECTED
2021-07-30 12:06:08,600 DEBUG:Sending request(xid=1): Exists(path='/lock', watcher=None)
2021-07-30 12:06:08,601 DEBUG:Received response(xid=1): ZnodeStat(czxid=555413, mzxid=555413, ctime=1627635134011, mtime=1627635134011, version=0, cversion=96, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=0, pzxid=555531)
2021-07-30 12:06:08,601 DEBUG:Sending request(xid=2): Create(path='/lock/2f779b4cbe444c31abb4c551f9f5e9c0__lock__', data=b'', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=3)
2021-07-30 12:06:08,601 DEBUG:Received response(xid=2): '/lock/2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048'
2021-07-30 12:06:08,602 DEBUG:Sending request(xid=3): GetChildren(path='/lock', watcher=None)
2021-07-30 12:06:08,602 DEBUG:Received response(xid=3): ['2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048']
2021-07-30 12:06:08,602 INFO:Starting reader
2021-07-30 12:06:08,602 INFO:Reader running
2021-07-30 12:06:08,603 INFO:Starting writer
2021-07-30 12:06:08,603 DEBUG:Sending request(xid=4): Exists(path='/lock', watcher=None)
2021-07-30 12:06:08,603 INFO:Writer running
2021-07-30 12:06:08,603 INFO:Waiting for locks to block
2021-07-30 12:06:08,603 DEBUG:Sending request(xid=5): Exists(path='/lock', watcher=None)
2021-07-30 12:06:08,603 DEBUG:Received response(xid=4): ZnodeStat(czxid=555413, mzxid=555413, ctime=1627635134011, mtime=1627635134011, version=0, cversion=97, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=1, pzxid=555533)
2021-07-30 12:06:08,604 DEBUG:Received response(xid=5): ZnodeStat(czxid=555413, mzxid=555413, ctime=1627635134011, mtime=1627635134011, version=0, cversion=97, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=1, pzxid=555533)
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=6): Create(path='/lock/1e0d745a23fb459f918b4d247151525a__rlock__', data=b'', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=3)
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=7): Create(path='/lock/ed26a149ee2947d9bc18df4df51359f4__lock__', data=b'', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=3)
2021-07-30 12:06:08,604 DEBUG:Received response(xid=6): '/lock/1e0d745a23fb459f918b4d247151525a__rlock__0000000049'
2021-07-30 12:06:08,604 DEBUG:Received response(xid=7): '/lock/ed26a149ee2947d9bc18df4df51359f4__lock__0000000050'
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=8): GetChildren(path='/lock', watcher=None)
2021-07-30 12:06:08,604 DEBUG:Sending request(xid=9): GetChildren(path='/lock', watcher=None)
2021-07-30 12:06:08,605 DEBUG:Received response(xid=8): ['1e0d745a23fb459f918b4d247151525a__rlock__0000000049', 'ed26a149ee2947d9bc18df4df51359f4__lock__0000000050', '2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048']
2021-07-30 12:06:08,605 DEBUG:Received response(xid=9): ['1e0d745a23fb459f918b4d247151525a__rlock__0000000049', 'ed26a149ee2947d9bc18df4df51359f4__lock__0000000050', '2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048']
2021-07-30 12:06:08,605 DEBUG:Sending request(xid=10): GetData(path='/lock/ed26a149ee2947d9bc18df4df51359f4__lock__0000000050', watcher=<bound method Lock._watch_predecessor of <kazoo.recipe.lock.ReadLock object at 0x7f4f04a00c10>>)
2021-07-30 12:06:08,605 DEBUG:Sending request(xid=11): GetData(path='/lock/1e0d745a23fb459f918b4d247151525a__rlock__0000000049', watcher=<bound method Lock._watch_predecessor of <kazoo.recipe.lock.WriteLock object at 0x7f4f0405c250>>)
2021-07-30 12:06:08,605 DEBUG:Received response(xid=10): (b'', ZnodeStat(czxid=555535, mzxid=555535, ctime=1627639568604, mtime=1627639568604, version=0, cversion=0, aversion=0, ephemeralOwner=72060594463974197, dataLength=0, numChildren=0, pzxid=555535))
2021-07-30 12:06:08,605 DEBUG:Received response(xid=11): (b'', ZnodeStat(czxid=555534, mzxid=555534, ctime=1627639568604, mtime=1627639568604, version=0, cversion=0, aversion=0, ephemeralOwner=72060594463974197, dataLength=0, numChildren=0, pzxid=555534))
2021-07-30 12:06:13,607 DEBUG:Sending request(xid=12): Delete(path='/lock/2f779b4cbe444c31abb4c551f9f5e9c0__lock__0000000048', version=-1)
2021-07-30 12:06:13,609 DEBUG:Received response(xid=12): True
2021-07-30 12:06:13,610 INFO:Outer lock released
2021-07-30 12:06:13,610 INFO:Joining reader/writer threads ...

Specifications

  • Kazoo version: 2.8.0
  • Result of pip list command:
Package    Version
---------- -------
kazoo      2.8.0
pip        21.1.3
setuptools 56.0.0
six        1.16.0
  • Zookeeper version: 3.6.2 (Docker)
  • Zookeeper configuration: -
  • Python version: Python 3.9.6
  • OS: Linux 5.12.15-arch1-1 x86_64
westphahl added a commit to westphahl/kazoo that referenced this issue Jul 31, 2021
The lock must only consider contenders with a sequence number lower than
its own sequence number, as stated in the Zookeeper recipe description for
shared locks[0].

This wasn't working correctly as the ReadLock also considered WriteLocks
with a higher sequence number as contenders. This can lead to a deadlock
as described in python-zk#649.

[0]: https://zookeeper.apache.org/doc/r3.7.0/recipes.html#Shared+Locks

Fixes: python-zk#649
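
Applied to the znodes from the reproduction log above, the rule plays out as follows (a worked example, not output of the fix itself):

Children of /lock just before the outer lock is released:

  ...__lock__0000000048   outer write lock
  ...__rlock__0000000049  read lock
  ...__lock__0000000050   second write lock

Contenders under the "lower sequence number only" rule:

  read lock (49):          48         -> waits for 48, ignores 50
  second write lock (50):  48 and 49  -> waits for its closest predecessor, 49

When 48 is deleted the read lock is acquired; when the read lock is later released, the write lock follows. No cycle can form because no lock ever waits for a znode created after its own.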
StephenSorriaux pushed a commit that referenced this issue Feb 7, 2022