Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race condition in TreeCache leads to wrong sequence of cache events #645

Open
westphahl opened this issue Jun 18, 2021 · 0 comments

Comments

@westphahl
Copy link
Contributor

westphahl commented Jun 18, 2021

There seems to be a race condition in the implementation of the TreeCache recipe where a wrong sequence of TreeEvents is delivered to the cache listener.

Expected Behavior

The TreeEvents are delivered in order and with the correct event type and data.

Actual Behavior

An update of a node immediately followed by a delete of that node can lead to a wrong sequence of TreeEvents.

2021-06-18 08:43:06,973 DEBUG:TEST: tree event: (2, ('/root/branch', b'middle', ZnodeStat(czxid=14010, mzxid=14012, ctime=1623998586959, mtime=1623998586967, version=2, cversion=0, aversion=0, ephemeralOwner=0, dataLength=6, numChildren=0, pzxid=14010)))
2021-06-18 08:43:06,973 DEBUG:TEST: tree event: (0, ('/root/branch', b'end', ZnodeStat(czxid=14010, mzxid=14013, ctime=1623998586959, mtime=1623998586969, version=3, cversion=0, aversion=0, ephemeralOwner=0, dataLength=3, numChildren=0, pzxid=14010)))

The race seems to happen as follows:

  1. node updated event is processed by TreeNode._process_watch() and triggers an async get.
  2. async get retrieves the data
  3. node is deleted
  4. node deleted event is processed by TreeNode._process_watch() and publishes a TreeEvent.NODE_DELETED event with the current (old) data/version and sets the data to None + state to STATE_DEAD
  5. async get result is processed by TreeNode_process_result() and publishes a TreeEvent.NODE_ADDED event with the data that was already deleted.

The NODE_ADDED event seems to be created because the state was set to STATE_DEAD already.

Snippet to Reproduce the Problem

The below script probably needs to be run multiple times in order to observe the described issue.

import logging
logging.basicConfig(format="%(asctime)s %(levelname)s:%(message)s", level=logging.DEBUG)

import sys
import time

from kazoo.client import KazooClient
from kazoo.recipe.cache import TreeCache, TreeEvent


k = KazooClient()
k.start()


def _watcher(tree_event):
    logging.debug("TEST: tree event: %s", tree_event)
    if (tree_event.event_type == TreeEvent.NODE_ADDED
        and tree_event.event_data.data == b"end"):
        logging.error("TEST: node added event with deleted data")



t = TreeCache(k, "/root")
t.listen(_watcher)
t.start()

b = "/root/branch"
try:
    for i in range(10):
        k.create(b, b"create")
        k.set(b, b"start")
        k.set(b, b"middle")
        k.set(b, b"end")
        k.delete(b)
        time.sleep(1)
finally:
    t.close()
    k.stop()

Logs with logging in DEBUG mode

2021-06-18 08:43:06,958 DEBUG:Sending request(xid=14): Create(path='/root/branch', data=b'create', acl=[ACL(perms=31, acl_list=['ALL'], id=Id(scheme='world', id='anyone'))], flags=0)
2021-06-18 08:43:06,960 DEBUG:Received EVENT: Watch(type=4, state=3, path='/root')
2021-06-18 08:43:06,961 DEBUG:process_watch: WatchedEvent(type='CHILD', state='CONNECTED', path='/root')
2021-06-18 08:43:06,962 DEBUG:Received response(xid=14): '/root/branch'
2021-06-18 08:43:06,962 DEBUG:Sending request(xid=15): GetChildren(path='/root', watcher=<bound method TreeNode._process_watch of <kazoo.recipe.cache.TreeNode object at 0x7fd805553ee0>>)
2021-06-18 08:43:06,963 DEBUG:Sending request(xid=16): SetData(path='/root/branch', data=b'start', version=-1)
2021-06-18 08:43:06,964 DEBUG:Received response(xid=15): ['branch']
2021-06-18 08:43:06,965 DEBUG:Received response(xid=16): ZnodeStat(czxid=14010, mzxid=14011, ctime=1623998586959, mtime=1623998586964, version=1, cversion=0, aversion=0, ephemeralOwner=0, dataLength=5, numChildren=0, pzxid=14010)
2021-06-18 08:43:06,965 DEBUG:process_result: get_children /root
2021-06-18 08:43:06,966 DEBUG:Sending request(xid=17): SetData(path='/root/branch', data=b'middle', version=-1)
2021-06-18 08:43:06,967 DEBUG:Sending request(xid=18): GetData(path='/root/branch', watcher=<bound method TreeNode._process_watch of <kazoo.recipe.cache.TreeNode object at 0x7fd8044ec160>>)
2021-06-18 08:43:06,967 DEBUG:Sending request(xid=19): GetChildren(path='/root/branch', watcher=<bound method TreeNode._process_watch of <kazoo.recipe.cache.TreeNode object at 0x7fd8044ec160>>)
2021-06-18 08:43:06,968 DEBUG:Received response(xid=17): ZnodeStat(czxid=14010, mzxid=14012, ctime=1623998586959, mtime=1623998586967, version=2, cversion=0, aversion=0, ephemeralOwner=0, dataLength=6, numChildren=0, pzxid=14010)
2021-06-18 08:43:06,969 DEBUG:Received response(xid=18): (b'middle', ZnodeStat(czxid=14010, mzxid=14012, ctime=1623998586959, mtime=1623998586967, version=2, cversion=0, aversion=0, ephemeralOwner=0, dataLength=6, numChildren=0, pzxid=14010))
2021-06-18 08:43:06,969 DEBUG:process_result: get /root/branch
2021-06-18 08:43:06,969 DEBUG:public event: (0, ('/root/branch', b'middle', ZnodeStat(czxid=14010, mzxid=14012, ctime=1623998586959, mtime=1623998586967, version=2, cversion=0, aversion=0, ephemeralOwner=0, dataLength=6, numChildren=0, pzxid=14010)))
2021-06-18 08:43:06,969 DEBUG:TEST: tree event: (0, ('/root/branch', b'middle', ZnodeStat(czxid=14010, mzxid=14012, ctime=1623998586959, mtime=1623998586967, version=2, cversion=0, aversion=0, ephemeralOwner=0, dataLength=6, numChildren=0, pzxid=14010)))
2021-06-18 08:43:06,969 DEBUG:Received response(xid=19): []
2021-06-18 08:43:06,969 DEBUG:Sending request(xid=20): SetData(path='/root/branch', data=b'end', version=-1)
2021-06-18 08:43:06,969 DEBUG:process_result: get_children /root/branch
2021-06-18 08:43:06,970 DEBUG:Received EVENT: Watch(type=3, state=3, path='/root/branch')
2021-06-18 08:43:06,970 DEBUG:process_watch: WatchedEvent(type='CHANGED', state='CONNECTED', path='/root/branch')
2021-06-18 08:43:06,970 DEBUG:Received response(xid=20): ZnodeStat(czxid=14010, mzxid=14013, ctime=1623998586959, mtime=1623998586969, version=3, cversion=0, aversion=0, ephemeralOwner=0, dataLength=3, numChildren=0, pzxid=14010)
2021-06-18 08:43:06,970 DEBUG:Sending request(xid=21): GetData(path='/root/branch', watcher=<bound method TreeNode._process_watch of <kazoo.recipe.cache.TreeNode object at 0x7fd8044ec160>>)
2021-06-18 08:43:06,970 DEBUG:Sending request(xid=22): Delete(path='/root/branch', version=-1)
2021-06-18 08:43:06,970 DEBUG:Received response(xid=21): (b'end', ZnodeStat(czxid=14010, mzxid=14013, ctime=1623998586959, mtime=1623998586969, version=3, cversion=0, aversion=0, ephemeralOwner=0, dataLength=3, numChildren=0, pzxid=14010))
2021-06-18 08:43:06,971 DEBUG:Received EVENT: Watch(type=2, state=3, path='/root/branch')
2021-06-18 08:43:06,973 DEBUG:process_watch: WatchedEvent(type='DELETED', state='CONNECTED', path='/root/branch')
2021-06-18 08:43:06,973 DEBUG:public event: (2, ('/root/branch', b'middle', ZnodeStat(czxid=14010, mzxid=14012, ctime=1623998586959, mtime=1623998586967, version=2, cversion=0, aversion=0, ephemeralOwner=0, dataLength=6, numChildren=0, pzxid=14010)))
2021-06-18 08:43:06,973 DEBUG:process_watch: WatchedEvent(type='DELETED', state='CONNECTED', path='/root/branch')
2021-06-18 08:43:06,973 DEBUG:process_result: get /root/branch
2021-06-18 08:43:06,973 DEBUG:public event: (0, ('/root/branch', b'end', ZnodeStat(czxid=14010, mzxid=14013, ctime=1623998586959, mtime=1623998586969, version=3, cversion=0, aversion=0, ephemeralOwner=0, dataLength=3, numChildren=0, pzxid=14010)))
2021-06-18 08:43:06,973 DEBUG:TEST: tree event: (2, ('/root/branch', b'middle', ZnodeStat(czxid=14010, mzxid=14012, ctime=1623998586959, mtime=1623998586967, version=2, cversion=0, aversion=0, ephemeralOwner=0, dataLength=6, numChildren=0, pzxid=14010)))
2021-06-18 08:43:06,973 DEBUG:TEST: tree event: (0, ('/root/branch', b'end', ZnodeStat(czxid=14010, mzxid=14013, ctime=1623998586959, mtime=1623998586969, version=3, cversion=0, aversion=0, ephemeralOwner=0, dataLength=3, numChildren=0, pzxid=14010)))
2021-06-18 08:43:06,973 ERROR:TEST: node added event with deleted data
2021-06-18 08:43:06,973 DEBUG:Received EVENT: Watch(type=4, state=3, path='/root')
2021-06-18 08:43:06,973 DEBUG:process_watch: WatchedEvent(type='CHILD', state='CONNECTED', path='/root')
2021-06-18 08:43:06,973 DEBUG:Received response(xid=22): True
2021-06-18 08:43:06,973 DEBUG:Sending request(xid=23): GetChildren(path='/root', watcher=<bound method TreeNode._process_watch of <kazoo.recipe.cache.TreeNode object at 0x7fd805553ee0>>)
2021-06-18 08:43:06,974 DEBUG:Received response(xid=23): []
2021-06-18 08:43:06,974 DEBUG:process_result: get_children /root

Specifications

  • Kazoo version: 2.8.0
  • Result of pip list command:
Package    Version
---------- -------
kazoo      2.8.0
pip        21.1.1
setuptools 56.0.0
six        1.16.0
  • Zookeeper version: 3.6.2 (Docker)
  • Zookeeper configuration: -
  • Python version: Python 3.9.5
  • OS: Linux 5.12.9-arch1-1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants