Skip to content

Commit

Permalink
fix(core): do not allow responses to choke request and ping processing
Browse files Browse the repository at this point in the history
Without this patch, a single select event is processed by iteration in
the 'ConnectionHandler' event loop.

In a scenario where the client issues a large number of async requests
with an important amplification factor, e.g. 'get_children_async' on a
large node, it is possible for the 'select' operation to almost always
return a "response ready" socket--as the server is often able to
process, serialize and ship a new reponse while Kazoo processes the
previous one.

That response socket often (always?) ends up at the beginning of the
list returned by 'select'.

As only 'select_result[0]' is processed in the loop, this can cause
the client to ignore the "request ready" FD for a long time, during
which no requests or pings are sent.

In effect, asynchronously "browsing" a large tree of nodes can stretch
that duration to the point where it exceeds the timeout--causing the
client to lose its session.

This patch considers both descriptors after 'select', and also
arranges for pings to be sent in case it encounters an "unending"
stream of responses to requests which were sent earlier.
  • Loading branch information
ztzg authored and jeffwidman committed Dec 13, 2020
1 parent b2f7a46 commit 89e0660
Showing 1 changed file with 21 additions and 10 deletions.
31 changes: 21 additions & 10 deletions kazoo/protocol/connection.py
Expand Up @@ -561,7 +561,6 @@ def _connect_loop(self, retry):
def _connect_attempt(self, host, hostip, port, retry):
client = self.client
KazooTimeoutError = self.handler.timeout_exception
close_connection = False

self._socket = None

Expand All @@ -582,13 +581,14 @@ def _connect_attempt(self, host, hostip, port, retry):
connect_timeout = connect_timeout / 1000.0
retry.reset()
self.ping_outstanding.clear()
last_send = time.time()
with self._socket_error_handling():
while not close_connection:
while True:
# Watch for something to read or send
jitter_time = random.randint(0, 40) / 100.0
jitter_time = random.randint(1, 40) / 100.0
deadline = last_send + read_timeout / 2.0 - jitter_time
# Ensure our timeout is positive
timeout = max([read_timeout / 2.0 - jitter_time,
jitter_time])
timeout = max([deadline - time.time(), jitter_time])
s = self.handler.select([self._socket, self._read_sock],
[], [], timeout)[0]

Expand All @@ -597,12 +597,23 @@ def _connect_attempt(self, host, hostip, port, retry):
self.ping_outstanding.clear()
raise ConnectionDropped(
"outstanding heartbeat ping not received")
self._send_ping(connect_timeout)
elif s[0] == self._socket:
response = self._read_socket(read_timeout)
close_connection = response == CLOSE_RESPONSE
else:
self._send_request(read_timeout, connect_timeout)
if self._socket in s:
response = self._read_socket(read_timeout)
if response == CLOSE_RESPONSE:
break
# Check if any requests need sending before proceeding
# to process more responses. Otherwise the responses
# may choke out the requests. See PR#633.
if self._read_sock in s:
self._send_request(read_timeout, connect_timeout)
# Requests act as implicit pings.
last_send = time.time()
continue

if time.time() >= deadline:
self._send_ping(connect_timeout)
last_send = time.time()
self.logger.info('Closing connection to %s:%s', host, port)
client._session_callback(KeeperState.CLOSED)
return STOP_CONNECTING
Expand Down

0 comments on commit 89e0660

Please sign in to comment.