
Kazoo uses the entire session timeout on each attempted TCP connection before negotiating a session #517

Closed
graysonchao opened this issue Jul 25, 2018 · 1 comment


@graysonchao

When initializing a KazooClient and connecting to a ZK ensemble, the user has two places where they can pass in a timeout:

  1. kazoo.client.KazooClient takes a timeout kwarg that specifies the requested client session timeout, which is used to negotiate a session timeout with the server. I'll refer to the value passed to the constructor as the client session timeout, and the eventual negotiated result as the negotiated session timeout.
  2. KazooClient.start takes a timeout kwarg that specifies how long to wait overall for a connection to ZK; the timer is not reset on retries or when switching to a different address. I'll call this the client start timeout. (Both are shown in the sketch below.)
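
To make the distinction concrete, here is a minimal sketch of where each timeout is passed; the host list and values are illustrative:

```python
from kazoo.client import KazooClient

# Client session timeout: the requested session timeout, used to negotiate
# the actual (negotiated) session timeout with the server.
client = KazooClient(
    hosts="zk1:2181,zk2:2181,zk3:2181",
    timeout=10.0,  # seconds
)

# Client start timeout: how long start() waits overall for a connection;
# the timer is not reset on retries or when switching hosts.
client.start(timeout=15)
```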

Before a session has ever been negotiated with the server (i.e. when trying to connect to ZK initially) Kazoo passes the client session timeout to kazoo.handlers.utils.create_tcp_connection, which means that the entire session timeout is used as the timeout for each attempted TCP connection. But later, the negotiated session timeout is divided by the number of hosts in the ensemble, and this divided value is used when retrying requests or recovering from a broken connection. Assuming that the client session timeout and the negotiated session timeout are equal, this means that the timeout for each attempted TCP connection is much shorter after a session has been negotiated.
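
To illustrate the asymmetry described above, here is a rough sketch of the behavior (not kazoo's actual code; the function and argument names are made up for illustration):

```python
def effective_connect_timeout(session_timeout, negotiated_timeout, hosts, have_session):
    """Approximate per-connection timeout as described above (illustrative only)."""
    if not have_session:
        # Before any session exists, the full client session timeout is used
        # for each attempted TCP connection.
        return session_timeout
    # After negotiation, the timeout is divided across the hosts in the ensemble.
    return negotiated_timeout / len(hosts)

# With a 10s timeout and a 3-host ensemble:
#   before negotiation -> 10s per connection attempt
#   after negotiation  -> ~3.3s per connection attempt
```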

This doesn't seem to make sense. To be consistent, it seems to me that Kazoo should divide the passed client session timeout by the number of hosts when trying to create an initial connection, so that it can try to connect to each host in the ensemble. Otherwise, when the client start timeout is equal to the client session timeout, a single failed connection can eat up the entire timeout and cause Kazoo to throw a timeout_exception even if there are healthy hosts in the connect string.

If a user sets a low client session timeout, they might run into issues where reconnecting or submitting requests times out much faster than they expect, because this timeout is later divided by the ensemble size. If they set the timeouts equal or have the client session timeout higher, then they could burn their entire KazooClient.start timeout on a single unresponsive host without ever trying the other hosts in the connect string.

One workaround is to set the client start timeout to NUM_HOSTS * the client session timeout, but this seems odd to me: why would we want a different timeout for creating the session than for keeping it alive?
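
For concreteness, that workaround would look something like this (host list and values are illustrative):

```python
from kazoo.client import KazooClient

hosts = "zk1:2181,zk2:2181,zk3:2181"
session_timeout = 10.0

client = KazooClient(hosts=hosts, timeout=session_timeout)
# Give start() enough budget to try every host, even if a single unresponsive
# host burns a full session timeout's worth of time.
client.start(timeout=len(hosts.split(",")) * session_timeout)
```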

At any rate, I'm not sure if this is intended behavior. Could someone please help me understand? Of course, if it's not intended, I'm happy to submit a PR.

I think this issue was touched on in #205, btw.

@StephenSorriaux
Member

Hi @graysonchao,

Thanks for opening this issue. It seems to me that the problem you are describing was fixed in PR #540: the initial Connect() request (which carries the client session timeout you are describing) is now sent with a timeout that takes into account the number of hosts in the ZK ensemble.

I'm closing the issue now but feel free to re-open it in case I misunderstood the underlying problem.
