Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where kazoo client returns SESSION EXPIRED when in CONNECTING state #570

Closed
wants to merge 2 commits into from

Conversation

prajay
Copy link

@prajay prajay commented Aug 28, 2019

kazoo client returns a SessionExpired when its state is in CONNECTING. This is not an indication of session expiry and could just be that the client is still re-establishing a connection to the server after a network blip. Hence, the error returned should be a ConnectionLoss and not a SessionExpired

@prajay
Copy link
Author

prajay commented Aug 28, 2019

Would it be possible to restart the failed jobs on travis-ci? I can see that all jobs succeeded when I triggered a build on my fork
Screen Shot 2019-08-27 at 10 30 02 PM

@StephenSorriaux
Copy link
Member

StephenSorriaux commented Aug 28, 2019

Hi,

Thanks for this PR, I will link into it. Don't bother with the tests for now, I will re-trigger the Travis build.

@prajay
Copy link
Author

prajay commented Sep 5, 2019

Hey @StephenSorriaux,

Just wanted to check on this. Does the change seem reasonable to you?

@StephenSorriaux
Copy link
Member

StephenSorriaux commented Sep 6, 2019

Hey @prajay,

I did not have time to go deep in the reason why we chose to raise SessionExpiredError in this case. Also, this change is not that straightforward since some users might be catching this exception for their use case, so ,as if, changing it to ConnectionLoss would be a breaking change.

@ztzg
Copy link
Contributor

ztzg commented Nov 13, 2019

Hi @StephenSorriaux, @prajay,

A note to let you know that we are also seeing the issue addressed by this PR (also reported as
#374) and would like to see something akin to this patch in Kazoo master.

The main issue is that spurious "session expired" exceptions can cause applications to tear down much more state than required, and to try to rebuild it—further stressing an ensemble which may just have, for example, lost its leader.

I see that the goal of the commit (by @bbangert) which introduced the CONNECTING check was to allow applications to distinguish between a method invoked on a CLOSED client and a server-generated SESSIONEXPIRED (error code -112): 4eabbd1

I basically see three ways of allowing applications to discriminate between these different states while retaining "full" backward compatibility:

  1. No code change, but a "redefinition" of "session expired" to SessionExpiredError + client.client_state == KeeperState.EXPIRED_SESSION.

  2. Keeping the KeeperState, or alternatively, "fake codes" in the existing exception. Something akin to:

    @_zookeeper_exception(-112)
    class SessionExpiredError(ZookeeperError):
        def __init__(self, *, state=None):
            self.state = state
  3. Taking inspiration of the commit above: introducing distinct subclasses of SessionExpiredError (one per actual condition) and redirecting EXCEPTIONS[-112] to one of them.

Each of these options is "distasteful" in its own way, but I could nevertheless whip up patches/PRs for #3 and/or #2 if you think they would be worth a closer look.

(#3 would probably be more "natural" if we manage to find "reasonable" subclass names. ConnectingError and ServerSessionExpiredError are the best I could come up with so far. Suggestions are welcome!)

@StephenSorriaux
Copy link
Member

StephenSorriaux commented Nov 17, 2019

Hi @ztzg,

Thanks for taking some time to share your thoughts about this subject.

I understand what you want to do the and I start to think that the CONNECTING state should actually not raise any exception but, instead, just makes the request be queued as it is done with the CONNECTED state. Since the request stays in the queue as long as no connection is available, it should work fine but that definitely needs some testing (no side-effect on recipes, ensure that we don't keep too much requests, etc.).

@ztzg
Copy link
Contributor

ztzg commented Nov 19, 2019

Not raising an exception on CONNECTING does indeed sound like the best solution (and matches what other clients are doing).

I had written it off as requiring invasive changes and/or going against the "philosophy" of Kazoo (no setWatches, etc.). I could have a closer look and see if I can come up with a proof of concept which is not too disruptive, if you think it has a chance to move forward.

@StephenSorriaux
Copy link
Member

StephenSorriaux commented Nov 19, 2019

I really would like to more forward with this issue, so yes any contributions is very welcomed.

@jeffwidman
Copy link
Member

jeffwidman commented Feb 6, 2020

Nudge @ztzg

@ztzg
Copy link
Contributor

ztzg commented Feb 6, 2020

@jeffwidman: Still in my TODO list, but other things got on top. You're right that I should move forward a bit on this front, though. I'll keep your nudge in mind! :)

ztzg added a commit to ztzg/kazoo that referenced this issue Feb 9, 2020
…n-zk#374)

As discussed in python-zk#570 (comment):

  * Requests issued in 'CONNECTING' state get queued, and a "token"
    added to the pipe, as in the 'CONNECTED' state;

  * All test pass;

  * TODO: This patch does not try to limit the maximum queue length.
    How should that be controlled?  A new 'KazooClient' parameter?  If
    so, what should it default to?
ztzg added a commit to ztzg/kazoo that referenced this issue Feb 9, 2020
…n-zk#374)

As discussed in python-zk#570 (comment):

With this patch, requests issued while the client is in the
'CONNECTING' state get queued instead of raising a (misleading)
'SessionExpiredError'.

  * [TODO] This patch does not prevent requests which have been queued
    but not emitted from being rejected with 'ConnectionLoss'.  It
    should also get rid of the second part of '_notify_pending',
    shouldn't it?

  * [TODO] This patch does not try to limit the maximum queue length,
    which should probably be controlled via a new 'KazooClient'
    parameter.  (How about 'max_queue_length'?  What should it default
    to?)
ztzg added a commit to ztzg/kazoo that referenced this issue Feb 9, 2020
…n-zk#374)

As discussed in python-zk#570 (comment):

With this patch, requests issued while the client is in the
'CONNECTING' state get queued instead of raising a (misleading)
'SessionExpiredError'.

  * [TODO] This patch does not prevent requests which have been queued
    but not emitted from being rejected with 'ConnectionLoss'.  It
    should also get rid of the second part of '_notify_pending',
    shouldn't it?

  * [TODO] This patch does not try to limit the maximum queue length,
    which should probably be controlled via a new 'KazooClient'
    parameter.  (How about 'max_queue_length'?  What should it default
    to?)
ztzg added a commit to ztzg/kazoo that referenced this issue Feb 14, 2020
…#374)

As discussed in python-zk#570 (comment):

With this patch, requests issued while the client is in the
'CONNECTING' state get queued instead of raising a (misleading)
'SessionExpiredError'.
ztzg added a commit to ztzg/kazoo that referenced this issue Feb 17, 2020
…#374)

With this patch, requests issued while the client is in the
'CONNECTING' state get queued instead of raising a misleading
'SessionExpiredError'.

This fixes python-zk#374, and brings
Kazoo more in line with the Java and C clients.

See the 'kazoo.client.KazooClient.state' documentation as well as
these discussions for more details:

python-zk#570 (comment)
python-zk#583 (comment)
@StephenSorriaux
Copy link
Member

StephenSorriaux commented Feb 23, 2020

Thanks @prajay for your time on this PR, I am closing it in favor of #588.

ztzg added a commit to ztzg/kazoo that referenced this issue Feb 24, 2020
…#374)

With this patch, requests issued while the client is in the
'CONNECTING' state get queued instead of raising a misleading
'SessionExpiredError'.

This fixes python-zk#374, and brings
Kazoo more in line with the Java and C clients.

See the 'kazoo.client.KazooClient.state' documentation as well as
these discussions for more details:

python-zk#570 (comment)
python-zk#583 (comment)
StephenSorriaux pushed a commit that referenced this issue Mar 9, 2020
With this patch, requests issued while the client is in the
'CONNECTING' state get queued instead of raising a misleading
'SessionExpiredError'.

This fixes #374, and brings
Kazoo more in line with the Java and C clients.

See the 'kazoo.client.KazooClient.state' documentation as well as
these discussions for more details:

#570 (comment)
#583 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants