Fix backoff when connecting to cluster with leader election in progress #32

Merged: 1 commit merged into ewhauser:trunk on Feb 8, 2016

@anttirt commented Feb 3, 2016

When the client attempts to connect to a cluster that is in the middle of a leader re-election, the TCP connection initially succeeds but is then closed by the ZK node. The current backoff logic treats the successful TCP connect as a "success" and resets the backoff, so the client reconnects repeatedly with no delay at all. With many clients this can put considerable load on the ZK servers at a very critical moment, potentially delaying the re-election and recovery.

The effect can be seen with https://github.com/anttirt/zktestcluster (you'll need VirtualBox and Vagrant) and any client connecting to it using ZooKeeperNet. Set up the cluster (vagrant up), connect a client to it and run vagrant -c ssh //vagrant/restart-leader.sh. You should see the client making dozens of reconnect attempts with no backoff.

This patch delays resetting the backoff until we have a confirmed session on the ZK node.
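
For readers who want the idea without reading the diff, here is a minimal sketch of the behaviour in Python. It is not the ZooKeeperNet code: the class name `ReconnectBackoff`, the helper `simulate_attempts`, the callback names, and the delay values are all hypothetical and chosen only for illustration. The point it tries to show is that a successful TCP connect leaves the backoff alone, and only a confirmed session resets it.

```python
from dataclasses import dataclass


@dataclass
class ReconnectBackoff:
    """Exponential backoff that resets only on a confirmed session (illustrative)."""

    base_delay: float = 0.5   # hypothetical initial delay, in seconds
    max_delay: float = 30.0   # hypothetical cap, in seconds
    failures: int = 0         # consecutive attempts that ended without a session

    def next_delay(self) -> float:
        """Seconds to wait before the next connection attempt."""
        if self.failures == 0:
            return 0.0
        return min(self.max_delay, self.base_delay * 2 ** (self.failures - 1))

    def on_tcp_connected(self) -> None:
        # Deliberately a no-op. Resetting here is the behaviour the PR
        # describes as the bug: during leader election the socket opens
        # and is then closed by the server before a session exists.
        pass

    def on_disconnected(self) -> None:
        # The attempt ended without a session, so back off further.
        self.failures += 1

    def on_session_established(self) -> None:
        # Only a confirmed session counts as success.
        self.failures = 0


def simulate_attempts(backoff, outcomes):
    """Return the delay waited before each attempt.

    Each outcome is "closed" (TCP connected, then closed by the server)
    or "session" (TCP connected and the session was confirmed).
    """
    delays = []
    for outcome in outcomes:
        delays.append(backoff.next_delay())
        backoff.on_tcp_connected()
        if outcome == "session":
            backoff.on_session_established()
        else:
            backoff.on_disconnected()
    return delays


backoff = ReconnectBackoff()
print(simulate_attempts(backoff, ["closed", "closed", "closed", "session"]))
# prints [0.0, 0.5, 1.0, 2.0]
```

If `on_tcp_connected` reset `failures` to zero instead, every delay in that list would be 0.0, which matches the back-to-back reconnects seen during re-election.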

ewhauser added a commit that referenced this pull request Feb 8, 2016
Fix backoff when connecting to cluster with leader election in progress
@ewhauser ewhauser merged commit 927dbb1 into ewhauser:trunk Feb 8, 2016