Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully resolve and randomize the client host list at connect time #372

Merged
merged 10 commits into from
Aug 5, 2016

Conversation

toddpalino
Copy link
Contributor

When using a round-robin DNS record set for a Zookeeper ensemble (i.e. a single A record that points to all the IP addresses of the servers in the ensemble), we found that Kazoo was always connecting to the same server. It appears that the underlying socket calls that Kazoo uses to set up connections do not rotate the addresses in a round robin properly (if at all).

This is a refactor of PR #360, which resolved the hosts at client setup time. As pointed out, that can generate some issues if the DNS records change while an application is running. A cleaner solution, that more closely matches the current behavior, is to expand the host list in the same way at connect time. In order to do this properly, randomization of the host list was moved to connect time as well. Otherwise we could end up with one host in the hostlist that is a DNS round robin, and the connect loop would always pick the first item in the RRset.

@toddpalino
Copy link
Contributor Author

FYI - There appears to be a problem with the tests. The Zookeeper mirrors have removed 3.4.6 from the repository, and this is causing huge problems.

@harlowja
Copy link
Contributor

Hmmm, that's not good.

@toddpalino
Copy link
Contributor Author

OK, I got this to work by changing the URL for pulling ZK to point to archive.apache.org, which apparently where the older versions are being stored. May want to separately consider changing the tests to use 3.4.7 instead of 3.4.6, at which point the script can be repointed to a mirror.

Also, apologize for the commit spam on this to get it done. My git-fu is not mighty.

@harlowja
Copy link
Contributor

Thanks much (for also fixing the zk archive location to).

for rhost in socket.getaddrinfo(host.strip(), port, 0, 0, socket.IPPROTO_TCP):
host_ports.append((rhost[4][0], rhost[4][1]))
except socket.gaierror:
# Skip hosts that don't resolve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do u think about logging this? Seems like a warning would at least be useful (as someone giving an unresolveable host seems not that good).

@micahhausler
Copy link

Any status on this? I'd love for this to clear, I'm using project that relies on this. I'd be happy to pitch in!

@toddpalino
Copy link
Contributor Author

Sorry about that... all for the want of a log statement!

I've updated the PR with the logging (good idea), but I'm having problems with intermittent failures in Travis, of course. Nothing related to this code. I'll try and get a clean test.

@toddpalino toddpalino closed this Apr 23, 2016
@toddpalino toddpalino reopened this Apr 23, 2016
@toddpalino toddpalino closed this Apr 23, 2016
@toddpalino toddpalino reopened this Apr 23, 2016
@toddpalino toddpalino closed this Apr 23, 2016
@toddpalino toddpalino reopened this Apr 23, 2016
@toddpalino toddpalino closed this Apr 23, 2016
@toddpalino toddpalino reopened this Apr 23, 2016
@toddpalino toddpalino closed this Apr 23, 2016
@toddpalino toddpalino reopened this Apr 23, 2016
@toddpalino toddpalino closed this Apr 24, 2016
@toddpalino toddpalino reopened this Apr 24, 2016
@toddpalino toddpalino closed this Apr 24, 2016
@toddpalino toddpalino reopened this Apr 24, 2016
@toddpalino toddpalino closed this Apr 24, 2016
@toddpalino toddpalino reopened this Apr 24, 2016
@toddpalino
Copy link
Contributor Author

Apologize for all the email spam on this as I tried to get a clean set of tests. I finally did. Can we merge this one now?

@toddpalino
Copy link
Contributor Author

Hey @harlowja, can we get this merged? We're now running into some serious issues even with our workaround internally because of IPv6 (and this would resolve it)

@bbangert
Copy link
Member

bbangert commented Aug 5, 2016

LGTM

@bbangert bbangert merged commit a4ef623 into python-zk:master Aug 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants