Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve hosts so that we properly randomize any round robin RRsets #360

Closed
wants to merge 4 commits into from
Closed

Conversation

toddpalino
Copy link
Contributor

When using a round-robin DNS record set for a Zookeeper ensemble (i.e. a single A record that points to all the IP addresses of the servers in the ensemble), we found that Kazoo was always connecting to the same server. It appears that the underlying socket calls that Kazoo uses to set up connections do not rotate the addresses in a round robin properly (if at all).

The simplest solution to this is to use a call to socket.getaddrinfo in the collect_hosts method when compiling the list of hosts from the connect string. This allows any hostnames to be dereferenced and provides a full set of IP addresses that can then be randomized by the client. This works regardless of whether a hostname, an IPv4, or an IPv6 address is provided, as socket.getaddrinfo handles it properly and returns the tuples needed for the socket calls.

@toddpalino
Copy link
Contributor Author

The test failure appears to be unrelated to anything I changed. Closing and reopening to try a fresh build

@toddpalino toddpalino closed this Oct 14, 2015
@toddpalino toddpalino reopened this Oct 14, 2015
@toddpalino
Copy link
Contributor Author

Continuing transient failures - a different set this time.

@toddpalino toddpalino closed this Oct 14, 2015
@toddpalino toddpalino reopened this Oct 14, 2015
@toddpalino
Copy link
Contributor Author

It's unclear to me why the build is failing. The errors change on each run, and they appear to be something inherent to either Travis CI or the test suite, not the changes I have made. The initial failure was due to a test that needed to be changed for the new code, but that's it.

@toddpalino toddpalino closed this Oct 20, 2015
@toddpalino toddpalino reopened this Oct 20, 2015
@toddpalino
Copy link
Contributor Author

OK, now that this is passing, I think it can be merged!

@toddpalino toddpalino closed this Oct 21, 2015
@toddpalino toddpalino reopened this Oct 21, 2015
@toddpalino
Copy link
Contributor Author

Thanks @rgs1. That's a good catch. All fixed now with explicit mapping of the tuple.

@bringhurst
Copy link
Contributor

+1 on this, but only after the CHANGES.rst is updated with a short description of the fix

@toddpalino
Copy link
Contributor Author

It would seem that CHANGES.rst is more appropriate to be modified when the next version of kazoo is cut by the kazoo team. Which may or may not include just this PR.

Can we get this merged?

@harlowja
Copy link
Contributor

harlowja commented Nov 3, 2015

So what I would like and it doesn't seem to hard is to add the following to the CHANGES.rst file.

2.X.Y (TBD)
-----------

Features
********

Bug Handling
************

Documentation
*************

Add your change under the appropriate section and then it doesn't become the kazoo approvers job to track all the things that go into a release (and saves the volunteers that maintain this library some work).
Seem doable?

@toddpalino toddpalino closed this Nov 3, 2015
@toddpalino toddpalino reopened this Nov 3, 2015
@toddpalino toddpalino closed this Nov 3, 2015
@toddpalino toddpalino reopened this Nov 3, 2015
@toddpalino toddpalino closed this Nov 3, 2015
@toddpalino toddpalino reopened this Nov 3, 2015
@toddpalino toddpalino closed this Nov 3, 2015
@toddpalino toddpalino reopened this Nov 3, 2015
@toddpalino
Copy link
Contributor Author

Done, @harlowja, and managed to get it to pass travis-ci again.

@harlowja
Copy link
Contributor

Thanks much 👍

@chchen
Copy link

chchen commented Dec 12, 2015

I ran into this issue recently, and would love to see this merged in!

@@ -17,7 +18,10 @@ def collect_hosts(hosts, randomize=True):
if host is None:
raise ValueError("bad hostname")
port = int(res.port) if res.port else 2181
result.append((host.strip(), port))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So after thinking about this I am wondering how RRsets that grow or shrink would be handled here. It seems once this is called now its relying on python code to do the name resolution, whereas previously it was relying on dns to correctly round-robin on reconnect... Am I wrong in saying that resolving the dns hostname at this time would basically make the client itself unable to take advantage of any changes in a hostnames A records that happen after the fact...

@harlowja
Copy link
Contributor

Just one comment that I'd like cleared up, now that I am thinking about what this means for people that were giving kazoo a CNAME and altering (after the fact) the A records under that CNAME, it seems like doing this resolution of A records at set_hosts call time would basically lose the advantage of having a CNAME in the first place right?

@harlowja
Copy link
Contributor

Maybe it's something like:

def _expand_hosts(host, port):
    for rhost in socket.getaddrinfo(host.strip(), port, 0, 0, socket.IPPROTO_TCP):
        yield (rhost[4][0], rhost[4][1])

def _connect_loop(self, retry):
    # Iterate through the hosts a full cycle before starting over
    status = None
    hosts_ports = []
    for host, port in self.client.hosts:
        hosts_ports.extend(_expand_hosts(host, port))
    for host, port in hosts_ports:
        if self.client._stopped.is_set():
            status = STOP_CONNECTING
            break
        status = self._connect_attempt(host, port, retry)
        if status is STOP_CONNECTING:
            break
    if status is STOP_CONNECTING:
        return STOP_CONNECTING
    else:
        raise ForceRetryError('Reconnecting')

@toddpalino
Copy link
Contributor Author

Yeah, I can definitely see where this can cause problems. While it may be fine for short-lived use cases that connect and go away, anything else would have problems with changes to the underlying DNS (which is one of the reasons we use RRsets in the first place). I like the idea of doing it in the connection handler upon each connection attempt. That's more in line with when DNS resolution is done now.

Given that this is still a PR, I'm going to drop this PR and create a new one with the changes. No reason to have speculative changes in the history.

@toddpalino
Copy link
Contributor Author

Resubmitted at PR #372

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants