Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand RR DNS entries into list of available hosts #215

Closed
ltucker opened this issue Jun 18, 2014 · 25 comments
Closed

Expand RR DNS entries into list of available hosts #215

ltucker opened this issue Jun 18, 2014 · 25 comments

Comments

@ltucker
Copy link

ltucker commented Jun 18, 2014

For example if I have multiple DNS entries for "zk.example.com" set up:

10.0.0.10
10.0.0.11
10.0.0.12
10.0.0.13

kazoo.hosts.collect_hosts("zk.example.com") will return a single node "zk.example.com" while the java client will expand this into 4 nodes with the listed ip addresses.

During a call to client.start(), Kazoo will try the hostname "zk.example.com" once (on one of the addresses) and then give up (unless retries are configured). If expanded, it would try each of these in a single try as if listed separately in the configuration.

Zookeeper FAQs mention this, not sure how expected/official it is:

http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A8
"DNS round robin: this setup involves one hostname and a list of IPs. ZK clients correctly make used of a list of IPs returned by DNS query. Thus this setup works the same way as multiple hostname (IP) argument to zookeeper_init."

@rgs1
Copy link
Contributor

rgs1 commented Jun 19, 2014

With what version? With > 1.3.1, what should happen is that create_tcp_connection is called:

https://github.com/python-zk/kazoo/blob/master/kazoo/handlers/utils.py#L54

which will call socket.create_connection:

https://github.com/python-git/python/blob/master/Lib/socket.py#L498

which should expand the DNS name via socket.getaddrinfo. In our use-case, Kazoo expands RR DNS entries just fine..

@lra
Copy link

lra commented Jun 19, 2014

We are using the stable version available on pypi: 1.3.1.
I tested with master and it exands the RR DNS.
So this problem will be fixed in the next release: 2.0?

@rgs1
Copy link
Contributor

rgs1 commented Jun 19, 2014

that's weird, 1.3.1 has that code too:

https://github.com/python-zk/kazoo/blob/1.3.1/kazoo/handlers/utils.py#L48

here's the commit that replaced socket() && connect() with socket.create_connection():

01d3f0d

As mentioned before, I am using 1.3.1 and it expands DNS entries just fine...

@rgs1
Copy link
Contributor

rgs1 commented Jun 19, 2014

Doubled checked that the released egg matches the branch (i.e.: downloaded https://pypi.python.org/packages/source/k/kazoo/kazoo-1.3.1.zip) and it does.. I wonder if maybe you have multiple versions of Kazoo installed and you are picking up the old one?

@lra
Copy link

lra commented Jun 19, 2014

As mentioned before, I am using 1.3.1 and it expands DNS entries just fine...

Is there a solid way to make sure it does? kz._connection._socket.getpeername() gives me only one server.

I wonder if maybe you have multiple versions of Kazoo installed and you are picking up the old one?

Our infrastructure is consistent, we are running 1.3.1 everywhere.

@rgs1
Copy link
Contributor

rgs1 commented Jun 19, 2014

Oh, kz._connection._socket.getpeername() refers to the currently connected endpoint, it will give you only the connected address.

Something you could do to check if it connects to a different address is force a reconnect via:

kz._connection._socket.shutdown(socket.SHUT_RDWR)

Then, when it reconnects, call kz._connection._socket.getpeername() again and check if you see a different address.

@lra
Copy link

lra commented Jun 20, 2014

It works, thanks for helping us making sure it does.

@lra
Copy link

lra commented Jun 20, 2014

We can close this =) (I can't)

@ltucker
Copy link
Author

ltucker commented Jun 20, 2014

Sorry, that's not what I mean... I'm talking about the list of addresses that are collected by kazoo.hosts.collect_hosts(...)

I realize these correctly rotate for any single call using the hostname, but they are not considered separate nodes as if they were listed out separately in the configuration.

During a zk.start() each address returned by collect_hosts() here is tried once (unless retries are configured). As is, one of the RR addresses (a random one) is tried, and then none of the others are tried.

@rgs1
Copy link
Contributor

rgs1 commented Jun 20, 2014

@ltucker: well but expanding the hostname to its IP addresses at the start is also problematic, you would then fail to get the changes during reconnects (i.e.: if IPs were added/removed).

Could you not use retries=N were N is the number of IPs associated to the hostname?

@ltucker
Copy link
Author

ltucker commented Jun 20, 2014

@rgs1 This is AFAICT what the java client does. I'd rather not set retries=IPs because that presumes that I know how many DNS entries are in DNS on the client end...

Suggestions:
Could we make the list of hosts consulted later (say a property that calls that function) so that it's recomputed when needed (or cached and recomputed at some reasonable interval)?

@rgs1
Copy link
Contributor

rgs1 commented Jun 20, 2014

Or I guess that if you real want retries=1 then for me it would be responsibility of the caller to pass in the expanded list of hosts (i.e.: map(lambda r: r[4][0], socket.getaddrinfo('hostname', 2181, socket.AF_INET, socket.SOCK_STREAM))).

@ltucker
Copy link
Author

ltucker commented Jun 20, 2014

@rgs1 yeah user doing expansion beforehand is decent, I guess a helper would be nice, since the format of the hosts list is a bit bizarre (single string where you can only have chroot on the last item etc?)

@rgs1
Copy link
Contributor

rgs1 commented Jun 20, 2014

I think the chroot affects all hosts... (i.e.: in your expanded list, '10.0.0.3:2181,10.0.0.1:2181,10.0.0.2:2181/home' would set it for all). Or am I missing something? See: https://github.com/python-zk/kazoo/blob/master/kazoo/client.py#L336

@ltucker
Copy link
Author

ltucker commented Jun 20, 2014

Yeah it works, I just mean you have to untangle it, expand, look for that chroot and then know to build that thing with the chroot on the end... I'm not even sure it's documented.

I'd rather see something show up that can solve this for others rather than everyone roll your own DNS expansion mess.

@rgs1
Copy link
Contributor

rgs1 commented Jun 20, 2014

Yeah, I agree we should add/improve the hosts/chroot's string documentation.

@ltucker
Copy link
Author

ltucker commented Jun 20, 2014

I think it's also possible to get a decent behavior when DNS changes if the hosts list can be made into a policy that consulted when needed -- by default it could be as-is, a precomputed list. For more exotic cases an expansion / re-lookup could happen.

@ltucker
Copy link
Author

ltucker commented Jun 20, 2014

well, I'll take your last remark to mean it's a problem you don't feel like solving. No big deal. Just letting you know the behavior was surprising, since it didn't match expectation set by the java client.

@rgs1
Copy link
Contributor

rgs1 commented Jun 20, 2014

oh no, just meant that adding docs is something really easy and could be done right away (and even so, I was just trying to help.. I am merely an sporadic Kazoo contributor so don't take my comments too seriously!).

@ltucker
Copy link
Author

ltucker commented Jun 20, 2014

hah okay, sorry. @bbangert can I get a re-read on this to see whether it's worth re-opening?

@lra
Copy link

lra commented Jun 20, 2014

I guess the weird connection string with an appeneded chroot is more a zookeeper thing than a kazoo one =(

hosts:

provide a connection string containing a comma separated list of host:port pairs

chroot:

An optional "chroot" suffix may also be appended to the connection string

http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkSessions

@novas0x2a
Copy link
Contributor

for what it's worth, we have a patch that makes the timeouts independent of the number of hosts (after all, if you set a timeout of 30s but have a 100 possible servers you can connect to.. you probably wanted the timeout at 30s, not 300ms.)

I punted on making the host list runtime-configurable (though it wouldn't be all that hard) but if you're using RR-dns you probably don't care about that.

https://gist.github.com/novas0x2a/57213e44c6db93814068

@hannosch hannosch reopened this Jun 22, 2014
@bbangert
Copy link
Member

If someone would like to provide a PR that expands the hosts in this number I think that'd be fine to add.

@toddpalino
Copy link
Contributor

I just created PR #360 to expand the hostlist. It uses socket.getaddrinfo in collect_hosts to expand the list.

@bbangert
Copy link
Member

bbangert commented Jun 1, 2017

PR merged as #372.

@bbangert bbangert closed this as completed Jun 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants