Properly truncate and randomly load balance large answers #237

Closed
jdef opened this issue Aug 29, 2015 · 12 comments

@jdef
Contributor

jdef commented Aug 29, 2015

Wondering if we need some kind of max-record-count that limits the number of IPs returned for A records, or addresses returned for SRVs, in large clusters (10k). For example, the "slave.mesos" name that maps to all slave IPs in the cluster probably doesn't scale well to large cluster sizes. The same goes for 'task.framework.name' when a very large number of tasks share the same name.

What's the value in getting back 10k addresses or 10k IPs?

@jdef jdef added the question label Aug 29, 2015
@tsenart
Contributor

tsenart commented Sep 1, 2015

@jdef: I think our answers should be limited to what fits in a DNS UDP datagram without truncation. I don't think we need to limit the number of IPs held in server memory, only the number included in DNS answers. We should randomize which IPs are included in the answer.

@tsenart tsenart changed the title from "cap needed for number of ips/addresses in large clusters?" to "Properly truncate and randomly load balance large answers" Sep 10, 2015
@tsenart tsenart added bug and removed question labels Sep 10, 2015
@tsenart tsenart added this to the v1.0.0 milestone Oct 6, 2015
@discordianfish
Contributor

@tsenart But this would break, for example, Prometheus's DNS-based service discovery (SD), which expects us to return all slaves.

@tsenart
Contributor

tsenart commented Oct 19, 2015

@discordianfish: We need to do this regardless of whether we support EDNS(0), but EDNS(0) support should alleviate the problem for Prometheus, as long as it uses an EDNS(0)-enabled DNS client. The upper bound on the number of records in the answer section is 65535, because the ANCOUNT field in the header is a uint16. If the client doesn't support EDNS(0), it can retry over TCP, which the native Go DNS client does by default when the message returned over UDP is truncated. Over TCP, we could always serve up to 65535 answers in a single message.
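The two header details mentioned here, the uint16 ANCOUNT field and the truncation (TC) flag that prompts a client to retry over TCP, sit in the fixed 12-byte header defined in RFC 1035, section 4.1.1. The following is a minimal, stdlib-only Go sketch for illustration; `parseHeader` and `headerInfo` are hypothetical names, not part of Mesos-DNS or any DNS library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerInfo is a hypothetical struct holding the two header fields discussed above.
type headerInfo struct {
	Truncated bool   // TC flag: the reply was cut short; retry over TCP
	ANCount   uint16 // answer count; being a uint16, it can never exceed 65535
}

// parseHeader reads the flags word (bytes 2-3) and ANCOUNT (bytes 6-7) from the
// fixed 12-byte DNS header, both stored big-endian (RFC 1035, section 4.1.1).
func parseHeader(msg []byte) (headerInfo, error) {
	if len(msg) < 12 {
		return headerInfo{}, errors.New("message shorter than the 12-byte DNS header")
	}
	flags := binary.BigEndian.Uint16(msg[2:4])
	return headerInfo{
		Truncated: flags&0x0200 != 0, // TC is the 0x0200 bit of the flags word
		ANCount:   binary.BigEndian.Uint16(msg[6:8]),
	}, nil
}

func main() {
	// ID 0x1234; flags 0x8380 = QR, TC, RD, RA; QDCOUNT 1; ANCOUNT 0xFFFF; NSCOUNT/ARCOUNT 0.
	hdr := []byte{0x12, 0x34, 0x83, 0x80, 0x00, 0x01, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00}
	h, err := parseHeader(hdr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("truncated=%v ancount=%d\n", h.Truncated, h.ANCount) // truncated=true ancount=65535
}
```

A client that sees `Truncated == true` on a UDP reply is expected to repeat the query over TCP.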

@discordianfish
Contributor

@tsenart Yes, that's what we should do, but you said above that we should always limit the number so it fits into UDP. Instead, we should return all of them, even if this means we need to truncate and have clients fall back to TCP.

@tsenart
Copy link
Contributor

tsenart commented Oct 19, 2015

@discordianfish: By definition, when you truncate, you can't return all :-)

@discordianfish
Contributor

OK, let me rephrase: we should truncate responses to whatever the client supports (512 bytes without EDNS, whatever size the client indicates with EDNS) and to 65535 records over TCP.
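One way to express this rule, sketched in Go under stated assumptions: the hypothetical `responseLimit` function below picks a byte budget from the transport and the UDP payload size a client advertises in its EDNS(0) OPT record (passed as 0 when there is none). It treats advertised sizes below 512 as 512, as RFC 6891 requires. It expresses the TCP case in bytes because a DNS message over TCP is framed by a 2-byte length prefix; the 65535-record ANCOUNT ceiling applies on top of that.

```go
package main

import "fmt"

const (
	classicUDPLimit = 512   // RFC 1035 UDP limit when the client sends no EDNS(0) OPT record
	tcpMessageLimit = 65535 // a DNS message over TCP has a 2-byte length prefix
)

// responseLimit is a hypothetical helper returning the maximum response size in
// bytes. ednsUDPSize is the payload size advertised in the client's OPT record,
// or 0 if the query carried no OPT record.
func responseLimit(overTCP bool, ednsUDPSize uint16) int {
	if overTCP {
		return tcpMessageLimit
	}
	if ednsUDPSize > classicUDPLimit {
		return int(ednsUDPSize)
	}
	// No EDNS(0), or an advertised size below 512, which RFC 6891 says to treat as 512.
	return classicUDPLimit
}

func main() {
	fmt.Println(responseLimit(false, 0))    // 512
	fmt.Println(responseLimit(false, 4096)) // 4096
	fmt.Println(responseLimit(true, 0))     // 65535
}
```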

@tsenart
Copy link
Contributor

tsenart commented Oct 19, 2015

👍

@sargun
Copy link
Contributor

sargun commented Nov 17, 2015

@discordianfish Is this what the PR you merged today does?

@sargun sargun closed this as completed Nov 17, 2015
@sargun sargun reopened this Nov 17, 2015
@jdef
Contributor Author

jdef commented Nov 17, 2015

#330 (merged today) should take care of proper truncation. Is there a need for a separate follow-up ticket re: the "randomly load balance" part?


@sargun
Contributor

sargun commented Nov 17, 2015

@jdef So, digging through the code, it looks like shuffleAnswers is called in HandleMesos before reply, and truncate is called in reply. This should take care of the "randomly load balance" requirement. What do you think?

P.S. perhaps truncate should be called in HandleMesos?

@jdef
Copy link
Contributor Author

jdef commented Nov 17, 2015

If we're already shuffling, then it sounds like we're all set with this ticket.


@sargun
Contributor

sargun commented Nov 17, 2015

😸

@sargun sargun closed this as completed Nov 17, 2015
4 participants