-
-
Notifications
You must be signed in to change notification settings - Fork 15.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Be able to retry the query via TCP if a query failed because of a timeout #13757
Conversation
…sing UDP Motivation: We should retry the query via TCP if the query failed because of a timeout when using UDP. Modifications: - Move all the retry code for TCP into DnsQueryContext so we can reuse the same code for handling truncation and retry. - Retry on timeout if possible - Add unit tests Result: More robust resolver
* @return {@code true} if retry via TCP is supported and so the ownership of | ||
* {@code originalResult} was transferred, {@code false} otherwise. | ||
*/ | ||
private boolean retryWithTCP(final Object originalResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate on the motivation in a bit more details? Where this is coming from?
In my experience, DNS timeouts are quite frequent and automatically retrying them via TCP may cause unexpected side-effects, like:
- People expecting the query to fail within a pre-defined timeout. Retrying it will double the total timeout.
- Spike of TCP traffic for DNS when timeouts are frequent enough.
I will propose to make this an opt-in feature, but mostly looking for a motivational story behind it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@idelpivnitskiy you may read this https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
Dns through tcp is real in needed, maybe tcp is not default behavior, but we should make people can enable tcp fallback option and set tcp timeout.
We use redisson watch dns change event, redisson dns watching implement by netty dns resolver, redisson dns udp timeout 5000ms frequently becase of udp dns race. redisson/redisson#5137 and #13705
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the link and additional context.
My arguments (1) and (2) remain valid. If we merge this PR, it changes the current behavior and may cause unexpected outages under heavy load and scale. So, if we proceed with this change, it should be configurable (preferable to be opt-in by default) to let impacted users experiment in a safe fashion without affecting everyone.
After reading the article, I'm not sure this specific change is a good mitigation for the described problem. DNS timeouts are quite often, happen not only in k8s env, and may be caused by various different reasons. There are cases, when automatic retry over TCP can make everything even worse.
Good mitigation points from the article are:
a. disable parallel lookups
b. use TCP for lookups
With (b) and RFC7766 together, it makes more sense to let users chose TCP as the default protocol for their DNS client and skip UDP completely or let it be a fallback. However, it opens other questions around connection management and DoS protection described in RFC7766.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot assume what kind of network our program will run on. In any case, a timeout will occur. We should allow users to configure: when a failure occurs, switch to another network connection method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@idelpivnitskiy the problem is that there is really no way for people to to do a retry via TCP by themself (we might want to support also create a DnsNameResolver
which only does TCP in the future). That said let's just make it configurable to also fallback to TCP on timeout.
Will this PR be merged in the futrue. |
Will have a look later this weekAm 10.01.2024 um 04:39 schrieb hellojukay ***@***.***>:
Will this PR be merged in the futrue.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the config flag @normanmaurer!
Will work on a patch for main |
…eout when u… (#13757) …sing UDP Motivation: We should allow people to retry the query via TCP if the query failed because of a timeout when using UDP. Modifications: - Move all the retry code for TCP into DnsQueryContext so we can reuse the same code for handling truncation and retry. - Retry on timeout if configured by user - Add unit tests Result: More robust resolver
…eout when u… (netty#13757) …sing UDP Motivation: We should allow people to retry the query via TCP if the query failed because of a timeout when using UDP. Modifications: - Move all the retry code for TCP into DnsQueryContext so we can reuse the same code for handling truncation and retry. - Retry on timeout if configured by user - Add unit tests Result: More robust resolver
Hi,
Using the 4.1.105.Final version, how can I configure/enable the TCP fallback in Spring? |
you would configure it via the |
If you use
See more here https://projectreactor.io/docs/netty/release/reference/index.html#_host_name_resolution_2 |
Thanks for the reply.
|
@paolosciamm I would say, you need |
Motivation:
We should allow people to retry the query via TCP if the query failed because of a
timeout when using UDP.
Modifications:
the same code for handling truncation and retry.
Result:
More robust resolver