Be able to retry the query via TCP if a query failed because of a timeout #13757

normanmaurer · 2023-12-29T16:04:20Z

Motivation:

We should allow people to retry the query via TCP if the query failed because of a
timeout when using UDP.

Modifications:

- Move all the retry code for TCP into DnsQueryContext so we can reuse
  the same code for handling truncation and retry.
- Retry on timeout if configured by user
- Add unit tests

Result:

More robust resolver
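
The behavior described above can be modeled as a small decision function. This is only a sketch to make the intent concrete: the class, enum, and method names are invented for illustration and are not the code in DnsQueryContext, and it assumes that a truncated response is used as-is when no TCP channel is configured.

```java
/**
 * Hypothetical model of the UDP-first decision with an optional TCP retry.
 * Illustration only; this is not the code in DnsQueryContext.
 */
final class TcpRetryDecisionSketch {

    /** How the UDP attempt ended (names invented for this sketch). */
    enum UdpOutcome { ANSWERED, TRUNCATED, TIMED_OUT }

    /** What happens next (names invented for this sketch). */
    enum NextStep { COMPLETE_WITH_UDP_RESULT, RETRY_VIA_TCP, FAIL_WITH_TIMEOUT }

    static NextStep decide(UdpOutcome outcome, boolean tcpConfigured, boolean retryOnTimeout) {
        switch (outcome) {
            case TRUNCATED:
                // Assumption: without a TCP channel the truncated response is used as-is.
                return tcpConfigured ? NextStep.RETRY_VIA_TCP : NextStep.COMPLETE_WITH_UDP_RESULT;
            case TIMED_OUT:
                // Timeouts only fall back to TCP when the user opted in.
                return tcpConfigured && retryOnTimeout
                        ? NextStep.RETRY_VIA_TCP
                        : NextStep.FAIL_WITH_TIMEOUT;
            default:
                // A complete UDP answer needs no retry.
                return NextStep.COMPLETE_WITH_UDP_RESULT;
        }
    }

    public static void main(String[] args) {
        // Print the decision for every outcome, with and without the opt-in flag.
        for (UdpOutcome outcome : UdpOutcome.values()) {
            System.out.println(outcome
                    + " opt-in=" + decide(outcome, true, true)
                    + " default=" + decide(outcome, true, false));
        }
    }
}
```

The point of sharing one code path is that truncation and timeout differ only in the condition that triggers the TCP retry.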

idelpivnitskiy · 2024-01-02T19:25:41Z

resolver-dns/src/main/java/io/netty/resolver/dns/DnsQueryContext.java

+     * @return                  {@code true} if retry via TCP is supported and so the ownership of
+     *                          {@code originalResult} was transferred, {@code false} otherwise.
+     */
+    private boolean retryWithTCP(final Object originalResult) {


Can you please elaborate on the motivation in a bit more detail? Where is this coming from?
In my experience, DNS timeouts are quite frequent, and automatically retrying them via TCP may cause unexpected side effects, like:

1. People expect the query to fail within a pre-defined timeout. Retrying it will double the total timeout.
2. A spike of TCP traffic for DNS when timeouts are frequent enough.

I would propose making this an opt-in feature, but I'm mostly looking for the motivating story behind it.
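
The first concern is simple arithmetic. The sketch below assumes the TCP retry is given the same per-query timeout as the UDP attempt, which is an assumption made for illustration; the class and method names are hypothetical.

```java
/** Hypothetical worst-case latency estimate for a single query; illustration only. */
final class TimeoutBudgetSketch {

    /**
     * Worst case: the UDP attempt runs into its timeout and, if the fallback is
     * enabled, the TCP retry runs into the same timeout again.
     */
    static long worstCaseMillis(long queryTimeoutMillis, boolean retryTcpOnTimeout) {
        long udpAttempt = queryTimeoutMillis;
        long tcpRetry = retryTcpOnTimeout ? queryTimeoutMillis : 0L;
        return udpAttempt + tcpRetry;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseMillis(5000, false)); // prints 5000
        System.out.println(worstCaseMillis(5000, true));  // prints 10000
    }
}
```

Callers that rely on a fixed failure deadline would have to budget for the larger value once the fallback is enabled.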

hellojukay · 2024-01-03

@idelpivnitskiy you may want to read this: https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

DNS over TCP is really needed. TCP may not be the default behavior, but we should let people enable a TCP fallback option and set a TCP timeout.

We use Redisson to watch for DNS change events. Redisson implements DNS watching with the Netty DNS resolver, and its UDP DNS queries frequently hit the 5000ms timeout because of the UDP DNS race. See redisson/redisson#5137 and #13705.

idelpivnitskiy · 2024-01-04

Thanks for the link and additional context.

My arguments (1) and (2) remain valid. If we merge this PR, it changes the current behavior and may cause unexpected outages under heavy load and at scale. So, if we proceed with this change, it should be configurable (preferably opt-in) to let impacted users experiment safely without affecting everyone.

After reading the article, I'm not sure this specific change is a good mitigation for the described problem. DNS timeouts are quite common, happen not only in k8s environments, and may have many different causes. There are cases when an automatic retry over TCP can make everything even worse.
Good mitigation points from the article are:
a. disable parallel lookups
b. use TCP for lookups

With (b) and RFC7766 together, it makes more sense to let users choose TCP as the default protocol for their DNS client and either skip UDP completely or keep it as a fallback. However, that opens other questions around connection management and DoS protection described in RFC7766.

hellojukay · 2024-01-05

We cannot assume what kind of network our program will run on; timeouts will occur on any of them. We should let users configure a switch to another transport when a failure occurs.

normanmaurer · 2024-01-12

@idelpivnitskiy the problem is that there is really no way for people to do a retry via TCP by themselves (we might also want to support creating a DnsNameResolver which only does TCP in the future). That said, let's just make it configurable to also fall back to TCP on timeout.

hellojukay · 2024-01-10T09:39:17Z

Will this PR be merged in the future?

normanmaurer · 2024-01-10T16:03:49Z

Will have a look later this week.

idelpivnitskiy

Thanks for the config flag @normanmaurer!

normanmaurer · 2024-01-12T19:24:59Z

Will work on a patch for main

paolosciamm · 2024-07-01T12:57:33Z

Hi,
thanks for this release.
Just a question about this quote:

Retry on timeout if configured by user

Using the 4.1.105.Final version, how can I configure/enable the TCP fallback in Spring?
Does it need particular properties or a bean config?

normanmaurer · 2024-07-01T13:09:20Z

You would configure it via the DnsNameResolverBuilder.

violetagg · 2024-07-01T13:44:32Z

If you use WebClient then you can configure the Reactor Netty HttpClient like this:

HttpClient client = HttpClient.create().resolver(spec -> spec.retryTcpOnTimeout(true));

See more here https://projectreactor.io/docs/netty/release/reference/index.html#_host_name_resolution_2

paolosciamm · 2024-07-01T13:51:35Z

Thanks for the reply.
And if I don't use Reactor Netty, would adding the following configuration (per @normanmaurer's answer) be enough?

@Configuration
public class CustomDnsNameResolver {
    @Bean
    public DnsNameResolver customDnsNameResolverBuilder() {
        NioEventLoopGroup group = new NioEventLoopGroup();
        return new DnsNameResolverBuilder(group.next())
                .channelType(NioDatagramChannel.class)
                .resolvedAddressTypes(ResolvedAddressTypes.IPV4_ONLY)
                .nameServerProvider(DnsServerAddressStreamProviders.platformDefault())
                .queryTimeoutMillis(5000)
                .maxQueriesPerResolve(2)
                .build();
    }
}

violetagg · 2024-07-01T15:13:48Z

@paolosciamm I would say, you need .socketChannelType(NioSocketChannel.class, true)
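
Putting the two answers together, a plain Netty resolver with the TCP fallback enabled might look like the sketch below. It assumes Netty 4.1.105.Final or later on the classpath; the class name `TcpFallbackResolverExample` and the `example.com` lookup are placeholders for illustration. The second argument to `socketChannelType` is the retry-on-timeout flag.

```java
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioDatagramChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.resolver.dns.DnsNameResolver;
import io.netty.resolver.dns.DnsNameResolverBuilder;

import java.net.InetAddress;

public final class TcpFallbackResolverExample {

    public static void main(String[] args) throws Exception {
        EventLoopGroup group = new NioEventLoopGroup(1);
        DnsNameResolver resolver = new DnsNameResolverBuilder(group.next())
                // UDP stays the primary transport.
                .channelType(NioDatagramChannel.class)
                // Register a TCP channel type; `true` also retries via TCP on timeout,
                // not only when the UDP response was truncated.
                .socketChannelType(NioSocketChannel.class, true)
                .queryTimeoutMillis(5000)
                .build();
        try {
            // Placeholder hostname; block until the lookup completes.
            InetAddress address = resolver.resolve("example.com").sync().getNow();
            System.out.println(address);
        } finally {
            resolver.close();
            group.shutdownGracefully();
        }
    }
}
```

In a Spring `@Configuration` class, the same builder call would simply be added to the chain in the bean method shown in the question.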

normanmaurer requested review from idelpivnitskiy and chrisvest December 29, 2023 16:04

normanmaurer mentioned this pull request Dec 29, 2023

When DNS by UDP truncated or timeout, then fallback to TCP #13750

Closed

Fix race in test

fea7225

chrisvest approved these changes Dec 30, 2023

idelpivnitskiy reviewed Jan 2, 2024

normanmaurer added 3 commits January 12, 2024 12:20

Merge branch '4.1' into retry_tcp_on_timeout

7d83bcf

Fix compile error

6785c2d

Make tcp fallback on timeout configurable

3c8df3a

normanmaurer requested review from idelpivnitskiy and hellojukay January 12, 2024 12:54

normanmaurer added this to the 4.1.105.Final milestone Jan 12, 2024

idelpivnitskiy approved these changes Jan 12, 2024

normanmaurer merged commit 684dfd8 into 4.1 Jan 12, 2024
15 checks passed

normanmaurer deleted the retry_tcp_on_timeout branch January 12, 2024 17:47

normanmaurer changed the title ~~Retry the query via TCP if a query failed because of a timeout when u…~~ Be able to retry the query via TCP if a query failed because of a timeout Jan 16, 2024

hellojukay mentioned this pull request Jan 18, 2024

need fix dns query timeout redisson/redisson#5572

Closed

claraccio mentioned this pull request Feb 14, 2024

Enabling DNS retryOnTimeout with TCP reactor/reactor-netty#3059

Closed
