DNS resolver failing to find valid DNS record #8261

Closed
johnjaylward opened this issue Sep 4, 2018 · 93 comments

@johnjaylward

johnjaylward commented Sep 4, 2018

Expected behavior

The DNS resolver should find valid DNS records.

Actual behavior

Exception thrown:

Caused by: io.netty.resolver.dns.DnsNameResolverContext$SearchDomainUnknownHostException: Search domain query failed. Original hostname: 'host.toplevel' failed to resolve 'host.toplevel.search.domain' after 7 queries 
	at io.netty.resolver.dns.DnsNameResolverContext.finishResolve(DnsNameResolverContext.java:721)
	at io.netty.resolver.dns.DnsNameResolverContext.tryToFinishResolve(DnsNameResolverContext.java:663)
	at io.netty.resolver.dns.DnsNameResolverContext.query(DnsNameResolverContext.java:306)
	at io.netty.resolver.dns.DnsNameResolverContext.query(DnsNameResolverContext.java:295)
	at io.netty.resolver.dns.DnsNameResolverContext.tryToFinishResolve(DnsNameResolverContext.java:636)
	at io.netty.resolver.dns.DnsNameResolverContext$3.operationComplete(DnsNameResolverContext.java:342)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
	at io.netty.resolver.dns.DnsQueryContext.setSuccess(DnsQueryContext.java:197)
	at io.netty.resolver.dns.DnsQueryContext.finish(DnsQueryContext.java:180)
	at io.netty.resolver.dns.DnsNameResolver$DnsResponseHandler.channelRead(DnsNameResolver.java:969)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1412)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:943)
	at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

Steps to reproduce

  1. Configure a top-level domain someDomain on a DNS server you own.
  2. Configure a host under the new top-level domain: someHost.someDomain.
  3. Configure multiple resolvers on the DNS client machine that will run the Netty code, e.g. 8.8.8.8, 192.168.1.1, and 10.0.0.1. (I have 3 resolvers configured, each pointing to different DNS masters: global DNS, local personal private network, and company private network over a VPN.)
  4. On the same client machine, configure a search domain that does not match the top-level domain, e.g. search.otherDomain.
  5. Ask Netty to resolve someHost.someDomain.
  6. Resolution fails with the exception shown above.

Minimal yet complete reproducer code (or URL to code)

I'm not using Netty directly so I'm not sure what to put here. Do you want my Redisson code?

Netty version

Breaks when I upgrade to Redisson 3.6+, which pulls in Netty 4.1.20+.
When forcing a downgrade to Netty 4.1.13 the problem still shows, but with a slightly different stack trace.

JVM version (e.g. java -version)

java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)

OS version (e.g. uname -a)

Windows 10, Centos 7, Ubuntu 16.04

@normanmaurer
Member

@johnjaylward would it be possible to try different netty versions and see at which point it breaks ? This would help me a lot.

@johnjaylward
Author

Are there any breaking points in the API between 4.1.13 and 4.1.20? If not, I don't see an issue with trying on my end.

@normanmaurer
Member

@johnjaylward since the DNS resolver is marked as unstable there were a few, but I think most people should not be affected because the scope is small. So I would give it a try.

@johnjaylward
Author

Well, now that I've played around with the versions a little more, I'm not sure what's up.
With Redisson 3.5.0 and any version of Netty, the DNS query seems to work.

However using Redisson 3.6.0 with any version of Netty, the query fails. I'm guessing Redisson must have changed some configuration option that it is passing to the Netty resolver? Is that a possibility? If so, I'll open this on the Redisson side.

@normanmaurer
Member

@johnjaylward sure, that's possible... I don't know enough about Redisson to tell you what they do.

@johnjaylward
Author

Thanks, I'll check with them.

@normanmaurer
Member

@johnjaylward please let me know how it goes :)

@mrniko

mrniko commented Sep 10, 2018

@johnjaylward

I would reopen this issue. It appeared once Redisson switched from the JDK's InetAddress.getByName method to the Netty-based resolver.

@johnjaylward
Author

Thanks for looking @mrniko .

@normanmaurer looks like it's an error in how netty is resolving.

@johnjaylward johnjaylward reopened this Sep 10, 2018
@johnjaylward johnjaylward changed the title from "Regression: DNS resolver failing to find valid DNS record" to "DNS resolver failing to find valid DNS record" Sep 10, 2018
@johnjaylward
Author

Updated title since it's not a regression.

@normanmaurer
Member

normanmaurer commented Sep 10, 2018 via email

@johnjaylward
Author

johnjaylward commented Sep 10, 2018

I'm not sure if it's environment related or not. It appears to be environment related as it seems to resolve some host names just fine, it's just this custom top level domain one it has issues with (i.e. not localhost/.com/.edu etc which work fine).

The TLD is just "companyname" not "companyname.com" . The resolver appears to be looking at "host.companyname" and not liking the TLD, so it appends the DNS Search Domain to it (host.companyname.someOther.searchDomain), which is invalid.

Short of setting up your own DNS stack with a custom TLD to test it, I'm not sure what else to do.
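
The search-domain behaviour described above can be illustrated with a minimal sketch. It follows the conventional resolv.conf-style `ndots` rule and is not Netty's actual implementation; the class name `SearchDomainCandidates`, the method `candidates`, and the `ndots` value of 1 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the names a resolver would try, in order,
// under the conventional ndots/search-domain rules. Not Netty code.
final class SearchDomainCandidates {

    static List<String> candidates(String hostname, List<String> searchDomains, int ndots) {
        List<String> result = new ArrayList<>();
        // A trailing dot marks the name as fully qualified: no search domain is applied.
        if (hostname.endsWith(".")) {
            result.add(hostname);
            return result;
        }
        long dots = hostname.chars().filter(c -> c == '.').count();
        // With at least ndots dots, the name is tried as written before any search domain.
        boolean asIsFirst = dots >= ndots;
        if (asIsFirst) {
            result.add(hostname + ".");
        }
        for (String domain : searchDomains) {
            result.add(hostname + "." + domain + ".");
        }
        if (!asIsFirst) {
            result.add(hostname + ".");
        }
        return result;
    }

    public static void main(String[] args) {
        // Names taken from the comment above; ndots = 1 is an assumed setting.
        System.out.println(candidates("host.companyname",
                Arrays.asList("someOther.searchDomain"), 1));
    }
}
```

Under these assumptions the name is queried as written first and with the search domain appended second, so a failed first query leads straight to the invalid host.companyname.someOther.searchDomain lookup.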

@johnjaylward
Author

Maybe you can test against a public TLD like .amazon or .google or something? I'm not sure if there would be a difference there though as I have 3 resolvers configured, but only 1 will return a valid result for the custom TLD.

@normanmaurer
Member

normanmaurer commented Sep 10, 2018 via email

@normanmaurer
Member

@johnjaylward could you test to only include the dnsserver that handles the custom domainname and see if it resolves in this case ?

Also would it be possible to run the following command against each of the servers (using the domain you want to resolve) and add the output here:

dig @dnsserverip host.toplevel A

@johnjaylward
Author

Personal local network:

$ dig @192.168.55.5 host.domain A

; <<>> DiG 9.12.2-P1 <<>> @192.168.55.5 host.domain A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 58575
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;host.domain.                   IN      A

;; AUTHORITY SECTION:
.                       10619   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2018091101 1800 900 604800 86400

;; Query time: 0 msec
;; SERVER: 192.168.55.5#53(192.168.55.5)
;; WHEN: Tue Sep 11 15:05:55 Eastern Daylight Time 2018
;; MSG SIZE  rcvd: 115

Global public network:

$ dig @8.8.8.8 host.domain A

; <<>> DiG 9.12.2-P1 <<>> @8.8.8.8 host.domain A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 18400
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;host.domain.                   IN      A

;; AUTHORITY SECTION:
.                       86362   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2018091101 1800 900 604800 86400

;; Query time: 31 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Sep 11 15:04:32 Eastern Daylight Time 2018
;; MSG SIZE  rcvd: 115

Company DNS over VPN:

$ dig @10.253.48.2 host.domain A

; <<>> DiG 9.12.2-P1 <<>> @10.253.48.2 host.domain A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42724
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: 065a3fc0ed8df40e (echoed)
;; QUESTION SECTION:
;host.domain.                   IN      A

;; ANSWER SECTION:
host.domain.            3600    IN      A       10.253.50.30

;; Query time: 46 msec
;; SERVER: 10.253.48.2#53(10.253.48.2)
;; WHEN: Tue Sep 11 15:08:19 Eastern Daylight Time 2018
;; MSG SIZE  rcvd: 68

$ dig @10.253.48.3 host.domain A

; <<>> DiG 9.12.2-P1 <<>> @10.253.48.3 host.domain A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19221
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
; COOKIE: 5e8dd4d92c46330d (echoed)
;; QUESTION SECTION:
;host.domain.                   IN      A

;; ANSWER SECTION:
host.domain.            3600    IN      A       10.253.50.30

;; Query time: 31 msec
;; SERVER: 10.253.48.3#53(10.253.48.3)
;; WHEN: Tue Sep 11 15:08:38 Eastern Daylight Time 2018
;; MSG SIZE  rcvd: 68

@johnjaylward
Author

johnjaylward commented Sep 11, 2018

@normanmaurer I'm not sure I can configure Redisson to use a specific DNS server...

@normanmaurer
Member

@johnjaylward interesting so at least one server returns NXDOMAIN. This is on what OS ? Can you also show me the contents of /etc/resolv.conf

@johnjaylward
Author

johnjaylward commented Sep 12, 2018

it's on windows 10 in this instance. When I was stepping through the calls the server list contained all 3 primary servers. I can't remember if it contained the secondary over the VPN. If I have time today, I'll try to step through again and see which servers were listed.

@utsavchanda

utsavchanda commented Sep 18, 2018

I am getting a similar error with Redisson version 3.7.2 and Netty version 4.1.25.Final:

Unable to resolve xxxx.redis.cache.windows.net - java.net.UnknownHostException: failed to resolve 'xxxx.redis.cache.windows.net' after 4 queries
	at io.netty.resolver.dns.DnsNameResolverContext.finishResolve(DnsNameResolverContext.java:721)
	at io.netty.resolver.dns.DnsNameResolverContext.tryToFinishResolve(DnsNameResolverContext.java:663)
	at io.netty.resolver.dns.DnsNameResolverContext.query(DnsNameResolverContext.java:306)
	at io.netty.resolver.dns.DnsNameResolverContext.query(DnsNameResolverContext.java:295)
	at io.netty.resolver.dns.DnsNameResolverContext.access$700(DnsNameResolverContext.java:60)
	at io.netty.resolver.dns.DnsNameResolverContext$3.operationComplete(DnsNameResolverContext.java:339)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
	at io.netty.resolver.dns.DnsQueryContext.setFailure(DnsQueryContext.java:223)
	at io.netty.resolver.dns.DnsQueryContext.access$300(DnsQueryContext.java:42)
	at io.netty.resolver.dns.DnsQueryContext$4.run(DnsQueryContext.java:162)
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:125)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.resolver.dns.DnsNameResolverTimeoutException: [/8.8.4.4:53] query timed out after 5000 milliseconds (no stack trace available)

@mrniko

mrniko commented Sep 25, 2018

@normanmaurer

Could the error in redisson/redisson#1646 be related to this issue as well?

@normanmaurer
Member

@johnjaylward @utsavchanda any more details ?

@johnjaylward
Author

I don't at this time. I was mainly looking at upgrading the libraries as part of a larger task and have since put down the upgrade due to this issue and moved onto the rest of the task. I'm not sure when I'll have time to investigate it further.

@normanmaurer
Member

Sorry, but without more information I suspect I will not be able to help :(

@normanmaurer
Member

@johnjaylward @mrniko after re-reading the RFC closely and inspecting our code again I think I found the bug. Can you please check if #8731 works for you ?

Also thanks to @Lukasa for discussing this with me :)

@normanmaurer
Member

@mrniko also can you test removing .resolvedAddressTypes(ResolvedAddressTypes.IPV4_ONLY); ?

@mrniko

mrniko commented Jan 18, 2019

@normanmaurer

The issue comes back if I comment out this code: .resolvedAddressTypes(ResolvedAddressTypes.IPV4_ONLY);

@normanmaurer
Member

@mrniko ok cool... With #8731 the problem is gone also when .resolvedAddressTypes(ResolvedAddressTypes.IPV4_ONLY) is NOT used (at least for me). Can you try it out ?

@normanmaurer normanmaurer added this to the 4.1.33.Final milestone Jan 18, 2019
@mrniko

mrniko commented Jan 18, 2019

@normanmaurer
Everything works fine with this group object:

DnsAddressResolverGroup group = new DnsAddressResolverGroup(new DnsNameResolverBuilder()
                .channelType(NioDatagramChannel.class)
                .nameServerProvider(DnsServerAddressStreamProviders.platformDefault())
                .resolvedAddressTypes(ResolvedAddressTypes.IPV4_ONLY));

and it doesn't work if the group object is as follows:

DnsAddressResolverGroup group = new DnsAddressResolverGroup(new DnsNameResolverBuilder()
                .channelType(NioDatagramChannel.class)
                .nameServerProvider(DnsServerAddressStreamProviders.platformDefault()));

@mrniko

mrniko commented Jan 18, 2019

@normanmaurer
Do you have netty-all.jar for the latest version?

@normanmaurer
Member

@mrniko let me build one for you with the pr included... one sec

@normanmaurer
Member

normanmaurer commented Jan 18, 2019

@mrniko https://drive.google.com/open?id=11pNwvCkl3ECB3CpjlUzDnuG0m3Td27yg please try with this jar and report back.

@mrniko

mrniko commented Jan 18, 2019

@normanmaurer

This link returns a 404.

@normanmaurer
Member

@mrniko just fixed the link... please try again

@mrniko

mrniko commented Jan 18, 2019

@normanmaurer

Now it works. Thank you!

@normanmaurer
Member

@mrniko the code or the link, or both ? ;)

@mrniko

mrniko commented Jan 18, 2019

@normanmaurer

Both :)

@normanmaurer
Member

Yeah :) ... Please note in the PR as well...

normanmaurer added a commit that referenced this issue Jan 18, 2019
…other nameservers are left. (#8731)

Motivation:

When using multiple nameservers and a nameserver responds with NXDOMAIN we should only fail the query if the nameserver in question is authoritative or no nameservers are left to try.

Modifications:

- Try next nameserver if NXDOMAIN was returned but the nameserver is not authoritative
- Adjust testcase to respect correct behaviour.

Result:

Fixes #8261
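
The rule in the commit message can be sketched as follows. This is a simplified illustration and not the code from #8731: the class `NxDomainFallback`, the `Reply` type, and the server order are hypothetical, while the reply data mirrors the dig output posted earlier in this thread (two non-authoritative NXDOMAIN replies, then an authoritative NOERROR answer).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of "only fail on NXDOMAIN if the server is
// authoritative or no nameservers are left". Not Netty's implementation.
final class NxDomainFallback {

    enum Rcode { NOERROR, NXDOMAIN }

    /** Simplified view of one nameserver's reply. */
    static final class Reply {
        final String server;
        final Rcode rcode;
        final boolean authoritative; // the "aa" flag in dig output
        final String address;        // null when the reply carries no A record

        Reply(String server, Rcode rcode, boolean authoritative, String address) {
            this.server = server;
            this.rcode = rcode;
            this.authoritative = authoritative;
            this.address = address;
        }
    }

    /** Walks the replies in nameserver order and returns the resolved address, if any. */
    static Optional<String> resolve(List<Reply> repliesInServerOrder) {
        for (Reply reply : repliesInServerOrder) {
            if (reply.rcode == Rcode.NOERROR && reply.address != null) {
                return Optional.of(reply.address);
            }
            if (reply.rcode == Rcode.NXDOMAIN && reply.authoritative) {
                // An authoritative "no such name" is final: stop here.
                return Optional.empty();
            }
            // Non-authoritative NXDOMAIN (or an empty answer): try the next nameserver.
        }
        // No nameservers left to try.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Reply> replies = Arrays.asList(
                new Reply("192.168.55.5", Rcode.NXDOMAIN, false, null),
                new Reply("8.8.8.8", Rcode.NXDOMAIN, false, null),
                new Reply("10.253.48.2", Rcode.NOERROR, true, "10.253.50.30"));
        System.out.println(resolve(replies));
    }
}
```

With the three replies above, the first two servers are skipped and the company DNS server supplies the address. Failing on the first NXDOMAIN regardless of the authoritative flag would instead stop at 192.168.55.5.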
@hlms

hlms commented Jan 23, 2019

@normanmaurer ,
Thanks for fixing this.

We attempted to work around this by setting dnsMonitoringInterval to -1:
config.useSingleServer().setDnsMonitoringInterval(-1);
and surprisingly, it seems to work.

However, JAVA_HOME\jre\lib\security\java.security had the following parameters:

# caching forever
networkaddress.cache.ttl=-1  

# 10 seconds.
networkaddress.cache.negative.ttl=10 

Do you think it is a legitimate workaround?

Also posted the question in another issue: redisson/redisson#1486 (comment)
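
For anyone checking which JDK DNS cache settings are actually in effect, a small sketch follows. It only reads the two security properties quoted above through `java.security.Security`; the class name `DnsCacheSettings` and the `describe` helper are made up for illustration, and the sketch says nothing about how Redisson or Netty use these values.

```java
import java.security.Security;

// Hypothetical helper that reports the JDK-level DNS cache properties.
final class DnsCacheSettings {

    /** Formats one security property, marking it when it is not set at all. */
    static String describe(String key) {
        String value = Security.getProperty(key);
        return key + "=" + (value == null ? "<unset, JDK default applies>" : value);
    }

    public static void main(String[] args) {
        // Positive lookups: -1 means cache forever.
        System.out.println(describe("networkaddress.cache.ttl"));
        // Failed lookups: number of seconds a negative result is cached.
        System.out.println(describe("networkaddress.cache.negative.ttl"));
    }
}
```

Running it inside the same JVM as the application shows whether the values from java.security were picked up or overridden.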

@normanmaurer
Member

@hlms I have no idea how exactly Redisson uses it, so you will need to ask there.

stsatlantis added a commit to stsatlantis/algoliasearch-client-scala that referenced this issue Apr 4, 2019
This version contains the fix for the DNS Resolution bug found in netty.
netty/netty#8261
@BenyahyaF45

Captureh
Any Help Plz!

@Sigty

Sigty commented Jan 11, 2021

Add this property to your custom services:

eureka.instance.prefer-ip-address=true

andxu added a commit to microsoft/azure-maven-plugins that referenced this issue Apr 18, 2021
2. add global http proxy configuration and set proxy in http client