DNS resolver - default server addresses contain stale entries #10264

martin-traverse · 2020-05-09T11:42:24Z

Hi,

I think I've found an issue in the way the default set of name servers is populated, affecting Windows 10 / JDK 11. When a machine moves between networks, stale DNS servers (i.e. servers from a network the machine is no longer connected to) can in some cases be picked up by the JNDI bootstrap mechanism. If one of the stale entries is first in the list, it can stop name resolution from working (or at best cause delays). Of course it is possible to work around this by passing in a set of DNS servers determined higher up the stack, or by changing the behaviour of the DNS resolution mechanism. However, I wonder if it could be worth fixing the discovery mechanism in Netty itself? The issue is perhaps less likely to affect production systems with a fixed set of interfaces, but developers are reasonably likely to hit it on development machines that move e.g. between work and home, which will still result in fixes being put into client code, startup scripts or other places where they probably shouldn't be!

One possible approach is to look at the address of each name server and see if it matches an interface address from NetworkInterface.networkInterfaces(). IPv4 name servers in private ranges that do not match an interface address, or where interface.isUp() is false, could be excluded or at least pushed down the list. IPv6 is possibly slightly trickier, but it should still be possible to prioritise servers on the local subnet without resorting to platform-specific code. There is also the super simple option of just calling isReachable for each name server, but that would risk excluding temporarily unavailable servers and would increase startup time, so I'm not sure it is desirable. My gut feel is that some sort of "prioritise and filter" method could be applied to the discovered set of servers before the list is finalised, and it might evolve to contain a few special cases.
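A minimal sketch of the "prioritise and filter" idea, using only the JDK. It is not Netty code: the class name `NameServerPrioritiser`, the `Subnet` helper and the method names are all hypothetical. It assumes that a private-range IPv4 name server outside the subnet of every interface that is up is probably stale, and it only demotes such servers rather than dropping them, since a private DNS server can legitimately sit on a routed subnet.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.InterfaceAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Netty.
final class NameServerPrioritiser {

    /** An address prefix taken from an interface that is currently up. */
    static final class Subnet {
        final byte[] address;
        final int prefixLength;

        Subnet(InetAddress address, int prefixLength) {
            this.address = address.getAddress();
            this.prefixLength = prefixLength;
        }

        /** True if the candidate shares this subnet's first prefixLength bits. */
        boolean contains(InetAddress candidate) {
            byte[] other = candidate.getAddress();
            if (other.length != address.length) {
                return false; // IPv4 vs IPv6 mismatch
            }
            int fullBytes = prefixLength / 8;
            int remainingBits = prefixLength % 8;
            for (int i = 0; i < fullBytes; i++) {
                if (other[i] != address[i]) {
                    return false;
                }
            }
            if (remainingBits == 0) {
                return true;
            }
            int mask = (0xFF << (8 - remainingBits)) & 0xFF;
            return (other[fullBytes] & mask) == (address[fullBytes] & mask);
        }
    }

    /** Collects the subnets of every interface that reports isUp(). */
    static List<Subnet> activeSubnets() throws SocketException {
        List<Subnet> subnets = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        while (nics != null && nics.hasMoreElements()) {
            NetworkInterface nic = nics.nextElement();
            if (!nic.isUp()) {
                continue;
            }
            for (InterfaceAddress ia : nic.getInterfaceAddresses()) {
                int prefix = ia.getNetworkPrefixLength();
                int bits = ia.getAddress().getAddress().length * 8;
                if (prefix > 0 && prefix <= bits) { // skip implausible prefix lengths
                    subnets.add(new Subnet(ia.getAddress(), prefix));
                }
            }
        }
        return subnets;
    }

    /** Assumption: a private IPv4 server outside every active subnet is likely stale. */
    static boolean looksStale(InetAddress server, List<Subnet> active) {
        if (!(server instanceof Inet4Address) || !server.isSiteLocalAddress()) {
            return false; // public, IPv6 or unresolved: leave alone
        }
        for (Subnet subnet : active) {
            if (subnet.contains(server)) {
                return false;
            }
        }
        return true;
    }

    /** Stable reorder: likely-stale servers move to the end, order otherwise preserved. */
    static List<InetSocketAddress> prioritise(List<InetSocketAddress> servers, List<Subnet> active) {
        List<InetSocketAddress> preferred = new ArrayList<>();
        List<InetSocketAddress> demoted = new ArrayList<>();
        for (InetSocketAddress server : servers) {
            (looksStale(server.getAddress(), active) ? demoted : preferred).add(server);
        }
        preferred.addAll(demoted);
        return preferred;
    }

    public static void main(String[] args) throws Exception {
        // Example input only; a real caller would pass the discovered server list.
        List<InetSocketAddress> discovered = List.of(
                new InetSocketAddress("10.1.1.1", 53),
                new InetSocketAddress("192.168.1.1", 53));
        System.out.println(prioritise(discovered, activeSubnets()));
    }
}
```

Demoting instead of excluding keeps the behaviour safe when the heuristic is wrong: a demoted server is still tried, just later. IPv6 servers are passed through untouched here, which matches the "slightly trickier" caveat above.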

Hope this is helpful!

Expected behavior

DefaultDnsServerAddressStreamProvider, DEFAULT_NAME_SERVER_LIST should be populated with the set of name servers in use by the system.

Actual behavior

DefaultDnsServerAddressStreamProvider, DEFAULT_NAME_SERVER_LIST can include stale name servers, assigned to network interfaces that are not currently active. These stale entries are supplied by the underlying JNDI mechanism and can lead to name resolution failure in client code.

The issue applies when a machine is moved between networks and connects using different interfaces, e.g. physical ethernet in one location and wireless in another. I have OpenJDK 11 running on Windows 10; I have not tested other configurations.

Steps to reproduce

This is not hard but slightly cumbersome! The particular case I have is moving between an office and a home network. The office network has ethernet cables; someone plugged one in (probably when the machine was first set up) and it configured that interface for a private network (10.x.x.x). DNS servers in that private range were configured on the interface. Since then I have been using the machine on wireless networks, which configure the wireless interface but leave the ethernet interface as it was. According to ipconfig /all everything looks good: the details of the physical interface are hidden because it is not active. However, the DNS entries for that interface are still being passed through the Sun JNDI bootstrap mechanism. Digging into the registry, it is possible to see they are still present under this key:

Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces

The physical interface also has a DHCP domain set which matches the machine's assigned domain - I guess that could be related to why they are getting passed through even though the interface is inactive. In any event, it seems as if at least on Windows the JNDI mechanism can pass through stale DNS entries in the scenario where machines are moved between two networks using a different interface in each location, particularly moving between a physical office network and a wireless home network, which could be quite common for developers.

Minimal yet complete reproducer code (or URL to code)

I discovered this issue through the Vert.x container. Just calling Vertx.vertx() creates a container with the default configuration; the unreachable servers are then visible in the debugger.
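For a check that needs neither Vert.x nor a debugger, the sketch below asks the JDK's JNDI DNS provider directly which servers it reports. It assumes the stale entries originate from `com.sun.jndi.dns.DnsContextFactory` with an empty `dns://` provider URL, as the report describes; the class name `JndiNameServerDump` and its methods are hypothetical, and the output depends entirely on the machine it runs on.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical diagnostic, not part of Netty or Vert.x.
final class JndiNameServerDump {

    static final int DEFAULT_DNS_PORT = 53;

    /** Returns the space-separated dns:// URLs the JNDI DNS provider discovered. */
    static String providerUrl() throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns://"); // empty URL: let the provider discover servers
        DirContext ctx = new InitialDirContext(env);
        try {
            return (String) ctx.getEnvironment().get(Context.PROVIDER_URL);
        } finally {
            ctx.close();
        }
    }

    /** Parses a string such as "dns://10.1.1.1 dns://192.168.1.1:5353" into socket addresses. */
    static List<InetSocketAddress> parse(String providerUrl) {
        List<InetSocketAddress> servers = new ArrayList<>();
        if (providerUrl == null || providerUrl.trim().isEmpty()) {
            return servers;
        }
        for (String url : providerUrl.trim().split("\\s+")) {
            URI uri;
            try {
                uri = URI.create(url);
            } catch (IllegalArgumentException e) {
                continue; // skip entries that are not valid URIs
            }
            String host = uri.getHost();
            if (host == null) {
                continue;
            }
            int port = uri.getPort() < 0 ? DEFAULT_DNS_PORT : uri.getPort();
            servers.add(new InetSocketAddress(host, port));
        }
        return servers;
    }

    public static void main(String[] args) throws NamingException {
        // Compare this list with the servers shown by `ipconfig /all`.
        for (InetSocketAddress server : parse(providerUrl())) {
            System.out.println(server);
        }
    }
}
```

Any address printed here that `ipconfig /all` does not list for an active interface is a candidate stale entry.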

Netty version

4.1.48

JVM version (e.g. `java -version`)

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.6+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.18.1, JRE 11 Windows 10 amd64-64-Bit Compressed References 20200122_442 (JIT enabled, AOT enabled)
OpenJ9 - 51a5857d2
OMR - 7a1b0239a
JCL - da35e0c380 based on jdk-11.0.6+10)

OS version (e.g. `uname -a`)

Windows 10
Version 1909 (OS Build 18363.778)


normanmaurer · 2020-05-11T13:24:25Z

@martin-traverse I'm happy to review a PR with a fix... I have no Windows dev environment to troubleshoot this :/
