MacOS IPv6 UDP - No Route to Host (Works on Linux and Windows) #11563

Closed
tmulle opened this issue Aug 7, 2021 · 21 comments · Fixed by #12019

@tmulle

tmulle commented Aug 7, 2021

Hi,

I initially created this ticket over on Vert.x, but thought I'd also post it here just in case it is a Netty issue.

Sorry for cross-posting. Since the stack trace shows both Netty and Vert.x, I thought I'd share it in both places in case both frameworks need a fix.

Please feel free to move or close the issue if this isn't the correct place.

This bug is preventing me from converting our existing legacy Java networking code to using either Netty or Vert.x for UDP handling.

Thanks!

Steps to reproduce

This is the original bug I reported: eclipse-vertx/vert.x#4059

Minimal yet complete reproducer code (or URL to code)

Here is the repo where I have a fairly comprehensive set of test clients covering Netty, Vert.x, and raw Java networking:
https://github.com/tmulle/VertxNettyUDP6Tester

@NiteshKant
Member

cc @vietj

@normanmaurer
Member

I wonder if this is related to #11585 and so #11541

@normanmaurer normanmaurer added this to the 4.1.68.Final milestone Aug 18, 2021
@tmulle
Author

tmulle commented Oct 22, 2021

Is there any more information on this? Is there any more testing I can help with? I noticed it was pushed to the next milestone. It's really blocking us from using Netty/Vert.x for UDP, since we want to support Windows, Linux and macOS.

@tmulle
Author

tmulle commented Oct 22, 2021

Update: If I run my code using JDK17 with both the Sender and Receiver on the same host, UDP over IPv6 works fine. However, if I switch back to JDK11 with both the Sender and Receiver on the same host, I get the same error as before.

With the Sender on a Linux machine and my Receiver (Device Mode) on my Mac, it still fails under both JDKs when responding to the sender's UDP address and port.

@tmulle
Author

tmulle commented Jan 17, 2022

Any further ideas on this? Am I the only one using Vert.x UDP in this manner?

Is there any further testing you'd like me to try?

@tmulle
Author

tmulle commented Jan 17, 2022

I turned on Netty debug logging, and this is the output when it read a packet and tried to respond to the sender's address. It looks like it might be dropping the SCOPE_ID (the %15) when sending. The scope ID appears on the other addresses but not on the outgoing attempt. Just guessing at this point.

18:07:16.582 DEBUG i.n.handler.logging.LoggingHandler - [id: 0xd926b49a] READ: DatagramPacket(/[fe80:0:0:0:1c31:d1d1:4824:72a9%15]:60944 => /[0:0:0:0:0:0:0:0]:35056, PooledUnsafeDirectByteBuf(ridx: 0, widx: 41, cap: 2048)), 41B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 7b 22 61 63 74 69 6f 6e 22 3a 22 52 45 51 55 45 |{"action":"REQUE|
|00000010| 53 54 22 2c 22 72 65 73 70 6f 6e 73 65 50 6f 72 |ST","responsePor|
|00000020| 74 22 3a 35 39 33 35 37 7d                      |t":59357}       |
+--------+-------------------------------------------------+----------------+
18:07:16.592 INFO  c.m.v.Tester$Device - Received Discovery Packet from fe80:0:0:0:1c31:d1d1:4824:72a9%15:60944 with data {"action":"REQUEST","responsePort":59357}
18:07:16.619 INFO  c.m.v.Tester$Device - Sending response back to fe80:0:0:0:1c31:d1d1:4824:72a9%15:59357 with data - {"id":"6479f232-09f7-4513-8920-02148851421b","date":"2022-01-17T18:07:16.597884"}
18:07:16.621 DEBUG i.n.handler.logging.LoggingHandler - [id: 0xd926b49a] WRITE: DatagramPacket(=> /[fe80:0:0:0:1c31:d1d1:4824:72a9]:59357, UnreleasableByteBuf(UnpooledDuplicatedByteBuf(ridx: 0, widx: 81, cap: 256, unwrapped: UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeHeapByteBuf(ridx: 0, widx: 81, cap: 256)))), 81B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 7b 22 69 64 22 3a 22 36 34 37 39 66 32 33 32 2d |{"id":"6479f232-|
|00000010| 30 39 66 37 2d 34 35 31 33 2d 38 39 32 30 2d 30 |09f7-4513-8920-0|
|00000020| 32 31 34 38 38 35 31 34 32 31 62 22 2c 22 64 61 |2148851421b","da|
|00000030| 74 65 22 3a 22 32 30 32 32 2d 30 31 2d 31 37 54 |te":"2022-01-17T|
|00000040| 31 38 3a 30 37 3a 31 36 2e 35 39 37 38 38 34 22 |18:07:16.597884"|
|00000050| 7d                                              |}               |
+--------+-------------------------------------------------+----------------+
18:07:16.622 DEBUG i.n.handler.logging.LoggingHandler - [id: 0xd926b49a] FLUSH
java.net.NoRouteToHostException: No route to host
        at java.base/sun.nio.ch.DatagramChannelImpl.send0(Native Method)
        at java.base/sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(DatagramChannelImpl.java:901)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:863)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:821)
        at io.netty.channel.socket.nio.NioDatagramChannel.doWriteMessage(NioDatagramChannel.java:296)
        at io.netty.channel.nio.AbstractNioMessageChannel.doWrite(AbstractNioMessageChannel.java:143)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:949)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:913)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742)
        at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728)
        at io.netty.handler.logging.LoggingHandler.flush(LoggingHandler.java:304)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306)
        at io.vertx.core.datagram.impl.DatagramSocketImpl.lambda$send$3(DatagramSocketImpl.java:380)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:552)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
        at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:184)
        at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:35)
        at io.vertx.core.datagram.impl.DatagramSocketImpl.send(DatagramSocketImpl.java:378)
        at com.mycompany.vertxnettyip6udptester.Tester$Device.respondToDiscovery(Tester.java:299)
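
The difference between the READ line (address with %15) and the WRITE line (same address, no scope) can be reproduced with the JDK alone. The following is a minimal sketch, not code from Netty or Vert.x: the class name `ScopeIdDemo` and its methods are made up for illustration, and it assumes a numeric zone such as `%15` as seen in the log above. It shows that parsing the scoped literal keeps the scope ID, while rebuilding the address from its 16 raw bytes silently resets it to 0.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical demo class, for illustration only.
class ScopeIdDemo {
    // Parses an IPv6 literal; a numeric zone such as "%15" is kept as the scope ID.
    static int scopeOf(String literal) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(literal);
        return ((Inet6Address) addr).getScopeId();
    }

    // Rebuilds the address from its 16 raw bytes only. The bytes carry no zone
    // information, so the resulting address has scope ID 0.
    static int scopeAfterByteRoundTrip(String literal) throws UnknownHostException {
        byte[] raw = InetAddress.getByName(literal).getAddress();
        return ((Inet6Address) InetAddress.getByAddress(raw)).getScopeId();
    }

    public static void main(String[] args) throws UnknownHostException {
        String peer = "fe80::1c31:d1d1:4824:72a9%15";
        System.out.println("scope after parsing:      " + scopeOf(peer));
        System.out.println("scope after byte rebuild: " + scopeAfterByteRoundTrip(peer));
    }
}
```

A link-local destination without a scope ID does not tell the OS which interface to send on, which would be consistent with the "No route to host" error in the trace.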

@tmulle
Author

tmulle commented Jan 18, 2022

@normanmaurer I found this Netty issue (#267), which discusses the stripping of the SCOPE_ID and how there was some sort of "hack" that was removed because it broke macOS.

I believe that is exactly what I'm running into when sending UDP to a link-local address: the SCOPE_ID is being stripped somewhere.

I see the address properly in the logs, but by the time it gets to NioDatagramChannel.doWriteMessage() the scope ID has been set to 0, when in my case it should be 15.

I see where the address is being stripped in DatagramSocketImpl.java:

  @Override
  public Future<Void> send(Buffer packet, int port, String host) {
    Objects.requireNonNull(packet, "no null packet accepted");
    Objects.requireNonNull(host, "no null host accepted");
    if (port < 0 || port > 65535) {
      throw new IllegalArgumentException("port out of range:" + port);
    }
    AddressResolver resolver = context.owner().addressResolver();
    PromiseInternal<Void> promise = context.promise();
    io.netty.util.concurrent.Future<InetSocketAddress> f1 = resolver.resolveHostname(context.nettyEventLoop(), host);
    f1.addListener((GenericFutureListener<io.netty.util.concurrent.Future<InetSocketAddress>>) res1 -> {
      if (res1.isSuccess()) {
        ChannelFuture f2 = channel.writeAndFlush(new DatagramPacket(packet.getByteBuf(), new InetSocketAddress(f1.getNow().getAddress(), port)));
        if (metrics != null) {
          f2.addListener(fut -> {
            if (fut.isSuccess()) {
              metrics.bytesWritten(null, SocketAddress.inetSocketAddress(port, host), packet.length());
            }
          });
        }
        f2.addListener(promise);
      } else {
        promise.fail(res1.cause());
      }
    });
    return promise.future();
  }

This line: "io.netty.util.concurrent.Future<InetSocketAddress> f1 = resolver.resolveHostname(context.nettyEventLoop(), host);"

The resolver ("AddressResolver") just seems to create an unresolved address, which is then passed to a resolver that in my case appears to be "DnsResolverProvider". I think that is where the SCOPE_ID is being stripped.

In my Netty Log output I see these lines:

08:52:12.388 DEBUG i.n.r.d.DefaultDnsServerAddressStreamProvider - Default DNS servers: [/192.168.1.1:53] (sun.net.dns.ResolverConfiguration)

and then later on:

08:52:15.989 WARN i.n.r.d.DnsServerAddressStreamProviders - Can not find io.netty.resolver.dns.macos.MacOSDnsServerAddressStreamProvider in the classpath, fallback to system defaults. This may result in incorrect DNS resolutions on MacOS.

After the call to "resolver.resolve()" the scope is gone..

  public io.netty.util.concurrent.Future<InetSocketAddress> resolveHostname(EventLoop eventLoop, String hostname) {
    io.netty.resolver.AddressResolver<InetSocketAddress> resolver = resolverGroup.getResolver(eventLoop);
    return resolver.resolve(InetSocketAddress.createUnresolved(hostname, 0));
  }
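
To narrow down where the scope goes missing, it helps to check what the resolver is actually handed. The sketch below uses only the JDK; `UnresolvedScopeDemo` and `hostSeenByResolver` are hypothetical names for illustration. It shows that `InetSocketAddress.createUnresolved` stores the host string verbatim, including the `%15` zone, so the scope is presumably still intact at this point and is lost later, when the string is turned into an `InetAddress`.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, for illustration only.
class UnresolvedScopeDemo {
    // createUnresolved performs no lookup and no parsing; it keeps the host string as given.
    static String hostSeenByResolver(String host) {
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved(host, 0);
        return unresolved.getHostString();
    }

    public static void main(String[] args) {
        String host = "fe80:0:0:0:1c31:d1d1:4824:72a9%15";
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved(host, 0);
        System.out.println("unresolved:  " + unresolved.isUnresolved());
        System.out.println("host string: " + hostSeenByResolver(host));
    }
}
```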

By the time this code from IpAddressUtil is called to find the scoped address, it's too late: the original scope has already been stripped, so the comparison of the incoming address against the ones found on the NetworkInterface fails and an empty list is returned.

    @SuppressWarnings("removal")
    private static InetAddress findScopedAddress(InetAddress address) {
        PrivilegedExceptionAction<List<InetAddress>> pa = () -> NetworkInterface.networkInterfaces()
                .flatMap(NetworkInterface::inetAddresses)
                .filter(a -> (a instanceof Inet6Address)
                        && address.equals(a)
                        && ((Inet6Address) a).getScopeId() != 0)
                .toList();
        List<InetAddress> result;
        try {
            result = AccessController.doPrivileged(pa);
            var sz = result.size();
            if (sz == 0)
                return null;
            if (sz > 1)
                throw new UncheckedIOException(new SocketException(
                    "Duplicate link local addresses: must specify scope-id"));
            return result.get(0);
        } catch (PrivilegedActionException pae) {
            return null;
        }
    }

I ran the same filter code above using the AccessController in a standalone program to read all my network cards and it found all the scoped addresses just fine.
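
A standalone check along those lines might look like the sketch below. It is not the program referred to above; the class `ScopedAddressLister` and its methods are hypothetical, it omits the `AccessController` wrapper, and it assumes Java 9 or later for `NetworkInterface.networkInterfaces()`. It lists every local IPv6 address that carries a non-zero scope ID.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical utility, for illustration only.
class ScopedAddressLister {
    // True for IPv6 addresses that carry a non-zero scope ID.
    static boolean isScopedV6(InetAddress a) {
        return a instanceof Inet6Address && ((Inet6Address) a).getScopeId() != 0;
    }

    // Collects all scoped IPv6 addresses across the local network interfaces.
    static List<InetAddress> listScoped() throws SocketException {
        return NetworkInterface.networkInterfaces()
                .flatMap(NetworkInterface::inetAddresses)
                .filter(ScopedAddressLister::isScopedV6)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws SocketException {
        for (InetAddress a : listScoped()) {
            Inet6Address v6 = (Inet6Address) a;
            // getScopedInterface() may be null, so print it defensively.
            System.out.println(v6.getHostAddress()
                    + " scopeId=" + v6.getScopeId()
                    + " interface=" + String.valueOf(v6.getScopedInterface()));
        }
    }
}
```

The output depends on the machine, so none is shown here.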

So, if I had to guess, that is where the problem lies.

Something in the RESOLVER is stripping out the SCOPE_ID

@normanmaurer
Member

@tmulle so you say this only happens on JDK11 on macOS ?

@tmulle
Author

tmulle commented Jan 18, 2022

@normanmaurer JDK11 and JDK17, on both Intel and ARM based JDKs. I originally noticed this issue on my Intel MacBook Pro with JDK11 and JDK17. I have since moved to an M1 Mac, and the same error occurs with both JDKs (native, not Rosetta).

@normanmaurer
Member

but only on macOs ?

@tmulle
Author

tmulle commented Jan 18, 2022

Yes. Windows and Linux seemed fine the last time I tried. It's been a while, so I will try on my Fedora Linux system at work in a little while.

@normanmaurer
Member

@tmulle ok perfect... let me dig in

@tmulle
Author

tmulle commented Jan 18, 2022

@normanmaurer Perfect thanks! Hope you can find something.. I wrote a Network interface utility to print out all the information about the cards found on a system. Shows things like Scope ID, etc.

Feel free to use it: https://github.com/tmulle/NetworkInformationUtility.git

normanmaurer added a commit that referenced this issue Jan 18, 2022
Motivation:

Due to a bug we stripped the scopeId of the IPv6 address string when using the DnsNameResolver. This could later result in things like "No Route to host" exceptions.

Modifications:

- Add a new method to NetUtil which creates the InetAddress for a string that contains an IP while still preserving the scopeId.
- Use this new method.
- Add unit tests.

Result:

Fixes #11563
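
The idea behind the fix described in this commit message, building the InetAddress from an IP string without losing the scopeId, can be sketched with the JDK alone. This is not Netty's implementation and not the NetUtil method the commit adds: `ScopedLiteralParser.parsePreservingScope` is a hypothetical helper, and it assumes the zone is numeric (for example `%15`); a named zone such as `%en0` would fail with a NumberFormatException in this sketch.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, for illustration only; not Netty code.
class ScopedLiteralParser {
    // Parses an IPv6 literal and re-attaches a numeric zone ("%15") as the scope ID.
    static InetAddress parsePreservingScope(String literal) throws UnknownHostException {
        int pct = literal.indexOf('%');
        String ip = pct < 0 ? literal : literal.substring(0, pct);
        // Require a ':' so that getByName treats the input as a literal and never does a DNS lookup.
        if (ip.indexOf(':') < 0) {
            throw new UnknownHostException("not an IPv6 literal: " + literal);
        }
        byte[] raw = InetAddress.getByName(ip).getAddress();
        // No zone given, or an IPv4-mapped literal that collapsed to 4 bytes: nothing to attach.
        if (pct < 0 || raw.length != 16) {
            return InetAddress.getByAddress(raw);
        }
        int scopeId = Integer.parseInt(literal.substring(pct + 1));
        return Inet6Address.getByAddress(null, raw, scopeId);
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr = parsePreservingScope("fe80::1c31:d1d1:4824:72a9%15");
        System.out.println(addr.getHostAddress());
    }
}
```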
@normanmaurer
Member

normanmaurer commented Jan 18, 2022

@tmulle #12019 ... This should have been failing on Windows and Linux as well, though. My guess is that Windows and Linux didn't use a scopeId.

@tmulle
Author

tmulle commented Jan 18, 2022

@normanmaurer Yeah, unfortunately, I don't have two bare metal machines (Linux -> Linux) or (Windows->Windows) to run my test code on. It works fine when I use Linux -> Windows VM.

I found this discussion about the % in IPv6 addresses. It's packed full of information.

https://superuser.com/questions/99746/why-is-there-a-percent-sign-in-the-ipv6-address

@tmulle
Author

tmulle commented Jan 18, 2022

@normanmaurer FYI - I just tried (macOS -> Linux VM), and responding from macOS back to Linux failed with 'No Route to Host', so it's definitely something on the macOS side.

normanmaurer added a commit that referenced this issue Jan 19, 2022
normanmaurer added a commit that referenced this issue Jan 19, 2022
@normanmaurer
Member

This should be fixed by the next netty release (4.1.74.Final)

@tmulle
Author

tmulle commented Jan 19, 2022

@normanmaurer Great! Is there a build I can try out before the official release? I tried to build your 'dont_strip_scope_id' branch, but I kept getting errors about the native transport, and the OpenSSL tests failed.

When do you anticipate the next official release?

10brothers pushed a commit to 10brothers/netty that referenced this issue Jan 20, 2022
@tmulle
Author

tmulle commented Feb 8, 2022

@normanmaurer I just verified that this now works in 4.1.74.Final. I had to override the Netty version that Vert.x pulls in, since it still officially uses 4.1.72.

But once they update to 4.1.74 things are good.

Thanks to everyone who helped!

@vietj
Contributor

vietj commented Feb 10, 2022

thanks for this contribution @tmulle

@vietj
Contributor

vietj commented Feb 10, 2022

and thanks everyone who helped :-)

raidyue pushed a commit to raidyue/netty that referenced this issue Jul 8, 2022