libuv 1.47.0: tcp_connect6_link_local fails on Fedora (all arches) #4211

Closed
sgallagher opened this issue Nov 8, 2023 · 8 comments · Fixed by #4220

Comments

@sgallagher
Contributor

  • Version: 1.47.0
  • Platform: Fedora Linux 38 and 39 for x86_64, aarch64, powerpc64le and s390x

Looks like it's related to #4107

For build logs and detailed hardware info, see https://koji.fedoraproject.org/koji/taskinfo?taskID=108752390

not ok 310 - tcp_connect6_link_local
# exit code 134
# Output from process `tcp_connect6_link_local`:
# Assertion failed in /builddir/build/BUILD/libuv-v1.47.0/test/test-tcp-connect6-error.c on line 102: `uv_tcp_connect(&req, &server, (struct sockaddr*) &addr, connect_cb) == 0` (-22 == 0)
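The two numbers in this log can be decoded with a few lines of C. This is a minimal sketch under two assumptions: Linux errno values (where EINVAL is 22), and a test runner whose exit code follows the common "128 plus signal number" convention, which the log itself does not state. libuv reports errors on Unix as negated errno values, so -22 corresponds to -EINVAL. The helper names are hypothetical and not part of libuv.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: under the common "128 + signal" convention, return the
 * signal number encoded in an exit code, or 0 if the code is 128 or less. */
static int signal_from_exit_code(int code) {
  assert(code >= 0);
  return code > 128 ? code - 128 : 0;
}

/* Hypothetical helper: describe a libuv-style (negated errno) result on Unix.
 * Non-negative values are treated as success. */
static const char* describe_negated_errno(int err) {
  return err < 0 ? strerror(-err) : "success";
}
```

Under those assumptions, `signal_from_exit_code(134)` yields SIGABRT, which is what a failed assertion that calls abort() would produce, and `describe_negated_errno(-22)` yields the EINVAL message.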
@bnoordhuis
Member

That's a newly added test that checks that connecting to a link-local address works (fe80::/10, fe80::0bad:babe in particular.)

The connect() system call failing with EINVAL makes me suspect one of the following causes. Maybe you can check?

  1. IPv6 not enabled on the host (unlikely because then other tests would fail too)
  2. No network device that accepts link-local traffic (unlikely but possible)
  3. Multiple network devices that accept link-local traffic; libuv picks the wrong one
  4. Link-local traffic not routable
  5. Link-local traffic blocked by a firewall rule
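Options 2 and 3 can be checked from C by listing which addresses on the host fall in fe80::/10. The following is a minimal sketch using getifaddrs(3), not libuv's own interface selection code; the function names are hypothetical.

```c
#define _DEFAULT_SOURCE /* for glibc declarations under strict -std= modes */
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Return 1 if `text` parses as an IPv6 address in fe80::/10, else 0. */
static int is_link_local_text(const char* text) {
  struct in6_addr addr;

  assert(text != NULL);
  if (inet_pton(AF_INET6, text, &addr) != 1)
    return 0;
  return IN6_IS_ADDR_LINKLOCAL(&addr) ? 1 : 0;
}

/* Count the IPv6 link-local addresses configured on this host.
 * Returns -1 if getifaddrs() fails. When `verbose` is nonzero, print the
 * owning interface name and index for each match. */
static int count_link_local_addrs(int verbose) {
  struct ifaddrs* list;
  struct ifaddrs* p;
  int count = 0;

  if (getifaddrs(&list) != 0)
    return -1;

  for (p = list; p != NULL; p = p->ifa_next) {
    const struct sockaddr_in6* a6;

    if (p->ifa_addr == NULL || p->ifa_addr->sa_family != AF_INET6)
      continue;
    a6 = (const struct sockaddr_in6*) p->ifa_addr;
    if (!IN6_IS_ADDR_LINKLOCAL(&a6->sin6_addr))
      continue;
    count++;
    if (verbose)
      printf("%s (index %u)\n", p->ifa_name, if_nametoindex(p->ifa_name));
  }

  freeifaddrs(list);
  return count;
}
```

A count of 0 would point at option 2, and a count above 1 makes option 3 worth investigating.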

@sgallagher
Contributor Author

From my local machine (aarch64 VM):

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: enp0s5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:1c:42:71:11:17 brd ff:ff:ff:ff:ff:ff
    inet 10.211.55.4/24 brd 10.211.55.255 scope global dynamic noprefixroute enp0s5
       valid_lft 1800sec preferred_lft 1800sec
    inet6 fdb2:2c26:f4e4:0:d3f8:c3a4:2109:2d18/64 scope global dynamic noprefixroute 
       valid_lft 2591822sec preferred_lft 604622sec
    inet6 fe80::ecf0:a758:b9d7:a4f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
$ ip -6 route
fdb2:2c26:f4e4::/64 dev enp0s5 proto ra metric 100 pref medium
fe80::/64 dev enp0s5 proto kernel metric 1024 pref medium
default via fe80::21c:42ff:fe00:18 dev enp0s5 proto ra metric 100 pref medium

This actually results in a different error than we're getting on the build machines:

not ok 310 - tcp_connect6_link_local
# exit code 139
# Output from process `tcp_connect6_link_local`: (no output)

As for link-local traffic, I'm using the out-of-the-box firewall configuration:

$ firewall-cmd --list-all
FedoraWorkstation (default, active)
  target: default
  ingress-priority: 0
  egress-priority: 0
  icmp-block-inversion: no
  interfaces: enp0s5
  sources: 
  services: dhcpv6-client mdns samba-client ssh
  ports: 1025-65535/udp 1025-65535/tcp
  protocols: 
  forward: yes
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

@sgallagher
Contributor Author

Sorry, I just realized the information above was incomplete. The test appears to work fine in that configuration. During the failing run I had a VPN connection active, which adds a second fe80:: link-local address and is likely triggering your option 3.

The builders definitely hit a different error, though: they have limited network capability to ensure reproducible builds. (The network is "up", but outgoing communication is blocked. I don't know all the details.)

When built in "mock", the error is as I originally reported:

not ok 310 - tcp_connect6_link_local
# exit code 134
# Output from process `tcp_connect6_link_local`:
# Assertion failed in /builddir/build/BUILD/libuv-v1.47.0/test/test-tcp-connect6-error.c on line 102: `uv_tcp_connect(&req, &server, (struct sockaddr*) &addr, connect_cb) == 0` (-22 == 0)

I can modify our build to skip this test if the test is faulty, but I'm concerned that it's revealing a real bug since this is new functionality in libuv.

@sgallagher
Contributor Author

The segmentation fault when I have a VPN connected is caused by a NULL-dereference. I've submitted #4218 to catch that case and avoid it.

I still haven't fully tracked down the EINVAL issue, but I can confirm that it comes from the libc call to connect(2) and is not handled by the calling code.

@bnoordhuis
Member

Can you strace the connect call? I'd like to see what interface libuv picked (enp0s5?)

@sgallagher
Contributor Author

Here's the full strace output from running UV_TEST_TIMEOUT_MULTIPLIER=3000 strace -vfo /tmp/strace.txt -s 65536 ./redhat-linux-build/uv_run_tests tcp_connect6_link_local:
strace.txt

This was done with a build of libuv that I created with -ggdb3 -O0 for debugging purposes:

CFLAGS='-O0 -flto=auto -ffat-lto-objects -fexceptions -g -ggdb3 -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer ' /usr/bin/cmake --fresh -S . -B redhat-linux-build -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON -DBUILD_TESTING=ON && /usr/bin/cmake --build redhat-linux-build -j8 --verbose

build.log

@bnoordhuis
Member

63606 connect(13, {sa_family=AF_INET6, sin6_port=htons(1337), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::bad:babe", &sin6_addr), sin6_scope_id=0}, 28) = -1 EINVAL (Invalid argument)

sin6_scope_id=0 suggests libuv didn't find a suitable network device (option 2 above). The koji build logs indicate the buildbots only have a lo device assigned, so that would explain it.

# uv_interface_addresses:
#   name: lo
#   internal: 1
#   physical address: 00:00:00:00:00:00
#   address: 127.0.0.1
#   netmask: 255.0.0.0
#   name: lo
#   internal: 1
#   physical address: 00:00:00:00:00:00
#   address: ::1
#   netmask: ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff

I'll fix up the test to detect that condition. Anyone have opinions on whether to keep returning that EINVAL immediately or pass it on to the callback? It's morally equivalent to ENETUNREACH and we could report it as such.
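The condition described here (a link-local destination, sin6_scope_id of 0, and no usable device) can be detected before connect() is called. The sketch below is one way to do that and to report it as ENETUNREACH. It is not the change made in #4220, and both helper names are hypothetical.

```c
#define _DEFAULT_SOURCE /* for glibc declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: return the index of the first non-loopback interface
 * that has an IPv6 link-local address, or 0 if there is none. */
static unsigned int pick_link_local_scope_id(void) {
  struct ifaddrs* list;
  struct ifaddrs* p;
  unsigned int index = 0;

  if (getifaddrs(&list) != 0)
    return 0;

  for (p = list; p != NULL && index == 0; p = p->ifa_next) {
    const struct sockaddr_in6* a6;

    if (p->ifa_addr == NULL || p->ifa_addr->sa_family != AF_INET6)
      continue;
    if (p->ifa_flags & IFF_LOOPBACK)
      continue;
    a6 = (const struct sockaddr_in6*) p->ifa_addr;
    if (IN6_IS_ADDR_LINKLOCAL(&a6->sin6_addr))
      index = if_nametoindex(p->ifa_name);
  }

  freeifaddrs(list);
  return index;
}

/* Hypothetical helper: fill in a missing scope id on a link-local address.
 * Addresses that are not link-local, or that already carry a scope id, are
 * left untouched. Returns 0 on success, or -ENETUNREACH when the address
 * needs a scope id and `scope_id` is 0 (no device qualified). */
static int fix_link_local_scope(struct sockaddr_in6* addr,
                                unsigned int scope_id) {
  assert(addr != NULL);
  if (!IN6_IS_ADDR_LINKLOCAL(&addr->sin6_addr))
    return 0;
  if (addr->sin6_scope_id != 0)
    return 0;
  if (scope_id == 0)
    return -ENETUNREACH;
  addr->sin6_scope_id = scope_id;
  return 0;
}
```

A test could call `pick_link_local_scope_id()` first and skip itself when the result is 0, which is the situation on a host that only has a lo device.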

@bnoordhuis
Member

#4220

sgallagher pushed a commit to sgallagher/libuv-1 that referenced this issue Nov 16, 2023