Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DHCP6 broken on armhf (and probably other 32 bit arches) #20564

Closed
mwhudson opened this issue Aug 29, 2021 · 12 comments · Fixed by #20567
Closed

DHCP6 broken on armhf (and probably other 32 bit arches) #20564

mwhudson opened this issue Aug 29, 2021 · 12 comments · Fixed by #20567

Comments

@mwhudson
Copy link

systemd version the issue has been seen with

248.3

The code looks the same in main though.

Used distribution

Ubuntu 21.10

Linux kernel version used (uname -a)

Linux impish 5.4.0-1028-raspi #31-Ubuntu SMP PREEMPT Wed Jan 20 11:30:45 UTC 2021 armv8l armv8l armv8l GNU/Linux

But it's also important that this all happens when glibc is built on a system with kernel before 5.1

CPU architecture issue was seen on

armhf

Expected behaviour you didn't see

python3 networkd-test.py passing

Unexpected behaviour you saw

test_coldplug_dhcp_ip6 failed with "timed out waiting for IPv6 configuration"

Steps to reproduce the problem

It's a bit complicated but https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1940635/comments/8 has the details

Additional program output to the terminal or log subsystem illustrating the issue

If you crank up networkd's log level you get this in the output:

Aug 27 14:28:29 autopkgtest systemd-networkd[939]: NDISC: Unexpected error while reading from ICMPv6, ignoring: Exchange full
@mwhudson
Copy link
Author

mwhudson commented Aug 29, 2021

So what's going on here?

In this set up libc is built on a system with a fairly old kernel, so __ASSUME_TIME64_SYSCALLS is not set in its build process. That means that its recvmsg wrapper calls convert_scm_timestamps which translates the "old" format timestamps to "new" and tries to append this to msg_control. But networkd passes exactly as large a buffer as it needs in icmp6_receive so convert_scm_timestamps sets MSG_CTRUNC instead. This causes recvmsg_safe to error and then IPv6 configuration does not happen and the test fails.

I'm not really sure how much of what glibc is doing makes sense really. But it's easy to work around this in systemd by passing a larger buffer in icmp6_receive.

@yuwata
Copy link
Member

yuwata commented Aug 29, 2021

Maybe related to #20482.

@mwhudson
Copy link
Author

This is the very simple patch I'm uploading to Ubuntu fwiw:

Description: pass a larger buffer in icmp6_receive
 To have enough room for glibc 2.34's timestamp translation games.
Author: Michael Hudson-Doyle <michael.hudson@ubuntu.com>
Origin: vendor
Bug: https://github.com/systemd/systemd/issues/20564
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1940635
Last-Update: 2021-08-29
---
This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
--- a/src/libsystemd-network/icmp6-util.c
+++ b/src/libsystemd-network/icmp6-util.c
@@ -147,9 +147,14 @@
 
 int icmp6_receive(int fd, void *buffer, size_t size, struct in6_addr *ret_dst,
                   triple_timestamp *ret_timestamp) {
-
+#ifdef __arm__
+        CMSG_BUFFER_TYPE(CMSG_SPACE(sizeof(int)) + /* ttl */
+                         CMSG_SPACE(sizeof(int64_t)*2) + /* 64 bit timestamp */
+                         CMSG_SPACE(sizeof(struct timeval))) control;
+#else
         CMSG_BUFFER_TYPE(CMSG_SPACE(sizeof(int)) + /* ttl */
                          CMSG_SPACE(sizeof(struct timeval))) control;
+#endif
         struct iovec iov = {};
         union sockaddr_union sa = {};
         struct msghdr msg = {

@mwhudson
Copy link
Author

Maybe related to #20482.

It could be, if the libc/kernel versions line up as in this report. The red flag for this issue I think is recvmsg_safe failing with -EXFULL but the strace output showing msg_flags==0.

@yuwata
Copy link
Member

yuwata commented Aug 29, 2021

I think this is a glibc issue. Even if we pass large space, then the condition cmsg->cmsg_len == CMSG_LEN(sizeof(struct timeval)) in icmp6-util.c should not be satisfied. icmp6_receive() has fallback mechanism when we do not receive timestamp, but we have no way for #20482 case...

@mwhudson
Copy link
Author

Even if we pass large space, then the condition cmsg->cmsg_len == CMSG_LEN(sizeof(struct timeval)) in icmp6-util.c should not be satisfied.

Why do you say that? systemd does not define _TIME_BITS to 64, which triggers both "sizeof(struct timeval)" and "SO_TIMESTAMP" to change in value (or at least I think that's what happens, it's all fairly confusing to me).

@yuwata
Copy link
Member

yuwata commented Aug 29, 2021

Even if we pass large space, then the condition cmsg->cmsg_len == CMSG_LEN(sizeof(struct timeval)) in icmp6-util.c should not be satisfied.

Why do you say that?

Hmm, I was wrong. I read __convert_scm_timestamps() now, and it just appends 64bit timestamp, instead of modifying timestamp from 32bit -> 64bit.

@yuwata
Copy link
Member

yuwata commented Aug 29, 2021

Could you test #20567?

@fweimer-rh
Copy link
Contributor

In glibc, we can split the recvmsg and __recvmsg_time64 implementations so that the rewriting only happens for the latter. (That's why I requested the addition of the __recvmsg_time64 alias in the first place.)

The core issue here is that we cannot tell from the descriptor passed to recvmsg if a time64 setsockopt call has been applied to it that required 32-bit fallback. That's the case in which the rewriting is required. It was felt that the setsockopt fallback (and other emulation) was needed to support older kernels for time64 applications, otherwise time64 applications would only run on fairly recent kernels.

yuwata added a commit to yuwata/systemd that referenced this issue Aug 29, 2021
@poettering
Copy link
Member

@fweimer-rh did i get this right, on such archs/kernels recvmsg will have two auxiliary fields attached? the 32bit and the 64bit version of timespec?

@fweimer-rh
Copy link
Contributor

Yes, the kernel produces the 32-bit version (as requested via setsockopt), and the emulation unconditionally adds the 64-bit version.

@poettering
Copy link
Member

Can't glibc figure out automatically which timespec is required? i.e. pick a different recvmsg stub depending if some 64bit-timespec macro is seen/not seen in the glibc header files?

I mean, appending both sounds like something that will necessarily break any program that tries to estimate how much auxiliary buffer space to provide to recvmsg().

I mean, how are programs supposed to size the auxiliary buffer otherwise?

yuwata added a commit to yuwata/systemd that referenced this issue Aug 30, 2021
pld-gitsync pushed a commit to pld-linux/systemd that referenced this issue Sep 26, 2021
see systemd/systemd#20564
while the issue is about dhcp, it's actually about any recvmsg_safe user
in systemd such as timesyncd
codepeon pushed a commit to codepeon/systemd that referenced this issue Oct 14, 2021
…dditional 64bit timeval or timespec

Fixes systemd#20482 and systemd#20564.

(cherry picked from commit 9365e29)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants