Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multilib: test-event & test-firewall-util time out #19052

Closed
eworm-de opened this issue Mar 19, 2021 · 20 comments · Fixed by #19076
Closed

multilib: test-event & test-firewall-util time out #19052

eworm-de opened this issue Mar 19, 2021 · 20 comments · Fixed by #19076

Comments

@eworm-de
Copy link
Contributor

eworm-de commented Mar 19, 2021

systemd version the issue has been seen with

248-rc4

Used distribution

Arch Linux

Linux kernel version used (uname -a)

Linux leda 5.11.7-arch1-1 #1 SMP PREEMPT Wed, 17 Mar 2021 16:59:58 +0000 x86_64 GNU/Linux

CPU architecture issue was seen on

x86_64 (building 32-bit multilib package)

Expected behaviour you didn't see

tests pass

Unexpected behaviour you saw

two tests fail with timeout

Steps to reproduce the problem

build a multilib package that runs the tests

Additional program output to the terminal or log subsystem illustrating the issue

598/599 test-event                                TIMEOUT         30.03s   killed by signal 15 SIGTERM
>>> MALLOC_PERTURB_=48 SYSTEMD_KBD_MODEL_MAP=/build/lib32-systemd/src/systemd-stable/src/locale/kbd-model-map SYSTEMD_LANGUAGE_FALLBACK_MAP=/build/lib32-systemd/src/systemd-stable/src/locale/language-fallback-map PATH=/build/lib32-systemd/src/build:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl /build/lib32-systemd/src/build/test-event

599/599 test-firewall-util                        TIMEOUT         30.02s   killed by signal 15 SIGTERM
>>> MALLOC_PERTURB_=96 SYSTEMD_KBD_MODEL_MAP=/build/lib32-systemd/src/systemd-stable/src/locale/kbd-model-map SYSTEMD_LANGUAGE_FALLBACK_MAP=/build/lib32-systemd/src/systemd-stable/src/locale/language-fallback-map PATH=/build/lib32-systemd/src/build:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl /build/lib32-systemd/src/build/test-firewall-util
archlinux-github pushed a commit to archlinux/svntogit-community that referenced this issue Mar 19, 2021
Currently this does build, but two tests fail with timeout...
systemd/systemd#19052

git-svn-id: file:///srv/repos/svn-community/svn@894544 9fca08f4-af9d-4005-b8df-a31f2cc04f65
archlinux-github pushed a commit to archlinux/svntogit-community that referenced this issue Mar 19, 2021
Currently this does build, but two tests fail with timeout...
systemd/systemd#19052


git-svn-id: file:///srv/repos/svn-community/svn@894544 9fca08f4-af9d-4005-b8df-a31f2cc04f65
@mrc0mmand
Copy link
Member

Most likely related to #19042.

@eworm-de
Copy link
Contributor Author

That explains the timeout in test-event at lease.
But test-firewall-util does not use epoll_pwait2(), no? Also that does not fail for Fedora.

@poettering
Copy link
Member

"lib32"? is that x32 stuff? or proper i386?

@yuwata
Copy link
Member

yuwata commented Mar 19, 2021

@eworm-de Could you provide logs of both tests? Especially, test-firewall-util.

@yuwata yuwata added this to the v248 milestone Mar 19, 2021
@eworm-de
Copy link
Contributor Author

No x32 stuff, it's just i686 libraries to make proprietary 32 bit binaries run on x86_64 systems.

@eworm-de
Copy link
Contributor Author

test-event time out (After 30 seconds)
361/599 test-event                                TIMEOUT         30.17s   killed by signal 15 SIGTERM
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
/* test_basic(pidfd=yes) */
got IO on d
got IO on d
preparing c
preparing a
got IO on b
preparing c
preparing a
got timer on c
preparing a
got IO on a
preparing c
got timer on c
got post handler
got defer on d
got post handler
got signal on e
got post handler
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

test-firewall-util time out (After 30 seconds)
504/599 test-firewall-util                        TIMEOUT         30.27s   killed by signal 15 SIGTERM
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
Failed to modify firewall: Operation not supported
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――


Summary of Failures:

361/599 test-event                                TIMEOUT         30.17s   killed by signal 15 SIGTERM
504/599 test-firewall-util                        TIMEOUT         30.27s   killed by signal 15 SIGTERM

I think the test-event issue is handled in #19042.

I will investigate what is not supported for the firewall.

BTW, building the x86_64 package works without issue, including the tests.

@yuwata
Copy link
Member

yuwata commented Mar 19, 2021

Is it possible to run test-firewall-util with SYSTEMD_LOG_LEVEL=debug?

@yuwata
Copy link
Member

yuwata commented Mar 19, 2021

About test-event, #19042 or 4cbb372 do not solve the issue. So, fedora tentatively disables to use epoll_pwait2() in sd-event. See https://src.fedoraproject.org/rpms/systemd/blob/rawhide/f/0001-sd-event-do-not-use-epoll_pwait2-tentatively.patch.

I cannot find any reason why #19042 does not work...

@eworm-de
Copy link
Contributor Author

Ha, my test to disable epoll_pwait2() looks very similar. 😜 No idea why #19042 does not work, either.

For test-firewall-util the message is printed first after 30 seconds, then every 10 seconds.

@eworm-de
Copy link
Contributor Author

Looks like test-firewall-util does not hang but take a long time to finish:

# time ./test-firewall-util
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify ipv6 firewall: Operation not supported
Failed to modify ipv6 firewall: Operation not supported
Failed to modify ipv6 firewall: Operation not supported
Failed to modify ipv6 firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify firewall: Operation not supported
Failed to modify ipv6 firewall: Operation not supported
Failed to modify ipv6 firewall: Operation not supported

real    2m51.142s
user    0m0.003s
sys     0m0.007s

I do not see a difference with SYSTEMD_LOG_LEVEL=debug.

@yuwata
Copy link
Member

yuwata commented Mar 19, 2021

@eworm-de nftable related library is installed? nf_tables is enabled on kernel?

@yuwata
Copy link
Member

yuwata commented Mar 19, 2021

Hmm, several logs are added, but still I cannot find anything...

--- stderr ---
sd-event: epoll_wait_usec: timeout=1035938
sd-event: epoll_wait_usec: calling epoll_pwait2() with timeout=1035938
sd-event: epoll_wait_usec: Failed to call epoll_pwait2(): Function not implemented
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=1035938
/* test_basic(pidfd=yes) */
sd-event: epoll_wait_usec: timeout=18446744073709551615(infinity)
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=18446744073709551615(infinity)
sd-event: epoll_wait_usec: timeout=18446744073709551615(infinity)
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=18446744073709551615(infinity)
got IO on d
sd-event: epoll_wait_usec: timeout=18446744073709551615(infinity)
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=18446744073709551615(infinity)
got IO on d
preparing c
preparing a
sd-event: epoll_wait_usec: timeout=18446744073709551615(infinity)
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=18446744073709551615(infinity)
got IO on b
preparing c
preparing a
sd-event: epoll_wait_usec: timeout=0
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=0
got timer on c
preparing a
sd-event: epoll_wait_usec: timeout=0
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=0
got IO on a
preparing c
sd-event: epoll_wait_usec: timeout=18446744073709551615(infinity)
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=18446744073709551615(infinity)
got timer on c
sd-event: epoll_wait_usec: timeout=0
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=0
got post handler
sd-event: epoll_wait_usec: timeout=0
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=0
got defer on d
sd-event: epoll_wait_usec: timeout=0
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=0
got post handler
sd-event: epoll_wait_usec: timeout=0
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=0
got signal on e
sd-event: epoll_wait_usec: timeout=0
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=0
got post handler
sd-event: epoll_wait_usec: timeout=18446744073709551615(infinity)
sd-event: epoll_wait_usec: calling epoll_wait() with timeout=18446744073709551615(infinity)
-------

@eworm-de
Copy link
Contributor Author

It's the default Arch Linux kernel, so yes it is enabled. Even using it on the host.

Build is done inside systemd-nspawn, for the regular package and multilib package. First succeeds, latter fails.

@poettering
Copy link
Member

So by multilib you mean that 32bit userpace runs on a 64bit kernel?

@eworm-de
Copy link
Contributor Author

Yes. But not the whole system is 32bit. This is 64bit kernel with 64bit userspace, with 32bit libraries for specific 32bit applications.
https://archlinux.org/packages/multilib/x86_64/lib32-systemd/

yuwata pushed a commit to yuwata/systemd that referenced this issue Mar 22, 2021
This reverts the gist of commit 798445a.

Unfortunately the new syscall causes test-event to hang. 32 bit architectures
seem affected: i686 and arm32 in fedora koji. 32 bit build of test-event hangs
reliably under valgrind:

$ PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig meson build-32 -Dc_args=-m32 -Dc_link_args=-m32 -Dcpp_args=-m32 -Dcpp_link_args=-m32 && ninja -C build-32 test-event && valgrind build/test-event

If I set epoll_pwait2_absent=true, so the new function is never called, then
the issue does not reproduce. It seems to be strictly tied to the syscall.

On amd64, the syscall is not used, at least with the kernel that Fedora
provides. The kernel patch 58169a52ebc9a733aeb5bea857bc5daa71a301bb says:

  For timespec, only support this new interface on 2038 aware platforms
  that define __kernel_timespec_t. So no CONFIG_COMPAT_32BIT_TIME.

And Fedora sets CONFIG_COMPAT_32BIT_TIME=y. I expect most other distros will too.

On amd64: epoll_wait_usec: epoll_pwait2: ret=-1 / errno=38
On i686 (same kernel): epoll_wait_usec: epoll_pwait2: ret=2 / errno=0

Is this some kind of emulation? Anyway, it seems that this is what is going wrong.

So let's disable the syscall until it becomes more widely available and the
kinks have been ironed out.

Fixes test-event issue in systemd#19052.
@eworm-de
Copy link
Contributor Author

Does it help if I provide a strace log?

@eworm-de
Copy link
Contributor Author

Given that the epoll_pwait2 issue is resolved in #19073 and the firewall-util is not relevant for libsystemd.so (which I am interested in) I am fine if you drop this from milestone.

@yuwata
Copy link
Member

yuwata commented Mar 22, 2021

Does it help if I provide a strace log?

Yes, please.

@eworm-de
Copy link
Contributor Author

And here we go...
test-firewall-util.log

yuwata pushed a commit to yuwata/systemd that referenced this issue Mar 22, 2021
Unfortunately the new syscall causes test-event to hang. 32 bit architectures
seem affected: i686 and arm32 in fedora koji. 32 bit build of test-event hangs
reliably under valgrind:

$ PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig meson build-32 -Dc_args=-m32 -Dc_link_args=-m32 -Dcpp_args=-m32 -Dcpp_link_args=-m32 && ninja -C build-32 test-event && valgrind build/test-event

Fixes test-event issue in systemd#19052.
@yuwata
Copy link
Member

yuwata commented Mar 22, 2021

Thanks.

yuwata added a commit to yuwata/systemd that referenced this issue Mar 23, 2021
FirewallContext is used by networkd and nspawn. Both allocates the
context when it is really necessary. Hence, it is not necessary to delay
probing backend.

Moreover, if iptables backend is not enabled on build, and nftables is
not supported by kernel, previously `fw_nftables_init()` is called
everytime when we try to configure masquerade or dnat. It causes
significant paformance loss.

Fixes test-firewall-util issue in systemd#19052.
yuwata added a commit to yuwata/systemd that referenced this issue Mar 23, 2021
FirewallContext is used by networkd and nspawn. Both allocates the
context when it is really necessary. Hence, it is not necessary to delay
probing backend.

Moreover, if iptables backend is not enabled on build, and nftables is
not supported by kernel, previously `fw_nftables_init()` is called
everytime when we try to configure masquerade or dnat. It causes
significant performance loss.

Fixes test-firewall-util issue in systemd#19052.
keszybz added a commit that referenced this issue Mar 23, 2021
This reverts the gist of commit 798445a.

Unfortunately the new syscall causes test-event to hang. 32 bit architectures
seem affected: i686 and arm32 in fedora koji. 32 bit build of test-event hangs
reliably under valgrind:

$ PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig meson build-32 -Dc_args=-m32 -Dc_link_args=-m32 -Dcpp_args=-m32 -Dcpp_link_args=-m32 && ninja -C build-32 test-event && valgrind build/test-event

If I set epoll_pwait2_absent=true, so the new function is never called, then
the issue does not reproduce. It seems to be strictly tied to the syscall.

On amd64, the syscall is not used, at least with the kernel that Fedora
provides. The kernel patch 58169a52ebc9a733aeb5bea857bc5daa71a301bb says:

  For timespec, only support this new interface on 2038 aware platforms
  that define __kernel_timespec_t. So no CONFIG_COMPAT_32BIT_TIME.

And Fedora sets CONFIG_COMPAT_32BIT_TIME=y. I expect most other distros will too.

On amd64: epoll_wait_usec: epoll_pwait2: ret=-1 / errno=38
On i686 (same kernel): epoll_wait_usec: epoll_pwait2: ret=2 / errno=0

Is this some kind of emulation? Anyway, it seems that this is what is going wrong.

So let's disable the syscall until it becomes more widely available and the
kinks have been ironed out.

Fixes test-event issue in #19052.
@bluca bluca linked a pull request Mar 24, 2021 that will close this issue
@bluca bluca closed this as completed Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants