Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEST-01-BASIC started suddenly failing in an unprivileged nspawn container on Arch Linux #13177

Closed
mrc0mmand opened this issue Jul 24, 2019 · 19 comments
Labels
priority Stuff that should enter master quickly, since it fixes a major bug, unbreaks CI or stalls other work regression ⚠️ A bug in something that used to work correctly and broke through some recent commit tests
Milestone

Comments

@mrc0mmand
Copy link
Member

systemd version the issue has been seen with

latest master

Used distribution

Arch Linux

For some reason past ~8 runs consistently failed with:

Child 17 (systemd-sysctl) died (code=exited, status=1/FAILURE)
systemd-sysctl.service: Child 17 belongs to systemd-sysctl.service.
systemd-sysctl.service: Main process exited, code=exited, status=1/FAILURE
systemd-sysctl.service: Failed with result 'exit-code'.
systemd-sysctl.service: Changed start -> failed
systemd-sysctl.service: Job 15 systemd-sysctl.service/start finished, result=failed
[FAILED] Failed to start Apply Kernel Variables.
See 'systemctl status systemd-sysctl.service' for details.
systemd-sysctl.service: Unit entered failed state.
systemd-sysctl.service: Consumed 6ms CPU time.

Full log: https://ci.centos.org/job/systemd-ci-build/745/artifact//artifacts_all/artifacts_Nuy3FU/vagrant-logs.aXu/vagrant-arch-testsuite.bow/TEST-01-BASIC_FAIL.log

@mrc0mmand
Copy link
Member Author

mrc0mmand commented Jul 24, 2019

Ah, I see, this is from the relevant system.journal file:

Jul 24 17:15:32 systemd-testsuite systemd[17]: systemd-sysctl.service: Executing: /usr/lib/systemd/systemd-sysctl
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h' to 'kernel/core_pattern', ignoring: Permission deni>
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '16' to 'kernel/sysrq', ignoring: Permission denied
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '1' to 'kernel/core_uses_pid', ignoring: Permission denied
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '0 2147483647' to 'net/ipv4/ping_group_range': Invalid argument
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write 'fq_codel' to 'net/core/default_qdisc', ignoring: No such file or directory
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '1' to 'fs/protected_hardlinks', ignoring: Permission denied
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '1' to 'fs/protected_symlinks', ignoring: Permission denied
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '1' to 'fs/protected_regular', ignoring: Permission denied
Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '1' to 'fs/protected_fifos', ignoring: Permission denied

The problematic line is:

Jul 24 17:15:32 systemd-testsuite systemd-sysctl[17]: Couldn't write '0 2147483647' to 'net/ipv4/ping_group_range': Invalid argument

Looks suspiciously related to #13141. @poettering

@mrc0mmand
Copy link
Member Author

mrc0mmand commented Jul 24, 2019

Also, all this is because I screwed up the "skip" regex for man-only PRs, so CentOS CI just skipped the testing for #13141, sorry for that...

(Fixed in systemd/systemd-centos-ci@fc74aa2)

@mrc0mmand mrc0mmand added bug 🐛 Programming errors, that need preferential fixing priority Stuff that should enter master quickly, since it fixes a major bug, unbreaks CI or stalls other work regression ⚠️ A bug in something that used to work correctly and broke through some recent commit and removed bug 🐛 Programming errors, that need preferential fixing labels Jul 24, 2019
@yuwata yuwata added this to the v243 milestone Jul 25, 2019
@poettering
Copy link
Member

hmm, this is puzzling... My guess would be that the local kernel on those older distros accepts only a smaller range. Uh oh.

Any chance you can test what values those kernels accept? i.e. echo "0 2147483647" into it, then "0 1073741823", then "0 536870911", "0 268435455", … i.e. all 2^n-1 from 31 down, to see what works on those kernels?

@evverx
Copy link
Member

evverx commented Jul 25, 2019

@poettering given that it happens in user namespaces only I think it has something to do with https://lore.kernel.org/patchwork/patch/959770/, according to which

This patch changes the behavior by clearly returning an error when the sysctl write operation receives a GID range that doesn't map to the associated user namespace.

FWIW it's Arch and hence the kernel there isn't that old (5.2.2-arch1-1-ARCH to be precise)

@mrc0mmand
Copy link
Member Author

As @evverx said, the kernel is pretty new according to the os info log

Linux version 5.2.2-arch1-1-ARCH (builduser@heftig-71853) (gcc version 9.1.0 (GCC)) #1 SMP PREEMPT Sun Jul 21 19:18:34 UTC 2019

@evverx
Copy link
Member

evverx commented Jul 25, 2019

@mrc0mmand @poettering would it maybe make sense to revert that PR to fix CentOS CI and then try to come up with something that would work in nspawn containers where only 65536 gids are mapped onto user namespaces by default and where 0 65535 should be written to ping_group_range?

@poettering
Copy link
Member

hmmm urks, i see, nasty...

i wonder what a long term fix for this could be though...

what kind of container manager is this? How is /proc/sys/ mounted in it? i.e. read-only? or is this nspawn?

@mrc0mmand
Copy link
Member Author

It's nspawn:

+ env --unset=UNIFIED_CGROUP_HIERARCHY timeout --foreground 900 /build/build/systemd-nspawn -U --private-network --machine=TEST-01-BASIC --register=no --kill-signal=SIGKILL --directory=/var/tmp/systemd-test-TEST-01-BASIC/unprivileged-nspawn-root /usr/lib/systemd/systemd rw root=/dev/sda1 raid=noautodetect loglevel=2 init=/usr/lib/systemd/systemd console=ttyS0 selinux=0 printk.devkmsg=on

@poettering
Copy link
Member

hmm, i though we mounted /proc/sys read-only anyway... /me is confused

@evverx
Copy link
Member

evverx commented Jul 25, 2019

hmm, i though we mounted /proc/sys read-only anyway... /me is confused

Yes, usually we do but it doesn't matter when /proc/sys/net is writable: 411e869

@evverx
Copy link
Member

evverx commented Jul 26, 2019

i wonder what a long term fix for this could be though...

I'm not sure how to fix it in general but I think in this particular case systemd-sysctl should at least try to fall back to the last known good range so as not to screw ping_group_range up completely. Another option would be to dynamically detect the upper bound (probably using the bisection method) by trying to write different numbers until it works. Though for this to work we should probably add a keyword systemd-sysctl would recognize or introduce some kind of sorcery inferring the real meaning behind net.ipv4.ping_group_range = 0 2147483647.

@poettering
Copy link
Member

hmm, maybe we should introduce a - prefix for sysctl lines that makes applying the rule non-fatal?

@evverx
Copy link
Member

evverx commented Jul 26, 2019

I think we've discussed that before (though I can't seem to find the thread) and agreed that it might be a good idea. But in this case it's not enough because I think silently messing up ping_group_range (which will contain something like 65534 after a failed attempt to write 2147483647) is even worse than bailing out loudly.

@poettering
Copy link
Member

hmm, what? it's automatically cast to the nearest value? but that'd actually be great!

@evverx
Copy link
Member

evverx commented Jul 26, 2019

Not really. It turns 1 0 into 65534 65534 where 65534 is DEFAULT_OVERFLOWGID.

@poettering
Copy link
Member

hmm, well, but that might simply be because it leaves the values unset, and they then refer to the host gids, which get translated to the overflowgid for display, right?

i mean, they are just the same values as before setting them and getting EINVAL, no?

@evverx
Copy link
Member

evverx commented Jul 26, 2019

@poettering to answer these questions I or anybody else needs to launch a couple of containers (with and without https://lore.kernel.org/patchwork/patch/959770/). It might take a while. In the meantime, CentOS CI has been failing for two days and the only reliable way to "fix" it I came up with is to revert that PR: #13187.

poettering added a commit to poettering/systemd that referenced this issue Jul 26, 2019
@poettering
Copy link
Member

I posted #13191 now, which implements the "-" thing. Works fine, and behaves correctly according to my tests.

(i.e. mkosi -ifT && systemd-nspawn -U --private-network -i ./image.raw -b now boots up correctly. Also, if you drop the '-b' then you'll find the sysctl showing 65535 twice too, because those gids are simply not mapped. Hence all is good)

@keszybz
Copy link
Member

keszybz commented Jul 26, 2019

Centos CI passed with #13187, so let's close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority Stuff that should enter master quickly, since it fixes a major bug, unbreaks CI or stalls other work regression ⚠️ A bug in something that used to work correctly and broke through some recent commit tests
Development

No branches or pull requests

5 participants