systemd-nspawn attempts to parse the deserialization FD as a runlevel, overriding the default target in some cases #24452
Comments
Hmpf, this is strange, it looks like after daemon-reexec the
|
I seem to be able to reproduce it with a "vanilla" meson build (
diff --git a/test/units/testsuite-60.sh b/test/units/testsuite-60.sh
index a29364568d..ddbaff23bc 100755
--- a/test/units/testsuite-60.sh
+++ b/test/units/testsuite-60.sh
@@ -225,7 +225,7 @@ EOF
# shellcheck disable=SC2064
trap "rm -f /run/systemd/system/tmp-hoge.mount '$mount_mytmpfs'" RETURN
- for ((i = 0; i < 10; i++)); do
+ for ((i = 0; i < 100; i++)); do
systemctl --no-block start tmp-hoge.mount
sleep ".$RANDOM"
systemctl daemon-reexec
and
but it still might take a couple of tries. I get more consistent results with a coverage build (
In this case the test fails every time even without the iteration patch (at least on my F36 machine and in the daily coverage build on Arch Linux). Tested with a3e03a3 ATTOW. I'll pre-emptively add this to the v252 milestone, as it looks like a bug in the serialization/deserialization stuff. I'll try to bisect it in the meantime. |
Ah, the bisect is going to be fun, as it's reproducible even with the commit that introduced the test (864d1a4). |
Interesting, I used a trimmed-down version of TEST-60:
#!/usr/bin/env bash
# SPDX-License-Identifier: LGPL-2.1-or-later
set -eux
set -o pipefail

test_issue_23796() {
    local mount_path mount_mytmpfs

    mount_path="$(command -v mount 2>/dev/null)"
    mount_mytmpfs="${mount_path/\/bin/\/sbin}.mytmpfs"
    cat >"$mount_mytmpfs" <<EOF
#!/bin/bash
sleep ".\$RANDOM"
exec -- $mount_path -t tmpfs tmpfs "\$2"
EOF
    chmod +x "$mount_mytmpfs"

    mkdir -p /run/systemd/system
    cat >/run/systemd/system/tmp-hoge.mount <<EOF
[Mount]
What=mytmpfs
Where=/tmp/hoge
Type=mytmpfs
EOF

    # shellcheck disable=SC2064
    trap "rm -f /run/systemd/system/tmp-hoge.mount '$mount_mytmpfs'" RETURN

    for ((i = 0; i < 100; i++)); do
        echo "Iteration $i/100"
        systemctl --no-block start tmp-hoge.mount
        sleep ".$RANDOM"
        systemctl daemon-reexec
        sleep 1
        systemctl stop tmp-hoge.mount || :
    done
}

: >/failed

systemd-analyze log-level debug
systemd-analyze log-target journal
systemctl daemon-reload

# test for reexecuting with background mount job
test_issue_23796

systemd-analyze log-level info

touch /testok
rm /failed
and with it I can reproduce it on v251 and v250*. On v249 it also fails after
Anyway, I'm dropping the "regression" tag again, as the issue has been there for quite a while and I can't even find out if it worked at any point in time. [*] Commits needed to successfully build given tag: |
Not sure, but maybe related to #20330?? |
I'm not really sure, since this seems to happen only in nspawn (but maybe I was lucky?). Let's see how #20330 pans out. |
The issue sounds scary to me, but it has only been reported in a specific test environment. Hopefully it rarely affects real systems. Moved to v253 milestone. |
It definitely sounds scary, but since I can reproduce it even with systemd v249, I don't think it's urgent, especially since no one else has reported it so far. |
After many hours of digging around, I managed to isolate the issue to the
#!/usr/bin/env bash
# SPDX-License-Identifier: LGPL-2.1-or-later
set -eux

systemctl log-level info
stat /run/systemd/generator.early/testsuite.target.wants/testsuite-60.service
for ((i = 0; i < 500; i++)); do
    systemctl restart --no-block tmp.mount
    systemctl daemon-reexec
    stat /run/systemd/generator.early/testsuite.target.wants/testsuite-60.service
done

touch /testok
the test fails after a couple of tries with:
even though the debug generator succeeded. Unfortunately, I can't seem to get any debug output from the generators when running with nspawn (as this happens only with nspawn, in QEMU everything works as expected). |
Another interesting thing - with |
Oh god, now I see it. Turns out I can reproduce it by calling the generator manually:
#!/usr/bin/env bash
# SPDX-License-Identifier: LGPL-2.1-or-later
set -eux

at_exit() {
    set +e
    ls -lR test/
}

trap at_exit EXIT

mkdir test
for ((i = 0; i < 500; i++)); do
    rm -fr test/*
    systemctl restart --no-block tmp.mount
    systemctl daemon-reexec
    /usr/lib/systemd/system-generators/systemd-debug-generator test/ test/ test/
    ls -lR test/
    stat test/testsuite.target.wants/testsuite-60.service
    #stat /run/systemd/generator.early/testsuite.target.wants/testsuite-60.service
done

touch /testok
diff --git a/src/basic/proc-cmdline.c b/src/basic/proc-cmdline.c
index eea70d8606..ffae49732c 100644
--- a/src/basic/proc-cmdline.c
+++ b/src/basic/proc-cmdline.c
@@ -137,6 +137,8 @@ int proc_cmdline_parse(proc_cmdline_parse_t parse_item, void *data, ProcCmdlineF
if (r < 0)
return r;
+ log_warning("%s: line=%s", __func__, line);
+
return proc_cmdline_parse_given(line, parse_item, data, flags);
}
diff --git a/src/debug-generator/debug-generator.c b/src/debug-generator/debug-generator.c
index 1fe2b56810..d11f097581 100644
--- a/src/debug-generator/debug-generator.c
+++ b/src/debug-generator/debug-generator.c
@@ -60,6 +60,8 @@ static int parse_proc_cmdline_item(const char *key, const char *value, void *dat
if (r < 0)
return log_oom();
+ log_warning("%s: Got systemd.wants=%s", __func__, n);
+
} else if (proc_cmdline_key_streq(key, "systemd.debug_shell")) {
const char *t = NULL;
@@ -76,12 +78,15 @@ static int parse_proc_cmdline_item(const char *key, const char *value, void *dat
if (proc_cmdline_value_missing(key, value))
return 0;
+ log_warning("%s: Got systemd.unit=%s", __func__, value);
return free_and_strdup_warn(&arg_default_unit, value);
} else if (!value) {
const char *target;
target = runlevel_to_target(key);
+
+ log_warning("%s: !value, runlevel_to_target(%s) = %s", __func__, key, target);
if (target)
return free_and_strdup_warn(&arg_default_unit, target);
}
@@ -115,6 +120,8 @@ static int generate_wants_symlinks(void) {
_cleanup_free_ char *p = NULL, *f = NULL;
const char *target;
+ log_warning("%s: arg_default_unit=%s, in_initrd=%s", __func__, arg_default_unit, yes_no(in_initrd()));
+
/* This should match what do_queue_default_job() in core/main.c does. */
if (arg_default_unit)
target = arg_default_unit;
@@ -131,6 +138,8 @@ static int generate_wants_symlinks(void) {
if (!f)
return log_oom();
+ log_warning("%s: Creating symlink %s for unit %s with target %s", __func__, f, *u, target);
+
(void) mkdir_parents_label(p, 0755);
if (symlink(f, p) < 0)
As you can see from the output above, we always attempt to parse the FD number from |
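For anyone who wants to check this on their own container, here is a minimal sketch for spotting bare numeric words in a NUL-separated argument list such as /proc/1/cmdline. It assumes, as this thread suggests, that generators in a container end up looking at PID 1's own arguments. The function name `numeric_words` and the sample argument list are made up for illustration, they are not taken from systemd.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of systemd: print every word of a
# NUL-separated cmdline file that consists only of digits.
# Defaults to PID 1's cmdline; pass another file to test it offline.
numeric_words() {
    local word
    while IFS= read -r -d '' word; do
        if [[ "$word" =~ ^[0-9]+$ ]]; then
            printf '%s\n' "$word"
        fi
    done <"${1:-/proc/1/cmdline}"
    return 0
}

# Demonstration with an assumed, made-up argument list written to a temp file.
sample="$(mktemp)"
printf '%s\0' /usr/lib/systemd/systemd --system --deserialize 16 >"$sample"
numeric_words "$sample"
rm -f "$sample"
```

Running `numeric_words` without an argument right after `systemctl daemon-reexec` inside the container should show whether a bare FD number is present in PID 1's arguments.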
Otherwise, PID1 arguments e.g. "--deserialize 16" may be parsed unexpectedly by generators. Fixes the issue reported at systemd#24452 (comment).
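To illustrate why this only bites in some cases, here is a rough sketch of the selection logic, not the actual generator code. The mapping table is an assumption modelled on the legacy runlevel shortcuts (1 for rescue, 2 to 4 for multi-user, 5 for graphical), and the function names `runlevel_to_target_sketch` and `default_unit_from_words` are hypothetical. The idea is that a word without a value is tried as a runlevel, and the last word that yields a unit wins.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not systemd's code.

# Map a valueless word to a target unit. The table is an assumption based on
# the legacy runlevel shortcuts; unknown words return non-zero.
runlevel_to_target_sketch() {
    case "$1" in
        emergency|-b)           echo "emergency.target" ;;
        rescue|single|-s|s|S|1) echo "rescue.target" ;;
        2|3|4)                  echo "multi-user.target" ;;
        5)                      echo "graphical.target" ;;
        *)                      return 1 ;;
    esac
}

# Walk all words in order; the last word that yields a unit wins.
default_unit_from_words() {
    local word target unit=""
    for word in "$@"; do
        case "$word" in
            systemd.unit=*) unit="${word#systemd.unit=}" ;;
            *=*) ;; # other key=value pairs are ignored in this sketch
            *)
                if target="$(runlevel_to_target_sketch "$word")"; then
                    unit="$target"
                fi
                ;;
        esac
    done
    echo "${unit:-<unset>}"
}

# FD 16 is not a runlevel, so the requested unit survives.
default_unit_from_words systemd.unit=testsuite.target --deserialize 16
# FD 4 looks like runlevel 4, so the requested unit is overridden.
default_unit_from_words systemd.unit=testsuite.target --deserialize 4
```

Under these assumptions the first call prints testsuite.target and the second prints multi-user.target, which would be consistent with the failure showing up only when the deserialization FD happens to be a small number.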
systemd version the issue has been seen with
latest main
Used distribution
Arch Linux
Linux kernel version used
5.19.2-arch1-2
CPU architectures issue was seen on
No response
Component
No response
Expected behaviour you didn't see
No response
Unexpected behaviour you saw
In a couple of instances I noticed TEST-60 fail in a strange way when running in nspawn:
Journals:
Steps to reproduce the problem
No response
Additional program output to the terminal or log subsystem illustrating the issue
No response