[sparc64]: systemctl fails with 'Failed to parse bus message: Bad message' #2633
We probably forgot some endianness conversion somewhere... Since big-endian (BE) systems are kinda exotic these days, we are happy to support them, but we kinda rely on user patches and testing to make sure they keep working.
I just realized there is actually a difference between the machines affected by the issue and the machines where systemctl works fine: the latter use an SMP kernel, while the others don't. Maybe some kernel option is configured differently in the SMP kernel. I will do some more testing, but if you have some general pointers on where to poke, I'll be happy to look there.
Try gdb on it. The error message is generated by bus_log_parse_error(), which is invoked in a ton of places. Set a breakpoint on it when running systemctl like this, and see where it triggers. (You might want to recompile systemd with -g -O0 first; otherwise your gdb experience will be miserable.)
So, here's my first shot:
OK, so this is triggered by sd_bus_message_read() somewhere (the error is an immediate result of that call). Here's my recommendation for tracking this down further: recompile systemctl (and sd-bus), but add

    #undef EBADMSG
    #define EBADMSG (__LINE__ + 1000)

to the top of src/libsystemd/sd-bus/bus-message.c. Instead of returning the "Bad message" error code, a parsing failure will then return an error code equal to 1000 plus the line number where it was really triggered. If you then run the tool, the error code it prints should directly reveal the line number where the EBADMSG is first generated.

(Add those two lines immediately after all #include lines, so that the OS's definition of EBADMSG is overridden by this debug definition.)
So, this seems to work:
root@test-adrian1:
So, the problem seems to be that the object path string contains garbage at its end:
Up to "3810", it corresponds to the UUID of swap:
But the rest of the string is garbage.
Where does the 125 you used come from? Is that the value of l? The string looks pretty OK and is NUL-terminated, but the trailing underscore surprises me. (And of course, if l is 125, then the string contains NUL bytes, and that's weird...)
Yes, that's the value of l. I'll continue debugging tonight.
Any update on this? Any chance that da8358c fixed this issue, too?
I just applied the single fix from da8358c to my old build tree and re-ran make, but that didn't help:
root@test-adrian1:
I'll clone systemd from git now and test again with the latest revision.
OK, I cloned systemd from git, built it and installed it. Unfortunately, the problem still persists.
OK, no idea then. I figure there's no way around some gdb sessions on your side... Try to figure out why the memory got corrupted there...
Oh, it looks like this problem has actually fixed itself, somewhat magically. On a machine where it didn't work in the past (an older sun4u machine), systemctl is now working as it should. I also tested on another previously affected machine and can't reproduce it there either. Nice!
Hi!
On a Debian/unstable sparc64 fresh installation with all packages up-to-date and systemd at 229, we're getting the following error when running systemctl:
    root@ravirin:~# systemctl
    Failed to parse bus message: Bad message
    root@ravirin:~# systemctl --version
    systemd 229
    +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD -IDN
    root@ravirin:~# uname -a
    Linux ravirin 4.3.0-1-sparc64 #1 Debian 4.3.3-5 (2016-01-04) sparc64 GNU/Linux
    root@ravirin:~#
Listing failed units works, on the other hand, while --all does not:
    root@ravirin:~# systemctl --all
    Failed to parse bus message: Bad message
    root@ravirin:~# systemctl --failed
      UNIT              LOAD   ACTIVE SUB    DESCRIPTION
    ● rng-tools.service loaded failed failed rng-tools.service

    LOAD   = Reflects whether the unit definition was properly loaded.
    ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
    SUB    = The low-level unit activation state, values depend on unit type.

    1 loaded units listed. Pass --all to see loaded but inactive units, too.
    To show all installed unit files use 'systemctl list-unit-files'.
    root@ravirin:~#
Surprisingly, this happens on only two of the three sparc64 systems I tested, all running the same kernel and systemd version. The machine where the issue does not arise is much newer and has much more memory (a SPARC-T5 with 192 GiB of RAM). The older machines are a Sun Blade 100 with just 1 GiB of RAM and an older UltraSPARC IIIi (Jalapeno) machine.
Cheers,
Adrian