Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd-journald breaks forwarding to syslog format with extra space, affecting rsyslogd parser #24540

Closed
mk-fg opened this issue Sep 2, 2022 · 6 comments
Labels
journal needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer regression ⚠️ A bug in something that used to work correctly and broke through some recent commit

Comments

@mk-fg
Copy link
Contributor

mk-fg commented Sep 2, 2022

systemd version the issue has been seen with

251.4

Used distribution

Arch Linux

Linux kernel version used

5.15.64

CPU architectures issue was seen on

x86_64

Component

systemd-journald

Expected behaviour you didn't see

Message forwarded to syslog via /run/systemd/journal/syslog socket would look like this:
... msg_iov=[{iov_base="<30>Sep 2 14:56:09 dbus-daemon[1766]: [system] Activating via systemd: ...

Unexpected behaviour you saw

Forwarded msg has an extra whitespace character after timestamp, breaking RFC format:

old: ... msg_iov=[{iov_base="<30>Sep  2 14:56:09 dbus-daemon[1766]: [system] Activating via systemd: ...
new: ... msg_iov=[{iov_base="<30>Sep  2 14:54:15  dbus-daemon[1661]: [system] Activating via systemd: ...

("old" is expected msg from above and systemd-251.3, "new" is one seen with systemd-251.4)

This breaks rsyslog, as reported in this issue there - rsyslog/rsyslog#4979
It was also suggested there that this is a bug in systemd, and this extra whitespace character should not be there.

Steps to reproduce the problem

Relatively easy to do with rsyslog:

  • Build and install latest rsyslogd on a modern Arch Linux OS with systemd 251.4-1.
  • Enable ForwardToSyslog=yes in /etc/systemd/journald.conf, restart systemd-journald.
  • Configure rsyslog.conf to have following input module: input(type="imuxsock" Socket="/run/systemd/journal/syslog")
  • Start rsyslogd, its own startup logging should be sufficient to see broken procid and programname values.

For easier testing with rsyslogd, here is a simple configuration file for it:

global(AbortOnUncleanConfig="on" shutdown.enable.ctlc="on")

module(load="imuxsock")
input(type="imuxsock" Socket="/run/systemd/journal/syslog")

template(name="json" type="list" option.jsonf="on") {
  property(name="timereported" format="jsonf")
  property(name="pri-text" format="jsonf")
  property(name="programname" format="jsonf")
  property(name="procid" format="jsonf")
  property(name="msg" format="jsonf")
}

module(load="omstdout")
action(type="omstdout" template="json")

Running rsyslogd -nf rsyslog.conf | jq -C . with that should immediately reproduce the issue, with "programname": "", and "procid": "-", fields notably bogus in its own startup logging.

To check what gets passed over /run/systemd/journal/syslog, strace tool can be used as follows:
strace -e recvmsg -s8000 -fp $(pgrep -x rsyslogd)

It can also be used with any other reader for that socket, as presumably it'd get same exact output, exhibiting same problem.

Additional program output to the terminal or log subsystem illustrating the issue

Two strace recvmsg() lines from systems without and with the issue (before/after updates):

Correct message from systemd before 251.4 (strace'd from a different system):
[pid 1807] recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="<30>Sep 2 14:56:09 dbus-daemon[1766]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.3593' (uid=0 pid=2626928 comm=\"timedatectl\")", iov_len=8096}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1662112569, tv_usec=796094}}, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=1766, uid=81, gid=81}}], msg_controllen=64, msg_flags=0}, MSG_DONTWAIT) = 220

Incorrect msg from systemd 251.4:
[pid 71761] recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="<30>Sep 2 14:54:15 dbus-daemon[1661]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.126' (uid=0 pid=71823 comm=\"timedatectl\")", iov_len=8096}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1662112455, tv_usec=892229}}, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=1661, uid=81, gid=81}}], msg_controllen=64, msg_flags=0}, MSG_DONTWAIT) = 218

As far as I can tell, issue appeared after update from systemd 251.3-1 to 251.4-1 on an Arch Linux system, where bunch of other libs/components were updated as well, but notably not rsyslogd.

It was also brought up in rsyslog/rsyslog#4979 (as also mentioned above), and it was suggested there that this seem to be a regression in systemd syslog forwarding, as its format no longer conforms to a relevant RFC standard.

I couldn't find any related commits/changes or issues/PRs about this here on a short notice.

Thanks!

@poettering
Copy link
Member

where would one find the patches arch added between 251.3-1 and 251.4-1?

@mk-fg
Copy link
Contributor Author

mk-fg commented Sep 2, 2022

I think in Arch Linux "base" repository, where one can get everything related to current systemd build using:
svn checkout svn://svn.archlinux.org/packages/systemd/repos/core-x86_64 systemd-arch

Looks like it's built from https://github.com/systemd/systemd-stable git tree (wonder if maybe I'm submitting an issue against wrong repo), with only relatively simple 0001-Use-Arch-Linux-device-access-groups.patch applied (should be checked out into "systemd-arch" dir with command above) that changes group names and some udev rules.

@yuwata
Copy link
Member

yuwata commented Sep 2, 2022

I cannot find anything relevant between v251.3 and v251.4:

$ git log --oneline v251.3..v251.4 -- src/fundamental src/basic src/libsystemd src/shared src/journal

695eb67 gcrypt: switch to system rng before gcry_check_version (#24162)
ca0ed3a gcrypt: prefer the OS RNG
e7d635f virt: detect KubeVirt instance
739d713 home: drop conflicted headers
8fe0c12 glibc: Remove #include <linux/fs.h> to resolve fsconfig_command/mount_attr conflict with glibc 2.36
9ac0ad8 unit-file: avoid (null) in debugging logs
64be8d8 base-filesystem: pick more conservative access mode for /root/
9255fa3 stat-util: replace is_dir() + is_dir_fd() by single is_dir_full() call
e56bfc8 sd-bus: do not pass NULL when received message with invalid type
7f80102 sd-device: send udev database version
925cff4 sd-device: make device_get_properties_{nulstr,strv}() take NULL for result value
0ecda6f sd-device: change type of properties nulstr from uint8_t* to char*

@yuwata
Copy link
Member

yuwata commented Sep 2, 2022

Is this really fixed when you downgrade to v251.3, with the same kernel and other libraries?

@yuwata yuwata added needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer regression ⚠️ A bug in something that used to work correctly and broke through some recent commit and removed bug 🐛 Programming errors, that need preferential fixing labels Sep 2, 2022
@mk-fg
Copy link
Contributor Author

mk-fg commented Sep 2, 2022

Yeah, currently looking at specific message in journalctl, I can see that that space is a part of SYSLOG_TIMESTAMP:

{
        "SYSLOG_FACILITY" : "3",
        "_TRANSPORT" : "syslog",
        "MESSAGE" : "[system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.1>
...
        "SYSLOG_TIMESTAMP" : "Sep  2 20:34:06 ",
        "_CMDLINE" : "/usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only"
}

Which I think journald itself parses from how libc syslog() formats it (?).

And that's how it gets sent to /dev/log (by dbus-daemon in this case):

sendto(12, "<30>Sep  2 21:40:20  dbus-daemon[1661]: [system] Activating via systemd: ...", 218, MSG_NOSIGNAL, NULL, 0) = 218

So it does look like a problem in libc or whatever formats these timestamps for syslog calls in apps, which systemd simply passes through as-is, and issue only popping-up in rsyslogd which tries to parse them more strictly.

@mk-fg mk-fg closed this as completed Sep 2, 2022
@mk-fg
Copy link
Contributor Author

mk-fg commented Sep 2, 2022

Sorry for the noise.
I'll try to update this msg if I'll find the underlying issue, but at this point it doesn't look to me like this space has much to do with systemd or rsyslog.

Update:

Confirmed an issue to pop-up in a VM with update to glibc-2.36-3 and gone after downgrading back to glibc-2.35-6.
There's been a rewrite of misc/syslog.c there, so maybe that's where regression was introduced.

Related Arch Linux and glibc issues:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
journal needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer regression ⚠️ A bug in something that used to work correctly and broke through some recent commit
Development

No branches or pull requests

3 participants