Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PID1 enters infinite loop when trying to stop socket with incoming traffic #26216

Open
keszybz opened this issue Jan 26, 2023 · 10 comments
Open
Labels
bug 🐛 Programming errors, that need preferential fixing pid1
Milestone

Comments

@keszybz
Copy link
Member

keszybz commented Jan 26, 2023

systemd version the issue has been seen with

253-rc1

Used distribution

Fedora rawhide

Linux kernel version used

No response

CPU architectures issue was seen on

None

Component

systemd

Expected behaviour you didn't see

#25687 removed static enablement of systemd-journald-audit.socket. It is now managed by normal enable/disable symlinks and the presets logic. When building the package in Fedora, I forgot to add a scriptlet that'd call systemctl preset and reeanable the socket. That is fixed now, and this bug is about what happens when the socket is (intentionally or not) disabled.

From https://bugzilla.redhat.com/attachment.cgi?id=1940497:

$ less -R fail.log|rg 'audit.socket|switch-root'|head -n 90
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/systemd-journald-audit.socket
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/plymouth-switch-root.service
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/initrd-switch-root.target
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/initrd-switch-root.service
systemd-journald-audit.socket: Installed new job systemd-journald-audit.socket/start as 14
systemd-journald-audit.socket: ConditionCapability=CAP_AUDIT_READ succeeded.
systemd-journald-audit.socket: ConditionSecurity=audit succeeded.
systemd-journald-audit.socket: Changed dead -> listening
systemd-journald-audit.socket: Job 14 systemd-journald-audit.socket/start finished, result=done
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Changed listening -> running
initrd-cleanup.service: About to execute systemctl --no-block isolate initrd-switch-root.target
initrd-cleanup.service: Executing: systemctl --no-block isolate initrd-switch-root.target
initrd-switch-root.target: Trying to enqueue job initrd-switch-root.target/start/isolate
initrd-switch-root.target: Installed new job initrd-switch-root.target/start as 65
plymouth-switch-root.service: Installed new job plymouth-switch-root.service/start as 91
initrd-switch-root.service: Installed new job initrd-switch-root.service/start as 79
initrd-switch-root.target: Enqueued job initrd-switch-root.target/start as 65
initrd-switch-root.target: starting held back, waiting for: initrd-cleanup.service
initrd-switch-root.service: starting held back, waiting for: plymouth-switch-root.service
plymouth-switch-root.service: ConditionPathExists=/etc/initrd-release succeeded.
plymouth-switch-root.service: Will spawn child (service_enter_start): /usr/bin/plymouth
plymouth-switch-root.service: Passing 0 fds to service
plymouth-switch-root.service: About to execute /usr/bin/plymouth update-root-fs --new-root-dir=/sysroot
plymouth-switch-root.service: Forked /usr/bin/plymouth as 391
plymouth-switch-root.service: Changed dead -> start
plymouth-switch-root.service: Executing: /usr/bin/plymouth update-root-fs --new-root-dir=/sysroot
initrd-switch-root.target: starting held back, waiting for: initrd-cleanup.service
initrd-switch-root.target: starting held back, waiting for: initrd-udevadm-cleanup-db.service
plymouth-switch-root.service: Child 391 belongs to plymouth-switch-root.service.
plymouth-switch-root.service: Main process exited, code=exited, status=0/SUCCESS (success)
plymouth-switch-root.service: Changed start -> exited
plymouth-switch-root.service: Job 91 plymouth-switch-root.service/start finished, result=done
initrd-switch-root.service: starting held back, waiting for: initrd-switch-root.target
initrd-switch-root.target: starting held back, waiting for: initrd-udevadm-cleanup-db.service
initrd-switch-root.target: AssertPathExists=/etc/initrd-release succeeded.
initrd-switch-root.target changed dead -> active
initrd-switch-root.target: Job 65 initrd-switch-root.target/start finished, result=done
[  OK  ] Reached target initrd-switch-root.target - Switch Root.
initrd-switch-root.service: AssertPathExists=/etc/initrd-release succeeded.
initrd-switch-root.service: Will spawn child (service_enter_start): systemctl
initrd-switch-root.service: Passing 0 fds to service
initrd-switch-root.service: About to execute systemctl --no-block switch-root /sysroot
initrd-switch-root.service: Forked systemctl as 394
initrd-switch-root.service: Changed dead -> start
         Starting initrd-switch-root.service - Switch Root...
initrd-switch-root.service: Executing: systemctl --no-block switch-root /sysroot
-.mount: not serializing before switch-root
tmp.mount: not serializing before switch-root
sys-kernel-config.mount: not serializing before switch-root
sysroot.mount: not serializing before switch-root
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/plymouth-switch-root-initramfs.service
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/plymouth-switch-root.service
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/systemd-journald-audit.socket
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/initrd-switch-root.service
unit_file_build_name_map: normal unit file: /usr/lib/systemd/system/initrd-switch-root.target
initrd-switch-root.target changed dead -> active
plymouth-switch-root.service: Changed dead -> exited
systemd-journald-audit.socket: Changed dead -> running
initrd-switch-root.service: Changed dead -> start
plymouth-switch-root.service: Installed new job plymouth-switch-root.service/stop as 268
initrd-switch-root.service: Installed new job initrd-switch-root.service/stop as 270
initrd-switch-root.target: Installed new job initrd-switch-root.target/stop as 133
systemd-journald-audit.socket: Installed new job systemd-journald-audit.socket/stop as 269
systemd-journald-audit.socket: Changed running -> listening
initrd-switch-root.service: Child 394 belongs to initrd-switch-root.service.
initrd-switch-root.service: Main process exited, code=exited, status=0/SUCCESS (success)
initrd-switch-root.service: Deactivated successfully.
initrd-switch-root.service: Service restart not allowed.
initrd-switch-root.service: Changed start -> dead
initrd-switch-root.service: bpf-lsm: Failed to delete cgroup entry from LSM BPF map: No such file or directory
initrd-switch-root.service: Job 270 initrd-switch-root.service/stop finished, result=done
[  OK  ] Stopped initrd-switch-root.service - Switch Root.
initrd-switch-root.service: Consumed 20ms CPU time.
initrd-switch-root.service: Control group is empty.
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Suppressing connection request since unit stop is scheduled.
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Suppressing connection request since unit stop is scheduled.
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Suppressing connection request since unit stop is scheduled.
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Suppressing connection request since unit stop is scheduled.
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Suppressing connection request since unit stop is scheduled.
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Suppressing connection request since unit stop is scheduled.
systemd-journald-audit.socket: Incoming traffic
systemd-journald-audit.socket: Suppressing connection request since unit stop is scheduled.
systemd-journald-audit.socket: Incoming traffic
...

Unexpected behaviour you saw

No response

Steps to reproduce the problem

No response

Additional program output to the terminal or log subsystem illustrating the issue

No response

@keszybz keszybz added the bug 🐛 Programming errors, that need preferential fixing label Jan 26, 2023
@github-actions github-actions bot added the pid1 label Jan 26, 2023
@bluca bluca added this to the v253 milestone Jan 26, 2023
@fbuihuu
Copy link
Contributor

fbuihuu commented Feb 1, 2023

I think the problem in your case is that systemd-journald-audit.socket was left enabled in initrd but disabled in the host. Not really sure if this was intended (maybe the initrd was not rebuilt after re-installing the package with journal audit socket change ?) but that had the side effect to let the socket running during the switch root transition and to stop it once PID1 is reexecuted in the host.

Indeed once PID1 was reexecuted in the host, it noticed that the socket was no more needed (since you forgot to call systemctl preset in your package scriptlet) and attempted to stop it, which seems the correct thing to do. However I'm not sure why systemd is spamming the logs like it did when it realized that the socket was still receiving data while being stopped. Maybe we should log such event only once.

@keszybz
Copy link
Member Author

keszybz commented Feb 1, 2023

Well, yes. But it should be OK to have a socket enabled in the initrd and disabled in the host. This certainly should not end with an infinite loop and the socket never being closed.

@fbuihuu
Copy link
Contributor

fbuihuu commented Feb 1, 2023

Sure, I was trying to describe what was happening in your case. But I didn't mean that the infinite loop was something expected.

I quickly tried to reproduce the problem but couldn't. The results are a bit different depending on whether the debug logs are enabled or not but I couldn't trigger the infinite loop in both case.

Unfortunately I won't be able to look at this in details before next week.

@fbuihuu
Copy link
Contributor

fbuihuu commented Feb 7, 2023

So we have a logic to disable a listening socket when it has a stop job scheduled to prevent incoming data from triggering unnecessary events. This is done by flushing all incoming connections and draining all data accumulated in the socket buffers.

However the "flushing all incoming connections" logic doesn't apply to netlink sockets, see: https://github.com/systemd/systemd/blob/v253-rc2/src/basic/socket-util.c#L1117

And due to the fact that the socket buffer is also drained, there's always room in the socket buffer to accept more incoming data, which is probably the reason why you're seeing the infinite loop.

At that point, I'm not really sure whether it's worth improving the logic because it only flushes the accumulated incoming connections and data at a given time but it doesn't prevent new ones from triggering new events. So in theory an application that keeps opening the listening socket could trigger the same infinite loop, I think.

Maybe instead we should give more priority to the event source dealing with the run queue than the socket io events has and gives PID1 a chance to proceed with the stop job.

@bluca bluca modified the milestones: v253, v254 Feb 7, 2023
@keszybz
Copy link
Member Author

keszybz commented Feb 7, 2023

We discussed this during the video meeting. @poettering's idea: set ratelimit on the socket source.

@fbuihuu
Copy link
Contributor

fbuihuu commented Feb 7, 2023

yes I guess that would work too.

@bluca
Copy link
Member

bluca commented Jun 27, 2023

@fbuihuu any update on this?

@fbuihuu
Copy link
Contributor

fbuihuu commented Jun 28, 2023

Not really sorry. And I have no free time to spend on this one currently.

@bluca
Copy link
Member

bluca commented Jun 28, 2023

Ok, will move to the next milestone then

@bluca
Copy link
Member

bluca commented Jun 28, 2023

#25687 removed static enablement of systemd-journald-audit.socket. It is now managed by normal enable/disable symlinks and the presets logic.

This also missed that journald uses Sockets=audit.socket so the audit socket is always implicitly enabled anyway

@bluca bluca modified the milestones: v254, v255 Jun 28, 2023
@bluca bluca modified the milestones: v255, v256 Nov 28, 2023
@bluca bluca modified the milestones: v256, v257 May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Programming errors, that need preferential fixing pid1
Development

No branches or pull requests

3 participants