pid1: add a new method of rebooting: userspace only under the name "soft-reboot" #27435

Open: wants to merge 7 commits into main


@poettering (Member) commented Apr 27, 2023

This adds a new mechanism for rebooting, a form of "userspace reboot"
hereby dubbed "soft-reboot". It will stop all services as in a usual
shutdown, possibly transition into a new root fs and then issue a fresh
initial transaction. The kernel is not replaced.

File descriptors can be passed over, thus opening the door for leaving
certain resources around between such reboots.

Use case: this is an extremely quick way to reset userspace fully when
updating image-based systems, without going through a full
hardware/firmware/boot loader/kernel reset. It minimizes "grayout time"
for OS updates, in particular when combined with kernel live patching.
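To make the three steps concrete, here is a toy C model of the sequence described above. Everything in it (the `Service` and `Manager` types, `soft_reboot()`, the `stored_fds` counter) is hypothetical and only mirrors the description; it is not systemd's manager code, and a real initial transaction starts whatever the default target pulls in rather than simply restarting the same set of services.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model with hypothetical names; not systemd's data structures. */
typedef struct {
        const char *name;
        bool running;
        unsigned starts;      /* how often the service has been started */
        int stored_fds;       /* fds this service handed to the manager's fd store */
} Service;

typedef struct {
        Service *services;
        size_t n_services;
        const char *root;     /* current root file system */
        unsigned soft_reboots;
} Manager;

static void start_all(Manager *m) {
        for (size_t i = 0; i < m->n_services; i++) {
                m->services[i].running = true;
                m->services[i].starts++;
        }
}

/* new_root may be NULL or "" to stay on the current root. */
void soft_reboot(Manager *m, const char *new_root) {
        assert(m != NULL);

        /* 1. Stop all services, as in a regular shutdown. The fd store
         *    (stored_fds) is deliberately left untouched. */
        for (size_t i = 0; i < m->n_services; i++)
                m->services[i].running = false;

        /* 2. Possibly transition into a new root fs. The kernel keeps
         *    running; only the root changes. */
        if (new_root && *new_root)
                m->root = new_root;

        /* 3. Issue a fresh initial transaction (in this toy: start everything again). */
        start_all(m);
        m->soft_reboots++;
}
```

Calling `soft_reboot(&m, NULL)` cycles all services but keeps the current root, which corresponds to the "possibly" in the description.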

@poettering (Member, Author)

This is not ready for merging yet: it lacks all docs and tests.

But it is pretty comprehensive otherwise. It can do a supercharged reboot in an nspawn container and in qemu in no time.

@bluca (Member) commented Apr 27, 2023

This is great, and should open the door to easier testing as well here, as we can now do reboots without actually terminating qemu/nspawn and thus add special logic to the test harness.

Bikeshedding: I'd really call this just "userspace reboot"; "renew" is really confusing if you don't already know what it does.

@poettering (Member, Author)

"systemctl userspace-reboot" is impossible to type

@bluca (Member) commented Apr 27, 2023

That's why god invented bash completion!

@bluca (Member) commented Apr 27, 2023

As a shorthand, we could also have a "systemctl uexec" alias, which would pair well with kexec.

@srd424 (Contributor) commented Apr 27, 2023

> "systemctl userspace-reboot" is impossible to type

How about "reinit"?

@orowith2os

From a third-party (end user) point of view, "reinit" sounds too much like a reboot to me.

I prefer userspace-reboot; it feels like it represents this feature best, and it's easy to remember. I can't think of any alternative names, though...

@dtardon (Collaborator) commented Apr 28, 2023

Maybe system-reexec (analogy to daemon-reexec)? Or just reexec?

Btw, launchd apparently supports this already; it's invoked as launchctl reboot userspace.

@bluca (Member) commented Apr 28, 2023

The main goal is to quickly transition to a new userspace snapshot, not just to re-execute the current one.

@mrc0mmand (Member) commented Apr 28, 2023

You'll need to add

```diff
diff --git a/test/units/testsuite-21.sh b/test/units/testsuite-21.sh
index 36f647ca5f..a0df607377 100755
--- a/test/units/testsuite-21.sh
+++ b/test/units/testsuite-21.sh
@@ -28,6 +28,7 @@ systemctl log-level info
 # FIXME: systemd-run doesn't play well with daemon-reexec
 # See: https://github.com/systemd/systemd/issues/27204
 sed -i '/\[org.freedesktop.systemd1\]/aorg.freedesktop.systemd1.Manager:Reexecute FIXME' /etc/dfuzzer.conf
+sed -i '/\[org.freedesktop.systemd1\]/aorg.freedesktop.systemd1.Manager:Renew destructive' /etc/dfuzzer.conf
 
 # TODO
 #   * check for possibly newly introduced buses?
```

so dfuzzer doesn't keep triggering "renew" when fuzzing. Once this is merged (and the name of the method is settled) I'll update dfuzzer and drop the sed.

@poettering (Member, Author)

> Btw, launchd apparently supports this already; it's invoked as launchctl reboot userspace.

Btw, does anyone have any idea where to find the current sources for launchd/launchctl? Did Apple make it closed source? I can only find a version from 7 years ago...

@poettering (Member, Author)

> so dfuzzer doesn't keep triggering "renew" when fuzzing. Once this is merged (and the name of the method is settled) I'll update dfuzzer and drop the sed.

Apparently people don't like the name "renew", so we are going to change this before merging anyway, I guess.

@Winterhuman (Contributor) commented Apr 28, 2023

@poettering Out of curiosity, how would switching to a different root work in this case? In other words, how does a non-destructive switch_root work?

EDIT: It works because it essentially uses pivot_root rather than switch_root; see #27450.
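A minimal sketch of a non-destructive root transition via pivot_root(2), following the `pivot_root(".", ".")` idiom documented in that man page. Assumptions: the caller has CAP_SYS_ADMIN, `new_root` is already a mount point, and neither it nor its parent mount uses shared propagation. `switch_into_new_root()` is a hypothetical name; this is not systemd's own switch-root code, which also moves API file systems such as /proc, /sys and /dev into the new root. Unlike the classic initramfs switch_root, nothing here deletes files; the old root is only detached.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no pivot_root() wrapper, so invoke the system call directly. */
static int do_pivot_root(const char *new_root, const char *put_old) {
        return (int) syscall(SYS_pivot_root, new_root, put_old);
}

/* Make new_root the root directory of the calling mount namespace.
 * Returns 0 on success, -1 with errno set on failure. */
int switch_into_new_root(const char *new_root) {
        assert(new_root != NULL);

        if (chdir(new_root) < 0)
                return -1;

        /* With ".", "." the old root ends up stacked on top of the new one... */
        if (do_pivot_root(".", ".") < 0)
                return -1;

        /* ...and can be lazily detached, leaving the new root visible at "/". */
        if (umount2(".", MNT_DETACH) < 0)
                return -1;

        return chdir("/");
}
```

If `new_root` is not a mount point, pivot_root(2) fails with EINVAL and the function returns -1 before anything is unmounted.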

@topimiettinen (Contributor)

More bikeshedding suggestions:

  • Also analogously to systemctl daemon-reexec: systemctl userspace-reexec or less Yoda: systemctl reexec-userspace?
  • analogously to systemctl restart xyz.service: systemctl userspace-restart / systemctl restart-userspace
  • systemctl userspace-reboot / systemctl reboot-userspace-only / systemctl reboot-userspace

Also the existing commands could have new, complementing aliases for symmetry:

  • systemctl reboot = systemctl system-reboot / systemctl reboot-system / systemctl system-restart / systemctl restart-system
  • systemctl kexec = systemctl kernel-reboot / systemctl reboot-kernel / systemctl kernel-restart / systemctl restart-kernel
  • systemctl poweroff = systemctl system-poweroff / systemctl poweroff-system (maybe not, what's the userspace equivalent to poweroff?)

@jevinskie

> > Btw, launchd apparently supports this already; it's invoked as launchctl reboot userspace.
>
> Btw, does anyone have any idea where to find the current sources for launchd/launchctl? Did Apple make it closed source? I can only find a version from 7 years ago...

Sadly they stopped releasing the source years ago.

@poettering (Member, Author)

> > Note that the Linux kernel currently pins the original root fs, no matter what
>
> Do you have any resources on the status of changing this upstream?

I know that at least one of the Linux kernel VFS maintainers is aware of the issue. I wouldn't hold my breath though.

> Would an image-based system like ostree help here?

ostree is a userspace concept. It cannot avoid this kernel limitation. This needs to be fixed in the kernel.

poettering added a commit to poettering/systemd that referenced this pull request May 3, 2023
@erkinalp commented May 3, 2023

This should have been called systemctl --user reboot.

@smac89 commented May 3, 2023

> This should have been called systemctl --user reboot.

--user commands usually do not require superuser permissions, but I have a feeling this feature will require root permissions.

@bluca added the needs-rebase label and removed the please-review label May 3, 2023
@bluca (Member) commented May 3, 2023

needs a rebase

@github-actions added the please-review label and removed the needs-rebase label May 3, 2023
@poettering (Member, Author)

> This should have been called systemctl --user reboot.

Generally, if you specify --user on the systemctl command line, you will talk to the per-user instance of the service manager (as opposed to the per-system service manager, i.e. PID 1). Hence systemctl --user reboot as user lennart would mean "Hey, service manager of user lennart, please reboot yourself!". That doesn't really make sense, since "reboot" is not a concept defined for users, but only for the system as a whole.

@erkinalp commented May 3, 2023

It would indeed make sense, by forcibly restarting everything run by the user in question.

@bluca (Member) left a comment


I wonder what to do about pcrphase and such: I think we do not want to re-run TPM measurements? To achieve this, and probably to help with other actions that might be needed to run only on the "real" boot, what about adding a new ConditionIsSoftReboot= or ConditionBootType= or so, that allows matching and skipping? To implement it, it should be doable to pass a variable through via serialization in the manager at shutdown, so that the next iteration knows it has been through a soft reboot.

```c
if (!isempty(root)) {
        if (!path_is_valid(root))
                return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS,
                                         "New root directory must be a valid path.");
```
Member

include root in the log message?
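One way to act on that suggestion is to pass `root` as a printf-style argument, which `sd_bus_error_setf()` accepts after the error name. The self-contained sketch below only mimics the check: `path_is_valid_sketch()` is a hypothetical approximation of systemd's internal `path_is_valid()` (assumed here to mean non-empty, shorter than PATH_MAX, no component longer than NAME_MAX), and `check_new_root()` writes the message into a buffer instead of an `sd_bus_error`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif
#ifndef NAME_MAX
#define NAME_MAX 255
#endif

/* Hypothetical approximation of systemd's path_is_valid(): non-empty, shorter
 * than PATH_MAX, and no single path component longer than NAME_MAX. */
static bool path_is_valid_sketch(const char *p) {
        size_t component = 0;

        if (!p || !*p || strlen(p) >= PATH_MAX)
                return false;

        for (const char *c = p; *c; c++) {
                if (*c == '/')
                        component = 0;
                else if (++component > NAME_MAX)
                        return false;
        }
        return true;
}

/* NULL or "" means "no new root requested", as with isempty() in the excerpt.
 * On failure, the error message names the offending root. */
int check_new_root(const char *root, char *err, size_t errlen) {
        assert(err != NULL && errlen > 0);

        if (root && *root && !path_is_valid_sketch(root)) {
                snprintf(err, errlen, "New root directory '%s' must be a valid path.", root);
                return -1;
        }
        return 0;
}
```

In the D-Bus handler itself the same format string and `root` argument would go straight into the `sd_bus_error_setf()` call.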

```xml
(such as the file system they are backed by), thus increasing memory usage (as two versions of the
OS/application/file system might be kept in memory). Leaving processes running during a soft-reboot
operation requires disconnecting the service comprehensively from the rest of the OS, i.e. minimizing IPC
and reducing sharing of resources with the rest of the OS.</para>
```
Member

maybe plug in portable services and nspawn as solutions that work well with this, and link their docs?

```xml
<para>Note that because
<citerefentry><refentrytitle>systemd-shutdown</refentrytitle><manvolnum>8</manvolnum></citerefentry> is
not executed the executables in <filename>/usr/lib/systemd/system-shutdown/</filename> are not executed
```
Member

"not executed, the executables in"

```sh
echo "wuffwuff" > "$T"
systemd-notify --fd=3 --pid=parent 3<"$T"
rm "$T"
```

Member

would be good to beef up this test, to add a normal service that is shut down as expected (and verify that it is), and one that survives (and verify that it does)
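The test snippet under review hands a file to the service manager's fd store with `systemd-notify --fd=3 --pid=parent`. At the protocol level this amounts to a datagram on `$NOTIFY_SOCKET` carrying `FDSTORE=1`, with the descriptor attached as `SCM_RIGHTS` ancillary data. The sketch below shows only that part, under stated assumptions: `send_fdstore()` and `notify_fdstore()` are hypothetical names, the optional `FDNAME=` field is added, `--pid=parent` (sending on behalf of another process) is not implemented, and the receiving unit typically needs `FileDescriptorStoreMax=` set to a nonzero value for the fd to be kept.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Send "FDSTORE=1\nFDNAME=<fdname>" plus one descriptor as SCM_RIGHTS
 * ancillary data. addr may be NULL if sock is already connected. */
int send_fdstore(int sock, const struct sockaddr *addr, socklen_t addrlen,
                 int fd, const char *fdname) {
        char payload[256];
        union {
                struct cmsghdr align;   /* guarantees correct alignment */
                char buf[CMSG_SPACE(sizeof(int))];
        } control;
        struct iovec iov;
        struct msghdr msg;
        struct cmsghdr *cmsg;
        int n;

        assert(fdname != NULL);
        n = snprintf(payload, sizeof payload, "FDSTORE=1\nFDNAME=%s", fdname);
        if (n < 0 || (size_t) n >= sizeof payload)
                return -1;

        memset(&control, 0, sizeof control);
        memset(&msg, 0, sizeof msg);
        iov.iov_base = payload;
        iov.iov_len = (size_t) n;
        msg.msg_name = (void *) addr;
        msg.msg_namelen = addr ? addrlen : 0;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof control.buf;

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, MSG_NOSIGNAL) < 0 ? -1 : 0;
}

/* Resolve $NOTIFY_SOCKET (a leading '@' means the abstract namespace) and
 * send the descriptor there. Returns 0 without sending if it is unset. */
int notify_fdstore(int fd, const char *fdname) {
        const char *path = getenv("NOTIFY_SOCKET");
        struct sockaddr_un sa;
        size_t len;
        int sock, r;

        if (!path || !*path)
                return 0;

        memset(&sa, 0, sizeof sa);
        sa.sun_family = AF_UNIX;
        len = strlen(path);
        if (len >= sizeof sa.sun_path)
                return -1;
        memcpy(sa.sun_path, path, len);
        if (sa.sun_path[0] == '@')
                sa.sun_path[0] = '\0';

        sock = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (sock < 0)
                return -1;
        r = send_fdstore(sock, (const struct sockaddr *) &sa,
                         (socklen_t) (offsetof(struct sockaddr_un, sun_path) + len),
                         fd, fdname);
        close(sock);
        return r;
}
```

For a test along the lines suggested above, `send_fdstore()` also works on one end of a `socketpair(AF_UNIX, SOCK_DGRAM, 0, ...)`, so the payload can be checked without a running service manager.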

@bluca added the reviewed/needs-rework 🔨 label and removed the please-review label May 3, 2023
@orowith2os

> It would indeed make sense, by forcibly restarting everything run by the user in question.

You can log out and log back in. That should get you what you want. Userspace reboots should probably be under the normal reboot commands.

@poettering (Member, Author)

> I wonder what to do about pcrphase and such: I think we do not want to re-run TPM measurements?

So that's a major discussion to be had, but I think we should do that separately. That said, I nowadays think the only safe and secure approach is to really run at least the boot phase measurements again, so that the n-th boot can be securely distinguished from the (n-1)-th and (n+1)-th boot. But we need to start maintaining a proper log of measurements sooner rather than later, so that people can verify this. Soft reboots must appear as a new series of boot phases in the measurement logs, I think.

Note that the keys for FDE and such are already unlocked on a soft reboot, and we don't need to unlock them again, hence the boot phase stuff should work out quite OK still: you'd bind FDE unlocking to the initrd boot phase of the first boot, and then the FDE will work in perpetuity even though we can never access the key in the TPM anymore.
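Why re-running the phase measurements keeps boots distinguishable can be written out with the TPM 2.0 extend operation. As a simplification, assume one PCR (systemd-pcrphase uses PCR 11) receives the same phase words $m_1, \dots, m_K$ in every boot, $P^{(n)}_k$ is its value after the $k$-th measurement of the $n$-th boot, and $H$ is the PCR bank's hash. Because a soft reboot resets neither the TPM nor its PCRs, boot $n+1$ continues from where boot $n$ stopped:

```math
P^{(n)}_{k} = H\left(P^{(n)}_{k-1} \,\|\, H(m_k)\right), \quad 1 \le k \le K, \qquad P^{(n+1)}_{0} = P^{(n)}_{K}
```

Unless the hash chain cycles back to its start or two different inputs collide (both negligible for a cryptographic hash), $P^{(n+1)}_k \neq P^{(n)}_k$ for every $k$, even though identical words are measured. A policy bound to the first boot's initrd phase value is therefore satisfiable only during that boot, which is why the FDE key has to be unlocked then and kept.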

A different story is the measurement of machine-id and the mount UUIDs into PCR 15. We probably should not repeat that, and we don't really have to if it stays the same and remains mounted.

> To achieve this, and probably to help with other actions that might be needed to run only on the "real" boot, what about adding a new ConditionIsSoftReboot= or ConditionBootType= or so, that allows matching and skipping? To implement it, it should be doable to pass a variable through via serialization in the manager at shutdown, so that the next iteration knows it has been through a soft reboot.

We can certainly consider that. For the fs UUID measurement, an alternative would be to simply leave the service up until the very end, so that after the soft reboot it is still up thanks to serialization, and then we won't rerun it.
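The serialization idea from the quoted suggestion could look roughly like this minimal sketch: the outgoing manager writes a counter into its serialization stream, and the next iteration reads it back, so a condition like the proposed ConditionIsSoftReboot= could test for a nonzero value. The key name `soft-reboots-count` and both functions are hypothetical, this is not systemd's serialization format or code, and a `FILE *` stands in for the serialization file that is passed across the transition.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key; not necessarily what systemd serializes. */
#define SOFT_REBOOTS_KEY "soft-reboots-count"

/* Old manager, at soft-reboot time: record one more soft reboot than it saw itself. */
int serialize_soft_reboots(FILE *f, unsigned seen_so_far) {
        assert(f != NULL);
        return fprintf(f, "%s=%u\n", SOFT_REBOOTS_KEY, seen_so_far + 1) < 0 ? -1 : 0;
}

/* New manager: scan "key=value" lines. A missing key means a regular boot (0). */
unsigned deserialize_soft_reboots(FILE *f) {
        char line[256];
        size_t klen = strlen(SOFT_REBOOTS_KEY);

        assert(f != NULL);
        while (fgets(line, sizeof line, f))
                if (strncmp(line, SOFT_REBOOTS_KEY, klen) == 0 && line[klen] == '=')
                        return (unsigned) strtoul(line + klen + 1, NULL, 10);
        return 0;
}
```

Unrelated keys in the stream are skipped, so the counter can sit alongside whatever else the manager serializes.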

@bugaevc (Contributor) commented May 4, 2023

> Btw, does anyone have any idea where to find the current sources for launchd/launchctl? Did Apple make it closed source? I can only find a version from 7 years ago...

As far as we (the Darling project) know, launchd has been integrated into the XPC project, largely rewritten, and its sources are nowhere to be found just like the rest of XPC.

We ship launchd-842.92.1 (apparently from OS X 10.9.4, 2014) and our own reimplementation of XPC.

@jluebbe (Contributor) commented May 8, 2023

Is there a defined interface for the new userspace to find out how it was booted ("normally" or via soft-reboot)? Similarly, where it was booted from, analogous to /proc/cmdline (besides inspecting mountinfo in proc)?

Or would the old userspace prepare that information together with /run/nextroot somewhere readable by the new userspace?

I've not seen it in the code, but perhaps I've missed something.
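If the outgoing userspace prepares the next root at /run/nextroot, as the question assumes, an update tool might first check that something is actually mounted there before requesting a soft reboot. A minimal heuristic sketch: `is_mount_point()` is a hypothetical helper that compares the device of a directory with that of its parent. It misses bind mounts within the same file system; a more robust check could use the `STATX_ATTR_MOUNT_ROOT` attribute reported by statx(2).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path looks like a mount point, 0 if not, -1 on error.
 * Heuristic: a different st_dev than the parent directory marks a mount
 * boundary; "/" is its own parent and counts as a mount point. */
int is_mount_point(const char *path) {
        struct stat self, parent;
        char parent_path[4096];
        int n;

        assert(path != NULL);
        if (stat(path, &self) < 0)
                return -1;
        if (!S_ISDIR(self.st_mode))
                return 0;

        n = snprintf(parent_path, sizeof parent_path, "%s/..", path);
        if (n < 0 || (size_t) n >= sizeof parent_path)
                return -1;
        if (stat(parent_path, &parent) < 0)
                return -1;

        if (self.st_dev != parent.st_dev)
                return 1;
        return self.st_ino == parent.st_ino;   /* true only for the root directory */
}
```

A caller would treat `is_mount_point("/run/nextroot") == 1` as "a new root is staged" and anything else as "stay on the current root".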

@brotaxt commented May 8, 2023

> Usecase: this is an extremely quick way to reset userspace fully when
> updating image based systems, [...]

Could you explain what is meant by an image-based operating system?

@Winterhuman (Contributor)

@brotaxt For example, using two read-only SquashFS partitions (the disk "images"), one with the current root, and the other with the updated root to reboot into

```diff
@@ -129,6 +129,11 @@ Deprecations and removals:

Features:

* refuse using the switch-reboot operation without /etc/initrd-release. Now
```


It seems you meant switch-root, not switch-reboot
