
Booting to multi-user.target and doing systemctl isolate graphical.target does not work with 253 #26364

Closed
AdamWill opened this issue Feb 8, 2023 · 15 comments · Fixed by #26388
Labels
bug 🐛 (Programming errors, that need preferential fixing)
downstream/fedora (Tracking bugs for Fedora)
pid1
regression ⚠️ (A bug in something that used to work correctly and broke through some recent commit)
systemctl
Milestone

Comments

@AdamWill
Contributor

AdamWill commented Feb 8, 2023

systemd version the issue has been seen with

253

Used distribution

Fedora Rawhide

Linux kernel version used

6.2.0-0.rc7.20230207git05ecb680708a.51.fc38.x86_64

CPU architectures issue was seen on

x86_64

Component

systemctl, systemd

Expected behaviour you didn't see

On a Fedora Rawhide system with a graphical desktop installed that usually boots successfully to graphical.target, I instead boot to multi-user.target, and run systemctl isolate graphical.target. This should bring up the graphical desktop.

Unexpected behaviour you saw

Instead, the system simply becomes stuck at a blank screen. This started happening when systemd 253 landed in Rawhide; with 252 it was fine.

Steps to reproduce the problem

Install Fedora Rawhide (Workstation or KDE - find images at https://openqa.fedoraproject.org/nightlies.html), boot normally to verify it works, then boot to multi-user.target (by changing the default target or booting with the "3" kernel argument), log in as root, and run systemctl isolate graphical.target.
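For reference, the two boot paths mentioned (changing the default target vs. the "3" kernel argument) look roughly like this; a sketch of standard systemctl usage as root, not output captured from the affected machine:

```shell
# Option A: make text mode the default for subsequent boots
systemctl set-default multi-user.target   # repoints the default.target symlink
systemctl reboot

# Option B: one-off, append "3" (an alias for multi-user.target)
# to the kernel command line from the boot loader menu.

# After booting to multi-user.target and logging in as root:
systemctl isolate graphical.target   # hangs with systemd 253; fine with 252
```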

journal messages from a boot reproducing the issue

Additional program output to the terminal or log subsystem illustrating the issue

No response

@AdamWill AdamWill added the bug 🐛 Programming errors, that need preferential fixing label Feb 8, 2023
@AdamWill
Contributor Author

AdamWill commented Feb 8, 2023

Downstream report: https://bugzilla.redhat.com/show_bug.cgi?id=2165692

@yuwata yuwata added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Feb 8, 2023
@yuwata yuwata added this to the v253 milestone Feb 8, 2023
@dtardon dtardon added the downstream/fedora Tracking bugs for Fedora label Feb 9, 2023
@keszybz
Member

keszybz commented Feb 9, 2023

When isolate is executed, we start stopping all kinds of units, including dbus-broker.service, which essentially brings the machine down.

The first question is what changed. 017a7ba looks a bit suspicious. There aren't any other changes to src/core/ that seem related.

The first answer is that isolate is known-busted. I think we need to keep it limping along, but it'd be better to just not use it.

@AdamWill
Contributor Author

AdamWill commented Feb 9, 2023

I can't actually think of a way to not use it in this case.

The use case is testing update notifications. For Fedora, we have a release criterion that the desktop must not notify the user about updates when running live, but must notify the user about updates when installed. So we need to test that.

Unfortunately, the desktops - GNOME in particular - try to be very clever about when and what to notify about, so in order for the test to be reliable, we first need to prepare some stuff so we're absolutely sure that we're in a scenario where an update notification would be shown, unless we're on the live path and the "we're running live" stipulation should prevent it.

We obviously want to do that before we reach the desktop, otherwise our attempt to fiddle with things is racing with the desktop actually checking for updates and notification timers kicking in and so on.

So the test in question boots to multi-user and does a bunch of prep - setting up a repo and downgrading a package to a dummy version to ensure an update is definitely available, then fiddling with various settings to game all the desktop's heuristics to make sure it ought to notify of the update. On GNOME we even have to set the system clock in some circumstances, because GNOME has a rule that it doesn't show update notifications between midnight and 6am (boy, that was fun to track down).

Once we're done with all of that, we do systemctl isolate graphical.target to get the graphical environment to actually start so we can run the test.
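Roughly, the prep-then-isolate flow described above might look like the following; the package name is hypothetical, and the real openQA test does considerably more (repo setup, per-desktop settings tweaks):

```shell
# Guarantee that an update is available by downgrading something
dnf -y downgrade some-package        # hypothetical package name

# Dodge GNOME's midnight-to-6am notification blackout when necessary
timedatectl set-time "2023-02-09 12:00:00"

# ...per-desktop tweaks to ensure the update check will notify...

# Finally bring up the desktop so the actual test can run
systemctl isolate graphical.target
```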

On the 'installed system' path we could just reboot at that point instead, sure. But on the "live boot" path, we obviously can't - you can't reboot a live system. I can't think of any other practical way to do all this for a live scenario. Can you?

@YHNdnzj
Member

YHNdnzj commented Feb 9, 2023 via email

keszybz added a commit to keszybz/systemd that referenced this issue Feb 9, 2023
This reverts commit 5d71e46.

It turns out that this commit caused a noticeable change in behaviour for
'systemctl isolate graphical.target' in Fedora, as found by git bisect.
Reverting on top of current git also restores behaviour from v252. I don't have
time to analyze this right now, so this is a quick revert to unblock Fedora
and possibly allow us to release v253 in case a full solution is harder.

Fixes systemd#26364.
@keszybz
Member

keszybz commented Feb 9, 2023

The problem is that isolate is a bad idea. The systemd unit model is a mix of dependency-based logic and event-based logic, including hardware changes and user actions like logins. The isolation logic works for a static dependency-based system, but is very hard to reconcile with units started in response to state changes outside of systemd.

We have been papering this over by adding IgnoreOnIsolate on this and that, but this is not a solution. In particular, it would require that we only use isolate for one specific purpose. As soon as you have units that should be stopped in one target that might be isolated but not in some other one, this approach breaks down.

This can be compared with the starting of units: units are grouped into targets, and arbitrary combinations can be started and stopped via Conflicts depending on what is needed.
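For context, the IgnoreOnIsolate papering-over mentioned here takes the form of a unit-file setting. A minimal sketch, using a placeholder unit name and writing to a scratch directory purely for illustration (a real drop-in would live under /etc/systemd/system/example.service.d/ and require root):

```shell
# Create a drop-in that exempts a unit from isolate's stop sweep.
# "example.service" is a placeholder; "demo" is scratch space.
mkdir -p demo/example.service.d
cat > demo/example.service.d/ignore-on-isolate.conf <<'EOF'
[Unit]
# Keep this unit running when another unit is isolated
IgnoreOnIsolate=yes
EOF

cat demo/example.service.d/ignore-on-isolate.conf
```

After editing a real drop-in, a systemctl daemon-reload is needed before the setting takes effect.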

> I can't think of any other practical way to do all this for a live scenario.

Why not just do systemctl start graphical.target?

@AdamWill
Contributor Author

AdamWill commented Feb 9, 2023

well, if that's supported/intended I can certainly do that. I just recall from past experience/documentation that isolate was supposed to be The Right Way to change between targets. If using start is supposed to be OK, though, I can certainly give that a shot.

@AdamWill
Contributor Author

AdamWill commented Feb 9, 2023

OK, using start seems to work, so I changed the test to do that.

@dtardon
Collaborator

dtardon commented Feb 10, 2023

> well, if that's supported/intended I can certainly do that. I just recall from past experience/documentation that isolate was supposed to be The Right Way to change between targets.

Only if one wants to mimic the "runlevels" behavior, i.e., only units needed by the new target should continue to run. If one wants to run something in addition to the current set, then systemctl start ... is the right way. There's no difference in this case anyway, as graphical.target requires multi-user.target. Therefore, a boot into graphical.target and a boot into multi-user.target followed by systemctl start graphical.target should have the same effect. The second one just splits the operation into two steps, that's all.
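The dependency described here can be inspected on any systemd system; exact output varies by distribution, so no specific output is shown:

```shell
# graphical.target is expected to list multi-user.target in Requires=
systemctl show -p Requires graphical.target

# Show the full tree that "systemctl start graphical.target" would pull in
systemctl list-dependencies graphical.target
```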

@poettering
Member

poettering commented Feb 10, 2023

We need debug logs for this, i.e. run systemd-analyze log-level debug before you trigger the issue. Otherwise there's nothing we can do.
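Collecting the requested logs is straightforward with standard tooling (the output file name is arbitrary):

```shell
# Switch the service manager to debug logging for this boot
systemd-analyze log-level debug

# Reproduce the hang:
#   systemctl isolate graphical.target

# Afterwards (e.g. from another VT or over SSH), save the journal
journalctl -b --no-pager > isolate-debug.log
```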

poettering added a commit to poettering/systemd that referenced this issue Feb 10, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)
poettering added a commit to poettering/systemd that referenced this issue Feb 10, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)
@YHNdnzj
Member

YHNdnzj commented Feb 10, 2023

I reproduced this with debug log enabled in a Fedora VM: https://fars.ee/y5Ps

However, I was not able to reproduce this on Arch - dbus.service is not stopped there (tried both freedesktop/dbus and dbus-broker).

@bluca
Member

bluca commented Feb 10, 2023

> I reproduced this with debug log enabled in a Fedora VM: https://fars.ee/y5Ps
>
> However, I was not able to reproduce this on Arch - dbus.service is not stopped there (tried both freedesktop/dbus and dbus-broker).

@YHNdnzj are you able to reproduce it after applying the fix from #26388 ?

@keszybz
Member

keszybz commented Feb 10, 2023

>> well, if that's supported/intended I can certainly do that. I just recall from past experience/documentation that isolate was supposed to be The Right Way to change between targets.
>
> Only if one wants to mimic the "runlevels" behavior, i.e., only units needed by the new target should continue to run.

The analogy with runlevels in sysvinit is not exact. When a runlevel change was triggered, sysvinit would start a bunch of scripts (S* and K*), depending on the runlevel configuration. But stuff that was not covered by those scripts wouldn't generally be touched. There was no notion of "kill everything that doesn't have an S script for this runlevel", just because there was no notion of ownership of processes. So e.g. stuff that would have been launched in response to udev triggers would almost certainly survive a runlevel change. Similarly, stuff spawned from other services, e.g. user sessions or remote logins, would likewise not be touched. This is different from isolate, where things are either in the dependency tree of the new target, or explicitly excluded, or killed.

@YHNdnzj
Member

YHNdnzj commented Feb 10, 2023

>> I reproduced this with debug log enabled in a Fedora VM: https://fars.ee/y5Ps
>> However, I was not able to reproduce this on Arch - dbus.service is not stopped there (tried both freedesktop/dbus and dbus-broker).
>
> @YHNdnzj are you able to reproduce it after applying the fix from #26388 ?

I'm not really familiar with Fedora's build system 🤔

But TBH it feels weird that this doesn't trigger on Arch

@bluca
Member

bluca commented Feb 10, 2023

>>> I reproduced this with debug log enabled in a Fedora VM: https://fars.ee/y5Ps
>>> However, I was not able to reproduce this on Arch - dbus.service is not stopped there (tried both freedesktop/dbus and dbus-broker).
>>
>> @YHNdnzj are you able to reproduce it after applying the fix from #26388 ?
>
> I'm not really familiar with Fedora's build system 🤔
>
> But TBH it feels weird that this doesn't trigger on Arch

@YHNdnzj I have not tried it, but there are instructions to install the packages built by the CI from that PR, so it shouldn't be necessary to build them by hand if you want to give it a shot: https://dashboard.packit.dev/results/copr-builds/614038

@keszybz
Member

keszybz commented Feb 10, 2023

No need, I'm checking Lennart's patch now.

poettering added a commit to poettering/systemd that referenced this issue Feb 10, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)

Fixes: systemd#23055
poettering added a commit to poettering/systemd that referenced this issue Feb 10, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)

Fixes: systemd#23055
bluca pushed a commit that referenced this issue Feb 10, 2023
…gered by units we keep running

Inspired by: #26364

(this might even "fix" #26364, but without debug logs it's hard to make
such claims)

Fixes: #23055
d-hatayama pushed a commit to d-hatayama/systemd that referenced this issue Feb 15, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)

Fixes: systemd#23055
keszybz pushed a commit to keszybz/systemd that referenced this issue Mar 30, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)

Fixes: systemd#23055
(cherry picked from commit 32d6707)
valentindavid pushed a commit to valentindavid/systemd that referenced this issue Aug 8, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)

Fixes: systemd#23055
(cherry picked from commit 32d6707)
(cherry picked from commit c973e22)
Werkov pushed a commit to Werkov/systemd that referenced this issue Nov 1, 2023
…gered by units we keep running

Inspired by: systemd#26364

(this might even "fix" systemd#26364, but without debug logs it's hard to make
such claims)

Fixes: systemd#23055
(cherry picked from commit 32d6707)
(cherry picked from commit c973e22)
(cherry picked from commit bfe6d1d)
(cherry picked from commit 54b580e)