plymouth does not remain stopped with v245, takes over tty1 #15091

Closed
mbiebl opened this issue Mar 11, 2020 · 22 comments
Labels
pid1 regression ⚠️ A bug in something that used to work correctly and broke through some recent commit

Comments

@mbiebl
Contributor

mbiebl commented Mar 11, 2020

systemd version the issue has been seen with

v245

Used distribution

Debian sid

After the upgrade from v244.3 to v245 I noticed that plymouth is no longer stopped during boot.
tty1 continues to show the plymouth boot splash even after the system has booted fully.

The following is strange:

# dpkg -L plymouth | grep quit
/lib/systemd/system/plymouth-quit-wait.service
/lib/systemd/system/plymouth-quit.service
/lib/systemd/system/multi-user.target.wants/plymouth-quit-wait.service
/lib/systemd/system/multi-user.target.wants/plymouth-quit.service

# systemctl status plymouth-quit.service 
● plymouth-quit.service - Terminate Plymouth Boot Screen
     Loaded: loaded (/lib/systemd/system/plymouth-quit.service; static; vendor preset: enabled)
     Active: inactive (dead)

# systemctl status plymouth-start.service 
● plymouth-start.service - Show Plymouth Boot Screen
     Loaded: loaded (/lib/systemd/system/plymouth-start.service; static; vendor preset: enabled)
     Active: active (running) since Thu 2020-03-12 00:06:30 CET; 7min ago
    Process: 778 ExecStart=/usr/sbin/plymouthd --mode=boot --pid-file=/var/run/plymouth/pid --attach-to-session (code=exited, status=0/SUCCESS)
    Process: 781 ExecStartPost=/usr/bin/plymouth show-splash (code=exited, status=0/SUCCESS)
   Main PID: 780 (plymouthd)
      Tasks: 1 (limit: 19006)
     Memory: 41.3M
     CGroup: /system.slice/plymouth-start.service
             └─780 @usr/sbin/plymouthd --mode=boot --pid-file=/var/run/plymouth/pid --attach-to-session

Mär 12 00:06:30 pluto systemd[1]: Starting Show Plymouth Boot Screen...
Mär 12 00:06:30 pluto systemd[1]: Started Show Plymouth Boot Screen.

@mbiebl mbiebl added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Mar 11, 2020
@mbiebl
Contributor Author

mbiebl commented Mar 11, 2020

(tested with lightdm and sddm)

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

Seems to be caused by commit 097537f

@bigon

bigon commented Mar 12, 2020

I think it's somehow worse: updating from 244 to 245 in Debian just now, without even rebooting, plymouth was restarted at once.

@yuwata yuwata added the pid1 label Mar 12, 2020
@keszybz
Member

keszybz commented Mar 12, 2020

Yes, this is well known, and plymouth needs an update to add RemainAfterExit=yes as discussed in https://bugzilla.redhat.com/show_bug.cgi?id=1807771. Distributions should not push systemd-245 without taking care of that.

@keszybz keszybz closed this as completed Mar 12, 2020
@mbiebl mbiebl reopened this Mar 12, 2020
@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

I don't think it's ok to close this.

@keszybz
Member

keszybz commented Mar 12, 2020

Well, what exactly do you expect us to do here? Plymouth maintainers are in contact and a fix has been agreed.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

This is a significant change in behaviour. The communication regarding this change is non-existent.
Apparently it also affects other services besides plymouth.
We shouldn't just frivolously break users' systems and hope downstreams mop up for us.
We should be more careful not to break systems.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

What should we do: revert, find affected software, fix them, make releases for them, then reapply the change in systemd.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

That this breakage was known makes this even worse, tbh.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

If this issue was known: why was this not communicated much more prominently, e.g. in the release notes with a big fat warning?

@keszybz
Member

keszybz commented Mar 12, 2020

It's an issue that affects two projects: systemd and plymouth. One was fixed, the other is being fixed. This should be handled like any other bug that is detected to affect a release in a distribution: either delay introduction of the release into the distribution, or patch one of the two affected packages locally. Doing a revert in systemd upstream would be fully pointless at this point: by the time v246 is released it will be way too late to make any difference.

If this issue was known: why was this not communicated much more prominently, e.g. in the release notes with a big fat warning?

The scope of the issue wasn't fully understood.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

A revert in v245-stable and in master seems reasonable to me.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

It's an issue that affects two projects:

Are you absolutely sure about the scope of that? @AdamWill seems to disagree here.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

If this issue was known: why was this not communicated much more prominently, e.g. in the release notes with a big fat warning?

The scope of the issue wasn't fully understood.

At the same time you are blaming distros for not being careful enough:

Yes, this is well known, and plymouth needs an update to add RemainAfterExit=yes as discussed in https://bugzilla.redhat.com/show_bug.cgi?id=1807771. Distributions should not push systemd-245 without taking care of that.

I spent quite a few hours debugging this and I'm quite cranky atm.

@keszybz
Member

keszybz commented Mar 12, 2020

I'll add a note to NEWS.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

@keszybz what's the underlying issue here: Why does systemd try to start plymouth-start.service after it has been stopped? Isn't this a potential issue for any oneshot service which does not have RemainAfterExit=yes?

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

I.e., what makes you sure that only plymouth is affected?

@keszybz
Member

keszybz commented Mar 12, 2020

Please see the linked PR and the Fedora bug. If it is still unclear, let me know.

Isn't this a potential issue for any oneshot service which does not have RemainAfterExit=yes?

It's only an issue for those services which should not run again. systemd-network-generator.service was also affected, but it is idempotent so is not affected by the issue (except for wasting some work).

what makes you sure that only plymouth is affected?

It is certainly possible that other units are affected. But the change seems to matter most for services which are part of the initial transaction, and fail catastrophically when run again at the same time. Hopefully they aren't that common.
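
To illustrate the distinction drawn above, here is a minimal Python sketch of a service whose process exits after doing its work. It is a toy model, not systemd code: the `ToyService` class, the `simulate` helper and the unit names are all hypothetical, and the only behaviour it assumes is that a start job is a no-op for a unit that is still active, and that RemainAfterExit=yes keeps the unit active after its process has exited.

```python
from dataclasses import dataclass


@dataclass
class ToyService:
    """Toy stand-in for a service whose process exits; not systemd's data model."""
    name: str
    remain_after_exit: bool = False
    active: bool = False
    runs: int = 0

    def start_job(self) -> None:
        """Model a start job: a no-op if the unit is still active."""
        if self.active:
            return
        self.runs += 1  # the ExecStart= command would run here
        # Once the process exits, the unit stays active only with RemainAfterExit=yes.
        self.active = self.remain_after_exit


def simulate(unit: ToyService, transactions: int) -> int:
    """Pull the unit into several successive transactions; return how often it ran."""
    for _ in range(transactions):
        unit.start_job()
    return unit.runs


plain = ToyService("splash.service")                             # hypothetical unit
sticky = ToyService("splash-remain.service", remain_after_exit=True)
print(simulate(plain, 3))   # 3: runs again every time it is pulled in
print(simulate(sticky, 3))  # 1: later start jobs are no-ops
```

In this model an idempotent service merely wastes work on the extra runs, whereas a service that must not run twice (such as a boot splash) misbehaves.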

@akiernan

The Yocto test suite fails because of this in a couple of places. One is basically the same as the plymouth issue: the psplash service doesn't have RemainAfterExit.

The other is different though. systemd-timesyncd is stopped during a test for manual date changes using systemctl stop systemd-timesyncd. Post this change, systemd-timesyncd gets restarted by SSHing back into the box, which feels surprising.

@mbiebl
Contributor Author

mbiebl commented Mar 12, 2020

@akiernan good point. This new behaviour doesn't make sense to me.

@mrc0mmand
Member

I think the issue from @akiernan should be properly addressed as well, reopening.

@mbiebl mbiebl changed the title from "plymouth is not stopped with v245" to "plymouth does not remain stopped with v245, takes over tty1" Mar 27, 2020
eworm-de pushed a commit to eworm-de/systemd that referenced this issue Jun 23, 2020
@keszybz
Member

keszybz commented Jan 4, 2022

The other is different though. systemd-timesyncd is stopped during a test for manual date changes using systemctl stop systemd-timesyncd. Post this change, systemd-timesyncd gets restarted by SSHing back into the box, which feels surprising.

So this is how the systemd unit dependencies work and how they are supposed to work. Essentially, every time any unit is started, all the dependencies are recursively executed too. For example, if systemd-timesyncd.service is part of basic.target, and multi-user.target wants basic.target, any time anything which wants multi-user.target is started, a start job will be created for systemd-timesyncd.service.

This means that stopping a unit that deep in the dependency tree is not enough to ensure that it'll not be started. The unit would have to be masked to ensure it is not started, or it must not be a transitive dependency of any unit that can be started.
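
The recursion described here can be sketched with a small Python model. This is only an illustration under stated assumptions, not how the job engine is implemented: `ToyManager` and `example.service` are hypothetical, the graph is the "if ... is part of basic.target" example from the paragraph above rather than the real unit files, and the graph is assumed to be acyclic.

```python
from typing import Dict, List, Set

# Edges follow the hypothetical example in the comment above;
# "example.service" is a made-up unit that wants multi-user.target.
WANTS: Dict[str, List[str]] = {
    "example.service": ["multi-user.target"],
    "multi-user.target": ["basic.target"],
    "basic.target": ["systemd-timesyncd.service"],
    "systemd-timesyncd.service": [],
}


class ToyManager:
    """Toy job engine that only tracks which units are active or masked."""

    def __init__(self, wants: Dict[str, List[str]]) -> None:
        self.wants = wants
        self.active: Set[str] = set()
        self.masked: Set[str] = set()

    def start(self, unit: str) -> None:
        """Start `unit` and, recursively, everything it wants (acyclic graph assumed)."""
        if unit in self.masked:
            return  # a masked unit is never started
        for dep in self.wants.get(unit, []):
            self.start(dep)
        self.active.add(unit)

    def stop(self, unit: str) -> None:
        self.active.discard(unit)

    def mask(self, unit: str) -> None:
        # Masking only prevents future starts; it does not stop the unit here.
        self.masked.add(unit)


mgr = ToyManager(WANTS)
mgr.start("example.service")
mgr.stop("systemd-timesyncd.service")
mgr.start("example.service")  # the stopped unit is pulled back in
print("systemd-timesyncd.service" in mgr.active)  # True

mgr.stop("systemd-timesyncd.service")
mgr.mask("systemd-timesyncd.service")
mgr.start("example.service")
print("systemd-timesyncd.service" in mgr.active)  # False
```

In the model, a plain stop lasts only until the next start of anything that transitively wants the unit, while masking keeps it down.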

I think that this recursive behaviour is something that we might want to reconsider. In particular, it wastes a lot of work at runtime, because systemd will constantly scan the whole dependency tree and create noop jobs. Also, it creates various annoying situations where it's hard to do "maintenance work" while other units are active. I think we could make the dependency resolution non-recursive by default, and add a new mode where it becomes recursive upon request. But this would require a lot of thought and we'd probably uncover many cases where we depend on the recursive behaviour. The current state is that the reported behaviour is what is expected.

I'll close this bug here, because it was about plymouth and that should be long fixed now (and the other part is not a bug).

@keszybz keszybz closed this as completed Jan 4, 2022

6 participants