Document that Type=idle does not actually wait for all running jobs #4116

Closed
1 of 2 tasks
martinpitt opened this issue Sep 9, 2016 · 5 comments

Comments

@martinpitt
Contributor

martinpitt commented Sep 9, 2016

Submission type

  • Bug report
  • Request for enhancement (RFE)

systemd version the issue has been seen with

231

Used distribution

Debian, Ubuntu

In some cases jobs are still running after the default target is reached. Commands from Type=idle services are then started as soon as multi-user.target is reached, not after all running jobs have finished. This is contrary to what the manpage says: "execution of the service binary is delayed until all jobs are dispatched".

Context: cloud-init is using a Type=idle unit to run package installations and arbitrary customization commands, which does not work well if the system is still booting up (new services installed by packages don't get started, or run into dependency loops).

This can be reproduced in a VM or with systemd-nspawn -b --bind sys (as this needs udev). First, create a Type=idle unit that shows whether the system is already booted and which jobs are running:

# cat <<EOF > /etc/systemd/system/xxx.service
[Unit]
Description=XXX

[Service]
Type=idle
ExecStart=/bin/sh -c 'echo WASHERE; systemctl is-system-running; systemctl list-jobs'

[Install]
WantedBy=multi-user.target
EOF
systemctl enable xxx.service

Now cause a running job:

echo "/dev/bogus /mnt ext4 nofail 0 0" >> /etc/fstab

Reboot, and check:

$ sudo machinectl shell [...] /bin/systemctl status  xxx
● xxx.service - XXX
   Loaded: loaded (/etc/systemd/system/xxx.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Fri 2016-09-09 15:20:25 CEST; 1s ago
  Process: 239 ExecStart=/bin/sh -c echo WASHERE; systemctl is-system-running; systemctl list-jobs (code=exited, status=0/SUCCESS)
 Main PID: 239 (code=exited, status=0/SUCCESS)

Sep 09 15:20:19 donald systemd[1]: Started XXX.
Sep 09 15:20:25 donald sh[239]: WASHERE
Sep 09 15:20:25 donald sh[239]: starting
Sep 09 15:20:25 donald sh[239]: JOB UNIT             TYPE  STATE
Sep 09 15:20:25 donald sh[239]:  20 mnt.mount        start waiting
Sep 09 15:20:25 donald sh[239]:  21 dev-bogus.device start running
Sep 09 15:20:25 donald sh[239]: 2 jobs listed.

So we see that we have a running job dev-bogus.device and systemd (rightfully) thinks it's still in the "starting" phase, not "running". And yet the idle command already fired.

A full debug journal is at https://launchpadlibrarian.net/283485126/journal.txt with dev-sda3.device as the thing to wait for.

Downstream bug: https://launchpad.net/bugs/1621846
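
For scripts that need to react to the condition shown above, the state check from the reproducer can be wrapped in a small polling helper. This is only a sketch under stated assumptions: the function name wait_until_booted, the 120 second default and the one-second poll interval are made up for illustration, and it relies solely on systemctl is-system-running printing "starting" (or "initializing") while boot is still in progress.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd or cloud-init).
# Usage: wait_until_booted [timeout_seconds]
# Prints the final state and returns 0 once systemd no longer reports
# "initializing" or "starting"; returns 1 on timeout or if no state
# could be read at all.
wait_until_booted() {
    timeout=${1:-120}   # assumed default, adjust to the job at hand
    elapsed=0
    while :; do
        # is-system-running exits non-zero for every state except
        # "running", so only the printed word is inspected here.
        state=$(systemctl is-system-running 2>/dev/null) || true
        case "$state" in
            '') return 1 ;;                    # systemctl gave no answer
            initializing|starting) ;;          # boot still in progress
            *) echo "$state"; return 0 ;;      # running, degraded, ...
        esac
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

A caller could then run something like `wait_until_booted 300 && systemctl list-jobs` before doing work that must not overlap with boot.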

@martinpitt martinpitt added the bug and pid1 labels Sep 9, 2016
@michich
Contributor

michich commented Sep 9, 2016

Type=idle is a hack whose only purpose is cosmetic - to decrease the chance of printing stuff to the console when the login prompt is already displayed. It was never meant as a reliable ordering feature. It will not wait for more than 5 seconds before getting bored and starting the service anyway.
This should be documented better.

@martinpitt
Contributor Author

Thanks for pointing out the "5 seconds", I wasn't aware of that. So if that's a documentation bug, then let's treat it that way.

I'm not pinned on Type=idle, it just seemed to be the most appropriate tool here. That is, I want to start something after the initial boot transaction: anything which you would normally make a dependency of multi-user.target will be part of the boot transaction, and thus you cannot (synchronously) start new services from it (for example through package installs).

Another way to enqueue a job after the initial boot transaction might be to run systemctl start --no-block xxx.service from a unit within multi-user.target. Is that a better approach? This wouldn't actually directly fix our issue yet as is-system-running would still think it's starting, not running, but we could work around that in our packaging helpers.
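
The --no-block idea above could look roughly like the following. It is a sketch, not something taken from cloud-init: the unit name xxx-kick.service and the helper write_kick_unit are hypothetical, and the target directory is a parameter so the generated file can be inspected before it is placed in /etc/systemd/system. The kick unit itself is part of the boot transaction, but it only enqueues a separate start job for xxx.service and returns immediately.

```shell
#!/bin/sh
# Hypothetical helper: write a oneshot unit that enqueues xxx.service
# without waiting for it to finish. $1 is the directory to write into.
write_kick_unit() {
    dir=$1
    cat > "$dir/xxx-kick.service" <<'EOF'
[Unit]
Description=Enqueue xxx.service as a separate job

[Service]
Type=oneshot
ExecStart=/bin/systemctl start --no-block xxx.service

[Install]
WantedBy=multi-user.target
EOF
}
```

In this arrangement it would be xxx-kick.service that gets enabled, while xxx.service is only started through the enqueued job. As noted above, systemctl is-system-running may still report "starting" at the time xxx.service runs.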

@fsateler
Member

fsateler commented Sep 9, 2016

@martinpitt wouldn't it be sufficient to order the service After=multi-user.target ?
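
One way to try this suggestion without editing the reproducer unit is a drop-in. The sketch below is an illustration only: the helper name write_after_dropin and the file name after-multi-user.conf are hypothetical, and the unit directory is passed in as a parameter. Whether a unit that is both wanted by and ordered after multi-user.target behaves as intended should be verified on the systemd version in use.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that orders xxx.service after
# multi-user.target. $1 is the unit directory, e.g. /etc/systemd/system.
write_after_dropin() {
    dir=$1/xxx.service.d
    mkdir -p "$dir"
    cat > "$dir/after-multi-user.conf" <<'EOF'
[Unit]
After=multi-user.target
EOF
}
```

On a real system this would be followed by `systemctl daemon-reload` so that the drop-in is picked up.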

@martinpitt
Contributor Author

@fsateler: yes, I'm currently experimenting with that, and with changing invoke-rc.d to look at systemctl is-active default.target instead of systemctl is-system-running. This does seem to be a more robust way.
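
The default.target check mentioned here can be sketched as a small predicate. This is not the actual invoke-rc.d change: the function names default_target_reached and start_policy, and the words "start" and "defer", are hypothetical and only illustrate how a packaging helper might branch on the result.

```shell
#!/bin/sh
# Hypothetical predicate: succeeds once default.target is active,
# even if unrelated jobs are still queued.
default_target_reached() {
    systemctl is-active --quiet default.target
}

# Hypothetical policy: print "start" when a service may be started
# synchronously, "defer" while the boot transaction is still running.
start_policy() {
    if default_target_reached; then
        echo start
    else
        echo defer
    fi
}
```

Unlike is-system-running, this check does not depend on leftover jobs such as the dev-bogus.device job from the reproducer.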

@martinpitt martinpitt changed the title Type=idle does not actually wait for all running jobs Document that Type=idle does not actually wait for all running jobs Sep 9, 2016
@martinpitt martinpitt added the documentation and pid1 labels and removed the bug and pid1 labels Sep 9, 2016
@poettering poettering added this to the v232 milestone Oct 10, 2016
@poettering
Member

docfix waiting in #4348


4 participants