Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'telinit u' infinitely re-exec's if the fallback telinit the distro configured at compile time is a symlink pointing back to systemctl #31220

Closed
mbiebl opened this issue Feb 6, 2024 · 27 comments · Fixed by #31251

Comments

@mbiebl
Copy link
Contributor

mbiebl commented Feb 6, 2024

This bug report is forwarded from Daniel P. Berrangé berrange@redhat.com and has been filed originally downstream at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1063147

Running 'telinit u' within a podman container results in an infinite
loop as telinit repeatedly re-exec's itself.

This behaviour comes from systemctl.c which has this logic for handling
'telinit':

                if (sd_booted() > 0) {
                        arg_action = _ACTION_INVALID;
                        return telinit_parse_argv(argc, argv);
                } else {
                        /* Hmm, so some other init system is running, we need to forward this request to it.
                         */
                        arg_action = ACTION_TELINIT;
                        return 1;
                }

Inside a container 'sd_booted()' will (typically) indicate systemd is
NOT running, thus the 'else' clause will be followed. ACTION_TELINIT
instructs the caller to execve() the telinit binary belonging to any
non-systemd init impl, if any.

This binary is determined by the 'telinit-path' meson build option,
which defaults to /lib/sysvinit/telinit.

Debian used to override this to /usr/lib/sysvinit/telinit, but a few
months ago in Sid, this was changed to point to /usr/sbin/telinit,
which is a symlink back to /usr/bin/systemctl:

https://salsa.debian.org/systemd-team/systemd/-/commit/da95bc801088a6ab454851cf01addf97dd2c1ab3

IOW, Debian dpkg build has told systemd that the non-systemd telinit
binary is the systemd telinit binary. Hilarity now ensues as it ends
up exec'ing itself for all eternity :-)

You might ask why should anyone hit this scenario ? Well consider that
the 'libc6' package will run 'telinit u' in its postinst script, if it
sees it is in a non-systemd environment.

Not immediately a problem, since libc6 will be pre-installed in any
container or VM disk image and thus the 'postinst' script won't run
[not sure if 'postinst' is run on upgrades too ?].

Debian containers are an execellent environment for testing cross
compiles though, and thus people will install a foreign arch libc6
package in the container which does trigger the postinst script. The
following hangs (well loops forever in execve()):

  $ podman run -it debian:sid-slim
  # dpkg --add-architecture i386
  # apt-get update
  # apt-get install systemd-sysv
  # apt-get install libc6:i386

Simpler example

  $ podman run -it debian:sid-slim
  # dpkg --add-architecture i386
  # apt-get update
  # apt-get install systemd-sysv strace
  # strace -e trace=execve telinit u
  strace: Process 232065 attached
  execve("/usr/sbin/telinit", ["telinit", "u"], 0x7ffd9b26ba00 /* 24 vars */) = 0
  execve("/usr/sbin/telinit", ["telinit", "u"], 0x7ffdab55f1d0 /* 24 vars */) = 0
  execve("/usr/sbin/telinit", ["telinit", "u"], 0x7ffe79152400 /* 24 vars */) = 0
  ....1000000's more times....

Even then though, most people won't (knowingly) install the systemd-sysv
dpkg, so won't hit this problem. A few packages will pull in systemd-sysv
behind the scenes, so you can unwittingly hit the problem. For libvirt
CI, we install the open-iscsi and policykit-1 packages which both pull
in systemd-sysv, so this hangs:

  $ podman run -it debian:sid-slim
  # dpkg --add-architecture i386
  # apt-get update
  # apt-get install systemd-sysv
  # apt-get install open-iscsi:i386

Our foreign arch CI jobs that use Sid are thus suffering broken container
builds right now.

The simple solution appears to be to just remove the '-Dtelinit-path'
option from debian/rules, and leave it on systemd's built-in defaults.
The binary at this default path won't exist, and thus on a non-systemd
execution environment 'telinit u' will simply exit with an error:

  # telinit u
  Couldn't find an alternative telinit implementation to spawn.

which is a sensible behaviour and what has happened in containers with
Debian until recent Sid. Other distros (eg Fedora) leave the telinit
binary on systemd's default (non-existant) path too.

Possibly the upstream systemctl.c code should be made to protect itself
against such a mis-configuration by setting an env variable it can look
at to detect re-exec of itself.

Possibly libc6 package postinst script should skip running its
'telinit u' action if it detects it is inside a container, though that
could possibly break something if people do have a real in-systemd init
running ? Seems fairly low probability.

With regards,
Daniel

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 6, 2024

As some more background information:
/lib/init/telinit is a Debianism and was only available during the transition period from sysvinit to systemd. It has since been removed from the (still existing) sysvinit package and the sysvinit package nowadays only provides /usr/sbin/telinit

@poettering
Copy link
Member

So what is this bug about? What are you expecting us to do about this?

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 6, 2024

A straight forward fix could be to extend the check and see if /ùsr/sbin/telinit is a symlink to systemctl to determine if actually a different init system is used.

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 6, 2024

i.e. if sd_booted is false but telinit is a symlink to systemctl, do not rexec

@poettering
Copy link
Member

but why? if there's no need to chainload telinit, just turn it off?

@poettering poettering added systemctl needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer labels Feb 6, 2024
@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 6, 2024

The check with sd_booted is too simplistic to determine whether another init system is running.

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 6, 2024

but why? if there's no need to chainload telinit, just turn it off?

There might be. The still existing sysvinit-core package does provide /usr/sbin/telinit.

@bluca
Copy link
Member

bluca commented Feb 6, 2024

but why? if there's no need to chainload telinit, just turn it off?

Yes there's that, but at the same time:

Hilarity now ensues as it ends
up exec'ing itself for all eternity

This sounds broken to me, we should try to do some bare sanity check like chasing the path and checking that it's not ourself, so that we don't get in a loop like that. It's fine to say that some configuration value is wrong or broken, but there should be some minimal attempt at failing "cleanly" rather than getting stuck in a loop that requires a hard-reboot.

@bluca bluca removed the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Feb 6, 2024
@poettering
Copy link
Member

dunno, i am not convinced we need to protect against broken distro setups like this. i mean, there are a million ways to fuck things up, i am not sure it's really worth protecting against this one. It doesn't really happen IRL, it's a distro bug, that's all.

@poettering poettering changed the title 'telinit u' infinitely re-exec's itself inside containers 'telinit u' infinitely re-exec's if the fallback telinit the distro configured at compile time is a symlink pointing back to systemctl Feb 6, 2024
@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 6, 2024

What exactly is this code supposed to do if not signal telinit from another init? The code as-is is pretty pointless otherwise.
It should be fixed or removed.

@berrange
Copy link
Contributor

berrange commented Feb 6, 2024

What exactly is this code supposed to do if not signal telinit from another init? The code as-is is pretty pointless otherwise. It should be fixed or removed.

AFAICT, the upstream code would behaviour correctly if the distro did not mis-configure it at build time.

It would be valid to have a distro use alternatives to put a systemd and non-systemd telinit binary at the same user facing path, which is a symlink to the real path, thus allowing parallel install of systemd and non-systemd. If doing that though, the meson telinit-path option should point to the real binary location, not the alternatives symlink.

If the distro is not using alternatives and instead just creating 2 packages which provide the same clashing binary path, then only one of those 2 packages could be installed at any time. With that scenario, there's no need for the distro to set telinit-path because if the systemd impl is installed, it would prevent install of the non-systemd impl

In both cases the current systemctl code should be functionally correct already.

If there's anything systemd might change I could suggest

  • Alter the meson_options.txt description for telinit-path to make ti clear this MUST be pointing to a non-systemd telinit binary, as it is easy to mis-interpret
  • Allow telinit-path meson option to be the empty string and if so, disable building the fallback re-exec code entirely, rather than exec'ing a binary that probably won't exist.

@poettering
Copy link
Member

Well the idea is that you configure the telinit process path at build time to call out to correctly. If you point it to systemctl itself it's not going to work...

systemctl tries to be nice here, and chainloads some other telinit implementation if the system is not booted with systemd. But if you then configure things at build time to chainload systemctl itself, then what do you expect will happen except for an endless loop?

make up your mind: do you want systemctl to chainload something else if it detects that you haven#t booted systemd (i.e. run some other init system), or do you not want that? If you do want that, then pick the binary to call at systemd compile time. if you don't want that, then turn off the concept? But pointing systemctl to itself is broken, but that is your own choice there.

@poettering
Copy link
Member

If there's anything systemd might change I could suggest

* Alter the `meson_options.txt` description for `telinit-path` to make ti clear this _MUST_ be pointing to a non-systemd telinit binary, as it is easy to mis-interpret

* Allow `telinit-path` meson option to be the empty string and if so, disable building the fallback re-exec code entirely, rather than exec'ing a binary that probably won't exist.

Both make total sense to me, happy to review a patch for that.

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 6, 2024

Well the idea is that you configure the telinit process path at build time to call out to correctly. If you point it to systemctl itself it's not going to work...

Which distro ships another telinit implementation as /lib/sysvinit/telinit? Or another path besides /sbin/telinit?

If there is none, what is this "feature" supposed to provide besides being either totally useless or a way to shoot yourself in the foot?

@poettering
Copy link
Member

There used to be a bunch. For example Fedora, which is why I added this originally. If there's not interest anymore in playing nice with systems that swtch between two init systems we can certainly kill this for good.

I assumed Debian was still interested in this kind of stuff, but if noone cares anymore, we can certainly kill this and all the other sysvinit compat stuff, such as /dev/initctl support.

mrc0mmand added a commit to mrc0mmand/systemd that referenced this issue Feb 7, 2024
Let's do some basic sanity check and refuse to exec() the TELINIT binary
if it points back to us, to avoid an endless loop.

Resolves: systemd#31220
mrc0mmand added a commit to mrc0mmand/systemd that referenced this issue Feb 7, 2024
mrc0mmand added a commit to mrc0mmand/systemd that referenced this issue Feb 7, 2024
@poettering
Copy link
Member

@mbiebl are you OK if we kill the whole thing? The outcome of this would be this: on a system that has both systemd and sysvinit installed and the symlinks to the "reboot", "poweroff", "telinit", "halt" binaries pointing to systemctl, you will no longer be able to reboot the machine if you booted into sysvinit.

Or in other words: if you are running sysvinit, and are preparing to boot into systemd next, you cannot win anymore: since you would need to make the symlinks point to systemd. But once you do that, you cannot reboot sysvinit anymore.

I'd be fine with killing this. It is my understanding that this only really matters for distros that support multiple init systems, i.e. Debian. But if you clearly tells us that this doesn't matter for Debian anymore, then I'd be deligted to just remove this compat cruft.

You don't need this glue neither on pure-systemd nor on pure-sysvinit systems. It's only relevant on systems which both where you want to switch between the init systems after install freely.

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 7, 2024

As I wrote earlier, Debian stopped shipping /lib/sysvinit/telinit a couple of releases ago (in 2016) iirc.
Since then, there was no more /lib/sysvinit/telinit and nowadays you get:

# apt-file search -x telinit$
finit-sysv: /sbin/telinit                 
systemd-sysv: /usr/sbin/telinit
sysvinit-core: /sbin/telinit

It was my impression that Debian was the only distro ever shipping /lib/sysvinit/telinit (it was part of a transitional package which has since been dropped).
Are you sure, Fedora actually provides/provided /lib/sysvinit/telinit?

Anyway, as you see, there are still alternative sysvinit implementation in Debian and it would still be useful if say you switch from sysvinit-core to systemd, you can signal the running PID1 (i.e. sysvinit-core) to do a orderly shutdown, which is why -Dtelinit-path= was changed to /usr/sbin/telinit.
i.e.

  • sysvinit is installed and running as PID1
  • you install systemd, which replaces /sbin/telinit
  • you can use the telinit interface to signal the running PID1 to rexec/shutdown etc.

@berrange
Copy link
Contributor

berrange commented Feb 7, 2024

Anyway, as you see, there are still alternative sysvinit implementation in Debian and it would still be useful if say you switch from sysvinit-core to systemd, you can signal the running PID1 (i.e. sysvinit-core) to do a orderly shutdown, which is why -Dtelinit-path= was changed to /usr/sbin/telinit.
i.e.

  • sysvinit is installed and running as PID1
  • you install systemd, which replaces /sbin/telinit
  • you can use the telinit interface to signal the running PID1 to rexec/shutdown etc.

There's nothing that systemd's telinit can exec to talk to sysvinit after this sequence, since installing systemd at step 2 replaces the sysvinit telinit binary (assuming /sbin and /usr/sbin are the same path with usr-merge). They would have to be installed their respective telinit binaries in parallel at non-clashing paths for this to work, and in such a case -Dtelinit-path= would have to point to the sysvinit binary & thus wouldn't hit the re-exec loop.

@poettering
Copy link
Member

As I wrote earlier, Debian stopped shipping /lib/sysvinit/telinit a couple of releases ago (in 2016) iirc. Since then, there was no more /lib/sysvinit/telinit and nowadays you get:

# apt-file search -x telinit$
finit-sysv: /sbin/telinit                 
systemd-sysv: /usr/sbin/telinit
sysvinit-core: /sbin/telinit

Well, if people only noticed that this is an issue we can probably determine that noone uses this, and kill the concept.

It was my impression that Debian was the only distro ever shipping /lib/sysvinit/telinit (it was part of a transitional package which has since been dropped). Are you sure, Fedora actually provides/provided /lib/sysvinit/telinit?

Well, sysvinit is long gone from fedora afaics, and I am too lazy to do spelunking.

Anyway, as you see, there are still alternative sysvinit implementation in Debian and it would still be useful if say you switch from sysvinit-core to systemd, you can signal the running PID1 (i.e. sysvinit-core) to do a orderly shutdown, which is why -Dtelinit-path= was changed to /usr/sbin/telinit. i.e.

* sysvinit is installed and running as PID1

* you install systemd, which replaces /sbin/telinit

how does that work? i mean, doesn't dpkg consider this a conflict?

* you can use the telinit interface to signal the running PID1 to rexec/shutdown etc.

Well, how do you intend that to work, if we cannot chainload telinit anymore because we don't know where to find it? In the scenario you describe the only telinit we see is our own, and that's not gonna help.

So what's it now: do you want this supported or not? If you want this supported, then please make sure we can reach the original sysvinit telinit under some path still and then tell systemd about it at build time.

If you decide that you don't care anymore, then I'd kill this stuff as well as the /dev/initctl client and server side. Your choice.

But just taping over the whole mess by recognizing the loop is really useless. It just means we'll carry around unusable, essentially dead code.

@YHNdnzj
Copy link
Member

YHNdnzj commented Feb 7, 2024

If you decide that you don't care anymore, then I'd kill this stuff as well as the /dev/initctl client and server side. Your choice.

BTW, we've already announced that support for sysv scripts will be dropped. Maybe it's time that we should just declare the whole sysv compat will go instead.

Just like cgroup v1: it's unlikely that we'll actually actively remove everything soon, but let's shout out loud first.

@poettering
Copy link
Member

I think compat with sysv scripts we probably should keep longer around than compat with sysv APIS like this. This starts like something we could very well start with to delete.

@poettering
Copy link
Member

So two options:

  1. We kill initctl + telinit compat
  2. Debian just fixes its telinit chainloading, i.e. configures the right path, and we keep things as they are for a while.

I don't think we should do what @berrange proposed above, for the simple reason as it would just mean Debian would just turn off the chainloading too, and if so we might as well kill it because Debian is the last distro this mattered for.

Hence @mbiebl which way should we go?

I personally would lean to 1. We should do tha anyway sooner or later, and we might just do it now.

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 7, 2024

2. Debian just fixes its telinit chainloading, i.e. configures the right path, and we keep things as they are for a while.

Simply setting -Dtelinit-path=/lib/sysvinit/telinit will not fix chainloading as there is no /lib/syvinit/telinit anymore (and it's unlikely to come back in Debian). It will just mean that the problematic code is neutered.
And indeed it has been broken long enough that apparently nobody cares.

The changes that triggered all this is a result of
https://salsa.debian.org/systemd-team/systemd/-/commit/da95bc801088a6ab454851cf01addf97dd2c1ab3
which is relatively recent (3 months ago).
Doing some digging, apparently other distros ran into this trap as well, like Gentoo
https://bugs.gentoo.org/642724

So, either the problematic code is updated to speak to PID1 natively via /dev/initctl for a foreign PID1, or this code should be ripped out.

If that means all of the initctl compat interface, I'm not sure.
So far we talked about foreign PID1 - telinit from systemd

But there is also systemd as PID1 and software interacting with it using /dev/initctl. If that should be removed, I'm not sure.

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 7, 2024

But there is also systemd as PID1 and software interacting with it using /dev/initctl. If that should be removed, I'm not sure.

i.e. I don't know if there is software out there using this interface to interact with PID1 which would be broken if we ripped out all of the initctl compat interface

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 7, 2024

https://codesearch.debian.net/search?q=%2Fdev%2Finitctl&literal=1&perpkg=1

https://codesearch.debian.net/search?q=%2Frun%2Finitctl&literal=1

is an approximation of packages affected by this change of ripping out all of /dev/initctl

poettering added a commit to poettering/systemd that referenced this issue Feb 8, 2024
…in systemctl commands

To make transitions from sysvinit to systemd easier, we so far supported
a logic where we'd try to talk to sysvinit for certain commands if
systemctl detects that it is run from a non-systemd environment.

Apparently this is not used anymore by distributions and has been broken
for a while on some of them even, without anyone noticing.

Hence, let's just get rid of it, and simplify our codebase.

This will not remove other forms of sysvinit compat. Specifically, this
remains for now:

1. support for invoking sysvinit scripts as systemd services
2. support for emulation of the /dev/initctl protocol

The 2nd item is probably the part that should go next.

Fixes: systemd#31220
Replaces: systemd#31234
@poettering
Copy link
Member

#31249

@poettering
Copy link
Member

https://codesearch.debian.net/search?q=%2Fdev%2Finitctl&literal=1&perpkg=1

Hmm, that website is broken for me, page 3 keeps repeating forever...

poettering added a commit to poettering/systemd that referenced this issue Feb 8, 2024
…in systemctl commands

To make transitions from sysvinit to systemd easier, we so far supported
a logic where we'd try to talk to sysvinit for certain commands if
systemctl detects that it is run from a non-systemd environment.

Apparently this is not used anymore by distributions and has been broken
for a while on some of them even, without anyone noticing.

Hence, let's just get rid of it, and simplify our codebase.

This will not remove other forms of sysvinit compat. Specifically, this
remains for now:

1. support for invoking sysvinit scripts as systemd services
2. support for emulation of the /dev/initctl protocol

The 2nd item is probably the part that should go next.

Fixes: systemd#31220
Replaces: systemd#31234
mbiebl added a commit to mbiebl/systemd that referenced this issue Feb 8, 2024
This functionality relied on telinit being available in a different path
then the compat symlink shipped by systemd itself. This is no longer the
case for any known distro, so remove that code.

Fixes: systemd#31220
Replaces: systemd#31249
poettering added a commit to poettering/systemd that referenced this issue Feb 9, 2024
…in systemctl commands

To make transitions from sysvinit to systemd easier, we so far supported
a logic where we'd try to talk to sysvinit for certain commands if
systemctl detects that it is run from a non-systemd environment.

Apparently this is not used anymore by distributions and has been broken
for a while on some of them even, without anyone noticing.

Hence, let's just get rid of it, and simplify our codebase.

This will not remove other forms of sysvinit compat. Specifically, this
remains for now:

1. support for invoking sysvinit scripts as systemd services
2. support for emulation of the /dev/initctl protocol

The 2nd item is probably the part that should go next.

Fixes: systemd#31220
Replaces: systemd#31234
mbiebl added a commit to mbiebl/systemd that referenced this issue Feb 13, 2024
This functionality relied on telinit being available in a different path
then the compat symlink shipped by systemd itself. This is no longer the
case for any known distro, so remove that code.

Fixes: systemd#31220
Replaces: systemd#31249
bluca pushed a commit that referenced this issue Feb 15, 2024
This functionality relied on telinit being available in a different path
then the compat symlink shipped by systemd itself. This is no longer the
case for any known distro, so remove that code.

Fixes: #31220
Replaces: #31249
ayhamthemayhem pushed a commit to neighbourhoodie/nh-systemd that referenced this issue Mar 25, 2024
This functionality relied on telinit being available in a different path
then the compat symlink shipped by systemd itself. This is no longer the
case for any known distro, so remove that code.

Fixes: systemd#31220
Replaces: systemd#31249
chunyi-wu pushed a commit to chunyi-wu/systemd that referenced this issue Apr 3, 2024
This functionality relied on telinit being available in a different path
then the compat symlink shipped by systemd itself. This is no longer the
case for any known distro, so remove that code.

Fixes: systemd#31220
Replaces: systemd#31249
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment