Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

signal readiness to sd-daemon #7659

Closed
wants to merge 1 commit into from
Closed

Conversation

anarcat
Copy link

@anarcat anarcat commented Jul 3, 2023

This allows systemd to notice sway's startup is complete which allows, for example, the session manager to start Wayland programs in the right order. Without this signal, users have to go through rather horrible hacks to tell systemd that it can start further units.

I've been using NotifyAccess=all in the sway.service file and exec systemd-notify --ready in my sway config to emulate this, but it's racy and error-prone.

A particularly nasty bug triggered by NotifyAccess=all in particular is when podman starts and then terminates a container. In that context, conmon(8) ends up notifying systemd it's the session master and takes over thee "Main PID" field in systemd. When it dies, systemd believes the session is over and proceeds to kill the entire session. This is explained in more details in:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039857

This might not actually be the right place to do this. We call server_run right after, and maybe there would be a better place. But server_run only calls wl_display_run and that's part of the core wayland library. I'm not sure Wayland itself is a place to do this, so for now I'm scratching my own itch and doing this in Sway itself.

This allows systemd to notice sway's startup is complete which allows,
for example, the session manager to start Wayland programs in the
right order. Without this signal, users have to go through rather
horrible hacks to tell systemd that it can start further units.

I've been using `NotifyAccess=all` in the sway.service file and `exec
systemd-notify --ready` in my sway config to emulate this, but it's
racy and error-prone.

A particularly nasty bug triggered by `NotifyAccess=all` in particular
is when podman starts and then terminates a container. In that
context, conmon(8) ends up notifying systemd it's the session master
and takes over thee "Main PID" field in systemd. When it dies, systemd
believes the session is over and proceeds to kill the entire
session. This is explained in more details in:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039857

This might not actually be the right place to do this. We call
`server_run` right after, and maybe there would be a better place. But
`server_run` only calls `wl_display_run` and that's part of the core
`wayland` library. I'm not sure Wayland itself is a place to do this,
so for now I'm scratching my own itch and doing this in Sway itself.
@emersion
Copy link
Member

emersion commented Jul 3, 2023

NACK, sorry. We will not add init-specific logic.

You can achieve the same result with an exec in the config file.

@emersion emersion closed this Jul 3, 2023
@anarcat
Copy link
Author

anarcat commented Jul 3, 2023

You can achieve the same result with an exec in the config file.

Not quite. To have this work with an exec, you need NotifyAccess=all in the systemd unit file and that leads to unpleasant crashes when some other child of sway takes over the systemd session.

I haven't proposed this in jest: there's already systemd-specific logic in sway, specifically in swaybar, which can already link to systemd.

I don't want to bug you with this at all, I just wanted to let you know I did try to propose this very carefully and it seemed like a rather modest (one-line) change...

@emersion
Copy link
Member

emersion commented Jul 3, 2023

Not quite. To have this work with an exec, you need NotifyAccess=all in the systemd unit file and that leads to unpleasant crashes when some other child of sway takes over the systemd session.

Then you need to wrap Sway with logic to pass the ready FD differently to the exec (e.g. different env var).

I haven't proposed this in jest: there's already systemd-specific logic in sway, specifically in swaybar, which can already link to systemd.

It's not systemd-specific. swaybar uses only the sd-bus library which can be provided by e.g. basu.

@anarcat
Copy link
Author

anarcat commented Jul 3, 2023

Then you need to wrap Sway with logic to pass the ready FD differently to the exec (e.g. different env var).

I'm not sure I understand what you mean here. systemd just does not allow other processes to do sd_notify, I fail to see what "pass the ready FD differently" refers to here. What sd_notify does is basically write a message to a unix socket with a predefined format. Typically key/value pairs. In this case we just send READY=1, something that, yes, you can do from an exec, but not unless you allow all child processes of sway to also send such changes. And in my case, other processes do send such changes to manage systemd's expectation, in particular conmon(8) passes MAINPID= which breaks the whole setup.

It's not systemd-specific. swaybar uses only the sd-bus library which can be provided by e.g. basu.

I understand what you mean here, but it seems to me basu could easily be patched to support sd_notify as well and in fact there are other implementations that do support it, most notably elogind.

But I understand there's probably no other service manager out there that supports receiving those messages. I don't see this as a permanent limitation though...

Anyway, I don't want to drag you down this annoying systemd lane if you don't want to. :)

@rpigott
Copy link
Member

rpigott commented Jul 3, 2023

systemd uses graphical-session.target to order units after the display server. If you want to run sway as a user service you will need to start that target anyway.

@anarcat
Copy link
Author

anarcat commented Jul 3, 2023 via email

@rpigott
Copy link
Member

rpigott commented Jul 3, 2023

You're thinking of graphical.target. graphical-session.target is indeed a user service for exactly this purpose.

@anarcat
Copy link
Author

anarcat commented Jul 3, 2023

well i guess there's many ways to do this. the way i set it up here is that sway.service BindsTo= the graphical-session.target... it does not merely start it, it is the session...

but this is getting wildly out of topic here. if the sway devs don't want this, i'll just go play elsewhere. :)

@rpigott
Copy link
Member

rpigott commented Jul 3, 2023

the way i set it up here is that sway.service BindsTo= the graphical-session.target..

That's incorrect, and your suggested patch doesn't even fix it. A valid DISPLAY or WAYLAND_DISPLAY in the user manager startup environment is an implicit assumption of graphical-session.target, which again, is the special target to "start Wayland programs in the right order" as you say. You need to manually import these vars from sway with systemctl import-environment before starting a session target, which is then bound to graphical-session.target.

If you wanted sway to handle this for you, you would need to do the SetEnvironment dbus calls within sway. Which also will not be accepted.

@anarcat
Copy link
Author

anarcat commented Jul 3, 2023 via email

@rpigott
Copy link
Member

rpigott commented Jul 3, 2023

Yes. I am aware of that as well. That's something that can actually be
done with an exec, and safely.

Not with your stated setup it can't. If graphical-session starts before the display variables are imported it is incorrect from an ordering standpoint, which is what this (incorrect) patch is trying to address. As is, it is possible to get the ordering correct with a target unit. With your patch, the order is always incorrect.

@anarcat
Copy link
Author

anarcat commented Jul 4, 2023

I might very well be incorrect, but as far as I can tell, this works. I'm actually using this patch in my session now.

I also believe the ordering is correct. I am not sure when the exec commands are ran in the sway bootstrap sequence, but I was assuming they were running before the server_run call, and therefore the variables are imported before the sd_notify is sent which, from what I have gathered, is the correct ordering. I could very well be wrong about that of course.

But again, I don't want to drag you fine folks too much into the systemd ordering world. There's a clear desire to have nothing to do at all with init systems in general and, it seems to me, with systemd --user in particularm, so I don't see discussing details of the systemd session manager as being productive here.

As you have stated, nothing here regarding systemd will get merged, even if it would be correctly implemented, so it seems rather futile for me to have to justify my reasoning when it's going to be shot down ad hominem anyway...

@rpigott
Copy link
Member

rpigott commented Jul 4, 2023

Sway guarantees that WAYLAND_DISPLAY/DISPLAY are present in the environment for exec, however services ordered after this ready notify cannot be guaranteed the presence of DISPLAY/WAYLAND_DISPLAY in the systemd user manager activation environment. If either

  1. The user does not exec systemctl import-environment [...] (totally valid usage of sway)
  2. The user does exec systemctl import-environment [...], but it doesn't complete before the notify call.

sway does not wait on exec'd processes and systemd parallelizes unit transactions aggressively, so this is still possible. To guarantee correctness, you need a target unit that is ordered after the completion of import-environment, and systemd already includes a special target you can bind to, graphical-session.target, which is used for this purpose in many unit files.

That is why this patch does not improve upon the status quo. Fortunately, I don't think there is any problem with using graphical-session.target, so it is not actually necessary to fix anything here.

As you have stated, nothing here regarding systemd will get merged, even if it would be correctly implemented, so it seems rather futile for me to have to justify my reasoning when it's going to be shot down ad hominem anyway...

There's no ad hominem, I'm just trying to explain what's wrong. Believe it or not I am plenty active on systemd too, and were we discussing this change on systemd's tracker I would have pointed out the same issue.

@emersion
Copy link
Member

emersion commented Jul 4, 2023

I'm not sure I understand what you mean here. systemd just does not allow other processes to do sd_notify, I fail to see what "pass the ready FD differently" refers to here.

I meant wrap Sway with a small script which copies the NOTIFY_SOCKET env var to another env var (e.g. SWAY_NOTIFY_SOCKET), then unsets NOTIFY_SOCKET. Then a Sway exec can use that env var instead of NOTIFY_SOCKET. That way child processes don't see NOTIFY_SOCKET.

I understand what you mean here, but it seems to me basu could easily be patched to support sd_notify as well and in fact there are other implementations that do support it, most notably elogind.

That's out-of-scope. basu is purely a D-Bus library, nothing more. basu is not a loose collection of systemd-related libraries.

But I understand there's probably no other service manager out there that supports receiving those messages. I don't see this as a permanent limitation though...

Other service managers do support receiving readiness notifications. For example s6 does. Even if a service manager doesn't, it's just one read(1) away.


Something that I would consider would be a init-independant way of doing readiness notifications. Something which wouldn't be tied to systemd, e.g. similar to swaywm/swaylock#281. However, Sway being a server (for both Wayland and Sway IPC), I'm not sure that's the right approach, maybe something like socket activation would be more appropriate, I don't know.

@anarcat
Copy link
Author

anarcat commented Jul 4, 2023 via email

anarcat added a commit to anarcat/sway that referenced this pull request Jan 9, 2024
This adds a new commandline option, --ready-fd, to send a notification
when sway is ready to run.

This allows a process supervisor to notice sway's startup is complete
which allows, for example, the session manager to start Wayland
programs in the right order. Without this signal, users have to go
through horrible hacks to order services.

For example, I've been using `NotifyAccess=all` in the `sway.service`
file and `exec systemd-notify --ready` in my sway config to emulate
this. Problem is it's racy and error-prone.

A particularly nasty bug triggered by `NotifyAccess=all` is when
`podman` starts and then terminates a container. In that context,
`conmon(8)` ends up notifying systemd it's the session master and
takes over thee `MainPID` field in systemd. When it dies, systemd
believes the session is over and proceeds to kill the entire
session. This is explained in more details in:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1039857

This might not actually be the right place to do this in the sway
startup sequence, that said. We call `server_run` right after, and
maybe somewhere below that would be a better place. But `server_run`
only calls `wl_display_run` and that's part of the core wayland
library, so that seems a little too far down. I'm not sure Wayland
itself is a place to do this, so for now I'm scratching my own itch
and doing this in Sway itself.

Note that this approach was taken instead of using the proper
`sd_notify` library call, as that approach was refused in swaywm#7659. The
`--ready-fd` approach was accepted in swaywm/swaylock#281 so it is
hoped it will be seen as acceptable here.

An alternative implementation would be to instead check the
`NOTIFY_SOCKET` environment variable and use that, if present. That
variable is used by systemd and at least the s6 supervisor to receive
readiness notifications, so it might be less disruptive.
@anarcat
Copy link
Author

anarcat commented Jan 9, 2024

Something that I would consider would be a init-independant way of doing readiness notifications. Something which wouldn't be tied to systemd, e.g. similar to swaywm/swaylock#281. However, Sway being a server (for both Wayland and Sway IPC), I'm not sure that's the right approach, maybe something like socket activation would be more appropriate, I don't know.

So I've done #7904 because it's similar to swaywm/swaylock#281 but now i see that the latter was actually dropped later, and only sends a newline, which won't work for systemd (but works for s6? go figure).

I've done this only because you offered this as an exit strategy here but i'm not particularly enthusiastic about the patch. It necessarily requires a wrapper script to parse the NOTIFY_SOCKET variable and pass it down to the --ready-fd argument which seems rather silly: fundamentally, argv and env are somewhat similar lists of things and we should just be able to parse (and reset) NOTIFY_SOCKET ourselves. But then that goes back to vendoring sd_notify or linking against it, both approaches that seem unacceptable here...

Also, with regards to:

Something which wouldn't be tied to systemd...

from what i understand, sd_notify is an IPC mechanism like any other that can be adopted (or not) by other supervisors. It's not like it's proprietary code. The suggested patch here also is completely inert and unused if someone doesn't run systemd, so I still can't quite figure out what the actual problem is with the patch...

Anyway, thanks for the feedback.

@kennylevinsen
Copy link
Member

So I've done #7904 because it's similar to swaywm/swaylock#281 but now i see that the latter was actually dropped later, and only sends a newline, which won't work for systemd (but works for s6? go figure).

It was dropped later because, as you discovered too, we found in other discussions that there was no convenient way to pass an fd for systemd's notify socket. We returned to the plain \n to not claim to support something we didn't.

I would say that this is not s6-specific. It's a generic system generally agreed by other init systems and one trivially used by wrapper scripts and other parent programs, requiring just a pipe.

from what i understand, sd_notify is an IPC mechanism like any other that can be adopted (or not) by other supervisors.

sd_notify is not "just" a readiness protocol, it's a systemd general service IPC layer, including things like controlling PID 1's stored fd polling behavior.

The general counter-argument to implementing systemd's protocols is that they are not well suited for other init system designs where there isn't a central service management daemon, and that these systemd APIs change with the wind as they just do what systemd needs at any given time - they're not formally designed and specified, so not something you really want to clone.

On the other hand, s6-to-systemd wrappers are trivial and exist.

The friction is a annoying to all of us (why couldn't systemd just play ball, ugh) - sorry.

@bluca
Copy link

bluca commented Jan 10, 2024

these systemd APIs change with the wind as they just do what systemd needs at any given time - they're not formally designed and specified, so not something you really want to clone.

There's a document called "Interface Portability and Stability Promise": https://systemd.io/PORTABILITY_AND_STABILITY/ that mentions the notify protocol, linking to its definition. It's been there for a long time. You can reimplement it at will, and it's going to always work.

@kennylevinsen
Copy link
Member

Feel free to write a service manager cloning systemd's APIs if you want. Until then, it remains a highly systemd-specific API with features unrelated to readiness-notification in the best systemd-style (one of which I added not too long ago), which other authors have found unsuitable and some outright unimplementable, stable or not.

Also nothing stopping systemd from playing adding compatibility for the much simpler s6/ready-fd-style protocol.

Who knows, maybe we'll change our stance on having systemd-specific code at some point, but "other things can clone systemd" definitely isn't going to be the winning argument.

@swaywm swaywm locked as resolved and limited conversation to collaborators Jan 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants