New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wrappers: allow user mode systemd daemons #5822
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very reasonable! I only asked a question about the differences between the new user services and system services wrtd dependencies on the mount unit.
+1 - please get another review from @chipaca
And it looks like I was incorrect about enabling/disabling user services. It seems I want:
|
740f111
to
831eb09
Compare
So I worked out that I can get the user systemd instance started in the spread test by activating |
371434b
to
9d88f75
Compare
I've disabled the new spread tests on a few systems:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. This looks really nice, I added some question inline.
interfaces/systemd/backend.go
Outdated
@@ -85,7 +85,7 @@ func (b *Backend) Setup(snapInfo *snap.Info, confinement interfaces.ConfinementO | |||
} | |||
// Ensure the service is running right now and on reboots | |||
for _, service := range changed { | |||
if err := systemd.Enable(service); err != nil { | |||
if err := systemd.Enable(service, false); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a flag argument here. Reading: systemd.Enable(service, true)
does not convey much information. Something like systemd.Enable(service, systemd.UserInstance)
or systemd.Enable(service, 0)
is a bit more explicit.
echo "And the user mode systemd instance is started" | ||
systemctl start user@"$(id -u test)".service | ||
|
||
echo "It's sockets are created in the test user's directories" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a real end-to-end test here, i.e. have something connect to the socket and verify that the snap actually replies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for a similar test for system daemons, and all I could find was tests/main/install-socket-activation
, which isn't a full end-to-end test either. I agree that it would be a good idea to extend the test to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the test to use a Python daemon implementation of the service that responds differently to the different sockets. This could be reused to test socket activation of system services.
systemctl start user@"$(id -u test)".service | ||
|
||
echo "We can see the service running" | ||
as_user systemctl --user status snap.test-snapd-user-service.test-snapd-user-service|MATCH "running" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nitpick) I think we should check for Active: active
just for good measure to ensure that the thing is still running.
wrappers/services.go
Outdated
@@ -274,6 +290,10 @@ func StopServices(apps []*snap.AppInfo, reason snap.ServiceStopReason, inter int | |||
if !app.IsService() || !osutil.FileExists(app.ServiceFile()) { | |||
continue | |||
} | |||
// We can't stop services that run under a user mode systemd | |||
if app.IsUserService() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we don't stop services is different than for system services. Can this break user services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more the fact that snapd can't talk to the user mode systemd instance (or multiple instances if you happen to have a multi-seat system).
As for what could break, it'd be the same things that can break for snap applications running over an upgrade or uninstall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic. We either need to find a way to stop them or we need infrastructure enough to inform the snap that is about to be updated and for it to do the right thing with those?
@jhenstridge also does this mean that "snap start/stop" etc don't work for these services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct: "snap start/stop" will not do anything because snapd can not talk to the user session systemd instance.
So we're essentially in the same position as existing long running snap applications (e.g. a messaging app started via xdg autostart that runs minimised and is only visible through the notifications it posts).
We are in a better position than the status quo though, in that if snapd grows a user session agent to handle upgrades, there is a defined method of restarting these daemons (unlike a long running GUI application).
48be212
to
6a7a002
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No changes required right now. Just marking it as I want to have a careful dive on this one, given that it's a major new area being opened.
8df294d
to
58465fe
Compare
I've updated the branch to work with current master. I'm not running the snap-user-service tests on CentOS 7 for the same reason as Amazon Linux 2: it ships an old version of systemd where the current method of starting a user session (activating the |
snap/info.go
Outdated
case systemd.UserMode: | ||
return dirs.SnapUserServicesDir | ||
default: | ||
panic("unknown systemd.InstanceMode") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this (and other places below) be systemd.DaemonMode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called the type InstanceMode
, which is why I included that in the message. I'm open to changing it though. There's two ways it is used at the moment:
- which instance of systemd are we talking to?
- what mode does this service run in?
# Amazon Linux 2 gives error "Unit user@12345.service not loaded." | ||
- -amazon-linux-2-* | ||
# Centos 7 gives error "Unit user@12345.service not loaded." | ||
- -centos-7-* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the user experience for someone installing a snap with a user service on these distributions (or others where user services aren't in use yet)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the method of launching the user mode systemd has changed over the course of releases.
With current versions of systemd, the supported method is to ask the pid 1 systemd instance to start the templated user@.service
service. With older releases, my understanding is that systemd --user
was invoked directly by logind.
As written, the test relies on the new method of launching the user instance, since trying directly execute it will fail with complaints about cgroups. I haven't looked into what would be necessary to get the test fixture running on these old systemd's, and wasn't sure how important it was.
My primary motive for this feature is for use in conjunction with dbus service activation. For that particular case, I plan to use the same method of supporting distros without a user instance of systemd as most services do currently: include both an Exec
and SystemdService
line in the dbus service activation file.
For other types of user services I don't have a good answer. It is basically the same situation as running a snap that uses XDG Autostart on a desktop that doesn't support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using session-tool avoid the problem of systemd changing how user sessions are started across releases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed response. Sounds good :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using session-tool avoid the problem?
I suspect it wouldn't. I booted up the Centos 7 live CD in a VM to see what its desktop session looked like, and there was no evidence of a user instance of systemd running.
Either it ships with a too-old systemd, or they intentionally disabled it so it wouldn't be part of their support burden.
wrappers/services.go
Outdated
@@ -114,24 +114,26 @@ func StartServices(apps []*snap.AppInfo, inter interacter) (err error) { | |||
if err == nil { | |||
return | |||
} | |||
if e := stopService(sysd, app, inter); e != nil { | |||
inter.Notify(fmt.Sprintf("While trying to stop previously started service %q: %v", app.ServiceName(), e)) | |||
if app.ServiceMode() == systemd.SystemMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are doing the right thing here (ie, not mixing in user services with the snap start|stop|restart
commands), but it makes me wonder if snap start|stop|restart
should grow a --user
option or similar for these...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not something that snapd can control though: in general snapd running as pid 0 is not going to be able to communicate with the systemd instance running as the user.
At the moment we don't do anything about snap apps running as the user during an upgrade, so I don't think it is that big a stretch to do the same for user services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is something that (iirc), @chipaca is looking into. It would be possible for snapd to communicate with userd to do something sensible on refresh. Also note that my question is not meant to block this PR; just putting it out there for a potential followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at this PR from a security point of view (but had a few questions other questions inline). In general, this PR LGTM in that regard. One thing that came to mind is that, unlike system services, user services are likely to be coupled with typical desktop interfaces (eg, X11, desktop-legacy; either intentionally or devs taking a shotgun approach with their plugs) which could open up interesting (from a security POV) interactions between snaps. @jhenstridge, curious on your thoughts about adding review-tools check to warn if specifying 'daemon-mode: user' for a command that 'plugs: [ x11 ]' (or similar)?
I don't think it would be out of the ordinary for session D-Bus daemons (for instance) to want to talk to the display, so I don't think it is something review-tools should block. While this does provide a new way for a snap author to have their code executed, is it that different to e.g. the way we allow apps to launch via XDG autostart? |
...
With autostart the user should be prompted, with this PR, the user session service is just there potentially long running in the background (eg, this makes something like a persistent keylogger easier). Again, X is deeply flawed from a security POV so not saying we should block this on that (users will always need to trust the publisher when a snap plugs x11, desktop-legacy, etc and this PR doesn't change that). I just wanted your thoughts and you answered my question wrt dbus services wanting to talk to the display. |
58465fe
to
b659e12
Compare
I'm closing this until there is a proposal (on the forum) of how to stop these services at refresh like we stop any other services, and what snap start/stop etc should do with them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you
Over two CI runs, I've gotten the following error:
This look like a left over session agent socket in root's XDG_RUNTIME_DIR, with nothing listening on the socket. It probably indicates another test not cleaning up correctly, but it does make me wonder if we should treat "connection reset by peer" errors on the initial dial as if that user's session agent is not present. If we can run into this problem in CI, then I wouldn't be surprised if it can happen on real systems too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey James
I'm submitting my partial review. I still have a few more thorny files to read through to make sure I didn't miss any detail.
This branch will have some interaction with my app awareness work that will likely land shortly. I will be happy to work with you on ensuring the result of merging both branches is correct. I believe that I need to adjust some of the logic used to create transient scopes as well as to, perhaps, adjust the cgroup scanner (though I think it is correct, it should be tested for sure).
My main worry is the extensive use of user.sh
that I would rather see sunset. I think using session-tool is a viable alternative and I would not be surprised if that would make all of the tests to pass on all but 14.04 systems.
I've left a few comments inline, please go through them and I promise that by tomorrow I will complete the review of the remaining parts.
touch agent-was-enabled | ||
fi | ||
snap set system experimental.user-daemons=true | ||
#shellcheck source=tests/lib/user.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer this to be ported to session-tool instead. This can be done in a follow up.
# Amazon Linux 2 gives error "Unit user@12345.service not loaded." | ||
- -amazon-linux-2-* | ||
# Centos 7 gives error "Unit user@12345.service not loaded." | ||
- -centos-7-* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using session-tool avoid the problem of systemd changing how user sessions are started across releases?
"invalid %q: must have a prefix of $SNAP_DATA, $SNAP_COMMON or $XDG_RUNTIME_DIR", fieldName) | ||
switch socket.App.DaemonScope { | ||
case SystemDaemon: | ||
if !(strings.HasPrefix(path, "$SNAP_DATA/") || strings.HasPrefix(path, "$SNAP_COMMON/") || strings.HasPrefix(path, "$XDG_RUNTIME_DIR/")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curiously, I think this is wrong. System daemons cannot refer to $XDG_RUNTIME_DIR, as that is only provided to sessions. On my Focal system system services no longer run in sessions so they do not have XDG_RUNTIME_DIR.
I would be happy to have a follow up that deals with this but please add a // TODO comment if you are going to push any more patches into this branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense either: given that systemd mounts a tmpfs over a user's $XDG_RUNTIME_DIR, it isn't clear the system instance of systemd can reliably provide sockets in that location (even ignoring the question of which user's XDG_RUNTIME_DIR it should refer to).
FWIW, it looks like this was introduced in PR #6327. I left it unchanged since there are probably some snaps making use of it right now (maybe something from the Mir team?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should look, after this PR, because I suspect it expands to something silly (unless we expand it ignoring the environment).
@@ -372,7 +485,7 @@ func StopServices(apps []*snap.AppInfo, reason snap.ServiceStopReason, inter int | |||
|
|||
// ensure the service is really stopped on remove regardless | |||
// of stop-mode | |||
if reason == snap.StopReasonRemove && !app.StopMode.KillAll() { | |||
if reason == snap.StopReasonRemove && !app.StopMode.KillAll() && app.DaemonScope == snap.SystemDaemon { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a TODO that captures stopping of user services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in previous comments I believe this block of code is unnecessary for user services, and possibly system services too on modern distros. Unless the service has KillMode=none
or sets FinalKillSignal
to something other than SIGKILL, systemd will have gone through these steps already. As we don't allow snaps to alter those settings the sysd.Stop() call should be reliable.
From what I've gathered, this was in place to deal with old buggy versions of systemd, which would be from before they supported a user instance of systemd.
version: 42 | ||
apps: | ||
svc: | ||
command: svc1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: use svc
for consistency with the service name.
@@ -0,0 +1,53 @@ | |||
#!/usr/bin/python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need the copyright header here.
b625a4d
to
b1016df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for porting this to session-tool
I marked one case of missing --restore
and asked a question about the purge step.
This is an attempt to either fix the intermittent hang during tear down, or make it fail reliably.
@jhenstridge I merged master into your branch just now, to hopefully get a green pass. We just landed a fix for the two pulseaudio tests that were picking up some test leaks and that had a host of issues of their own. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed just the spread tasks that contain now-redundant startups of .target
units that session-tool --prepare
now handles automatically.
echo "And the user mode systemd instance is started" | ||
session-tool -u test --prepare | ||
|
||
echo "And the user session has finished starting" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those two can now go away
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing these caused a failure on the Ubuntu 16.04 based systems (both classic and core).
From the look of it, session-tool --prepare
will only wait for default.target
if there is a systemd controlled session bus. Ubuntu 16.04's session bus is managed by Upstart, yet it has a functioning user mode systemd instance.
There's no dependency on the D-Bus session bus for many operations, even with more recent systemd. For instance, on 20.04:
$ strace --trace=connect systemctl --user is-active snapd.session-agent.service
connect(3, {sa_family=AF_UNIX, sun_path="/run/user/1000/systemd/private"}, 33) = 0
inactive
+++ exited with 3 +++
It is communicating with systemd over a private socket.
"session-tool --prepare" was only trying to start default.target if it believed there was a systemd controlled D-Bus session bus. This is not the case on Ubuntu 16.04, yet it still has a working user instance of systemd. Instead try running "systemctl --user is-enabled default.target", which should only succeed if we can talk to the user instance of systemd.
Unfortunately I don't think this was cherry-picked to 2.45, but it will be in 2.46 now. |
The current snapd version is 2.47. Why this feature still not merged? |
This feature was merged to master on May 7. If you have more questions about this feature, I would suggest creating a new forum post at forum.snapcraft.io, where more folks will see your questions. |
This patch extends the snap daemon support to support systemd running in user mode via a
daemon-mode: user
parameter.The main differences are:
/etc/systemd/user
instead of/etc/systemd/system
$SNAP_USER_DATA
).My eventual goal is to build D-Bus activation on top of this for both user session and system daemons.
I've been trying to write a spread test for this, but so far I've failed. My initial thought was to try and manually run
/lib/systemd/systemd --user
as thetest
user and then check how the daemon runs under that instance. Unfortunately that fails, and it looks like the only supported way of starting user instances is via pam_systemd + logind. It might be possible with the logindCreateSession
method, but I haven't yet worked out what arguments to pass to get it to work.