Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrappers: allow user mode systemd daemons #5822

Merged
merged 74 commits into from May 7, 2020

Conversation

jhenstridge
Copy link
Contributor

This patch extends the snap daemon support to support systemd running in user mode via a daemon-mode: user parameter.

The main differences are:

  1. Installed to /etc/systemd/user instead of /etc/systemd/system
  2. Can not depend on system units (and conversely system units can't depend on user units)
  3. Sockets associated with user daemons are limited to user writable locations (e.g. $SNAP_USER_DATA).
  4. snapd can not enable/disable user daemons during upgrades, as it can't communicate with the user instance(s) of systemd.

My eventual goal is to build D-Bus activation on top of this for both user session and system daemons.

I've been trying to write a spread test for this, but so far I've failed. My initial thought was to try and manually run /lib/systemd/systemd --user as the test user and then check how the daemon runs under that instance. Unfortunately that fails, and it looks like the only supported way of starting user instances is via pam_systemd + logind. It might be possible with the logind CreateSession method, but I haven't yet worked out what arguments to pass to get it to work.

zyga
zyga previously approved these changes Sep 12, 2018
Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very reasonable! I only asked a question about the differences between the new user services and system services wrtd dependencies on the mount unit.

+1 - please get another review from @chipaca

wrappers/services.go Show resolved Hide resolved
@jhenstridge
Copy link
Contributor Author

And it looks like I was incorrect about enabling/disabling user services. It seems I want:

systemctl --global enable user-service-name

@jhenstridge jhenstridge force-pushed the user-daemons branch 2 times, most recently from 740f111 to 831eb09 Compare September 13, 2018 11:19
@jhenstridge
Copy link
Contributor Author

So I worked out that I can get the user systemd instance started in the spread test by activating user@12345.service in the system instance. Two simple spread tests added showing a user daemon started with the default target, and user daemon sockets being created in the expected locations.

@jhenstridge jhenstridge force-pushed the user-daemons branch 4 times, most recently from 371434b to 9d88f75 Compare September 14, 2018 01:11
@jhenstridge
Copy link
Contributor Author

I've disabled the new spread tests on a few systems:

  1. Ubuntu Core: installing a snap with user services fails because /etc/systemd/user is not in the writable-paths list. If the core snap is updated the test will probably work, but core systems don't usually have a user session.

  2. Ubuntu 14.04: it looks like the user@.service method of starting the user instance of systemd was added in version 205, while that distro release ships version 204.

  3. Amazon Linux 2: this one also fails to load user@.service in the tests. I would have thought it would have a new enough systemd, but haven't investigated further since it is unlikely people will run a user session on this distro.

Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. This looks really nice, I added some question inline.

@@ -85,7 +85,7 @@ func (b *Backend) Setup(snapInfo *snap.Info, confinement interfaces.ConfinementO
}
// Ensure the service is running right now and on reboots
for _, service := range changed {
if err := systemd.Enable(service); err != nil {
if err := systemd.Enable(service, false); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a flag argument here. Reading: systemd.Enable(service, true) does not convey much information. Something like systemd.Enable(service, systemd.UserInstance) or systemd.Enable(service, 0) is a bit more explicit.

snap/info_snap_yaml_test.go Outdated Show resolved Hide resolved
echo "And the user mode systemd instance is started"
systemctl start user@"$(id -u test)".service

echo "It's sockets are created in the test user's directories"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do a real end-to-end test here, i.e. have something connect to the socket and verify that the snap actually replies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking for a similar test for system daemons, and all I could find was tests/main/install-socket-activation, which isn't a full end-to-end test either. I agree that it would be a good idea to extend the test to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the test to use a Python daemon implementation of the service that responds differently to the different sockets. This could be reused to test socket activation of system services.

systemctl start user@"$(id -u test)".service

echo "We can see the service running"
as_user systemctl --user status snap.test-snapd-user-service.test-snapd-user-service|MATCH "running"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) I think we should check for Active: active just for good measure to ensure that the thing is still running.

@@ -274,6 +290,10 @@ func StopServices(apps []*snap.AppInfo, reason snap.ServiceStopReason, inter int
if !app.IsService() || !osutil.FileExists(app.ServiceFile()) {
continue
}
// We can't stop services that run under a user mode systemd
if app.IsUserService() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we don't stop services is different than for system services. Can this break user services?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more the fact that snapd can't talk to the user mode systemd instance (or multiple instances if you happen to have a multi-seat system).

As for what could break, it'd be the same things that can break for snap applications running over an upgrade or uninstall.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic. We either need to find a way to stop them or we need infrastructure enough to inform the snap that is about to be updated and for it to do the right thing with those?

@jhenstridge also does this mean that "snap start/stop" etc don't work for these services?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct: "snap start/stop" will not do anything because snapd can not talk to the user session systemd instance.

So we're essentially in the same position as existing long running snap applications (e.g. a messaging app started via xdg autostart that runs minimised and is only visible through the notifications it posts).

We are in a better position than the status quo though, in that if snapd grows a user session agent to handle upgrades, there is a defined method of restarting these daemons (unlike a long running GUI application).

wrappers/services.go Show resolved Hide resolved
@jhenstridge jhenstridge force-pushed the user-daemons branch 4 times, most recently from 48be212 to 6a7a002 Compare September 18, 2018 15:21
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes required right now. Just marking it as I want to have a careful dive on this one, given that it's a major new area being opened.

@jhenstridge jhenstridge force-pushed the user-daemons branch 2 times, most recently from 8df294d to 58465fe Compare November 23, 2018 03:11
@jhenstridge
Copy link
Contributor Author

I've updated the branch to work with current master. I'm not running the snap-user-service tests on CentOS 7 for the same reason as Amazon Linux 2: it ships an old version of systemd where the current method of starting a user session (activating the user@$uid.service unit) doesn't work.

snap/info.go Outdated
case systemd.UserMode:
return dirs.SnapUserServicesDir
default:
panic("unknown systemd.InstanceMode")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this (and other places below) be systemd.DaemonMode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called the type InstanceMode, which is why I included that in the message. I'm open to changing it though. There's two ways it is used at the moment:

  1. which instance of systemd are we talking to?
  2. what mode does this service run in?

# Amazon Linux 2 gives error "Unit user@12345.service not loaded."
- -amazon-linux-2-*
# Centos 7 gives error "Unit user@12345.service not loaded."
- -centos-7-*
Copy link

@jdstrand jdstrand Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the user experience for someone installing a snap with a user service on these distributions (or others where user services aren't in use yet)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the method of launching the user mode systemd has changed over the course of releases.

With current versions of systemd, the supported method is to ask the pid 1 systemd instance to start the templated user@.service service. With older releases, my understanding is that systemd --user was invoked directly by logind.

As written, the test relies on the new method of launching the user instance, since trying directly execute it will fail with complaints about cgroups. I haven't looked into what would be necessary to get the test fixture running on these old systemd's, and wasn't sure how important it was.

My primary motive for this feature is for use in conjunction with dbus service activation. For that particular case, I plan to use the same method of supporting distros without a user instance of systemd as most services do currently: include both an Exec and SystemdService line in the dbus service activation file.

For other types of user services I don't have a good answer. It is basically the same situation as running a snap that uses XDG Autostart on a desktop that doesn't support it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using session-tool avoid the problem of systemd changing how user sessions are started across releases?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed response. Sounds good :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using session-tool avoid the problem?

I suspect it wouldn't. I booted up the Centos 7 live CD in a VM to see what its desktop session looked like, and there was no evidence of a user instance of systemd running.

Either it ships with a too-old systemd, or they intentionally disabled it so it wouldn't be part of their support burden.

@@ -114,24 +114,26 @@ func StartServices(apps []*snap.AppInfo, inter interacter) (err error) {
if err == nil {
return
}
if e := stopService(sysd, app, inter); e != nil {
inter.Notify(fmt.Sprintf("While trying to stop previously started service %q: %v", app.ServiceName(), e))
if app.ServiceMode() == systemd.SystemMode {
Copy link

@jdstrand jdstrand Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are doing the right thing here (ie, not mixing in user services with the snap start|stop|restart commands), but it makes me wonder if snap start|stop|restart should grow a --user option or similar for these...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not something that snapd can control though: in general snapd running as pid 0 is not going to be able to communicate with the systemd instance running as the user.

At the moment we don't do anything about snap apps running as the user during an upgrade, so I don't think it is that big a stretch to do the same for user services.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is something that (iirc), @chipaca is looking into. It would be possible for snapd to communicate with userd to do something sensible on refresh. Also note that my question is not meant to block this PR; just putting it out there for a potential followup.

Copy link

@jdstrand jdstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this PR from a security point of view (but had a few questions other questions inline). In general, this PR LGTM in that regard. One thing that came to mind is that, unlike system services, user services are likely to be coupled with typical desktop interfaces (eg, X11, desktop-legacy; either intentionally or devs taking a shotgun approach with their plugs) which could open up interesting (from a security POV) interactions between snaps. @jhenstridge, curious on your thoughts about adding review-tools check to warn if specifying 'daemon-mode: user' for a command that 'plugs: [ x11 ]' (or similar)?

@jhenstridge
Copy link
Contributor Author

@jhenstridge, curious on your thoughts about adding review-tools check to warn if specifying 'daemon-mode: user' for a command that 'plugs: [ x11 ]' (or similar)?

I don't think it would be out of the ordinary for session D-Bus daemons (for instance) to want to talk to the display, so I don't think it is something review-tools should block.

While this does provide a new way for a snap author to have their code executed, is it that different to e.g. the way we allow apps to launch via XDG autostart?

@jdstrand
Copy link

@jhenstridge, curious on your thoughts about adding review-tools check to warn if specifying 'daemon-mode: user' for a command that 'plugs: [ x11 ]' (or similar)?

...

While this does provide a new way for a snap author to have their code executed, is it that different to e.g. the way we allow apps to launch via XDG autostart?

With autostart the user should be prompted, with this PR, the user session service is just there potentially long running in the background (eg, this makes something like a persistent keylogger easier). Again, X is deeply flawed from a security POV so not saying we should block this on that (users will always need to trust the publisher when a snap plugs x11, desktop-legacy, etc and this PR doesn't change that). I just wanted your thoughts and you answered my question wrt dbus services wanting to talk to the display.

@pedronis
Copy link
Collaborator

I'm closing this until there is a proposal (on the forum) of how to stop these services at refresh like we stop any other services, and what snap start/stop etc should do with them

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

@jhenstridge
Copy link
Contributor Author

Over two CI runs, I've gotten the following error:

##[error]2020-04-21 07:15:48 Error executing google:ubuntu-core-16-64:tests/main/snap-user-service (apr210650-369708) : 
...
+ snap install --dangerous /home/gopath/src/github.com/snapcore/snapd/tests/lib/snaps/test-snapd-user-service/test-snapd-user-service_1.0_all.snap
error: cannot perform the following tasks:
- Make snap "test-snapd-user-service" (unset) available to the system (Post http://0/v1/service-control: read unix @->/run/user/0/snapd-session-agent.socket: read: connection reset by peer)

This look like a left over session agent socket in root's XDG_RUNTIME_DIR, with nothing listening on the socket. It probably indicates another test not cleaning up correctly, but it does make me wonder if we should treat "connection reset by peer" errors on the initial dial as if that user's session agent is not present. If we can run into this problem in CI, then I wouldn't be surprised if it can happen on real systems too.

Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey James

I'm submitting my partial review. I still have a few more thorny files to read through to make sure I didn't miss any detail.

This branch will have some interaction with my app awareness work that will likely land shortly. I will be happy to work with you on ensuring the result of merging both branches is correct. I believe that I need to adjust some of the logic used to create transient scopes as well as to, perhaps, adjust the cgroup scanner (though I think it is correct, it should be tested for sure).

My main worry is the extensive use of user.sh that I would rather see sunset. I think using session-tool is a viable alternative and I would not be surprised if that would make all of the tests to pass on all but 14.04 systems.

I've left a few comments inline, please go through them and I promise that by tomorrow I will complete the review of the remaining parts.

touch agent-was-enabled
fi
snap set system experimental.user-daemons=true
#shellcheck source=tests/lib/user.sh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer this to be ported to session-tool instead. This can be done in a follow up.

# Amazon Linux 2 gives error "Unit user@12345.service not loaded."
- -amazon-linux-2-*
# Centos 7 gives error "Unit user@12345.service not loaded."
- -centos-7-*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using session-tool avoid the problem of systemd changing how user sessions are started across releases?

usersession/agent/rest_api.go Show resolved Hide resolved
"invalid %q: must have a prefix of $SNAP_DATA, $SNAP_COMMON or $XDG_RUNTIME_DIR", fieldName)
switch socket.App.DaemonScope {
case SystemDaemon:
if !(strings.HasPrefix(path, "$SNAP_DATA/") || strings.HasPrefix(path, "$SNAP_COMMON/") || strings.HasPrefix(path, "$XDG_RUNTIME_DIR/")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously, I think this is wrong. System daemons cannot refer to $XDG_RUNTIME_DIR, as that is only provided to sessions. On my Focal system system services no longer run in sessions so they do not have XDG_RUNTIME_DIR.

I would be happy to have a follow up that deals with this but please add a // TODO comment if you are going to push any more patches into this branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense either: given that systemd mounts a tmpfs over a user's $XDG_RUNTIME_DIR, it isn't clear the system instance of systemd can reliably provide sockets in that location (even ignoring the question of which user's XDG_RUNTIME_DIR it should refer to).

FWIW, it looks like this was introduced in PR #6327. I left it unchanged since there are probably some snaps making use of it right now (maybe something from the Mir team?).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look, after this PR, because I suspect it expands to something silly (unless we expand it ignoring the environment).

tests/main/user-session-env/task.yaml Outdated Show resolved Hide resolved
@@ -372,7 +485,7 @@ func StopServices(apps []*snap.AppInfo, reason snap.ServiceStopReason, inter int

// ensure the service is really stopped on remove regardless
// of stop-mode
if reason == snap.StopReasonRemove && !app.StopMode.KillAll() {
if reason == snap.StopReasonRemove && !app.StopMode.KillAll() && app.DaemonScope == snap.SystemDaemon {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a TODO that captures stopping of user services?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in previous comments I believe this block of code is unnecessary for user services, and possibly system services too on modern distros. Unless the service has KillMode=none or sets FinalKillSignal to something other than SIGKILL, systemd will have gone through these steps already. As we don't allow snaps to alter those settings the sysd.Stop() call should be reliable.

From what I've gathered, this was in place to deal with old buggy versions of systemd, which would be from before they supported a user instance of systemd.

version: 42
apps:
svc:
command: svc1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: use svc for consistency with the service name.

@@ -0,0 +1,53 @@
#!/usr/bin/python3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need the copyright header here.

Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for porting this to session-tool

I marked one case of missing --restore and asked a question about the purge step.

tests/main/snap-user-service/task.yaml Show resolved Hide resolved
tests/main/snap-user-service-upgrade-failure/task.yaml Outdated Show resolved Hide resolved
tests/main/snap-user-service-socket-activation/task.yaml Outdated Show resolved Hide resolved
@zyga
Copy link
Collaborator

zyga commented May 6, 2020

@jhenstridge I merged master into your branch just now, to hopefully get a green pass. We just landed a fix for the two pulseaudio tests that were picking up some test leaks and that had a host of issues of their own.

Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed just the spread tasks that contain now-redundant startups of .target units that session-tool --prepare now handles automatically.

tests/main/snap-user-service-restart-on-upgrade/task.yaml Outdated Show resolved Hide resolved
echo "And the user mode systemd instance is started"
session-tool -u test --prepare

echo "And the user session has finished starting"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those two can now go away

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing these caused a failure on the Ubuntu 16.04 based systems (both classic and core).

From the look of it, session-tool --prepare will only wait for default.target if there is a systemd controlled session bus. Ubuntu 16.04's session bus is managed by Upstart, yet it has a functioning user mode systemd instance.

There's no dependency on the D-Bus session bus for many operations, even with more recent systemd. For instance, on 20.04:

$ strace --trace=connect systemctl --user is-active snapd.session-agent.service
connect(3, {sa_family=AF_UNIX, sun_path="/run/user/1000/systemd/private"}, 33) = 0
inactive
+++ exited with 3 +++

It is communicating with systemd over a private socket.

tests/main/snap-user-service-start-on-install/task.yaml Outdated Show resolved Hide resolved
tests/main/snap-user-service-upgrade-failure/task.yaml Outdated Show resolved Hide resolved
tests/main/snap-user-service/task.yaml Outdated Show resolved Hide resolved
"session-tool --prepare" was only trying to start default.target if it
believed there was a systemd controlled D-Bus session bus.  This is not
the case on Ubuntu 16.04, yet it still has a working user instance of
systemd.

Instead try running "systemctl --user is-enabled default.target", which
should only succeed if we can talk to the user instance of systemd.
@mvo5 mvo5 merged commit 21050a4 into snapcore:master May 7, 2020
@anonymouse64 anonymouse64 modified the milestones: 2.45, 2.46 Jul 29, 2020
@anonymouse64
Copy link
Member

Unfortunately I don't think this was cherry-picked to 2.45, but it will be in 2.46 now.

@EndrII
Copy link

EndrII commented Nov 6, 2020

Unfortunately I don't think this was cherry-picked to 2.45, but it will be in 2.46 now.

The current snapd version is 2.47. Why this feature still not merged?

@anonymouse64
Copy link
Member

The current snapd version is 2.47. Why this feature still not merged?

This feature was merged to master on May 7. If you have more questions about this feature, I would suggest creating a new forum post at forum.snapcraft.io, where more folks will see your questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Samuele review Needs a review from Samuele before it can land
Projects
None yet
10 participants