Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sessionagent: add a REST interface with socket activation #6954

Merged
merged 22 commits into from Jul 25, 2019

Conversation

@jhenstridge
Copy link
Contributor

commented Jun 5, 2019

This pull request grew out of discussion on the forum thread A user session agent for snapd.

This adds a unix socket based HTTP/REST interface to snap userd that supports socket activation on systems running a user instance of systemd (i.e. all modern Linux distros, but not some of the older ones we support). This makes it possible for the root user (i.e. snapd) to issue commands to userd, which could be used to coordinate service restarts over upgrades, for instance.

At present, only a single endpoint is exposed: /v1/agent-info, which returns snap userd's version number.

The REST code is also split over the daemon and userd subpackages, which is not ideal. I'd prefer to have put it all in userd, but didn't want to copy/paste large chunks of common code over, or change privacy of various types/functions at this point. That will need to be cleaned up.

@pedronis pedronis requested a review from chipaca Jun 5, 2019

@jhenstridge jhenstridge changed the title userd: add a REST interface with socket activation WIP: userd: add a REST interface with socket activation Jun 5, 2019

@jhenstridge jhenstridge marked this pull request as ready for review Jun 5, 2019

@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jun 5, 2019

Changed status from draft to "ready to review" because Travis seems to ignore draft pull requests.

@jhenstridge jhenstridge force-pushed the jhenstridge:user-session-agent branch from a9f9848 to 3fa03f0 Jun 5, 2019

@zyga
zyga approved these changes Jun 5, 2019
Copy link
Contributor

left a comment

First quick pass, looks good.

I see nothing wrong, please ask @chipaca for another review as he's far more experienced with the API layer. I made a few suggestions about ! foo, please apply those before merging.

daemon/session_agent.go Outdated Show resolved Hide resolved
data/systemd-user/snap-userd.service.in Outdated Show resolved Hide resolved
tests/main/snap-userd-socket-activation/task.yaml Outdated Show resolved Hide resolved
tests/main/snap-userd-socket-activation/task.yaml Outdated Show resolved Hide resolved
su -l -c "XDG_RUNTIME_DIR=\"${USER_RUNTIME_DIR}\" systemctl --user $*" test
}
echo "Initially snap userd is not running"
! systemctl_user is-active snap-userd.service

This comment has been minimized.

Copy link
@zyga

zyga Jun 5, 2019

Contributor

Please use not: see #6908 and #6926 for context

Suggested change
! systemctl_user is-active snap-userd.service
not systemctl_user is-active snap-userd.service

This comment has been minimized.

Copy link
@jhenstridge

jhenstridge Jun 7, 2019

Author Contributor

So, I made this change but it actually causes the check to malfunction. As "systemctl_user" is a shell function, it wasn't available to the "not" shell script. The test continued to pass since the "command not found" error was inverted to a success.

I guess changing this to if ! command; then exit 1; fi or similar would work.

tests/main/snap-userd-socket-activation/task.yaml Outdated Show resolved Hide resolved
tests/main/snap-userd-socket-activation/task.yaml Outdated Show resolved Hide resolved
@chipaca
Copy link
Member

left a comment

Taken a quick first look (will look deeper later), i've got questions around the code in daemon.

If you don't need ucrednet I (or you! but let me know) can go ahead and split the listener bits out to netutil, and with that and the new shutdown bits you shouldn't need daemon at all.

If you do need ucrednet ... what for? :-) maybe we can split it as well.

daemon/session_agent.go Outdated Show resolved Hide resolved
daemon/session_agent.go Outdated Show resolved Hide resolved
daemon/session_agent.go Outdated Show resolved Hide resolved

@jhenstridge jhenstridge force-pushed the jhenstridge:user-session-agent branch 3 times, most recently from 1acd5c6 to 27ab25c Jun 6, 2019

@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2019

Thank you for the quick reviews on this code. I've done some refactoring as requested:

  • moved SessionAgent from daemon to userd, which entailed:
    • Moving the socket activation helpers to netutil
    • Converting SessionAgent's use of shutdownServer to the newer Go 1.8 http.Server API.
    • Removing use of ucrednet. I wasn't using it, and we probably only want connection level security rather than request level.
  • include a copy/paste of some of daemon's Response types. Importing daemon into userd has some unwanted side effects in the form of init functions from other imported modules. This was the cause of many spread test failures. Maybe a common portion might fit in netutil?

As is, it looks like the test is failing on Xenial due to the unusual user session configuration. While Xenial has a running user instance of systemd, it is not managing the D-Bus session bus. Furthermore, the session bus address is not exported to the systemd environment. So while a socket activated REST API is possible on Xenial, it can't be provided by a process that also needs to offer services on the session bus. I guess that means the session agent needs to be its own process rather than part of userd.

@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2019

Also: this means we won't be able to post desktop notifications from the session agent on Xenial. Controlling systemd user units is fine though.

@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2019

On IRC, @zyga asked whether it would be possible to somehow give a systemd managed snap userd access to the session bus on Xenial so the system would behave closer to modern distributions.

My first thought was to modify the Upstart job that launches dbus-daemon to publish the session bus address by executing something like:

systemctl --user set-environment DBUS_SESSION_BUS_ADDRESS=...

On the face of it, this would be enough to set a socket activated userd publish its D-Bus services (ignoring the possible race conditions). However, if userd is instead activated by a D-Bus call, it will instead be launched directly by dbus-daemon. In this case, we can't have systemd hand off the listening REST socket. In fact, the checks for a stale socket in GetListener() are likely to try and start a competing systemd managed instance of userd.

I'm not sure there is a minimal non-invasive change we can make to get this working. In contrast, as separate processes things should just work.

@jhenstridge jhenstridge force-pushed the jhenstridge:user-session-agent branch from 27ab25c to ec0d9d2 Jun 7, 2019

@jhenstridge jhenstridge changed the title WIP: userd: add a REST interface with socket activation WIP: sessionagent: add a REST interface with socket activation Jun 7, 2019

@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2019

I've refactored the branch to run the session agent as its own process, which now works on Xenial too.

As a test, I'm enabling the spread test for Ubuntu Core to see how things turn out. I suspect it will fail on core18 due to missing unit files, but it looks like there might be enough on core16. I'm not sure how important this is, given that no one is running a user session on core devices.

@jhenstridge jhenstridge force-pushed the jhenstridge:user-session-agent branch 6 times, most recently from 436ddac to f7c1772 Jun 7, 2019

@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jun 10, 2019

And now I've got Ubuntu Core 18 working: it will now copy the user units from /usr/lib/systemd/user in the snapd snap to /etc/systemd/user, similar to how system units are handled. Now the spread test is enabled and working for everything except:

  1. Ubuntu 14.04, where there is no systemd user instance.
  2. Centos 7/Amazon Linux 2, which seem to have an old systemd that doesn't support the user@UID.service unit. It is possible that things work there, but require a different method of starting the user instance.

@jhenstridge jhenstridge changed the title WIP: sessionagent: add a REST interface with socket activation sessionagent: add a REST interface with socket activation Jun 10, 2019

@pedronis

This comment has been minimized.

Copy link
Contributor

commented Jun 14, 2019

@jhenstridge can't we have on Xenial on the dbus/upstart side a thin wrapper or script that pokes the systemd side of things, so that in the end we just have the systemd managed service?

@zyga
Copy link
Contributor

left a comment

Quick pass, please look at the questions inline and ping me for another review.

agent sessionagent.SessionAgent
}

var shortSessionAgentHelp = i18n.G("Start the session-agent service")

This comment has been minimized.

Copy link
@zyga

zyga Jun 24, 2019

Contributor

CC @degville on wording and consistency with other commands.

This comment has been minimized.

Copy link
@jhenstridge

jhenstridge Jun 25, 2019

Author Contributor

I'm not actually sure whether this string is ever displayed. Like userd, I set this command to be hidden, so it doesn't show in snap help --all. It also doesn't seem to show for snap help session-agent.

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 3, 2019

Contributor

It needs to follow the style rules even if it's not displayed atm.

cmd/snap/cmd_session_agent_test.go Outdated Show resolved Hide resolved
cmd/snap/cmd_session_agent_test.go Outdated Show resolved Hide resolved
cmd/snap/cmd_session_agent_test.go Outdated Show resolved Hide resolved
cmd/snap/cmd_session_agent_test.go Outdated Show resolved Hide resolved
systemd/systemd.go Outdated Show resolved Hide resolved
systemd/systemd.go Outdated Show resolved Hide resolved

environment:
TEST_UID: $(id -u test)
USER_RUNTIME_DIR: /run/user/${TEST_UID}

This comment has been minimized.

Copy link
@zyga

zyga Jun 24, 2019

Contributor

Wow, I'm surprised this works!

wrappers/core18.go Show resolved Hide resolved
@zyga
Copy link
Contributor

left a comment

One more comment

wrappers/core18.go Show resolved Hide resolved
@chipaca

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

Can we split this PR? I thought we'd said we'd split the netutil refactor out but I can't find it. Also perhaps (not sure) the systemd.InstanceMode work can be separated out also.

@pedronis

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

Can we split this PR? I thought we'd said we'd split the netutil refactor out but I can't find it. Also perhaps (not sure) the systemd.InstanceMode work can be separated out also.

yes, this will need my approval (at least the high level of it), much easier if those two sets of changes are split out

@pedronis pedronis self-requested a review Jul 1, 2019

@pedronis

This comment has been minimized.

Copy link
Contributor

commented Jul 3, 2019

I wonder if the invocation should use the term "user" in it. Session is a fairly generic term, I understand we have a problem with the package name that we don't want to be too long. Maybe:

  • userd => usersession/userd
    autostart logic is quite separate, I think, in there so might be moved under usersession/autostart
  • sessionagent => usersession/agent

?

For symmetry with "/system-info", maybe "agent-info" should be already "session-info" ?

It will be slightly confusing, but we have already userd --autostart anyway, which is not quite just mediation as well, maybe starting this should use userd --agent.

Some of this can be done in follow-ups (except the command bits).

@chipaca
chipaca approved these changes Jul 4, 2019
Copy link
Member

left a comment

I think this is OK to go in. I'm not super happy with what daemon is doing wrt response et al (some of my earliest code in this project), and this code duplicates that, but once there's an obvious way to improve one it should be straightforward to also improve this, and at least it's consistent. So +1.

@chipaca

This comment has been minimized.

Copy link
Member

commented Jul 4, 2019

(note pedronis stil needs to +1 this as per his request)

@pedronis
Copy link
Contributor

left a comment

thank you, some things to consider/change but looking good overall

data/systemd-user/snapd.session-agent.socket Outdated Show resolved Hide resolved
@@ -262,6 +263,7 @@ func SetRootDir(rootdir string) {

SnapBinariesDir = filepath.Join(SnapMountDir, "bin")
SnapServicesDir = filepath.Join(rootdir, "/etc/systemd/system")

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 8, 2019

Contributor

I have no strong opinion on making that change or not, -0 overall.

sessionagent/response.go Outdated Show resolved Hide resolved
type errorValue interface{}

type errorResult struct {
Message string `json:"message"` // note no omitempty

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 8, 2019

Contributor

I think this is mandatory basically, so probably a good idea to say that indeed

}

type respJSON struct {
Type ResponseType `json:"type"`

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 8, 2019

Contributor

I have mixed feelings here, the consistency is good, I wouldn't design this at all this way though if I started fresh. @chipaca should we try to redesign this how we would do a v3 of the general API or not worth the confusion?

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 18, 2019

Contributor

I'm still thinking about this

This comment has been minimized.

Copy link
@jhenstridge

jhenstridge Jul 22, 2019

Author Contributor

I'd be happy to review the overall API design, but does that need to be tied to this initial PR? At present, the only API endpoint is so trivial I don't think we'd need to worry about third parties relying on it.

This PR is already quite large, so I've been avoiding building anything more on top of it while it is in flux (e.g. exit on idle, connecting to the session bus, etc).

}

type resp struct {
Status int `json:"status-code"`

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 8, 2019

Contributor

is this in itself ever JSON marshalled? are they used in the test for unmarshalling?

This comment has been minimized.

Copy link
@jhenstridge

jhenstridge Jul 9, 2019

Author Contributor

It is marshalled in the snapd API, so I left it as is here. For better or worse, it is currently used by snapd-glib in place of the HTTP status code:

https://github.com/snapcore/snapd-glib/blob/master/snapd-glib/requests/snapd-json.c#L196

We could omit it in this interface, but that might be better suited to a larger discussion of cleaning up the API.

This comment has been minimized.

Copy link
@jhenstridge

jhenstridge Jul 10, 2019

Author Contributor

And it looks like snapd-glib behaves this way because the Go client library does the same: Client.do discards the http.Response without checking the error code: it only checks the json body to detect error responses.

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 18, 2019

Contributor

sorry, I was referring to resp vs respJSON, don't we marshal only the latter?

This comment has been minimized.

Copy link
@pedronis

pedronis Jul 18, 2019

Contributor

@jhenstridge I am a bit confused about the snapd-glib reference, this will be mostly be talked to from snapd itself, no?

This comment has been minimized.

Copy link
@jhenstridge

jhenstridge Jul 22, 2019

Author Contributor

Due to where you attached this comment, I initially thought you were asking specifically about the resp.Status member, rather than the whole resp struct. So my comments were more about "why is the status code marshalled in the response body?". That's why I was looking at the two main clients for the snapd REST API.

I agree that it is unlikely libsnapd-glib would talk to the user session agent, at least with the initial scope of functionality.

sessionagent/rest_api.go Outdated Show resolved Hide resolved
sessionagent/session_agent.go Outdated Show resolved Hide resolved
sessionagent/session_agent.go Outdated Show resolved Hide resolved
@pedronis
Copy link
Contributor

left a comment

couple more comments

sessionagent/session_agent_test.go Outdated Show resolved Hide resolved
sessionagent/session_agent.go Outdated Show resolved Hide resolved
@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jul 11, 2019

Okay, I think this now covers all the feedback that's come in.

@pedronis pedronis self-requested a review Jul 11, 2019

@pedronis
Copy link
Contributor

left a comment

sorry, another issue, this one related with the tests

usersession/agent/rest_api_test.go Show resolved Hide resolved
usersession/agent/rest_api_test.go Outdated Show resolved Hide resolved

@pedronis pedronis self-requested a review Jul 22, 2019

simplify responses removing status and status-code
add a TODO about later removing unneeded bits further
@pedronis

This comment has been minimized.

Copy link
Contributor

commented Jul 24, 2019

@jhenstridge I pushed changes to remove status/status-code from responses. Also added a TODO about cleaning maybe response.go further at a later point, when we know more of what is unneeded/unused.

@jhenstridge

This comment has been minimized.

Copy link
Contributor Author

commented Jul 25, 2019

Your changes look fine. It looks like CI got held up on another unreliable test: google:ubuntu-14.04-64:tests/main/sanitycheck this time.

@mvo5
mvo5 approved these changes Jul 25, 2019
Copy link
Collaborator

left a comment

Looks good, thank you!

@mvo5 mvo5 merged commit cb72ed3 into snapcore:master Jul 25, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@@ -731,6 +731,9 @@ popd
%{_unitdir}/snapd.autoimport.service
%{_unitdir}/snapd.failure.service
%{_unitdir}/snapd.seeded.service
%{_userunitdir}/snapd.session-agent.service
%{_userunitdir}/snapd.session-agent.socket
%{_userunitdir}/sockets.target.wants/snapd.session-agent.socket

This comment has been minimized.

Copy link
@Conan-Kudo

Conan-Kudo Jul 28, 2019

Contributor

This is not allowed in Fedora. We need to get the user-preset added in Fedora for this.

@@ -413,6 +416,9 @@ fi
%{_unitdir}/snapd.seeded.service
%{_unitdir}/snapd.service
%{_unitdir}/snapd.socket
%{_userunitdir}/snapd.session-agent.service
%{_userunitdir}/snapd.session-agent.socket
%{_userunitdir}/sockets.target.wants/snapd.session-agent.socket

This comment has been minimized.

Copy link
@Conan-Kudo

Conan-Kudo Jul 28, 2019

Contributor

This is not allowed in openSUSE.

@@ -366,6 +367,8 @@ fi
%dir %{_sharedstatedir}/snapd/snaps
%dir %{_systemd_system_env_generator_dir}
%dir %{_systemdgeneratordir}
%dir %{_userunitdir}
%dir %{_userunitdir}/sockets.target.wants

This comment has been minimized.

Copy link
@Conan-Kudo

Conan-Kudo Jul 28, 2019

Contributor

This is not allowed in openSUSE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.