api, ifacestate: resolve disconnect early #3212

Merged 9 commits into snapcore:master on May 11, 2017

Contributor

stolowski commented Apr 19, 2017

Resolve connections affected by "disconnect" early in the API, before calling Disconnect, instead of delaying it until doDisconnect. This is required for disconnect hooks, as all affected snaps need to be known upfront when the hook tasks are created (hooks will be introduced in follow-up PRs).

With this change multiple "disconnect" task sets will be created for the affected snaps, instead of a single task set. This also fixes the bug described at https://forum.snapcraft.io/t/interface-hooks-disconnect-hooks-wip/179/3.
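In outline, the new flow looks roughly like the sketch below, pieced together from the diff hunks quoted in the review comments; the helper name `disconnectEarly` and the surrounding plumbing (arguments, `chg`) are illustrative assumptions, not the PR's actual code:

```go
package daemon

import (
	"github.com/snapcore/snapd/interfaces"
	"github.com/snapcore/snapd/overlord/ifacestate"
	"github.com/snapcore/snapd/overlord/state"
)

// disconnectEarly is a hypothetical helper showing the reworked flow:
// resolve the affected connections while still handling the API request,
// then create one task set per connection.
func disconnectEarly(st *state.State, repo *interfaces.Repository, chg *state.Change,
	plugSnap, plugName, slotSnap, slotName string) error {
	// Resolution now happens here, not later in doDisconnect.
	conns, err := repo.ResolveDisconnect(plugSnap, plugName, slotSnap, slotName)
	if err != nil {
		return err // surfaces to the API client as a 400
	}
	for _, connRef := range conns {
		ts, err := ifacestate.Disconnect(st, connRef.PlugRef.Snap, connRef.PlugRef.Name,
			connRef.SlotRef.Snap, connRef.SlotRef.Name)
		if err != nil {
			return err
		}
		// Each task set runs in its own lane, so a failure undoes only
		// that connection's tasks (see the JoinLane hunk below).
		ts.JoinLane(st.NewLane())
		chg.AddAll(ts)
	}
	return nil
}
```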

stolowski added some commits Apr 19, 2017

@stolowski stolowski changed the title from api, ifacemgr: resolve disconnect early to api, ifacestate: resolve disconnect early Apr 19, 2017

Looks good except for one detail. I don't know what to do about it that isn't terrible though :)

+    if err == nil {
+        for _, connRef := range conns {
+            var ts *state.TaskSet
+            ts, err = ifacestate.Disconnect(st, connRef.PlugRef.Snap, connRef.PlugRef.Name, connRef.SlotRef.Snap, connRef.SlotRef.Name)
@zyga

zyga Apr 20, 2017

Contributor

The only potential downside I can think of is that each disconnect will run in parallel, but they need to synchronize on the common snap, and I'm not sure this happens already. It will also set up security for one of the snaps multiple times (typically core, so I think that's okay, though it will send N requests for udev to reload).

@stolowski

stolowski Apr 20, 2017

Contributor

Wasn't checkChangeConflict supposed to achieve that in the old implementation (except, of course, it wouldn't really work because of the aforementioned bug)? Yes, this looks like something I need to address.

@pedronis

pedronis Apr 26, 2017

Contributor

as we discussed in some standup, at most one doDisconnect task runs at a time (not even per snap but globally)

@stolowski

stolowski Apr 27, 2017

Contributor

Yes, thanks for the reminder; this is achieved with the SetBlocked lambda in ifacemgr. So not a problem at all.
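For context, that serialization comes from the blocked-predicate registered on the interface manager's task runner; a minimal sketch, assuming the manager wires its runner roughly like this (the real predicate in ifacemgr may differ in detail):

```go
// Serialize all interface tasks globally: block a task while any other
// interface task is running, so at most one doConnect/doDisconnect
// executes at a time.
runner := state.NewTaskRunner(st)
runner.SetBlocked(func(_ *state.Task, running []*state.Task) bool {
	return len(running) != 0
})
```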

@pedronis

pedronis Apr 28, 2017

Contributor

What we said is true, and this is older than the PR, but there is something weird about the CheckChangeConflict in ifacestate.Disconnect, given that CheckChangeConflict itself will not care about disconnect/connect tasks. We need to rethink that, and whether we should drop those checks.

@@ -3473,30 +3485,18 @@ func (s *apiSuite) TestDisconnectPlugFailureNoSuchPlug(c *check.C) {
     c.Assert(err, check.IsNil)
     rec := httptest.NewRecorder()
     interfacesCmd.POST(interfacesCmd, req, nil).ServeHTTP(rec, req)
-    c.Check(rec.Code, check.Equals, 202)
+    c.Check(rec.Code, check.Equals, 400)
@zyga

zyga Apr 20, 2017

Contributor

Nice, I like this property a lot!
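This property falls out of the early resolution: a bogus plug or slot name now fails synchronously in the request handler, so the daemon can answer 400 immediately instead of accepting the request (202) and letting the change fail later. A minimal sketch of the handler's error path, assuming the daemon's existing BadRequest helper:

```go
conns, err := repo.ResolveDisconnect(plugSnap, plugName, slotSnap, slotName)
if err != nil {
	// e.g. "no such plug": reject the request up front rather than
	// queuing a change that is doomed to fail asynchronously.
	return BadRequest("%v", err)
}
```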

@@ -458,13 +444,9 @@ func (m *InterfaceManager) doDisconnect(task *state.Task, _ *tomb.Tomb) error {
         return err
     }
-    affectedConns, err := m.repo.ResolveDisconnect(plugRef.Snap, plugRef.Name, slotRef.Snap, slotRef.Name)
@zyga

zyga Apr 20, 2017

Contributor

One thing that we might see (this is probably very unlikely) is new snapd handling a task that was created by old snapd. If that were to happen, this code would misbehave.

@stolowski

stolowski Apr 20, 2017

Contributor

Very good point. I think we can either keep this logic here for such cases, or add a patch to convert old tasks to the new approach. Let's discuss in HO/forum.

@zyga

zyga Apr 24, 2017

Contributor

I personally would lean to a patch that splits such tasks.

@stolowski

stolowski Apr 24, 2017

Contributor

Right. That sounds better than keeping logic that cannot be properly exercised under normal circumstances and is effectively dead most of the time. Thanks, I'll look at patching.

@stolowski

stolowski Apr 24, 2017

Contributor

Ok, as discussed in the standup this is a non-issue after all; we will let disconnect fail in such rare cases. No need for patching.

@niemeyer

niemeyer Apr 26, 2017

Contributor

Can we please just ensure that it will error properly instead of arbitrarily? In other words, check that the refs look properly resolved and if not return an error such as "snapd changed, please retry the operation".

@stolowski

stolowski Apr 27, 2017

Contributor

Ok, fixed to error out with such a message on the Disconnect call. I also realized it's better to call repo.Disconnect() than repo.DisconnectAll(); fixed that as well.

+            if err != nil {
+                break
+            }
+            ts.JoinLane(st.NewLane())
@zyga

zyga Apr 20, 2017

Contributor

This is also very nice. Each disconnect will be undone separately if something fails and once we introduce the hook for disconnect this will become essential. Thanks!
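To make the lane semantics concrete: tasks sharing a lane are aborted and undone together when one of them fails, while tasks in other lanes can proceed. A hypothetical illustration (ts1, ts2, chg, and st are assumed to exist; this is not code from the PR):

```go
// Two disconnect task sets in separate lanes fail and undo independently.
ts1.JoinLane(st.NewLane()) // disconnect of connection A
ts2.JoinLane(st.NewLane()) // disconnect of connection B
chg.AddAll(ts1)
chg.AddAll(ts2)
// If a task in ts1 fails, only ts1's lane is undone; ts2 can still
// complete (absent explicit dependencies between the two sets).
```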

overlord/ifacestate/handlers.go
-    if err != nil {
-        return err
-    }
+    affectedConns := []interfaces.ConnRef{{PlugRef: plugRef, SlotRef: slotRef}}
     m.repo.DisconnectAll(affectedConns)
@pedronis

pedronis Apr 26, 2017

Contributor

So if I understand correctly, this never fails; at most it's a no-op?

@zyga

zyga Apr 27, 2017

Contributor

Yes, that's correct.

Two trivials and LGTM.

daemon/api.go
+    var conns []interfaces.ConnRef
+    repo := c.d.overlord.InterfaceManager().Repository()
+    // note: disconnect can affect multiple connections, so unlike with 'connect' above we cannot give a good summary -
+    // just include the snaps/plug/slot names given to the API, before they are resolved.
@niemeyer

niemeyer Apr 26, 2017

Contributor

Can we drop the comment? This is a pretty good summary! :)

stolowski added some commits Apr 27, 2017

There is an issue I fear here: repo.Disconnect can fail because the ref is not resolved (changed snapd), but also because one or both of the snaps have gone away since we created the change.

overlord/ifacestate/handlers.go
     if err != nil {
-        return err
+        return fmt.Errorf("snapd changed, please retry the operation: %v", err)
@pedronis

pedronis Apr 28, 2017

Contributor

I fear this is a bit more complicated. Given that we don't use conflicts, we can get this here when plugRef.Snap or slotRef.Snap don't exist anymore at all, in which case the old code would just skip (see below). I think we need to change the order of things around: do the snapstate.Gets first, and then this; otherwise we would produce this error misleadingly in some corner cases.

@stolowski

stolowski May 4, 2017

Contributor

Ok, I think I addressed this problem by refactoring/reordering things there.
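A sketch of the reordered doDisconnect logic after this fix, reconstructed from the discussion (the exact code may differ; the skip-on-missing-snap behavior is what pedronis asked for above):

```go
// Check that both snaps still exist before interpreting a Disconnect
// failure as a stale task from an older snapd.
var snapst snapstate.SnapState
for _, snapName := range []string{plugRef.Snap, slotRef.Snap} {
	if err := snapstate.Get(st, snapName, &snapst); err != nil {
		if err == state.ErrNoState {
			return nil // snap was removed; nothing left to disconnect
		}
		return err
	}
}
if err := m.repo.Disconnect(plugRef.Snap, plugRef.Name, slotRef.Snap, slotRef.Name); err != nil {
	// Both snaps exist, so an unresolved ref really does suggest the
	// task was created by a different snapd version.
	return fmt.Errorf("snapd changed, please retry the operation: %v", err)
}
```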

stolowski added some commits May 2, 2017

Get snapstates for both snaps before Disconnect, to ignore the error if a snap was removed before the disconnect task gets executed.
Contributor

niemeyer commented May 10, 2017

@pedronis Ready for another look I think.

@stolowski Tests broken it seems.

stolowski added some commits May 11, 2017

Looks good. As we discussed, we should fix CheckChangeConflict to detect conflicts with connect/disconnect as well, but the problem predates this branch and appears only if people use interface hooks, so it can be done in a follow-up.

@zyga zyga merged commit f90393f into snapcore:master May 11, 2017

7 checks passed

artful-amd64 autopkgtest finished (success)
continuous-integration/travis-ci/pr The Travis CI build passed
xenial-amd64 autopkgtest finished (success)
xenial-i386 autopkgtest finished (success)
xenial-ppc64el autopkgtest finished (success)
yakkety-amd64 autopkgtest finished (success)
zesty-amd64 autopkgtest finished (success)