overlord/ifacestate: fix auto-connect during core snap transition #3145

Closed
wants to merge 13 commits into
from

Conversation

Projects
None yet
5 participants
Contributor

zyga commented Apr 5, 2017

We found that during the transition of the ubuntu-core snap to the new core snap the hook on the core snap is no longer auto-connected to core (this is for network-bind). This has negative side effects and it happens because during the transition process there are two candidates that supply the network-bind slot, one from the ubuntu-core and and one from the core snap.

This branch adds a quirk where we ignore candidates from ubuntu-core if candidates from core exist.

zyga added some commits Apr 5, 2017

overlord: rename osSnapYaml to ubuntuCoreSnapYaml
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
overlord/ifacestate: ignore "ubuntu-core" if "core" candidates exist.
This patch changes how the interface manager processes auto-connect
candidates. In the case when both old ubuntu-core and the new core snaps
are installed (e.g. during a core transition) we now ignore any
candidates from the ubuntu-core snap. This way there is again only one
candidate and it can be auto-connected.

Fixes: https://bugs.launchpad.net/snappy/+bug/1680097
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

@zyga zyga requested review from mvo5 and fgimenez Apr 5, 2017

@zyga zyga added this to the 2.24 milestone Apr 5, 2017

overlord/ifacestate: deduplicate and aggregate test YAMLs
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

mvo5 approved these changes Apr 5, 2017

Thanks for tracking this one down!

@zyga \o/ thanks a lot for fixing this! Could you please update tests/main/classic-ubuntu-core-transition with https://github.com/zyga/snapd/commit/097759295b71fb4b89f2aaf9e3d69ea6eeba7c0b.diff ? With those changes it currently fails in master and passes in your branch.

Cheers!

tests: adjust tests for connecting network-bind to core
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Apr 6, 2017

@fgimenez done, thanks!

Contributor

fgimenez commented Apr 6, 2017

@zyga thanks, it seems that with your changes we are losing the check for the network interface that we had before, could you please readd it? Maybe not as in the diff proposed above, we could check explicitly for multiple spaces after the slot name: ":network +test-snapd-python-webserver", what do you think?

tests: re-add the tests for network
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

Thanks!

zyga added some commits Apr 6, 2017

interfaces: deduplicate return value from Repository.Connected
Because of missing validation of plug/slot names we don't really have
the guarantee that a given snap has unique plug or slot names. That
is fixed separately but we need to limit the damage. The aforementioned
method would return duplicate results when a snap is self-connected
and the plug and slot name is the same, as can happen with the "core"
snap and the "network-bind" plug and slot. This patch removes such
duplicates.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
overlord/ifacestate: connect "network-bind" to core on startup
This patch cures the damage caused by a bug in the ubuntu-core -> core
transition. During that transition the new core snap would not get auto
connected and any new plugs it may have (such as the network-bind plug)
would stay disconnected. To fix this, the interface manager is now
automatically connecting (as if it was specified in the state) the
"core:network-bind" plug to the slot with the same name (yes this
should be forbidden but that is another bug)

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

mvo5 approved these changes Apr 6, 2017

Thanks for chasing this down

interfaces/repo.go
@@ -550,7 +550,17 @@ func (r *Repository) connected(snapName, plugOrSlotName string) ([]ConnRef, erro
conns = append(conns, connRef)
}
}
- return conns, nil
+ // remove duplicates, this is caused by duplicate plug/slot name on core
+ // and missing validation on plug/snap name when adding core.
@niemeyer

niemeyer Apr 6, 2017

Contributor

Validation of names is unrelated to this problem.

Then, rather than building a list with duplications and then building a map to then build a list without duplications, we can do it at once above by just skipping entries already seen in the first place.

@zyga

zyga Apr 6, 2017

Contributor

Ah, good idea, will do.

@zyga

zyga Apr 7, 2017

Contributor

Validation is now done in #3153

overlord/ifacestate/helpers.go
@@ -98,6 +101,24 @@ func (m *InterfaceManager) addSnaps() error {
return nil
}
+// fixDisconnectedCorePlugs will re-connect the network-bind plug to the
+// network-bind slot (on core). This cures the effects of bug
+// https://bugs.launchpad.net/snappy/+bug/1680097
@niemeyer

niemeyer Apr 6, 2017

Contributor

Kudos for the good comment and function name.

overlord/ifacestate/helpers.go
+ if m.repo.Plug("core", "network-bind") != nil {
+ // connect it to the slot (connect is a no-op if connected)
+ connRef := interfaces.ConnRef{
+ PlugRef: interfaces.PlugRef{Snap: "core", Name: "network-bind"},
@niemeyer

niemeyer Apr 6, 2017

Contributor

This looks wrong and needs fixing right away. I have no idea how we allowed duplicated slot+plug names, but that's part of the foundation of the system. We cannot allow that to happen at all.

@zyga

zyga Apr 6, 2017

Contributor

Ok, I'll coordinate with @mvo5 so that the next core snap will use the new suffixed plug names.

@zyga

zyga Apr 7, 2017

Contributor

The core snap will be corrected on next upload but we also have this fix #3154

zyga added some commits Apr 6, 2017

interfaces: simplify deduplication process
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
overlord/ifacestate: auto-connect network-bind-plug (rename)
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
overlord/ifacestate: auto-connect core-support to core-support-plug
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Apr 6, 2017

I updated this slightly to connect both network-bind and core-support, just in case we missed some error that would cause core-support to be disconnected. Have a look. Oh, I also named plugs -plug but this has to happen in the core snap separately.

overlord/ifacestate: update comment
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
+func (m *InterfaceManager) fixDisconnectedCorePlugs() error {
+ const coreName = "core"
+ for _, slotName := range []string{"network-bind", "core-support"} {
+ plugName := fmt.Sprintf("%s-plug", slotName)
@mvo5

mvo5 Apr 7, 2017

Collaborator

Pardon my ignorance - this means we need to modify the core snap to have a new {network-bind,core-support}-plug name, correct?

@zyga

zyga Apr 7, 2017

Contributor

Yes, that's correct

@mvo5

mvo5 Apr 10, 2017

Collaborator

AIUI landing the core snap change is not mandatory for this so we should (IMO) leave it for now.

@mvo5

mvo5 Apr 10, 2017

Collaborator

The core snap change has landed in any case.

+// fixDisconnectedCorePlugs will re-connect the network-bind-plug and
+// core-support-plug to the corresponding slots on the core snap.
+// This cures the effects of bug https://bugs.launchpad.net/snappy/+bug/1680097
+func (m *InterfaceManager) fixDisconnectedCorePlugs() error {
@pedronis

pedronis Apr 10, 2017

Contributor

do I understand correctly that either this will do something or #3154, the latter will do something only if there are connections but with the old names?

@pedronis

pedronis Apr 10, 2017

Contributor

after thinking with @mvo it seems we should not need this but only the bit in autoConnect, at some point we will run autoConnect with new core on the new core before trying to run the hook

+ case candidates[1].Snap.Name() == "ubuntu-core" && candidates[0].Snap.Name() == "core":
+ candidates = candidates[0:1]
+ }
+ }
@pedronis

pedronis Apr 10, 2017

Contributor

couldn't we do something directly in transitionConnectionsCoreMigration instead of keeping this special case here?

it's because it's too late there? we need again for the configure hook?

@zyga

zyga Apr 10, 2017

Contributor

that place is just dealing with connections in the state. We'd have to somehow convey (from the part that installs core) that ubuntu-core is special and we should not auto-connect to it. I don't think it would be cleaner.

I think we need epoch updates to be able to remove hacks like this.

@pedronis

pedronis Apr 10, 2017

Contributor

as discussed with @mvo it seems once we fixe slot-side auto connect we need something like this for core-support anyway during the transition and as long as we have the transition, we will have a plug (on core) and two slots (on core and ubuntu-core) and sane autoconnect rules would/should refuse that independent on which side we look at (slots or plugs)

+// This cures the effects of bug https://bugs.launchpad.net/snappy/+bug/1680097
+func (m *InterfaceManager) fixDisconnectedCorePlugs() error {
+ const coreName = "core"
+ for _, slotName := range []string{"network-bind", "core-support"} {
@pedronis

pedronis Apr 10, 2017

Contributor

so we found out (and understood) that this is an issue only for network-bind

@@ -537,17 +537,24 @@ func (r *Repository) connected(snapName, plugOrSlotName string) ([]ConnRef, erro
if r.plugs[snapName][plugOrSlotName] == nil && r.slots[snapName][plugOrSlotName] == nil {
return nil, fmt.Errorf("snap %q has no plug or slot named %q", snapName, plugOrSlotName)
}
- // Collect all the relevant connections
+ // Collect all the relevant connections but be on the lookout for duplicates.
@niemeyer

niemeyer Apr 10, 2017

Contributor

Given the fixes we discussed and are planning on merging imminently, we should never have duplicates here, so this doesn't look right.

+// fixDisconnectedCorePlugs will re-connect the network-bind-plug and
+// core-support-plug to the corresponding slots on the core snap.
+// This cures the effects of bug https://bugs.launchpad.net/snappy/+bug/1680097
+func (m *InterfaceManager) fixDisconnectedCorePlugs() error {
@pedronis

pedronis Apr 10, 2017

Contributor

do I understand correctly that either this will do something or #3154, the latter will do something only if there are connections but with the old names?

@pedronis

pedronis Apr 10, 2017

Contributor

after thinking with @mvo it seems we should not need this but only the bit in autoConnect, at some point we will run autoConnect with new core on the new core before trying to run the hook

+ case candidates[1].Snap.Name() == "ubuntu-core" && candidates[0].Snap.Name() == "core":
+ candidates = candidates[0:1]
+ }
+ }
@pedronis

pedronis Apr 10, 2017

Contributor

couldn't we do something directly in transitionConnectionsCoreMigration instead of keeping this special case here?

it's because it's too late there? we need again for the configure hook?

@zyga

zyga Apr 10, 2017

Contributor

that place is just dealing with connections in the state. We'd have to somehow convey (from the part that installs core) that ubuntu-core is special and we should not auto-connect to it. I don't think it would be cleaner.

I think we need epoch updates to be able to remove hacks like this.

@pedronis

pedronis Apr 10, 2017

Contributor

as discussed with @mvo it seems once we fixe slot-side auto connect we need something like this for core-support anyway during the transition and as long as we have the transition, we will have a plug (on core) and two slots (on core and ubuntu-core) and sane autoconnect rules would/should refuse that independent on which side we look at (slots or plugs)

@mvo5 mvo5 modified the milestones: 2.25, 2.24 Apr 11, 2017

Contributor

zyga commented Apr 11, 2017

Closing now as a simpler variant of this was merged.

@zyga zyga closed this Apr 11, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment