2.1.1 treat errors correctly 1656326 #7062

Merged
merged 4 commits into from Mar 2, 2017

Conversation

Projects
None yet
3 participants
Owner

jameinel commented Mar 2, 2017

Description of change

The code around setting up network interfaces for a container had a statement of "if err != nil { // its not fatal, fallback to the local bridge". Which I really wanted to get rid of. However, the last time I did that, it turned out to break deployments.

This, in turn, turned out to be 2 more bugs:

  • On Trusty machines, container_initialization is installing the 'lxd' package, which creates the 'lxdbr0' bridge. Before that point, we don't have the bridge in the machine inventory, so it isn't listed as a possible bridge to use. So we need to do a SetHostMachineNetworkConfig() call after calling container.Initialize.
  • We had a bug in Provisioner.SetObservedNetworkConfig(), in that it is a different copy of the same code used by the Machiner.SetObservedNetworkConfig(). In Juju 2.1 I had fixed the latter one to allow recording machine devices even when we don't have Provider information. However, I didn't fix this one, because I didn't know about it. (I filed https://bugs.launchpad.net/juju/+bug/1669397 about fixing that in a more permanent way.)

There is a bunch of other cleanups and logging tweaks that were helpful while debugging all of this.

QA steps

 $ juju bootstrap aws/gce/azure
 $ juju deploy cloud-city/repository/scale2-lxd.yaml

That should create 2 trusty machines, and on the second one create 2 containers. They should be bridged onto lxdbr0. This is the same behavior as before with 2.1, but it is being done via a different code path. You should be able to see in juju debug-log that we are consciously selecting lxdbr0 for the containers. Rather than a line like: failed to prepare container 1/lxd/0 network config

Documentation changes

Ideally this isn't user facing. Other than decreasing the number of times people end up with a container that can't be accessed. Instead errors that we would have suppressed should show up as container provisioning errors.

Bug reference

lp:1656326

jameinel added some commits Mar 1, 2017

Properly record machine network config before launching containers.
Added a step so that after we 'initialize' container runtimes, we then revalidate
the host's network config. This is what was broken on 'trusty' machines, because
they don't start with an lxdbr0.

We also find that Provisioner.SetObservedNetworkConfig was a copy&paste rather
than sharing the Machiner implementation, which explains why the second retry
of provisioning containers still didn't notice that we had lxdbr0.
Owner

jameinel commented Mar 2, 2017

$$merge$$

Contributor

jujubot commented Mar 2, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Mar 2, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10404

Owner

jameinel commented Mar 2, 2017

$$merge$$ failure appears to be Windows "could not find mongo" failures.

Contributor

jujubot commented Mar 2, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 5378952 into juju:2.1 Mar 2, 2017

1 check passed

github-check-merge-juju Built PR, ran unit tests, and tested LXD deploy. Use !!.*!! to request another build. IE, !!build!!, !!retry!!
Details

@jameinel jameinel deleted the jameinel:2.1.1-no-fallback-again-1656326 branch Apr 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment