Generate ENI and push to LXD container before start #6276

Merged
merged 1 commit into from Sep 28, 2016

Contributor

macgreagoir commented Sep 19, 2016

Uses PushFile to write the correct /etc/network/interfaces config to the
container rootfs between (lxc) init and start.

This prevents the default LXD behaviour, which is to expect DHCP on
eth0, breaking container provisioning when DHCP is not available in the
eth0 space.

Fixes lp:1611981

QA steps:

  • Test in a MAAS environment with two NICs per node, where the NIC whose name sorts higher is on the PXE/DHCP space and the other space does not serve DHCP
    • NICs can be renamed in MAAS on a node's page
    • For example, if ens3 and ens4 are named by default, and ens3 is the PXE/DHCP interface, rename ens3 as 'nicZ' and ens4 as 'nicA'
    • You may also like to assign a static IP addr to the second ('nicA' in this example) interface
  • juju add-machine lxd:<host> for a xenial container
    • The container should start
    • In the container, /etc/network/interfaces should configure its interfaces and nameservers correctly for the environment (eth0 with an address in nicA's space, in this example, and eth1 in nicZ's) and should not source /etc/network/interfaces.d/*.cfg
  • Repeat with a trusty container, juju add-machine --series trusty lxd:<host>
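
The file check in the QA steps above could be sketched as a small helper that reads a container's /etc/network/interfaces content and reports which non-loopback interfaces it configures and whether it still has a `source` directive. `checkENI` is a hypothetical QA aid, not part of juju, and it only looks at `iface`, `source` and `source-directory` lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// checkENI scans /etc/network/interfaces content and returns the names
// of non-loopback interfaces that have an iface stanza, plus whether
// any source or source-directory directive is present.
// Hypothetical helper for manual QA; not juju code.
func checkENI(content string) (configured []string, sources bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines and comments.
		if len(fields) == 0 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		switch fields[0] {
		case "iface":
			if len(fields) > 1 && fields[1] != "lo" {
				configured = append(configured, fields[1])
			}
		case "source", "source-directory":
			sources = true
		}
	}
	return configured, sources
}

func main() {
	// Sample input: an image-default style file with loopback only.
	sample := "auto lo\niface lo inet loopback\n\n" +
		"source /etc/network/interfaces.d/*.cfg\n"
	configured, sources := checkENI(sample)
	fmt.Println("configured:", configured, "sources interfaces.d:", sources)
}
```

A file that passes the QA step would list eth0 and eth1 as configured and report no `source` directive.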

Code looks good but lack of unit tests in the area worries me. Especially, lack of checking the error conditions.

Please get another network-savvy reviewer to PTAL as well :D

@@ -202,17 +202,19 @@ func PrepareNetworkConfigFromInterfaces(interfaces []network.InterfaceInfo) *Pre
// might include per-interface networking config if both networkConfig
// is not nil and its Interfaces field is not empty.
@anastasiamac

anastasiamac Sep 20, 2016

Member

Is this comment still valid? Looks like a change in behavior, I would have liked to see a change in comment too \o/

@macgreagoir

macgreagoir Sep 20, 2016

Contributor

To my reading of this comment, this change makes it more correct, in that we can now have networkConfig that is nil.

- userData, err := containerinit.CloudInitUserData(instanceConfig, networkConfig)
+ // Do not pass networkConfig, as we want to directly inject our own ENI
+ // rather than using cloud-init.
+ userData, err := containerinit.CloudInitUserData(instanceConfig, nil)
if err != nil {
return
@anastasiamac

anastasiamac Sep 20, 2016

Member

I think you've changed err scope here because it is used in := statement. Please provide an explicit return as above with errors.Trace \o/

@macgreagoir

macgreagoir Sep 20, 2016

Contributor

I stand to be corrected :-) but think it's OK. err is already declared in the signature and err here is not in a new lexical block, so is only assigned.

I am not keen to introduce a mixture of bare and explicit returns in this function, if we don't need to, but I'll let the second reviewer you ask for push me that way too :-)
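
The scoping point in this thread can be checked with a minimal, self-contained sketch. The function names are hypothetical, and `fmt.Errorf` with `%w` stands in for juju's `errors.Annotatef` so that the example needs only the standard library. When `:=` appears at the top level of the function body, a named result such as `err` is reused rather than redeclared; only inside a nested block does `:=` create a shadowing variable.

```go
package main

import (
	"errors"
	"fmt"
)

// load is a hypothetical helper that always fails.
func load() (string, error) {
	return "", errors.New("boom")
}

// sameScope mirrors the shape under review: err is a named result, and
// := at the top level of the body declares only data while assigning
// to the existing err.
func sameScope() (err error) {
	data, err := load()
	if err != nil {
		err = fmt.Errorf("failed to load: %w", err)
		return // the bare return yields the annotated named result
	}
	_ = data
	return
}

// nestedScope shows the case that does shadow: inside the inner block,
// := creates a new err, so the named result is never assigned.
func nestedScope() (err error) {
	{
		_, err := load()
		_ = err // inner err only; the named result is untouched
	}
	return // returns nil
}

func main() {
	fmt.Println(sameScope())   // prints "failed to load: boom"
	fmt.Println(nestedScope()) // prints "<nil>"
}
```

Under these assumptions the bare returns in the reviewed code propagate the error as intended; the shadowing concern would apply only if the `:=` sat inside an inner block such as an `if` or `for` body.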

@@ -143,6 +145,7 @@ func (manager *containerManager) CreateContainer(
return
@anastasiamac

anastasiamac Sep 20, 2016

Member

I'd change this to provide an explicit return as a drive-by ;)

@macgreagoir

macgreagoir Sep 20, 2016

Contributor

(See comment above about not mixing returns.)

container/lxd/lxd.go
@@ -143,6 +145,7 @@ func (manager *containerManager) CreateContainer(
return
}
+ // TODO This might be dead code. Do we always get len(nics) > 0?
@anastasiamac

anastasiamac Sep 20, 2016

Member

Please add a "juju" TODO - TODO(your nick) so that we'd know who to ask if questions arise.
Also this one looks like it might be worth a bug/a card/a tech-debt card... Please add one of these and reference it in TODO as well.

@macgreagoir

macgreagoir Sep 20, 2016

Contributor

Done, and I have a leankit card referencing this.

container/lxd/lxd.go
+ eni, err := containerinit.GenerateNetworkConfig(networkConfig)
+ if err != nil {
+ err = errors.Annotatef(err, "failed to generate /etc/network/interfaces content")
+ return
@anastasiamac

anastasiamac Sep 20, 2016

Member

Here too - explicit return would be nice.

@macgreagoir

macgreagoir Sep 20, 2016

Contributor

(See comment above about not mixing returns.)

My current experience with this branch is that with two NICs, with PXE subnet on the "second" NIC, a container will fail to get an address at all. That's both with the "first" NIC as an unconfigured NIC and a NIC on a different subnet.

I'm doing:

juju switch controller
juju add-machine lxd:0

voidspace commented Sep 26, 2016

This is the rendered (non-functional) /e/n/i

ubuntu@nonacidic-candyce:~$ sudo cat /var/lib/lxd/containers/juju-9aedd5-0-lxd-0/rootfs/etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# Source interfaces
# Please check /etc/network/interfaces.d before changing this file
# as interfaces may have been defined in /etc/network/interfaces.d
# See LP: #1262951
source /etc/network/interfaces.d/*.cfg
ubuntu@nonacidic-candyce:~$ sudo lxc list
+---------------------+---------+------+------+------------+-----------+
|        NAME         |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+---------------------+---------+------+------+------------+-----------+
| juju-9aedd5-0-lxd-0 | RUNNING |      |      | PERSISTENT | 0         |
+---------------------+---------+------+------+------------+-----------+
ubuntu@nonacidic-candyce:~$ 

Contributor

macgreagoir commented Sep 26, 2016

I've retested with rc1 and with rc1 plus this patch, with and without a statically configured address on the second interface (and with PXE/DHCP on the higher-sorted, first, interface in all cases). I can reproduce the bug without this patch and see the bug fixed with this patch. Let's compare MAAS environments and see what we're doing differently.

So, the QA instructions suggest renaming the NIC on the PXE subnet to ethZ so that it sorts "second", which is what I have been doing. I've also been using auto-assign rather than "static", which may not work without DHCP but shouldn't cause a non-functioning container.

I'm using a MAAS controller inside a KVM container, with two virsh networks, and nodes that are also KVM containers.

QA OK and LGTM - tested with xenial, trusty, and precise containers

Contributor

macgreagoir commented Sep 28, 2016

$$merge$$

Contributor

jujubot commented Sep 28, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit e48d4de into juju:master Sep 28, 2016
