network: add net-bond-reconfigure-delay model config option #6870

Merged
merged 18 commits into from Jan 29, 2017

Conversation

Projects
None yet
3 participants
Contributor

frobware commented Jan 25, 2017

This change introduces a model-config setting called net-bond-reconfigure-delay.

The value represents the amount of time in seconds to sleep between ifdown and ifup when bridging bonded interfaces; this is a platform bug and all of this can go away when bugs #1594855 and #1269921 are fixed. Without a delay between ifdown and ifup bonded interfaces in 802.3ad LACP mode fail to come back up, rendering the network incomplete and therefore broken.

The default is set to 17 seconds. For a long time the bridge script hard-coded a value of 3 second but some setups now require an even longer duration. The last reported issue was fixed with a 10 second timeout, however, we're increasing that again because this issue (and solution) is not very discoverable and we want bridging to work out-of-the-box.

QA steps

This setup requires your MAAS setup has nodes with bonded interfaces.

  1. bootstrap on MAAS
  2. Switch to the controller model
  3. Verify the default value of net-bond-reconfigure-delay:
    $ juju model-config | grep net-bond-reconfigure-delay (should report 17)
  4. Switch to the default model and change the value in this model
    $ juju model-config net-bond-reconfigure-delay=42
    $ juju model-config | grep net-bond-reconfigure-delay (should report 42)
  5. Switch back to the controller and verify that it still has its own model config value of 17
  6. In the default model add a container:
    $ juju add-machine lxd:0 --constraints spaces=space-0
  7. On machine 0 run:
    $ sudo grep bridgescript /var/log/juju/machine-1.log
    You should observe that the bridge script has been invoked with --net-bond-reconfigure-delay=42
  8. Change the value again .
    $ juju model-config net-bond-reconfigure-delay=67
  9. Repeat steps (5) and (6); in (6) you should see your new period (67)

The documentation at: https://jujucharms.com/docs/2.0/models-config#list-of-model-keys needs updating to include the new key, its value and an explanation.

Fixes LP:#1657579

@frobware frobware changed the title from 2.1 dynamic bridges fix lp1657579 to network: add bond-reconfigure-delay model config option Jan 25, 2017

So I feel like I'd like to see the "if device.is_bond" that is currently only in the python script pulled all the way up into the API Server code, and have it decide whether ReconfigureDelay should be 0 or BoundReconfigureDelay.
And then the stack below it does the same aggregation etc below it, just without the word "bond".

Thoughts?

apiserver/params/network.go
- BridgeName string `json:"bridge-name"`
+ HostDeviceName string `json:"host-device-name"`
+ BridgeName string `json:"bridge-name"`
+ BondReconfigureDelay int `json:"bond-reconfigure-delay"`
@jameinel

jameinel Jan 25, 2017

Owner

I feel like this should just be "ReconfigureDelay" and the script we run should always just sleep that much.
Its just that we should have a different value if we know that the thing being bridged is a bond.

@@ -512,17 +535,6 @@ func (c *Config) mustString(name string) string {
return value
}
-// mustInt returns the named attribute as an integer, panicking if
@jameinel

jameinel Jan 25, 2017

Owner

If nothing failed with this deleted, it definitely deserves to be gone.

@frobware frobware changed the title from network: add bond-reconfigure-delay model config option to network: add net-bond-reconfigure-delay model config option Jan 26, 2017

@jameinel jameinel changed the base branch from 2.1-dynamic-bridges to 2.1 Jan 26, 2017

the rest is pending the new renames and threading of config through the various layers.

network/add-juju-bridge.py
@@ -393,6 +393,7 @@ def arg_parser():
parser.add_argument('--interfaces-to-bridge', help="interfaces to bridge; space delimited", type=str, required=True)
parser.add_argument('--dry-run', help="dry run, no activation", action='store_true', default=False, required=False)
parser.add_argument('--bridge-name', help="bridge name", type=str, required=False)
+ parser.add_argument('--bond-reconfigure-delay', help="delay in seconds before raising bonded interfaces", type=int, required=False, default=30)
@jameinel

jameinel Jan 26, 2017

Owner

make sure we change this argument as well

for _, version := range []string{
"/usr/bin/python2",
"/usr/bin/python3",
"/usr/bin/python",
} {
if _, err := os.Stat(version); err == nil {
- return version
+ result = append(result, version)
@jameinel

jameinel Jan 26, 2017

Owner

The only small bug is that likely both 'python2' and 'python' exist on the path because that was the official symlink for python2. Its a little weird to call the same one twice.

@frobware

frobware Jan 26, 2017

Contributor

This was very deliberate. Back when xenial was cooking there was a period when there was a symlink, then there wasn't. I was trying to ensure that the script is invoked with whatever is available on the deployed platform.

Andrew McDermott and others added some commits Jan 26, 2017

Bridge script no longer responsible for bond-raise-delay
Note: this is missing additional state tests for the new
reconfigureDelay parameter passed to FindMissingBridgesForContainer().
Owner

jameinel commented Jan 29, 2017

$$merge$$

Contributor

jujubot commented Jan 29, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 8af1a5a into juju:2.1 Jan 29, 2017

1 check passed

github-check-merge-juju Built PR, ran unit tests, and tested LXD deploy. Use !!.*!! to request another build. IE, !!build!!, !!retry!!
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment