Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

network: add net-bond-reconfigure-delay model config option #6870

Merged
merged 18 commits into from Jan 29, 2017

Conversation

frobware
Copy link
Contributor

@frobware frobware commented Jan 25, 2017

This change introduces a model-config setting called net-bond-reconfigure-delay.

The value represents the amount of time in seconds to sleep between ifdown and ifup when bridging bonded interfaces; this is a platform bug and all of this can go away when bugs #1594855 and #1269921 are fixed. Without a delay between ifdown and ifup bonded interfaces in 802.3ad LACP mode fail to come back up, rendering the network incomplete and therefore broken.

The default is set to 17 seconds. For a long time the bridge script hard-coded a value of 3 second but some setups now require an even longer duration. The last reported issue was fixed with a 10 second timeout, however, we're increasing that again because this issue (and solution) is not very discoverable and we want bridging to work out-of-the-box.

QA steps

This setup requires your MAAS setup has nodes with bonded interfaces.

  1. bootstrap on MAAS
  2. Switch to the controller model
  3. Verify the default value of net-bond-reconfigure-delay:
    $ juju model-config | grep net-bond-reconfigure-delay (should report 17)
  4. Switch to the default model and change the value in this model
    $ juju model-config net-bond-reconfigure-delay=42
    $ juju model-config | grep net-bond-reconfigure-delay (should report 42)
  5. Switch back to the controller and verify that it still has its own model config value of 17
  6. In the default model add a container:
    $ juju add-machine lxd:0 --constraints spaces=space-0
  7. On machine 0 run:
    $ sudo grep bridgescript /var/log/juju/machine-1.log
    You should observe that the bridge script has been invoked with --net-bond-reconfigure-delay=42
  8. Change the value again .
    $ juju model-config net-bond-reconfigure-delay=67
  9. Repeat steps (5) and (6); in (6) you should see your new period (67)

The documentation at: https://jujucharms.com/docs/2.0/models-config#list-of-model-keys needs updating to include the new key, its value and an explanation.

Fixes LP:#1657579

@frobware frobware changed the title 2.1 dynamic bridges fix lp1657579 network: add bond-reconfigure-delay model config option Jan 25, 2017
Copy link
Member

@jameinel jameinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I feel like I'd like to see the "if device.is_bond" that is currently only in the python script pulled all the way up into the API Server code, and have it decide whether ReconfigureDelay should be 0 or BoundReconfigureDelay.
And then the stack below it does the same aggregation etc below it, just without the word "bond".

Thoughts?

BridgeName string `json:"bridge-name"`
HostDeviceName string `json:"host-device-name"`
BridgeName string `json:"bridge-name"`
BondReconfigureDelay int `json:"bond-reconfigure-delay"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should just be "ReconfigureDelay" and the script we run should always just sleep that much.
Its just that we should have a different value if we know that the thing being bridged is a bond.

@@ -512,17 +535,6 @@ func (c *Config) mustString(name string) string {
return value
}

// mustInt returns the named attribute as an integer, panicking if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nothing failed with this deleted, it definitely deserves to be gone.

@frobware frobware changed the title network: add bond-reconfigure-delay model config option network: add net-bond-reconfigure-delay model config option Jan 26, 2017
@jameinel jameinel changed the base branch from 2.1-dynamic-bridges to 2.1 January 26, 2017 11:49
Copy link
Member

@jameinel jameinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest is pending the new renames and threading of config through the various layers.

for _, version := range []string{
"/usr/bin/python2",
"/usr/bin/python3",
"/usr/bin/python",
} {
if _, err := os.Stat(version); err == nil {
return version
result = append(result, version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only small bug is that likely both 'python2' and 'python' exist on the path because that was the official symlink for python2. Its a little weird to call the same one twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was very deliberate. Back when xenial was cooking there was a period when there was a symlink, then there wasn't. I was trying to ensure that the script is invoked with whatever is available on the deployed platform.

@@ -393,6 +393,7 @@ def arg_parser():
parser.add_argument('--interfaces-to-bridge', help="interfaces to bridge; space delimited", type=str, required=True)
parser.add_argument('--dry-run', help="dry run, no activation", action='store_true', default=False, required=False)
parser.add_argument('--bridge-name', help="bridge name", type=str, required=False)
parser.add_argument('--bond-reconfigure-delay', help="delay in seconds before raising bonded interfaces", type=int, required=False, default=30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure we change this argument as well

Andrew McDermott and others added 3 commits January 26, 2017 15:49
Note: this is missing additional state tests for the new
reconfigureDelay parameter passed to FindMissingBridgesForContainer().
@jameinel
Copy link
Member

$$merge$$

@jujubot
Copy link
Collaborator

jujubot commented Jan 29, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 8af1a5a into juju:2.1 Jan 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants