New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
network: add net-bond-reconfigure-delay model config option #6870
network: add net-bond-reconfigure-delay model config option #6870
Conversation
Not currently used; this is just preparation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I feel like I'd like to see the "if device.is_bond" that is currently only in the python script pulled all the way up into the API Server code, and have it decide whether ReconfigureDelay should be 0 or BoundReconfigureDelay.
And then the stack below it does the same aggregation etc below it, just without the word "bond".
Thoughts?
BridgeName string `json:"bridge-name"` | ||
HostDeviceName string `json:"host-device-name"` | ||
BridgeName string `json:"bridge-name"` | ||
BondReconfigureDelay int `json:"bond-reconfigure-delay"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should just be "ReconfigureDelay" and the script we run should always just sleep that much.
Its just that we should have a different value if we know that the thing being bridged is a bond.
@@ -512,17 +535,6 @@ func (c *Config) mustString(name string) string { | |||
return value | |||
} | |||
|
|||
// mustInt returns the named attribute as an integer, panicking if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nothing failed with this deleted, it definitely deserves to be gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rest is pending the new renames and threading of config through the various layers.
for _, version := range []string{ | ||
"/usr/bin/python2", | ||
"/usr/bin/python3", | ||
"/usr/bin/python", | ||
} { | ||
if _, err := os.Stat(version); err == nil { | ||
return version | ||
result = append(result, version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only small bug is that likely both 'python2' and 'python' exist on the path because that was the official symlink for python2. Its a little weird to call the same one twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was very deliberate. Back when xenial was cooking there was a period when there was a symlink, then there wasn't. I was trying to ensure that the script is invoked with whatever is available on the deployed platform.
@@ -393,6 +393,7 @@ def arg_parser(): | |||
parser.add_argument('--interfaces-to-bridge', help="interfaces to bridge; space delimited", type=str, required=True) | |||
parser.add_argument('--dry-run', help="dry run, no activation", action='store_true', default=False, required=False) | |||
parser.add_argument('--bridge-name', help="bridge name", type=str, required=False) | |||
parser.add_argument('--bond-reconfigure-delay', help="delay in seconds before raising bonded interfaces", type=int, required=False, default=30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure we change this argument as well
Note: this is missing additional state tests for the new reconfigureDelay parameter passed to FindMissingBridgesForContainer().
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
This change introduces a model-config setting called net-bond-reconfigure-delay.
The value represents the amount of time in seconds to sleep between ifdown and ifup when bridging bonded interfaces; this is a platform bug and all of this can go away when bugs #1594855 and #1269921 are fixed. Without a delay between ifdown and ifup bonded interfaces in 802.3ad LACP mode fail to come back up, rendering the network incomplete and therefore broken.
The default is set to 17 seconds. For a long time the bridge script hard-coded a value of 3 second but some setups now require an even longer duration. The last reported issue was fixed with a 10 second timeout, however, we're increasing that again because this issue (and solution) is not very discoverable and we want bridging to work out-of-the-box.
QA steps
This setup requires your MAAS setup has nodes with bonded interfaces.
$ juju model-config | grep net-bond-reconfigure-delay (should report 17)
$ juju model-config net-bond-reconfigure-delay=42
$ juju model-config | grep net-bond-reconfigure-delay (should report 42)
$ juju add-machine lxd:0 --constraints spaces=space-0
$ sudo grep bridgescript /var/log/juju/machine-1.log
You should observe that the bridge script has been invoked with --net-bond-reconfigure-delay=42
$ juju model-config net-bond-reconfigure-delay=67
The documentation at: https://jujucharms.com/docs/2.0/models-config#list-of-model-keys needs updating to include the new key, its value and an explanation.
Fixes LP:#1657579