Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
provider/maas: fix zone logic in node selection #8082
Conversation
jameinel
approved these changes
Nov 15, 2017
Since you confirmed it live, I'm tempted to just accept it. But I would like you to confirm that "--to zone=" is still handled correctly.
Specifically, in the GUI Maas case, it should fail to provision if you specify "juju deploy ubuntu --to zone=azone" rather than fall back to "default" zone even though we specifically requested azone.
It probably would be good if we had a better internal understanding of whether we're picking a zone because we are distributing across several zones, or if we are picking a zone because the user asked for it.
| switch { | ||
| - case placement.zoneName != "": | ||
| - availabilityZone = placement.zoneName |
jameinel
Nov 15, 2017
Owner
I'm a little concerned that this would break "--to zone=BLAH" but as long as we're testing that we're doing the right thing, I don't care whether this is here, or whether it triggers in DeriveAZ.
axw
Nov 15, 2017
Member
I've just confirmed with a live test that we do the right thing (i.e. --to zone=azone fails).
| + ) | ||
| + if args.AvailabilityZone != "" && isConflictError(err) { | ||
| + logger.Debugf("could not acquire a node in zone %q", args.AvailabilityZone) | ||
| + return nil, environs.ErrAvailabilityZoneFailed |
jameinel
Nov 15, 2017
Owner
Is it ok to upgrade any error to AZ error? Why do we need AZ error in the first place. Shouldn't any failure to provision be tried in another zone?
The old code only promoted to AZ error if it knew (with an off-by-one bug) that there was another AZ that could be tried.
Do we have any way to do something similar here?
axw
Nov 15, 2017
Member
Yeah, I don't like that particular part. It's not possible at the moment, because the provisioner relies on ErrAvailabilityZoneFailed to decide whether or not to attempt another zone. i.e. anything that's not ErrAvailabilityZoneFailed is expected to be a zone-independent failure, and is terminal.
It's not for this PR, but I think we should revisit this. One idea I have is to introduce a couple of error interfaces to the environs package, in the vein of net.Error, that both the provisioner and bootstrap code can rely upon:
type StartInstanceError interface {
error
// Permanent reports whether or not the error is permanent.
// Permanent errors will not be retried by the provisioner.
Permanent() bool
}
type AvailabilityZoneError interface {
error
// AvailabilityZoneIndependent reports whether or not the
// error is related to a specific availability zone.
AvailabilityZoneIndependent() bool
}The provisioner would assume that all errors are transient unless they implement StartInsanceError and Permanent returns true. Similarly, it would assume that all errors are zone-specific and attempt the next zone, unless they implement AvailabilityZoneError and AvailabilityZoneIndependent returns true.
|
$$merge$$ |
|
Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju |
axw commentedNov 15, 2017
Description of change
Update the selectNode methods to accept a single
availability zone (or none), and return the special
ErrAvailabilityZoneFailed error iff a zone was
specified and acquiring a node in that zone failed
to find a node matching the constraints. This makes
the code simpler, and eliminates the code that
contained an off-by-one error.
QA steps
(should fail before the change, succeed after)
(should get 2 units in "default" zone, since "azone" has no nodes)
Documentation changes
None.
Bug reference
Fixes https://bugs.launchpad.net/juju/+bug/1732021