provider/maas: fix zone logic in node selection #8082

Merged
merged 1 commit into from Nov 15, 2017

Conversation

Projects
None yet
3 participants
Member

axw commented Nov 15, 2017

Description of change

Update the selectNode methods to accept a single
availability zone (or none), and return the special
ErrAvailabilityZoneFailed error iff a zone was
specified and acquiring a node in that zone failed
to find a node matching the constraints. This makes
the code simpler, and eliminates the code that
contained an off-by-one error.

QA steps

  1. Create an empty zone in MAAS called "azone" (or anything else lexicographically < "default")
  2. juju bootstrap
    (should fail before the change, succeed after)
  3. juju deploy -n 2 ubuntu
    (should get 2 units in "default" zone, since "azone" has no nodes)

Documentation changes

None.

Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1732021

provider/maas: fix zone logic in node selection
Update the selectNode methods to accept a single
availability zone (or none), and return the special
ErrAvailabilityZoneFailed error iff a zone was
specified and acquiring a node in that zone failed
to find a node matching the constraints. This makes
the code simpler, and eliminates the code that
contained an off-by-one error.

Fixes https://bugs.launchpad.net/juju/+bug/1732021

Since you confirmed it live, I'm tempted to just accept it. But I would like you to confirm that "--to zone=" is still handled correctly.
Specifically, in the GUI Maas case, it should fail to provision if you specify "juju deploy ubuntu --to zone=azone" rather than fall back to "default" zone even though we specifically requested azone.

It probably would be good if we had a better internal understanding of whether we're picking a zone because we are distributing across several zones, or if we are picking a zone because the user asked for it.

switch {
- case placement.zoneName != "":
- availabilityZone = placement.zoneName
@jameinel

jameinel Nov 15, 2017

Owner

I'm a little concerned that this would break "--to zone=BLAH" but as long as we're testing that we're doing the right thing, I don't care whether this is here, or whether it triggers in DeriveAZ.

@axw

axw Nov 15, 2017

Member

I've just confirmed with a live test that we do the right thing (i.e. --to zone=azone fails).

+ )
+ if args.AvailabilityZone != "" && isConflictError(err) {
+ logger.Debugf("could not acquire a node in zone %q", args.AvailabilityZone)
+ return nil, environs.ErrAvailabilityZoneFailed
@jameinel

jameinel Nov 15, 2017

Owner

Is it ok to upgrade any error to AZ error? Why do we need AZ error in the first place. Shouldn't any failure to provision be tried in another zone?
The old code only promoted to AZ error if it knew (with an off-by-one bug) that there was another AZ that could be tried.
Do we have any way to do something similar here?

@axw

axw Nov 15, 2017

Member

Yeah, I don't like that particular part. It's not possible at the moment, because the provisioner relies on ErrAvailabilityZoneFailed to decide whether or not to attempt another zone. i.e. anything that's not ErrAvailabilityZoneFailed is expected to be a zone-independent failure, and is terminal.

It's not for this PR, but I think we should revisit this. One idea I have is to introduce a couple of error interfaces to the environs package, in the vein of net.Error, that both the provisioner and bootstrap code can rely upon:

type StartInstanceError interface {
    error

    // Permanent reports whether or not the error is permanent.
    // Permanent errors will not be retried by the provisioner.
    Permanent() bool
}

type AvailabilityZoneError interface {
    error

    // AvailabilityZoneIndependent reports whether or not the
    // error is related to a specific availability zone.
    AvailabilityZoneIndependent() bool
}

The provisioner would assume that all errors are transient unless they implement StartInsanceError and Permanent returns true. Similarly, it would assume that all errors are zone-specific and attempt the next zone, unless they implement AvailabilityZoneError and AvailabilityZoneIndependent returns true.

Member

axw commented Nov 15, 2017

$$merge$$

Contributor

jujubot commented Nov 15, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot jujubot merged commit 790a046 into juju:develop Nov 15, 2017

1 check passed

continuous-integration/jenkins/pr-merge This commit looks good
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment