Add new option to remove-machine to tell the provisioner not to stop the instance #7728

Merged
merged 2 commits into from Aug 14, 2017

Conversation

Projects
None yet
3 participants
Owner

wallyworld commented Aug 10, 2017

Description of change

Sometimes MAAS will fail a machine deployment after already telling Juju it will succeed. This can result in Juju having 2 machines in the model with the same instance id (one running, one not). Removing the bad machine in Juju then kills the good machine in MAAS.

Without significant MAAS/Juju changes, this issue cannot be fully fixed. Instead this PR adds some tooling to make things easier to deal with:

  1. new "machine-id" instance tag to reflect the juju machine id (enables the user to correlate a cloud instance with the juju machine record)
  2. log a warning when we write a duplicate instance id
  3. a new argument to juju remove-machine: --keep-instance

When --keep-instance is used, the machine is removed from Juju but the provisioner will not call StopInstance() in the cloud, hence the cloud machine will be left running. This allows a failed machine which happens to have a duplicate instance id to be removed from Juju without stopping a cloud instance belonging to a different Juju machine.

QA steps

bootstrap
add some machines (with and without units)
check the machine tags to see they include the new machine id tag
on an empty machine:
juju remove-machine --keep-instance
(verify that the machine is removed from juju but not killed in the cloud)
on a machine with units:
juju remove-machine --keep-instance --force
(verify that the machine is removed from juju but not killed in the cloud)

bootstrap juju 2.2
add a machine
juju remove-machine --keep-instance
-> error that juju doesn't support --keep-instance

Documentation changes

The new --keep-instance argument to remove-machine needs to be documented.

Bug reference

https://bugs.launchpad.net/juju/+bug/1671588

@wallyworld wallyworld requested a review from howbazaar Aug 10, 2017

api/machinemanager/machinemanager.go
+// is determined by the force and keep parameters.
+// TODO(wallyworld) - for Juju 3.0, this should be the preferred api to use.
+func (client *Client) DestroyMachinesWithParams(force, keep bool, machines ...string) ([]params.DestroyMachineResult, error) {
+ if !keep {
@howbazaar

howbazaar Aug 13, 2017

Owner

Why bother keeping this block? Since we need to handle keep and !keep in the call below, this block seems pointless.

@wallyworld

wallyworld Aug 14, 2017

Owner

fair enough

+ Keep: keep,
+ MachineTags: make([]string, 0, len(machines)),
+ }
+ allResults := make([]params.DestroyMachineResult, len(machines))
@howbazaar

howbazaar Aug 13, 2017

Owner

Given that the API server needs to do these checks anyway, why duplicate them and make the call site more complicated, on the client side?

@wallyworld

wallyworld Aug 14, 2017

Owner

Was being consistent with existing code in this facade. The original thinking would have been to fail early if possible, avoiding an unnecessary api call if we know up front the tags are invalid. In fact, these sorts of checks are done in many other facades also. It's not uncommon to duplicate checks like this in different layers. We'll often also perform repeated checks in state and apiserver layers for example. Or CLI and api etc.

-func (mm *MachineManagerAPI) destroyMachine(args params.Entities, force bool) (params.DestroyMachineResults, error) {
+// DestroyMachineWithParams removes a set of machines from the model.
+func (mm *MachineManagerAPIV4) DestroyMachineWithParams(args params.DestroyMachinesParams) (params.DestroyMachineResults, error) {
+ logger.Criticalf("%+v", args)
@howbazaar

howbazaar Aug 13, 2017

Owner

This should be removed.

@wallyworld

wallyworld Aug 14, 2017

Owner

doh, thanks

@@ -198,6 +229,12 @@ func (mm *MachineManagerAPI) destroyMachine(args params.Entities, force bool) (p
if err != nil {
return nil, err
}
+ if keep {
+ logger.Infof("destroy machine %v but keep instance", machineTag.Id())
+ if err := machine.SetKeepInstance(keep); err != nil {
@howbazaar

howbazaar Aug 13, 2017

Owner

If we never pass through false, why pass a parameter?

@wallyworld

wallyworld Aug 14, 2017

Owner

This is one potential call site, albeit the only one right now. However, why hard code an API in a lower layer just because right now we only set the value to true?

+ },
+ }},
+ })
+}
@howbazaar

howbazaar Aug 13, 2017

Owner

It would be good if you could ask the mock machine if keep instance was set to true.

@wallyworld

wallyworld Aug 14, 2017

Owner

Enhanced the mocks to allow this.

@@ -817,6 +817,37 @@ func (s *withoutControllerSuite) TestSeries(c *gc.C) {
})
}
+func (s *withoutControllerSuite) TestKeepInstance(c *gc.C) {
+ // Add a machine with keep-instance = true.
+ foobarMachine, err := s.State.AddMachine("foobar", state.JobHostUnits)
@howbazaar

howbazaar Aug 13, 2017

Owner

Why not use the factory.MakeMachine?

@wallyworld

wallyworld Aug 14, 2017

Owner

Was based on the test above. I changed that test to also use factory.

if err != nil {
return nil, err
}
- if root.BestFacadeVersion("MachineManager") >= 3 {
+ if root.BestFacadeVersion("MachineManager") < 4 && c.KeepInstance {
+ return nil, errors.New("this version of Juju doesn't support --keep-instance")
@howbazaar

howbazaar Aug 13, 2017

Owner

Why have this knowledge in two places? You already return an error from the client facade.

@wallyworld

wallyworld Aug 14, 2017

Owner

I want to fail early before making an api call. Also, there will be a generic "Not Implemented" return from the client facade, but that could be for other reasons as well. Here is where we have the knowledge of what's been asked for and what's supported. This pattern is used in several other CLI commands as well. See other comment below also.

- if c.Force {
- destroy = client.ForceDestroyMachines
+ var results []params.DestroyMachineResult
+ if c.KeepInstance {
@howbazaar

howbazaar Aug 13, 2017

Owner

Why encode all this here? You have already done this in the client and server code.

The whole point of encoding that in the api client is so you can just call DestroyMachinesWithParams on client for all.

@wallyworld

wallyworld Aug 14, 2017

Owner

The issue is that there are 2 facades that could be used - api/client and api/machinemanager. Only quite recent versions of Juju have the machinemanager api. So it's easier to use the knowledge that if keep is specified and we have the relevant machinemanager facade version available, call that directly. Otherwise we need to fall back to using the removeMachineAdapter which knows how to translate the differing return results from the api/client and api/machinemanager implementations.

+
+// KeepInstance reports whether a machine, when removed from
+// Juju, will cause the corresponding cloud instance to be stopped.
+func (m *Machine) KeepInstance() (bool, error) {
@howbazaar

howbazaar Aug 13, 2017

Owner

I don't see why this method should return an error?

Why does it not just return the value in the doc?

@wallyworld

wallyworld Aug 14, 2017

Owner

There isn't a value in the machine doc. It's in the machine instance doc (along with instance id, hardware info etc).

Owner

wallyworld commented Aug 14, 2017

$$merge$$

Contributor

jujubot commented Aug 14, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 1dd8823 into juju:2.2 Aug 14, 2017

1 check passed

continuous-integration/jenkins/pr-merge This commit looks good
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment