[JUJU-3226] Fix destroy-model/destroy-controller to handle --force better. #15328

Merged (3 commits) on Jun 19, 2023


hpidcock
Member

--force and --timeout/--model-timeout now properly control the destruction of models: --force combined with --timeout makes progress regardless of the status of the model/environment.

--force without --timeout now only propagates force to the cleanup of model entities, not to the removal of the model itself.

Since model destruction cannot be stopped once it has started, passing --timeout no longer makes sense for non-forceful model destruction.
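
To make the flag combinations above easier to compare, here is a minimal sketch in Go. The `destroyArgs` type, the `plan` and `minutes` helpers, and the error text are all hypothetical and are not Juju's API; rejecting --timeout without --force is an assumption based on the paragraph above, not confirmed behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// destroyArgs is a hypothetical stand-in for the parameters sent by
// destroy-model; it is not Juju's actual API type.
type destroyArgs struct {
	Force   bool
	Timeout *time.Duration // nil means --timeout was not supplied
}

// minutes is a small helper that returns a pointer to an n-minute duration.
func minutes(n int) *time.Duration {
	d := time.Duration(n) * time.Minute
	return &d
}

// plan summarises the behaviour described in the PR text for each combination.
func plan(a destroyArgs) (string, error) {
	switch {
	case !a.Force && a.Timeout != nil:
		// Assumption: this combination is rejected outright.
		return "", errors.New("--timeout requires --force")
	case !a.Force:
		return "graceful: wait indefinitely for entities and cloud resources", nil
	case a.Timeout == nil:
		return "force entity cleanup, then wait for model removal", nil
	default:
		return "force: progress removal after the timeout even if teardown is stuck", nil
	}
}

func main() {
	for _, a := range []destroyArgs{
		{},
		{Force: true},
		{Force: true, Timeout: minutes(1)},
		{Timeout: minutes(1)},
	} {
		desc, err := plan(a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(desc)
	}
}
```

The point of the sketch is only that there are three accepted combinations and one that no longer makes sense; where such a check would actually live (CLI, facade, or both) is discussed in the review comments.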

Graceful model destroys now wait indefinitely for the entities and the cloud resources to be cleaned up before removing the model from state. As a result, it must be possible to change the model destruction parameters while the undertaker is already processing a model destroy. The undertaker now restarts itself when ForceDestroy or Timeout changes.
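
The restart-on-change behaviour could look roughly like the following. This is a sketch under stated assumptions, not the undertaker's code: `destroyParams`, `watch`, `after` and `errRestart` are hypothetical names, and a plain channel stands in for whatever watcher the real worker uses.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRestart is a hypothetical sentinel asking the supervisor to start a
// fresh worker with the new parameters.
var errRestart = errors.New("destroy parameters changed, restarting")

// destroyParams is a hypothetical view of the two values the worker tracks.
type destroyParams struct {
	ForceDestroy bool
	Timeout      *time.Duration
}

// after returns a pointer to a duration of the given number of seconds.
func after(seconds int) *time.Duration {
	d := time.Duration(seconds) * time.Second
	return &d
}

// sameTimeout compares two optional timeouts by value.
func sameTimeout(a, b *time.Duration) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// watch consumes parameter updates and returns errRestart as soon as
// ForceDestroy or Timeout differs from what the worker started with.
// It returns nil if the channel is closed without a change.
func watch(initial destroyParams, updates <-chan destroyParams) error {
	for p := range updates {
		if p.ForceDestroy != initial.ForceDestroy || !sameTimeout(p.Timeout, initial.Timeout) {
			return errRestart
		}
	}
	return nil
}

func main() {
	updates := make(chan destroyParams, 2)
	updates <- destroyParams{} // unchanged: keep waiting
	updates <- destroyParams{ForceDestroy: true, Timeout: after(60)}
	close(updates)
	fmt.Println(watch(destroyParams{}, updates))
}
```

This mirrors the QA scenario: a graceful destroy that is waiting indefinitely gets unblocked when a later `destroy-model --force --timeout 1m` changes the stored parameters.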

QA steps

General bootstrap and destroy-model/destroy-controller, with and without --force/--timeout.

Specifically test the following:

  • lxc config set core.trust_password <mypassword>
  • lxc config set core.https_address '[::]'
  • lxc remote add lxd2 <myip> --password <mypassword>
  • juju bootstrap localhost
  • juju add-cloud lxd2 on controller
  • juju autoload-credentials for lxd2 cloud on controller
  • juju add-model a lxd2
  • juju deploy ubuntu
  • wait for clean deploy
  • lxc config trust list
  • lxc config trust remove <lxd2 trust>
  • juju destroy-model a
  • watch that it never finishes
  • new terminal window
  • juju destroy-model a --force
  • watch that it also never finishes
  • juju destroy-model a --force --timeout 1m
  • watch that it destroys in around 2m and all the other terminals exit gracefully

Documentation changes

Command documentation and possibly something else.
Upgrading a controller with broken models (an environ with broken or no credentials) will require some hacky steps, because the controller upgrade prechecks require a functioning environ.

Bug reference

https://bugs.launchpad.net/juju/+bug/2009648

@hpidcock hpidcock added the 2.9 label Mar 23, 2023
@hpidcock hpidcock changed the title Fix destroy-model/destroy-controller to handle --force better. [JUJU-3225] Fix destroy-model/destroy-controller to handle --force better. Mar 23, 2023
@wallyworld wallyworld left a comment

Some initial feedback - I am concerned about the shift away from CloudDestroyer and think it will break on caas.

apiserver/facades/controller/undertaker/register.go (outdated, resolved)
cmd/juju/controller/destroy.go (resolved)
cmd/juju/model/destroy.go (outdated, resolved)
cmd/juju/model/destroy.go (resolved)
@@ -472,15 +482,15 @@ func CAASManifolds(config ManifoldsConfig) dependency.Manifolds {
 	modelTag := agentConfig.Model()
 	manifolds := dependency.Manifolds{
 		// The undertaker is currently the only ifNotAlive worker.
-		undertakerName: ifNotUpgrading(ifNotAlive(undertaker.Manifold(undertaker.ManifoldConfig{
+		undertakerName: ifNotAlive(undertaker.Manifold(undertaker.ManifoldConfig{
Member

We don't want the undertaker to run if the controllers are upgrading

Member Author

This is the model upgrading, not the controller.

Member

Right, but we still don't want it running.

Member Author

The model upgrading is for the environ only; we don't run the environ upgrader when the model is dying. Otherwise we could never destroy a model whose environ was dying and not upgradable.
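
For readers following this thread, the effect of dropping the `ifNotUpgrading` wrapper can be illustrated with a simplified sketch. The `gate` type, `modelFlags`, and `always` below are hypothetical; the real wrappers operate on dependency manifolds and flag workers rather than plain booleans, so this only models the gating logic.

```go
package main

import "fmt"

// modelFlags is a hypothetical snapshot of the conditions the wrappers gate on.
type modelFlags struct {
	Alive     bool // the model's life is still "alive"
	Upgrading bool // the model (environ) upgrade has not completed
}

// gate reports whether a worker may run under the given flags.
type gate func(modelFlags) bool

// always is the innermost gate: the worker itself has no extra condition.
func always(modelFlags) bool { return true }

// ifNotAlive only lets the wrapped worker run once the model is dying or dead.
func ifNotAlive(next gate) gate {
	return func(f modelFlags) bool { return !f.Alive && next(f) }
}

// ifNotUpgrading only lets the wrapped worker run when no upgrade is pending.
func ifNotUpgrading(next gate) gate {
	return func(f modelFlags) bool { return !f.Upgrading && next(f) }
}

func main() {
	before := ifNotUpgrading(ifNotAlive(always)) // wrapping removed by this diff
	after := ifNotAlive(always)                  // wrapping added by this diff

	// A dying model whose environ upgrade can never complete.
	stuck := modelFlags{Alive: false, Upgrading: true}
	fmt.Println("before:", before(stuck)) // before: false
	fmt.Println("after: ", after(stuck))  // after:  true
}
```

Under the old composition the undertaker never starts for such a model, which is the situation the author describes; the reviewer's concern is about what else that gate was protecting.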

version/version.go (outdated, resolved)
worker/undertaker/manifold.go (outdated, resolved)
@hpidcock hpidcock force-pushed the fix-undertaker branch 2 times, most recently from d872120 to 671ca75 Compare March 24, 2023 07:42
@wallyworld wallyworld left a comment

Thanks for fixing. I'd like to request a change to be very strict about which param combinations are accepted, i.e. --force with no timeout quietly doing a clean destroy should not be allowed; it should be an error in the facade and be caught in the CLI.

worker/undertaker/undertaker.go (outdated, resolved)
// Even if ForceDestroyed is true, if we don't have a timeout, we treat them the same
// as a non-force destroyed model.
Member

Shouldn't we catch this and raise an error? Allowing client code or the CLI to submit what it believes is --force, and then doing a clean destroy, is rather disingenuous. We should be very strict about ensuring only valid param combinations are passed, given the number of permutations and the real potential for confusion.

Member Author

--force without --timeout is a perfectly valid default; it does still force, but it waits for the cleanup ops/tear-down dance to finish first.

The undertaker is just there to ensure the cloud resources are indeed destroyed + the model is removed from state.
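
As a rough illustration of the behaviour described in this reply and in the quoted code comment, the wait could be modelled as below. This is a sketch with hypothetical names (`awaitTeardown`, `millis`), not the undertaker's implementation: the wait is only bounded when force is set and a timeout was supplied, and is otherwise indefinite.

```go
package main

import (
	"fmt"
	"time"
)

// millis returns a pointer to a duration of n milliseconds.
func millis(n int) *time.Duration {
	d := time.Duration(n) * time.Millisecond
	return &d
}

// awaitTeardown blocks until done is closed. Only when force is set AND a
// timeout was supplied does it stop waiting early. It returns true if the
// teardown finished, false if the wait was abandoned.
func awaitTeardown(force bool, timeout *time.Duration, done <-chan struct{}) bool {
	var expired <-chan time.Time // nil channel: never fires, so the wait is unbounded
	if force && timeout != nil {
		timer := time.NewTimer(*timeout)
		defer timer.Stop()
		expired = timer.C
	}
	select {
	case <-done:
		return true
	case <-expired:
		return false
	}
}

func main() {
	// Never closed: simulates a cloud that can no longer be reached.
	stuck := make(chan struct{})
	fmt.Println(awaitTeardown(true, millis(50), stuck)) // false

	// Already closed: teardown completed normally.
	finished := make(chan struct{})
	close(finished)
	fmt.Println(awaitTeardown(true, nil, finished)) // true
}
```

With force set but no timeout, the function behaves exactly like the graceful case, which matches the "we treat them the same as a non-force destroyed model" comment in the diff.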

"fmt"
"time"

"github.com/juju/clock"
"github.com/juju/errors"
"github.com/juju/worker/v3/catacomb"
"gopkg.in/retry.v1"
Member

FWIW we're trying to move away from this and juju/retry is the preferred lib.

Member Author

Happy to loop back and use juju/retry. I feel like "gopkg.in/retry.v1" works better here, but will propose a separate PR to drop it.

@hpidcock hpidcock changed the title [JUJU-3225] Fix destroy-model/destroy-controller to handle --force better. [JUJU-3226] Fix destroy-model/destroy-controller to handle --force better. Jun 19, 2023
@hpidcock
Member Author

/build

@hpidcock
Member Author

/merge

@jujubot jujubot merged commit c6783de into juju:2.9 Jun 19, 2023
17 of 19 checks passed
@hpidcock hpidcock mentioned this pull request Jun 29, 2023
jujubot added a commit that referenced this pull request Jun 29, 2023
#15831

Forward ports:
- #15731
- #15755
- #15770
- #15328
- #15762
- #15783
- #15797
- #15827
- #15828

Conflicts:
- api/client/modelmanager/modelmanager.go
- api/client/modelmanager/modelmanager_test.go
- apiserver/facades/client/modelupgrader/upgrader.go
- apiserver/facades/client/modelupgrader/upgrader_test.go
- apiserver/facades/controller/undertaker/register.go
- apiserver/facades/controller/undertaker/undertaker.go
- apiserver/facades/controller/undertaker/undertaker_test.go
- cmd/juju/controller/destroy.go
- cmd/juju/controller/destroy_test.go
- cmd/juju/model/destroy.go
- cmd/juju/model/destroy_test.go
- tests/includes/juju.sh
@hpidcock hpidcock mentioned this pull request Jun 30, 2023
jujubot added a commit that referenced this pull request Jun 30, 2023
#15834

Forward ports:
- #15731
- #15755
- #15770
- #15328
- #15762
- #15783
- #15797
- #15827
- #15828
- #15831

Conflicts:
- api/client/modelmanager/modelmanager_test.go
- provider/openstack/firewaller.go
@hpidcock hpidcock mentioned this pull request Jun 30, 2023
jujubot added a commit that referenced this pull request Jun 30, 2023
#15835

Forward ports:
- #15731
- #15755
- #15770
- #15328
- #15762
- #15783
- #15797
- #15793
- #15815
- #15816
- #15827
- #15828
- #15398
- #15823
- #15831
- #15834

Conflicts:
- cmd/juju/controller/destroy.go
- cmd/juju/model/destroy.go
- state/applicationoffers.go
@ycliuhw ycliuhw mentioned this pull request Jul 3, 2023
jujubot added a commit that referenced this pull request Jul 3, 2023
#15847

Merge 3.3 -> main:
- #15731
- #15755
- #15770
- #15328
- #15762
- #15783
- #15797
- #15793
- #15815
- #15816
- #15766
- #15821
- #15827
- #15828
- #15398
- #15823
- #15831
- #15834
- #15835
- #15818
- #15837
- #15839
- #15842
- #15830
- #15844
- #15846

Conflicts:
- cmd/juju/application/deploy_test.go
- cmd/juju/status/status_internal_test.go
jujubot added a commit that referenced this pull request Jul 6, 2023
#15864

#15328 made a misstep in making the upgrader only run when the model is alive: many things hang off the upgraded flag, including the cleaner worker, which needs to run during model destruction.

## QA steps

`./main.sh -v -s '"test_block_commands,test_display_clouds,test_model_config,test_model_defaults,test_unregister"' cli test_local_charms`

## Documentation changes

N/A

## Bug reference

https://jenkins.juju.canonical.com/job/test-cli-test-local-charms-lxd/1336/consoleText