-
Notifications
You must be signed in to change notification settings - Fork 497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JUJU-3226] Fix destroy-model/destroy-controller to handle --force better. #15328
Conversation
bb6da29
to
30a1d78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial feedback - I am concerned about the shift away from CloudDestroyer and think it will break on caas.
@@ -472,15 +482,15 @@ func CAASManifolds(config ManifoldsConfig) dependency.Manifolds { | |||
modelTag := agentConfig.Model() | |||
manifolds := dependency.Manifolds{ | |||
// The undertaker is currently the only ifNotAlive worker. | |||
undertakerName: ifNotUpgrading(ifNotAlive(undertaker.Manifold(undertaker.ManifoldConfig{ | |||
undertakerName: ifNotAlive(undertaker.Manifold(undertaker.ManifoldConfig{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want the undertaker to run if the controllers are upgrading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the model upgrading not the controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right but we still don't want it running
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model upgrading is for the environ only, we don't run the environ upgrader when the model is dying. Otherwise we could never destroy the model if the environ was dying and not upgradable.
d872120
to
671ca75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing. I'd like to request a change to be very strict with what param combinations are accepted, ie --force with no timeout does a clean destroy should not be allowed - error in facade and caught in the CLI
@@ -472,15 +482,15 @@ func CAASManifolds(config ManifoldsConfig) dependency.Manifolds { | |||
modelTag := agentConfig.Model() | |||
manifolds := dependency.Manifolds{ | |||
// The undertaker is currently the only ifNotAlive worker. | |||
undertakerName: ifNotUpgrading(ifNotAlive(undertaker.Manifold(undertaker.ManifoldConfig{ | |||
undertakerName: ifNotAlive(undertaker.Manifold(undertaker.ManifoldConfig{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right but we still don't want it running
// Even if ForceDestroyed is true, if we don't have a timeout, we treat them the same | ||
// as a non-force destroyed model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we catch this and raise an error. Allowing client code / CLI to submit what they believe is --force but then does a clean destroy is rather disingenuous. We should be very strict about ensuring only valid param combinations are passed given the number of permutations and real potential for confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--force without --timeout is a perfectly valid default, it does still force, but it waits for cleanup ops/tear down dance to finish first.
The undertaker is just there to ensure the cloud resources are indeed destroyed + the model is removed from state.
"fmt" | ||
"time" | ||
|
||
"github.com/juju/clock" | ||
"github.com/juju/errors" | ||
"github.com/juju/worker/v3/catacomb" | ||
"gopkg.in/retry.v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW we're trying to move away from this and juju/retry
is the preferred lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to loop back and use juju/retry. I feel like "gopkg.in/retry.v1" works better here, but will propose a seperate PR to drop it.
671ca75
to
3a8f071
Compare
3a8f071
to
6b6db0f
Compare
6b6db0f
to
77e03c1
Compare
4d8add3
to
1899c46
Compare
--force and --timeout/--model-timeout now properly control the destruction of models so that --force with --timeout properly progresses despite the status of the model/environment. --force without --timeout now only propagates force to the cleanup of model entities, rather than the model removal itself. Since model destruction cannot be stopped once it has started, --timeout no longer makes sense to be passed for non-forceful model destruction. Graceful model destroys now wait indefinitely for the entities to cleanup and the cloud resources to cleanup before removing the model from state. Due to this change it is now required to be able to change the model destruction parameters while a model destroy is being processed by the undertaker. The undertaker will now restart itself when ForceDestroy or Timeout changes.
/build |
/merge |
2 similar comments
/merge |
/merge |
#15831 Forward ports: - #15731 - #15755 - #15770 - #15328 - #15762 - #15783 - #15797 - #15827 - #15828 Conflicts: - api/client/modelmanager/modelmanager.go - api/client/modelmanager/modelmanager_test.go - apiserver/facades/client/modelupgrader/upgrader.go - apiserver/facades/client/modelupgrader/upgrader_test.go - apiserver/facades/controller/undertaker/register.go - apiserver/facades/controller/undertaker/undertaker.go - apiserver/facades/controller/undertaker/undertaker_test.go - cmd/juju/controller/destroy.go - cmd/juju/controller/destroy_test.go - cmd/juju/model/destroy.go - cmd/juju/model/destroy_test.go - tests/includes/juju.sh
#15847 Merge 3.3 -> main: - #15731 - #15755 - #15770 - #15328 - #15762 - #15783 - #15797 - #15793 - #15815 - #15816 - #15766 - #15821 - #15827 - #15828 - #15398 - #15823 - #15831 - #15834 - #15835 - #15818 - #15837 - #15839 - #15842 - #15830 - #15844 - #15846 Conflicts: - cmd/juju/application/deploy_test.go - cmd/juju/status/status_internal_test.go
#15864 #15328 made a miss-step in making the upgrader only run when the model is alive, since many things hang off the upgraded flag, including the cleaner worker, which needs to run during model destruction. ## QA steps `./main.sh -v -s '"test_block_commands,test_display_clouds,test_model_config,test_model_defaults,test_unregister"' cli test_local_charms` ## Documentation changes N/A ## Bug reference https://jenkins.juju.canonical.com/job/test-cli-test-local-charms-lxd/1336/consoleText
--force and --timeout/--model-timeout now properly control the destruction of models so that --force with --timeout properly progresses despite the status of the model/environment.
--force without --timeout now only propagates force to the cleanup of model entities, rather than the model removal itself.
Since model destruction cannot be stopped once it has started, --timeout no longer makes sense to be passed for non-forceful model destruction.
Graceful model destroys now wait indefinitely
for the entities to cleanup and the cloud resources to cleanup before removing the model from state. Due to this change it is now required to be able to change the model destruction parameters while a model destroy is being processed by the undertaker. The undertaker will now restart itself when ForceDestroy or Timeout changes.
QA steps
General bootstrap and model-destroy/controller-destroy, with and without --force/--timeout.
Specifically test the following:
lxc config set core.trust_password <mypassword>
lxc config set core.https_address '[::]'
lxc remote add lxd2 <myip> --password <mypassword>
juju bootstrap localhost
juju add-cloud lxd2
on controllerjuju autoload-credentials
for lxd2 cloud on controllerjuju add-model a lxd2
juju deploy ubuntu
lxc config trust list
lxc config trust remove <lxd2 trust>
juju destroy-model a
juju destroy-model a --force
juju destroy-model a --force --timeout 1m
Documentation changes
Command documentation and possibly something else.
Upgrading a controller with broken models (environ with broken/no credentials) will require some hacky steps due to controller upgrade prechecks requiring functioning environ.
Bug reference
https://bugs.launchpad.net/juju/+bug/2009648