Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destroy machine when last principal unit removed. #52

Merged
merged 2 commits into from Jun 27, 2014

Conversation

cmars
Copy link
Contributor

@cmars cmars commented Jun 10, 2014

This change causes a machine to be automatically removed when the last principal unit running on that machine is removed (remove-unit). If the machine is a parent to any containers, it is left alone.

Background:
https://bugs.launchpad.net/juju-core/+bug/1206532
https://bugs.launchpad.net/juju-core/+bug/1183309/comments/8

This could use more test coverage, but I'd like to get a general feel for how it's looking so far before I go much further with it. This is my first time mucking around with state transactions. Please let me know if I've got it right!

I haven't added an option to keep empty machines/containers around after removing the last unit. I'd like to see if we can avoid it. If we do need the option, I think it should be opt in. Let me know.

func (m *Machine) ForceDestroy() error {
// forceDestroyOps returns the transaction operations necessary to completely
// remove the machine along with its units and containers.
func (m *Machine) forceDestroyOps() ([]txn.Op, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to use "forceDestroyOps" in something that should be a clean teardown. Maybe call this "destroyOps", and have a different one for forceDestroyOps which includes the regular cleanup?
I guess my concern is that we should only be tearing down the machine automatically when it is already empty (but dirty), so we shouldn't need to run all the extra "make sure everything really is gone". Because if everything isn't gone, we shouldn't be tearing down the machine.

@jameinel
Copy link
Member

So NOT LGTM as is, mostly because it has 0 tests. The logic seems like it would do what we want, though my personal feeling is that it is overly forceful.
The one reason I can think of to do a more forceful destruction is to handle subordinates that should be cleaned up when the final primary is removed.
But definitely we need to be testing the overall behavior (create a machine, add a unit into it, destroy the unit, notice that the machine is destroyed as well). And then testing the edge cases "create a machine, add 2 units, destroy one and the machine stays around, destroy the second and it goes away", "create a machine, put a container on it, put a unit in the container, destroy the last unit in the container, watch the container go away, but not the machine", etc. I can come up with more if you need thoughts about what the edge cases should be.
I'd also like William to chime in about the model of this change.

@cmars
Copy link
Contributor Author

cmars commented Jun 10, 2014

I've simplified this quite a bit by deferring Destroy() on the unit's assigned machine in Unit.Destroy -- if the machine is eligible for destruction, for which I've refactored logic out of Machine.Destroy(), into Machine.CheckDestroy().

PTAL, thanks!

// machine is not voting)
return fmt.Errorf("machine %s is required by the environment", m.doc.Id)
}
if m.doc.HasVote {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grr, we shouldn't have both these checks. not actionable in this PR though.

@fwereade
Copy link
Contributor

NOT LGTM yet, we need to be obsessively careful about state changes. It might be helpful for you to refresh your memory of the "lifecycles", "death-and-destruction", and "hacking-state" documentation (in the doc/ dir); but you should also, definitely, come talk to me about anything that's not crystal clear.

ops = append(ops, svcOp)

if removeMachine {
ops = append(ops, txn.Op{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not keen on having two ops referring to the same document. Might be able to expand on this comment usefully when I've seen more.

@fwereade
Copy link
Contributor

This is converging fast, but needs a couple of code tweaks (and much heavier testing than is currently in place).

@cmars
Copy link
Contributor Author

cmars commented Jun 12, 2014

Still wip, need to add test cases with transaction hooks

@cmars
Copy link
Contributor Author

cmars commented Jun 13, 2014

PTAL

Update: bson.D{{"$set", bson.D{{"life", Dying}}}},
})
} else {
assert = keepAssert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we'd want to $or together the various keepAssert possibilities, wouldn't we? ie if one of them no longer applied, but another did, we should be able to proceed as we did before.

@cmars
Copy link
Contributor Author

cmars commented Jun 17, 2014

PTAL

ops = append(ops, txn.Op{
C: u.st.containerRefs.Name,
Id: m.doc.Id,
Assert: bson.D{{"children", containers}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd be checking for not-empty, rather than exact-match, here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@cmars
Copy link
Contributor Author

cmars commented Jun 20, 2014

Didn't see replies to earlier comments until now.

var keepAsserts []bson.D
m, err := u.st.Machine(u.doc.MachineId)
if err != nil {
if errors.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly an indication that something is up -- the machine doc doesn't match the unit. I think this should return txn.ErrTransientFailure -- it should rebuild the ops and try again in the hope of seeing consistent db state.

I'd ask for an explicit test for this situation, but I'm not sure we can construct one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, but it caused a failure in CleanupSuite.TestCleanupForceDestroyedMachineWithContainer, cleanup_test.go:232.

@cmars
Copy link
Contributor Author

cmars commented Jun 25, 2014

PTAL

result = append(result, tc)
}
{
tc := destroyMachineTestCase{desc: "principal has subordinate", destroyed: true}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we should probably drop this test case. One reason is that unit destruction should take out the subordinates anyway...

@fwereade
Copy link
Contributor

LGTM, thank you for your patience :).

@cmars
Copy link
Contributor Author

cmars commented Jun 27, 2014

$$merge$$

@jujubot
Copy link
Collaborator

jujubot commented Jun 27, 2014

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot
Copy link
Collaborator

jujubot commented Jun 27, 2014

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/279

Casey Marshall added 2 commits June 27, 2014 15:16
Either way, assert that the state assumptions for being eligible or
not-eligible hold true for the transaction.

Improved state assertion in removeUnitOps

More test cases, renamed destroyHostOps

destroyHostOps fix, allow machine removal on retry

If a unit's host cannot be removed, make that condition the only
assertion on the transaction, so that we can re-evaluate the state
completely on retry. Examples:
- host container removed after check, before txn exec
- host gives up voting rights after check, before txn exec

Unit tests added for the above, and the converse (host qualifies on
check, disqualified before txn exec).

Fix import

Add thrashing state testcase.

Consolidate destroyHostOps machine ops.

$or together all the "keep host alive" assertions, so that a transaction
can succeed if the individual "keep" criteria change in a transaction
but the outcome remains the same.

Improve destroyHostOps test cases.

Test coverage for removing an unassigned unit. Use assertLife helper
function. More descriptive test names. More assertion checks.

Improve non-empty containers assertion.

MongoDB makes it kind of weird.
https://stackoverflow.com/questions/14789684/find-mongodb-records-where-array-field-is-not-empty-using-mongoose

Clean up tests, log removal of unassigned unit.

Removed test cruft, .Refresh() from txn hooks.

Colocated unit set up outside of the transaction hook in
TestRemoveUnitMachineRetryOrCond.

Remove TestDestroyCleanRelations.

Check unit destroyed in TestRemoveUnitMachineRetryOrCond

Consolidate & improve destroyHostOps assert conds.

Refactor out unit test suite method setMachineVote.  More comments.

Remove subordinate unit test case.
Fixed service removal test by calling State.Cleanup.
@cmars
Copy link
Contributor Author

cmars commented Jun 27, 2014

$$merge$$

@jujubot
Copy link
Collaborator

jujubot commented Jun 27, 2014

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot
Copy link
Collaborator

jujubot commented Jun 27, 2014

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/281

@cmars
Copy link
Contributor Author

cmars commented Jun 27, 2014

Rolling again...

@cmars
Copy link
Contributor Author

cmars commented Jun 27, 2014

$$merge$$

@jujubot
Copy link
Collaborator

jujubot commented Jun 27, 2014

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot
Copy link
Collaborator

jujubot commented Jun 27, 2014

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/282

@cmars
Copy link
Contributor Author

cmars commented Jun 27, 2014

@cmars
Copy link
Contributor Author

cmars commented Jun 27, 2014

$$merge$$

@jujubot
Copy link
Collaborator

jujubot commented Jun 27, 2014

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

jujubot added a commit that referenced this pull request Jun 27, 2014
Destroy machine when last principal unit removed.

This change causes a machine to be automatically removed when the last principal unit running on that machine is removed (remove-unit). If the machine is a parent to any containers, it is left alone.

Background:
https://bugs.launchpad.net/juju-core/+bug/1206532
https://bugs.launchpad.net/juju-core/+bug/1183309/comments/8

This could use more test coverage, but I'd like to get a general feel for how it's looking so far before I go much further with it. This is my first time mucking around with state transactions. Please let me know if I've got it right!

I haven't added an option to keep empty machines/containers around after removing the last unit. I'd like to see if we can avoid it. If we do need the option, I think it should be opt in. Let me know.
@jujubot jujubot merged commit 6bc78e7 into juju:master Jun 27, 2014
ericsnowcurrently pushed a commit to ericsnowcurrently/juju that referenced this pull request May 26, 2015
hmlanigan pushed a commit to hmlanigan/juju that referenced this pull request Aug 26, 2019
Make model optional in blockdevice schema

MAAS2 sometimes sends null for the "model" field in blockdevice.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants