Fix instance mutator test #10605

Merged · 4 commits into juju:develop from fix-instance-mutator-test, Sep 5, 2019

Conversation

howbazaar (Contributor):

Fix a few intermittent test failures in the instance mutater.

The goroutines being created were never guaranteed to have completed before the catacomb loop function terminated, so there was a race between the setting of the errors and the mock-controlled test runners.

The test would progress once all the expected calls had been made, then kill the worker and wait for it. The race: the loop could finish, and the wait on the worker could return, before the inner goroutine had called catacomb.Kill with the error, giving an intermittent failure where the error was nil instead of the expected one.

The same race was behind another intermittently failing test, which has now also been updated.

I also removed the mocking of the logging as it is unnecessary code.
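
To make the race concrete, here is a minimal, self-contained sketch of the pattern. Everything in it is hypothetical: kill stands in for catacomb.Kill, loop stands in for the worker's catacomb loop, and this is not the actual juju code.

package main

import (
    "errors"
    "fmt"
    "sync"
)

func main() {
    var (
        mu       sync.Mutex
        firstErr error
    )
    // kill stands in for catacomb.Kill: it records the first error reported.
    kill := func(err error) {
        mu.Lock()
        defer mu.Unlock()
        if firstErr == nil {
            firstErr = err
        }
    }

    loop := func() error {
        var wg sync.WaitGroup
        // The fix: loop must not return until every goroutine it spawned
        // has finished, so any kill(err) from those goroutines lands
        // before a kill-and-wait on the worker observes its death.
        defer wg.Wait()

        wg.Add(1)
        go func() {
            defer wg.Done()
            kill(errors.New("boom")) // inner goroutine reports its failure
        }()
        return nil
    }

    _ = loop()

    mu.Lock()
    fmt.Println("first error:", firstErr) // reliably "boom" with wg.Wait in place
    mu.Unlock()
}

Without the defer wg.Wait(), loop can return while the inner goroutine is still running, which is exactly the window in which the test saw nil instead of the expected error.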

QA steps

Run the tests with the stress script. Without the change, they would fail after several thousand attempts; with the fix, the tests successfully passed 200k runs.

@jameinel (Member) left a comment:

Small issue with names being confusing, but otherwise LGTM

@@ -106,13 +109,16 @@ func (m *mutater) startMachines(tags []names.MachineTag) error {
     return nil
 }

-func runMachine(machine MutaterMachine, removed <-chan struct{}, died chan<- instancemutater.MutaterMachine) {
+func runMachine(machine MutaterMachine, removed <-chan struct{}, died chan<- instancemutater.MutaterMachine, dying <-chan struct{}, done func()) {
Member:

Given 'died', 'dying' and 'done', maybe better names might be:
'processDying', 'machineDied', 'cleanup'?
Or maybe 'stop', 'apiDied', 'cleanup'?
I'm not sure on the names; it's just a little confusing as to what is signalling this code that it should stop, vs. where this code is signalling others.

Member:

'removed' means modelCache.removeMachine has been called for the watched lxd machine or container. 'died' means the machine hosting the containers is going away. With my comment at line 120, dying isn't necessary.

Perhaps:
died = hostMachineDied
removed = machineDying

howbazaar (Contributor, Author):

dying is necessary, otherwise the goroutine doesn't end.

howbazaar (Contributor, Author):

Ah, didn't see comment below, will use that.
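
As an aside on the channel roles discussed in this thread, here is a hypothetical, compilable skeleton of the signalling directions; machineAPI is a stand-in type, while the real signature uses MutaterMachine and instancemutater.MutaterMachine.

package main

import "sync"

// machineAPI is a hypothetical stand-in for instancemutater.MutaterMachine.
type machineAPI struct{ id string }

func runMachine(
    machine machineAPI,
    removed <-chan struct{}, // in: the model cache removed this machine
    died chan<- machineAPI, // out: hand the machine back to the central loop
    dying <-chan struct{}, // in: the worker is shutting down; stop now
    done func(), // out: marks this goroutine finished (wg.Done in the PR)
) {
    defer done()
    select {
    case <-removed:
        // Machine gone from the cache; nothing more to do.
    case <-dying:
        // Worker shutting down.
    }
}

func main() {
    var wg sync.WaitGroup
    removed := make(chan struct{})
    died := make(chan machineAPI, 1)
    dying := make(chan struct{})

    wg.Add(1)
    go runMachine(machineAPI{id: "0/lxd/1"}, removed, died, dying, wg.Done)

    close(dying) // simulate worker shutdown
    wg.Wait()
}

Read the arrows: removed and dying signal into this code that it should stop; died and done are how it signals others.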

@@ -174,9 +176,12 @@ type mutaterWorker struct {
 }

 func (w *mutaterWorker) loop() error {
+    var wg sync.WaitGroup
+    defer wg.Wait()
Member:

The only problem with wg.Wait is that it will hang, when we are supposed to be dying, if our code has bugs in it. But I think you've wired through dying already, so it's probably OK.

defer func() {
    // We can't just send on the dead channel because the
    // central loop might be trying to write to us on the
    // removed channel.
    for {
        select {
        case <-dying:
Member:

Suggest:

case <-machine.context.dying()

The machines and the worker share a context, so there's no need to pass it in; it's buried down in mutaterWorker's newMachineContext(). Ran the race stress 500+ runs successfully.
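
A self-contained sketch of that drain-until-dying shape, with hypothetical names (a plain dying channel stands in for machine.context.dying()). It also shows why the wg.Wait concern above is covered: dying gives the goroutine a guaranteed exit path.

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    removed := make(chan struct{})
    dying := make(chan struct{}) // stand-in for the shared context's dying()
    var wg sync.WaitGroup

    // Central loop: keeps notifying the machine goroutine of removals.
    go func() {
        for {
            select {
            case removed <- struct{}{}:
            case <-dying:
                return
            }
        }
    }()

    wg.Add(1)
    go func() {
        defer wg.Done()
        // On the way out, drain removed until the shared context is
        // dying, so the central loop's sends can never block forever.
        defer func() {
            for {
                select {
                case <-dying:
                    return
                case <-removed:
                    // Discard late notifications.
                }
            }
        }()
        time.Sleep(10 * time.Millisecond) // stand-in for real work
    }()

    time.Sleep(20 * time.Millisecond)
    close(dying) // worker shutdown
    wg.Wait()
    fmt.Println("clean shutdown; no goroutine left blocked")
}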

@howbazaar (Contributor, Author):

$$merge$$

jujubot merged commit 87dceb6 into juju:develop on Sep 5, 2019.
howbazaar deleted the fix-instance-mutator-test branch on September 6, 2019 at 00:34.