New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix instance mutator test #10605
Fix instance mutator test #10605
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small issue with names being confusing, but otherwise LGTM
worker/instancemutater/mutater.go
Outdated
@@ -106,13 +109,16 @@ func (m *mutater) startMachines(tags []names.MachineTag) error { | |||
return nil | |||
} | |||
|
|||
func runMachine(machine MutaterMachine, removed <-chan struct{}, died chan<- instancemutater.MutaterMachine) { | |||
func runMachine(machine MutaterMachine, removed <-chan struct{}, died chan<- instancemutater.MutaterMachine, dying <-chan struct{}, done func()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given 'died' 'dying' and 'done', maybe better names might be:
'processDying', 'machineDied', 'cleanup' ?
or maybe 'stop', 'apiDied', 'cleanup'?
I'm not sure on the names, it is just a little confusing as to what is signalling this code that it should stop, vs where is this code signaling others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'removed' is modelCache.removeMachine has been called for the watched lxd machine or container. 'died' is the machine hosting the containers is going away. With my comment at 120, dying isn't necessary.
Perhaps:
died = hostMachineDied
removed = machineDying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dying is necessary, otherwise the goroutine doesn't end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, didn't see comment below, will use that.
@@ -174,9 +176,12 @@ type mutaterWorker struct { | |||
} | |||
|
|||
func (w *mutaterWorker) loop() error { | |||
var wg sync.WaitGroup | |||
defer wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only problem with wg.Wait is that it will hang if we are supposed to be dying if our code has bugs in it. But i think you've wired through dying already, so its probably ok.
worker/instancemutater/mutater.go
Outdated
defer func() { | ||
// We can't just send on the dead channel because the | ||
// central loop might be trying to write to us on the | ||
// removed channel. | ||
for { | ||
select { | ||
case <-dying: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest:
case <- machine.context.dying()
The machines and the worker share a context, no need to pass it in. It's buried down in mutaterWorker newMachineContext(). Ran the race stress 500+ successfully.
worker/instancemutater/mutater.go
Outdated
@@ -106,13 +109,16 @@ func (m *mutater) startMachines(tags []names.MachineTag) error { | |||
return nil | |||
} | |||
|
|||
func runMachine(machine MutaterMachine, removed <-chan struct{}, died chan<- instancemutater.MutaterMachine) { | |||
func runMachine(machine MutaterMachine, removed <-chan struct{}, died chan<- instancemutater.MutaterMachine, dying <-chan struct{}, done func()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'removed' is modelCache.removeMachine has been called for the watched lxd machine or container. 'died' is the machine hosting the containers is going away. With my comment at 120, dying isn't necessary.
Perhaps:
died = hostMachineDied
removed = machineDying
|
Fix a few intermittent test failures in the instance mutater.
Since the goroutines that were being created were ensured to be complete before the catacomb loop function terminated, there was a race in the setting of the errors, and the mock controlled test runners.
The test would progress after all the expected calls had been called, and would kill and wait for the worker. The race is that the loop could well finish, and wait on the worker return before the inner goroutine had called catacomb.Kill with the error, giving the intermititent failure of nil.
This is the same reason one of the other tests was also intermittently failing. This has now also been updated.
I also removed the mocking of the logging as it is unnecessary code.
QA steps
Run the tests with the stress script. Without the change it would fail after several thousand attempts.
With the fix the test run successfully passed 200k runs.