Added waitgroups for autoupdate workers to complete before stopping #613
834bd82 to 53edde3
/assign @wking
for i := 0; i < workers; i++ {
	// FIXME: actually wait until these complete if the Context is canceled. And possibly add utilruntime.HandleCrash.
A runtime.HandleCrash() is not required here because it's already part of the JitterUntil function, which is further down the call stack.
My impression was that HandleCrash() would ideally be called whenever we launch a new goroutine. Walking the stack, I don't see any goroutines being launched between our UntilWithContext call and the underlying JitterUntil, but I dunno if we want to rely on that current lack of intermediate goroutines as part of the library's committed API. Does it hurt to have an extra call here?
Adding a HandleCrash() here would only recover from panics in the apimachinery code, because the user function is already covered by the inner HandleCrash(). This is an issue because wait.UntilWithContext would return and the autoupdate worker would terminate. If this happens in all workers, autoupdate would basically stop working. It would be better if the program panicked so that the pod would restart. We can expect anything in the apimachinery code to be well tested, and we shouldn't be guarding against it.
From the docs:
HandleCrash actually crashes, after calling the handlers and logging the panic message.
So I expect we will still exit and get restarted even if we have a HandleCrash in here.
So then the only thing HandleCrash() does (without any additional handlers) is to klog the stack trace. 🤷🏼
Will add the HandleCrash() here as well.
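To make the behavior discussed in this thread concrete, here is a minimal sketch of a log-then-re-panic handler. It does not call the apimachinery library: `handleCrash` and `runGuarded` are hypothetical stand-ins written for illustration, assuming only what the quoted docs say (the handler logs the panic and then actually crashes).

```go
package main

import (
	"fmt"
	"log"
)

// handleCrash is a hypothetical stand-in for utilruntime.HandleCrash.
// Assumption: like the documented behavior, it logs the panic and then
// panics again instead of swallowing it.
func handleCrash() {
	if r := recover(); r != nil {
		log.Printf("observed a panic: %v", r)
		panic(r) // re-panic so the process still crashes
	}
}

// runGuarded calls f with handleCrash deferred and reports whether a
// panic still escaped the guarded frame. The outer recover exists only
// so this example can observe the result without exiting.
func runGuarded(f func()) (escaped bool) {
	defer func() {
		if r := recover(); r != nil {
			escaped = true
		}
	}()
	func() {
		defer handleCrash()
		f()
	}()
	return false
}

func main() {
	fmt.Println("panic escaped:", runGuarded(func() { panic("boom") }))
	fmt.Println("panic escaped:", runGuarded(func() {}))
}
```

Under these assumptions the extra deferred call adds a log line but does not keep a panicking worker alive, which matches the conclusion above.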
/retest
53edde3 to f15f84e
pkg/autoupdate/autoupdate.go
Outdated
		defer wg.Done()
		defer utilruntime.HandleCrash()
		wait.UntilWithContext(ctx, ctrl.worker, time.Second)
	}()
}

<-ctx.Done()
Now that we have a wait-group blocking on our child goroutines, we can probably drop this direct block on the passed ctx.
wait.UntilWithContext repeatedly runs the function passed to it until the context passed to it is done, at which point it returns. Waiting on a wait group after launching the UntilWithContext workers ensures that the top-level goroutine does not return until all the spawned goroutines terminate. Also added `HandleCrash()` to log any panics in inner goroutines.
f15f84e to 0fee80b
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arjunrn, wking
/retest
/retest Please review the full test history for this PR and help us cut down flakes.