Added waitgroups for autoupdate workers to complete before stopping #613

Merged

Conversation

arjunrn (Contributor) commented on Jun 18, 2021

`wait.UntilWithContext` repeatedly runs the function passed to it until the context passed alongside it is done, at which point it returns. Waiting on a wait group after the `UntilWithContext` calls ensures that the top-level goroutine does not return until all the spawned worker goroutines have terminated.
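
For illustration, a minimal sketch of the pattern this describes, assuming a Run method roughly shaped like the hunks quoted later in this conversation (the Controller type, package name, and worker body here are placeholders, not the PR's actual code):

```go
package autoupdate

import (
	"context"
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

type Controller struct{}

// worker is a placeholder for the per-item processing loop.
func (ctrl *Controller) worker(ctx context.Context) {}

func (ctrl *Controller) Run(ctx context.Context, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Re-run ctrl.worker every second until ctx is canceled.
			wait.UntilWithContext(ctx, ctrl.worker, time.Second)
		}()
	}
	// Do not return until every spawned worker goroutine has terminated.
	wg.Wait()
}
```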

openshift-ci bot requested review from jottofar and sdodson on Jun 18, 2021 07:03
openshift-ci bot added the do-not-merge/work-in-progress label on Jun 18, 2021
arjunrn changed the title from "[WIP] Added waitgroups for autoupdate workers to complete before stopping" to "Added waitgroups for autoupdate workers to complete before stopping" on Jun 18, 2021
openshift-ci bot removed the do-not-merge/work-in-progress label on Jun 18, 2021
arjunrn (Contributor, Author) commented on Jun 18, 2021

/assign @wking

for i := 0; i < workers; i++ {
    // FIXME: actually wait until these complete if the Context is canceled. And possibly add utilruntime.HandleCrash.
arjunrn (Contributor, Author) on Jun 18, 2021

A runtime.HandleCrash() is not required here because it's already part of the JitterUntil function, which is further down the call stack.
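
For reference, a simplified paraphrase (not an exact excerpt) of the loop apimachinery runs underneath wait.UntilWithContext / JitterUntil, showing where that inner HandleCrash already sits; the function and parameter names here are illustrative:

```go
package sketch

import (
	"time"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// backoffUntil is an illustrative stand-in for the library's internal loop
// (see BackoffUntil in k8s.io/apimachinery/pkg/util/wait for the real code).
func backoffUntil(f func(), period time.Duration, stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		default:
		}

		func() {
			// The library wraps every call to the user function, so a panic
			// inside the worker is already logged (and, by default, re-raised)
			// by HandleCrash.
			defer utilruntime.HandleCrash()
			f()
		}()

		select {
		case <-stopCh:
			return
		case <-time.After(period):
		}
	}
}
```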

wking (Member)

My impression was that HandleCrash() would ideally be called whenever we launch a new goroutine. Walking the stack, I don't see any goroutines being launched before our UntilWithContext call and the underlying JitterUntil, but I dunno if we want to rely on that current lack of intermediate goroutines as part of the library's committed API. Does it hurt to have an extra call here?

arjunrn (Contributor, Author)

Adding a HandleCrash() here would recover from any panics in the apimachinery code, because the user function is already covered by the inner HandleCrash(). This is an issue because wait.UntilWithContext would return and the autoupdate worker would terminate. If this happens in all workers then autoupdate would basically stop working. It would instead be better if the program panicked so that the pod would restart. We can expect the apimachinery code to be well tested and shouldn't be guarding against panics there.

wking (Member)

From the docs:

> HandleCrash actually crashes, after calling the handlers and logging the panic message.

So I expect we will still exit and get restarted even if we have a HandleCrash in here.

arjunrn (Contributor, Author) on Jun 29, 2021

So then the only thing HandleCrash() does (without any additional handlers) is to klog the stack trace. 🤷🏼
Will add the HandleCrash() here as well.

arjunrn (Contributor, Author) commented on Jun 29, 2021

/retest

            defer wg.Done()
            defer utilruntime.HandleCrash()
            wait.UntilWithContext(ctx, ctrl.worker, time.Second)
        }()
    }

    <-ctx.Done()
wking (Member)

Now that we have a wait group blocking on our child goroutines, we can probably drop this direct block on the passed ctx.
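
A minimal sketch of the shape this suggests (assumed, not the merged diff): since each `UntilWithContext` call only returns once ctx is done, `wg.Wait()` already covers the "block until shutdown" behavior and the explicit `<-ctx.Done()` adds nothing.

```go
// Sketch only; wg, workers, ctrl.worker, and ctx come from the hunk quoted above.
for i := 0; i < workers; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        defer utilruntime.HandleCrash()
        wait.UntilWithContext(ctx, ctrl.worker, time.Second)
    }()
}

// Returns only after ctx is done and every worker goroutine has exited,
// so no separate <-ctx.Done() is needed.
wg.Wait()
```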

`wait.UntilWithContext` repeatedly runs the function passed to it until the context passed alongside it is done, at which point it returns. Waiting on a wait group after the `UntilWithContext` calls ensures that the top-level goroutine does not return until all the spawned worker goroutines have terminated. Also added `HandleCrash()` to log any panics in inner goroutines.
wking (Member) left a comment

/lgtm

openshift-ci bot added the lgtm label on Jul 12, 2021
openshift-ci bot commented on Jul 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arjunrn, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label on Jul 12, 2021
wking (Member) commented on Jul 12, 2021

/retest

openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment from openshift-bot

openshift-merge-robot merged commit b54ea51 into openshift:master on Jul 13, 2021
arjunrn deleted the wait-autoupdate-complete branch on July 13, 2021 07:57