Added waitgroups for autoupdate workers to complete before stopping #613

Merged

Conversation

arjunrn (Contributor) commented on Jun 18, 2021

`wait.UntilWithContext` repeatedly runs the function passed to it until the context passed alongside it is done, at which point it returns. Waiting on a wait group after the `UntilWithContext` calls ensures that the top-level goroutine does not return until all the spawned worker goroutines have terminated.
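
For illustration, a minimal sketch of the pattern this describes, assuming a Run method roughly shaped like the hunks quoted later in this conversation (the Controller type, package name, and worker body here are placeholders, not the PR's actual code):

```go
package autoupdate

import (
	"context"
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

type Controller struct{}

// worker is a placeholder for the per-item processing loop.
func (ctrl *Controller) worker(ctx context.Context) {}

func (ctrl *Controller) Run(ctx context.Context, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Re-run ctrl.worker every second until ctx is canceled.
			wait.UntilWithContext(ctx, ctrl.worker, time.Second)
		}()
	}
	// Do not return until every spawned worker goroutine has terminated.
	wg.Wait()
}
```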

openshift-ci bot requested review from jottofar and sdodson on Jun 18, 2021 07:03
openshift-ci bot added the do-not-merge/work-in-progress label on Jun 18, 2021
arjunrn changed the title from "[WIP] Added waitgroups for autoupdate workers to complete before stopping" to "Added waitgroups for autoupdate workers to complete before stopping" on Jun 18, 2021
openshift-ci bot removed the do-not-merge/work-in-progress label on Jun 18, 2021
arjunrn (Contributor, Author) commented on Jun 18, 2021

/assign @wking

for i := 0; i < workers; i++ {
    // FIXME: actually wait until these complete if the Context is canceled. And possibly add utilruntime.HandleCrash.
arjunrn (Contributor, Author) on Jun 18, 2021

A runtime.HandleCrash() is not required here because it's already part of the JitterUntil function, which is further down the call stack.
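
For reference, a simplified paraphrase (not an exact excerpt) of the loop apimachinery runs underneath wait.UntilWithContext / JitterUntil, showing where that inner HandleCrash already sits; the function and parameter names here are illustrative:

```go
package sketch

import (
	"time"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// backoffUntil is an illustrative stand-in for the library's internal loop
// (see BackoffUntil in k8s.io/apimachinery/pkg/util/wait for the real code).
func backoffUntil(f func(), period time.Duration, stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		default:
		}

		func() {
			// The library wraps every call to the user function, so a panic
			// inside the worker is already logged (and, by default, re-raised)
			// by HandleCrash.
			defer utilruntime.HandleCrash()
			f()
		}()

		select {
		case <-stopCh:
			return
		case <-time.After(period):
		}
	}
}
```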

wking (Member)

My impression was that HandleCrash() would ideally be called whenever we launch a new goroutine. Walking the stack, I don't see any goroutines being launched before our UntilWithContext call and the underlying JitterUntil, but I dunno if we want to rely on that current lack of intermediate goroutines as part of the library's committed API. Does it hurt to have an extra call here?

arjunrn (Contributor, Author)

Adding a HandleCrash() here would recover from any panics in the apimachinery code, because the user function is already covered by the inner HandleCrash(). This is an issue because wait.UntilWithContext would return and the autoupdate worker would terminate. If this happens in all workers then autoupdate would basically stop working. It would instead be better if the program panicked so that the pod would restart. We can expect the apimachinery code to be well tested and shouldn't be guarding against panics there.

wking (Member)

From the docs:

> HandleCrash actually crashes, after calling the handlers and logging the panic message.

So I expect we will still exit and get restarted even if we have a HandleCrash in here.

arjunrn (Contributor, Author) on Jun 29, 2021

So then the only thing HandleCrash() does (without any additional handlers) is to klog the stack trace. 🤷🏼
Will add the HandleCrash() here as well.

arjunrn (Contributor, Author) commented on Jun 29, 2021

/retest

            defer wg.Done()
            defer utilruntime.HandleCrash()
            wait.UntilWithContext(ctx, ctrl.worker, time.Second)
        }()
    }

    <-ctx.Done()
wking (Member)

Now that we have a wait group blocking on our child goroutines, we can probably drop this direct block on the passed ctx.
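
A minimal sketch of the shape this suggests (assumed, not the merged diff): since each `UntilWithContext` call only returns once ctx is done, `wg.Wait()` already covers the "block until shutdown" behavior and the explicit `<-ctx.Done()` adds nothing.

```go
// Sketch only; wg, workers, ctrl.worker, and ctx come from the hunk quoted above.
for i := 0; i < workers; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        defer utilruntime.HandleCrash()
        wait.UntilWithContext(ctx, ctrl.worker, time.Second)
    }()
}

// Returns only after ctx is done and every worker goroutine has exited,
// so no separate <-ctx.Done() is needed.
wg.Wait()
```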

`wait.UntilWithContext` repeatedly runs the function passed to it until the context passed alongside it is done, at which point it returns. Waiting on a wait group after the `UntilWithContext` calls ensures that the top-level goroutine does not return until all the spawned worker goroutines have terminated. Also added `HandleCrash()` to log any panics in inner goroutines.
wking (Member) left a comment

/lgtm

openshift-ci bot added the lgtm label on Jul 12, 2021
openshift-ci bot commented on Jul 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arjunrn, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label on Jul 12, 2021
wking (Member) commented on Jul 12, 2021

/retest

openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment from openshift-bot

openshift-merge-robot merged commit b54ea51 into openshift:master on Jul 13, 2021
arjunrn deleted the wait-autoupdate-complete branch on July 13, 2021 07:57