Added waitgroups for autoupdate workers to complete before stopping #613

Merged
13 changes: 9 additions & 4 deletions pkg/autoupdate/autoupdate.go
@@ -4,6 +4,7 @@ import (
 	"context"
 	"fmt"
 	"sort"
+	"sync"
 	"time"
 
 	"github.com/blang/semver/v4"
@@ -92,13 +93,17 @@ func (ctrl *Controller) Run(ctx context.Context, workers int) error {
 	if !cache.WaitForCacheSync(ctx.Done(), ctrl.cacheSynced...) {
 		return fmt.Errorf("caches never synchronized: %w", ctx.Err())
 	}
 
+	var wg sync.WaitGroup
+	wg.Add(workers)
 	for i := 0; i < workers; i++ {
-		// FIXME: actually wait until these complete if the Context is canceled. And possibly add utilruntime.HandleCrash.
Contributor Author
@arjunrn arjunrn Jun 18, 2021
A runtime.HandleCrash() is not required here because it's already part of the JitterUntil function, which is further down the call stack.

Member

My impression was that HandleCrash() would ideally be called whenever we launch a new goroutine. Walking the stack, I don't see any goroutines being launched before our UntilWithContext call and the underlying JitterUntil, but I dunno if we want to rely on that current lack of intermediate goroutines as part of the library's committed API. Does it hurt to have an extra call here?

Contributor Author

Adding a HandleCrash() here would recover from any panics in the apimachinery code, because the user function is already covered by the inner HandleCrash(). This is an issue because wait.UntilWithContext would return and the autoupdate worker would terminate. If this happened in all workers, autoupdate would effectively stop working. It would be better if the program panicked so that the pod restarts. We can expect the apimachinery code to be well tested, so we shouldn't need to guard against it.

Member

From the docs:

HandleCrash actually crashes, after calling the handlers and logging the panic message.

So I expect we will still exit and get restarted even if we have a HandleCrash in here.

Contributor Author
@arjunrn arjunrn Jun 29, 2021

So then the only thing HandleCrash() does (without any additional handlers) is to klog the stack trace. 🤷🏼
Will add the HandleCrash() here as well.

-		go wait.UntilWithContext(ctx, ctrl.worker, time.Second)
+		go func() {
+			defer wg.Done()
+			defer utilruntime.HandleCrash()
+			wait.UntilWithContext(ctx, ctrl.worker, time.Second)
+		}()
 	}
 
 	<-ctx.Done()
+	wg.Wait()
 	return nil
 }
