New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean shutdown of kcm, ccm and scheduler #110207
Clean shutdown of kcm, ccm and scheduler #110207
Conversation
@wojtek-t: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
424b0a8
to
9e78d54
Compare
4c483eb
to
c09bf96
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@@ -227,13 +232,14 @@ func Run(c *config.CompletedConfig, stopCh <-chan struct{}) error { | |||
controllerContext.ObjectOrMetadataInformerFactory.Start(stopCh) | |||
close(controllerContext.InformersStarted) | |||
|
|||
select {} | |||
<-ctx.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated, but doesn't feel that this run
closure has a mix of contexts and stopCh that is really confusing?
mainly because IIUIC this ctx.Done() is derived from the stopCh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to translate between function that expect stopCh and those that expect ctx (and ctx.Done()).
I think that's fine
select {} | ||
<-stopCh | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment in line 178 says
// Run runs the KubeControllerManagerOptions. This should never exit.
is ok to exit now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the comment.
If one want it to never exit, they can pass wait.NeverStop as an argument (which we do in line 138).
In tests, we want it to stop actually.
eventBroadcaster.StartStructuredLogging(0) | ||
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you need these 2 lines too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need them - but they were moved to Run() - see lines 185-189 in controllermanager.go above.
if errCh != nil { | ||
err, ok := <-errCh | ||
if ok && err != nil { | ||
klog.ErrorS(err, "Failed to run test server clearly") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/run/shutdown/ ???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// We need to start scheduleOne loop in a dedicated goroutine, | ||
// because scheduleOne function hangs on getting the next item | ||
// from the SchedulingQueue. | ||
// If there are no new pods to schedule, it will be hanging there | ||
// and if done in this goroutine it will be blocking closing | ||
// SchedulingQueue, in effect causing a deadlock on shutdown. | ||
go wait.UntilWithContext(ctx, sched.scheduleOne, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @alculquicondor
run(context.TODO(), controllerInitializers) | ||
panic("unreachable") | ||
ctx, _ := wait.ContextForChannel(stopCh) | ||
run(ctx, controllerInitializers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the run
seems that will be running forever, startControllers
doesn't leverage the context neither the stopCh ... that is the other funny part, we duplicate stop signals 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They actually do utilize it:
- we're passing ctx.Done() to startControllers
- they are using it (as stopCh) in two places
But thanks for catching the select{} - fixed that now.
eventBroadcaster.StartStructuredLogging(0) | ||
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need these too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do - we we do that now as part of Run() - see lines 146-150 in controllermanager.go above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aojea - comments addressed - PTAL
@@ -227,13 +232,14 @@ func Run(c *config.CompletedConfig, stopCh <-chan struct{}) error { | |||
controllerContext.ObjectOrMetadataInformerFactory.Start(stopCh) | |||
close(controllerContext.InformersStarted) | |||
|
|||
select {} | |||
<-ctx.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to translate between function that expect stopCh and those that expect ctx (and ctx.Done()).
I think that's fine
select {} | ||
<-stopCh | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the comment.
If one want it to never exit, they can pass wait.NeverStop as an argument (which we do in line 138).
In tests, we want it to stop actually.
eventBroadcaster.StartStructuredLogging(0) | ||
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need them - but they were moved to Run() - see lines 185-189 in controllermanager.go above.
if errCh != nil { | ||
err, ok := <-errCh | ||
if ok && err != nil { | ||
klog.ErrorS(err, "Failed to run test server clearly") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
run(context.TODO(), controllerInitializers) | ||
panic("unreachable") | ||
ctx, _ := wait.ContextForChannel(stopCh) | ||
run(ctx, controllerInitializers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They actually do utilize it:
- we're passing ctx.Done() to startControllers
- they are using it (as stopCh) in two places
But thanks for catching the select{} - fixed that now.
eventBroadcaster.StartStructuredLogging(0) | ||
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do - we we do that now as part of Run() - see lines 146-150 in controllermanager.go above
c09bf96
to
4e8ffc6
Compare
4e8ffc6
to
fe3616c
Compare
/test pull-kubernetes-verify-govet-levee pod scheduling timeout |
/lgtm |
/triage accepted |
Ref #108483
/kind cleanup
/priority important-longterm
/sig apps
/sig scheduler