-
Notifications
You must be signed in to change notification settings - Fork 39.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Health checks for controller managers. #104667
Health checks for controller managers. #104667
Conversation
// HealthCheckable defines a controller that allows the controller manager | ||
// to include it in the health checks. | ||
// | ||
// If a controller implements HealthCheckable, and the returned HealthChecker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor: These details seem to belong with the godoc for the below HealthChecker()
func declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is both. At the type level, the point is that the controller still has a chance to disable health check even if it implements this interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add that to HealthChecker()
too
if ctrl != nil { | ||
if healthCheckable, ok := ctrl.(controller.HealthCheckable); ok { | ||
if realCheck := healthCheckable.HealthChecker(); realCheck != nil { | ||
check = realCheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could two controllers provide health checks with the same name? What would happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standard healthz
mechanism disallows that. It will panic when two checks try to mount on the same path. e.g. two checks named "foo", first mounts on "/healthz/foo", second mounts on the same path, which will panic (from implementation of mux).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately, if two controllers provide the same health checks, the controller manager would panic on start, catching the error during any e2e test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good. Would it possible for us to validate the names in the provider checks match the controller name? If not, what we have might be okay, but if we enforce the naming that would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is another way. We won't let the controller decide the name (with a new interface instead of healthz.HealthChecker). The controller manager decide the name of a check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent.
for controller managers.
in KCM and CCM.
/lgtm |
/assign @cheftako I think this is ready for approval. |
/lgtm |
/assign @sttts For |
/approve this doesn't actually implement any health checks yet, right? |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheftako, jiahuif, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@liggitt |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR is recreated from #95454 to support health checks for each controller, in controller managers.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
That MAY be true for
cloud-controller-manager
. CCM SHOULD include health checks. However, impl. of CCM is up to each cloud provider.Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: