-
Notifications
You must be signed in to change notification settings - Fork 39k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose kubelet health checks using new prometheus endpoint #61369
Expose kubelet health checks using new prometheus endpoint #61369
Conversation
/assign @loburm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks OK for me.
@@ -58,6 +58,17 @@ func (r Result) String() string { | |||
} | |||
} | |||
|
|||
func (r Result) ToPrometheusType() float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Name: "probe_result", | ||
Help: "The result of a liveness or readiness probe for a container.", | ||
}, | ||
[]string{"probe_type", "container_name", "pod_name", "namespace", "pod_uid"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if we need both pod_uid and namespace+pod_name each of this should uniquely identify pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in #58827 we decided that it would be good to have both?
365e4a8
to
b860cad
Compare
/assign @tallclair |
pkg/kubelet/server/server.go
Outdated
@@ -66,6 +67,7 @@ import ( | |||
const ( | |||
metricsPath = "/metrics" | |||
cadvisorMetricsPath = "/metrics/cadvisor" | |||
containerHealthPath = "/containerHealth" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this exposed on a new endpoint, rather than /metrics
? Minimally I think this should be /metrics/probes
, but maybe it can just be in metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@loburm did not want this under /metrics. His reasoning was that anything under that endpoint should be metrics about the operation of kubelet only, not about its probes. Maybe we should revisit this discussion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This question was discussed on sig-instrumentation meeting and we have agreed that /metrics endpoint should not expose information about state of containers running on the node. This endpoint contains only kubelet operational metrics and we wanted to keep it like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. What do you think about /metrics/probes
then? /metrics/cadvisor
already has per-container metrics, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with /metrics/probes. @loburm does that sound okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
a2e4115
to
01d51ba
Compare
01d51ba
to
c0c5566
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one minor nit, then LGTM
@@ -31,6 +32,16 @@ import ( | |||
"k8s.io/kubernetes/pkg/kubelet/util/format" | |||
) | |||
|
|||
// ProberResults stores the results of a probe as prometheus metrics. | |||
var ProberResults = prometheus.NewGaugeVec( | |||
prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this set a namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any of the metrics declared in pkg/kubelet set the namespace so to be consistent I don't think I should here.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rramkumar1, tallclair The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Review the full test history for this PR. Silence the bot with an |
Automatic merge from submit-queue (batch tested with PRs 61894, 61369). If you want to cherry-pick this change to another branch, please follow the instructions here. |
Hi, next time when adding metrics please tag sig-instrumentation, as these metrics don't comply with the Kubernetes metrics instrumentation guidelines. Adding these metrics to the metrics overhaul targeted for 1.14 (kubernetes/enhancements#655). |
What this PR does / why we need it:
Expose the results of kubelet liveness and readiness probes through a new endpoint on the kubelet called /containerHealth. This endpoint will expose a Prometheus metric. Below is a snippet of output when that endpoint is queried.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #58235
Release note: