Expose kubelet health checks using new prometheus endpoint #61369

Merged

rramkumar1
Contributor

@rramkumar1 rramkumar1 commented Mar 19, 2018

What this PR does / why we need it:
Expose the results of kubelet liveness and readiness probes through a new kubelet endpoint, /metrics/probes. This endpoint exposes a Prometheus metric. Below is a snippet of the output when that endpoint is queried.

rramkumar@e2e-test-rramkumar-master ~ $ curl localhost:10255/metrics/probes
# HELP prober_probe_result The result of a liveness or readiness probe for a container.
# TYPE prober_probe_result gauge
prober_probe_result{container_name="kube-apiserver",namespace="kube-system",pod_name="kube-apiserver-e2e-test-rramkumar-master",pod_uid="949e11ad296ad9e3c842fd900f8cc723",probe_type="Liveness"} 0
prober_probe_result{container_name="kube-controller-manager",namespace="kube-system",pod_name="kube-controller-manager-e2e-test-rramkumar-master",pod_uid="0abfc37840bba279706ec39ae53a924c",probe_type="Liveness"} 0
prober_probe_result{container_name="kube-scheduler",namespace="kube-system",pod_name="kube-scheduler-e2e-test-rramkumar-master",pod_uid="0cd4171f9c806808291e6e24f99f0454",probe_type="Liveness"} 0
prober_probe_result{container_name="l7-lb-controller",namespace="kube-system",pod_name="l7-lb-controller-v0.9.8-alpha.2-e2e-test-rramkumar-master",pod_uid="968c792f4c1772566c71403dca2407f9",probe_type="Liveness"} 0
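
A consumer of this endpoint might want to list only the series whose probes are not passing. The following is a minimal sketch of that, assuming a passing probe is reported as 0 (which the output above suggests but does not state) and that each sample line ends with its value and carries no trailing timestamp. `parseSample` and `failing` are hypothetical helper names, and the parsing is deliberately naive rather than a full implementation of the Prometheus text format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSample splits one exposition line into its series (name plus labels)
// and its value. Comment lines (# HELP, # TYPE) and blank lines return ok=false.
// Assumption: the value is the last space-separated field (no timestamp).
func parseSample(line string) (series string, value float64, ok bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return "", 0, false
	}
	i := strings.LastIndex(line, " ")
	if i < 0 {
		return "", 0, false
	}
	v, err := strconv.ParseFloat(line[i+1:], 64)
	if err != nil {
		return "", 0, false
	}
	return line[:i], v, true
}

// failing returns the series whose value is non-zero, on the assumption
// that 0 means the probe succeeded.
func failing(body string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		if s, v, ok := parseSample(sc.Text()); ok && v != 0 {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Hypothetical response body; the container names are made up.
	body := `# TYPE prober_probe_result gauge
prober_probe_result{container_name="a",probe_type="Liveness"} 0
prober_probe_result{container_name="b",probe_type="Readiness"} 1
`
	for _, s := range failing(body) {
		fmt.Println(s)
	}
}
```

In practice the body would come from an HTTP GET against the kubelet, and a real scraper would use a proper exposition-format parser instead of splitting on the last space.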

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #58235

Release note:

The kubelet now serves a new endpoint, /metrics/probes, which exposes a Prometheus metric containing the liveness and/or readiness probe results for a container.

@k8s-ci-robot k8s-ci-robot added the release-note, size/M, and cncf-cla: yes labels Mar 19, 2018
@rramkumar1
Contributor Author

rramkumar1 commented Mar 19, 2018

/assign @loburm
For initial review of instrumentation piece.

Contributor

@loburm loburm left a comment

Overall looks OK to me.

@@ -58,6 +58,17 @@ func (r Result) String() string {
}
}

func (r Result) ToPrometheusType() float64 {
Contributor

Please add a comment to this function.

Contributor Author

Done
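
For readers who have not opened the diff, a commented version of such a conversion might look like the sketch below. It is not the code merged in this PR: the `Result` type and its constants are redeclared here (with an assumed integer shape) so the example stands alone, and the numeric mapping of 0 for success, 1 for failure, and -1 for anything else is an assumption chosen to agree with the `0` values in the sample output in the PR description.

```go
package main

import "fmt"

// Result is a stand-in for the prober's result type (assumed shape).
type Result int

const (
	Unknown Result = iota - 1 // -1: no result recorded (assumed)
	Success                   //  0: the probe passed
	Failure                   //  1: the probe failed
)

// ToPrometheusType converts a Result into the float64 that a Prometheus
// gauge stores. The specific numbers are an assumption of this sketch.
func (r Result) ToPrometheusType() float64 {
	switch r {
	case Success:
		return 0
	case Failure:
		return 1
	default:
		return -1
	}
}

func main() {
	fmt.Println(Success.ToPrometheusType(), Failure.ToPrometheusType())
}
```

Keeping the mapping in one documented method means the gauge semantics are defined in a single place rather than at every call site that sets the metric.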

Name: "probe_result",
Help: "The result of a liveness or readiness probe for a container.",
},
[]string{"probe_type", "container_name", "pod_name", "namespace", "pod_uid"},
Contributor

Not sure we need both pod_uid and namespace+pod_name; each of these should uniquely identify the pod.

Contributor Author

I think in #58827 we decided that it would be good to have both?

@rramkumar1
Contributor Author

/assign @tallclair

@@ -66,6 +67,7 @@ import (
const (
metricsPath = "/metrics"
cadvisorMetricsPath = "/metrics/cadvisor"
containerHealthPath = "/containerHealth"
Member

Why is this exposed on a new endpoint, rather than /metrics? Minimally I think this should be /metrics/probes, but maybe it can just be in metrics?

Contributor Author

@rramkumar1 rramkumar1 Mar 26, 2018

@loburm did not want this under /metrics. His reasoning was that anything under that endpoint should be metrics about the operation of kubelet only, not about its probes. Maybe we should revisit this discussion?

Contributor

This question was discussed at a sig-instrumentation meeting, and we agreed that the /metrics endpoint should not expose information about the state of containers running on the node. That endpoint contains only kubelet operational metrics, and we want to keep it that way.

Member

Ack. What do you think about /metrics/probes then? /metrics/cadvisor already has per-container metrics, I believe.

Contributor Author

I'm fine with /metrics/probes. @loburm does that sound okay?

Contributor

Yeah sure.

Contributor Author

Done.
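
The layout agreed on in this thread, with kubelet operational metrics at /metrics and probe results at a separate /metrics/probes path, can be illustrated with a standard-library sketch. This is not the kubelet's server code: the handlers return placeholder bodies, and `probesPath`, `newMux`, and `get` are hypothetical names. The point is only that the two paths are registered independently, so a scrape of /metrics does not return per-container probe series.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

const (
	metricsPath = "/metrics"
	probesPath  = "/metrics/probes" // hypothetical constant name
)

// Placeholder bodies standing in for real metric output.
const (
	kubeletBody = "# kubelet operational metrics\n"
	probesBody  = "# per-container probe results\n"
)

// newMux registers each metrics path with its own handler.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc(metricsPath, func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, kubeletBody)
	})
	mux.HandleFunc(probesPath, func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, probesBody)
	})
	return mux
}

// get issues an in-memory GET against h and returns the response body.
func get(h http.Handler, path string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	mux := newMux()
	fmt.Print(get(mux, metricsPath))
	fmt.Print(get(mux, probesPath))
}
```

Because neither pattern ends in a slash, each one matches only its exact path, which keeps the operational and per-container outputs from overlapping.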

@rramkumar1 rramkumar1 force-pushed the expose-kubelet-health-checks branch 3 times, most recently from a2e4115 to 01d51ba, March 30, 2018 17:29
Member

@tallclair tallclair left a comment

one minor nit, then LGTM

@@ -31,6 +32,16 @@ import (
"k8s.io/kubernetes/pkg/kubelet/util/format"
)

// ProberResults stores the results of a probe as prometheus metrics.
var ProberResults = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Member

Should this set a namespace?

Contributor Author

@rramkumar1 rramkumar1 Mar 30, 2018

I don't think any of the metrics declared in pkg/kubelet set a namespace, so for consistency I don't think I should set one here.
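
As background for the namespace question: in the Prometheus Go client, a metric's full name is built from the optional Namespace and Subsystem fields plus Name, joined with underscores. The sketch below reimplements that joining rule with the standard library only, following the documented behaviour of `prometheus.BuildFQName`; `fqName` is a hypothetical name. The "prober" subsystem is inferred from the `prober_probe_result` name in the sample output, and the "kubelet" namespace is purely illustrative of what setting one would do.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty parts with "_". An empty name yields "",
// regardless of namespace and subsystem.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	var parts []string
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// No namespace: matches the metric name seen in the sample output.
	fmt.Println(fqName("", "prober", "probe_result"))
	// Illustrative namespace: the exposed name gains a prefix.
	fmt.Println(fqName("kubelet", "prober", "probe_result"))
}
```

Setting a namespace would therefore rename the exposed series, which is why consistency with the other kubelet metrics matters for anyone writing queries against them.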

@tallclair
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 31, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rramkumar1, tallclair

@k8s-ci-robot k8s-ci-robot added the approved label Mar 31, 2018
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 61894, 61369). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 20f7f37 into kubernetes:master Mar 31, 2018
@brancz
Member

brancz commented Dec 15, 2018

Hi, next time when adding metrics please tag sig-instrumentation, as these metrics don't comply with the Kubernetes metrics instrumentation guidelines. Adding these metrics to the metrics overhaul targeted for 1.14 (kubernetes/enhancements#655).

Successfully merging this pull request may close these issues.

Result of kubelet health checks not exposed
7 participants