Expose kubelet health checks using new prometheus endpoint #61369

Merged
merged 1 commit into from Mar 31, 2018

Conversation

@rramkumar1
Member

rramkumar1 commented Mar 19, 2018

What this PR does / why we need it:
Expose the results of kubelet liveness and readiness probes through a new kubelet endpoint, /metrics/probes (originally proposed as /containerHealth and renamed during review). The endpoint exposes a Prometheus gauge metric, prober_probe_result. Below is a snippet of the output when the endpoint is queried.

rramkumar@e2e-test-rramkumar-master ~ $ curl localhost:10255/metrics/probes
# HELP prober_probe_result The result of a liveness or readiness probe for a container.
# TYPE prober_probe_result gauge
prober_probe_result{container_name="kube-apiserver",namespace="kube-system",pod_name="kube-apiserver-e2e-test-rramkumar-master",pod_uid="949e11ad296ad9e3c842fd900f8cc723",probe_type="Liveness"} 0
prober_probe_result{container_name="kube-controller-manager",namespace="kube-system",pod_name="kube-controller-manager-e2e-test-rramkumar-master",pod_uid="0abfc37840bba279706ec39ae53a924c",probe_type="Liveness"} 0
prober_probe_result{container_name="kube-scheduler",namespace="kube-system",pod_name="kube-scheduler-e2e-test-rramkumar-master",pod_uid="0cd4171f9c806808291e6e24f99f0454",probe_type="Liveness"} 0
prober_probe_result{container_name="l7-lb-controller",namespace="kube-system",pod_name="l7-lb-controller-v0.9.8-alpha.2-e2e-test-rramkumar-master",pod_uid="968c792f4c1772566c71403dca2407f9",probe_type="Liveness"} 0
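
For readers who want to consume this output programmatically, the following is a minimal sketch of parsing one sample line with the Go standard library. The `probeSample` type and `parseProbeLine` function are hypothetical names introduced for illustration, not part of this PR. The parser assumes label values contain no commas, quotes, or escape sequences, which holds for the output above but not for the Prometheus text format in general; a real consumer would more likely use a Prometheus parsing library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// probeSample is a hypothetical holder for one parsed sample line.
type probeSample struct {
	Labels map[string]string
	Value  float64
}

// parseProbeLine parses a single line of the form
//
//	prober_probe_result{label="value",...} 0
//
// Assumption: label values contain no commas, quotes, or escapes.
func parseProbeLine(line string) (probeSample, error) {
	const prefix = "prober_probe_result{"
	if !strings.HasPrefix(line, prefix) {
		return probeSample{}, fmt.Errorf("not a prober_probe_result sample: %q", line)
	}
	rest := strings.TrimPrefix(line, prefix)
	end := strings.LastIndex(rest, "}")
	if end < 0 {
		return probeSample{}, fmt.Errorf("missing closing brace: %q", line)
	}
	labels := make(map[string]string)
	for _, pair := range strings.Split(rest[:end], ",") {
		eq := strings.Index(pair, "=")
		if eq < 0 {
			return probeSample{}, fmt.Errorf("malformed label %q", pair)
		}
		// Strip the surrounding double quotes from the label value.
		labels[pair[:eq]] = strings.Trim(pair[eq+1:], `"`)
	}
	value, err := strconv.ParseFloat(strings.TrimSpace(rest[end+1:]), 64)
	if err != nil {
		return probeSample{}, fmt.Errorf("bad sample value: %w", err)
	}
	return probeSample{Labels: labels, Value: value}, nil
}

func main() {
	line := `prober_probe_result{container_name="kube-scheduler",namespace="kube-system",pod_name="kube-scheduler-e2e-test-rramkumar-master",pod_uid="0cd4171f9c806808291e6e24f99f0454",probe_type="Liveness"} 0`
	s, err := parseProbeLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s/%s %s=%v\n", s.Labels["namespace"], s.Labels["container_name"], s.Labels["probe_type"], s.Value)
}
```

Comment lines beginning with `#` (the HELP and TYPE lines) are rejected with an error, so a caller would skip them before parsing.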

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #58235

Release note:

The kubelet now serves a new endpoint, /metrics/probes, which exposes a Prometheus metric containing the liveness and/or readiness probe results for a container.
@rramkumar1

Member

rramkumar1 commented Mar 19, 2018

/assign @loburm
For initial review of instrumentation piece.

@loburm

Overall looks OK to me.

@@ -58,6 +58,17 @@ func (r Result) String() string {
}
}
func (r Result) ToPrometheusType() float64 {

@loburm

loburm Mar 21, 2018

Contributor

Please add a comment to this function.
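
As an illustration of what a documented version of this conversion might look like, here is a minimal, self-contained sketch. The body of `ToPrometheusType` is not shown in this thread, so the mapping below (0 for success, 1 for failure, -1 for anything else) is an assumption, and the `Result` type with its `Success` and `Failure` constants is a stand-in for the real prober result type.

```go
package main

import "fmt"

// Result is a stand-in for the prober result type; the real definition
// is not shown in this thread.
type Result int

const (
	Success Result = iota
	Failure
)

// ToPrometheusType translates a Result into a float64 suitable for use as a
// Prometheus gauge value.
// Assumed mapping (not confirmed by the thread): Success -> 0, Failure -> 1,
// any other value -> -1.
func (r Result) ToPrometheusType() float64 {
	switch r {
	case Success:
		return 0
	case Failure:
		return 1
	default:
		return -1
	}
}

func main() {
	fmt.Println(Success.ToPrometheusType(), Failure.ToPrometheusType(), Result(42).ToPrometheusType())
}
```

Under this assumed mapping, a gauge value of 0 would indicate a passing probe.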

@rramkumar1
Name: "probe_result",
Help: "The result of a liveness or readiness probe for a container.",
},
[]string{"probe_type", "container_name", "pod_name", "namespace", "pod_uid"},

@loburm

loburm Mar 21, 2018

Contributor

Not sure we need both pod_uid and namespace+pod_name; each of these should uniquely identify the pod.

@rramkumar1

rramkumar1 Mar 21, 2018

Member

I think in #58827 we decided that it would be good to have both?

@rramkumar1

Member

rramkumar1 commented Mar 21, 2018

/assign @tallclair

@@ -66,6 +67,7 @@ import (
const (
metricsPath = "/metrics"
cadvisorMetricsPath = "/metrics/cadvisor"
containerHealthPath = "/containerHealth"

@tallclair

tallclair Mar 26, 2018

Member

Why is this exposed on a new endpoint, rather than /metrics? Minimally I think this should be /metrics/probes, but maybe it can just be in metrics?

@rramkumar1

rramkumar1 Mar 26, 2018

Member

@loburm did not want this under /metrics. His reasoning was that anything under that endpoint should be metrics about the operation of kubelet only, not about its probes. Maybe we should revisit this discussion?

@loburm

loburm Mar 27, 2018

Contributor

This question was discussed at a sig-instrumentation meeting, and we agreed that the /metrics endpoint should not expose information about the state of containers running on the node. That endpoint contains only kubelet operational metrics, and we want to keep it that way.

@tallclair

tallclair Mar 28, 2018

Member

Ack. What do you think about /metrics/probes then? /metrics/cadvisor already has per-container metrics, I believe.

@rramkumar1

rramkumar1 Mar 28, 2018

Member

I'm fine with /metrics/probes. @loburm does that sound okay?

@loburm

loburm Mar 29, 2018

Contributor

Yeah sure.
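
The layout agreed on above, with probe results served next to but separate from the kubelet's operational metrics, can be sketched with the Go standard library as follows. This is only an illustration: the kubelet's actual server wiring is not shown in this thread, `proberMetricsPath`, `newMetricsMux`, `textHandler`, and `fetch` are hypothetical names, and the handlers return placeholder text instead of real metrics.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const (
	metricsPath         = "/metrics"
	cadvisorMetricsPath = "/metrics/cadvisor"
	proberMetricsPath   = "/metrics/probes" // hypothetical constant name
)

// textHandler returns a handler that writes a fixed placeholder body.
func textHandler(body string) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, body)
	}
}

// newMetricsMux registers each metrics path with its own handler, so that
// /metrics keeps serving only operational metrics while probe results live
// under /metrics/probes.
func newMetricsMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc(metricsPath, textHandler("kubelet operational metrics"))
	mux.HandleFunc(cadvisorMetricsPath, textHandler("per-container cadvisor metrics"))
	mux.HandleFunc(proberMetricsPath, textHandler("probe results"))
	return mux
}

// fetch performs an in-memory GET against the mux and returns the body.
func fetch(mux *http.ServeMux, path string) string {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	mux := newMetricsMux()
	for _, p := range []string{metricsPath, cadvisorMetricsPath, proberMetricsPath} {
		fmt.Printf("%s -> %s", p, fetch(mux, p))
	}
}
```

Because each pattern is registered without a trailing slash, each one matches only its exact path, so a scrape of /metrics never returns the probe results.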

@rramkumar1
@tallclair

one minor nit, then LGTM

@@ -31,6 +32,16 @@ import (
"k8s.io/kubernetes/pkg/kubelet/util/format"
)
// ProberResults stores the results of a probe as prometheus metrics.
var ProberResults = prometheus.NewGaugeVec(
prometheus.GaugeOpts{

@tallclair

tallclair Mar 30, 2018

Member

Should this set a namespace?

@rramkumar1

rramkumar1 Mar 30, 2018

Member

I don't think any of the metrics declared in pkg/kubelet set the namespace, so to be consistent, I don't think I should here.
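
For context on what setting a namespace would change: the Prometheus Go client composes a metric's full name by joining the non-empty namespace, subsystem, and name with underscores. The sketch below reimplements that joining rule with the standard library only; `fqName` is a hypothetical helper. That the subsystem is "prober" is an inference from the exposed name prober_probe_result, since the thread shows only the Name field, and the "kubelet" namespace is purely an example value.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty parts with underscores, mirroring how the
// Prometheus Go client builds a fully qualified metric name. An empty name
// yields an empty result.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// Assumed subsystem "prober", no namespace: matches the sample output.
	fmt.Println(fqName("", "prober", "probe_result"))
	// With a hypothetical "kubelet" namespace the exposed name would change.
	fmt.Println(fqName("kubelet", "prober", "probe_result"))
}
```

Adding a namespace would therefore rename the exposed metric, which is one reason to stay consistent with the other metrics in the package.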

@tallclair

Member

tallclair commented Mar 31, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 31, 2018

@k8s-ci-robot

Contributor

k8s-ci-robot commented Mar 31, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rramkumar1, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fejta-bot

fejta-bot commented Mar 31, 2018

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-merge-robot

Contributor

k8s-merge-robot commented Mar 31, 2018

Automatic merge from submit-queue (batch tested with PRs 61894, 61369). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 20f7f37 into kubernetes:master Mar 31, 2018

13 of 14 checks passed

pull-kubernetes-e2e-gce-device-plugin-gpu: Job triggered.
Submit Queue: Queued to run github e2e tests a second time.
cla/linuxfoundation: rramkumar1 authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.