change kubelet probe metrics to counter #76074

danielqsj · 2019-04-03T09:42:49Z

What type of PR is this?

/kind feature
/sig instrumentation node

What this PR does / why we need it:

As discussed in #75839, we prefer a counter over a gauge for the kubelet probe metrics.

Which issue(s) this PR fixes:

Fixes #75839

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Change kubelet probe metrics to counter type.
The metric `prober_probe_result` is replaced by `prober_probe_total`.
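
For illustration only (not part of this PR's diff): a minimal, stdlib-only sketch of why a counter suits probe results better than a gauge. The names `ProbeResult`, `lastResultGauge`, `resultCounter` and `replay` are hypothetical, the 0/1 gauge encoding and the `successful`/`failed` strings are assumptions, and the sketch models the two metric types by hand rather than using the Prometheus client.

```go
package main

import "fmt"

// ProbeResult is a hypothetical stand-in for a probe outcome.
type ProbeResult string

const (
	Success ProbeResult = "successful"
	Failure ProbeResult = "failed"
)

// lastResultGauge models a gauge-style metric: it only remembers the
// most recent result (assumed encoding: 0 = success, 1 = failure).
type lastResultGauge struct {
	value float64
}

func (g *lastResultGauge) Observe(r ProbeResult) {
	if r == Failure {
		g.value = 1
		return
	}
	g.value = 0
}

// resultCounter models a counter-style metric: one monotonically
// increasing count per result, so nothing is lost between scrapes.
type resultCounter struct {
	counts map[ProbeResult]uint64
}

func (c *resultCounter) Observe(r ProbeResult) {
	c.counts[r]++
}

// replay feeds the same sequence of probe results to both models and
// returns their state as a scraper would see it afterwards.
func replay(results []ProbeResult) (*lastResultGauge, *resultCounter) {
	g := &lastResultGauge{}
	c := &resultCounter{counts: make(map[ProbeResult]uint64)}
	for _, r := range results {
		g.Observe(r)
		c.Observe(r)
	}
	return g, c
}

func main() {
	// Three probes run between two scrapes; one of them fails.
	g, c := replay([]ProbeResult{Success, Failure, Success})
	fmt.Printf("gauge at scrape time: %v\n", g.value)
	fmt.Printf("counter at scrape time: successful=%d failed=%d\n",
		c.counts[Success], c.counts[Failure])
}
```

In this sketch the gauge reads as healthy at scrape time even though a probe failed in between, while the counter still records the failure. A rate over the counter can then be computed on the query side.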

danielqsj · 2019-04-03T09:45:27Z

/cc @brancz @logicalhan

brancz · 2019-04-03T11:41:21Z

Given that none of the popular alert/dashboard definitions use this metric, I'm inclined to just remove the old metric.

logicalhan

I'm okay with just deleting the old metric as well. As mentioned in the issue, I would be surprised if anyone is actually using it for alerting since it is probably pretty noisy.

logicalhan · 2019-04-03T17:19:43Z

pkg/kubelet/prober/worker.go

@@ -98,16 +101,27 @@ func newWorker(
 		w.initialValue = results.Success
 	}

-	w.proberResultsMetricLabels = prometheus.Labels{
-		"probe_type":     w.probeType.String(),
-		"container_name": w.container.Name,


Woah, this is weird. I realize you didn't write this, but curious if anyone knows why we would have multiple labels on a metric which would always share the same label value?

Are you talking about container and container_name?

Either. The previous implementation seems to store the same value in both the container label and the container_name label. The same thing happens for pod and pod_name, which I find quite odd.

You can verify it by outputting the actual metric output via curl.

prober_probe_result{container="etcd-container",container_name="etcd-container",namespace="kube-system",pod="etcd-server-events-kubernetes-master",pod_name="etcd-server-events-kubernetes-master",pod_uid="1234",probe_type="Liveness"} 0

Why do we have two labels that always carry the same value?

Super weird.

This was for migration purposes. For a long time these and cadvisor metrics used pod_name and container_name label keys, which violate the instrumentation guidelines. For migration purposes in 1.14 both are present.

Ah, that would explain it.
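
A rough sketch of the migration pattern described above, in which each guideline-conforming label key is emitted alongside its legacy alias with the same value. The helper names `withLegacyLabels` and `formatLabels` and the `legacyAliases` table are hypothetical and only illustrate the idea; they are not taken from the kubelet code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// legacyAliases maps each new label key to the legacy key that is kept
// during the migration window (hypothetical table for illustration).
var legacyAliases = map[string]string{
	"pod":       "pod_name",
	"container": "container_name",
}

// withLegacyLabels returns a copy of labels in which every key that has a
// legacy alias is duplicated under that alias with the same value.
func withLegacyLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels)+len(legacyAliases))
	for k, v := range labels {
		out[k] = v
	}
	for newKey, oldKey := range legacyAliases {
		if v, ok := labels[newKey]; ok {
			out[oldKey] = v
		}
	}
	return out
}

// formatLabels renders labels as key="value" pairs sorted by key.
func formatLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return strings.Join(parts, ",")
}

func main() {
	labels := withLegacyLabels(map[string]string{
		"container":  "etcd-container",
		"namespace":  "kube-system",
		"pod":        "etcd-server-events-kubernetes-master",
		"probe_type": "Liveness",
	})
	fmt.Printf("{%s}\n", formatLabels(labels))
}
```

Once consumers have moved to the new keys, dropping the alias table would remove the duplicated labels without touching the call sites.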

danielqsj · 2019-04-04T09:05:34Z

@brancz @logicalhan removed the old metric prober_probe_result. PTAL

brancz · 2019-04-04T09:09:07Z

looks good from instrumentation side

/lgtm

still needs a kubelet approver though @derekwaynecarr @tallclair

logicalhan

/lgtm

@dashpole, thoughts?

dashpole · 2019-04-05T22:01:57Z

/lgtm
This looks reasonable to me.

danielqsj · 2019-04-08T01:31:44Z

/retest

danielqsj · 2019-04-08T02:05:26Z

Friendly ping @derekwaynecarr @tallclair, could you please help review it?

tallclair · 2019-04-11T18:04:47Z

/approve

k8s-ci-robot · 2019-04-11T18:04:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielqsj, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/kubelet/OWNERS~~ [tallclair]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot · 2019-04-12T01:38:52Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

danielqsj · 2019-04-12T01:39:10Z

/retest

fejta-bot · 2019-04-12T05:50:50Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

brancz · 2019-04-12T07:12:10Z

/retest

k8s-ci-robot requested review from dims and yujuhong April 3, 2019 09:43

k8s-ci-robot added the area/kubelet label Apr 3, 2019

change kubelet probe metrics to counter type

295d672

danielqsj force-pushed the probe branch from 4b53d1f to 295d672 Compare April 3, 2019 09:44

k8s-ci-robot requested review from brancz and logicalhan April 3, 2019 09:45

logicalhan reviewed Apr 3, 2019

View reviewed changes

remove metrics prober_probe_result

6d041ab

k8s-ci-robot assigned brancz Apr 4, 2019

k8s-ci-robot added the lgtm label Apr 4, 2019

k8s-ci-robot assigned logicalhan Apr 4, 2019

logicalhan reviewed Apr 4, 2019

View reviewed changes

k8s-ci-robot assigned dashpole Apr 5, 2019

k8s-ci-robot added the approved label Apr 11, 2019

k8s-ci-robot merged commit b7858e3 into kubernetes:master Apr 12, 2019
