proxy: followup to last-queued-change metric #90972

squeed · 2020-05-11T10:22:13Z

Fixes two small issues with the metric added in #90175:

1. Bump the timestamp on initial informer sync. Otherwise it remains 0 if kube-proxy is restarted in a quiescent cluster, which isn't quite right.
2. Bump the timestamp even if no healthz server is specified.

/kind cleanup

What this PR does / why we need it:
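The metric introduced in #90175 doesn't quite match expectations: its timestamp stays at 0 after kube-proxy restarts in a quiescent cluster, and it is not bumped when no healthz server is configured.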

Does this PR introduce a user-facing change?:
No, because this has not yet been released.

NONE

squeed · 2020-05-11T10:22:54Z

/sig network

squeed · 2020-05-11T10:23:03Z

/cc @danwinship @dcbw

danwinship · 2020-05-11T13:49:53Z

> Bump the timestamp on initial informer sync. Otherwise it remains 0 if restarting kube-proxy in a quiescent cluster, which isn't quite right.

But there's still a race condition at startup even with this patch, isn't there? If you look at the metrics before the initial sync, you'll get nonsense?

I'm not sure what the expected behavior for timestamp metrics in this case is...

Kube-proxy always forces a resync shortly after startup, so arguably just starting up kube-proxy counts as requesting a proxy sync, and it might be reasonable to just set `SyncProxyRulesLastQueuedTimestamp` at startup...

squeed · 2020-05-11T15:31:12Z

> But there's still a race condition at startup even with this patch, isn't there? If you look at the metrics before the initial sync, you'll get nonsense?
>
> I'm not sure what the expected behavior for timestamp metrics in this case is...

It is indeed awkward for all "gauge" metrics. Now that Prometheus handles stale metrics better, the convention is to not report an unobserved metric. Unfortunately, the existing library (and the k8s wrapper around it) makes this somewhat awkward. I'm asking around to see if there's a reasonable way to do this.

> Kube-proxy always forces a resync shortly after startup, so arguably just starting up kube-proxy counts as requesting a proxy sync, and it might be reasonable to just set `SyncProxyRulesLastQueuedTimestamp` at startup...

Yeah, that might be the right thing to do. Arguably, for a running system, the user doesn't care about the difference between a slow informer sync and a slow iptables sync. Either way, there's latency.

squeed · 2020-05-11T16:49:50Z

OK, updated to bump the timestamp on proxy startup.

ksubrmnn · 2020-05-11T16:55:16Z

@JocelynBerrendonner @dineshgovindasamy FYI

danwinship · 2020-05-11T17:07:05Z

/lgtm

squeed · 2020-05-12T08:50:25Z

/retest

squeed · 2020-05-12T08:51:51Z

Update: I asked around, and the Prometheus Go client library (client_golang) doesn't currently have an ergonomic way of exposing a Gauge (timestamp) that has no value until first observed. In this particular case, we don't care anymore since we're bumping the value on startup.

danwinship · 2020-05-12T18:07:32Z

oh forgot that I have to say
/approve
too

k8s-ci-robot · 2020-05-12T18:08:10Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, squeed

Needs approval from an approver in each of these files:

~~pkg/proxy/OWNERS~~ [danwinship]

fejta-bot · 2020-05-12T20:52:18Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

fejta-bot · 2020-05-13T00:01:18Z

/retest

fejta-bot · 2020-05-13T03:10:16Z

/retest

k8s-ci-robot · 2020-05-13T03:55:43Z

@squeed: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Rerun command |
|---|---|---|
| pull-kubernetes-e2e-gci-gce-ipvs | `042daa2` | `/test pull-kubernetes-e2e-gci-gce-ipvs` |

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

k8s-ci-robot added the sig/network label and removed the needs-sig label May 11, 2020

k8s-ci-robot requested review from danwinship and dcbw May 11, 2020 10:23

k8s-ci-robot added the area/ipvs label May 11, 2020

k8s-ci-robot requested review from freehan and ksubrmnn May 11, 2020 10:23

squeed force-pushed the proxy-last-queued-metric branch from cf0cfc6 to 042daa2 May 11, 2020 16:49

k8s-ci-robot assigned danwinship May 11, 2020

k8s-ci-robot added the lgtm label May 11, 2020

k8s-ci-robot added the approved label May 12, 2020

k8s-ci-robot merged commit 592a79c into kubernetes:master May 13, 2020

k8s-ci-robot added this to the v1.19 milestone May 13, 2020
