proxy: followup to last-queued-change metric #90972

Merged
k8s-ci-robot merged 1 commit into kubernetes:master on May 13, 2020


squeed
Contributor

@squeed squeed commented May 11, 2020

Fixes two small issues with the metric added in #90175:

  1. Bump the timestamp on initial informer sync. Otherwise it remains 0 if restarting kube-proxy in a quiescent cluster, which isn't quite right.
  2. Bump the timestamp even if no healthz server is specified.
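
The two fixes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `timestampGauge`, `healthzUpdater`, `proxier`, `queueChange`, and `onInformersSynced` are hypothetical names, and the gauge is a standard-library stand-in rather than the real metrics wrapper.

```go
package main

import (
	"sync/atomic"
	"time"
)

// timestampGauge is a hypothetical stand-in for the last-queued-change
// metric. It holds a Unix timestamp in seconds; 0 means "never set".
type timestampGauge struct{ unix int64 }

func (g *timestampGauge) setToCurrentTime() {
	atomic.StoreInt64(&g.unix, time.Now().Unix())
}

func (g *timestampGauge) get() int64 { return atomic.LoadInt64(&g.unix) }

// healthzUpdater is a hypothetical interface for an optional healthz server.
type healthzUpdater interface{ QueuedUpdate() }

// proxier is a hypothetical, heavily reduced model of a proxy implementation.
type proxier struct {
	healthz    healthzUpdater // nil when no healthz server is configured
	lastQueued timestampGauge
}

// queueChange records that a change is waiting to be synced.
func (p *proxier) queueChange() {
	if p.healthz != nil {
		p.healthz.QueuedUpdate()
	}
	// Fix 2: the metric is bumped outside the nil check, so it is updated
	// even when no healthz server is configured.
	p.lastQueued.setToCurrentTime()
}

// onInformersSynced models fix 1: the initial informer sync is treated as
// a queued change, so the gauge does not stay at 0 in a quiescent cluster.
func (p *proxier) onInformersSynced() { p.queueChange() }
```

With this shape, a `proxier` built without a healthz server still reports a non-zero timestamp after `onInformersSynced()` runs.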

/kind cleanup

What this PR does / why we need it:
The metrics introduced in #90175 don't quite match expectations.

Does this PR introduce a user-facing change?:
No, because this has not yet been released.

NONE

@k8s-ci-robot k8s-ci-robot added the following labels May 11, 2020:

- `release-note-none`: Denotes a PR that doesn't merit a release note.
- `kind/cleanup`: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
- `size/S`: Denotes a PR that changes 10-29 lines, ignoring generated files.
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `needs-sig`: Indicates an issue or PR lacks a `sig/foo` label and requires one.
- `needs-priority`: Indicates a PR lacks a `priority/foo` label and requires one.
@squeed
Contributor Author

squeed commented May 11, 2020

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 11, 2020
@squeed
Contributor Author

squeed commented May 11, 2020

/cc @danwinship @dcbw

@danwinship
Contributor

> Bump the timestamp on initial informer sync. Otherwise it remains 0 if restarting kube-proxy in a quiescent cluster, which isn't quite right.

But there's still a race condition at startup even with this patch, isn't there? If you look at the metrics before the initial sync, you'll get nonsense?

I'm not sure what the expected behavior for timestamp metrics in this case is...

Kube-proxy always forces a resync shortly after startup, so arguably just starting up kube-proxy counts as requesting a proxy sync, and it might be reasonable to just set SyncProxyRulesLastQueuedTimestamp at startup...
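
To make the "nonsense" concrete, here is a small worked sketch. It assumes a consumer computes a lag as "scrape time minus last-queued timestamp", which is only one possible way to use the metric; `lagSeconds` and `initialLastQueued` are hypothetical helpers, not kube-proxy functions.

```go
package main

// lagSeconds is how far behind "now" a last-queued timestamp appears,
// under the assumed "now minus timestamp" interpretation.
func lagSeconds(nowUnix, lastQueuedUnix int64) int64 {
	return nowUnix - lastQueuedUnix
}

// initialLastQueued returns the gauge value right after startup, before
// any change has been queued. Without a startup bump the gauge still holds
// its zero value; with one it holds the process start time.
func initialLastQueued(startUnix int64, bumpOnStartup bool) int64 {
	if bumpOnStartup {
		return startUnix
	}
	return 0
}
```

For a process started at Unix time 1589155200 (2020-05-11 00:00:00 UTC) and scraped five seconds later, the lag is 1589155205 seconds (roughly 50 years) without the startup bump and 5 seconds with it.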

@squeed
Contributor Author

squeed commented May 11, 2020

> But there's still a race condition at startup even with this patch isn't there? If you look at the metrics before the initial sync, you'll get nonsense?
>
> I'm not sure what the expected behavior for timestamp metrics in this case is...

It is indeed awkward for all "gauge" metrics. Now that Prometheus handles stale metrics better, the convention is to not report an unobserved metric. Unfortunately, the existing library (and the k8s wrapper around it) makes this somewhat awkward. I'm asking around to see if there's a reasonable way to do this.

> Kube-proxy always forces a resync shortly after startup, so arguably just starting up kube-proxy counts as requesting a proxy sync, and it might be reasonable to just set SyncProxyRulesLastQueuedTimestamp at startup...

Yeah, that might be the right thing to do. Arguably, for a running system, the user doesn't care about the difference between a slow informer sync and a slow iptables sync. Either way, there's latency.

@squeed
Contributor Author

squeed commented May 11, 2020

OK, updated to bump the timestamp on proxy startup.

@ksubrmnn
Contributor

@danwinship
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2020
@squeed
Contributor Author

squeed commented May 12, 2020

/retest

@squeed
Contributor Author

squeed commented May 12, 2020

Update: I asked around, and the Prometheus Go client (client_golang) doesn't currently have an ergonomic way of exposing a Gauge (timestamp) that has no value until first observed. In this particular case, we don't care anymore since we're bumping the value on startup.
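
For illustration only, the "no value until first observed" behaviour could look like the sketch below. It deliberately uses only the standard library and is not the client_golang API; `optionalGauge`, `Expose`, `scrape`, and the metric name `example_last_queued_timestamp_seconds` are all hypothetical.

```go
package main

import (
	"strconv"
	"strings"
	"sync"
)

// optionalGauge is a hypothetical gauge that stays out of the scrape
// output until Set has been called at least once.
type optionalGauge struct {
	mu       sync.Mutex
	name     string
	value    float64
	observed bool
}

// Set stores a value and marks the gauge as observed.
func (g *optionalGauge) Set(v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.value = v
	g.observed = true
}

// Expose returns one "name value" sample line, or an empty string if the
// gauge has never been set.
func (g *optionalGauge) Expose() string {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.observed {
		return ""
	}
	return g.name + " " + strconv.FormatFloat(g.value, 'f', -1, 64) + "\n"
}

// scrape concatenates the sample lines of all observed gauges.
func scrape(gauges ...*optionalGauge) string {
	var b strings.Builder
	for _, g := range gauges {
		b.WriteString(g.Expose())
	}
	return b.String()
}
```

Before the first `Set`, `scrape` returns an empty string, so a consumer sees an absent series rather than a misleading 0.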

@danwinship
Contributor

oh forgot that I have to say
/approve
too

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2020
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot
Contributor

@squeed: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-gci-gce-ipvs | 042daa2 | link | `/test pull-kubernetes-e2e-gci-gce-ipvs` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 592a79c into kubernetes:master May 13, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone May 13, 2020
Labels
- `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
- `area/ipvs`
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `kind/cleanup`: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
- `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
- `needs-priority`: Indicates a PR lacks a `priority/foo` label and requires one.
- `release-note-none`: Denotes a PR that doesn't merit a release note.
- `sig/network`: Categorizes an issue or PR as relevant to SIG Network.
- `size/S`: Denotes a PR that changes 10-29 lines, ignoring generated files.
5 participants