New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gauge metric for master of leader election. #71731

Merged
merged 1 commit into from Jan 8, 2019

Conversation

@cheftako
Copy link
Member

cheftako commented Dec 5, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:
Adds a gauge for leader election.
0 indicates standby, 1 indicates master, label indicates which lease.

Which issue(s) this PR fixes:
Fixes #71730

Special notes for your reviewer:
$ curl http://127.0.0.1:10252/metrics | egrep -i leader
# HELP leader_election_master_gauge Gauge of if the reporting system is master of the relevant election.
# TYPE leader_election_master_gauge gauge
leader_election_master_gauge{name="kube-controller-manager"} 1

Does this PR introduce a user-facing change?:

NONE
@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 5, 2018

@k8s-ci-robot k8s-ci-robot requested review from eparis and wojtek-t Dec 5, 2018

@@ -214,6 +215,7 @@ func (le *LeaderElector) acquire(ctx context.Context) bool {
klog.Infof("successfully acquired lease %v", desc)
cancel()
}, le.config.RetryPeriod, JitterFactor, true, ctx.Done())
leaderGauge.WithLabelValues(le.config.Name).Set(1.0)

This comment has been minimized.

@logicalhan

logicalhan Dec 5, 2018

Contributor

Shouldn't we check the value of succeeded prior to toggling the gauge? I'm not super familiar with this code, but at least according to the comment for this function, there seems to be a situation where we did not succeed in acquiring the lease.

This comment has been minimized.

@cheftako

cheftako Dec 5, 2018

Author Member

Good point. If succeeded is false we are about to exit but no point in putting a blip on the graphs.

@@ -245,6 +247,7 @@ func (le *LeaderElector) renew(ctx context.Context) {
klog.V(5).Infof("successfully renewed lease %v", desc)
return
}
leaderGauge.WithLabelValues(le.config.Name).Set(0.0)

This comment has been minimized.

@logicalhan

logicalhan Dec 5, 2018

Contributor

Out of curiosity, what is the value of le.config.Name?

This comment has been minimized.

@cheftako

cheftako Dec 5, 2018

Author Member

It depends on the lease in question. So for the KCM its "kube-controller-manager" and for the scheduler its "kube-scheduler".

@krmayankk

This comment has been minimized.

Copy link
Contributor

krmayankk commented Dec 5, 2018

@cheftako is this a per local node metric ? How will i know using this metric which node has the lock ? Today i can do kubectl get endpoints kube-controller-manager -n kube-system -o yaml | grep holderIdentity and it tells me who the master is one location

@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 5, 2018

@krmayankk the metric will show up in an individual servers /metrics endpoint. So prometheus or whichever system is collecting the metrics should be associating that with the server from which it gathered the metric. This is common to all the metrics coming out of the /metrics endpoint. (Eg. "daemonset_queue_latency_count 104") Presumably you will have something periodically scraping this endpoint and creating a graph for your HA cluster.

@cheftako cheftako force-pushed the cheftako:leaseMetric branch from 2cc976a to ebf2183 Dec 6, 2018

@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 6, 2018

/test pull-kubernetes-integration

@cheftako cheftako force-pushed the cheftako:leaseMetric branch from ebf2183 to 4e26492 Dec 6, 2018

@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 7, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@jpbetz

This comment has been minimized.

Copy link
Contributor

jpbetz commented Dec 7, 2018

The metric help string now makes perfect sense to me. Thanks @cheftako!

@jpbetz

jpbetz approved these changes Dec 7, 2018

@jpbetz

This comment has been minimized.

Copy link
Contributor

jpbetz commented Dec 7, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 7, 2018

@logicalhan logicalhan referenced this pull request Dec 8, 2018

Closed

REQUEST: New membership for @logicalhan #292

6 of 6 tasks complete
@logicalhan
Copy link
Contributor

logicalhan left a comment

/lgtm to me too!

@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 12, 2018

/assign @mikedanese

@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 14, 2018

/test pull-kubernetes-godeps

@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 26, 2018

@cheftako cheftako force-pushed the cheftako:leaseMetric branch from 32eb2f0 to 02f2edb Dec 26, 2018

@cheftako cheftako force-pushed the cheftako:leaseMetric branch from 02f2edb to 342fca0 Dec 27, 2018

Add gauge metric for master of leader election.
Fixes #71730
0 indicates standby, 1 indicates master, label indicates which lease.
Tweaked name and documentation
Factored in Mike Danese feedback.
Removed dependency on prometheus from client-go using adapter.
Centralized adapter import.
Fixed godeps
Fixed boilerplate.
Put in fixes for caesarxuchao

@cheftako cheftako force-pushed the cheftako:leaseMetric branch from 342fca0 to f192657 Dec 27, 2018

@@ -27,8 +27,8 @@ import (
"k8s.io/apiserver/pkg/server"
"k8s.io/apiserver/pkg/util/logs"
"k8s.io/kubernetes/cmd/kube-apiserver/app"
_ "k8s.io/kubernetes/pkg/client/metrics/prometheus" // for client metric registration
_ "k8s.io/kubernetes/pkg/version/prometheus" // for version metric registration
_ "k8s.io/kubernetes/pkg/util/prometheusclientgo" // load all the prometheus client-go plugins

This comment has been minimized.

@caesarxuchao

caesarxuchao Dec 27, 2018

Member

Disucssed offline. This registers more metrics then it used to be. We think the memory usage would be O(MB) so it's ok.

Show resolved Hide resolved staging/src/k8s.io/client-go/tools/leaderelection/metrics.go
@caesarxuchao

This comment has been minimized.

Copy link
Member

caesarxuchao commented Dec 27, 2018

/lgtm
/approve

@cheftako

This comment has been minimized.

Copy link
Member Author

cheftako commented Dec 27, 2018

@deads2k

This comment has been minimized.

Copy link
Contributor

deads2k commented Jan 4, 2019

This is a great idea.

/approve

@smarterclayton I don't have approver powers on /pkg

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jan 4, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: caesarxuchao, cheftako, deads2k, mikedanese
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: thockin

If they are not already assigned, you can assign the PR to them by writing /assign @thockin in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

le := LeaderElector{
config: lec,
clock: clock.RealClock{},
metrics: globalMetricsFactory.newLeaderMetrics(),

This comment has been minimized.

@smarterclayton

smarterclayton Jan 4, 2019

Contributor

This is really ugly. What are we gaining by making prometheus metrics not the default? Who is adding code level metrics plugins to k/k that we would approve?

This feels like a pointless abstraction - would like more clarity about why we are polluting the code with it.

This comment has been minimized.

@deads2k

deads2k Jan 7, 2019

Contributor

This is really ugly. What are we gaining by making prometheus metrics not the default? Who is adding code level metrics plugins to k/k that we would approve?

This feels like a pointless abstraction - would like more clarity about why we are polluting the code with it.

Short answer: fewer deps from client-go.

Back when we started the project, prometheus wasn't as wide spread and we wanted to avoid the dependency. I think we should revisit that choice, but I would not choose to revisit it in this pull.

This comment has been minimized.

@cheftako

cheftako Jan 7, 2019

Author Member

As David says this is about reducing how many extra dependencies you have to pull in when you pull in client-go. The original version of the PR did not include the dependency breaking back flips but I was asked to add it in the standard manner, we do for other client-go metrics)

@deads2k deads2k added the approved label Jan 8, 2019

@deads2k

This comment has been minimized.

Copy link
Contributor

deads2k commented Jan 8, 2019

@smarterclayton approved via slack.

@k8s-ci-robot k8s-ci-robot merged commit cc67ccf into kubernetes:master Jan 8, 2019

19 checks passed

cla/linuxfoundation cheftako authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-godeps Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
limitations under the License.
*/

package prometheus

This comment has been minimized.

@luxas

luxas Jan 10, 2019

Member

leaderelection?


package prometheusclientgo

// Provided metrics needing adapting

This comment has been minimized.

@luxas

luxas Jan 10, 2019

Member

Explain here that this package should be used in your application when using both client-go and prometheus in your app.

// Package prometheus sets the workqueue DefaultMetricsFactory to produce
// prometheus metrics. To use this package, you just have to import it.

func init() {

This comment has been minimized.

@luxas

luxas Jan 10, 2019

Member

I REALLY think we should move away from adding init funcs in new code pkgs like this.
Why not just name this UseProvider() and call it (even in an init func) on the caller's side?

},
[]string{"name"},
)
prometheus.Register(leaderGauge)

This comment has been minimized.

@luxas

luxas Jan 10, 2019

Member

Do we track moving to a world where we're not using the global registry and gatherer?
We should move away from that IMO. (Separate issue I know but...)

leaderOff(name string)
}

// GaugeMetric represents a single numerical value that can arbitrarily go up

This comment has been minimized.

@luxas

luxas Jan 10, 2019

Member

nit, something like: SwitchMetric represents a value that can be enabled or disabled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment