Adding initial EndpointSlice metrics. #83257

Merged
merged 1 commit into from Nov 5, 2019

Conversation

@robscott
Member

robscott commented Sep 27, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds some initial EndpointSlice metrics that cover the number of endpoints added or removed per sync, the number of endpoints in each slice, the total number of endpoints desired, and the total number of changes to endpoint slices, grouped by operation.

Special notes for your reviewer:
I'm very open to feedback and ideas for improving/changing these metrics. They've been helpful for me during some EndpointSlice scale testing but I'm interested in any other ideas.

Does this PR introduce a user-facing change?:

Adding initial EndpointSlice metrics.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/sig network
/cc @bowei @freehan
/priority important-longterm

Member

freehan left a comment

This PR needs more discussion.

@@ -186,6 +187,8 @@ func (c *Controller) Run(workers int, stopCh <-chan struct{}) {
return
}

endpointslicemetrics.RegisterMetrics()

freehan Oct 3, 2019

Member

Is it common practice to do it in Run?

I believe this can be done in init() instead.

bowei Oct 17, 2019

Member

Their code sample registers in init(). I'm going to assume either way is fine (thread-safe).

https://godoc.org/github.com/prometheus/client_golang/prometheus

robscott Oct 17, 2019

Author Member

Moved it to init(); I think it does make more sense there.
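
For readers following the thread, here is a minimal sketch of the registration question discussed above, using only the Go standard library. The `registry` type, the `registerMetrics` helper, and the metric names are hypothetical stand-ins, not the Prometheus client or Kubernetes component-base APIs. It only illustrates that guarding registration with `sync.Once` makes it safe to call from `init()`, from `Run`, or from both.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a metrics registry.
type registry struct {
	mu    sync.Mutex
	names map[string]bool
}

// mustRegister records a metric name and panics if it was already registered.
func (r *registry) mustRegister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names[name] {
		panic("duplicate metric registration: " + name)
	}
	r.names[name] = true
}

var (
	defaultRegistry = &registry{names: map[string]bool{}}
	registerOnce    sync.Once
)

// registerMetrics registers the (hypothetical) metrics exactly once,
// no matter how many times or from how many goroutines it is called.
func registerMetrics() {
	registerOnce.Do(func() {
		defaultRegistry.mustRegister("endpoints_added_per_sync")
		defaultRegistry.mustRegister("endpoints_removed_per_sync")
	})
}

// registeredCount reports how many metric names are registered.
func registeredCount() int {
	defaultRegistry.mu.Lock()
	defer defaultRegistry.mu.Unlock()
	return len(defaultRegistry.names)
}

// init registers metrics at package load time, as suggested in the review.
func init() {
	registerMetrics()
}

func main() {
	// A second call (for example from a controller's Run) is a no-op.
	registerMetrics()
	fmt.Println(registeredCount())
}
```

With the guard in place, the choice between `init()` and `Run` is mostly about readability rather than correctness.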

Help: "Number of endpoints added on each Service sync",
StabilityLevel: metrics.ALPHA,
},
[]string{"service_name", "service_namespace"},

freehan Oct 3, 2019

Member

I do not think you can use the Service namespace/name as labels, since they are unbounded in a cluster. This may cause high memory usage for the Prometheus collector.

@robscott robscott force-pushed the robscott:endpointslice-metrics branch from 5ad4354 to 1b07419 Oct 8, 2019
Help: "Number of endpoints added on each Service sync",
StabilityLevel: metrics.ALPHA,
},
[]string{"service_name", "service_namespace"},

bowei Oct 8, 2019

Member

We have to be careful with any dimensions that can vary with the number of objects in a class, such as services or namespaces.

Help: "The number of EndpointSlice changes",
StabilityLevel: metrics.ALPHA,
},
[]string{"service_name", "service_namespace", "operation"},

bowei Oct 8, 2019

Member

Generally speaking, you cannot put user-input-derived fields in a metric: the number of metrics will potentially grow without bound, and if they are stored in internal systems, they would constitute PII.
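
The cardinality concern raised in these comments can be illustrated with a small standard-library sketch. The `counterVec` type below is a hypothetical, simplified model of a labelled counter (one time series per distinct label combination), not the real Prometheus or component-base type, and the service names and operations are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// counterVec is a hypothetical, simplified labelled counter:
// it keeps one series per distinct combination of label values.
type counterVec struct {
	series map[string]float64
}

func newCounterVec() *counterVec {
	return &counterVec{series: map[string]float64{}}
}

// inc increments the series identified by the given label values.
func (c *counterVec) inc(labelValues ...string) {
	c.series[strings.Join(labelValues, "\x00")]++
}

// seriesCount returns how many distinct series the counter holds.
func (c *counterVec) seriesCount() int {
	return len(c.series)
}

// simulate records one change per operation for each of numServices Services
// and returns the series count with per-Service labels and with only an
// "operation" label.
func simulate(numServices int) (perService, perOperation int) {
	byService := newCounterVec()
	byOperation := newCounterVec()
	ops := []string{"create", "update", "delete"}
	for i := 0; i < numServices; i++ {
		name := fmt.Sprintf("svc-%d", i)
		for _, op := range ops {
			byService.inc(name, "default", op)
			byOperation.inc(op)
		}
	}
	return byService.seriesCount(), byOperation.seriesCount()
}

func main() {
	perService, perOperation := simulate(1000)
	fmt.Println(perService, perOperation) // prints "3000 3"
}
```

The per-Service variant grows linearly with the number of Services, while the operation-only variant stays constant regardless of cluster size.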

@robscott robscott force-pushed the robscott:endpointslice-metrics branch from 1b07419 to be23b14 Oct 9, 2019
@robscott robscott force-pushed the robscott:endpointslice-metrics branch 2 times, most recently from ef89228 to cd03db4 Oct 9, 2019
@robscott

Member Author

robscott commented Oct 9, 2019

@bowei and @freehan Thanks for the feedback! I've made some big changes in response to that. The metrics are different now in hopes that they'll be more useful globally without relying on labels. There's also a new efficiency metric that shows the efficiency of endpoint distribution (essentially expected slices / actual slices) with 1 representing ideal allocation and anything less than that being less than ideal.
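
As a rough illustration of the efficiency ratio described above (expected slices / actual slices), here is a minimal sketch. The function names and the per-slice capacity of 100 are assumptions made for this example; the PR's actual implementation may compute and aggregate the value differently.

```go
package main

import "fmt"

// maxEndpointsPerSlice is an assumed per-slice capacity, used only for
// illustration in this sketch.
const maxEndpointsPerSlice = 100

// expectedSlices returns the minimum number of slices needed to hold
// numEndpoints when each slice holds at most maxPerSlice endpoints.
func expectedSlices(numEndpoints, maxPerSlice int) int {
	if numEndpoints <= 0 {
		return 0
	}
	return (numEndpoints + maxPerSlice - 1) / maxPerSlice
}

// efficiency returns expected slices divided by actual slices.
// 1 means ideal packing; smaller values mean more slices than necessary.
func efficiency(numEndpoints, actualSlices, maxPerSlice int) float64 {
	if actualSlices == 0 {
		// No slices exist; this sketch treats that case as ideal.
		return 1
	}
	return float64(expectedSlices(numEndpoints, maxPerSlice)) / float64(actualSlices)
}

func main() {
	// 250 endpoints need at least 3 slices of up to 100 endpoints each.
	fmt.Printf("%.2f\n", efficiency(250, 3, maxEndpointsPerSlice)) // prints "1.00"
	fmt.Printf("%.2f\n", efficiency(250, 5, maxEndpointsPerSlice)) // prints "0.60"
}
```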

@robscott robscott force-pushed the robscott:endpointslice-metrics branch from cd03db4 to 46e19fc Oct 9, 2019

// UpdateServicePortCache updates a ServicePortCache in the global cache for a
// given Service and updates the corresponding metrics.
func (c *Cache) UpdateServicePortCache(serviceNN types.NamespacedName, spCache ServicePortCache) {

freehan Oct 14, 2019

Member

It is worth commenting on the details of the input.

robscott Oct 16, 2019

Author Member

I added comments describing the parameters here; I wasn't quite sure how to format them, though.
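
On the formatting question: Go doc comments conventionally begin with the name being documented and describe parameters in ordinary prose rather than with tags. The sketch below shows one way that could look. All types here (`namespacedName`, `servicePortCache`, `cache`) are hypothetical simplifications invented for illustration and are not the types from this PR.

```go
package main

import "fmt"

// namespacedName is a hypothetical stand-in for types.NamespacedName.
type namespacedName struct {
	Namespace, Name string
}

// servicePortCache is a hypothetical, simplified per-Service cache that
// stores the number of endpoints in each of the Service's slices.
type servicePortCache struct {
	endpointsPerSlice []int
}

// total returns the number of endpoints across all slices.
func (s servicePortCache) total() int {
	sum := 0
	for _, n := range s.endpointsPerSlice {
		sum += n
	}
	return sum
}

// cache tracks a running endpoint total across all Services.
type cache struct {
	numEndpoints int
	entries      map[namespacedName]servicePortCache
}

// updateServicePortCache stores spCache as the cached state for the Service
// identified by serviceNN and adjusts the running endpoint total to match.
// serviceNN is the namespace and name of the Service. spCache replaces any
// value previously cached for that Service.
func (c *cache) updateServicePortCache(serviceNN namespacedName, spCache servicePortCache) {
	c.numEndpoints += spCache.total() - c.entries[serviceNN].total()
	c.entries[serviceNN] = spCache
}

func main() {
	c := &cache{entries: map[namespacedName]servicePortCache{}}
	svc := namespacedName{Namespace: "default", Name: "web"}
	c.updateServicePortCache(svc, servicePortCache{endpointsPerSlice: []int{100, 50}})
	c.updateServicePortCache(svc, servicePortCache{endpointsPerSlice: []int{20}})
	fmt.Println(c.numEndpoints) // prints "20"
}
```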

@robscott robscott force-pushed the robscott:endpointslice-metrics branch from 46e19fc to aa78d94 Oct 16, 2019
@robscott

Member Author

robscott commented Oct 16, 2019

/retest

@robscott robscott force-pushed the robscott:endpointslice-metrics branch from aa78d94 to e1ff9d6 Oct 16, 2019
@robscott

Member Author

robscott commented Oct 17, 2019

@bowei I reworked the ServicePortCache interface a bit so it's not just passing a map around. Not quite sure this is what you were suggesting, but I think it's at least a bit closer.

@robscott robscott force-pushed the robscott:endpointslice-metrics branch from e1ff9d6 to 48775e9 Oct 17, 2019
@robscott

Member Author

robscott commented Oct 17, 2019

/retest

@robscott robscott force-pushed the robscott:endpointslice-metrics branch from 48775e9 to 724b142 Oct 24, 2019
@robscott

Member Author

robscott commented Oct 24, 2019

/retest

Member

freehan left a comment

/lgtm
/approve

@robscott

Member Author

robscott commented Nov 4, 2019

/retest

@robscott

Member Author

robscott commented Nov 4, 2019

/verify-owners

Member Author

robscott commented Nov 5, 2019

/verify-owners

@robscott

Member Author

robscott commented Nov 5, 2019

/approve

@k8s-ci-robot

Contributor

k8s-ci-robot commented Nov 5, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: freehan, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 0c32aa8 into kubernetes:master Nov 5, 2019
15 checks passed
cla/linuxfoundation: robscott authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-kind: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
tide: In merge pool.
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 5, 2019