Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding initial EndpointSlice metrics. #83257

Merged
merged 1 commit into from Nov 5, 2019

Conversation

robscott
Copy link
Member

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds some initial EndpointSlice metrics that cover the number of endpoints added or removed per sync, the number of endpoints in each slice, the total number of endpoints desired, and the total number of changes to endpoint slices, grouped by operation.

Special notes for your reviewer:
I'm very open to feedback and ideas for improving/changing these metrics. They've been helpful for me during some EndpointSlice scale testing but I'm interested in any other ideas.

Does this PR introduce a user-facing change?:

Adding initial EndpointSlice metrics.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

/sig network
/cc @bowei @freehan
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 27, 2019
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/network Categorizes an issue or PR as relevant to SIG Network. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Sep 27, 2019
Copy link
Contributor

@freehan freehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR need more discussion.

@@ -186,6 +187,8 @@ func (c *Controller) Run(workers int, stopCh <-chan struct{}) {
return
}

endpointslicemetrics.RegisterMetrics()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it common practice to do it in Run?

I believe this can be done in init()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their code sample registers in the init(). I'm going to assume either way it fine (threadsafe)

https://godoc.org/github.com/prometheus/client_golang/prometheus

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to init, I think it does make more sense there.

Help: "Number of endpoints added on each Service sync",
StabilityLevel: metrics.ALPHA,
},
[]string{"service_name", "service_namespace"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think you can use service namespace/name as the labels since they are unbounded in a cluster. This may cause high memory usage for Prometheus collector

Help: "Number of endpoints added on each Service sync",
StabilityLevel: metrics.ALPHA,
},
[]string{"service_name", "service_namespace"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to be careful of any dimensions that can vary based on # of a class of objects: services, namespaces.

Help: "The number of EndpointSlice changes",
StabilityLevel: metrics.ALPHA,
},
[]string{"service_name", "service_namespace", "operation"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, you cannot put user input-derived fields in a metric, as the # of metrics will potentially grow w/out bound, not to mention if they are stored in internal systems, they would constitute PIIl.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 9, 2019
@robscott robscott force-pushed the endpointslice-metrics branch 2 times, most recently from ef89228 to cd03db4 Compare October 9, 2019 20:09
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2019
@robscott
Copy link
Member Author

robscott commented Oct 9, 2019

@bowei and @freehan Thanks for the feedback! I've made some big changes in response to that. The metrics are different now in hopes that they'll be more useful globally without relying on labels. There's also a new efficiency metric that shows the efficiency of endpoint distribution (essentially expected slices / actual slices) with 1 representing ideal allocation and anything less than that being less than ideal.

pkg/controller/endpointslice/metrics/cache.go Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Show resolved Hide resolved

// UpdateServicePortCache updates a ServicePortCache in the global cache for a
// given Service and updates the corresponding metrics.
func (c *Cache) UpdateServicePortCache(serviceNN types.NamespacedName, spCache ServicePortCache) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth comment on the detail of input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments describing the parameters here, wasn't quite sure how to format them though.

pkg/controller/endpointslice/metrics/cache.go Outdated Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Outdated Show resolved Hide resolved
@robscott
Copy link
Member Author

/retest

1 similar comment
@robscott
Copy link
Member Author

/retest

@robscott
Copy link
Member Author

@bowei I reworked the ServicePortCache interface a bit so it's not just passing a map around. Not quite sure this is what you were suggesting, but I think it's at least a bit closer.

@@ -186,6 +187,8 @@ func (c *Controller) Run(workers int, stopCh <-chan struct{}) {
return
}

endpointslicemetrics.RegisterMetrics()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their code sample registers in the init(). I'm going to assume either way it fine (threadsafe)

https://godoc.org/github.com/prometheus/client_golang/prometheus

@@ -186,6 +187,8 @@ func (c *Controller) Run(workers int, stopCh <-chan struct{}) {
return
}

endpointslicemetrics.RegisterMetrics()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/controller/endpointslice/metrics/cache.go Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Outdated Show resolved Hide resolved
@robscott
Copy link
Member Author

/retest

@robscott
Copy link
Member Author

/retest

pkg/controller/endpointslice/metrics/cache.go Outdated Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Outdated Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Outdated Show resolved Hide resolved
pkg/controller/endpointslice/metrics/cache.go Outdated Show resolved Hide resolved
@robscott
Copy link
Member Author

/retest

3 similar comments
@robscott
Copy link
Member Author

/retest

@robscott
Copy link
Member Author

/retest

@robscott
Copy link
Member Author

/retest

Copy link
Contributor

@freehan freehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2019
@robscott
Copy link
Member Author

robscott commented Nov 4, 2019

/retest

@robscott
Copy link
Member Author

robscott commented Nov 4, 2019

/verify-owners

1 similar comment
@robscott
Copy link
Member Author

robscott commented Nov 5, 2019

/verify-owners

@robscott
Copy link
Member Author

robscott commented Nov 5, 2019

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: freehan, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2019
@k8s-ci-robot k8s-ci-robot merged commit 0c32aa8 into kubernetes:master Nov 5, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 5, 2019
@robscott robscott deleted the endpointslice-metrics branch March 11, 2021 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants