
⚠️ Disable the rest_client_request_latency_seconds metric by default #1587

Merged
merged 1 commit into kubernetes-sigs:master on Jul 9, 2021

Conversation

2uasimojo
Contributor

@2uasimojo 2uasimojo commented Jul 9, 2021

This metric has been deprecated since kube 1.14 and turned off in kube 1.17. Stop registering it by default, but export the appropriate variables so consumers can register it explicitly if they wish.
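The opt-in pattern the PR describes can be sketched in a self-contained way. This is an illustrative model, not the exact controller-runtime API: the collector stays exported, but nothing registers it in an `init()`, so consumers who accept the cardinality cost register it themselves.

```go
package main

import "fmt"

// Collector and Registry are stand-ins for prometheus collector/registry types.
type Collector struct{ Name string }

type Registry struct{ collectors []*Collector }

func (r *Registry) MustRegister(cs ...*Collector) {
	r.collectors = append(r.collectors, cs...)
}

// RequestLatency stays exported so consumers can opt in explicitly.
var RequestLatency = &Collector{Name: "rest_client_request_latency_seconds"}

// DefaultRegistry no longer contains RequestLatency by default.
var DefaultRegistry = &Registry{}

func main() {
	fmt.Println(len(DefaultRegistry.collectors)) // 0: not registered by default
	DefaultRegistry.MustRegister(RequestLatency) // explicit opt-in by the consumer
	fmt.Println(len(DefaultRegistry.collectors)) // 1
}
```

A consumer that wants the metric back calls `MustRegister` once at startup; everyone else pays nothing.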

fixes #1423

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 9, 2021
@k8s-ci-robot
Contributor

Hi @2uasimojo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 9, 2021
@logicalhan

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 9, 2021

@logicalhan logicalhan left a comment


what is this? this isn't really making it optional.. you should have a command line flag which allows you to enable this metric and plumb the value of that flag to the metric registration function

@alvaroaleman
Member

what is this? this isn't really making it optional.. you should have a command line flag which allows you to enable this metric and plumb the value of that flag to the metric registration function

No. We are a library. We are not going to register a flag. How and if controller authors expose options in their controllers to end-users is up to them.

@logicalhan

component-base is also a library, it includes flags which can be used by consuming binaries. these include the ones for metrics.

@2uasimojo
Contributor Author

...but I'm with you on the plumbing part. If there was a way we could set a package variable such that it's present by the time the import init()s, we could switch it on/off from client code. Is that even possible in go?

@logicalhan

...but I'm with you on the plumbing part. If there was a way we could set a package variable such that it's present by the time the import init()s, we could switch it on/off from client code. Is that even possible in go?

yes it is possible, we literally do this in upstream kubernetes.

@logicalhan

obviously this means you can't use the init for these though.
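The init-ordering constraint being discussed can be demonstrated in one file. Package `init` functions run before `main` (and an imported library's `init` runs before the importer's code), so a toggle the consumer sets at startup arrives too late for a registration done in the library's `init`; the variable names here are illustrative.

```go
package main

import "fmt"

// enableLatencyMetric is the package variable a consumer would want to flip.
var enableLatencyMetric = false

var registered []string

func init() {
	// Library-style init: this runs before main can set enableLatencyMetric,
	// so the consumer's toggle is invisible here.
	if enableLatencyMetric {
		registered = append(registered, "rest_client_request_latency_seconds")
	}
}

func main() {
	enableLatencyMetric = true   // too late: init already ran
	fmt.Println(len(registered)) // 0
}
```

This is why making the metric configurable means moving registration out of `init` and behind an explicit function the consumer calls.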

@2uasimojo
Contributor Author

Okay; I was looking for a way to enable opt-out so the change could be non-breaking. But if

you can't use the init for these

then I'm not sure how that would work.


@alvaroaleman
Copy link
Member

Kubelet is a binary, we are a library. I am going to veto any change that registers a flag. The one case where we do this creates enough trouble already: #1396

@logicalhan

Okay; I was looking for a way to enable opt-out so the change could be non-breaking.

I think it's okay to make it opt-in, it's more friendly than just deleting it. And the approach of manually invoking Register is now an option.

@logicalhan

Kubelet is a binary, we are a library. I am going to veto any change that registers a flag. The one case where we do this creates enough trouble already: #1396

That's fine, I don't have any skin in the game here, I am just saying that it's not friendly to outright break people. What you guys want to do with your library and what your priorities are 🤷 to me.. I'm stating my opinion about deleting metrics and breaking people.. I've had it happen to me and it sucks.

@logicalhan

Actually, as a library, it would make sense to make all your metrics opt-in...

@2uasimojo
Contributor Author

So if we don't

registers a flag

and we want to

make it opt-in

then is the approach in this PR acceptable? Or would it be better to wrap the registration of this metric in its own Register func?

Alternatively, I would be happy to propose

make all your metrics opt-in

@logicalhan

Yeah I would probably make all the metrics public and auto-register them for a release and have a release note saying that in the next release, these will all be opt in. That way, people have some sort of warning, rather than just randomly have everything break. You have several options (as I see it):

  • expose all metrics individually (this is kinda a pain because now you have to know all the ones you want to register)
  • expose groups of metrics (this is slightly easier for consumers)
  • expose a single Register which registers all metrics (possibly taking a var args of metrics not to enable).
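The third option above can be sketched concretely. This is a minimal model with illustrative names and metrics, not the real controller-runtime surface: one `RegisterAll` entry point that registers every known metric except those the caller opts out of via varargs.

```go
package main

import "fmt"

// allMetrics stands in for the library's full set of known collectors.
var allMetrics = []string{
	"rest_client_requests_total",
	"rest_client_request_latency_seconds",
	"workqueue_depth",
}

// RegisterAll registers every metric except those named in skip,
// and returns what was registered.
func RegisterAll(skip ...string) []string {
	skipped := make(map[string]bool, len(skip))
	for _, s := range skip {
		skipped[s] = true
	}
	var registered []string
	for _, m := range allMetrics {
		if !skipped[m] {
			registered = append(registered, m)
		}
	}
	return registered
}

func main() {
	// Opt in to everything except the high-cardinality latency metric.
	got := RegisterAll("rest_client_request_latency_seconds")
	fmt.Println(len(got)) // 2
}
```

The varargs exclusion keeps the common case (register everything) a one-liner while still letting consumers drop individual high-cardinality metrics.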

@2uasimojo
Contributor Author

I'm happy to do anything in this range, from outright removal to immediate opt-in (this PR) to a multi-release no-change-deprecation => opt-in dance.

@alvaroaleman what is your wish?

@alvaroaleman
Member

This change modulo my comment https://github.com/kubernetes-sigs/controller-runtime/pull/1587/files#r667037223 is IMHO fine. I don't think we should require opt-in for every single metric but I am happy to discuss if, which and how we register metrics by default in a new issue.

This metric has been deprecated since kube 1.14 and [turned off in kube
1.17](kubernetes/kubernetes#83836). Stop
registering it by default, but export the appropriate variables so
consumers can register it explicitly if they wish.
@2uasimojo
Contributor Author

This change modulo my comment

I don't think we should require opt-in for every single metric but I am happy to discuss

See #1589 for one approach: opt in gets you all by default, but you have the ability to opt out of individual metrics.

@alvaroaleman
Member

I am going to retitle this to make sure that it's properly understood when it ends up in a release note:
/retitle ⚠️ Disable the rest_client_request_latency_seconds metric by default

@k8s-ci-robot k8s-ci-robot changed the title ⚠️ Make rest_client_request_latency_seconds metric optional ⚠️ Disable the rest_client_request_latency_seconds metric by default Jul 9, 2021
Member

@alvaroaleman alvaroaleman left a comment


thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 9, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, alvaroaleman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2021
@k8s-ci-robot k8s-ci-robot merged commit 650ea59 into kubernetes-sigs:master Jul 9, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.10.x milestone Jul 9, 2021
@2uasimojo 2uasimojo deleted the issues/1423/optional branch July 9, 2021 18:40
@2uasimojo
Contributor Author

Okay, so what's the process for getting this into a tagged release?

@alvaroaleman
Member

@2uasimojo we deliver breaking changes with a new minor release and do a new minor release whenever there is a new kube minor release. So this will take some time.

2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Jul 9, 2021
We were using a
[controller-runtime fork](github.com/openshift-hive/controller-runtime)
so we could
[disable a metric](openshift-hive/controller-runtime@e847e4b)
that was causing a cardinality explosion and memory problems.

Upstream controller-runtime has now
[disabled that metric by default](kubernetes-sigs/controller-runtime#1587);
so this commit switches back to using upstream.

NOTE: We're using an unreleased upstream version.
xrstf added a commit to xrstf/kubermatic that referenced this pull request Sep 8, 2021
kubermatic-bot pushed a commit to kubermatic/kubermatic that referenced this pull request Sep 21, 2021
* adapt to metrics name change from k8s 1.17

see kubernetes-sigs/controller-runtime#1587

* remove dependency on github.com/grafana/grafana just for the RoleType

* update to k8s 1.22.1 and controller-runtime 0.10

* update codegen
tommyknows added a commit to snyk/kubernetes-scanner that referenced this pull request Mar 30, 2023
This commit adds two simple prometheus metrics to the http client that
is being used by the backend; "requests_total" and
"requests_duration_histogram_seconds". With that we should get some
initial visibility into backend failures, response times and client
requests per seconds as well.

I decided to register everything in an `init` function to the
`metrics.Gatherer`. Not perfect, but simple and probably good
enough for a long time. I got to that `metrics.Gatherer` type by
following the metrics-code of `controller-runtime`; I'm not sure if
there's a better way to register metrics to that Registry, or if using a
different registry would be fine as well and they'd simply get
interlaced?...

Additionally, controller-runtime also has a
[`rest_client_requests_total` metric](https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/metrics#pkg-constants)
that it registers. As far as I can tell, this is for the default
`http.Client` and comes from `client-go`. We could probably also make
use of that, but would be missing a latency bucket. That latency bucket
also exists, but is
[disabled by default](kubernetes-sigs/controller-runtime#1587)
because it created a cardinality explosion for some users, so I'm wary
to enable it as well.

By using a completely separate code-path and metrics-handler, we get
metrics for only our backend, instead of them being interlaced with
potential metrics from `client-go`. Additionally, we can start off with
both latency and count-metrics, as I don't think we'll have issues with
cardinality (we're only registering two labels - `client-go` also
registered a "url" label which is not optimal).