-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update etcdRequestLatency metrics bucket size #107042
Conversation
Hi @kkkkun. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/release-note-none |
These changes looks good from my side although I am wondering if we really need that much granularity for etcd. That said, considering that the number of buckets allocated to the apiserver latency was recently reduced from 40 to 20, I think it is reasonable to reconsider aligning the buckets of both metrics to be consistent since that's what we used to have in the past, but I would still like to have the opinion from others considering this past effort to reduce cardinality: #96754. |
I had a PR to increase the buckets #94134 (similar to this PR), but then I had to reduce it #96754. I agree that keeping the bucket size similar makes these metrics more comparable, but we have to trade it off with a performance impact. The cardinality explosion was noticeable, we saw it in OpenShift. That's why I reduced it. Also, not sure how directly comparable it can be - an api call may result in multiple calls to storage? @wojtek-t will know the answer to this. On a related note, I am working on a PR to add the latency incurred in the storage layer as an annotation in the audit logs, if you mine the audit logs you will be able to have a clear picture what portion of the latency is spent in storage layer, admission layer, object transformation. |
do we need more than 11 buckets? |
The bucket size is not the matter. We just want to keep the bucket size similar with requestLatencies . |
Even that comes with a cost, please note that It would be nice to see how this change impacts cpu utilization or memory footprint on a cluster and how it grows with increasing number of CRDs. |
/assign @jpbetz |
/lgtm |
@logicalhan Hi, Could you please give some advice? |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@logicalhan: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kkkkun, logicalhan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR may require stable metrics review. Stable metrics are guaranteed to not change. Please review the documentation for the requirements and lifecycle of stable metrics and ensure that your metrics meet these guidelines. |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
1 similar comment
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
@kkkkun: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
What type of PR is this?
/kind flake
/release-note-none
What this PR does / why we need it:
Apiserver has two metrics etcdRequestLatency and requestLatencies which show latency.
But they have different bucket size.
When we have a large cluster, It is difficulty to distinguish long latency which is spent in etcd or apiserver. Keeping consistent would help us more simple to find question.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: