manifests/telemetry: replace apiserver_request_count with apiserver_request_total #821
Hi @martinpovolny. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Not sure if this qualifies as "user facing" enough to require a Changelog entry.
# of each http status code over 10 minutes
- '{__name__="code:apiserver_request_count:rate:sum"}'
- '{__name__="code:apiserver_request_total:rate:sum"}'
A few things:
- This change is for 4.6 only; it might be worth backporting to 4.5. @s-urbaniak thoughts?
- As it's a recording rule, we have to change that first. It is defined here:
cluster-monitoring-operator/jsonnet/rules.jsonnet
Lines 311 to 312 in 8871868
expr: 'sum(rate(apiserver_request_total{job="apiserver"}[10m])) BY (code)',
record: 'code:apiserver_request_total:rate:sum',
Then run the make generate-in-docker make target and commit the results.
Let me know if anything is unclear, happy to help out, thanks!
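The relationship described in this comment (an allowlist matcher only collects data if a recording rule actually produces a series with that name) can be sketched as a small consistency check. This is a minimal Python sketch, not part of cluster-monitoring-operator: the two inputs are hand-copied from this PR, and the function name and the "name contains a colon" heuristic for spotting recording-rule series are assumptions made for illustration.

```python
import re

# Hypothetical, hand-copied inputs. In the real repository these values live
# in jsonnet/rules.jsonnet and in the telemetry manifest respectively.
RECORDED_NAMES = {"code:apiserver_request_total:rate:sum"}

ALLOWLIST_MATCHERS = [
    '{__name__="code:apiserver_request_count:rate:sum"}',  # stale entry
    '{__name__="code:apiserver_request_total:rate:sum"}',
]

# Extracts the metric name from a matcher of the form {__name__="..."}.
NAME_RE = re.compile(r'__name__="([^"]+)"')


def stale_recording_rule_matchers(matchers, recorded_names):
    """Return matchers whose name looks like a recording rule (contains ':')
    but is not produced by any known recording rule."""
    stale = []
    for matcher in matchers:
        found = NAME_RE.search(matcher)
        if found is None:
            continue  # matcher does not select by metric name
        name = found.group(1)
        if ":" in name and name not in recorded_names:
            stale.append(matcher)
    return stale


stale = stale_recording_rule_matchers(ALLOWLIST_MATCHERS, RECORDED_NAMES)
for matcher in stale:
    print("no recording rule produces:", matcher)
```

With the inputs above, only the apiserver_request_count matcher is reported, which is the entry this PR replaces.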
a backport to 4.5 sounds reasonable to me 👍
@lilic: Sorry, I do not understand. The file you are referencing has the name already changed (as I can see in your snippet). It was changed in this commit: 5b34fdb
However I should probably also change these files:
./Documentation/telemeter_query
./Documentation/sample-metrics.md
./Documentation/data-collection.md
./Documentation/timeseries.txt
Maybe not the examples in the doc, but the doc text for sure, WDYT?
Oh yes, you are right, I copy-pasted the already fixed version :D I even fixed it myself :D
All you need to do is run the make docs target. It should fix all of the above.
No, I don't think it's a user-facing change, so all good.
/ok-to-test
/retest
…equest_total Currently the old metric name is used in the whitelist resulting in the metric data not being collected. This fixes it by replacing the name in the manifest with the new one.
bcdaf19 to 378ccae
@openshift/openshift-team-olm FYI this metric is fixed now. You need to replace the old metric name with the new one in any dashboards or queries.
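For anyone updating dashboards or saved queries, the rename can be sketched as a plain text substitution. This is a minimal Python sketch under stated assumptions: the helper name rename_metric is hypothetical, and the approach is purely textual (it does not parse PromQL, so it would also rewrite the name if it appeared inside a label value).

```python
import re

OLD = "apiserver_request_count"
NEW = "apiserver_request_total"

# Match the old name only when it is not embedded in a longer identifier.
# Colons are deliberately treated as boundaries so that recording rule names
# such as code:apiserver_request_count:rate:sum are rewritten as well.
PATTERN = re.compile(r"(?<![A-Za-z0-9_])" + re.escape(OLD) + r"(?![A-Za-z0-9_])")


def rename_metric(query: str) -> str:
    """Return the query with the old metric name replaced by the new one."""
    return PATTERN.sub(NEW, query)


print(rename_metric('sum(rate(apiserver_request_count{job="apiserver"}[10m])) by (code)'))
print(rename_metric("code:apiserver_request_count:rate:sum"))
```

Longer names that merely contain the old name, for example a hypothetical other_apiserver_request_count_bucket, are left untouched by the boundary checks.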
/hold Last time we changed a metric in the allowlist it caused some problems for people, so I want to bring this up with a wider audience. Maybe we could even avoid this by creating a new recording rule with the old name that uses the new metric. The con of creating such a recording rule is that we would breach the best practice around recording rules; since the underlying metric change was caused in Kubernetes, this is not something we should support. Holding until we finish the discussion.
Best practice is to not use recording rules for renaming things, so that it's obvious from a rule's name what it represents. We should follow this best practice here, otherwise we will one day end up with a mess of data whose origin we don't understand. The tooling on the receiving infrastructure has been improved so that removals from the allowlist are handled better, so that part is nothing to worry about.
/retest
/cherry-pick release-4.5
@lilic: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.
/lgtm
Thanks! We should backport this to 4.4 and 4.5 as well.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: lilic, martinpovolny
/hold cancel
/retest Please review the full test history for this PR and help us cut down flakes.
@lilic: #821 failed to apply on top of branch "release-4.5":
Let's try this again
@paulfantom: #821 failed to apply on top of branch "release-4.5":
Currently the old metric name is used in the whitelist resulting in the
metric data not being collected.
This fixes it by replacing the name in the manifest with the new one.
Related kubernetes PR: kubernetes/kubernetes#76496