Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifests/telemetry: replace apiserver_request_count with apiserver_request_total #821

Conversation

martinpovolny
Copy link

Currently the old metric name is used in the whitelist resulting in the
metric data not being collected.
This fixes it by replacing the name in the manifest with the new one.

Related kubernetes PR: kubernetes/kubernetes#76496

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 24, 2020
@openshift-ci-robot
Copy link
Contributor

Hi @martinpovolny. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@martinpovolny
Copy link
Author

Not sure if this qualifies as "user facing" to require a Changelog entry.

# of each http status code over 10 minutes
- '{__name__="code:apiserver_request_count:rate:sum"}'
- '{__name__="code:apiserver_request_total:rate:sum"}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • this change is for 4.6 only, it might be work backporting to 4.5 @s-urbaniak thoughts?
  • As its a recording rule, we have to change that first, this is defined here
    expr: 'sum(rate(apiserver_request_total{job="apiserver"}[10m])) BY (code)',
    record: 'code:apiserver_request_total:rate:sum',
    we use jsonnet to do this, so you would need to change that file and run make generate-in-docker make target and commit results.

Let me know if there is something not clear, happy to help out, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a backport to 4.5 sounds reasonable to me 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lilic: Sorry, I do not understand. The file you are referencing has the name already changed (as I can see in your snippet). It was changed in this commit: 5b34fdb

However I should probably also change these files:

./Documentation/telemeter_query
./Documentation/sample-metrics.md
./Documentation/data-collection.md
./Documentation/timeseries.txt

Maybe not the examples in the doc, but the doc text for sure, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, you are right, I copy pasted the fixed thing :D Even I fixed it :D

All you need to do is run make docs target. It should fix all the above.

@lilic
Copy link
Contributor

lilic commented Jun 24, 2020

Not sure if this qualifies as "user facing" to require a Changelog entry.

Don't think its a user facing change no, so all good.

@lilic
Copy link
Contributor

lilic commented Jun 24, 2020

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 24, 2020
@martinpovolny
Copy link
Author

/retest

…equest_total

Currently the old metric name is used in the whitelist resulting in the
metric data not being collected.
This fixes it by replacing the name in the manifest with the new one.
@lilic
Copy link
Contributor

lilic commented Jun 26, 2020

@openshift/openshift-team-olm FYI this metric is fixed now. You need to replace this in any dashboards or queries.

@lilic
Copy link
Contributor

lilic commented Jun 26, 2020

/hold

Last time we changed a metric from allowlist it caused some problems for people, want to bring this up to wider audience, maybe we could even avoid this by creating a new recording rule with old name that uses new metric. The con to creating a new recording rule is we will breach the best practice around recording rule, since underlying metric changed which was caused in kubernetes, this is not something we should support.

Holding until we finish the discussion.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2020
@brancz
Copy link
Contributor

brancz commented Jun 30, 2020

Best practice is to not use recording rules for renaming of things, so it's obvious from their name what it represents. We should follow this best practice here otherwise we will one day end up in a mess of data and we don't understand where it's all originating from. The tooling on the receiving infrastructure has been improves so removals from the allowlist are handled better, so that part is nothing to worry about.

@lilic
Copy link
Contributor

lilic commented Jul 1, 2020

/retest

@lilic
Copy link
Contributor

lilic commented Jul 1, 2020

/cherry-pick release-4.5

@openshift-cherrypick-robot

@lilic: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks! We should backport this to 4.4 and 4.5 as well.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 1, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lilic, martinpovolny

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 1, 2020
@lilic
Copy link
Contributor

lilic commented Jul 1, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 1, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 7739eaf into openshift:master Jul 1, 2020
@openshift-cherrypick-robot

@lilic: #821 failed to apply on top of branch "release-4.5":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	Documentation/data-collection.md
M	Documentation/sample-metrics.md
M	Documentation/telemeter_query
M	manifests/0000_50_cluster_monitoring_operator_04-config.yaml
Falling back to patching base and 3-way merge...
Auto-merging manifests/0000_50_cluster_monitoring_operator_04-config.yaml
Auto-merging Documentation/telemeter_query
CONFLICT (content): Merge conflict in Documentation/telemeter_query
Auto-merging Documentation/sample-metrics.md
CONFLICT (content): Merge conflict in Documentation/sample-metrics.md
Auto-merging Documentation/data-collection.md
Patch failed at 0001 manifests/telemetry: replace apiserver_request_count with apiserver_request_total

In response to this:

/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@paulfantom
Copy link
Contributor

Let's try this again
/cherry-pick release-4.5

@openshift-cherrypick-robot

@paulfantom: #821 failed to apply on top of branch "release-4.5":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	Documentation/data-collection.md
M	Documentation/sample-metrics.md
M	Documentation/telemeter_query
M	manifests/0000_50_cluster_monitoring_operator_04-config.yaml
Falling back to patching base and 3-way merge...
Auto-merging manifests/0000_50_cluster_monitoring_operator_04-config.yaml
Auto-merging Documentation/telemeter_query
CONFLICT (content): Merge conflict in Documentation/telemeter_query
Auto-merging Documentation/sample-metrics.md
CONFLICT (content): Merge conflict in Documentation/sample-metrics.md
Auto-merging Documentation/data-collection.md
Patch failed at 0001 manifests/telemetry: replace apiserver_request_count with apiserver_request_total

In response to this:

Let's try this again
/cherry-pick release-4.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants