New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RHOBS-995: Simplify cluster:capacity_effective_cpu_cores, add tests #506
RHOBS-995: Simplify cluster:capacity_effective_cpu_cores, add tests #506
Conversation
@kahowell: This pull request references RHOBS-995 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Hi @kahowell. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
d851af5
to
f7039dc
Compare
I'll follow-up once we get this merged :) |
// 2. x86_64 nodes that do not show hyperthreading need the cores value adjusted to account for 2 threads per core (* 0.5). | ||
// 3. Other CPU architectures are assumed to have accurate values in node_role_os_version_machine:cpu_capacity_cores:sum. | ||
// 1. x86_64 nodes need the cores value adjusted to account for 2 threads per core (* 0.5). | ||
// 2. Other CPU architectures are assumed to have accurate values in cluster:capacity_cpu_cores:sum. | ||
record: 'cluster:capacity_effective_cpu_cores', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the recording expression only uses cluster:capacity_cpu_cores:sum
, shouldn't we also modify cluster:cpu_capacity_cores:_id
to use the same metric? It would also simplify the test inputs since the tests would properly generate cluster:capacity_cpu_cores:_id
.
Simplify by dividing all x86_64 cpu counts by 2. Note that this takes advantage of the way that the SKUs are structured, where the capacity is written as multiples of "2 cores or 4vCPUs". One difference in how this simplification works is that with nodes reporting more than 2 threads-per-core will be counted by CPUs, rather than by cores. When exactly 2 threads-per-core are reported, there is no functional difference, as node_role_os_version_machine:cpu_capacity_cores:sum already divides CPUs by 2. This adds testing similar to what's in cluster-monitoring-operator, covering only the `cluster:capacity_effective_cpu_cores` rule. I had to update the prometheus version, as promtool was too old and incorrectly flagging existing rules. I added a note about rule tests to the README. I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.
f7039dc
to
20c1a28
Compare
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/hold
@kahowell looks good to me, thanks!
I've put a /hold
on the PR so you can merge it when you feel it's ready from your side.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kahowell, simonpasquier The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required |
/test benchmark |
/test integration |
2 similar comments
/test integration |
/test integration |
I'll investigate separately why the integration job fails (it's unrelated to this PR). |
/hold cancel |
/test integration |
2 similar comments
/test integration |
/test integration |
|
@kahowell: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
[ART PR BUILD NOTIFIER] This PR has been included in build telemeter-container-v4.16.0-202402141610.p0.g4220ebc.assembly.stream.el9 for distgit telemeter. |
Signed-off-by: Moad Zardab <mzardab@redhat.com>
Signed-off-by: Moad Zardab <mzardab@redhat.com>
Signed-off-by: Moad Zardab <mzardab@redhat.com>
Simplify by dividing all x86_64 cpu counts by 2.
Note that this takes advantage of the way that the SKUs are structured, where the capacity is written as multiples of "2 cores or 4vCPUs".
One difference in how this simplification works is that with nodes reporting more than 2 threads-per-core will be counted by CPUs, rather than by cores.
When exactly 2 threads-per-core are reported, there is no functional difference, as node_role_os_version_machine:cpu_capacity_cores:sum already divides CPUs by 2.
This adds testing similar to what's in cluster-monitoring-operator, covering only the
cluster:capacity_effective_cpu_cores
rule.I had to update the prometheus version, as promtool was too old and incorrectly flagging existing rules.
I added a note about rule tests to the README.
I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.