RHOBS-995: Simplify cluster:capacity_effective_cpu_cores, add tests #506

kahowell · 2024-02-08T18:37:00Z

Simplify by dividing all x86_64 cpu counts by 2.

Note that this takes advantage of the way that the SKUs are structured, where the capacity is written as multiples of "2 cores or 4vCPUs".

One difference in how this simplification works is that nodes reporting more than 2 threads per core will be counted by CPUs, rather than by cores.

When exactly 2 threads-per-core are reported, there is no functional difference, as node_role_os_version_machine:cpu_capacity_cores:sum already divides CPUs by 2.
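
To make the difference concrete, here is a minimal Python sketch of the arithmetic only, not the actual PromQL. It assumes the previous rule behaved as the removed comments in rules.libsonnet describe (x86_64 nodes that do not show hyperthreading are halved, the rest use the cores value as-is) and that the per-node cores value is CPUs divided by threads per core. The function names and the 8-CPU inputs are hypothetical; other architectures are left out because both versions use their reported values unchanged.

```python
def old_x86_effective_cores(cpus: int, threads_per_core: int) -> float:
    """Previous behaviour for one x86_64 node (assumed from the removed comments)."""
    # Assumption: the per-node cores metric is CPUs divided by threads per core.
    cores = cpus / threads_per_core
    if threads_per_core == 1:
        # No hyperthreading reported: halve to account for 2 threads per core.
        return cores * 0.5
    return cores


def new_x86_effective_cores(cpus: int) -> float:
    """Simplified behaviour: every x86_64 CPU count is divided by 2."""
    return cpus * 0.5


if __name__ == "__main__":
    # Compare both versions for a hypothetical node with 8 logical CPUs.
    for threads in (1, 2, 4):
        print(threads, old_x86_effective_cores(8, threads), new_x86_effective_cores(8))
```

For the 8-CPU node, both versions give 4.0 at 1 and 2 threads per core. They diverge only at 4 threads per core, where the old calculation gives 2.0 and the simplified one gives 4.0.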

This adds testing similar to what's in cluster-monitoring-operator, covering only the cluster:capacity_effective_cpu_cores rule.

I had to update the Prometheus version, as promtool was too old and incorrectly flagged existing rules.

I added a note about rule tests to the README.

I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.

openshift-ci-robot · 2024-02-08T18:37:03Z

@kahowell: This pull request references RHOBS-995 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.


openshift-ci · 2024-02-08T18:38:12Z

Hi @kahowell. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


kahowell · 2024-02-09T18:37:55Z

d851af5..f7039dc: rebase

simonpasquier · 2024-02-12T14:33:32Z

> I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.

I'll follow up once we get this merged :)

Makefile

simonpasquier · 2024-02-12T14:38:18Z

jsonnet/telemeter/rules.libsonnet

-              // 2. x86_64 nodes that do not show hyperthreading need the cores value adjusted to account for 2 threads per core (* 0.5).
-              // 3. Other CPU architectures are assumed to have accurate values in node_role_os_version_machine:cpu_capacity_cores:sum.
+              // 1. x86_64 nodes need the cores value adjusted to account for 2 threads per core (* 0.5).
+              // 2. Other CPU architectures are assumed to have accurate values in cluster:capacity_cpu_cores:sum.
              record: 'cluster:capacity_effective_cpu_cores',


Since the recording expression only uses cluster:capacity_cpu_cores:sum, shouldn't we also modify cluster:cpu_capacity_cores:_id to use the same metric? It would also simplify the test inputs, since the tests would then properly generate cluster:cpu_capacity_cores:_id.


simonpasquier · 2024-02-12T16:11:06Z

/ok-to-test
/lgtm

simonpasquier

/lgtm
/hold
@kahowell looks good to me, thanks!
I've put a /hold on the PR so you can merge it when you feel it's ready from your side.

openshift-ci · 2024-02-12T16:17:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kahowell, simonpasquier


Needs approval from an approver in each of these files:

~~OWNERS~~ [simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

simonpasquier · 2024-02-13T08:11:03Z

/retest-required

simonpasquier · 2024-02-13T09:44:06Z

/test benchmark

simonpasquier · 2024-02-13T13:45:49Z

/test integration

simonpasquier · 2024-02-13T16:14:50Z

/test integration

simonpasquier · 2024-02-14T07:47:12Z

/test integration

simonpasquier · 2024-02-14T08:02:43Z

I'll investigate separately why the integration job fails (it's unrelated to this PR).

simonpasquier · 2024-02-14T08:02:49Z

/hold cancel

openshift-ci-robot · 2024-02-14T08:25:59Z

/retest-required

Remaining retests: 0 against base HEAD 323b9e3 and 2 for PR HEAD 20c1a28 in total

simonpasquier · 2024-02-14T09:23:42Z

/test integration

simonpasquier · 2024-02-14T10:30:49Z

/test integration

simonpasquier · 2024-02-14T12:57:06Z

/test integration

simonpasquier · 2024-02-14T13:53:42Z

> I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.

I'll follow up once we get this merged :)

openshift/release#48786

openshift-ci · 2024-02-14T15:11:26Z

@kahowell: all tests passed!


openshift-bot · 2024-02-14T17:20:37Z

[ART PR BUILD NOTIFIER]

This PR has been included in build telemeter-container-v4.16.0-202402141610.p0.g4220ebc.assembly.stream.el9 for distgit telemeter.
All builds following this will include this PR.


openshift-ci-robot added the jira/valid-reference label Feb 8, 2024

openshift-ci bot requested review from simonpasquier and thibaultmg February 8, 2024 18:37

openshift-ci bot added the needs-ok-to-test label Feb 8, 2024

openshift-merge-robot added the needs-rebase label Feb 9, 2024

kahowell force-pushed the smt2_cores_adjustment_simplification branch from d851af5 to f7039dc February 9, 2024 18:37

openshift-merge-robot removed the needs-rebase label Feb 9, 2024

simonpasquier reviewed Feb 12, 2024


kahowell added 2 commits February 12, 2024 10:30

Define cluster:cpu_capacity_cores:_id using cluster:capacity_cpu_cores:sum (20c1a28)

kahowell force-pushed the smt2_cores_adjustment_simplification branch from f7039dc to 20c1a28 February 12, 2024 15:33

openshift-ci bot added the ok-to-test label and removed the needs-ok-to-test label Feb 12, 2024

simonpasquier reviewed Feb 12, 2024


openshift-ci bot assigned simonpasquier Feb 12, 2024

openshift-ci bot added the do-not-merge/hold and lgtm labels Feb 12, 2024

openshift-ci bot added the approved label Feb 12, 2024

openshift-ci bot removed the do-not-merge/hold label Feb 14, 2024

openshift-merge-bot bot merged commit 4220ebc into openshift:master Feb 14, 2024
10 checks passed

moadz added a commit to moadz/configuration that referenced this pull request Feb 19, 2024

Updating telemeter rules from openshift/telemeter#506

e38466e

Signed-off-by: Moad Zardab <mzardab@redhat.com>

moadz added a commit to moadz/configuration that referenced this pull request Feb 19, 2024

Updating telemeter rules from openshift/telemeter#506

c9c7a4f

Signed-off-by: Moad Zardab <mzardab@redhat.com>

moadz added a commit to rhobs/configuration that referenced this pull request Feb 19, 2024

Updating telemeter rules from openshift/telemeter#506 (#682)

8c97fcd

Signed-off-by: Moad Zardab <mzardab@redhat.com>
