RHOBS-995: Simplify cluster:capacity_effective_cpu_cores, add tests #506

Contributor

@kahowell kahowell commented Feb 8, 2024

Simplify by dividing all x86_64 cpu counts by 2.

Note that this takes advantage of the way that the SKUs are structured, where the capacity is written as multiples of "2 cores or 4vCPUs".

One difference in how this simplification works is that nodes reporting more than 2 threads per core will be counted by CPUs, rather than by cores.

When exactly 2 threads-per-core are reported, there is no functional difference, as node_role_os_version_machine:cpu_capacity_cores:sum already divides CPUs by 2.
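
To make the arithmetic above concrete, here is a minimal Python sketch of the simplified behavior. It is an illustration only, not the recording rule itself: the function names and the `"x86_64"` architecture string are assumptions made for this example, and the topology-based count is just CPUs divided by threads per core as described above.

```python
# Illustrative sketch only: the real logic is a Prometheus recording rule.
# The function names and the "x86_64" arch string are assumptions made for
# this example and do not come from the rule definition.

X86_64 = "x86_64"


def effective_cores_simplified(cpus: float, arch: str) -> float:
    """Simplified rule: halve every x86_64 CPU count, trust other architectures."""
    if arch == X86_64:
        return cpus * 0.5
    return cpus


def cores_by_topology(cpus: float, threads_per_core: int) -> float:
    """Core count derived from the reported topology (CPUs / threads per core)."""
    return cpus / threads_per_core


if __name__ == "__main__":
    cpus = 16
    for threads_per_core in (2, 4):
        # With 2 threads per core both values agree; with 4 they diverge,
        # because the simplified rule counts by CPUs rather than by cores.
        print(
            threads_per_core,
            cores_by_topology(cpus, threads_per_core),
            effective_cores_simplified(cpus, X86_64),
        )
```

With exactly 2 threads per core the two functions return the same value, which is the "no functional difference" case; with more than 2 threads per core the simplified value is larger than the topology-based one.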

This adds testing similar to what's in cluster-monitoring-operator, covering only the cluster:capacity_effective_cpu_cores rule.

I had to update the Prometheus version, as promtool was too old and was incorrectly flagging existing rules.

I added a note about rule tests to the README.

I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 8, 2024

@kahowell: This pull request references RHOBS-995 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label Feb 8, 2024
@openshift-ci openshift-ci bot added the needs-ok-to-test label Feb 8, 2024
Contributor

openshift-ci bot commented Feb 8, 2024

Hi @kahowell. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

@openshift-merge-robot openshift-merge-robot added the needs-rebase label Feb 9, 2024
@kahowell kahowell force-pushed the smt2_cores_adjustment_simplification branch from d851af5 to f7039dc Compare February 9, 2024 18:37
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label Feb 9, 2024
@kahowell
Contributor Author

kahowell commented Feb 9, 2024

d851af5..f7039dc: rebase

@simonpasquier
Contributor

> I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.

I'll follow up once we get this merged :)

-// 2. x86_64 nodes that do not show hyperthreading need the cores value adjusted to account for 2 threads per core (* 0.5).
-// 3. Other CPU architectures are assumed to have accurate values in node_role_os_version_machine:cpu_capacity_cores:sum.
+// 1. x86_64 nodes need the cores value adjusted to account for 2 threads per core (* 0.5).
+// 2. Other CPU architectures are assumed to have accurate values in cluster:capacity_cpu_cores:sum.
record: 'cluster:capacity_effective_cpu_cores',
Contributor

Since the recording expression only uses cluster:capacity_cpu_cores:sum, shouldn't we also modify cluster:cpu_capacity_cores:_id to use the same metric? It would also simplify the test inputs since the tests would properly generate cluster:capacity_cpu_cores:_id.

@kahowell kahowell force-pushed the smt2_cores_adjustment_simplification branch from f7039dc to 20c1a28 Compare February 12, 2024 15:33
@simonpasquier
Contributor

/ok-to-test
/lgtm

@openshift-ci openshift-ci bot added ok-to-test and removed needs-ok-to-test labels Feb 12, 2024
Contributor

@simonpasquier simonpasquier left a comment

/lgtm
/hold
@kahowell looks good to me, thanks!
I've put a /hold on the PR so you can merge it when you feel it's ready from your side.

@openshift-ci openshift-ci bot added the do-not-merge/hold and lgtm labels Feb 12, 2024
Contributor

openshift-ci bot commented Feb 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kahowell, simonpasquier


@openshift-ci openshift-ci bot added the approved label Feb 12, 2024
@simonpasquier
Contributor

/retest-required

@simonpasquier
Contributor

/test benchmark

@simonpasquier
Contributor

/test integration

@simonpasquier
Contributor

/test integration

@simonpasquier
Contributor

/test integration

@simonpasquier
Contributor

I'll investigate separately why the integration job fails (it's unrelated to this PR).

@simonpasquier
Contributor

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Feb 14, 2024
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 323b9e3 and 2 for PR HEAD 20c1a28 in total

@simonpasquier
Contributor

/test integration

@simonpasquier
Contributor

/test integration

@simonpasquier
Contributor

/test integration

@simonpasquier
Contributor

> > I did not update the prow config, because I don't know where to, but happy to do an update for that given some hints.
>
> I'll follow up once we get this merged :)

openshift/release#48786

Contributor

openshift-ci bot commented Feb 14, 2024

@kahowell: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit 4220ebc into openshift:master Feb 14, 2024
10 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build telemeter-container-v4.16.0-202402141610.p0.g4220ebc.assembly.stream.el9 for distgit telemeter.
All builds following this will include this PR.

moadz added a commit to moadz/configuration that referenced this pull request Feb 19, 2024
Signed-off-by: Moad Zardab <mzardab@redhat.com>
moadz added a commit to rhobs/configuration that referenced this pull request Feb 19, 2024
Signed-off-by: Moad Zardab <mzardab@redhat.com>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.