NO-JIRA: e2e: set performance profile cpus using env vars #909
Branch updated from 8a1ceb9 to 031e6f5.
Note: The README.md file is generally outdated and needs to be updated with the new suites and with instructions for adding and running them. This can be addressed later, as it is outside the scope of this PR.
At first I was a bit taken aback by the approach here, but after a more careful review and some thought I'm growing to like it.
Do you already have a sketch of how these changes will be used, e.g. in openshift/release flows? (For example, which changes to the Prow config will be needed?)
Branch updated from 031e6f5 to 9dd5e63.
Following openshift/cluster-node-tuning-operator#909, we can now provide the CPU specifications for the performance profile. Since the upstream (u/s) CI runs on AWS-generated clusters with constant CPU settings, we computed the reserved and isolated CPUs with the PPC CPU calculation using must-gather data and export the result in environment variables. Note: these values will need to be maintained should the environment settings change. Signed-off-by: shajmakh <shajmakh@redhat.com>
Thank you for reviewing this.
/approve
So this is the minimal solution to adapt the built-in hardcoded profile to different environments, setting up the CPUs correctly and consuming all of them.
Many other tradeoffs are possible, ranging from pre-building a profile and letting the tests consume it (which should already be possible today) to fully automatically computing the right baseline CPU allocation by autodetecting it from the cluster at each run.
Where to draw the line is exactly one of those cases of finding the correct tradeoff.
I think this PR can unblock us and let us explore the solution space further if we can and want to (we really should, FWIW), so I'm fine with this initial step.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ffromani, shajmakh.
/lgtm
/retest-required
/hold
Right, sorry I missed it. Also, it seems we have a clear case of #881 affecting our suites as well:
Branch updated from 9dd5e63 to 1d1907d.
Following openshift/cluster-node-tuning-operator#909, we can now provide the CPU specifications for the performance profile that will be used as the base for the functional tests. Given that the NTO upstream CI uses IPI deployments on VMs and that the deployment settings are constant, we computed the reserved and isolated CPUs with the PPC CPU calculation using must-gather data and export the result in environment variables. Note: these values will need to be maintained should the environment settings in GCP change. Signed-off-by: shajmakh <shajmakh@redhat.com>
Fixed in #911.
ci/prow/e2e-gcp-pao passing is already a good sign.
/test e2e-gcp-pao-updating-profile
/retitle NO-JIRA: e2e: set performance profile cpus using env vars
@shajmakh: This pull request explicitly references no Jira issue.
/hold cancel
We need this in all supported versions.
@shajmakh: once the present PR merges, I will cherry-pick it on top of release-4.15 in a new PR and assign it to you.
@shajmakh: all tests passed!
@shajmakh: new pull request created: #934
[ART PR BUILD NOTIFIER] This PR has been included in build cluster-node-tuning-operator-container-v4.16.0-202402020041.p0.g64ed3fe.assembly.stream for distgit cluster-node-tuning-operator.
Following openshift/cluster-node-tuning-operator#909, we can now provide the CPU specifications for the performance profile. Since the upstream CI runs on bare-metal (BM) node clusters with constant CPU settings (CPUs 0-111), we computed the reserved and isolated CPUs with the PPC CPU calculation using must-gather data and set the result in environment variables. Note: these values will need to be maintained should the cluster nodes' settings change. Signed-off-by: shajmakh <shajmakh@redhat.com>
Following openshift/cluster-node-tuning-operator#909, we can now provide the CPU specifications for the performance profile that will be used as the base for the functional tests. Given that the NTO and cnf-features-deploy upstream CI uses VM-node clusters and that the deployment settings are constant, we computed the reserved and isolated CPUs with the PPC CPU calculation using must-gather data and export the result in environment variables. Note: these values will need to be maintained should the node CPU settings in GCP change. Signed-off-by: shajmakh <shajmakh@redhat.com>
Following openshift/cluster-node-tuning-operator#909, we can now provide the CPU specifications for the performance profile that will be used as the base for the functional tests. Given that the NTO and cnf-features-deploy upstream CI uses VM-node clusters and that the deployment settings are constant, we computed the reserved and isolated CPUs with the PPC CPU calculation using must-gather data and export the result in environment variables. Note: these values will need to be maintained should the node CPU settings in GCP change. Signed-off-by: Shereen Haj <shajmakh@redhat.com>
We've lately been observing that some tests involving disabled load balancing are failing (e.g. 32646) because the result does not contain the specific CPUs the test anticipates. After investigation, it turns out that one contributing factor is the CPU distribution in the profile configuration.
The PAO functional tests configure fixed CPU values in the performance profile (PP). This is effectively a misconfiguration, especially when the system has more than 4 CPUs: there is no guarantee that the performance profile controller will work correctly when not all CPUs are reflected in the PP's CPU section.
To resolve this, we are introducing the new environment variables RESERVED_CPU_SET, ISOLATED_CPU_SET, and OFFLINED_CPU_SET; when they are set, the profile uses them instead of the defaults.
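To illustrate what "not all CPUs reflected in the CPU section" means, here is a minimal sketch, assuming the CPU fields use the Linux cpu-list format (e.g. `0-3,8`), that checks whether every CPU on a node is assigned to exactly one of reserved, isolated, or offlined. The names `parseCPUList`, `cpuSet`, and `checkCoverage` and the 8-CPU node are hypothetical and are not part of NTO or its test suite.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList parses a Linux cpu-list string such as "0-3,8,10-11"
// into a sorted slice of CPU IDs. An empty string yields no CPUs.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	s = strings.TrimSpace(s)
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid cpu %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("invalid cpu range %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid cpu range %q", part)
		}
		for c := start; c <= end; c++ {
			cpus = append(cpus, c)
		}
	}
	sort.Ints(cpus)
	return cpus, nil
}

// cpuSet is a hypothetical named cpu-list taken from a profile section.
type cpuSet struct {
	name string
	list string
}

// checkCoverage reports an error if the sets overlap, reference CPUs that
// do not exist, or leave any CPU in [0, numCPUs) unassigned.
func checkCoverage(numCPUs int, sets []cpuSet) error {
	owner := make(map[int]string, numCPUs)
	for _, s := range sets {
		cpus, err := parseCPUList(s.list)
		if err != nil {
			return fmt.Errorf("%s: %w", s.name, err)
		}
		for _, c := range cpus {
			if c >= numCPUs {
				return fmt.Errorf("%s: cpu %d does not exist on a %d-cpu node", s.name, c, numCPUs)
			}
			if prev, ok := owner[c]; ok {
				return fmt.Errorf("cpu %d is in both %s and %s", c, prev, s.name)
			}
			owner[c] = s.name
		}
	}
	var missing []int
	for c := 0; c < numCPUs; c++ {
		if _, ok := owner[c]; !ok {
			missing = append(missing, c)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("cpus not covered by any set: %v", missing)
	}
	return nil
}

func main() {
	// Hypothetical 8-CPU node: a fixed 4-CPU layout leaves CPUs 4-7 unassigned.
	fmt.Println(checkCoverage(8, []cpuSet{{"reserved", "0"}, {"isolated", "1-3"}}))
	// A layout that accounts for every CPU passes.
	fmt.Println(checkCoverage(8, []cpuSet{{"reserved", "0-1"}, {"isolated", "2-7"}}))
}
```

Running it prints `cpus not covered by any set: [4 5 6 7]` followed by `<nil>`. Whether the real controller rejects or tolerates such gaps is not claimed here; the sketch only makes the coverage gap visible.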