Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-5064: E2E: Per Core Runtime Tuning Test automation #509

Merged

Conversation

mrniranjan
Copy link
Contributor

Signed-off-by: Niranjan M.R mrniranjan@redhat.com

@mrniranjan
Copy link
Contributor Author

/retest-required

1 similar comment
@mrniranjan
Copy link
Contributor Author

/retest-required

Copy link
Contributor

@marioferh marioferh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

//First enable HighPowerConsumption
By("Modifying profile")
profile.Spec.WorkloadHints = &performancev2.WorkloadHints{
HighPowerConsumption: pointer.Bool(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this test we are checking if we can move from one Hint to another is the system tuned appropriately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we can remove the previous one?


})

It("[test_id:54179]Verify System is tuned when reverting from PerPodPowerManagement to HighPowerConsumption", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this test we are checking if we can move from one Hint to another is the system tuned appropriately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

checkTunedParameters(workerRTNodes, stalldEnabled, sysctlMap, kernelParameters, rtKernel)
})

It("[test_id:54184]Verify enabling both HighPowerConsumption and PerPodPowerManagment fails", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test must fail, where we are catching the expected error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this test fails, we are checking if it fails. May be i should check the error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's what I mean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marioferh Addressed this on latest patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx

)).To(HaveOccurred())
})

It("[test_id:54185]Verify sysfs paramters of guaranteed pod with powersave annotiations", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is of cri-o/cri-o#5927, @bartwensley PTAL?

@marioferh
Copy link
Contributor

/retest-required

1 similar comment
@marioferh
Copy link
Contributor

/retest-required

Comment on lines +1038 to +973
"cpu-c-states.crio.io": "enable",
"cpu-freq-governor.crio.io": "schedutil",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will the default c-state/p-state configuration be for the server running the test? Just wondering if these will have any impact if the system is configured for low power by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley In this case system configuration is what the perPodPowerManagement configures. so intel_pstate is set passive. And the bios is set it's power settings to OS controlled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - and I assume c-states are also enabled in the BIOS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley Yes C-states are enabled in BIOS

Expect(output).To(Equal("schedutil"))
Expect(err).ToNot(HaveOccurred())
}
deleteTestPod(testpod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the pod is deleted, it would be good to check that the cpus that were used by the pod have been set back to their original pm_qos_resume_latency and scaling_governor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, will add that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley i have addressed this in the latest patch

}, (cluster.ComputeTestTimeout(30*time.Second, RunningOnSingleNode)), 5*time.Second).ShouldNot(BeEmpty(),
fmt.Sprintf("cannot find cgroup for container %q", containerID))

By("Checking what CPU the pod is using")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is too much, but I wonder if it would be useful to check that cpus not in use by the pod did not have their pm_qos_resume_latency_us or scaling_governor modified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack will check that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley i have addressed this in the latest patch

deleteTestPod(testpod)
})

It("[test_id:54186] Verify sysfs paramters of guaranteed pod with performance annotiations", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments from the previous testcase would apply to this testcase as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley i have addressed this in the latest patch

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 12, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2022
@mrniranjan mrniranjan force-pushed the pcrtuning_automation branch 2 times, most recently from 52bcf1a to f59fd2a Compare November 28, 2022 12:56
@mrniranjan
Copy link
Contributor Author

/test e2e-gcp-pao

1 similar comment
@marioferh
Copy link
Contributor

/test e2e-gcp-pao

}, time.Minute, 5*time.Second).Should(ContainSubstring("HighPowerConsumption and PerPodPowerManagement can not be both enabled"))
})

It("[test_id:54185] Verify sysfs paramters of guaranteed pod with powersave annotiations", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling - parameters and annotations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley addressed this in the latest patch

}

//checkCpuGovernors Checks power settings of the cpus
func checkCpuGovernors(cpus []int, targetNode *corev1.Node, pm_qos string, governor string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a better name might be checkCpuGovernorsAndResumeLatency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley addressed this in the latest patch

Expect(err).ToNot(HaveOccurred())
cpus, err := cpuset.Parse(output)
targetCpus := cpus.ToSlice()
// Verify cpus assingned to the pod have performance powersettings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling - assigned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley addressed this in the latest patch

cpus, err := cpuset.Parse(output)
targetCpus := cpus.ToSlice()
// Verify cpus assingned to the pod have performance powersettings
By("Verify the rest of the cpus donot haver powersave settings")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be down one line. It should also say "have default settings".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartwensley addressed this in the latest patch

@marioferh
Copy link
Contributor

/test e2e-gcp-pao

Copy link
Contributor

@marioferh marioferh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

//First enable HighPowerConsumption
By("Modifying profile")
profile.Spec.WorkloadHints = &performancev2.WorkloadHints{
HighPowerConsumption: pointer.Bool(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we can remove the previous one?


})

It("[test_id:54179]Verify System is tuned when reverting from PerPodPowerManagement to HighPowerConsumption", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

checkTunedParameters(workerRTNodes, stalldEnabled, sysctlMap, kernelParameters, rtKernel)
})

It("[test_id:54184]Verify enabling both HighPowerConsumption and PerPodPowerManagment fails", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 30, 2022
@mrniranjan
Copy link
Contributor Author

/retest-required

Comment on lines 1177 to 1115
// Verify cpus not assigned to the pod have default settings
err = checkCpuGovernorsAndResumeLatency(targetCpus, &workerRTNodes[0], "n/a", "performance")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking the targetCpus, which are the ones assigned to the pod - right? So the comment is wrong and the check is in the wrong place - it should be on line 1176 - above the "Verify the rest of the cpus donot have powersave settings).

Expect(err).ToNot(HaveOccurred())
cpus, err := cpuset.Parse(output)
targetCpus := cpus.ToSlice()
By("Verify the rest of the cpus donot haver powersave settings")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect - the code is not checking whether the remaining cpus "donot have powersave settings" - it is checking that they have the default settings, which are a mix (c-states are enabled and governor is performance).

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 30, 2022
@bartwensley
Copy link
Contributor

/lgtm

Copy link
Contributor

@bartwensley bartwensley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2022
@yanirq
Copy link
Contributor

yanirq commented Feb 6, 2023

/retest

@yanirq
Copy link
Contributor

yanirq commented Feb 6, 2023

/hold - check if we need to add skip for lower than 8 cpus

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lgtm Indicates that a PR is ready to be merged. labels Feb 6, 2023
@mrniranjan
Copy link
Contributor Author

/test e2e-upgrade

@mrniranjan
Copy link
Contributor Author

@yanirq can you check the latest ci results and also the changes i have added a new commit to the changes done . So if its good i will squash the commits.

@mrniranjan
Copy link
Contributor Author

/retest-required

1 similar comment
@mrniranjan
Copy link
Contributor Author

/retest-required

@mrniranjan
Copy link
Contributor Author

/test e2e-gcp-pao-updating-profile

@yanirq
Copy link
Contributor

yanirq commented Feb 8, 2023

@mrniranjan please rebase this PR first on top of the latest changes , I would like to see an updated test run

Skip tests where we create a guaranteed pod with
powersave and performance annotations as they requires
baremetal with powermanagement settings in BIOS
make changes required for ginkgov2

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Add new function to check harware capability

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

remove hardcoded online cpu count.

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Minor fix reduce the total number of cpus from 80 to 32

Also add the checkHardwareCapability to PerPodPowermanagement
test cases instead of offline cpu test cases

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Use constant for totalCpus

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Skip the tests if number of online cpus is not more than 32

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
@mrniranjan
Copy link
Contributor Author

/retest-required

2 similar comments
@mrniranjan
Copy link
Contributor Author

/retest-required

@mrniranjan
Copy link
Contributor Author

/retest-required

@yanirq
Copy link
Contributor

yanirq commented Feb 13, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 13, 2023
@mrniranjan
Copy link
Contributor Author

@yanirq can we remove the Do not merge/hold flag ?

@yanirq
Copy link
Contributor

yanirq commented Feb 14, 2023

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2023
@mrniranjan
Copy link
Contributor Author

/test e2e-gcp-pao-updating-profile

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 93c97bc and 2 for PR HEAD 9eaeb6c in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 15, 2023

@mrniranjan: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 9382029 into openshift:master Feb 15, 2023
@openshift-ci-robot
Copy link
Contributor

@mrniranjan: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-5064 has been moved to the MODIFIED state.

In response to this:

Signed-off-by: Niranjan M.R mrniranjan@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mrniranjan added a commit to mrniranjan/cluster-node-tuning-operator that referenced this pull request Feb 16, 2023
Skip tests where we create a guaranteed pod with
powersave and performance annotations as they requires
baremetal with powermanagement settings in BIOS
make changes required for ginkgov2

Add new function to check harware capability

remove hardcoded online cpu count.

Minor fix reduce the total number of cpus from 80 to 32

Also add the checkHardwareCapability to PerPodPowermanagement
test cases instead of offline cpu test cases

Use constant for totalCpus

Skip the tests if number of online cpus is not more than 32

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
Co-authored-by: Niranjan M.R <mrniranjan@redhat.com>
openshift-merge-robot pushed a commit that referenced this pull request Feb 27, 2023
Skip tests where we create a guaranteed pod with
powersave and performance annotations as they requires
baremetal with powermanagement settings in BIOS
make changes required for ginkgov2

Add new function to check harware capability

remove hardcoded online cpu count.

Minor fix reduce the total number of cpus from 80 to 32

Also add the checkHardwareCapability to PerPodPowermanagement
test cases instead of offline cpu test cases

Use constant for totalCpus

Skip the tests if number of online cpus is not more than 32

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
Co-authored-by: Niranjan M.R <mrniranjan@redhat.com>
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request May 23, 2023
Skip tests where we create a guaranteed pod with
powersave and performance annotations as they requires
baremetal with powermanagement settings in BIOS
make changes required for ginkgov2



Add new function to check harware capability



remove hardcoded online cpu count.



Minor fix reduce the total number of cpus from 80 to 32

Also add the checkHardwareCapability to PerPodPowermanagement
test cases instead of offline cpu test cases



Use constant for totalCpus



Skip the tests if number of online cpus is not more than 32

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
Co-authored-by: Niranjan M.R <mrniranjan@redhat.com>
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request Jun 13, 2023
Skip tests where we create a guaranteed pod with
powersave and performance annotations as they requires
baremetal with powermanagement settings in BIOS
make changes required for ginkgov2



Add new function to check harware capability



remove hardcoded online cpu count.



Minor fix reduce the total number of cpus from 80 to 32

Also add the checkHardwareCapability to PerPodPowermanagement
test cases instead of offline cpu test cases



Use constant for totalCpus



Skip the tests if number of online cpus is not more than 32

Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
Co-authored-by: Niranjan M.R <mrniranjan@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants