OCPBUGS-5064: E2E: Per Core Runtime Tuning Test automation #509
728e28c to 397284c
/retest-required
some comments
//First enable HighPowerConsumption
By("Modifying profile")
profile.Spec.WorkloadHints = &performancev2.WorkloadHints{
	HighPowerConsumption: pointer.Bool(true),
Yes, in this test we check whether the system is tuned appropriately when we move from one hint to another.
Then we can remove the previous one?
})

It("[test_id:54179]Verify System is tuned when reverting from PerPodPowerManagement to HighPowerConsumption", func() {
Yes, in this test we check whether the system is tuned appropriately when we move from one hint to another.
same
	checkTunedParameters(workerRTNodes, stalldEnabled, sysctlMap, kernelParameters, rtKernel)
})

It("[test_id:54184]Verify enabling both HighPowerConsumption and PerPodPowerManagment fails", func() {
This test must fail; where are we catching the expected error?
Yes, this test fails, and we check that it fails. Maybe I should check the error as well.
Yep that's what I mean
@marioferh Addressed this on latest patch
thx
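Editorial sketch of the point raised in this thread: asserting only that some error occurred can pass for unrelated failures, so the check is stronger when it also matches the expected message. The following is a minimal, stdlib-only illustration and not the PR's code. `WorkloadHints`, `validateHints` and `expectErrorContaining` are hypothetical stand-ins; only the error text is taken from the later revision quoted in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// WorkloadHints is a hypothetical stand-in for the profile's workload hints.
type WorkloadHints struct {
	HighPowerConsumption  bool
	PerPodPowerManagement bool
}

// validateHints mimics a validation step that rejects enabling both hints.
// The message text is the one matched in the later revision of the test.
func validateHints(h WorkloadHints) error {
	if h.HighPowerConsumption && h.PerPodPowerManagement {
		return errors.New("HighPowerConsumption and PerPodPowerManagement can not be both enabled")
	}
	return nil
}

// expectErrorContaining returns nil only when err is non-nil and mentions want.
func expectErrorContaining(err error, want string) error {
	if err == nil {
		return fmt.Errorf("expected an error containing %q, got nil", want)
	}
	if !strings.Contains(err.Error(), want) {
		return fmt.Errorf("error %q does not contain %q", err.Error(), want)
	}
	return nil
}

func main() {
	err := validateHints(WorkloadHints{HighPowerConsumption: true, PerPodPowerManagement: true})
	if checkErr := expectErrorContaining(err, "can not be both enabled"); checkErr != nil {
		fmt.Println("FAIL:", checkErr)
		return
	}
	fmt.Println("ok: the profile was rejected for the expected reason")
}
```

In a Gomega-based test the same idea is usually expressed with a substring matcher on the error or status message, which is what the later revision in this conversation does.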
	)).To(HaveOccurred())
})

It("[test_id:54185]Verify sysfs paramters of guaranteed pod with powersave annotiations", func() {
This test covers cri-o/cri-o#5927. @bartwensley, PTAL?
/retest-required
"cpu-c-states.crio.io": "enable",
"cpu-freq-governor.crio.io": "schedutil",
What will the default c-state/p-state configuration be for the server running the test? Just wondering if these will have any impact if the system is configured for low power by default?
@bartwensley In this case the system configuration is what perPodPowerManagement configures, so intel_pstate is set to passive, and the BIOS power settings are set to OS controlled.
OK - and I assume c-states are also enabled in the BIOS?
@bartwensley Yes C-states are enabled in BIOS
	Expect(output).To(Equal("schedutil"))
	Expect(err).ToNot(HaveOccurred())
}
deleteTestPod(testpod)
After the pod is deleted, it would be good to check that the cpus that were used by the pod have been set back to their original pm_qos_resume_latency and scaling_governor.
ack, will add that
@bartwensley i have addressed this in the latest patch
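Editorial sketch of the check requested in this thread: record each CPU's `scaling_governor` and `pm_qos_resume_latency_us` before the pod starts, then compare after the pod is deleted. This is a stdlib-only illustration, not the PR's helper. It assumes the usual sysfs layout under `/sys/devices/system/cpu`, and `writeFakeCPU` plus the setting values used in `main` are hypothetical, there only so the example runs against a temporary directory instead of a real node.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// cpuPowerSettings holds the two per-CPU sysfs values discussed in the review.
type cpuPowerSettings struct {
	Governor      string // cpufreq/scaling_governor
	ResumeLatency string // power/pm_qos_resume_latency_us
}

// readValue reads one sysfs file of a CPU below sysRoot and trims whitespace.
func readValue(sysRoot string, cpu int, rel string) (string, error) {
	b, err := os.ReadFile(filepath.Join(sysRoot, fmt.Sprintf("cpu%d", cpu), rel))
	return strings.TrimSpace(string(b)), err
}

// readCPUPowerSettings reads both values for one CPU.
func readCPUPowerSettings(sysRoot string, cpu int) (cpuPowerSettings, error) {
	gov, err := readValue(sysRoot, cpu, "cpufreq/scaling_governor")
	if err != nil {
		return cpuPowerSettings{}, err
	}
	lat, err := readValue(sysRoot, cpu, "power/pm_qos_resume_latency_us")
	if err != nil {
		return cpuPowerSettings{}, err
	}
	return cpuPowerSettings{Governor: gov, ResumeLatency: lat}, nil
}

// writeFakeCPU writes both files for one CPU in a fake tree (demo helper only).
func writeFakeCPU(sysRoot string, cpu int, s cpuPowerSettings) error {
	files := map[string]string{
		"cpufreq/scaling_governor":       s.Governor,
		"power/pm_qos_resume_latency_us": s.ResumeLatency,
	}
	for rel, value := range files {
		path := filepath.Join(sysRoot, fmt.Sprintf("cpu%d", cpu), rel)
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		if err := os.WriteFile(path, []byte(value+"\n"), 0o644); err != nil {
			return err
		}
	}
	return nil
}

// snapshot records the current settings of the given CPUs.
func snapshot(sysRoot string, cpus []int) (map[int]cpuPowerSettings, error) {
	out := make(map[int]cpuPowerSettings, len(cpus))
	for _, cpu := range cpus {
		s, err := readCPUPowerSettings(sysRoot, cpu)
		if err != nil {
			return nil, err
		}
		out[cpu] = s
	}
	return out, nil
}

// changedCPUs returns, in ascending order, the CPUs that differ from the snapshot.
func changedCPUs(sysRoot string, before map[int]cpuPowerSettings) ([]int, error) {
	var changed []int
	for cpu, old := range before {
		now, err := readCPUPowerSettings(sysRoot, cpu)
		if err != nil {
			return nil, err
		}
		if now != old {
			changed = append(changed, cpu)
		}
	}
	sort.Ints(changed)
	return changed, nil
}

func check(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	root, err := os.MkdirTemp("", "fake-sysfs-cpu")
	check(err)
	defer os.RemoveAll(root)
	// Illustrative values only; real defaults depend on the node.
	defaults := cpuPowerSettings{Governor: "performance", ResumeLatency: "0"}
	check(writeFakeCPU(root, 0, defaults))
	check(writeFakeCPU(root, 1, defaults))
	before, err := snapshot(root, []int{0, 1})
	check(err)
	// Simulate a pod on cpu1 switching the governor, then its deletion.
	check(writeFakeCPU(root, 1, cpuPowerSettings{Governor: "schedutil", ResumeLatency: "0"}))
	during, err := changedCPUs(root, before)
	check(err)
	check(writeFakeCPU(root, 1, defaults))
	after, err := changedCPUs(root, before)
	check(err)
	fmt.Println("changed while pod runs:", during, "changed after deletion:", after)
}
```

Comparing against a snapshot rather than hard-coded values keeps the check valid on nodes whose defaults differ, which is relevant to the earlier question about the server's default c-state/p-state configuration.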
}, (cluster.ComputeTestTimeout(30*time.Second, RunningOnSingleNode)), 5*time.Second).ShouldNot(BeEmpty(),
	fmt.Sprintf("cannot find cgroup for container %q", containerID))

By("Checking what CPU the pod is using")
Maybe it is too much, but I wonder if it would be useful to check that cpus not in use by the pod did not have their pm_qos_resume_latency_us or scaling_governor modified?
ack will check that
@bartwensley i have addressed this in the latest patch
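Editorial sketch of the suggestion in this thread: to verify that CPUs not used by the pod were left alone, the test first needs the set of remaining CPUs, that is, the online CPUs minus the pod's cpuset. The PR itself uses a `cpuset` package for parsing; the stdlib-only version below is an assumption-laden stand-in, and `parseCPUList` and `difference` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList expands a kernel-style CPU list such as "0-3,8" into sorted ids.
func parseCPUList(list string) ([]int, error) {
	list = strings.TrimSpace(list)
	if list == "" {
		return nil, nil
	}
	seen := map[int]bool{}
	for _, part := range strings.Split(list, ",") {
		bounds := strings.SplitN(strings.TrimSpace(part), "-", 2)
		start, err := strconv.Atoi(bounds[0])
		if err != nil {
			return nil, fmt.Errorf("invalid cpu list %q: %w", list, err)
		}
		end := start
		if len(bounds) == 2 {
			if end, err = strconv.Atoi(bounds[1]); err != nil {
				return nil, fmt.Errorf("invalid cpu list %q: %w", list, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("invalid range %q in cpu list %q", part, list)
		}
		for c := start; c <= end; c++ {
			seen[c] = true
		}
	}
	cpus := make([]int, 0, len(seen))
	for c := range seen {
		cpus = append(cpus, c)
	}
	sort.Ints(cpus)
	return cpus, nil
}

// difference returns the ids in all that are not in used, preserving order.
func difference(all, used []int) []int {
	inUse := make(map[int]bool, len(used))
	for _, c := range used {
		inUse[c] = true
	}
	var rest []int
	for _, c := range all {
		if !inUse[c] {
			rest = append(rest, c)
		}
	}
	return rest
}

func main() {
	online, err := parseCPUList("0-7") // e.g. content of /sys/devices/system/cpu/online
	if err != nil {
		panic(err)
	}
	podCPUs, err := parseCPUList("2-3") // e.g. the container's cpuset
	if err != nil {
		panic(err)
	}
	// These are the CPUs whose settings should still match the defaults.
	fmt.Println(difference(online, podCPUs)) // prints [0 1 4 5 6 7]
}
```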
	deleteTestPod(testpod)
})

It("[test_id:54186] Verify sysfs paramters of guaranteed pod with performance annotiations", func() {
The comments from the previous testcase would apply to this testcase as well.
@bartwensley i have addressed this in the latest patch
397284c to 9427ff0
52bcf1a to f59fd2a
/test e2e-gcp-pao
	}, time.Minute, 5*time.Second).Should(ContainSubstring("HighPowerConsumption and PerPodPowerManagement can not be both enabled"))
})

It("[test_id:54185] Verify sysfs paramters of guaranteed pod with powersave annotiations", func() {
nit: spelling - parameters and annotations
@bartwensley addressed this in the latest patch
}

//checkCpuGovernors Checks power settings of the cpus
func checkCpuGovernors(cpus []int, targetNode *corev1.Node, pm_qos string, governor string) error {
nit: a better name might be checkCpuGovernorsAndResumeLatency
@bartwensley addressed this in the latest patch
Expect(err).ToNot(HaveOccurred())
cpus, err := cpuset.Parse(output)
targetCpus := cpus.ToSlice()
// Verify cpus assingned to the pod have performance powersettings
nit: spelling - assigned
@bartwensley addressed this in the latest patch
cpus, err := cpuset.Parse(output)
targetCpus := cpus.ToSlice()
// Verify cpus assingned to the pod have performance powersettings
By("Verify the rest of the cpus donot haver powersave settings")
This should be down one line. It should also say "have default settings".
@bartwensley addressed this in the latest patch
/test e2e-gcp-pao
f59fd2a to 44983e3
/lgtm
/retest-required
// Verify cpus not assigned to the pod have default settings
err = checkCpuGovernorsAndResumeLatency(targetCpus, &workerRTNodes[0], "n/a", "performance")
This is checking the targetCpus, which are the ones assigned to the pod, right? So the comment is wrong and the check is in the wrong place: it should be on line 1176, above the "Verify the rest of the cpus donot have powersave settings" step.
Expect(err).ToNot(HaveOccurred())
cpus, err := cpuset.Parse(output)
targetCpus := cpus.ToSlice()
By("Verify the rest of the cpus donot haver powersave settings")
This is incorrect - the code is not checking whether the remaining cpus "donot have powersave settings" - it is checking that they have the default settings, which are a mix (c-states are enabled and governor is performance).
44983e3 to 263cb9b
/lgtm
Looks good.
/retest
/hold - check if we need to add skip for lower than 8 cpus
/test e2e-upgrade
@yanirq can you check the latest CI results and the changes? I have added a new commit on top of the earlier changes, so if it looks good I will squash the commits.
ed8fae6 to 0ba9a50
/retest-required
/test e2e-gcp-pao-updating-profile
@mrniranjan please rebase this PR on top of the latest changes first; I would like to see an updated test run.
Skip tests where we create a guaranteed pod with powersave and performance annotations, as they require baremetal with power management settings in BIOS
make changes required for ginkgov2
Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Add new function to check hardware capability
Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

remove hardcoded online cpu count.
Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Minor fix: reduce the total number of cpus from 80 to 32
Also add the checkHardwareCapability to PerPodPowermanagement test cases instead of offline cpu test cases
Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Use constant for totalCpus
Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>

Skip the tests if number of online cpus is not more than 32
Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
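Editorial sketch of the skip rule described in these commits: count the node's online CPUs and skip when the count is not more than the threshold. This is a stdlib-only guess at the shape of such a check, not the PR's `checkHardwareCapability`. The constant name `totalCpus` and the value 32 come from the commit messages; `countCPUs`, `skipReason` and the idea of parsing a kernel-style CPU list are assumptions, and the list is assumed to contain no overlapping ranges.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// totalCpus mirrors the threshold named in the commit messages.
const totalCpus = 32

// countCPUs counts the ids in a kernel-style CPU list such as "0-31" or "0-3,8".
// It assumes the ranges do not overlap, as in /sys/devices/system/cpu/online.
func countCPUs(list string) (int, error) {
	count := 0
	for _, part := range strings.Split(strings.TrimSpace(list), ",") {
		bounds := strings.SplitN(strings.TrimSpace(part), "-", 2)
		start, err := strconv.Atoi(bounds[0])
		if err != nil {
			return 0, fmt.Errorf("invalid cpu list %q: %w", list, err)
		}
		end := start
		if len(bounds) == 2 {
			if end, err = strconv.Atoi(bounds[1]); err != nil {
				return 0, fmt.Errorf("invalid cpu list %q: %w", list, err)
			}
		}
		if end < start {
			return 0, fmt.Errorf("invalid range %q in cpu list %q", part, list)
		}
		count += end - start + 1
	}
	return count, nil
}

// skipReason returns a non-empty reason when the node does not have more
// than minimum online CPUs, and an empty string when the test may run.
func skipReason(onlineList string, minimum int) (string, error) {
	n, err := countCPUs(onlineList)
	if err != nil {
		return "", err
	}
	if n <= minimum {
		return fmt.Sprintf("node has %d online cpus, need more than %d", n, minimum), nil
	}
	return "", nil
}

func main() {
	for _, online := range []string{"0-31", "0-63"} {
		reason, err := skipReason(online, totalCpus)
		if err != nil {
			panic(err)
		}
		if reason != "" {
			fmt.Printf("online=%s: skip (%s)\n", online, reason)
			continue
		}
		fmt.Printf("online=%s: run\n", online)
	}
}
```

In a Ginkgo suite the returned reason would typically be passed to `Skip` at the start of the test case.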
0ba9a50 to 9eaeb6c
/retest-required
/lgtm
@yanirq can we remove the Do not merge/hold flag?
/hold cancel
/test e2e-gcp-pao-updating-profile
@mrniranjan: all tests passed!
@mrniranjan: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-5064 has been moved to the MODIFIED state.
Skip tests where we create a guaranteed pod with powersave and performance annotations, as they require baremetal with power management settings in BIOS
make changes required for ginkgov2
Add new function to check hardware capability
remove hardcoded online cpu count.
Minor fix: reduce the total number of cpus from 80 to 32
Also add the checkHardwareCapability to PerPodPowermanagement test cases instead of offline cpu test cases
Use constant for totalCpus
Skip the tests if number of online cpus is not more than 32
Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
Co-authored-by: Niranjan M.R <mrniranjan@redhat.com>