New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-29596: e2e: Enhance tests related to crio annotations #955
Conversation
mrniranjan
commented
Feb 14, 2024
- Add new test case for testing cpu-quota.crio.io annotation
- Move the existing cpu load balancing test under single container where all the tests related to crio annotation exist
- Modify tests to be executed under different runtimes and cgroup versions
@mrniranjan: This pull request references Jira Issue OCPBUGS-29596, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (mniranja@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Period string | ||
} | ||
|
||
type CpuStat struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each struct should represent a cgroup controller. i.e cpu
,cpuset
,memory
,pids
...
stats
is not a controller, it's a field under the cpu
controller
Could you please elaborate why this change is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this before stat has multi line values where the number files are different in v1 and v2 and with current implementation reading stat is difficult where the current implementation tries to read multiple files and thats a problem. And we wanted stat values to be be stored in map . So in the earlier discussion we decide stat to be handled separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrniranjan Can't you simply read the cpu.stat
in a separate call and store it in cfg.cpuStat
?
Add the logic here:
https://github.com/openshift/cluster-node-tuning-operator/blob/master/test/e2e/performanceprofile/functests/utils/cgroup/v1/v1.go#L45
And here:
https://github.com/openshift/cluster-node-tuning-operator/blob/master/test/e2e/performanceprofile/functests/utils/cgroup/v2/v2.go#L45
something like:
cmd = []string{
"/bin/cat",
dirPath + "/cpu.stat",
}
b, err := pods.ExecCommandOnPod(cm.k8sClient, pod, containerName, cmd)
// do some parsing logic and store in map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in the latest commit
interfacevalues := strings.Split(output, "\r\n") | ||
// cpu.stat always contains 3 stats usage_usec, user_usec, system_usec | ||
// only when cpu controller is enabled other stats like nr_periods etc are enabled | ||
if len(interfacevalues) < 4 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more robust way to check whether cpu controller enabled on cgroup v2 is by checking the cgroup.controllers
and cgroup.subtree_control
files.
From the documentation https://man7.org/linux/man-pages/man7/cgroups.7.html:
Active cgroups must be specified via the files
cgroup.controllers and cgroup.subtree_control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only a suggestion, if we want to keep the function identical this is also an option but I think it would be better to move the common (identical for v1 and v2) functions into a separate package.
This can be handled in a separate PR as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i will address this in separate PR , for now i will modify the message.
@@ -507,11 +381,12 @@ var _ = Describe("[rfe_id:27363][performance] CPU Management", Ordered, func() { | |||
|
|||
containersCgroups = strings.Trim(containersCgroups, "\n") | |||
containersCgroupsDirs := strings.Split(containersCgroups, "\n") | |||
Expect(len(containersCgroupsDirs)).To(Equal(2), "unexpected amount of containers cgroups") | |||
//Expect(len(containersCgroupsDirs)).To(Equal(2), "unexpected amount of containers cgroups") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is still here, probably some rebase issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
@@ -654,6 +529,126 @@ var _ = Describe("[rfe_id:27363][performance] CPU Management", Ordered, func() { | |||
) | |||
|
|||
}) | |||
When("Crio Annotations", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Context
is more readable here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
if len(defaultCpuNotInSchedulingDomains) > 0 { | ||
pods, err := pods.GetPodsOnNode(context.TODO(), workerRTNode.Name) | ||
if err != nil { | ||
testlog.Warningf("cannot list pods on %q: %v", workerRTNode.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is warning enough here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i am not sure too, though this code is mostly used as debug to know that we found some pods before the test starts and for some reason we are unable to list the pods before throwing the error: " the test expects all CPUs within a scheduling domain when starting"
testlog.Infof("deleting test pod %s/%s UID=%q", testpod.Namespace, testpod.Name, podUID) | ||
err := testclient.Client.Get(ctx, client.ObjectKeyFromObject(testpod), testpod) | ||
Expect(err).ToNot(HaveOccurred()) | ||
//testpodUID := testpod.UID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
err := getter.Container(ctx, testpod, testpod.Spec.Containers[0].Name, cpusetCfg) | ||
Expect(err).ToNot(HaveOccurred()) | ||
// Get cpus used by the container | ||
tasksetcmd := []string{"taskset", "-pc", "1"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's always better to provide full path for the command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
parts := strings.Split(strings.TrimSpace(podCpusStr), ":") | ||
testpodCpus := strings.TrimSpace(parts[1]) | ||
testlog.Infof("%v pod is using %v cpus", testpod.Name, string(testpodCpus)) | ||
Expect(cpusetCfg.Cpus).To(Equal(testpodCpus), "cpuset.cpus not matching the process affinity") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comparison here is done between strings but they can be logically equals while unequal by string.
Example:
setA: 1-3,7
setB: 1,2,3,7
logically equals but the strings are different.
parse them into cpuset
and then compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
/retest-required |
55eeed9
to
36c5e63
Compare
if cgroupV2 { | ||
cpusetPath = "/rootfs/sys/fs/cgroup/kubepods.slice" | ||
} else { | ||
cpusetPath = "/rootfs/sys/fs/cgroup/cpuset" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On v1 the kubepods.slice
part exists as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, on v1 the paths differ
podAffinityCpuset, err := cpuset.Parse(testpodCpus) | ||
Expect(err).ToNot(HaveOccurred(), "Unable to parse cpus %s used by %s pod", testpodCpus, testpod.Name) | ||
cgroupCpuset, err := cpuset.Parse(cpusetCfg.Cpus) | ||
Expect(err).ToNot(HaveOccurred(), "Unable to parse cpus from cgroups.cpugset") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpugset or cpuset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
@@ -77,7 +77,7 @@ func (cm *ControllersManager) CpuStat(ctx context.Context, pod *corev1.Pod, cont | |||
// cpu.stat always contains 3 stats usage_usec, user_usec, system_usec | |||
// only when cpu controller is enabled other stats like nr_periods etc are enabled | |||
if len(interfacevalues) < 4 { | |||
return nil, fmt.Errorf("CPU Controller is not enabled") | |||
return nil, fmt.Errorf("cpu.stat doesn't contain all the cpu metrics required") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this was restored in the later commit:
8092af0#diff-f2536864a27be1f85973bce12efd1a1447d9fab4bb03170997c538c5e5650e81R81
if not intended to change it here, I think better to discard this line change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
@@ -57,9 +57,36 @@ func (cm *ControllersManager) Cpu(ctx context.Context, pod *corev1.Pod, containe | |||
output := strings.Split(string(b), "\r\n") | |||
cfg.Quota = output[0] | |||
cfg.Period = output[1] | |||
cfg.Stat, err = stat(cm.k8sClient, pod, containerName, childName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check the error
value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
/test e2e-gcp-pao-workloadhints |
/test e2e-gcp-pao |
/retest-required |
|
||
By("Starting the pod") | ||
testpod.Spec.NodeSelector = testutils.NodeSelectorLabels | ||
testpod.Spec.Containers[0].Image = "quay.io/mniranja/busycpus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably replace this quay url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yanirq fixed it, using http://quay.io/ocp-edge-qe/busycpus
2fbc81d
to
d2a3e80
Compare
/retest-required |
@mrniranjan does this PR warrant all the commits ? (9 commits) perhaps we should squash them |
/lgtm |
1. Add new test case for testing cpu-quota.crio.io annotation 2. Move the existing cpu load balancing test under single container where all the tests related to crio annotation exist 3. Modify tests to be executed under different runtimes and cgroup versions Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> fix the cpu controller path Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> reword test case descriptions and add test case id Fixing review comments 1. Use cpuset to compare 2 cpu string instead of string comparision 2. Remove unnecessary code comments Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> Modify the error message when all cpu metric are not available in cpu.stat Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> make Stat part of Cpu structure replace cpuStat method with stat function Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> minor typo fixes and remove lines with comments Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> fix the path of the cpu controller in cgroupv2 env Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> use ocp-edge-qe quay repository for busycpus image Signed-off-by: Niranjan M.R <mrniranjan@redhat.com> fetch busycpusImage using environment variables This patch uses env variable IMAGE_REGISTRY and BUSY_CPUS_IMAGE for QE image registry and busycpus image This is required to execute tests in disconnected environment where quay.io is not accessible and have to use internal private registry Signed-off-by: Niranjan M.R <mrniranjan@redhat.com>
@yanirq i have squashed the commits |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mrniranjan, yanirq The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@mrniranjan: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
@mrniranjan: Jira Issue OCPBUGS-29596: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-29596 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build cluster-node-tuning-operator-container-v4.16.0-202404080642.p0.g5cc1952.assembly.stream.el9 for distgit cluster-node-tuning-operator. |