e2e:cpuloadbalance: deflake the test #730
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Tal-or The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR is supposed to deflake the test or make the failure consistent.
e0cb162 to e252517
Return a map from the `getCPUswithLoadBalanceDisabled` function and then remove the nested loop from the check. This is done only to make the test clearer; it does not contain an actual fix. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The test pod is the only GU pod that requests cpu-load-balancing to be disabled. This means that all the cpus on the system should have cpu-load-balancing enabled before the test starts. We should verify that before the test begins and bail out early if they don't. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The pod gets deleted during the test and there's only a single `It` in the node spec anyway, so the `AfterEach` is not needed. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
After the pod gets deleted, all cpus should be back in the sched domain, so the check should be simpler. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
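As a rough illustration of the commits above (returning a map, and verifying the sched domains before the test begins), here is a minimal sketch. It is not the test's actual `getCPUswithLoadBalanceDisabled`; the helper name `cpusWithoutSchedDomain` is hypothetical, and it assumes `/proc/schedstat`-style input in which each `cpuN` line is followed by zero or more `domainM` lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cpusWithoutSchedDomain is a hypothetical helper, not the function from the
// PR. It assumes schedstat-style text where every "cpuN ..." line is followed
// by zero or more "domainM ..." lines, and returns the set of CPU ids that
// have no domain line at all.
func cpusWithoutSchedDomain(schedstat string) (map[int]struct{}, error) {
	result := map[int]struct{}{}
	current := -1 // id of the cpu line being processed; -1 means none yet
	hasDomain := false
	// flush records the current CPU if no domain line followed it.
	flush := func() {
		if current >= 0 && !hasDomain {
			result[current] = struct{}{}
		}
	}
	sc := bufio.NewScanner(strings.NewReader(schedstat))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		switch {
		case strings.HasPrefix(fields[0], "cpu"):
			flush()
			id, err := strconv.Atoi(strings.TrimPrefix(fields[0], "cpu"))
			if err != nil {
				return nil, fmt.Errorf("unexpected cpu token %q: %w", fields[0], err)
			}
			current, hasDomain = id, false
		case strings.HasPrefix(fields[0], "domain"):
			hasDomain = true
		}
	}
	flush()
	return result, sc.Err()
}

func main() {
	// Made-up sample input: cpu1 has no domain line.
	sample := "version 15\ntimestamp 1\ncpu0 0 0\ndomain0 3 0\ncpu1 0 0\n"
	disabled, err := cpusWithoutSchedDomain(sample)
	if err != nil {
		panic(err)
	}
	// Precondition in the spirit of the commit: the set should be empty
	// before the test pod is created; a set lookup replaces the nested loop.
	_, cpu1Isolated := disabled[1]
	fmt.Println(len(disabled), cpu1Isolated)
}
```

With a set keyed by CPU id, the pre-test check reduces to `len(disabled) == 0`, and the per-CPU check to a single map lookup.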
e252517 to 8042047
@@ -343,14 +350,14 @@ var _ = Describe("[rfe_id:27363][performance] CPU Management", Ordered, func() {
return true
} else {
for _, podcpu := range podCpus.ToSlice() {
for _, cpu := range cpusNotinSchedulingDomains {
if !strings.Contains(cpu, fmt.Sprint(podcpu)) {
This line checks whether `podcpu` is not a substring of `cpu`.
`podcpu` is a cpu id, for example: `3`.
`cpu` is a single line of output from `/proc/schedstat`, for example:
[ cpu0 0 0 0 0 0 0 68186178574807 516377247436 331031573 cpu1 0 0 0 0 0 0 75970491002822 375072790117 330836684]
The problem with this check is that we return `true` only if the string "3" does not appear anywhere in this line. That is wrong, because although this line refers only to `cpu0` and `cpu1`, the string `3` still appears in it (inside the counters).
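To make the pitfall concrete, here is a small sketch that runs a substring check and a stricter token comparison against the example line above. `containsCPUSubstring` and `containsCPUToken` are hypothetical names used only for illustration; they are not taken from the test code, and the token-based variant is just one possible way to avoid the false match.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// containsCPUSubstring mirrors the shape of the check under review: it looks
// for the CPU id anywhere in the line, so digits inside counters match too.
func containsCPUSubstring(line string, cpu int) bool {
	return strings.Contains(line, fmt.Sprint(cpu))
}

// containsCPUToken is a hypothetical stricter variant: it reports true only
// when the line has a whitespace-separated token equal to "cpu<id>".
func containsCPUToken(line string, cpu int) bool {
	want := "cpu" + strconv.Itoa(cpu)
	for _, tok := range strings.Fields(line) {
		if tok == want {
			return true
		}
	}
	return false
}

func main() {
	line := "cpu0 0 0 0 0 0 0 68186178574807 516377247436 331031573 " +
		"cpu1 0 0 0 0 0 0 75970491002822 375072790117 330836684"

	fmt.Println(containsCPUSubstring(line, 3)) // true: "3" occurs inside the counters
	fmt.Println(containsCPUToken(line, 3))     // false: there is no "cpu3" token
	fmt.Println(containsCPUToken(line, 1))     // true: "cpu1" is present
}
```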
this deserves to be a comment in the code
I don't think the test is going to pass.
I think this patch just surfaces that the issue is real and consistent.
IOW, this test should never have passed, because the kernel doesn't behave as we expected.
Right, there's probably little point in documenting this even in the fixing patch.
8042047 to 01ebf8d
/retest
We need to check if this is another issue
/test e2e-gcp-pao
/lgtm
nice work!
/test e2e-gcp-pao
@Tal-or: The following test failed, say
PR needs rebase.
@Tal-or we can rebase now
Agreed with @mrniranjan to remove the check at the end of the test until we figure out why the kernel doesn't put the cpus back in the sched domain.
This PR contains a bunch of improvements intended to deflake the test.
Most of the commits are cosmetic, except for the commit that verifies the
sched domains before the test begins.