Update test/e2e/storage for new GetReadySchedulableNodes stuff #83480
Conversation
@danwinship: GitHub didn't allow me to request PR reviews from the following users: danwinship. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest
/test pull-kubernetes-e2e-gce-storage-slow
/cc @oomichi
curious why pull-kubernetes-e2e-gce-storage-slow failed...
/test pull-kubernetes-e2e-gce-storage-slow
framework.Skipf("Requires at least %d node", 1)
}
nodeInfo = TestContext.NodeMapper.GetNodeInfo(nodes.Items[0].Name)
nodeInfo = GetReadySchedulableRandomNodeInfo()
nit: The original code skipped the test when the node count was 0, while the changed code fails it via gomega.Expect(nodesInfo).NotTo(gomega.BeEmpty()) in GetReadySchedulableRandomNodeInfo().
I guess the new behavior is not bad, because it matches the other test code.
This is just a comment in case somebody faces this new behavior.
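The behavioral difference the nit describes can be pictured with a minimal standalone sketch (checkNodes and its messages are hypothetical, not the real framework code): the old path skipped the test when no node was available, while the new helper asserts non-emptiness and fails instead.

```go
package main

import "fmt"

// checkNodes sketches the two behaviors discussed above: skip on an
// empty node list (old code) versus fail the test outright (new helper).
func checkNodes(nodes []string, skipOnEmpty bool) string {
	if len(nodes) == 0 {
		if skipOnEmpty {
			return "SKIP: requires at least 1 node" // old framework.Skipf path
		}
		return "FAIL: expected node list not to be empty" // new gomega assertion path
	}
	return "OK"
}

func main() {
	fmt.Println(checkNodes(nil, true))
	fmt.Println(checkNodes(nil, false))
	fmt.Println(checkNodes([]string{"n1"}, false))
}
```

A skipped test is reported as not-run, while a failed assertion turns the job red, which is why the reviewer flags it even though the end state (test doesn't proceed) is similar.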
@@ -72,7 +72,6 @@ var _ = utils.SIGDescribe("vsphere cloud provider stress [Feature:vsphere]", fun
gomega.Expect(instances > len(scNames)).To(gomega.BeTrue(), "VCP_STRESS_INSTANCES should be greater than 3 to utilize all 4 types of storage classes")

iterations = GetAndExpectIntEnvVar(VCPStressIterations)
framework.ExpectNoError(err, "Error Parsing VCP_STRESS_ITERATIONS")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch
maxLen := len(nodes.Items)
if maxLen > maxNodes {
	maxLen = maxNodes
}
Is it fine to remove this capping logic?
The new code uses the new GetBoundedReadySchedulableNodes, which does the capping for you.
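The capping behavior being discussed can be sketched as a tiny standalone function (boundedNodes and the sample data are hypothetical, not the actual e2e framework helper): both the removed loop and, per the review, GetBoundedReadySchedulableNodes limit how many nodes a test will use.

```go
package main

import "fmt"

// boundedNodes truncates the node list to an upper bound, mirroring the
// maxLen capping that the removed code performed inline.
func boundedNodes(nodes []string, maxNodes int) []string {
	if maxNodes > 0 && len(nodes) > maxNodes {
		return nodes[:maxNodes]
	}
	return nodes
}

func main() {
	nodes := []string{"node-1", "node-2", "node-3", "node-4"}
	fmt.Println(boundedNodes(nodes, 2))  // capped to the first two nodes
	fmt.Println(boundedNodes(nodes, 10)) // bound larger than the list: unchanged
}
```

Centralizing the cap in one helper is the point of the refactor: each test no longer re-implements the truncation loop.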
Yeah, it seems to time out, but I couldn't figure out why. (Also, it seems that it doesn't always time out; it passed on the first run.)
Force-pushed from 6a103bb to 0fa41e5
Force-pushed from 7179f21 to 3f96947
/test pull-kubernetes-e2e-gce-storage-slow
Force-pushed from 0ce2f49 to 6a103bb
@oomichi is it possible that pull-kubernetes-e2e-gce-storage-slow is just super flaky? It seems like there are lots of failures in other PRs' runs as well...
rerun for checking current situation of the job: /test pull-kubernetes-e2e-gce-storage-slow
/assign @oomichi
nodeList := framework.GetReadySchedulableNodesOrDie(f.ClientSet)
node := nodeList.Items[0]
node, err := e2enode.GetRandomReadySchedulableNode(f.ClientSet)
framework.ExpectNoError(err)
I think I figured out why the pull-kubernetes-e2e-gce-storage-slow job was failing.
At the original line 71, nodeName is always the first node of the list, and the label is added to that first node at original line 82. In addition, flexvolume is installed on the node which is selected here.
After applying this change, the nodes at original lines 71 and 128 are both selected randomly, so if they are different, the test fails. In other words, the original code always selected the same node at these lines.
So it would be better to change here and line 69 to something like:

nodeList, err := e2enode.GetReadySchedulableNodes(f.ClientSet)
framework.ExpectNoError(err)
node := nodeList.Items[0]
Aha! I originally thought it must be something to do with the introduction of node randomization, but I missed that there was a dependency between the two GetReadySchedulableNodes calls in this file.
I kept the use of GetRandomReadySchedulableNode in the BeforeEach and just made it save the selected node and use the same one again in the test.
Thanks for updating, that makes sense :-)
Force-pushed from 6a103bb to 211db02
Force-pushed from 211db02 to 6a04043
/test pull-kubernetes-e2e-gce-storage-slow
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship, oomichi. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Update test/e2e/storage for new GetReadySchedulableNodes stuff
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Spinoff of #82291 with just the changes to test/e2e/storage/, which were causing pull-kubernetes-e2e-gce-storage-slow to fail for mysterious reasons...

Does this PR introduce a user-facing change?:
/cc