DRA: e2e: test non-graceful node shutdown #120965
Conversation
/priority important-soon
/assign @pohly
(force-pushed from 52e7b41 to 8188e66)
/retest
(force-pushed from 8188e66 to af17606)
/retest
/retest
test/e2e/dra/dra.go (outdated diff):

    gomega.Expect(stderr).To(gomega.BeEmpty())
    framework.ExpectNoError(err)

    ginkgo.By("remove out-of-service taint from the node " + nodeName)
pohly: This is also redundant.
bart0sh: why?
bart0sh: same here, will remove. Thanks for the suggestion!
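For context, the taint being discussed is the out-of-service taint from KEP 2268. A minimal sketch of adding and removing it, assuming the e2e framework node helpers AddOrUpdateTaintOnNode and RemoveTaintOffNode (exact signatures may differ across releases):

    // Assumed imports:
    //   v1 "k8s.io/api/core/v1"
    //   e2enode "k8s.io/kubernetes/test/e2e/framework/node"
    taint := v1.Taint{
        Key:    "node.kubernetes.io/out-of-service",
        Value:  "nodeshutdown",
        Effect: v1.TaintEffectNoExecute,
    }

    // Mark the shut-down node as out-of-service so that its pods get
    // force-deleted and DRA can deallocate their claims.
    e2enode.AddOrUpdateTaintOnNode(ctx, f.ClientSet, nodeName, taint)

    // ... assertions on pod deletion and claim deallocation ...

    ginkgo.By("remove out-of-service taint from the node " + nodeName)
    e2enode.RemoveTaintOffNode(ctx, f.ClientSet, nodeName, taint)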
test/e2e/dra/dra.go (outdated diff):

    ginkgo.By("start node " + nodeName)
    _, stderr, err = framework.RunCmd("docker", "start", nodeName)
    gomega.Expect(stderr).To(gomega.BeEmpty())
    framework.ExpectNoError(err)
pohly: You can move all of this into the DeferCleanup above. A cleanup callback can fail just like the It callback, and that will mark the test as failed.
bart0sh: This would increase the duration of the test to 480 seconds. I've mentioned this in the PR description.
bart0sh: Forget about it. It seems to work correctly without it. I'll remove this.
bart0sh: Without this, the test takes much longer. I've commented on it in the PR description, and now I've also added a comment to the test code.
pohly: I wasn't suggesting to remove framework.RunCmd("docker", "start", nodeName), just to put it into a ginkgo.DeferCleanup. Why is that slower? It shouldn't be.
bart0sh: ginkgo.DeferCleanup(framework.RunCmd, "docker", "start", nodeName) has been in the code from the very beginning. However, without explicitly calling docker start at the end of the test, the test takes a lot more time.
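To summarize the resolution with a sketch (assuming framework.RunCmd runs a command and returns stdout, stderr, and an error, as in the snippets above): the DeferCleanup registration is the safety net that restarts the node even when the test fails, while the explicit docker start inside the test body is what keeps the test fast, because it brings the node back before teardown starts waiting on it.

    // Safety net, registered up front: runs even if the test fails.
    // Starting an already-running container is a harmless no-op.
    ginkgo.DeferCleanup(framework.RunCmd, "docker", "start", nodeName)

    // ... stop the node, verify pods are deleted and claims deallocated ...

    // Explicit restart at the end of the test body. Without it, teardown
    // steps run against a stopped node and the test duration grows from
    // roughly 72 seconds to 480 seconds.
    ginkgo.By("start node " + nodeName)
    _, stderr, err := framework.RunCmd("docker", "start", nodeName)
    framework.ExpectNoError(err)
    gomega.Expect(stderr).To(gomega.BeEmpty())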
test/e2e/dra/dra.go (outdated diff):

    gomega.Eventually(ctx, func(ctx context.Context) (*resourcev1alpha2.ResourceClaim, error) {
        return b.f.ClientSet.ResourceV1alpha2().ResourceClaims(b.f.Namespace.Name).Get(ctx, claim.Name, metav1.GetOptions{})
    }).WithTimeout(f.Timeouts.PodDelete).Should(gomega.HaveField("Status.Allocation", (*resourcev1alpha2.AllocationResult)(nil)))

pohly: You can wrap b.f.ClientSet.ResourceV1alpha2().ResourceClaims(b.f.Namespace.Name).Get with the helper in e2e/framework/get.go to get support for handling transient API server errors, something like:

    gomega.Eventually(ctx, framework.GetObject(b.f.ClientSet.ResourceV1alpha2().ResourceClaims(b.f.Namespace.Name), claim.Name, metav1.GetOptions{})).WithTimeout(f.Timeouts.PodDelete).Should(gomega.HaveField("Status.Allocation", gomega.BeNil()))

I'm using gomega.BeNil because it expresses the intent a bit more concisely.
bart0sh: After applying this, I got this error:

    type "k8s.io/client-go/kubernetes/typed/resource/v1alpha2".ResourceClaimInterface of b.f.ClientSet.ResourceV1alpha2().ResourceClaims(b.f.Namespace.Name) does not match framework.APIGetFunc[T] (cannot infer T)

Any hints for fixing it?
pohly: Add .Get to the first parameter of GetObject:

    gomega.Eventually(ctx, framework.GetObject(b.f.ClientSet.ResourceV1alpha2().ResourceClaims(b.f.Namespace.Name).Get, claim.Name, metav1.GetOptions{})).WithTimeout(f.Timeouts.PodDelete).Should(gomega.HaveField("Status.Allocation", gomega.BeNil()))
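Why adding .Get fixes the compile error: Go infers T from a function value whose signature matches the helper's parameter type, and an interface value like ResourceClaimInterface never matches a func type, while its method value .Get does. A presumed shape for the helper in e2e/framework/get.go (illustration only, not the actual framework source; the real helper also retries transient API server errors, which is the point of using it):

    // Illustrative sketch of the helper's presumed shape.
    type APIGetFunc[T any] func(ctx context.Context, name string, getOptions metav1.GetOptions) (T, error)

    // T is inferred from get's return type, so callers must pass a
    // method value such as claims.Get rather than the interface itself.
    func GetObject[T any](get APIGetFunc[T], name string, getOptions metav1.GetOptions) func(context.Context) (T, error) {
        return func(ctx context.Context) (T, error) {
            return get(ctx, name, getOptions)
        }
    }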
bart0sh: done, thanks for the suggestion!
test/e2e/dra/dra.go (outdated diff):

    // This test covers aspects of non graceful node shutdown by DRA controller
    // More details about this can be found in the KEP:
    // https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown
    var _ = ginkgo.Describe("[sig-node] [Serial] [Disruptive] [Slow] [Feature:DynamicResourceAllocation] DRA: handling non graceful node shutdown", func() {
pohly: Does this have to be a new Describe? Can't it be a new It under some existing context?
bart0sh: No, it doesn't. I separated it because I wanted to show that this is a special test case: it's disruptive, it shuts down a test node, and it works only with kind. I think it deserves to be visually separated from the rest of the existing test cases.
pohly: Not sure whether I agree with this rationale. It has a few downsides (some duplicated setup source code lines, the full test name). What is the full test name?
pohly: Let me be more explicit: "DRA: handling non graceful node shutdown cluster must deallocate on non graceful node shutdown" just doesn't look right.
bart0sh: Moved under an existing context. PTAL.
test/e2e/dra/dra.go (outdated diff):

    // Prevent builder tearDown from failing while waiting for unprepared resources
    delete(b.driver.Nodes, nodeName)
    ginkgo.By("stop node " + nodeName + " non gracefully")
    ginkgo.DeferCleanup(framework.RunCmd, "docker", "start", nodeName)
pohly: There's a dependency on running on a kind cluster here which isn't expressed via a test tag. At least call it out in the test comment, as we don't have a feature tag defined for it.
bart0sh: done
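The resulting call-out plausibly looks like this (a sketch based on the snippets above; the exact comment wording in the PR may differ):

    // This test simulates a non-graceful node shutdown by stopping the
    // node's Docker container, so it only works on a kind cluster where
    // each "node" is a container. There is no feature tag for this
    // dependency, hence this comment.
    ginkgo.By("stop node " + nodeName + " non gracefully")
    _, stderr, err := framework.RunCmd("docker", "stop", nodeName)
    gomega.Expect(stderr).To(gomega.BeEmpty())
    framework.ExpectNoError(err)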
(force-pushed from a5e9e62 to 11317a4)
/retest
(force-pushed from 38c3fa6 to f7d5193)
(force-pushed from f7d5193 to debc814)
/retest
(force-pushed from debc814 to 4b65e42)
test/e2e/dra/dra.go (outdated diff):

    // This test covers aspects of non graceful node shutdown by DRA controller
    // More details about this can be found in the KEP:
    // https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown
    ginkgo.Context("cluster", func() {
pohly: The "cluster" context itself isn't about non-graceful shutdown. This comment belongs in front of the ginkgo.It with the test.

Regarding context: can you put the It under

    ginkgo.Context("multiple nodes", func() {
        nodes := NewNodes(f, 2, 8)
        ginkgo.Context("with network-attached resources", func() {
            driver := NewDriver(f, nodes, networkResources)
            b := newBuilder(f, driver)
            ginkgo.It(...

There's no need for a new context. The advantage is that tests using a similar setup are grouped together, and there's less code duplication.
bart0sh: done, please review again
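Completing the truncated suggestion above, the final structure plausibly looks like this (a sketch; the It body is elided and the test name is taken from the earlier discussion of the full test name):

    ginkgo.Context("multiple nodes", func() {
        nodes := NewNodes(f, 2, 8)

        ginkgo.Context("with network-attached resources", func() {
            driver := NewDriver(f, nodes, networkResources)
            b := newBuilder(f, driver)

            // This test covers aspects of non graceful node shutdown by the
            // DRA controller. More details can be found in the KEP:
            // https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown
            ginkgo.It("must deallocate on non graceful node shutdown", func(ctx context.Context) {
                // ... stop the node, taint it out-of-service, verify deallocation ...
            })
        })
    })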
(force-pushed from 4b65e42 to fb9f2f5)
pohly:
/lgtm
/approve

LGTM label has been added. Git tree hash: 0ddea5aa7113cf66dbed026c0508b9726047ee88
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bart0sh, pohly

The full list of commands accepted by this bot can be found here. The pull request process is described here.
What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Added a DRA e2e test that covers non-graceful node shutdown.

Which issue(s) this PR fixes:

Fixes #120421

Special notes for your reviewer:

See https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown for more details.

This test takes 72 seconds to run:

Does this PR introduce a user-facing change?