test: Fix flake in node e2e mirror pod tests #117000

bobbypage · 2023-03-30T00:18:09Z

The newly added `MirrorPodWithGracePeriod when create a mirror pod and the container runtime is temporarily down during pod termination` test is currently flaking: in some runs, pods from other tests are still being terminated when it starts. The test then fails because it asserts on metrics that assume only one pod is running on the node.

To fix the flake, verify before starting the test that no pods exist in the API server.

What type of PR is this?

/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #116998

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2023-03-30T00:18:13Z

Please note that we're already in Test Freeze for the release-1.27 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.27.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Mar 29 22:31:15 UTC 2023.

pacoxu · 2023-03-30T08:26:22Z

/kind flake
/priority important-soon
/triage accepted

pacoxu · 2023-03-30T08:34:59Z

The fix makes sense in my opinion.

However, since this is a Serial e2e test, it sounds like this is caused by bad cleanup in another test case.

If we can find out which test case finishes with a pod still terminating (not cleaned up), adding the check in the AfterEach would be better, right?

Adding this does no harm to other tests.
/lgtm

k8s-ci-robot · 2023-03-30T08:35:07Z

LGTM label has been added.

Git tree hash: 5787f5c106720bcb959a4e9fc510ef3c09d10b72

bart0sh · 2023-03-30T20:48:05Z

/assign @mrunalp @derekwaynecarr @dchen1107

bobbypage · 2023-03-31T07:21:01Z

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial

bobbypage · 2023-03-31T07:26:17Z

> If we can find out which test case finishes with a pod still terminating (not cleaned up), adding the check in the AfterEach would be better, right?

The issue is that the tests themselves are not responsible for cleaning up their resources; that is the responsibility of the test framework. The test framework creates a new namespace for every test and deletes it when the test finishes. But the namespace is deleted asynchronously, i.e. the test framework doesn't wait for all the resources to be deleted before moving on to the next test (this is by design). This is fine for parallel tests, but it can sometimes be an issue for serial tests that assume they are the only thing running.

See related slack thread about this here.

In the future, it might be worth considering whether, for node e2e serial tests in particular, we should adapt the test framework to explicitly wait until the namespace is deleted. It may slow down the tests, but maybe that is acceptable.

The newly added `MirrorPodWithGracePeriod when create a mirror pod and the container runtime is temporarily down during pod termination` test is currently flaking because, in some runs, pods from other tests are still being terminated. This results in the test failing because it asserts metrics that assume that there is only one pod running on the node.

To fix the flake, prior to starting the test, verify that no pods exist in the api server other than the newly created mirror pod.

Signed-off-by: David Porter <david@porter.me>

bobbypage · 2023-03-31T09:11:14Z

/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial

bart0sh · 2023-03-31T20:22:38Z

/lgtm

k8s-ci-robot · 2023-03-31T20:22:45Z

LGTM label has been added.

Git tree hash: 715c7784ade78a57dc4626702d6502a7e1e6accb

pacoxu

/lgtm

oomichi · 2023-04-04T23:41:16Z

/approve

k8s-ci-robot · 2023-04-04T23:41:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobbypage, oomichi, pacoxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~test/e2e_node/OWNERS~~ [oomichi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) and do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.) labels Mar 30, 2023

k8s-ci-robot requested a review from krmayankk March 30, 2023 00:23

k8s-ci-robot added the area/test label Mar 30, 2023

k8s-ci-robot requested a review from yujuhong March 30, 2023 00:23

k8s-ci-robot added the sig/node (Categorizes an issue or PR as relevant to SIG Node.) and sig/testing (Categorizes an issue or PR as relevant to SIG Testing.) labels and removed the do-not-merge/needs-sig (Indicates an issue or PR lacks a `sig/foo` label and requires one.) label Mar 30, 2023

bobbypage force-pushed the gh_116998 branch from 90f5346 to 1664636 Compare March 30, 2023 00:25

k8s-ci-robot assigned pacoxu Mar 30, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2023

k8s-ci-robot assigned dchen1107, derekwaynecarr and mrunalp Mar 30, 2023

bobbypage force-pushed the gh_116998 branch from 1664636 to f7f3a34 Compare March 31, 2023 07:20

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2023

k8s-ci-robot requested review from dchen1107, derekwaynecarr, mrunalp and pacoxu March 31, 2023 07:20

bobbypage force-pushed the gh_116998 branch from f7f3a34 to 1893f63 Compare March 31, 2023 09:11

k8s-ci-robot assigned bart0sh Mar 31, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2023

pacoxu approved these changes Apr 3, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 4, 2023

k8s-ci-robot merged commit 95d3492 into kubernetes:master Apr 12, 2023

k8s-ci-robot added this to the v1.28 milestone Apr 12, 2023
