[Flaking Test] ci-node-e2e (Container Lifecycle) #125030

Closed
Vyom-Yadav opened this issue May 21, 2024 · 11 comments · Fixed by #125282
Labels: kind/flake, needs-triage, sig/node

Comments

@Vyom-Yadav
Member

Which jobs are flaking?

master-blocking:

  • ci-node-e2e

Which tests are flaking?

E2eNode Suite.[It] [sig-node] [NodeConformance] Containers Lifecycle should run Init container to completion before call to PostStart of regular container

Since when has it been flaking?

Recent failures:
5/21/2024, 1:27:10 PM ci-kubernetes-node-e2e-containerd
5/19/2024, 6:56:05 PM ci-cos-containerd-node-e2e
5/17/2024, 5:18:04 AM ci-kubernetes-node-e2e-containerd
5/13/2024, 12:59:17 AM ci-kubernetes-node-e2e-containerd
5/11/2024, 4:41:15 AM ci-cos-containerd-node-e2e

Testgrid link

https://testgrid.k8s.io/sig-release-master-blocking#ci-node-e2e

Reason for failure (if possible)

[FAILED] couldn't find that PostStart-regular-1 ever started, got
0) 2024-05-09 10:55:23.485 +0000 UTC init-1 Starting
1) 2024-05-09 10:55:23.493 +0000 UTC init-1 Started
2) 2024-05-09 10:55:23.499 +0000 UTC init-1 Delaying
3) 2024-05-09 10:55:24.506 +0000 UTC init-1 Exiting
4) 2024-05-09 10:55:25.079 +0000 UTC PostStart-regular-1 Starting
5) 2024-05-09 10:55:25.085 +0000 UTC regular-1 Starting
6) 2024-05-09 10:55:25.091 +0000 UTC PostStart-regular-1 Started
7) 2024-05-09 10:55:25.096 +0000 UTC regular-1 Started
8) 2024-05-09 10:55:25.101 +0000 UTC PostStart-regular-1 Delaying
9) 2024-05-09 10:55:25.107 +0000 UTC regular-1 Delaying
10) 2024-05-09 10:55:26.11 +0000 UTC PostStart-regular-1 Exiting
11) 2024-05-09 10:55:35.116 +0000 UTC regular-1 Exiting

Anything else we need to know?

No response

Relevant SIG(s)

/sig node

@Vyom-Yadav Vyom-Yadav added the kind/flake Categorizes issue or PR as related to a flaky test. label May 21, 2024
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 21, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Vyom-Yadav
Member Author

#120458 is also similar. The fix for that was bumping the delay time.

@collinm10

Hi Vyom - would this be a good first issue because there is a previous similar solution?

@hshiina

hshiina commented May 22, 2024

I hit the same failure in my PR.
This seems to happen because RunTogether() expects the regular container to start before the postStart hook:

framework.ExpectNoError(results.RunTogether(regular1, prefixedName(PostStartPrefix, regular1)))

I posted a PR (#124854) to add StartDelay to the postStart hook. However, there are other approaches, such as:

  • Introduce RunTogetherWithoutOrdering as mentioned here.
  • Remove RunTogether() from this test case, because this check does not look like the primary purpose of the test case.

@collinm10

The test mentioned in #120458 is similar to this flake and doesn't use RunTogether(), so I can just remove the call here. I'm also going to add a comment on RunTogether() to indicate that it expects ordering to be enforced.

That being said, shouldn't PostStart be enforced to run after the container is started?

/assign

@Vyom-Yadav
Member Author

Hey @collinm10 PRs are more than welcome :)

@collinm10

I've submitted the PR. Let me know if I didn't follow any best practices or if anything else is required. Thanks!

@pacoxu

This comment was marked as off-topic.

@k8s-ci-robot
Contributor

@pacoxu: Closing this issue.

In response to this:

Skipped after #125027 was merged. Last CI run passed.

/close


@pacoxu

This comment was marked as off-topic.

@k8s-ci-robot k8s-ci-robot reopened this May 23, 2024
@k8s-ci-robot
Contributor

@pacoxu: Reopened this issue.

In response to this:

/reopen
Closed the wrong one: #125026 is the one closed for the swap test skip. This issue is for another flake. Sorry.

