Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identify e2e tests that rely on events and thus flaky, rewrite to avoid #71646

Open
spiffxp opened this issue Dec 3, 2018 · 16 comments
Open
Labels
area/deflake Issues or PRs related to deflaking kubernetes tests area/test kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/flake Categorizes issue or PR as related to a flaky test. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@spiffxp
Copy link
Member

spiffxp commented Dec 3, 2018

This came up during a recent 1.13 release team meeting. Event delivery is not actually guaranteed, so any e2e tests that rely on events are susceptible to flakes. These will become more evident as a cluster is under load.

We should identify which e2e tests are using this anti-pattern, and determine if there is a recipe we can use to rewrite them to be less flaky. For example, we could likely use polling instead of waiting for an event.

/cc @jberkus
For any examples of tests that were already identified / fixed

/area test
/kind cleanup
/priority important-soon
/sig testing

@k8s-ci-robot k8s-ci-robot added area/test kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Dec 3, 2018
@spiffxp
Copy link
Member Author

spiffxp commented Dec 3, 2018

/cc @mariantalla

@jberkus
Copy link

jberkus commented Dec 3, 2018

Multiple Storage tests have this issue, as identified by @msau42, who was the one to identify the problem.

Ref: #71434

@msau42
Copy link
Member

msau42 commented Dec 3, 2018

I am tracking and working on storage related tests here: #71570

@spiffxp
Copy link
Member Author

spiffxp commented Jan 5, 2019

/area deflake

@pontiyaraja
Copy link
Contributor

I am working on pod related tests here #72691

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2019
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 8, 2019
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Deflaking kubernetes e2e tests automation moved this from Inbox (unprioritized) to Done Jun 7, 2019
@spiffxp
Copy link
Member Author

spiffxp commented Jul 25, 2019

/remove-lifecycle rotten
/reopen
This would still be a viable way of identifying tests that are potentially flaky

@k8s-ci-robot
Copy link
Contributor

@spiffxp: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen
This would still be a viable way of identifying tests that are potentially flaky

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jul 25, 2019
Deflaking kubernetes e2e tests automation moved this from Done to In progress Jul 25, 2019
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 25, 2019
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2019
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 23, 2019
@liggitt liggitt added the kind/flake Categorizes issue or PR as related to a flaky test. label Dec 16, 2019
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Deflaking kubernetes e2e tests automation moved this from In progress to Done Jan 15, 2020
@liggitt liggitt reopened this Apr 30, 2020
@liggitt
Copy link
Member

liggitt commented Apr 30, 2020

This is still an issue. There are even first-class e2e helpers that encourage this antipattern (e.g. e2eevents.WaitTimeoutForEvent):

// WaitTimeoutForEvent waits the given timeout duration for an event to occur.
func WaitTimeoutForEvent(c clientset.Interface, namespace, eventSelector, msg string, timeout time.Duration) error {
interval := 2 * time.Second
return wait.PollImmediate(interval, timeout, eventOccurred(c, namespace, eventSelector, msg))
}

/lifecycle frozen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/deflake Issues or PRs related to deflaking kubernetes tests area/test kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/flake Categorizes issue or PR as related to a flaky test. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Projects
No open projects
Development

No branches or pull requests

7 participants