Identify e2e tests that rely on events and are thus flaky, rewrite to avoid #71646

spiffxp · 2018-12-03T00:32:04Z

This came up during a recent 1.13 release team meeting. Event delivery is not actually guaranteed, so any e2e test that relies on events is susceptible to flakes. These flakes become more evident when a cluster is under load.

We should identify which e2e tests are using this anti-pattern, and determine if there is a recipe we can use to rewrite them to be less flaky. For example, we could likely use polling instead of waiting for an event.

/cc @jberkus
For any examples of tests that were already identified / fixed

/area test
/kind cleanup
/priority important-soon
/sig testing

spiffxp · 2018-12-03T18:07:19Z

/cc @mariantalla

jberkus · 2018-12-03T18:17:37Z

Multiple storage tests have this issue; @msau42 was the one who identified the problem.

Ref: #71434

msau42 · 2018-12-03T18:34:17Z

I am tracking and working on storage-related tests here: #71570

spiffxp · 2019-01-05T00:02:56Z

/area deflake

pontiyaraja · 2019-01-08T18:42:40Z

I am working on pod-related tests here: #72691

fejta-bot · 2019-04-08T18:46:54Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-05-08T19:17:40Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2019-06-07T20:01:38Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-06-07T20:01:45Z

@fejta-bot: Closing this issue.

spiffxp · 2019-07-25T23:19:16Z

/remove-lifecycle rotten
/reopen
This would still be a viable way of identifying tests that are potentially flaky

k8s-ci-robot · 2019-07-25T23:19:17Z

@spiffxp: Reopened this issue.

fejta-bot · 2019-10-24T00:18:52Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-11-23T01:00:39Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2020-01-15T17:01:46Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2020-01-15T17:01:54Z

@fejta-bot: Closing this issue.

liggitt · 2020-04-30T21:15:33Z

This is still an issue. There are even first-class e2e helpers that encourage this antipattern (e.g. e2eevents.WaitTimeoutForEvent):

kubernetes/test/e2e/framework/events/events.go, lines 128 to 132 in dbbf868:

    // WaitTimeoutForEvent waits the given timeout duration for an event to occur.
    func WaitTimeoutForEvent(c clientset.Interface, namespace, eventSelector, msg string, timeout time.Duration) error {
    	interval := 2 * time.Second
    	return wait.PollImmediate(interval, timeout, eventOccurred(c, namespace, eventSelector, msg))
    }
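
To make the failure mode concrete, the following toy model (standard library only) shows why waiting on an event can fail even though the object reached the expected state. It is a sketch under stated assumptions, not how the Kubernetes event machinery is implemented: `object`, `lossyRecorder`, and their methods are hypothetical names, and the drop-when-full buffer is just one simple way to model best-effort delivery.

```go
package main

import "fmt"

// object is a toy resource whose status is the source of truth.
type object struct {
	ready bool
}

func (o *object) markReady()    { o.ready = true }
func (o *object) isReady() bool { return o.ready }

// lossyRecorder models best-effort delivery (an assumption of this sketch):
// an event is dropped when the buffer is full instead of blocking the sender.
type lossyRecorder struct {
	events chan string
}

func newLossyRecorder(capacity int) *lossyRecorder {
	return &lossyRecorder{events: make(chan string, capacity)}
}

// record returns false if the event was dropped.
func (r *lossyRecorder) record(msg string) bool {
	select {
	case r.events <- msg:
		return true
	default:
		return false
	}
}

// sawEvent drains the buffered events and reports whether msg was delivered.
func (r *lossyRecorder) sawEvent(msg string) bool {
	for {
		select {
		case e := <-r.events:
			if e == msg {
				return true
			}
		default:
			return false
		}
	}
}

func main() {
	obj := &object{}
	rec := newLossyRecorder(1)

	// Under load, an unrelated event fills the buffer first.
	rec.record("Pulled")

	// The object becomes ready, but the matching event does not fit.
	obj.markReady()
	delivered := rec.record("Ready")

	fmt.Println("event delivered:", delivered)             // false
	fmt.Println("event observed:", rec.sawEvent("Ready")) // false
	fmt.Println("state observed:", obj.isReady())         // true
}
```

A test built on the event check would time out here, while a test that checks the object's state would pass.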

/lifecycle frozen

k8s-ci-robot added the area/deflake Issues or PRs related to deflaking kubernetes tests label Jan 5, 2019

mariantalla added this to To do in Deflaking kubernetes e2e tests Jan 8, 2019

spiffxp mentioned this issue Jan 8, 2019

graceful pod termination with preStop container life cycle hook #72087

Merged

pontiyaraja mentioned this issue Jan 8, 2019

Don't rely on events and watch in pod e2e tests #72691

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2019

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 8, 2019

k8s-ci-robot closed this as completed Jun 7, 2019

Deflaking kubernetes e2e tests automation moved this from Inbox (unprioritized) to Done Jun 7, 2019

k8s-ci-robot reopened this Jul 25, 2019

Deflaking kubernetes e2e tests automation moved this from Done to In progress Jul 25, 2019

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 25, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 24, 2019

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 23, 2019

liggitt added the kind/flake Categorizes issue or PR as related to a flaky test. label Dec 16, 2019

k8s-ci-robot closed this as completed Jan 15, 2020

Deflaking kubernetes e2e tests automation moved this from In progress to Done Jan 15, 2020

liggitt reopened this Apr 30, 2020

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Apr 30, 2020

BenTheElder mentioned this issue Apr 30, 2020

flake: [sig-node] RuntimeClass should reject a Pod requesting a RuntimeClass with an unconfigured handler #90653

Closed

Nick-Triller mentioned this issue Apr 25, 2021

e2e: deflake test by not relying on events #101464

Merged

SergeyKanzhelev mentioned this issue Jul 8, 2021

unsafe sysctls: demote its conformance test to e2e test #101190

Closed

mkimuram mentioned this issue Mar 22, 2022

[flake][sig-node] Pods should run through the lifecycle of Pods and PodStatus [Conformance] #108891

Closed

aojea mentioned this issue Mar 23, 2022

e2e: deflake "should run through the lifecycle of Pods and PodStatus" #108892

Merged

aojea mentioned this issue Apr 18, 2022

e2e: Deprecated timeout constants migration #109503

Closed
