-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eviction: Deflake TestStart #108010
eviction: Deflake TestStart #108010
Conversation
TestStart was previously flaky. In approx 100_000 local runs, it failed about 70% of the time, and has been mentioned as a flaky unit test in the past. This flake was due to a race condition with the logic as written and the go scheduler. UpdateThreshold calls `notifier.Start(events)` in a new Go Routine, which is not guarunteed to be called immediately. This meant that if `m.Start()` was called before `notifier.Start()`, the test would fail, as the notifier would not have been started before the 4 events were processed and lock released. Here, we update the test to more closely match the intended application behaviour, and have events passed to the channel when `Start` is called on the notifier. This ensures that -Start gets called and additionally validates that the correct channel is provided to the notifier. Stop was never called previously, as it only gets called on a subsequent call to UpdateThreshold. `AnyTimes()` hid that this did not occur.
925130f
to
3630328
Compare
/test pull-kubernetes-unit lol other flakes on a flake fix |
/priority important-soon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
tested
/assign mrunalp |
/assign @klueska |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/assign @klueska @dchen1107 @derekwaynecarr |
ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
27 test failures (1 today) look like linksearch githubfile bug It is still one of the top flakes in http://storage.googleapis.com/k8s-metrics/flakes-latest.json. |
/approve thanks @endocrimes |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dims, endocrimes, pacoxu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind flake
/sig node
What this PR does / why we need it:
TestStart was previously flaky. In approx 100_000 local runs, it failed
70% of the time, and has been mentioned as a flaky unit test several times in
the past.
This flake was due to a race condition with the logic as written and the
go scheduler. UpdateThreshold calls
notifier.Start(events)
in a new GoRoutine, which is not guarunteed to be called immediately.
This meant that if
m.Start()
was called beforenotifier.Start()
, thetest would fail, as the notifier would not have been started before the
4 events were processed and lock released.
Here, we update the test to more closely match the intended application
behaviour, and have events passed to the channel when
Start
is calledon the notifier.
This ensures that -Start gets called and additionally validates
that the correct channel is provided to the notifier.
Stop was never called previously, as it only gets called on a subsequent
call to UpdateThreshold.
AnyTimes()
hid that this did not occur.Which issue(s) this PR fixes:
Related to: #107708
Special notes for your reviewer:
Does this PR introduce a user-facing change?