cache: Reflector should have the same injected clock as its informer #115077
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
While refactoring the backoff manager to simplify and unify the code in wait, a race condition was encountered in TestSharedInformerWatchDisruption. The new implementation failed because the fake clock was not propagated to the backoff managers when the reflector was used in a controller. After ensuring the managers, reflector, controller, and informer shared the same clock, the test needed to be updated to avoid the race condition by advancing the fake clock and adding real sleeps to wait for asynchronous propagation of the various goroutines in the controller. Due to the deep structure of informers, it is difficult to inject hooks to avoid having to perform sleeps. At a minimum, the FakeClock interface should allow a caller to determine the number of waiting timers (to avoid the first sleep).
b78d9f2 to 91b3a81
@@ -348,6 +348,18 @@ func TestSharedInformerWatchDisruption(t *testing.T) {
// Simulate a connection loss (or even just a too-old-watch)
source.ResetWatch()

// Wait long enough for the reflector to exit and the backoff function to start waiting |
Just curious, what is "long enough" in this context? The time to execute all these actions?
In this test, long enough is "for the Go scheduler to kick in, switch to the waiting goroutine, and then run up until the point we try to get the timer channel from the timer, which registers us with the fake clock so 'Step' actually does something". I.e. on the order of nanoseconds, but we don't currently have a way to inject a deterministic "wait until a timer is created on the fake clock and passes this argument".
/test pull-kubernetes-node-e2e-containerd unrelated failure
/lgtm
test doesn't flake with current sleeps
LGTM label has been added. Git tree hash: 4b275cc1517278d2e8a3ea8e7afc723475dad0dc
TFTR!
/triage accepted
NOTE: This pull is part of a series of changes that introduce new context-cancellation-aware Poll methods, reduce the surface area of the wait package to a smaller set of functions (if unused, marked private, if consolidating, marked deprecated), unify the underlying loop implementation with better testing, consolidate the backoff manager code into a smaller chunk, and in general address a number of outstanding issues. See #115077, #115116, #115113, #115140, #115064, and #107826.
While refactoring the backoff manager to simplify and unify the code in wait, a race condition was encountered in TestSharedInformerWatchDisruption. The new implementation failed because the fake clock was not propagated to the reflector AND backoff managers (right now the backoff managers in tests would be using a real clock). After ensuring the reflector, controller, and informer shared the same clock, the test needed to be updated to avoid the race condition by advancing the fake clock and adding real sleeps to wait for asynchronous propagation of the various goroutines in the controller.
Due to the deep structure of informers it is difficult to inject hooks to avoid having to perform sleeps. At a minimum the FakeClock interface should allow a caller to determine the number of waiting timers (to avoid the first sleep).
Included in #115064 but called out separately here for independent tests.
/kind cleanup
/kind flake