WIP: Fix #549 Race condition between operator's watch and e2eutil.WaitForDeployment #550
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test 4.5-e2e
/retest
```go
		return false
	}
	return true
}, 120*time.Second, 2*time.Second)
```
In `WaitForDeployment`, `wait.Poll` is used to achieve similar behavior [1]; `WaitForDeployment` will return after `timeout` seconds, which appears to be the same value `require.Eventually` is using. This means both would return at virtually the same time.
Lines 48 to 76 in a3e812b

```go
func waitForDeployment(t *testing.T, kubeclient kubernetes.Interface, namespace, name string, replicas int,
	retryInterval, timeout time.Duration, isOperator bool) error {
	if isOperator && test.Global.LocalOperator {
		t.Log("Operator is running locally; skip waitForDeployment")
		return nil
	}
	err := wait.Poll(retryInterval, timeout, func() (done bool, err error) {
		deployment, err := kubeclient.AppsV1().Deployments(namespace).Get(name, metav1.GetOptions{})
		if err != nil {
			if apierrors.IsNotFound(err) {
				t.Logf("Waiting for availability of %s deployment\n", name)
				return false, nil
			}
			return false, err
		}
		if int(deployment.Status.AvailableReplicas) >= replicas {
			return true, nil
		}
		t.Logf("Waiting for full availability of %s deployment (%d/%d)\n", name,
			deployment.Status.AvailableReplicas, replicas)
		return false, nil
	})
	if err != nil {
		return err
	}
	t.Logf("Deployment available (%d/%d)\n", replicas, replicas)
	return nil
}
```
Further investigation into `assert.Eventually` [2] shows that on every tick (in this case `2*time.Second`) the condition function is executed inside a goroutine; in other words, this change can potentially create many goroutines that will be orphaned in the case the condition is never met in `WaitForDeployment`.
service-binding-operator/vendor/github.com/stretchr/testify/assert/assertions.go
Lines 1635 to 1662 in 97172e4

```go
func Eventually(t TestingT, condition func() bool, waitFor time.Duration, tick time.Duration, msgAndArgs ...interface{}) bool {
	if h, ok := t.(tHelper); ok {
		h.Helper()
	}
	ch := make(chan bool, 1)
	timer := time.NewTimer(waitFor)
	defer timer.Stop()
	ticker := time.NewTicker(tick)
	defer ticker.Stop()
	for tick := ticker.C; ; {
		select {
		case <-timer.C:
			return Fail(t, "Condition never satisfied", msgAndArgs...)
		case <-tick:
			tick = nil
			go func() { ch <- condition() }()
		case v := <-ch:
			if v {
				return true
			}
			tick = ticker.C
		}
	}
}
```
Additionally, given that `WaitForDeployment`'s `retryInterval` (5s) is greater than the `Eventually` retry interval (2s), in the best scenario at least two `WaitForDeployment` calls will be racing.
> Further investigation into `assert.Eventually` [2] shows that on every tick (in this case `2*time.Second`) the condition function is executed inside a goroutine; in other words, this change can potentially create many goroutines that will be orphaned in the case the condition is never met in `WaitForDeployment`.
My theory was: since `Eventually` creates goroutines, `waitForDeployment` will be running concurrently in several goroutines, one created on each tick, and this could increase the chance of detecting deployment readiness compared to a single `waitForDeployment`. There is a possibility that these goroutines could be orphaned, but they will return or exit once `waitForDeployment` times out.
> Additionally, given that `WaitForDeployment`'s `retryInterval` (5s) is greater than the `Eventually` retry interval (2s), in the best scenario at least two `WaitForDeployment` calls will be racing.
On each tick (every 2s) a new goroutine is created, so within 120s there will be 60 goroutines (a rather high number), each of which can run for up to 120s (the timeout of `waitForDeployment`). Some goroutines could still be running after `Eventually` has timed out or met the condition, and, if I'm not wrong, all the goroutines will have exited by 240s (assuming `Eventually` orphans them). We only need one goroutine to meet the condition, but there is still a small margin for never meeting it.
This was my theory behind trying this nested-polling approach. Yes, it does seem like a rather costly approach.
Alternatively, this race condition could be alleviated by #523, and we would also need to make a single write for patching and annotating the deployment.
Any suggestions for other approaches?
/test 4.5-e2e
@pratikjagrut: The following test failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Closing this PR in favour of the acceptance tests.
Motivation
Fix #549
Changes
Add `waitForDeployment` inside `require.Eventually()`.
Testing
`make test-e2e`