New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
e2e: WaitForJobStopped correction #19749
Conversation
e2e/e2eutil/utils.go
Outdated
// sleep for 3 seconds to make sure things any related events (like Consul | ||
// service de-registration) has enough time to happen, because there are | ||
// tests that assert such things after stopping jobs. | ||
time.Sleep(3 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hard-coded sleep does seem pretty gross, and potential for future flakiness...
would running DeregisterOpts()
instead with NoShutdownDelay
help here at all? also it returns an EvalID
-- any value from watching that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hard-coded sleep does seem pretty gross, and potential for future flakiness...
I concur, although we have plenty of hard-coded sleeps all across our e2eutil package...
would running DeregisterOpts() instead with NoShutdownDelay help here at all?
I don't think so. It is my understanding that NoShutdownDelay
will simply result in Deregister
call taking less time, but the call blocks until the deregister message is applied via Raft.
also it returns an EvalID -- any value from watching that?
that's an idea maybe, I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming it gets our desired outcome, LGTM!
require.NoError(t, err, "error deregistering job %q", job) | ||
|
||
testutil.WaitForResultRetries(retries, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say to use must.Wait()
for the retry logic, except this whole file seems to use this testutil, so being consistent with the surrounding context seems fine.
mainly I'm curious if this does actually do what was originally expected? since the job's being purged, the eval doesn't complete until the allocs are all gone too?
We're getting errors from some tests that assert service de-registration from Consul after calling
WaitForJobStopped
. Before #19744,WaitForJobStopped
waited for allocs to be stopped too, but since we changed the behavior of-purge
, there is no wait anymore and Consul-related assertions happen immediately.Example of a failing test:
I welcome better suggestions on how to handle this.