Flaky TestPeriodicHourly and TestPeriodicMinutes #17054
To repro this, add a …
The recorder (backed by an unbuffered channel) is supposed to act as a semaphore here, but the …
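For context on the pattern being discussed, here is a minimal, self-contained sketch of how an unbuffered channel can act as a semaphore between a producer loop and the test goroutine. The names (`recorder`, `Record`, `Wait`) are illustrative only, not etcd's actual testutil API:

```go
package main

import "fmt"

// recorder is a simplified stand-in for a test recorder backed by an
// unbuffered channel: a send blocks until a matching receive, so the
// producer and the test proceed in lock-step, like a semaphore.
type recorder struct {
	ch chan string
}

func newRecorder() *recorder {
	return &recorder{ch: make(chan string)} // unbuffered on purpose
}

// Record blocks until the test side consumes the action.
func (r *recorder) Record(action string) {
	r.ch <- action
}

// Wait blocks until the next action is recorded, then returns it.
func (r *recorder) Wait() string {
	return <-r.ch
}

func main() {
	r := newRecorder()
	go func() {
		// producer loop (e.g. the compactor) records each step
		r.Record("compact")
	}()
	fmt.Println(r.Wait()) // rendezvous point: prints "compact"
}
```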
I can take a look.
/assign @moficodes
Followed the instructions from @tjungblu and managed to reproduce the error.
One thing we could do is add some additional wait there, but that still does not feel like a great solution. Might it be good enough as a test util, though? @ahrtr thoughts?
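A sketch of what such a test util might look like, assuming a simple bounded polling helper (`waitUntil` is a hypothetical name, not an existing etcd helper):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitUntil polls cond every interval until it returns true or the
// timeout elapses — the "additional wait" idea: instead of asserting
// immediately, give the background loop time to catch up.
func waitUntil(timeout, interval time.Duration, cond func() bool) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cond() {
			return nil
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("condition not met within %v", timeout)
}

func main() {
	var done atomic.Bool
	go func() {
		time.Sleep(50 * time.Millisecond) // background loop finishing late
		done.Store(true)
	}()
	// An immediate assertion would flake here; a bounded wait passes.
	fmt.Println(waitUntil(time.Second, 10*time.Millisecond, done.Load))
}
```

The downside, as noted above, is that this papers over the timing dependency rather than removing it.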
Thanks for looking into the issue.
I have not had time to take a deep dive into the test case, but based on a quick read of the source code, the failure seems unrelated to the …
But the test case always reads the first item (line 76 in …).
Did I miss anything?
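The point about reading only the first item can be pictured like this (illustrative types, not the actual test code): the assertion depends on which recorded action happens to arrive first, which in turn depends on the relative progress of the two loops.

```go
package main

import "fmt"

// compactAction is an illustrative stand-in for what the fake
// compactor records: the revision it was asked to compact.
type compactAction struct{ rev int64 }

func main() {
	ch := make(chan compactAction)

	// Producer: the compactor loop may fire several times; which
	// revision arrives first depends on how far the revision getter
	// and compactor loops have advanced relative to the test.
	go func() {
		for rev := int64(1); rev <= 3; rev++ {
			ch <- compactAction{rev: rev}
		}
	}()

	// The test reads only the FIRST recorded action, so an assertion
	// on its revision flakes whenever the two loops have drifted by
	// the time this read happens.
	first := <-ch
	fmt.Printf("first compacted revision: %d\n", first.rev)
}
```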
@moficodes are you still working on this issue? Any update?
@ahrtr I would like to work on it more, but I am not sure what the path to solving this would be. What should I look into more?
Just saw this failure again: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/17367/pull-etcd-unit-test/1754513529177640960. Please try to figure out the root cause as a first step, thanks.
Also observed the flakiness recently on our internal CI job: https://testgrid.k8s.io/ibm-etcd-tests-ppc64le#Periodic%20etcd%20test%20suite%20on%20ppc64le
@moficodes are you still working on this case?
The first step should definitely be to figure out the root cause. Please let us know if you have any difficulty or do not have the bandwidth to continue working on this case.
This is one of the top flaking tests in our internal CI. Please see testgrid.
The root cause, as mentioned above, should be the revision getter and the compactor getting out of sync across the two loops. It would help to remove the timeout in the revision getter's wait to minimize this kind of failure, as in PR #17513.
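In sketch form (illustrative code, not the exact change in PR #17513), the difference is between a timed wait that can give up early and a plain blocking receive:

```go
package main

import (
	"fmt"
	"time"
)

// produce simulates the revision getter loop delivering a revision
// after some work (illustrative helper, not etcd's code).
func produce(delay time.Duration) <-chan int64 {
	ch := make(chan int64, 1)
	go func() {
		time.Sleep(delay)
		ch <- 42
	}()
	return ch
}

func main() {
	// Timed wait: may give up before the revision getter has produced
	// anything, letting the two loops drift out of sync (the flake).
	select {
	case rev := <-produce(20 * time.Millisecond):
		fmt.Println("got revision", rev)
	case <-time.After(10 * time.Millisecond):
		fmt.Println("timed out; proceeding unsynchronized")
	}

	// Blocking wait: cannot advance until a revision is actually
	// produced, keeping the loops in lock-step (the PR's direction).
	rev := <-produce(20 * time.Millisecond)
	fmt.Println("got revision", rev)
}
```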
It took me some time to read the test case and this comment again, but eventually I understood your point. It makes sense. Thanks @tjungblu.
Thanks for the fix. Overall looks good with a minor comment.
Observed recent flakiness for …
Which github workflows are flaking?
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/16822/pull-etcd-unit-test/1731327844841164800
Which tests are flaking?
TestPeriodicHourly and TestPeriodicMinutes
Github Action link
No response
Reason for failure (if possible)
No response
Anything else we need to know?
No response