-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Fix flaky tests by using RetryOnConflict #904
Conversation
|
||
Expect(k8sClient.Update(ctx, myRayService)).Should(Succeed(), "failed to update test RayService resource") | ||
err := retry.RetryOnConflict(retry.DefaultRetry, func() error { | ||
Eventually( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why we need to getResourceFunc
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without getResourceFunc , if a 409 conflict happens, the resourceVersion we pass to API server during retry updating will never be updated and thus will never match the resourceVersion in the API server.
Also update the usage of k8s.io/utils/pointer pkg. Some functions are deprecated and have new names. See here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for the fix!
Use RetryOnConflict to relieve flakiness.
Why are these changes needed?
In a previous experiment, it was shown that 86 runs succeed, and 14 runs failed with the following errors:
The above errors are all due to 409 conflict:
So, two changes are made in this PR:
If you have concerns about using RetryOnConflict to handle 409 conflict errors in testing or need some background information, see the below links :
Related issue number
Closes #902
Checks
In a 2-core CPU, 7 GB RAM VM (to simulate the github's standard Linux runner), I ran the test 100 times:
All pass with no error. The test is stable now.