Hopefully stabilize test_bad_connection.py #6976
Addresses #6688.
2490 tests run: 2369 passed, 0 failed, 121 skipped (full report)
Flaky tests (2): Postgres 15, Postgres 14
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results
f8f7d2d at 2024-03-07T16:30:43.505Z
It would be good to look at the connection failure probability vs. the number of retries in compute_ctl: if we're doing e.g. three retries with a 50% failure rate, then those numbers probably need adjusting (most likely by retrying more times).
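To make the trade-off in this comment concrete, here is a rough sketch of how the overall failure probability shrinks as attempts are added. It assumes that attempts fail independently with the same probability, which may not hold for the real failpoint, and the function names are hypothetical, not taken from compute_ctl or the test suite.

```python
def all_attempts_fail_probability(p_fail: float, attempts: int) -> float:
    """Probability that every one of `attempts` tries fails.

    Assumes each try fails independently with probability `p_fail`.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return p_fail ** attempts


def attempts_needed(p_fail: float, target: float) -> int:
    """Smallest number of attempts whose overall failure probability is <= target."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    attempts = 1
    # Terminates because p_fail < 1 and target > 0.
    while p_fail ** attempts > target:
        attempts += 1
    return attempts


if __name__ == "__main__":
    # Three attempts at a 50% failure rate still fail together 12.5% of the time.
    print(all_attempts_fail_probability(0.5, 3))
    # Attempts needed to push the overall failure rate to 1% or below.
    print(attempts_needed(0.5, 0.01))
```

Whether "three retries" means three or four total attempts changes the result, so the attempt count passed in should be the total number of tries.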
Currently we retry 5 times with an initial timeout of 500ms.
Probability of the test failing is then
That works for me.
b1e1f4b to 8c9f742
I switched it to 10 retries, multiplying the timeout by 1.5 each time, which makes the maximum total time spent waiting for retries roughly 56 sec.
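The 56 sec figure can be checked with a short calculation. This sketch assumes the initial timeout stays at the 500ms mentioned earlier in the thread, that every retry waits its full timeout, and that the timeout grows by a factor of 1.5 after each retry; `backoff_delays` is a hypothetical helper, not a function from the codebase.

```python
def backoff_delays(initial_s: float, factor: float, retries: int) -> list[float]:
    """Timeout for each retry when the timeout is multiplied by `factor` every time."""
    return [initial_s * factor ** i for i in range(retries)]


# Assumed values: 500ms initial timeout, 1.5x growth, 10 retries.
delays = backoff_delays(initial_s=0.5, factor=1.5, retries=10)
total_wait_s = sum(delays)

if __name__ == "__main__":
    for attempt, delay in enumerate(delays, start=1):
        print(f"retry {attempt}: {delay:.3f}s")
    print(f"worst-case total wait: {total_wait_s:.2f}s")
```

The sum is a geometric series, 0.5 * (1.5^10 - 1) / (1.5 - 1), which comes to about 56.67 seconds and matches the figure quoted above.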
Problem
Even though basebackup is retried, fetching it still sometimes fails while the failpoint is enabled, which results in a test error.
Summary of changes
If we fail to get the basebackup, disable the failpoint and try again.
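The fallback described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `run_with_failpoint_fallback` and `FakeServer` are hypothetical names used only for illustration, and the real test would call its own fixtures to fetch the basebackup and to turn the failpoint off.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_failpoint_fallback(
    action: Callable[[], T],
    disable_failpoint: Callable[[], None],
) -> T:
    """Run `action`; if it fails, disable the failpoint and run it once more."""
    try:
        return action()
    except Exception:
        # The injected failure won; remove it so the second try can succeed.
        disable_failpoint()
        return action()


class FakeServer:
    """Stand-in for a server with a failpoint (illustration only)."""

    def __init__(self) -> None:
        self.failpoint_enabled = True
        self.calls = 0

    def fetch_basebackup(self) -> str:
        self.calls += 1
        if self.failpoint_enabled:
            raise ConnectionError("simulated bad connection")
        return "basebackup"

    def disable_failpoint(self) -> None:
        self.failpoint_enabled = False


if __name__ == "__main__":
    server = FakeServer()
    result = run_with_failpoint_fallback(
        server.fetch_basebackup, server.disable_failpoint
    )
    print(result, server.calls)
```

With this shape the test still exercises the bad-connection path on the first try, but a run of unlucky retries no longer fails the test outright.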