
Hopefully stabilize test_bad_connection.py #6976

Merged: 5 commits into main from sasha_fix_test, Mar 7, 2024

Conversation

save-buffer (Contributor)

Problem

It seems that even though we retry the basebackup fetch, it still sometimes fails while the failpoint is enabled, resulting in a test error.

Summary of changes

If we fail to get the basebackup, disable the failpoint and try again.
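A minimal sketch of the pattern, using hypothetical helper and failpoint names (not the actual test-suite API):

def fetch_basebackup_with_fallback(pageserver_http, endpoint):
    # Hypothetical sketch only; names are illustrative.
    try:
        # First attempt runs with the failpoint still active, so even
        # with the basebackup retry it may fail.
        return endpoint.start()
    except Exception:
        # Turn the failpoint off so failures can no longer be injected,
        # then try once more; this attempt should succeed.
        pageserver_http.configure_failpoints(("bad-connection-failpoint", "off"))
        return endpoint.start()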

save-buffer (Contributor, Author)

Addresses #6688

github-actions bot commented Feb 29, 2024

2490 tests run: 2369 passed, 0 failed, 121 skipped (full report)


Flaky tests (2)

Postgres 15

  • test_empty_branch_remote_storage_upload: debug

Postgres 14

  • test_compute_pageserver_connection_stress: release

Code coverage* (full report)

  • functions: 28.8% (6992 of 24312 functions)
  • lines: 47.3% (43013 of 90878 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results.
f8f7d2d at 2024-03-07T16:30:43.505Z

jcsp (Contributor) commented Mar 1, 2024

It would be good to look at the connection failure probability vs. the number of retries in compute_ctl -- if we're doing e.g. three retries against a 50% failure rate, then those numbers probably need adjusting (probably by retrying more times).

save-buffer (Contributor, Author)

Currently we retry 5 times with an initial timeout of 500ms, doubling it each time:

>>> sum(500 * 2**i for i in range(5))
15500

i.e. at most 15.5 seconds of waiting in total. The probability of the test failing is then 1/(2^5) ≈ 3%. We can reduce that to roughly 0.1% by doing 10 retries. Is that good enough? Or should we make it deterministically pass?
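For reference, assuming each attempt fails independently at the failpoint's 50% rate, the chance that every attempt fails is:

>>> 0.5 ** 5
0.03125
>>> 0.5 ** 10
0.0009765625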

jcsp (Contributor) commented Mar 6, 2024

> Probability of the test failing is then 1/(2^5) ≈ 3%. We can reduce that to roughly 0.1% by doing 10 retries. Is that good enough? Or should we make it deterministically pass?

That works for me.

save-buffer (Contributor, Author)

I switched it to 10 retries, multiplying the timeout by 1.5x each time, making the maximum time spent waiting for retries:

>>> sum(500 * 1.5**i for i in range(10)) / 1000.0
56.6650390625

i.e. about 57 seconds.
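The retry loop itself lives in compute_ctl (Rust); this Python sketch, with made-up names, just illustrates the schedule:

import time

def retry_with_backoff(attempt_fn, retries=10, initial_timeout_ms=500, factor=1.5):
    # Keep calling attempt_fn until it succeeds, sleeping 500ms, 750ms,
    # 1125ms, ... after each failure; the waits sum to the ~57s above.
    timeout_ms = initial_timeout_ms
    last_exc = None
    for _ in range(retries):
        try:
            return attempt_fn()
        except Exception as exc:
            last_exc = exc
            time.sleep(timeout_ms / 1000.0)
            timeout_ms *= factor
    raise last_exc  # all retries failed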

save-buffer enabled auto-merge (squash) Mar 7, 2024 18:11
save-buffer merged commit 2fc8942 into main Mar 7, 2024 (53 checks passed)
save-buffer deleted the sasha_fix_test branch Mar 7, 2024 18:12