Hopefully stabilize test_bad_connection.py (#6976)
## Problem
It seems that even though we have a retry on basebackup, it still
sometimes fails to fetch it with the failpoint enabled, resulting in a
test error.

## Summary of changes
If we fail to get the basebackup, disable the failpoint and try again.
save-buffer committed Mar 7, 2024
1 parent ce7a82d commit 2fc8942
Showing 2 changed files with 5 additions and 5 deletions.
8 changes: 4 additions & 4 deletions compute_tools/src/compute.rs
@@ -396,9 +396,9 @@ impl ComputeNode {
     // Gets the basebackup in a retry loop
     #[instrument(skip_all, fields(%lsn))]
     pub fn get_basebackup(&self, compute_state: &ComputeState, lsn: Lsn) -> Result<()> {
-        let mut retry_period_ms = 500;
+        let mut retry_period_ms = 500.0;
         let mut attempts = 0;
-        let max_attempts = 5;
+        let max_attempts = 10;
         loop {
             let result = self.try_get_basebackup(compute_state, lsn);
             match result {
@@ -410,8 +410,8 @@ impl ComputeNode {
                         "Failed to get basebackup: {} (attempt {}/{})",
                         e, attempts, max_attempts
                     );
-                    std::thread::sleep(std::time::Duration::from_millis(retry_period_ms));
-                    retry_period_ms *= 2;
+                    std::thread::sleep(std::time::Duration::from_millis(retry_period_ms as u64));
+                    retry_period_ms *= 1.5;
                 }
                 Err(_) => {
                     return result;
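
To make the effect of the new backoff parameters concrete, here is a minimal sketch that tabulates the sleep schedule before and after this change. It assumes the loop sleeps once per failed attempt while `attempts < max_attempts` and increments `attempts` afterwards (the increment sits outside the lines shown in the diff, so this is an assumption). `backoff_schedule_ms` and `total_ms` are hypothetical helper names, not part of `compute.rs`.

```rust
/// Hypothetical helper: lists the sleep durations (in ms) that a retry loop
/// of the shape shown in the diff would use, one entry per failed attempt.
fn backoff_schedule_ms(initial_ms: f64, factor: f64, max_attempts: u32) -> Vec<u64> {
    let mut period = initial_ms;
    let mut schedule = Vec::new();
    for _ in 0..max_attempts {
        // Mirrors `retry_period_ms as u64`: the float is truncated per sleep.
        schedule.push(period as u64);
        period *= factor;
    }
    schedule
}

/// Hypothetical helper: total time spent sleeping across the whole schedule.
fn total_ms(schedule: &[u64]) -> u64 {
    schedule.iter().sum()
}

fn main() {
    let old = backoff_schedule_ms(500.0, 2.0, 5); // before: 5 retries, doubling
    let new = backoff_schedule_ms(500.0, 1.5, 10); // after: 10 retries, x1.5
    println!("old: {:?} total {} ms", old, total_ms(&old));
    println!("new: {:?} total {} ms", new, total_ms(&new));
}
```

Under these assumptions the old schedule sleeps for about 15.5 s in total and the new one for about 56.7 s, with smaller steps between retries. The commit does not say so, but a worst case longer than 30 s is consistent with the startup wait in `control_plane/src/endpoint.rs` being raised in the same change.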
2 changes: 1 addition & 1 deletion control_plane/src/endpoint.rs
@@ -656,7 +656,7 @@ impl Endpoint {
         // Wait for it to start
         let mut attempt = 0;
         const ATTEMPT_INTERVAL: Duration = Duration::from_millis(100);
-        const MAX_ATTEMPTS: u32 = 10 * 30; // Wait up to 30 s
+        const MAX_ATTEMPTS: u32 = 10 * 90; // Wait up to 1.5 min
         loop {
             attempt += 1;
             match self.get_status().await {
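
The startup wait in `endpoint.rs` is a fixed-interval polling loop whose budget is `ATTEMPT_INTERVAL * MAX_ATTEMPTS`. The sketch below shows that pattern in synchronous form, purely as an illustration: `wait_until_ready` is a hypothetical function, and the real code is async and matches on `self.get_status().await` rather than calling a boolean closure.

```rust
use std::time::Duration;

/// Hypothetical helper: polls `check` up to `max_attempts` times, sleeping
/// `interval` after each failed check. Returns the attempt number on which
/// the check succeeded, or `None` once the budget is exhausted.
fn wait_until_ready<F: FnMut() -> bool>(
    mut check: F,
    interval: Duration,
    max_attempts: u32,
) -> Option<u32> {
    for attempt in 1..=max_attempts {
        if check() {
            return Some(attempt);
        }
        std::thread::sleep(interval);
    }
    None
}

fn main() {
    // Values taken from the diff: 100 ms interval, 10 * 90 attempts.
    let interval = Duration::from_millis(100);
    let max_attempts: u32 = 10 * 90;
    println!("worst-case wait: {:?}", interval * max_attempts);

    // Stand-in readiness check that succeeds on the third poll.
    let mut polls = 0;
    let ready_after = wait_until_ready(
        || {
            polls += 1;
            polls >= 3
        },
        Duration::from_millis(1),
        max_attempts,
    );
    println!("ready after {:?} attempts", ready_after);
}
```

With the values from the diff the worst-case wait is 90 s, matching the updated "Wait up to 1.5 min" comment.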

1 comment on commit 2fc8942

@github-actions
2570 tests run: 2434 passed, 1 failed, 135 skipped (full report)


Failures on Postgres 15

  • test_null_config: debug
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_null_config[debug-pg15]"
Flaky tests (4)

Postgres 16

  • test_timeline_size_quota_on_startup: release

Postgres 15

  • test_ancestor_branch: debug
  • test_auth_failures[True]: debug
  • test_compute_auth_to_pageserver: debug

Test coverage report is not available

The comment gets automatically updated with the latest test results
2fc8942 at 2024-03-07T19:10:10.644Z :recycle:
