-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: improve stability of test_deletion_queue_recovery
#7325
Conversation
The line before does a checkpoint during wait for upload.
2754 tests run: 2630 passed, 0 failed, 124 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
cdaf38e at 2024-04-05T12:24:44.075Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note there's also pausable_failpoint!
, it allows to use the indefinite sleep
action instead of a very high number like you did in this PR.
It is quite heavy though because it does a spawn_blocking
.
Lines 121 to 147 in b30b15e
}; | |
/// Declare a failpoint that can use the `pause` failpoint action. | |
/// We don't want to block the executor thread, hence, spawn_blocking + await. | |
macro_rules! pausable_failpoint { | |
($name:literal) => { | |
if cfg!(feature = "testing") { | |
tokio::task::spawn_blocking({ | |
let current = tracing::Span::current(); | |
move || { | |
let _entered = current.entered(); | |
tracing::info!("at failpoint {}", $name); | |
fail::fail_point!($name); | |
} | |
}) | |
.await | |
.expect("spawn_blocking"); | |
} | |
}; | |
($name:literal, $cond:expr) => { | |
if cfg!(feature = "testing") { | |
if $cond { | |
pausable_failpoint!($name) | |
} | |
} | |
}; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, for this specific use case, I think you want to exercise the cancellation path?
Would maybe make sense to have a failpoint_support::sleep_indefinitely_until_cancel
which does a 100 year sleep + wait for cancel internally.
Right, at some point we should pull that out into the failpoint_support location so that it's usable from more spaces (and make it support cancellation tokens like |
Problem
As #6092 points out, this test was (ab)using a failpoint!() with 'pause', which was occasionally causing index uploads to get hung on a stuck executor thread, resulting in timeouts waiting for remote_consistent_lsn.
That is one of several failure modes, but by far the most frequent.
Summary of changes
sleep_millis_async
, which is not only async but also supports clean shutdown.Checklist before requesting a review
Checklist before merging