-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tests] time out tests in test_all after a duration #4780
[tests] time out tests in test_all after a duration #4780
Conversation
Created using spr 1.3.5
.config/nextest.toml
Outdated
[[profile.default.overrides]] | ||
filter = 'binary_id(omicron-nexus::test_all)' | ||
# As of 2023-01-08, the slowest test in test_all takes 196s on a Ryzen 7950X. | ||
# 900s is a good upper limit that adds a comfortable buffer. | ||
slow-timeout = { period = '60s', terminate-after = 15 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not great, but I've seen tests on my laptop that take up to 1000 seconds -- the saga 'test_action_failure_can_unwind' tests are particularly slow, and definitely need optimizing.
I'm not even sure I'd say this timeout is "wrong" - that test needs fixing - but figured I'd mention it regardless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that test isn't part of test_all
.
.config/nextest.toml
Outdated
[[profile.default.overrides]] | ||
filter = 'binary_id(omicron-nexus::test_all)' | ||
# As of 2023-01-08, the slowest test in test_all takes 196s on a Ryzen 7950X. | ||
# 900s is a good upper limit that adds a comfortable buffer. | ||
slow-timeout = { period = '60s', terminate-after = 15 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not great, but I've seen tests on my laptop that take up to 1000 seconds -- the saga 'test_action_failure_can_unwind' tests are particularly slow, and definitely need optimizing.
I'm not even sure I'd say this timeout is "wrong" - that test needs fixing - but figured I'd mention it regardless
FWIW there is a timeout already on the whole test run and we went with a few hours:
Clulow's Lament applies here: it's a tradeoff between false positives (spurious test failures because we didn't wait long enough) vs. taking too long to get feedback when they're hung. Personally, if a test hangs when I'm running it locally, I'd always rather it hang (and not kill it) because that will preserve a lot of useful runtime state I might want to debug it. In CI I don't think I care either way. But false positives here would certainly suck. |
Fair, I'll restrict it to CI. |
Created using spr 1.3.5
Yes -- there's currently a bug in nextest where it isn't printing out stdout/stderr on SIGTERM (nextest-rs/nextest#1208) -- I'll get around to fixing it soon but wanted to get some output going before then. |
Created using spr 1.3.5
We're seeing some tests in
test_all
hang forever (#4779). Set a reasonableupper bound on test duration.
This will also cause stdout and stderr for failing tests to be printed. Doing
so on SIGTERM in general is tracked at
nextest-rs/nextest#1208.
Also, bump up the required nextest version to 0.9.64 to make use of the
binary_id
predicate.