[tests] time out tests in test_all after a duration #4780

sunshowers · 2024-01-08T22:33:44Z

We're seeing some tests in test_all hang forever (#4779). Set a reasonable
upper bound on test duration.

This will also cause stdout and stderr for failing tests to be printed. Doing
so on SIGTERM in general is tracked at
nextest-rs/nextest#1208.

Also, bump up the required nextest version to 0.9.64 to make use of the
binary_id predicate.

Created using spr 1.3.5
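
For readers unfamiliar with how a nextest config can demand a minimum tool version, here is a minimal sketch. It assumes the bump is expressed through the `nextest-version` key in `.config/nextest.toml`; the exact form used in this PR is not shown in the thread.

```toml
# .config/nextest.toml (sketch, not the PR's actual diff)
# Assumption: the version requirement is declared at the top level of the config.
# With `required`, an older cargo-nextest refuses to run instead of failing
# later on the unrecognized binary_id predicate.
nextest-version = { required = "0.9.64" }
```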

smklein · 2024-01-08T23:37:59Z

.config/nextest.toml

+[[profile.default.overrides]]
+filter = 'binary_id(omicron-nexus::test_all)'
+# As of 2023-01-08, the slowest test in test_all takes 196s on a Ryzen 7950X.
+# 900s is a good upper limit that adds a comfortable buffer.
+slow-timeout = { period = '60s', terminate-after = 15 }


It's not great, but I've seen tests on my laptop take up to 1000 seconds -- the saga 'test_action_failure_can_unwind' tests are particularly slow and definitely need optimizing.

I'm not even sure I'd say this timeout is "wrong" - that test needs fixing - but figured I'd mention it regardless.

Hmm, that test isn't part of test_all.

davepacheco · 2024-01-08T23:49:57Z

FWIW there is a timeout already on the whole test run and we went with a few hours:

omicron/.github/buildomat/build-and-test.sh

Line 62 in cde9b15

ptime -m timeout 2h cargo nextest run --profile ci --locked --verbose

Clulow's Lament applies here: it's a tradeoff between false positives (spurious test failures because we didn't wait long enough) and taking too long to get feedback when tests are hung. Personally, if a test hangs when I'm running it locally, I'd always rather it keep hanging (and not be killed) because that preserves a lot of useful runtime state I might want for debugging. In CI I don't think I care either way. But false positives here would certainly suck.

sunshowers · 2024-01-08T23:51:04Z

> In CI I don't think I care either way. But false positives here would certainly suck.

Fair, I'll restrict it to CI.
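
A minimal sketch of what restricting the override to CI could look like, assuming it simply moves from the `default` profile to the `ci` profile named in the `--profile ci` invocation quoted above. The merged config is not shown in this thread, so treat the profile placement as an assumption.

```toml
# .config/nextest.toml (sketch, not the merged diff)
# Assumption: the override is scoped to the `ci` profile so that local runs
# keep hanging tests alive for debugging.
[[profile.ci.overrides]]
filter = 'binary_id(omicron-nexus::test_all)'
# A test is marked slow every 60s and terminated after 15 such periods,
# i.e. 60s x 15 = 900s.
slow-timeout = { period = '60s', terminate-after = 15 }
```

With this placement, a plain `cargo nextest run` uses the default profile and applies no termination timeout to test_all.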

sunshowers · 2024-01-08T23:56:37Z

> FWIW there is a timeout already on the whole test run and we went with a few hours:

Yes -- there's currently a bug in nextest where it isn't printing out stdout/stderr on SIGTERM (nextest-rs/nextest#1208) -- I'll get around to fixing it soon but wanted to get some output going before then.

[𝘀𝗽𝗿] initial version

1bab802

sunshowers requested a review from smklein January 8, 2024 22:33

smklein reviewed Jan 8, 2024

smklein approved these changes Jan 8, 2024

restrict timeouts to CI

bd5ac2b

sunshowers enabled auto-merge (squash) January 9, 2024 00:07

sunshowers added 2 commits January 8, 2024 20:50

Rebase

4ba5a41

Fix, whoops

a690f63

sunshowers mentioned this pull request Jan 9, 2024

Test flake: Failed to start ClickHouse keeper 1: waiting to discover if ClickHouse is ready for connections #4782

Closed

sunshowers disabled auto-merge January 9, 2024 06:59

sunshowers enabled auto-merge (squash) January 9, 2024 06:59

sunshowers mentioned this pull request Jan 9, 2024

Some tests in omicron-nexus::test_all stall out and hang #4779

Open

sunshowers merged commit 9fe8a3c into main Jan 9, 2024
20 checks passed

sunshowers deleted the sunshowers/spr/tests-time-out-tests-in-test_all-after-a-duration branch January 9, 2024 11:06
