testsuite: improve reliability of some sharness tests #2058
I got tired of some common test failures, so I worked through some of the more unreliable tests. My test case was to run
I think I also finally worked around a race in the
Problem: there is an inherent race in the cron "min-interval" test, which verifies that a cron "event" entry will not run more than once per specified minimum interval. The test cannot guarantee that its check runs before the interval has expired, so on busy or slow systems the test may fail because stats.count is >1 instead of 1. Since there is no way to guarantee that our test command completes within the specified minimum interval, drop this style of check entirely, and instead verify the cron job was delayed by looking for a log message from the cron service.
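A minimal sketch of the race-free approach: rather than asserting an exact run count (which races with wall-clock time on a loaded machine), assert that the service logged that a run was deferred. The log contents, file handling, and message text below are illustrative stand-ins, not flux's actual output or the testsuite's real code.

```shell
#!/bin/sh
# Simulate a cron service log where a second trigger inside the
# min-interval window was deferred rather than executed:
logfile=$(mktemp)
cat >"$logfile" <<'EOF'
cron-1: task 1: executed command
cron-1: task 1: deferred run: min interval not yet elapsed
EOF
# Race-free check: look for the deferral message instead of counting runs.
if grep -q "deferred run" "$logfile"; then
    result="cron min-interval honored"
else
    result="cron ran too often"
fi
echo "$result"
rm -f "$logfile"
```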
For reasons shrouded in mystery, the background flux-kvs command in the job_wait_event() helper function was killed with signal 11 (SIGSEGV) instead of 2 (SIGINT). Fix the argument to kill(1) to avoid dropping core files each time this test is run.
On slow or heavily loaded machines, some of the flux-aggregate commands may take more than the 2s allowed by run_timeout. Since the timeout won't trigger in the normal (non-failing) case, there is no harm in increasing it to 5s.
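To illustrate why raising the limit is harmless, here is a minimal stand-in for a run_timeout-style helper, assuming it simply wraps a command with a deadline via timeout(1) (the testsuite's real helper may differ). A command that finishes quickly pays nothing for the larger limit; only a genuinely hung command waits the full 5s before failing.

```shell
#!/bin/sh
# Hypothetical run_timeout: kill the command if it exceeds the deadline.
run_timeout() {
    t=$1; shift
    timeout "$t" "$@"
}
# With the more generous 5s limit, a quick command still returns at once:
out=$(run_timeout 5 echo "aggregate ok")
echo "$out"
```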
The wreck rc "personality" scripts were missed during the big wreck cleanout of '19. Remove these last remnants.
The t0001-basic.t 'test_under_flux' test is unreliable on slow or loaded machines. The 5s timeout may not be quite long enough for the flux instance spawned by test_under_flux to start up and terminate. Double the timeout to 10s in hopes the test becomes more reliable.
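The general pattern behind these timeout bumps can be sketched as polling for a readiness condition under a deadline, so a fast machine returns immediately and a slow one merely uses more of the budget. The file-based "ready" signal below is illustrative; it is not how test_under_flux actually detects instance startup.

```shell
#!/bin/sh
# Poll for a file every 0.1s, giving up after max_tries attempts.
wait_for_file() {
    file=$1; max_tries=$2
    i=0
    while [ ! -e "$file" ]; do
        i=$((i + 1))
        [ "$i" -ge "$max_tries" ] && return 1
        sleep 0.1
    done
}
f=$(mktemp -u)
# Simulate slow instance startup: the "ready" file appears after 0.3s.
( sleep 0.3; : >"$f" ) &
# A ~10s budget (100 tries) absorbs the delay; a fast start exits early.
if wait_for_file "$f" 100; then
    result="instance ready"
else
    result="timed out"
fi
echo "$result"
wait
rm -f "$f"
```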
For heavily loaded or slow machines, bootstrap and shutdown of the flux instances used in the t2010-kvs-snapshot-restore test may take longer than even 3s. Since the instance shuts down normally under ordinary circumstances, there is no downside to increasing the grace timeout further, so raise it to 15s to handle pathological cases.
```diff
@@            Coverage Diff             @@
##           master    #2058      +/-   ##
==========================================
- Coverage   80.43%   80.42%   -0.01%
==========================================
  Files         191      191
  Lines       30252    30252
==========================================
- Hits        24333    24331       -2
- Misses       5919     5921       +2
```