testsuite: fix some test races and improve debugging #5609
Commits on Dec 7, 2023
testsuite: skip a cmddriver test when testing installed flux(1)
Problem: One of the tests in t1102-cmddriver.t assumes that the `flux` executable will not be found first in /bin or /usr/bin, but this may not be the case when FLUX_TEST_INSTALLED_PATH is being used. Skip the test if flux is found first in /bin or /usr/bin.
15a8873
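The skip condition comes down to asking where `flux` resolves on the search path. Below is a minimal Python sketch of that check, assuming `shutil.which()` lookup semantics. The real test in t1102-cmddriver.t is a shell test and its actual check may differ. The helper name `flux_found_in_system_dirs` is hypothetical.

```python
import os
import shutil

# Directories that, if they hold the first `flux` on PATH, would break the test.
SYSTEM_DIRS = ("/bin", "/usr/bin")


def flux_found_in_system_dirs(search_path=None):
    """Return True if the first `flux` on the search path lives in /bin or
    /usr/bin. search_path defaults to $PATH, as with shutil.which()."""
    exe = shutil.which("flux", path=search_path)
    if exe is None:
        # No flux at all: nothing shadows the tested executable.
        return False
    return os.path.dirname(os.path.abspath(exe)) in SYSTEM_DIRS
```

A test harness would skip the affected test whenever `flux_found_in_system_dirs()` returns True.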
testsuite: improve reliability of flux-top queue tests
Problem: The queue output tests in t2801-top-cmd.t often fail in CI. The failures seem to be related to the checks for 0 matching lines at the end of some of the tests. Presumably, timing issues occasionally cause one or more lines of output to match, and the test fails. Just remove these overly punctilious checks.
4b4e611
testsuite: avoid inherently racy test in t0015-job-output.py
Problem: A test in t0015-job-output.py attempts to ensure that a FileNotFoundError is returned from a job output watch when the job guest namespace has been created but the output eventlog does not yet exist. However, this test is racy: the job may be released before the job-info module receives the output watch RPC. There is no good way to avoid the race during the test, since the `output_event_watch` call is synchronous and is called from a thread in the test. Just remove the racy test so that occasional failures in CI are avoided.
e3b439a
testsuite: add some debugging in sporadically failing tests
Problem: A couple of tests occasionally fail in CI without any clues as to the source of the failure. Add debugging to these tests for future use.
ad1d894
testsuite: fix potential races in t2201-job-cmd.t
Problem: The multiple job kill tests in t2201-job-cmd.t occasionally fail in CI. These tests will be more reliable if they wait for the shell.init event before killing jobs, so add that synchronization to the affected tests.
5d1d742
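The fix is an ordering guarantee: do not signal a job until its shell has posted `shell.init`. Below is a minimal sketch of that wait-then-act pattern. It assumes a hypothetical `get_events()` callable that returns the job's eventlog entries as dicts with a `name` key. This is not the Flux API, and the real tests do the equivalent in shell.

```python
import time


def wait_for_event(get_events, name, timeout=10.0, interval=0.05):
    """Poll get_events() until an entry named `name` appears.

    Returns True once the event is seen, or False if `timeout` seconds
    elapse first. get_events is a hypothetical callable returning a list
    of eventlog entries as dicts, e.g. {"name": "shell.init"}.
    """
    deadline = time.monotonic() + timeout
    while True:
        if any(entry.get("name") == name for entry in get_events()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def kill_when_ready(get_events, kill, timeout=10.0):
    """Call kill() only after shell.init has been observed."""
    if not wait_for_event(get_events, "shell.init", timeout):
        raise TimeoutError("shell.init not seen; refusing to send signal")
    kill()
```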
testsuite: do not suppress stderr in t2607-job-shell-input.t
Problem: The job shell input tests in t2607-job-shell-input.t redirect stderr to files that are never used, making it impossible to debug the tests in CI. Do not suppress stderr, so errors appear in CI logs.
177eb0a
testsuite: skip t3307-system-leafcrash.t with --chain-lint
Problem: t3307-system-leafcrash.t has been seen to hang when used with --chain-lint. The assumption is that this series of tests becomes racy with --chain-lint because some tests end up being skipped. Just skip the entire test when --chain-lint is used.
0d27137
testsuite: improve reliability of t2712-python-cli-alloc.t
Problem: The test `flux alloc --bg can be interrupted` fails sporadically in CI. This may be due to an inherent race condition in SIGINT handling. Send SIGINT a second time in the test to hopefully increase reliability.
9d7fc42
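Resending SIGINT covers the case where the first signal does not cause the target to exit, for example because it arrives before the target is ready to act on it. Below is a sketch of that retry in Python, assuming a `subprocess.Popen` handle. The helper name `interrupt` and the timeout values are illustrative, not taken from t2712-python-cli-alloc.t.

```python
import signal
import subprocess


def interrupt(proc, attempts=2, wait=5.0):
    """Send SIGINT to proc up to `attempts` times.

    After each signal, wait up to `wait` seconds for the process to exit.
    Returns the exit status, or None if it is still running afterwards.
    """
    for _ in range(attempts):
        if proc.poll() is not None:
            break
        proc.send_signal(signal.SIGINT)
        try:
            return proc.wait(timeout=wait)
        except subprocess.TimeoutExpired:
            # The signal did not cause an exit in time; try again.
            continue
    return proc.poll()
```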
testsuite: fix possible race in t0005-exec.t
Problem: The 'signal forwarding works' test occasionally fails in CI. A potential issue is that test_signal.sh is run twice, once with SIGINT and once with SIGTERM, and the sleepready.out logfile is not cleaned up between the two runs. If the subsequent waitfail invocation starts before the background process overwrites the existing sleepready.out, the test could send SIGTERM prematurely, before flux-exec is ready. Remove the output file in the test script to avoid this race.
20a25d9
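The race comes from a readiness file that survives from the previous run, so the wait can succeed before the new process is actually ready. Below is a sketch of the fix pattern in Python: delete the file before starting the background process, then wait for it to reappear. `start_and_wait_ready` and `wait_for_file` are hypothetical helpers. The actual fix is a shell change in test_signal.sh, and the real waiter may match file contents rather than mere existence.

```python
import os
import time


def wait_for_file(path, timeout=10.0, interval=0.05):
    """Return True once `path` exists, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def start_and_wait_ready(start, ready_file, timeout=10.0):
    """Start a background task and wait for it to create ready_file.

    `start` is a hypothetical callable that launches the task and returns
    a handle to it.
    """
    # Remove any ready file left behind by a previous run; otherwise the
    # wait below can succeed immediately, before the new task is ready.
    try:
        os.remove(ready_file)
    except FileNotFoundError:
        pass
    handle = start()
    if not wait_for_file(ready_file, timeout):
        raise TimeoutError(f"{ready_file} was not created in {timeout}s")
    return handle
```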
python: handle missing /proc/PID/environ in LSF uri resolver
Problem: There's a small race in the LSF uri resolver: a flux-broker process for the calling user may exit between when it is listed by ps(1) and when the resolver attempts to open its /proc/PID/environ file. In this case, instead of continuing to try other brokers, `flux-uri` aborts with `flux-uri: ERROR: No such file or directory: '/proc/PID/environ'`. Catch the FileNotFoundError exception when trying to read /proc/PID/environ and treat this the same as if the broker did not match the LSF jobid.
146f1ff
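Below is a minimal sketch of this fix in Python: treat a vanished /proc/PID/environ as a non-matching broker and keep scanning. Several details are assumptions, not taken from the resolver's source: the function names, the `proc_root` parameter (added so the sketch can be exercised without real brokers), and the use of `LSB_JOBID` as the environment variable being compared.

```python
import os


def read_environ(pid, proc_root="/proc"):
    """Return the environment of process `pid` as a dict, or None if the
    process has exited and its environ file no longer exists."""
    path = os.path.join(proc_root, str(pid), "environ")
    try:
        with open(path, "rb") as fp:
            data = fp.read()
    except FileNotFoundError:
        # The broker exited after ps(1) listed it: treat as a non-match.
        return None
    env = {}
    # Entries are NUL-separated KEY=VALUE pairs.
    for entry in data.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_lsf_broker(pids, jobid, proc_root="/proc"):
    """Return the first pid whose environment matches the LSF jobid, or None.
    Assumes (hypothetically) that the jobid is stored in LSB_JOBID."""
    for pid in pids:
        env = read_environ(pid, proc_root)
        if env is None:
            continue  # keep trying the remaining brokers
        if env.get("LSB_JOBID") == jobid:
            return pid
    return None
```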