testsuite: fix some test races and improve debugging #5609

Merged: 10 commits, Dec 7, 2023

Commits on Dec 7, 2023

  1. testsuite: skip a cmddriver test when testing installed flux(1)

    Problem: One of the tests in t1102-cmddriver.t assumes that the `flux`
    executable will not be found first in /bin or /usr/bin, but this may
    not be the case when FLUX_TEST_INSTALLED_PATH is being used.
    
    Skip the test if flux is found first in /bin or /usr/bin.
    grondo committed Dec 7, 2023
    15a8873
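The skip condition in commit 1 can be illustrated with a short sketch. This is not the code from the commit (the real test is a shell script); the function name `found_first_in_system_bin` and its parameters are hypothetical, and the check only uses the standard `shutil.which` lookup.

```python
import os
import shutil


def found_first_in_system_bin(name, path=None, system_dirs=("/bin", "/usr/bin")):
    """Return True if the first `name` found on the search path lives in a system dir.

    Hypothetical helper for illustration: `path` defaults to $PATH and
    `system_dirs` defaults to the two directories named in the commit message.
    """
    exe = shutil.which(name, path=path)
    if exe is None:
        # Not found at all, so it cannot have been found in a system dir.
        return False
    return os.path.dirname(exe) in system_dirs


if __name__ == "__main__":
    if found_first_in_system_bin("flux"):
        print("skip: flux is found first in /bin or /usr/bin")
    else:
        print("run: flux is not found first in /bin or /usr/bin")
```

A test harness could call such a check once and register the result as a prerequisite, so the affected test is skipped rather than failed.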
  2. testsuite: improve reliability of flux-top queue tests

    Problem: The queue output tests in t2801-top-cmd.t often fail in
    CI.
    
    The issue seems to be related to the checks for 0 matching lines
    at the end of some of the tests. As a guess, sometimes one or more
    lines of output match due to timing issues, and so the test fails.
    
    Just remove these overly punctilious checks.
    grondo committed Dec 7, 2023
    4b4e611
  3. testsuite: avoid inherently racy test in t0015-job-output.py

    Problem: A test in t0015-job-output.py attempts to ensure that a
    FileNotFoundError is returned from a job output watch when the job
    guest namespace has been created, but the output eventlog does
    not yet exist. But this test has a race if the job is released
    before the output watch RPC is received by the job-info module,
    and there is no good way to avoid the race during the test since the
    `output_event_watch` call is synchronous and called from a thread in
    the test.
    
    Just remove the racy test so that occasional failures in CI are
    avoided.
    grondo committed Dec 7, 2023
    e3b439a
  4. testsuite: add some debugging in sporadically failing tests

    Problem: There are a couple of tests that occasionally fail in CI without
    any clues as to the source of the failure.
    
    Add debugging to these tests for future use.
    grondo committed Dec 7, 2023
    ad1d894
  5. testsuite: fix potential races in t2201-job-cmd.t

    Problem: The multiple job kill tests in t2201-job-cmd.t occasionally
    fail in CI.
    
    These tests will be more reliable if the test waits for the shell.init
    event, so add that synchronization to affected tests.
    grondo committed Dec 7, 2023
    5d1d742
  6. testsuite: do not suppress stderr in t2607-job-shell-input.t

    Problem: The job shell input tests in t2607-job-shell-input.t redirect
    stderr to files that are then never used, making debugging the tests
    in CI impossible.
    
    Do not suppress stderr so errors appear in CI logs.
    grondo committed Dec 7, 2023
    177eb0a
  7. testsuite: skip t3307-system-leafcrash.t with --chain-lint

    Problem: t3307-system-leafcrash.t has been seen to hang when used
    with --chain-lint.
    
    The assumption is that this series of tests becomes racy with
    --chain-lint because some tests end up being skipped. Just skip the
    entire test when --chain-lint is used.
    grondo committed Dec 7, 2023
    0d27137
  8. testsuite: improve reliability of t2712-python-cli-alloc.t

    Problem: The test `flux alloc --bg can be interrupted` fails
    sporadically in CI.
    
    This may be due to an inherent race condition handling SIGINT.  Send
    SIGINT a second time in the test to hopefully increase reliability.
    grondo committed Dec 7, 2023
    9d7fc42
  9. testsuite: fix possible race in t0005-exec.t

    Problem: The 'signal forwarding works' test occasionally fails in CI.
    
    A potential issue is that the test_signal.sh test is run twice,
    once with SIGINT and once with SIGTERM, and the sleepready.out
    logfile is not cleaned up between the two runs. If the subsequent
    waitfile invocation starts before the background process overwrites
    the existing sleepready.out, then the test could prematurely send
    SIGTERM before flux-exec is ready.
    
    Remove the output file in the test script to avoid this race.
    grondo committed Dec 7, 2023
    20a25d9
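The race described in commit 9 comes from a stale readiness file, and it can be sketched outside the test suite. The following is only an illustration under stated assumptions: the real test is a shell script, and the names `wait_for_file` and `run_signal_test` plus the two callbacks are hypothetical. The point is that the marker is removed before the background process starts, so the wait can only succeed on a freshly written file.

```python
import os
import time


def wait_for_file(path, timeout=5.0, interval=0.01):
    """Poll until `path` exists; return False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False


def run_signal_test(ready_file, start_background, send_signal):
    """Run one signal test without trusting a readiness file from an earlier run.

    `start_background` and `send_signal` are hypothetical callbacks standing in
    for launching the background command and delivering the signal.
    """
    # Remove any stale marker first; otherwise the wait below could return
    # immediately and the signal would be sent before the process is ready.
    try:
        os.remove(ready_file)
    except FileNotFoundError:
        pass

    start_background()
    if not wait_for_file(ready_file):
        raise TimeoutError(f"{ready_file} was not created in time")
    send_signal()
```

Without the `os.remove` step, a second run that reuses the same file name would pass the wait immediately, which matches the premature SIGTERM scenario in the commit message.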
  10. python: handle missing /proc/PID/environ in LSF uri resolver

    Problem: There's a small race in the LSF uri resolver if a flux-broker
    process for the calling user exits between when it is listed by ps(1)
    and the resolver attempts to open the /proc/PID/environ file. In this
    case, instead of continuing to try other brokers, `flux-uri` aborts with
    
     flux-uri: ERROR: No such file or directory: '/proc/PID/environ'
    
    Catch the FileNotFoundError exception when trying to read /proc/PID/environ
    and treat this the same as if the broker did not match the LSF jobid.
    grondo committed Dec 7, 2023
    146f1ff
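The handling described in commit 10 might look roughly like the sketch below. It is not the resolver's actual code: the function names, the `proc_root` parameter, and the use of the `LSB_JOBID` environment variable to identify the LSF job are assumptions made for illustration. A process that vanished between listing and reading is treated the same as one that does not match.

```python
import os


def read_environ(pid, proc_root="/proc"):
    """Return the environment of `pid` as a dict, or None if the process is gone."""
    path = os.path.join(proc_root, str(pid), "environ")
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        # The process exited after it was listed; the caller should move on.
        return None
    env = {}
    for entry in data.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_broker_pid(pids, jobid, proc_root="/proc"):
    """Return the first pid whose environment matches `jobid`, else None.

    Assumes (hypothetically) that the job is identified by LSB_JOBID.
    """
    for pid in pids:
        env = read_environ(pid, proc_root)
        if env is None:
            continue  # same outcome as a broker that did not match
        if env.get("LSB_JOBID") == str(jobid):
            return pid
    return None
```

The `proc_root` parameter exists only so the lookup can be exercised against a temporary directory instead of the live `/proc`.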