Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4 #56572

Closed
vstinner opened this issue Jun 19, 2011 · 13 comments
Closed

Comments

@vstinner
Copy link
Member

BPO 12363
Nosy @pitrou, @vstinner
Files
  • test_siginterrupt.patch
  • test_siginterrupt.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-06-23.09:49:14.064>
    created_at = <Date 2011-06-19.17:21:45.265>
    labels = []
    title = 'test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4'
    updated_at = <Date 2011-07-01.14:00:02.402>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2011-07-01.14:00:02.402>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-06-23.09:49:14.064>
    closer = 'vstinner'
    components = []
    creation = <Date 2011-06-19.17:21:45.265>
    creator = 'vstinner'
    dependencies = []
    files = ['22409', '22421']
    hgrepos = []
    issue_num = 12363
    keywords = ['patch']
    message_count = 13.0
    messages = ['138647', '138652', '138664', '138761', '138762', '138763', '138793', '138803', '138832', '138833', '138861', '139579', '139582']
    nosy_count = 4.0
    nosy_names = ['pitrou', 'vstinner', 'neologix', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue12363'
    versions = []

    @vstinner
    Copy link
    Member Author

    ======================================================================
    FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_signal.py", line 396, in test_siginterrupt_on
        self.assertTrue(i)
    AssertionError: False is not true

    http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1586/steps/test/logs/stdio
    http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1587/steps/test/logs/stdio

    @vstinner
    Copy link
    Member Author

    Seen also on OpenSolaris:

    test test_signal failed -- Traceback (most recent call last):
      File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_signal.py", line 399, in test_siginterrupt_on
        self.assertTrue(i)
    AssertionError: False is not true

    http://www.python.org/dev/buildbot/all/builders/AMD64%20OpenIndiana%203.x/builds/1443/steps/test/logs/stdio

    @vstinner
    Copy link
    Member Author

    I tried to patch the test to use a semaphore, but my patch was not reliable (don't remove completly the race condition).

    Here is a patch using a subprocess to:

    • have only one thread
    • have a timeout on the blocking read (select cannot be used in the test, select always fail with EINTR, the kernel doesn't restart it)
    • not touch signal handling of the parent process

    It is also based on time: it uses alarm() to raise a signal in one second, and use an hardcoded timeout of 3 seconds. But it doesn't need tricky synchronization between two processes.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Jun 20, 2011

    The patch looks good to me.
    In the subprocess, why not use the standard 0 exit code in case of success?
    Also, points 1 and 3 could be handled simply by having the parent
    process send a signal to the child (but this wouldn't address the
    timeout issue).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 20, 2011

    New changeset 968b9ff9a059 by Victor Stinner in branch 'default':
    Close bpo-12363: fix a race condition in siginterrupt() tests
    http://hg.python.org/cpython/rev/968b9ff9a059

    @python-dev python-dev mannequin closed this as completed Jun 20, 2011
    @vstinner
    Copy link
    Member Author

    In the subprocess, why not use the standard 0 exit code
    in case of success?

    Something outside my code may exit Python with the code 0. Even if it unlikely, I prefer to use uncommon exit codes, to ensure that the child process executed "correctly" my code. A better check would be to write a specific pattern to stdout, and check stdout, but it would be overkill.

    Also, points 1 and 3 could be handled simply by having
    the parent process send a signal to the child
    (but this wouldn't address the timeout issue).

    (Hum, points 1 and 3: "have only one thread" and "not touch signal handling of the parent process".)

    True, but I would like to write a more reliable test, and I don't know how to synchronize two processes for this test case.

    Because your first sentence was "The patch looks good to me.", let's try this new test in our buildbots.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Jun 21, 2011

    The test still fails on FreeBSD 7.2, Tiger and Debian parallel:

    """
    ======================================================================
    FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py",
    line 379, in test_siginterrupt_on
        self.assertTrue(interrupted)
    AssertionError: False is not true

    ======================================================================
    FAIL: test_without_siginterrupt (test.test_signal.SiginterruptTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py",
    line 372, in test_without_siginterrupt
        self.assertTrue(interrupted)
    AssertionError: False is not true
    """

    Here's a sample strace output on my Linux box when it fails:

    2120 21:00:18.755448 rt_sigaction(SIGALRM, {0x810b332, [], 0}, NULL,
    8) = 0 <0.000019>
    2120 21:00:18.755690 alarm(1) = 0 <0.000024>
    2120 21:00:18.755803 read(3, <unfinished ...>
    [...]
    2120 21:00:19.755811 <... read resumed> 0xb72cad90, 1) = ?
    ERESTARTSYS (To be restarted) <0.999973>
    2120 21:00:19.755864 --- SIGALRM (Alarm clock) @ 0 (0) ---
    2120 21:00:19.756094 sigreturn() = ? (mask now []) <0.000019>
    2120 21:00:19.756434 alarm(1) = 0 <0.000023>
    2120 21:00:19.756547 read(3, <unfinished ...>

    sigaction is called without SA_RESTART, but the syscall is restarted anyway...
    Apart from removing those tests, I don't see what we can do here.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Jun 21, 2011

    Duh, don't know what I was thinking: the syscall is not restarted
    (even though ERESTARTSYS is displayed by strace): the real problem is
    that the 3s timeout to communicate is not enough, because spawning a
    new interpreter can take a long time (Antoine created an issue some
    time ago about the high number of syscalls per import). If it takes
    more than 1s, the test will fail.
    Also, I guess it's worst on FreeBSD because subprocess.Popen uses
    close_fds=True by default, and FreeBSD has a huge default
    _SC_OPEN_MAX.
    I've attached a patch passing close_fds=False to spawn_python, and
    running the test only once, because otherwise this would require a
    timeout quite large.

    @neologix neologix mannequin reopened this Jun 21, 2011
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jun 22, 2011

    New changeset aff0a7b0cb12 by Victor Stinner in branch 'default':
    Issue bpo-12363: improve siginterrupt() tests
    http://hg.python.org/cpython/rev/aff0a7b0cb12

    @vstinner
    Copy link
    Member Author

    Apart from removing those tests, I don't see what we can do here.

    The previous version of the test rarely failed (only sometimes on the FreeBSD 6.4 buildbox). We may revert my commits to restore the previous test if the new tests fail more often.

    the real problem is that the 3s timeout to communicate is not
    enough, because spawning a new interpreter can take a long time

    Right, I added a basic synchronization code to wait until the beginning of the test. Let see if it is better or worse ;-)

    @vstinner
    Copy link
    Member Author

    All Python 3.x buildbots are green (except FreeBSD 7.2, but it's not related to this issue). Let close this issue.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 1, 2011

    New changeset 8250f04d5a41 by Victor Stinner in branch '3.2':
    Issue bpo-12363: improve siginterrupt() tests
    http://hg.python.org/cpython/rev/8250f04d5a41

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 1, 2011

    New changeset 3f30cfe51315 by Victor Stinner in branch '3.2':
    Issue bpo-12363: increase the timeout of siginterrupt() tests
    http://hg.python.org/cpython/rev/3f30cfe51315

    New changeset 423268537083 by Victor Stinner in branch 'default':
    (merge 3.2) Issue bpo-12363: increase the timeout of siginterrupt() tests
    http://hg.python.org/cpython/rev/423268537083

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant