test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4 #56572

vstinner · 2011-06-19T17:21:45Z

BPO	12363
Nosy	@pitrou, @vstinner
Files	test_siginterrupt.patch test_siginterrupt.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2011-06-23.09:49:14.064>
created_at = <Date 2011-06-19.17:21:45.265>
labels = []
title = 'test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4'
updated_at = <Date 2011-07-01.14:00:02.402>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2011-07-01.14:00:02.402>
actor = 'python-dev'
assignee = 'none'
closed = True
closed_date = <Date 2011-06-23.09:49:14.064>
closer = 'vstinner'
components = []
creation = <Date 2011-06-19.17:21:45.265>
creator = 'vstinner'
dependencies = []
files = ['22409', '22421']
hgrepos = []
issue_num = 12363
keywords = ['patch']
message_count = 13.0
messages = ['138647', '138652', '138664', '138761', '138762', '138763', '138793', '138803', '138832', '138833', '138861', '139579', '139582']
nosy_count = 4.0
nosy_names = ['pitrou', 'vstinner', 'neologix', 'python-dev']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue12363'
versions = []

vstinner · 2011-06-19T17:21:45Z

======================================================================
FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_signal.py", line 396, in test_siginterrupt_on
    self.assertTrue(i)
AssertionError: False is not true

http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1586/steps/test/logs/stdio
http://www.python.org/dev/buildbot/all/builders/x86%20FreeBSD%206.4%203.x/builds/1587/steps/test/logs/stdio

vstinner · 2011-06-19T17:58:00Z

Seen also on OpenSolaris:

test test_signal failed -- Traceback (most recent call last):
  File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_signal.py", line 399, in test_siginterrupt_on
    self.assertTrue(i)
AssertionError: False is not true

http://www.python.org/dev/buildbot/all/builders/AMD64%20OpenIndiana%203.x/builds/1443/steps/test/logs/stdio

vstinner · 2011-06-19T21:08:50Z

I tried to patch the test to use a semaphore, but my patch was not reliable (don't remove completly the race condition).

Here is a patch using a subprocess to:

have only one thread
have a timeout on the blocking read (select cannot be used in the test, select always fail with EINTR, the kernel doesn't restart it)
not touch signal handling of the parent process

It is also based on time: it uses alarm() to raise a signal in one second, and use an hardcoded timeout of 3 seconds. But it doesn't need tricky synchronization between two processes.

neologix · 2011-06-20T20:51:45Z

The patch looks good to me.
In the subprocess, why not use the standard 0 exit code in case of success?
Also, points 1 and 3 could be handled simply by having the parent
process send a signal to the child (but this wouldn't address the
timeout issue).

python-dev · 2011-06-20T21:28:06Z

New changeset 968b9ff9a059 by Victor Stinner in branch 'default':
Close bpo-12363: fix a race condition in siginterrupt() tests
http://hg.python.org/cpython/rev/968b9ff9a059

vstinner · 2011-06-20T21:28:17Z

In the subprocess, why not use the standard 0 exit code
in case of success?

Something outside my code may exit Python with the code 0. Even if it unlikely, I prefer to use uncommon exit codes, to ensure that the child process executed "correctly" my code. A better check would be to write a specific pattern to stdout, and check stdout, but it would be overkill.

Also, points 1 and 3 could be handled simply by having
the parent process send a signal to the child
(but this wouldn't address the timeout issue).

(Hum, points 1 and 3: "have only one thread" and "not touch signal handling of the parent process".)

True, but I would like to write a more reliable test, and I don't know how to synchronize two processes for this test case.

Because your first sentence was "The patch looks good to me.", let's try this new test in our buildbots.

neologix · 2011-06-21T19:09:54Z

The test still fails on FreeBSD 7.2, Tiger and Debian parallel:

"""
======================================================================
FAIL: test_siginterrupt_on (test.test_signal.SiginterruptTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py",
line 379, in test_siginterrupt_on
    self.assertTrue(interrupted)
AssertionError: False is not true

======================================================================
FAIL: test_without_siginterrupt (test.test_signal.SiginterruptTest)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd7/build/Lib/test/test_signal.py",
line 372, in test_without_siginterrupt
    self.assertTrue(interrupted)
AssertionError: False is not true
"""

Here's a sample strace output on my Linux box when it fails:

2120 21:00:18.755448 rt_sigaction(SIGALRM, {0x810b332, [], 0}, NULL,
8) = 0 <0.000019>
2120 21:00:18.755690 alarm(1) = 0 <0.000024>
2120 21:00:18.755803 read(3, <unfinished ...>
[...]
2120 21:00:19.755811 <... read resumed> 0xb72cad90, 1) = ?
ERESTARTSYS (To be restarted) <0.999973>
2120 21:00:19.755864 --- SIGALRM (Alarm clock) @ 0 (0) ---
2120 21:00:19.756094 sigreturn() = ? (mask now []) <0.000019>
2120 21:00:19.756434 alarm(1) = 0 <0.000023>
2120 21:00:19.756547 read(3, <unfinished ...>

sigaction is called without SA_RESTART, but the syscall is restarted anyway...
Apart from removing those tests, I don't see what we can do here.

neologix · 2011-06-21T21:05:57Z

Duh, don't know what I was thinking: the syscall is not restarted
(even though ERESTARTSYS is displayed by strace): the real problem is
that the 3s timeout to communicate is not enough, because spawning a
new interpreter can take a long time (Antoine created an issue some
time ago about the high number of syscalls per import). If it takes
more than 1s, the test will fail.
Also, I guess it's worst on FreeBSD because subprocess.Popen uses
close_fds=True by default, and FreeBSD has a huge default
_SC_OPEN_MAX.
I've attached a patch passing close_fds=False to spawn_python, and
running the test only once, because otherwise this would require a
timeout quite large.

python-dev · 2011-06-22T20:16:36Z

New changeset aff0a7b0cb12 by Victor Stinner in branch 'default':
Issue bpo-12363: improve siginterrupt() tests
http://hg.python.org/cpython/rev/aff0a7b0cb12

vstinner · 2011-06-22T20:20:43Z

Apart from removing those tests, I don't see what we can do here.

The previous version of the test rarely failed (only sometimes on the FreeBSD 6.4 buildbox). We may revert my commits to restore the previous test if the new tests fail more often.

the real problem is that the 3s timeout to communicate is not
enough, because spawning a new interpreter can take a long time

Right, I added a basic synchronization code to wait until the beginning of the test. Let see if it is better or worse ;-)

vstinner · 2011-06-23T09:49:14Z

All Python 3.x buildbots are green (except FreeBSD 7.2, but it's not related to this issue). Let close this issue.

python-dev · 2011-07-01T13:26:04Z

New changeset 8250f04d5a41 by Victor Stinner in branch '3.2':
Issue bpo-12363: improve siginterrupt() tests
http://hg.python.org/cpython/rev/8250f04d5a41

python-dev · 2011-07-01T14:00:02Z

New changeset 3f30cfe51315 by Victor Stinner in branch '3.2':
Issue bpo-12363: increase the timeout of siginterrupt() tests
http://hg.python.org/cpython/rev/3f30cfe51315

New changeset 423268537083 by Victor Stinner in branch 'default':
(merge 3.2) Issue bpo-12363: increase the timeout of siginterrupt() tests
http://hg.python.org/cpython/rev/423268537083

python-dev mannequin closed this as completed Jun 20, 2011

neologix mannequin reopened this Jun 21, 2011

vstinner closed this as completed Jun 23, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4 #56572

test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4 #56572

vstinner commented Jun 19, 2011

vstinner commented Jun 19, 2011

vstinner commented Jun 19, 2011

vstinner commented Jun 19, 2011

neologix mannequin commented Jun 20, 2011

python-dev mannequin commented Jun 20, 2011

vstinner commented Jun 20, 2011

neologix mannequin commented Jun 21, 2011

neologix mannequin commented Jun 21, 2011

python-dev mannequin commented Jun 22, 2011

vstinner commented Jun 22, 2011

vstinner commented Jun 23, 2011

python-dev mannequin commented Jul 1, 2011

python-dev mannequin commented Jul 1, 2011

test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4 #56572

test_signal.test_without_siginterrupt() sporadic failures on FreeBSD 6.4 #56572

Comments

vstinner commented Jun 19, 2011

vstinner commented Jun 19, 2011

vstinner commented Jun 19, 2011

vstinner commented Jun 19, 2011

neologix mannequin commented Jun 20, 2011

python-dev mannequin commented Jun 20, 2011

vstinner commented Jun 20, 2011

neologix mannequin commented Jun 21, 2011

neologix mannequin commented Jun 21, 2011

python-dev mannequin commented Jun 22, 2011

vstinner commented Jun 22, 2011

vstinner commented Jun 23, 2011

python-dev mannequin commented Jul 1, 2011

python-dev mannequin commented Jul 1, 2011