[2.7] test_io: race condition in test_interrupted_write_text() (test_io hangs on x86 Gentoo Refleaks 2.7) #75912
Comments
test_io has been running for more than 5 hours on x86 Gentoo Refleaks 2.7:
http://buildbot.python.org/all/#/builders/78/builds/1/steps/4/logs/stdio
2:12:11 load avg: 3.26 [402/403] test_contains passed -- running: test_io (1739 sec) |
I interrupted the build. |
test_io was blocked a second time: http://buildbot.python.org/all/#/builders/78/builds/3 running: test_io (68517 sec) |
I haven't seen the failure recently, so I am closing the issue. |
test_io is currently hung on that builder:
buildbot 13541 0.0 0.1 4920 2508 pts/1 Ss+ May28 0:00 \_ make buildbottest TESTOPTS=-j2 -R 3:3 -u-cpu TESTPYTHONOPTS= TESTTIMEOUT=11700 |
I managed to attach gdb to a regrtest slave process stuck in test_io, but attaching gdb somehow unblocked the process, and the test completed. Sadly, the first time I attached to the process, I forgot to allow the directory needed to load python-gdb.py, so I had to detach and attach again to get a working "py-bt" command. Maybe this action unblocked Python; it's hard to say.
(gdb) py-bt
Traceback (most recent call first):
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1126, in _flush_unlocked
def _flush_unlocked(self):
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1104, in write
self._flush_unlocked()
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/_pyio.py", line 1302, in write
return BufferedWriter.write(self, b)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 1186, in check_writes
self.assertEqual(bufio.write(contents[n:n+size]), size)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 1723, in test_writes_and_peek
self.check_writes(_peek)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 329, in run
testMethod()
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/case.py", line 393, in __call__
return self.run(*args, **kwds)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 108, in run
test(result)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 70, in __call__
return self.run(*args, **kwds)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 108, in run
test(result)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/unittest/suite.py", line 70, in __call__
return self.run(*args, **kwds)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1461, in run
test(result)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1535, in _run_suite
result = runner.run(suite)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/support/__init__.py", line 1626, in run_unittest
_run_suite(suite)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/test_io.py", line 3367, in test_main
support.run_unittest(*tests)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1361, in run_the_test
indirect_test()
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1375, in dash_R
run_the_test()
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1239, in runtest_inner
huntrleaks)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 1044, in runtest
return runtest_inner(test, verbose, quiet, huntrleaks, pgo, testdir)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 513, in main
result = runtest(*args, **kwargs)
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 2025, in main_in_temp_cwd
main()
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/test/regrtest.py", line 2038, in <module>
main_in_temp_cwd()
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 72, in _run_code
exec code in run_globals
File "/buildbot/buildarea/2.7.ware-gentoo-x86.refleak/build/Lib/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name) (gdb) where |
Hmm, I managed to reproduce a hang in test_io using the command:
buildbot@stormageddon ~/python27 $ ./python -m test -v -m 'test_interrupted*' -F test_io
----------------------------------------------------------------------
OK
Traces in gdb of the stuck process:
***
(gdb) py-bt
Traceback (most recent call first):
Waiting for the GIL
<built-in method acquire of thread.lock object at remote 0xb6c09258>
File "/buildbot/python27/Lib/threading.py", line 340, in wait
waiter.acquire()
File "/buildbot/python27/Lib/threading.py", line 940, in join
self.__block.wait()
File "/buildbot/python27/Lib/test/test_io.py", line 3161, in check_interrupted_write
t.join()
File "/buildbot/python27/Lib/test/test_io.py", line 3186, in test_interrupted_write_text
self.check_interrupted_write("xy", b"xy", mode="w", encoding="ascii")
File "/buildbot/python27/Lib/unittest/case.py", line 329, in run
testMethod()
File "/buildbot/python27/Lib/unittest/case.py", line 393, in __call__
return self.run(*args, **kwds)
File "/buildbot/python27/Lib/unittest/suite.py", line 108, in run
test(result)
File "/buildbot/python27/Lib/unittest/suite.py", line 70, in __call__
return self.run(*args, **kwds)
File "/buildbot/python27/Lib/unittest/suite.py", line 108, in run
test(result)
File "/buildbot/python27/Lib/unittest/suite.py", line 70, in __call__
return self.run(*args, **kwds)
File "/buildbot/python27/Lib/unittest/runner.py", line 151, in run
test(result)
File "/buildbot/python27/Lib/test/support/__init__.py", line 1535, in _run_suite
result = runner.run(suite)
File "/buildbot/python27/Lib/test/support/__init__.py", line 1626, in run_unittest
_run_suite(suite)
File "/buildbot/python27/Lib/test/test_io.py", line 3367, in test_main
support.run_unittest(*tests)
File "/buildbot/python27/Lib/test/regrtest.py", line 1236, in runtest_inner
indirect_test()
File "/buildbot/python27/Lib/test/regrtest.py", line 1044, in runtest
return runtest_inner(test, verbose, quiet, huntrleaks, pgo, testdir)
File "/buildbot/python27/Lib/test/regrtest.py", line 837, in local_runtest
testdir=testdir)
File "/buildbot/python27/Lib/test/regrtest.py", line 851, in main
result = local_runtest()
File "/buildbot/python27/Lib/test/regrtest.py", line 2025, in main_in_temp_cwd
main()
File "/buildbot/python27/Lib/test/__main__.py", line 3, in <module>
regrtest.main_in_temp_cwd()
File "/buildbot/python27/Lib/runpy.py", line 72, in _run_code
exec code in run_globals
File "/buildbot/python27/Lib/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name) (gdb) info threads
(gdb) t 2
(gdb) py-bt
Traceback (most recent call first):
<built-in function read>
File "/buildbot/python27/Lib/test/test_io.py", line 3141, in _read
s = os.read(r, 1)
File "/buildbot/python27/Lib/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/buildbot/python27/Lib/threading.py", line 801, in __bootstrap_inner
self.run()
File "/buildbot/python27/Lib/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
***
And now I recall how much I hate this test. It drove me crazy in Python 2 for a long time. This specific test was my motivation for adding signal.pthread_sigmask() to Python 3.3. Note: I also implemented PEP 475 in Python 3.5 to make Python more reliable when receiving signals. |
Sadly, issues with test_io.test_interrupted*() are old; see for example bpo-23680. Two years ago, Martin Panter saw test_io.test_interrupted_write_text() hang on Python 3.6 and on AMD64 FreeBSD 9.x 3.5. Martin proposed a fix using signal.pthread_kill() rather than scheduling a SIGALRM signal to fire 1 second later, but the fix was not merged, and I closed the issue since I hadn't seen the problem recently. |
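To illustrate the idea of aiming a signal at one specific thread with signal.pthread_kill(), here is a minimal sketch. It is not Martin's patch and not the code in test_io: the names Interrupted, _raise_interrupted and read_interrupted_by_pthread_kill, as well as the 0.2 second delay, are made up for this example. It assumes Python 3.3 or newer on a POSIX system and must be called from the main thread, since signal.signal() only works there.

```python
import os
import signal
import threading
import time


class Interrupted(Exception):
    """Hypothetical exception raised by the signal handler."""


def _raise_interrupted(signum, frame):
    # Python-level handlers always run in the main thread.
    raise Interrupted


def read_interrupted_by_pthread_kill(delay=0.2):
    """Block in os.read() and get interrupted by a signal aimed at this thread.

    Returns True if the blocking read was aborted by the signal handler.
    """
    main_ident = threading.get_ident()  # assumption: we are the main thread
    old_handler = signal.signal(signal.SIGALRM, _raise_interrupted)
    r, w = os.pipe()

    def _kick():
        # Give the main thread time to enter os.read(), then send SIGALRM
        # to that thread specifically instead of to the whole process.
        time.sleep(delay)
        signal.pthread_kill(main_ident, signal.SIGALRM)

    t = threading.Thread(target=_kick)
    interrupted = False
    try:
        t.start()
        os.read(r, 1)  # blocks: nothing is ever written to the pipe
    except Interrupted:
        interrupted = True
    finally:
        t.join()
        signal.signal(signal.SIGALRM, old_handler)
        os.close(r)
        os.close(w)
    return interrupted
```

With signal.alarm(), the kernel may pick any thread that does not block SIGALRM; with signal.pthread_kill(), the thread sitting in the blocking system call is the one that gets woken up.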
I'm unable to reproduce this issue on my Fedora 28 laptop (Linux kernel 4.16.11-300.fc28.x86_64). In the previous message, I reproduced the issue directly on Zach's Gentoo buildbot builder. |
test_io often hangs on the Gentoo Refleaks 2.7 buildbot: Zachary Ware, who owns the builder, has to interrupt the test regularly. So I disabled test_io in regrtest when --huntrleaks is used, since at least one bug has been identified in the test_interrupted*() tests of test_io. |
AMD64 FreeBSD 10.x Shared 3.7 issue:
http://buildbot.python.org/all/#/builders/124/builds/380
0:15:14 load avg: 0.18 [415/415/1] test_io crashed (Exit code 1)
Thread 0x0000000802006400 (most recent call first): |
I wrote PR 11225 which may fix it. |
The test could be fixed in Python 2.7 by exposing pthread_sigmask() somehow, or at least pthread_sigmask(SIG_BLOCK, [SIGALRM]), but honestly, I don't think it's worth it. The test only hangs very rarely, and the bug has already been fixed in Python 3. I am closing the issue. I fixed a race condition in test_io of Python 3. (The bot will shortly backport the fix to 3.7.) |
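For readers unfamiliar with the pthread_sigmask(SIG_BLOCK, [SIGALRM]) approach, here is a minimal Python 3 sketch of the general technique: a new thread inherits the signal mask of the thread that creates it, so blocking SIGALRM around Thread.start() yields a helper thread that can never be chosen to receive the alarm. The helper name start_thread_with_sigalrm_blocked is hypothetical, and this is only an illustration of the pattern, not a copy of the change made to test_io. It assumes a POSIX system where signal.pthread_sigmask() is available.

```python
import signal
import threading


def start_thread_with_sigalrm_blocked(target):
    """Start a daemon thread that has SIGALRM blocked (hypothetical helper).

    The calling thread's signal mask is restored before returning, so a
    later signal.alarm() can only be delivered to a thread that still
    accepts SIGALRM, such as the caller.
    """
    # Block SIGALRM in the caller; the previous mask is returned.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGALRM])
    try:
        t = threading.Thread(target=target)
        t.daemon = True
        t.start()  # the new thread inherits the mask with SIGALRM blocked
    finally:
        # Restore the caller's original mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return t
```

Calling signal.pthread_sigmask(signal.SIG_BLOCK, []) inside the target returns that thread's current mask without changing it, which is a convenient way to check that SIGALRM is indeed blocked there.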