Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_socket: testCmsgTrunc0() deadlock on ARM Raspbian 3.x #110167

Closed
vstinner opened this issue Oct 1, 2023 · 1 comment
Closed

test_socket: testCmsgTrunc0() deadlock on ARM Raspbian 3.x #110167

vstinner opened this issue Oct 1, 2023 · 1 comment
Labels
tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

vstinner commented Oct 1, 2023

test_socket adds a lock on addCleanup:

class ThreadSafeCleanupTestCase:
    """Subclass of unittest.TestCase with thread-safe cleanup methods.

    This subclass protects the addCleanup() and doCleanups() methods
    with a recursive lock.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cleanup_lock = threading.RLock()

    def addCleanup(self, *args, **kwargs):
        with self._cleanup_lock:
            return super().addCleanup(*args, **kwargs)

    def doCleanups(self, *args, **kwargs):
        with self._cleanup_lock:
            return super().doCleanups(*args, **kwargs)

Problem: what happens if a thread calls addCleanup() while the main thread is calling doCleanups()? Well, a deadlock.

ARM Raspbian 3.x:

0:40:31 load avg: 3.05 [337/467/1] test_socket worker non-zero exit code (Exit code 1) -- running (1): test_math (1 min 4 sec)
Timeout (0:40:00)!
Thread 0xf5dea440 (most recent call first):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_socket.py", line 230 in addCleanup
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_socket.py", line 3576 in newFDs
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_socket.py", line 3611 in createAndSendFDs
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_socket.py", line 3841 in _testCmsgTrunc0
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_socket.py", line 402 in clientRun

Thread 0xf7acb040 (most recent call first):
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/threading.py", line 348 in wait
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/threading.py", line 648 in wait
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/case.py", line 597 in _callCleanup
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/case.py", line 673 in doCleanups
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/test_socket.py", line 235 in doCleanups
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/case.py", line 640 in run
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/case.py", line 692 in __call__
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/suite.py", line 122 in run
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/suite.py", line 84 in __call__
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/suite.py", line 122 in run
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/suite.py", line 84 in __call__
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/suite.py", line 122 in run
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/suite.py", line 84 in __call__
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/unittest/runner.py", line 240 in run
  File "/var/lib/buildbot/workers/3.x.gps-raspbian.nondebug/build/Lib/test/support/__init__.py", line 1155 in _run_suite
  ...

build: https://buildbot.python.org/all/#/builders/424/builds/5065

Linked PRs

@vstinner vstinner added the tests Tests in the Lib/test dir label Oct 1, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Oct 5, 2023
Increase support.LOOPBACK_TIMEOUT from 5 to 10 seconds. Also increase
the timeout depending on the --timeout option. For example, for a
test timeout of 40 minutes (ARM Raspbian 3.x), use LOOPBACK_TIMEOUT
of 20 seconds instead of 5 seconds before.
@vstinner
Copy link
Member Author

vstinner commented Oct 5, 2023

Patch to trigger the bug:

diff --git a/Lib/test/test_socket.py b/Lib/test/test_socket.py
index 09605f7e77..d3638bccc2 100644
--- a/Lib/test/test_socket.py
+++ b/Lib/test/test_socket.py
@@ -215,6 +215,8 @@ def setUp(self):
         self.serv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDPLITE)
         self.port = socket_helper.bind_port(self.serv)
 
+CLEANUP_EVENT = threading.Event()
+
 class ThreadSafeCleanupTestCase:
     """Subclass of unittest.TestCase with thread-safe cleanup methods.
 
@@ -232,6 +234,7 @@ def addCleanup(self, *args, **kwargs):
 
     def doCleanups(self, *args, **kwargs):
         with self._cleanup_lock:
+            CLEANUP_EVENT.set()
             return super().doCleanups(*args, **kwargs)
 
 class SocketCANTest(unittest.TestCase):
@@ -2818,7 +2821,7 @@ class SendrecvmsgBase(ThreadSafeCleanupTestCase):
 
     # Time in seconds to wait before considering a test failed, or
     # None for no timeout.  Not all tests actually set a timeout.
-    fail_timeout = support.LOOPBACK_TIMEOUT
+    fail_timeout = 1e-9
 
     def setUp(self):
         self.misc_event = threading.Event()
@@ -3573,6 +3576,7 @@ def newFDs(self, n):
         fds = []
         for i in range(n):
             fd, path = tempfile.mkstemp()
+            CLEANUP_EVENT.wait()
             self.addCleanup(os.unlink, path)
             self.addCleanup(os.close, fd)
             os.write(fd, str(i).encode())

vstinner added a commit to vstinner/cpython that referenced this issue Oct 5, 2023
Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its timeout. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
vstinner added a commit to vstinner/cpython that referenced this issue Oct 5, 2023
Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its timeout. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
vstinner added a commit to vstinner/cpython that referenced this issue Oct 5, 2023
Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its thread. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
vstinner added a commit that referenced this issue Oct 5, 2023
Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its thread. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 5, 2023
…10416)

Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its thread. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
(cherry picked from commit 318f5df)

Co-authored-by: Victor Stinner <vstinner@python.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 5, 2023
…10416)

Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its thread. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
(cherry picked from commit 318f5df)

Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit that referenced this issue Oct 5, 2023
Increase support.LOOPBACK_TIMEOUT from 5 to 10 seconds. Also increase
the timeout depending on the --timeout option. For example, for a
test timeout of 40 minutes (ARM Raspbian 3.x), use LOOPBACK_TIMEOUT
of 20 seconds instead of 5 seconds before.
vstinner added a commit to vstinner/cpython that referenced this issue Oct 5, 2023
…hon#110413)

Increase support.LOOPBACK_TIMEOUT from 5 to 10 seconds. Also increase
the timeout depending on the --timeout option. For example, for a
test timeout of 40 minutes (ARM Raspbian 3.x), use LOOPBACK_TIMEOUT
of 20 seconds instead of 5 seconds before.

(cherry picked from commit 0db2f14)
@vstinner vstinner closed this as completed Oct 5, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Oct 5, 2023
…hon#110413)

Increase support.LOOPBACK_TIMEOUT from 5 to 10 seconds. Also increase
the timeout depending on the --timeout option. For example, for a
test timeout of 40 minutes (ARM Raspbian 3.x), use LOOPBACK_TIMEOUT
of 20 seconds instead of 5 seconds before.

(cherry picked from commit 0db2f14)
vstinner added a commit that referenced this issue Oct 5, 2023
#110424)

gh-110167: Fix test_socket deadlock in doCleanups() (GH-110416)

Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its thread. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
(cherry picked from commit 318f5df)

Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit that referenced this issue Oct 5, 2023
#110423)

gh-110167: Fix test_socket deadlock in doCleanups() (GH-110416)

Fix a deadlock in test_socket when server fails with a timeout but
the client is still running in its thread. Don't hold a lock to call
cleanup functions in doCleanups(). One of the cleanup function waits
until the client completes, whereas the client could deadlock if it
called addCleanup() in such situation.

doCleanups() is called when the server completed, but the client can
still be running in its thread especially if the server failed with a
timeout. Don't put a lock on doCleanups() to prevent deadlock between
addCleanup() called in the client and doCleanups() waiting for
self.done.wait of ThreadableTest._setUp().
(cherry picked from commit 318f5df)

Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit that referenced this issue Oct 5, 2023
…10413) (#110427)

gh-110167: Increase support.LOOPBACK_TIMEOUT to 10 seconds (#110413)

Increase support.LOOPBACK_TIMEOUT from 5 to 10 seconds. Also increase
the timeout depending on the --timeout option. For example, for a
test timeout of 40 minutes (ARM Raspbian 3.x), use LOOPBACK_TIMEOUT
of 20 seconds instead of 5 seconds before.

(cherry picked from commit 0db2f14)
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 5, 2023
…ds (pythonGH-110413) (pythonGH-110427)

pythongh-110167: Increase support.LOOPBACK_TIMEOUT to 10 seconds (pythonGH-110413)

Increase support.LOOPBACK_TIMEOUT from 5 to 10 seconds. Also increase
the timeout depending on the --timeout option. For example, for a
test timeout of 40 minutes (ARM Raspbian 3.x), use LOOPBACK_TIMEOUT
of 20 seconds instead of 5 seconds before.

(cherry picked from commit 350d89b)

Co-authored-by: Victor Stinner <vstinner@python.org>
(cherry picked from commit 0db2f14)
vstinner added a commit that referenced this issue Oct 5, 2023
…nds (GH-110413) (GH-110427) (#110440)

[3.12] gh-110167: Increase support.LOOPBACK_TIMEOUT to 10 seconds (GH-110413) (GH-110427)

gh-110167: Increase support.LOOPBACK_TIMEOUT to 10 seconds (GH-110413)

Increase support.LOOPBACK_TIMEOUT from 5 to 10 seconds. Also increase
the timeout depending on the --timeout option. For example, for a
test timeout of 40 minutes (ARM Raspbian 3.x), use LOOPBACK_TIMEOUT
of 20 seconds instead of 5 seconds before.

(cherry picked from commit 350d89b)

Co-authored-by: Victor Stinner <vstinner@python.org>
(cherry picked from commit 0db2f14)

Co-authored-by: Victor Stinner <vstinner@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir
Projects
None yet
Development

No branches or pull requests

1 participant