Deadlock in test_concurrent_futures #56573

Closed
vstinner opened this issue Jun 19, 2011 · 9 comments
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

BPO 12364
Nosy @pitrou, @vstinner
Files
  • issue.patch: patch
  • itest.py: test program
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-01-08.09:35:49.529>
    created_at = <Date 2011-06-19.17:25:15.276>
    labels = ['type-bug', 'tests']
    title = 'Deadlock in test_concurrent_futures'
    updated_at = <Date 2012-01-08.09:35:49.527>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2012-01-08.09:35:49.527>
    actor = 'rosslagerwall'
    assignee = 'rosslagerwall'
    closed = True
    closed_date = <Date 2012-01-08.09:35:49.529>
    closer = 'rosslagerwall'
    components = ['Tests']
    creation = <Date 2011-06-19.17:25:15.276>
    creator = 'vstinner'
    dependencies = []
    files = ['24128', '24129']
    hgrepos = []
    issue_num = 12364
    keywords = ['patch']
    message_count = 9.0
    messages = ['138648', '138765', '138766', '138767', '138768', '150497', '150498', '150851', '150853']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'neologix', 'rosslagerwall', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12364'
    versions = ['Python 3.3']

    @vstinner
    Member Author

    [271/356/1] test_concurrent_futures
    Traceback (most recent call last):
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 268, in _feed
        send(obj)
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 229, in send
        self._send_bytes(memoryview(buf))
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 423, in _send_bytes
        self._send(struct.pack("=i", len(buf)))
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/connection.py", line 392, in _send
        n = write(self._handle, buf)
    OSError: [Errno 32] Broken pipe
    Timeout (1:00:00)!
    Thread 0x00000954:
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/queues.py", line 252 in _feed
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner
      File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap

    Thread 0x00000953:
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 146 in poll
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/forking.py", line 166 in wait
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/multiprocessing/process.py", line 150 in join
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 208 in shutdown_worker
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 264 in _queue_management_worker
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 690 in run
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 737 in _bootstrap_inner
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 710 in _bootstrap

    Thread 0x00000001:
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 237 in wait
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/threading.py", line 851 in join
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/concurrent/futures/process.py", line 395 in shutdown
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 67 in tearDown
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 407 in _executeTestPart
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 463 in run
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/case.py", line 514 in __call__
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 105 in run
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/unittest/suite.py", line 67 in __call__
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1166 in run
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1254 in _run_suite
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/support.py", line 1280 in run_unittest
    File "/home2/buildbot/slave/3.x.loewis-sun/build/Lib/test/test_concurrent_futures.py", line 628 in test_main
    File "./Lib/test/regrtest.py", line 1043 in runtest_inner
    File "./Lib/test/regrtest.py", line 841 in runtest
    File "./Lib/test/regrtest.py", line 668 in main
    File "./Lib/test/regrtest.py", line 1618 in <module>
    *** Error code 1
    make: Fatal error: Command failed for target `buildbottest'
    program finished with exit code 1

    See commit e6e7e42efdc2 of the issue bpo-12310.

    @vstinner vstinner added the tests Tests in the Lib/test dir label Jun 19, 2011
    @vstinner
    Member Author

    Message from a Stack Overflow thread:

    "I have suffered from the same problem, even if connecting on localhost in python 2.7.1. After a day of debugging i found the cause and a workaround:

    Cause: BaseProxy class has thread local storage which caches the connection, which is reused for future connections causing "broken pipe" errors even on creating a new Manager

    Workaround: Delete the cached connection before reconnecting

    if address in BaseProxy._address_to_local:
        del BaseProxy._address_to_local[address][0].connection"

    http://stackoverflow.com/questions/3649458/broken-pipe-when-using-python-multiprocessing-managers-basemanager-syncmanager/5884967#5884967
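The caching behaviour described in that quote can be sketched generically. This is an illustrative assumption, not the code of `multiprocessing.managers.BaseProxy`: the class `ConnectionCache`, its `get()`/`invalidate()` methods and the `connect` factory are hypothetical names. Each thread keeps its own dictionary of connections keyed by address, so a connection opened before a server restart keeps being reused; `invalidate()` plays the role of the quoted `del ... .connection` workaround.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class ConnectionCache:
    """Per-thread cache of connections keyed by address (hypothetical sketch)."""

    def __init__(self, connect: Callable[[Hashable], Any]) -> None:
        self._connect = connect          # factory that opens a new connection
        self._local = threading.local()  # each thread sees its own attributes

    def _by_address(self) -> Dict[Hashable, Any]:
        # Lazily create this thread's dict the first time the thread uses the cache.
        cache = getattr(self._local, "by_address", None)
        if cache is None:
            cache = self._local.by_address = {}
        return cache

    def get(self, address: Hashable) -> Any:
        # Reuse the cached connection if there is one; it may be stale if the
        # server behind `address` was restarted, which is the reported problem.
        cache = self._by_address()
        if address not in cache:
            cache[address] = self._connect(address)
        return cache[address]

    def invalidate(self, address: Hashable) -> None:
        # The workaround: forget the cached connection so the next get() reconnects.
        self._by_address().pop(address, None)


if __name__ == "__main__":
    opened = []

    def fake_connect(address):
        conn = object()  # stand-in for a real connection object
        opened.append((address, conn))
        return conn

    cache = ConnectionCache(fake_connect)
    first = cache.get(("localhost", 50000))
    assert cache.get(("localhost", 50000)) is first      # cached, no reconnect
    cache.invalidate(("localhost", 50000))
    assert cache.get(("localhost", 50000)) is not first  # fresh connection
    print(len(opened))  # → 2
```

Because the storage is thread-local, invalidating an entry only affects the calling thread; other threads keep their own cached connections until they invalidate them too.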

    ---

    See also, possibly related, the (closed) issue bpo-11663: multiprocessing doesn't detect killed processes

    @vstinner
    Member Author

    Connection._send_bytes() has a comment about broken pipes:

        def _send_bytes(self, buf):
            # For wire compatibility with 3.2 and lower
            n = len(buf)
            self._send(struct.pack("=i", len(buf)))
            # The condition is necessary to avoid "broken pipe" errors
            # when sending a 0-length buffer if the other end closed the pipe.
            if n > 0:
                self._send(buf)

    But the OSError(32, "Broken pipe") is raised while sending the 4-byte length header (self._send(struct.pack("=i", len(buf)))), not while sending the buffer content.
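The framing in that excerpt (a 4-byte `"=i"` length header followed by the payload) can be reproduced on a raw `os.pipe()` to show why the header write alone is enough to hit EPIPE once the reader is gone. A minimal sketch, assuming a POSIX system and CPython's default of ignoring SIGPIPE; `send_bytes()`, `recv_bytes()` and the underscore helpers are hypothetical names that only mirror the quoted `_send_bytes()`, they are not the `multiprocessing.connection` implementation.

```python
import os
import struct


def _write_all(fd: int, data: bytes) -> None:
    # os.write() may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def _read_exact(fd: int, n: int) -> bytes:
    # os.read() may also return short reads; EOF mid-message is an error.
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_bytes(fd: int, buf: bytes) -> None:
    """Write a 4-byte native-endian length header, then the payload."""
    n = len(buf)
    _write_all(fd, struct.pack("=i", n))  # this write alone can fail with EPIPE
    if n > 0:                             # skip the payload write for empty messages
        _write_all(fd, buf)


def recv_bytes(fd: int) -> bytes:
    (n,) = struct.unpack("=i", _read_exact(fd, 4))
    return _read_exact(fd, n)


def header_write_breaks_when_reader_closed() -> bool:
    """Return True if sending even an empty message to a closed pipe fails."""
    r, w = os.pipe()
    os.close(r)  # simulate the peer (e.g. a dead child process) going away
    try:
        send_bytes(w, b"")  # only the 4-byte header is written
    except BrokenPipeError:  # OSError with errno EPIPE (32 on Linux)
        return True
    finally:
        os.close(w)
    return False
```

Because the header is written unconditionally, even an empty message fails with `BrokenPipeError`; the `if n > 0` guard only avoids the second write, so it cannot prevent the error seen in the buildbot log.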

    See also, possibly related, the (closed) issue bpo-9205: Parent process hanging in multiprocessing if children terminate unexpectedly

    @vstinner
    Member Author

    Ah, submitting a new task after the manager has been shut down fails with OSError(32, 'Broken pipe'). Example:
    ---------------

    from multiprocessing.managers import BaseManager
    
    class MathsClass(object):
        def foo(self):
            return 42
    
    class MyManager(BaseManager):
        pass
    
    MyManager.register('Maths', MathsClass)
    
    if __name__ == '__main__':
        manager = MyManager()
        manager.start()
        maths = manager.Maths()
        maths.foo()
        manager.shutdown()
        try:
            maths.foo()
        finally:
            manager.shutdown()

    This example doesn't hang, but this issue is about concurrent.futures, not multiprocessing.

    @vstinner
    Member Author

    Oh, I think that I found a deadlock (or something like that):
    ----------------------------

    import concurrent.futures
    import faulthandler
    import os
    import signal
    import time
    
    def work(n):
        time.sleep(0.1)
    
    def main():
        faulthandler.register(signal.SIGUSR1)
        print("pid: %s" % os.getpid())
        with concurrent.futures.ProcessPoolExecutor() as executor:
            for result in executor.map(work, range(100)):
                print("shutdown")
                executor.shutdown()
                print("shutdown--")
    
    if __name__ == '__main__':
        main()

    Trace:
    ----------------------------
    Thread 0x00007fbfc83bd700:
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait
    File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 252 in _feed
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap

    Thread 0x00007fbfc8bbe700:
    File "/home/haypo/prog/HG/cpython/Lib/multiprocessing/queues.py", line 101 in put
    File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 268 in _queue_management_worker
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 690 in run
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 737 in _bootstrap_inner
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 710 in _bootstrap

    Current thread 0x00007fbfcc2e3700:
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 237 in wait
    File "/home/haypo/prog/HG/cpython/Lib/threading.py", line 851 in join
    File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/process.py", line 395 in shutdown
    File "/home/haypo/prog/HG/cpython/Lib/concurrent/futures/base.py", line 570 in __exit__
    File "y.py", line 17 in main
    File "y.py", line 20 in <module>
    ----------------------------
    There are two child processes, but both are zombies (displayed as "<defunct>" by ps). Sending the SIGUSR1 signal to the frozen process displays the traceback (thanks to faulthandler).
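For reference, the faulthandler technique used above can be exercised without a second shell. A sketch assuming a POSIX platform (`faulthandler.register()` is not available on Windows) and Python 3.8+ for `signal.raise_signal()`; `dump_own_traceback()` is a hypothetical helper that signals its own process instead of running `kill -USR1 <pid>` from outside, and writes the dump to a temporary file rather than stderr.

```python
import faulthandler
import signal
import tempfile


def dump_own_traceback() -> str:
    """Register a SIGUSR1 handler, signal ourselves, and return what was dumped."""
    with tempfile.TemporaryFile("w+") as out:
        # faulthandler writes straight to the file descriptor from a C signal
        # handler, without needing the GIL, which is why it works on a hung process.
        faulthandler.register(signal.SIGUSR1, file=out, all_threads=True)
        try:
            signal.raise_signal(signal.SIGUSR1)  # stands in for `kill -USR1 <pid>`
        finally:
            faulthandler.unregister(signal.SIGUSR1)
        out.seek(0)
        return out.read()


if __name__ == "__main__":
    print(dump_own_traceback())
```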

    @vstinner vstinner changed the title Timeout (1 hour) in test_concurrent_futures.tearDown() on sparc solaris10 gcc 3.x Deadlock in test_concurrent_futures Jul 5, 2011
    @rosslagerwall
    Mannequin

    rosslagerwall mannequin commented Jan 3, 2012

    Retrieving the result of a future after the executor has been shut down can cause a hang.

    It seems this regression was introduced in a76257a99636; it affects only ProcessPoolExecutor.

    The problem is that the worker processes are signaled to shut down even when there are still pending work items, which leaves those items permanently unfinished. The patch simply removes the call that shuts down the processes while work items are still pending.

    Attached is a patch.
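To make that failure mode concrete, here is a deterministic toy model (single-threaded, no real processes) of the queue-management logic described in the comment. It is not the actual `concurrent.futures` code: `simulate()`, `pending`, `call_queue`, the bounded `queue_size` and the `fixed` flag are assumptions made for illustration. With `fixed=False`, the exit sentinels are queued as soon as shutdown is requested; with `fixed=True`, they are only queued once nothing is pending, which is the behaviour the patch aims for. In the real executor, anyone calling `result()` on a stranded future would block forever.

```python
from collections import deque

SENTINEL = None  # tells a worker to exit


def simulate(n_items: int, n_workers: int, queue_size: int, fixed: bool) -> int:
    """Return how many work items ran before every worker exited.

    Shutdown is assumed to be requested right after all items are submitted.
    """
    pending = deque(range(n_items))  # submitted, not yet handed to workers
    call_queue = deque()             # bounded queue the workers read from
    alive = n_workers
    completed = 0
    sentinels_sent = False
    while alive:
        # Management step: top up the bounded call queue from pending items.
        while pending and len(call_queue) < queue_size:
            call_queue.append(pending.popleft())
        # Buggy model signals shutdown immediately; fixed model waits until
        # no work items are pending any more.
        if not sentinels_sent and (not fixed or not pending):
            call_queue.extend([SENTINEL] * n_workers)
            sentinels_sent = True
        # Worker step: each live worker takes at most one entry.
        for _ in range(alive):
            if not call_queue:
                break
            if call_queue.popleft() is SENTINEL:
                alive -= 1
            else:
                completed += 1
    return completed


print(simulate(10, n_workers=2, queue_size=3, fixed=False))  # → 3
print(simulate(10, n_workers=2, queue_size=3, fixed=True))   # → 10
```

In the buggy model the outcome depends on the queue size relative to the number of submitted items, which is consistent with the hang showing up only intermittently.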

    @pitrou
    Member

    pitrou commented Jan 3, 2012

    Well, I was sure I had added this code for a reason, but the tests seem to pass without it...
    Just a comment: the test isn't ProcessPoolExecutor-specific, so it should really be in the generic tests.
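Following that suggestion, a check that does not depend on the executor type could look like the sketch below. The helper `results_after_shutdown()` and the `square()` task are hypothetical, not the test that was actually committed; the sketch submits work, calls `shutdown(wait=False)` while items may still be pending, and then collects every result with a timeout so that a regression shows up as a `TimeoutError` rather than a hung test run.

```python
import concurrent.futures


def square(x: int) -> int:
    return x * x


def results_after_shutdown(executor_cls, n: int = 20, timeout: float = 30.0) -> list:
    """Submit n tasks, shut down without waiting, then collect every result.

    The per-future timeout turns a hang like the one in this issue into a
    TimeoutError instead of blocking the test run forever.
    """
    executor = executor_cls(max_workers=2)
    futures = [executor.submit(square, i) for i in range(n)]
    executor.shutdown(wait=False)  # shut down while work items may still be pending
    return [f.result(timeout=timeout) for f in futures]


if __name__ == "__main__":
    expected = [i * i for i in range(20)]
    assert results_after_shutdown(concurrent.futures.ThreadPoolExecutor) == expected
    print("ok")
```

Passing `concurrent.futures.ProcessPoolExecutor` instead should exercise the code path fixed here; when doing so, run it from a script file so that `square()` is importable and picklable in the worker processes.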

    @python-dev
    Mannequin

    python-dev mannequin commented Jan 8, 2012

    New changeset 26389e9efa9c by Ross Lagerwall in branch '3.2':
    Issue bpo-12364: Fix a hang in concurrent.futures.ProcessPoolExecutor.
    http://hg.python.org/cpython/rev/26389e9efa9c

    New changeset 25f879011102 by Ross Lagerwall in branch 'default':
    Merge with 3.2 for bpo-12364.
    http://hg.python.org/cpython/rev/25f879011102

    @rosslagerwall
    Mannequin

    rosslagerwall mannequin commented Jan 8, 2012

    Thanks!

    @rosslagerwall rosslagerwall mannequin closed this as completed Jan 8, 2012
    @rosslagerwall rosslagerwall mannequin self-assigned this Jan 8, 2012
    @rosslagerwall rosslagerwall mannequin added the type-bug An unexpected behavior, bug, or error label Jan 8, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022