Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FREEZE] ERROR at teardown of TestsProcessPoolForkserverExecutor.test_map_chunksize #328

Closed
ogrisel opened this issue Feb 3, 2022 · 3 comments · Fixed by #348
Closed
Labels

Comments

@ogrisel
Copy link
Collaborator

ogrisel commented Feb 3, 2022

(randomly) observed on the CI on master:

  • test_loky linux-py37
2022-02-03T15:49:22.5326916Z tests/test_process_executor_forkserver.py::TestsProcessPoolForkserverExecutor::test_map_chunksize PASSED [ 30%]
2022-02-03T15:49:22.5337930Z tests/test_process_executor_forkserver.py::TestsProcessPoolForkserverExecutor::test_map_chunksize ERROR [ 30%]
2022-02-03T15:49:34.3116493Z tests/test_process_executor_forkserver.py::TestsProcessPoolForkserverExecutor::test_traceback ERROR [ 31%]
2022-02-03T15:49:34.3127891Z tests/test_process_executor_forkserver.py::TestsProcessPoolForkserverExecutor::test_traceback ERROR [ 31%]
2022-02-03T15:49:46.1632822Z tests/test_process_executor_forkserver.py::TestsProcessPoolForkserverExecutor::test_thread_safety ERROR [ 31%]
2022-02-03T15:49:46.1691274Z tests/test_process_executor_forkserver.py::TestsProcessPoolForkserverExecutor::test_thread_safety ERROR [ 31%]teardown class with <multiprocessing.context.ForkServerContext object at 0x7f2f65670650>
2022-02-03T15:49:46.1693555Z 
2022-02-03T15:49:46.1694429Z 
2022-02-03T15:49:46.1695774Z ==================================== ERRORS ====================================
2022-02-03T15:49:46.1697429Z __ ERROR at teardown of TestsProcessPoolForkserverExecutor.test_map_chunksize __
2022-02-03T15:49:46.1699014Z 
2022-02-03T15:49:46.1700462Z self = <tests.test_process_executor_forkserver.TestsProcessPoolForkserverExecutor object at 0x7f2f5c869750>
2022-02-03T15:49:46.1707111Z method = <bound method ExecutorTest.test_map_chunksize of <tests.test_process_executor_forkserver.TestsProcessPoolForkserverExecutor object at 0x7f2f5c869750>>
2022-02-03T15:49:46.1707839Z 
2022-02-03T15:49:46.1708224Z     def teardown_method(self, method):
2022-02-03T15:49:46.1708679Z         # Make sure executor is not broken if it should not be
2022-02-03T15:49:46.1710048Z         executor = getattr(self, 'executor', None)
2022-02-03T15:49:46.1710502Z         if executor is not None:
2022-02-03T15:49:46.1710974Z             expect_broken_pool = hasattr(method, "broken_pool")  # old pytest
2022-02-03T15:49:46.1711477Z             for mark in getattr(method, "pytestmark", []):
2022-02-03T15:49:46.1711927Z                 if mark.name == "broken_pool":
2022-02-03T15:49:46.1712342Z                     expect_broken_pool = True
2022-02-03T15:49:46.1712816Z             is_actually_broken = executor._flags.broken is not None
2022-02-03T15:49:46.1713315Z             assert is_actually_broken == expect_broken_pool
2022-02-03T15:49:46.1713706Z     
2022-02-03T15:49:46.1714039Z             t_start = time.time()
2022-02-03T15:49:46.1714479Z >           executor.shutdown(wait=True, kill_workers=True)
2022-02-03T15:49:46.1714729Z 
2022-02-03T15:49:46.1715173Z executor   = <loky.process_executor.ProcessPoolExecutor object at 0x7f2f5c876290>
2022-02-03T15:49:46.1715673Z expect_broken_pool = False
2022-02-03T15:49:46.1716065Z is_actually_broken = False
2022-02-03T15:49:46.1716736Z method     = <bound method ExecutorTest.test_map_chunksize of <tests.test_process_executor_forkserver.TestsProcessPoolForkserverExecutor object at 0x7f2f5c869750>>
2022-02-03T15:49:46.1717595Z self       = <tests.test_process_executor_forkserver.TestsProcessPoolForkserverExecutor object at 0x7f2f5c869750>
2022-02-03T15:49:46.1718168Z t_start    = 1643903302.785902
2022-02-03T15:49:46.1718371Z 
2022-02-03T15:49:46.1718728Z tests/_executor_mixin.py:154: 
2022-02-03T15:49:46.1719181Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-02-03T15:49:46.1719682Z loky/process_executor.py:1157: in shutdown
2022-02-03T15:49:46.1720129Z     executor_manager_thread.join()
2022-02-03T15:49:46.1720707Z         executor_manager_thread = <_ExecutorManagerThread(ExecutorManagerThread, started daemon 139841316583168)>
2022-02-03T15:49:46.1721432Z         executor_manager_thread_wakeup = <loky.process_executor._ThreadWakeup object at 0x7f2f5c876c50>
2022-02-03T15:49:46.1721961Z         kill_workers = True
2022-02-03T15:49:46.1722453Z         self       = <loky.process_executor.ProcessPoolExecutor object at 0x7f2f5c876290>
2022-02-03T15:49:46.1722938Z         wait       = True
2022-02-03T15:49:46.1723421Z /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py:1044: in join
2022-02-03T15:49:46.1723926Z     self._wait_for_tstate_lock()
2022-02-03T15:49:46.1724469Z         self       = <_ExecutorManagerThread(ExecutorManagerThread, started daemon 139841316583168)>
2022-02-03T15:49:46.1724979Z         timeout    = None
2022-02-03T15:49:46.1725406Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2022-02-03T15:49:46.1725660Z 
2022-02-03T15:49:46.1726122Z self = <_ExecutorManagerThread(ExecutorManagerThread, started daemon 139841316583168)>
2022-02-03T15:49:46.1726838Z block = True, timeout = -1
2022-02-03T15:49:46.1727043Z 
2022-02-03T15:49:46.1727647Z     def _wait_for_tstate_lock(self, block=True, timeout=-1):
2022-02-03T15:49:46.1728163Z         # Issue #18808: wait for the thread state to be gone.
2022-02-03T15:49:46.1728894Z         # At the end of the thread's life, after all knowledge of the thread
2022-02-03T15:49:46.1729471Z         # is removed from C data structures, C code releases our _tstate_lock.
2022-02-03T15:49:46.1730036Z         # This method passes its arguments to _tstate_lock.acquire().
2022-02-03T15:49:46.1730717Z         # If the lock is acquired, the C code is done, and self._stop() is
2022-02-03T15:49:46.1731255Z         # called.  That sets ._is_stopped to True, and ._tstate_lock to None.
2022-02-03T15:49:46.1731736Z         lock = self._tstate_lock
2022-02-03T15:49:46.1732196Z         if lock is None:  # already determined that the C code is done
2022-02-03T15:49:46.1732661Z             assert self._is_stopped
2022-02-03T15:49:46.1733059Z >       elif lock.acquire(block, timeout):
2022-02-03T15:49:46.1733552Z E       Failed: Timeout >60.0s
2022-02-03T15:49:46.1733755Z 
2022-02-03T15:49:46.1734044Z block      = True
2022-02-03T15:49:46.1734465Z lock       = <locked _thread.lock object at 0x7f2f5c880e10>
2022-02-03T15:49:46.1735046Z self       = <_ExecutorManagerThread(ExecutorManagerThread, started daemon 139841316583168)>
2022-02-03T15:49:46.1735714Z timeout    = -1
2022-02-03T15:49:46.1735892Z 
2022-02-03T15:49:46.1736344Z /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py:1060: Failed
2022-02-03T15:49:46.1737159Z ---------------------------- Captured stderr setup -----------------------------
2022-02-03T15:49:46.1737784Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392947200
2022-02-03T15:49:46.1738405Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392943104
2022-02-03T15:49:46.1739018Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392939008
2022-02-03T15:49:46.1739630Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392934912
2022-02-03T15:49:46.1740183Z [DEBUG:MainProcess:MainThread] Queue._after_fork()
2022-02-03T15:49:46.1740746Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392930816
2022-02-03T15:49:46.1741360Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392926720
2022-02-03T15:49:46.1743391Z [DEBUG:MainProcess:MainThread] ProcessPoolExecutor is setup
2022-02-03T15:49:46.1744209Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392898048
2022-02-03T15:49:46.1744940Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392893952
2022-02-03T15:49:46.1745649Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392889856
2022-02-03T15:49:46.1746355Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392885760
2022-02-03T15:49:46.1747076Z [DEBUG:MainProcess:MainThread] created semlock with handle 139841392881664
2022-02-03T15:49:46.1749031Z [DEBUG:MainProcess:MainThread] Adjust process count : {2673: <ForkServerProcess(ForkServerProcess-207, started)>, 2674: <ForkServerProcess(ForkServerProcess-208, started)>, 2675: <ForkServerProcess(ForkServerProcess-209, started)>, 2676: <ForkServerProcess(ForkServerProcess-210, started)>, 2677: <ForkServerProcess(ForkServerProcess-211, started)>}
2022-02-03T15:49:46.1750426Z [DEBUG:MainProcess:MainThread] _start_executor_manager_thread called
2022-02-03T15:49:46.1751116Z [DEBUG:MainProcess:ExecutorManagerThread] Queue._start_thread()
2022-02-03T15:49:46.1751802Z [DEBUG:MainProcess:ExecutorManagerThread] doing self._thread.start()
2022-02-03T15:49:46.1752503Z [DEBUG:MainProcess:QueueFeederThread] starting thread to feed data to pipe
2022-02-03T15:49:46.1753202Z [DEBUG:MainProcess:ExecutorManagerThread] ... done self._thread.start()
2022-02-03T15:49:46.1754104Z [DEBUG/ForkServerProcess-207] Using 'cloudpickle' for serialization.
2022-02-03T15:49:46.1754975Z [DEBUG/ForkServerProcess-211] Using 'cloudpickle' for serialization.
2022-02-03T15:49:46.1755889Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841392943104
2022-02-03T15:49:46.1822137Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841392939008
2022-02-03T15:49:46.1823521Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841392934912
2022-02-03T15:49:46.1824413Z [DEBUG/ForkServerProcess-207] Queue._after_fork()
2022-02-03T15:49:46.1825314Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841392930816
2022-02-03T15:49:46.1826250Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841392926720
2022-02-03T15:49:46.1827941Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841541672960
2022-02-03T15:49:46.1828917Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841541668864
2022-02-03T15:49:46.1829977Z [DEBUG/ForkServerProcess-209] Using 'cloudpickle' for serialization.
2022-02-03T15:49:46.1830924Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841541664768
2022-02-03T15:49:46.1831858Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841541660672
2022-02-03T15:49:46.1833019Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841541607424
2022-02-03T15:49:46.1834749Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841392947200
2022-02-03T15:49:46.1838919Z [DEBUG/ForkServerProcess-207] recreated blocker with handle 139841392898048
2022-02-03T15:49:46.1840247Z [INFO/ForkServerProcess-207] child process calling self.run()
2022-02-03T15:49:46.1841155Z [DEBUG/ForkServerProcess-207] Worker started with timeout=None
2022-02-03T15:49:46.1842085Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841392943104
2022-02-03T15:49:46.1843006Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841392939008
2022-02-03T15:49:46.1843949Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841392934912
2022-02-03T15:49:46.1844812Z [DEBUG/ForkServerProcess-211] Queue._after_fork()
2022-02-03T15:49:46.1845688Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841392930816
2022-02-03T15:49:46.1846604Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841392926720
2022-02-03T15:49:46.1847545Z ----------------------------- Captured stderr call -----------------------------
2022-02-03T15:49:46.1848457Z [DEBUG/ForkServerProcess-208] Using 'cloudpickle' for serialization.
2022-02-03T15:49:46.1849374Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841541672960
2022-02-03T15:49:46.1850303Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841541668864
2022-02-03T15:49:46.1851181Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841541664768
2022-02-03T15:49:46.1852058Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841541660672
2022-02-03T15:49:46.1852934Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841541607424
2022-02-03T15:49:46.1853804Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841392947200
2022-02-03T15:49:46.1854674Z [DEBUG/ForkServerProcess-211] recreated blocker with handle 139841392881664
2022-02-03T15:49:46.1855494Z [INFO/ForkServerProcess-211] child process calling self.run()
2022-02-03T15:49:46.1856297Z [DEBUG/ForkServerProcess-211] Worker started with timeout=None
2022-02-03T15:49:46.1857109Z [DEBUG/ForkServerProcess-207] psutil return memory size: 25661440
2022-02-03T15:49:46.1857963Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841392943104
2022-02-03T15:49:46.1858822Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841392939008
2022-02-03T15:49:46.1859699Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841392934912
2022-02-03T15:49:46.1860489Z [DEBUG/ForkServerProcess-209] Queue._after_fork()
2022-02-03T15:49:46.1861310Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841392930816
2022-02-03T15:49:46.1862375Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841392926720
2022-02-03T15:49:46.1863263Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841541672960
2022-02-03T15:49:46.1864137Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841541668864
2022-02-03T15:49:46.1865012Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841541664768
2022-02-03T15:49:46.1865889Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841541660672
2022-02-03T15:49:46.1866750Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841541607424
2022-02-03T15:49:46.1867620Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841392947200
2022-02-03T15:49:46.1868693Z [DEBUG/ForkServerProcess-209] recreated blocker with handle 139841392889856
2022-02-03T15:49:46.1869518Z [INFO/ForkServerProcess-209] child process calling self.run()
2022-02-03T15:49:46.1870318Z [DEBUG/ForkServerProcess-209] Worker started with timeout=None
2022-02-03T15:49:46.1871113Z [DEBUG/ForkServerProcess-211] psutil return memory size: 25665536
2022-02-03T15:49:46.1872242Z [DEBUG/ForkServerProcess-210] Using 'cloudpickle' for serialization.
2022-02-03T15:49:46.1873132Z --------------------------- Captured stderr teardown ---------------------------
2022-02-03T15:49:46.1874066Z [DEBUG:MainProcess:MainThread] shutting down executor <loky.process_executor.ProcessPoolExecutor object at 0x7f2f5c876290>
2022-02-03T15:49:46.1875211Z [DEBUG:MainProcess:ExecutorManagerThread] terminate process ForkServerProcess-211, reason: executor shutting down
2022-02-03T15:49:46.1876179Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841392943104
2022-02-03T15:49:46.1877061Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841392939008
2022-02-03T15:49:46.1877930Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841392934912
2022-02-03T15:49:46.1878728Z [DEBUG/ForkServerProcess-208] Queue._after_fork()
2022-02-03T15:49:46.1879534Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841392930816
2022-02-03T15:49:46.1880409Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841392926720
2022-02-03T15:49:46.1881278Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841541672960
2022-02-03T15:49:46.1882154Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841541668864
2022-02-03T15:49:46.1883024Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841541664768
2022-02-03T15:49:46.1883873Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841541660672
2022-02-03T15:49:46.1884742Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841541607424
2022-02-03T15:49:46.1885618Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841392947200
2022-02-03T15:49:46.1886456Z [DEBUG/ForkServerProcess-209] psutil return memory size: 25669632
2022-02-03T15:49:46.1887304Z [DEBUG/ForkServerProcess-208] recreated blocker with handle 139841392893952
2022-02-03T15:49:46.1888110Z [INFO/ForkServerProcess-208] child process calling self.run()
2022-02-03T15:49:46.1888905Z [DEBUG/ForkServerProcess-208] Worker started with timeout=None
2022-02-03T15:49:46.1889747Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841392943104
2022-02-03T15:49:46.1890619Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841392939008
2022-02-03T15:49:46.1891473Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841392934912
2022-02-03T15:49:46.1892264Z [DEBUG/ForkServerProcess-210] Queue._after_fork()
2022-02-03T15:49:46.1893080Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841392930816
2022-02-03T15:49:46.1893947Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841392926720
2022-02-03T15:49:46.1894817Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841541672960
2022-02-03T15:49:46.1895671Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841541668864
2022-02-03T15:49:46.1907277Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841541664768
2022-02-03T15:49:46.1908377Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841541660672
2022-02-03T15:49:46.1909266Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841541607424
2022-02-03T15:49:46.1910137Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841392947200
2022-02-03T15:49:46.1911012Z [DEBUG/ForkServerProcess-210] recreated blocker with handle 139841392885760
2022-02-03T15:49:46.1911844Z [INFO/ForkServerProcess-210] child process calling self.run()
2022-02-03T15:49:46.1912647Z [DEBUG/ForkServerProcess-210] Worker started with timeout=None
2022-02-03T15:49:46.1912985Z 
2022-02-03T15:49:46.1913672Z +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
2022-02-03T15:49:46.1914022Z 
2022-02-03T15:49:46.1914512Z ~~~~~~~~~~~~~~~~~ Stack of QueueFeederThread (139841299801856) ~~~~~~~~~~~~~~~~~
2022-02-03T15:49:46.1915212Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 890, in _bootstrap
2022-02-03T15:49:46.1915808Z     self._bootstrap_inner()
2022-02-03T15:49:46.1916388Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 926, in _bootstrap_inner
2022-02-03T15:49:46.1917072Z     self.run()
2022-02-03T15:49:46.1917611Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 870, in run
2022-02-03T15:49:46.1918238Z     self._target(*self._args, **self._kwargs)
2022-02-03T15:49:46.1918778Z   File "/home/vsts/work/1/s/loky/backend/queues.py", line 141, in _feed
2022-02-03T15:49:46.1919271Z     nwait()
2022-02-03T15:49:46.1919800Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 296, in wait
2022-02-03T15:49:46.1920372Z     waiter.acquire()
2022-02-03T15:49:46.1920596Z 
2022-02-03T15:49:46.1921080Z ~~~~~~~~~~~~~~~ Stack of ExecutorManagerThread (139841316583168) ~~~~~~~~~~~~~~~
2022-02-03T15:49:46.1921776Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 890, in _bootstrap
2022-02-03T15:49:46.1922368Z     self._bootstrap_inner()
2022-02-03T15:49:46.1922963Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 926, in _bootstrap_inner
2022-02-03T15:49:46.1923524Z     self.run()
2022-02-03T15:49:46.1924011Z   File "/home/vsts/work/1/s/loky/process_executor.py", line 558, in run
2022-02-03T15:49:46.1924558Z     self.flag_executor_shutting_down()
2022-02-03T15:49:46.1925156Z   File "/home/vsts/work/1/s/loky/process_executor.py", line 754, in flag_executor_shutting_down
2022-02-03T15:49:46.1925772Z     self.kill_workers(reason="executor shutting down")
2022-02-03T15:49:46.1926369Z   File "/home/vsts/work/1/s/loky/process_executor.py", line 764, in kill_workers
2022-02-03T15:49:46.1926917Z     recursive_terminate(p)
2022-02-03T15:49:46.1927454Z   File "/home/vsts/work/1/s/loky/backend/utils.py", line 26, in recursive_terminate
2022-02-03T15:49:46.1928044Z     _recursive_terminate_with_psutil(process)
2022-02-03T15:49:46.1928642Z   File "/home/vsts/work/1/s/loky/backend/utils.py", line 46, in _recursive_terminate_with_psutil
2022-02-03T15:49:46.1929203Z     process.join()
2022-02-03T15:49:46.1929783Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/process.py", line 140, in join
2022-02-03T15:49:46.1930401Z     res = self._popen.wait(timeout)
2022-02-03T15:49:46.1931721Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
2022-02-03T15:49:46.1933220Z     return self.poll(os.WNOHANG if timeout == 0.0 else 0)
2022-02-03T15:49:46.1933953Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/popen_forkserver.py", line 61, in poll
2022-02-03T15:49:46.1934629Z     if not wait([self.sentinel], timeout):
2022-02-03T15:49:46.1935258Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/connection.py", line 921, in wait
2022-02-03T15:49:46.1935874Z     ready = selector.select(timeout)
2022-02-03T15:49:46.1936463Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/selectors.py", line 415, in select
2022-02-03T15:49:46.1941371Z     fd_event_list = self._selector.poll(timeout)
2022-02-03T15:49:46.1941719Z 
2022-02-03T15:49:46.1942993Z ~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-12 (139841485211392) ~~~~~~~~~~~~~~~~~~~~~
2022-02-03T15:49:46.1943738Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 890, in _bootstrap
2022-02-03T15:49:46.1944334Z     self._bootstrap_inner()
2022-02-03T15:49:46.1944918Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 926, in _bootstrap_inner
2022-02-03T15:49:46.1945490Z     self.run()
2022-02-03T15:49:46.1946287Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/threading.py", line 870, in run
2022-02-03T15:49:46.1946888Z     self._target(*self._args, **self._kwargs)
2022-02-03T15:49:46.1947528Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/resource_sharer.py", line 142, in _serve
2022-02-03T15:49:46.1948182Z     with self._listener.accept() as conn:
2022-02-03T15:49:46.1948818Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/connection.py", line 453, in accept
2022-02-03T15:49:46.1949554Z     c = self._listener.accept()
2022-02-03T15:49:46.1950165Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/multiprocessing/connection.py", line 599, in accept
2022-02-03T15:49:46.1950802Z     s, self._last_accepted = self._socket.accept()
2022-02-03T15:49:46.1951414Z   File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/socket.py", line 212, in accept
2022-02-03T15:49:46.1951984Z     fd, addr = self._accept()
2022-02-03T15:49:46.1952221Z 
2022-02-03T15:49:46.1952711Z +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
2022-02-03T15:49:46.1953365Z _____ ERROR at setup of TestsProcessPoolForkserverExecutor.test_traceback ______

https://dev.azure.com/joblib/loky/_build/results?buildId=1574&view=logs&j=690d157a-d5ad-54bf-a85d-89b5d05cf663&t=35eaaa5a-db59-5698-331c-8016de4f39a3

@ogrisel
Copy link
Collaborator Author

ogrisel commented Feb 9, 2022

Similar problems:

_____ ERROR at teardown of TestsProcessPoolLokyExecutor.test_killed_child ______

self = <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor object at 0x7fd459298a60>
method = <bound method ExecutorTest.test_killed_child of <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor object at 0x7fd459298a60>>

    def teardown_method(self, method):
        # Make sure executor is not broken if it should not be
        executor = getattr(self, 'executor', None)
        if executor is not None:
            expect_broken_pool = hasattr(method, "broken_pool")  # old pytest
            for mark in getattr(method, "pytestmark", []):
                if mark.name == "broken_pool":
                    expect_broken_pool = True
            is_actually_broken = executor._flags.broken is not None
>           assert is_actually_broken == expect_broken_pool
E           AssertionError

executor   = <loky.process_executor.ProcessPoolExecutor object at 0x7fd456a8efb0>
expect_broken_pool = True
is_actually_broken = False
mark       = Mark(name='broken_pool', args=(), kwargs={})
method     = <bound method ExecutorTest.test_killed_child of <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor object at 0x7fd459298a60>>
self       = <tests.test_process_executor_loky.TestsProcessPoolLokyExecutor object at 0x7fd459298a60>

@ogrisel
Copy link
Collaborator Author

ogrisel commented Feb 9, 2022

It seems that we might have a race condition with the executor manager thread that is responsible to flag the process pool executor as broken. Let me try to fix this teardown_method.

@ogrisel
Copy link
Collaborator Author

ogrisel commented Feb 9, 2022

Actually #328 (comment) is not the same problem as the traceback in the description:

  • the above problem (on test_killed_child) is about not detecting a broken executor in time;
  • while the original issue (on test_map_chunksize) report is about a timeout acquiring a lock when shutting down a non-broken executor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant