Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test using multiprocessing only hangs when ran with test suite, works alone #11174

Closed
rabyj opened this issue Jul 5, 2023 · 13 comments
Closed
Labels
type: question general question, might be closed after 2 weeks of inactivity

Comments

@rabyj
Copy link

rabyj commented Jul 5, 2023

TL;DR:
If you're experiencing a hanging issue during tests with a program using multiprocessing, try adding the following code at the beginning of your problematic test function:

def a_test():
    import multiprocessing as mp
    mp.set_start_method("spawn", force=True)
    # [... rest of test ...]

Description

I've encountered an odd test hanging issue with a program that uses multiprocessing.
The program completes:

  • When run individually from the command line
  • As part of a test function invoked alone via pytest

The program hangs indefinitely when run as part of a full suite of tests.

I wanted to at least document what I experienced even if I can't create a simple minimal example, since I found a workaround that could help others, and no other issue mentioning something like this. All issues are for single tests.

Environment

python: 3.8.10
pytest: 7.2.0
pytest-mock: 3.10.0
OS: Ubuntu 20.04.6 LTS

Details

The problem occurs with a test that runs a program to completion, which internally uses multiprocessing (via pool.starmap on a generator). When the test is run individually, it completes without problems within roughly 5 seconds. However, when the test is included in a full test suite and run with pytest, it hangs indefinitely (for over 6 minutes at least).

The issue persists even when using concurrent.futures in place of multiprocessing (without a partial as seen below).

My workaround is using mp.set_start_method("spawn", force=True) at the start of the test.

Fix in immediate context:

    func = partial(run_prediction, estimator=estimator, name=name, logdir=logdir)
    items = enumerate(ea_handler.yield_split())

    mp.set_start_method("spawn", force=True) # --> the fix, also works by being at the complete start of the test
    with mp.Pool(nb_workers) as pool:
        pool.starmap(func, items)

While the fix makes the test run slower (about 20 seconds), at least it doesn't hang indefinitely.

I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!

Complete pip Environment

pip list output - click me

Package                       Version          Location                                                           
----------------------------- ---------------- -------------------------------------------------------------------
absl-py                       1.4.0            
aiohttp                       3.8.4            
aiosignal                     1.3.1            
alabaster                     0.7.13           
alembic                       1.11.1           
anyio                         3.7.0            
argon2-cffi                   21.3.0           
argon2-cffi-bindings          21.2.0           
astroid                       2.15.5           
asttokens                     2.2.1            
async-timeout                 4.0.2            
attrs                         23.1.0           
autopage                      0.5.1            
Babel                         2.12.1           
backcall                      0.2.0            
beautifulsoup4                4.12.2           
black                         23.1.0           
bleach                        6.0.0            
cachetools                    5.3.1            
certifi                       2023.5.7         
cffi                          1.15.1           
cfgv                          3.3.1            
charset-normalizer            3.1.0            
click                         8.1.3            
cliff                         4.3.0            
cloudpickle                   2.2.1            
cmaes                         0.9.1            
cmd2                          2.4.3            
colorama                      0.4.6            
colorlog                      6.7.0            
comet-ml                      3.21.0           
comm                          0.1.3            
commonmark                    0.9.1            
configobj                     5.0.8            
cycler                        0.11.0           
debugpy                       1.6.7            
decorator                     5.1.1            
defusedxml                    0.7.1            
dill                          0.3.6            
distlib                       0.3.6            
doc2dash                      2.4.1            
docutils                      0.20.1           
dulwich                       0.21.5           
EpiLaP                        0.1              $HOME/Projects/sources/epi_ml/src/python
everett                       3.2.0            
exceptiongroup                1.1.1            
executing                     1.2.0            
fastjsonschema                2.17.1           
filelock                      3.12.0           
fonttools                     4.39.4           
frozenlist                    1.3.3            
fsspec                        2023.5.0         
gitdb                         4.0.10           
GitPython                     3.1.31           
google-auth                   2.19.0           
google-auth-oauthlib          1.0.0            
greenlet                      2.0.2            
grpcio                        1.54.2           
h5py                          3.6.0            
identify                      2.5.24           
idna                          3.4              
imagesize                     1.4.1            
imbalanced-learn              0.8.0            
importlib-metadata            4.13.0           
importlib-resources           5.12.0           
iniconfig                     2.0.0            
ipykernel                     6.23.1           
ipython                       8.12.2           
ipython-genutils              0.2.0            
ipywidgets                    8.0.6            
isort                         5.12.0           
jedi                          0.18.2           
Jinja2                        3.1.2            
joblib                        1.2.0            
jsonschema                    4.17.3           
jupyter                       1.0.0            
jupyter-client                8.2.0            
jupyter-console               6.6.3            
jupyter-core                  5.3.0            
jupyter-events                0.6.3            
jupyter-server                2.6.0            
jupyter-server-mathjax        0.2.6            
jupyter-server-terminals      0.4.4            
jupyterlab-pygments           0.2.2            
jupyterlab-widgets            3.0.7            
kaleido                       0.2.1            
kiwisolver                    1.4.4            
lazy-object-proxy             1.9.0            
lightgbm                      3.3.5            
llvmlite                      0.40.0           
Mako                          1.2.4            
Markdown                      3.4.3            
MarkupSafe                    2.1.2            
matplotlib                    3.5.3            
matplotlib-inline             0.1.6            
mccabe                        0.7.0            
mistune                       2.0.5            
multidict                     6.0.4            
mypy-extensions               1.0.0            
nbclassic                     0.5.3            
nbclient                      0.7.2            
nbconvert                     7.2.9            
nbdime                        3.1.0            
nbformat                      5.7.0            
nest-asyncio                  1.5.6            
nodeenv                       1.8.0            
notebook                      6.5.4            
notebook-shim                 0.2.3            
numba                         0.57.0           
numpy                         1.22.4           
nvidia-ml-py3                 7.352.0          
oauthlib                      3.2.2            
optuna                        3.0.6            
overrides                     7.3.1            
packaging                     23.1             
pandas                        1.4.4            
pandocfilters                 1.5.0            
parso                         0.8.3            
pathspec                      0.11.1           
pbr                           5.11.1           
pexpect                       4.8.0            
pickleshare                   0.7.5            
Pillow                        9.5.0            
pip                           20.0.2           
pkg-resources                 0.0.0            
pkgutil-resolve-name          1.3.10           
platformdirs                  3.5.1            
plotly                        5.11.0           
pluggy                        1.0.0            
pre-commit                    3.0.4            
prettytable                   3.7.0            
prometheus-client             0.17.0           
prompt-toolkit                3.0.38           
protobuf                      3.20.1           
psutil                        5.9.5            
ptyprocess                    0.7.0            
pure-eval                     0.2.2            
pyaml                         23.5.9           
pyasn1                        0.5.0            
pyasn1-modules                0.3.0            
pycparser                     2.21             
pyDeprecate                   0.3.2            
Pygments                      2.15.1           
pylint                        2.16.4           
pyparsing                     3.0.9            
pyperclip                     1.8.2            
pyrsistent                    0.19.3           
pytest                        7.2.0            
pytest-mock                   3.10.0           
pytest-timeout                2.1.0            
python-dateutil               2.8.2            
python-json-logger            2.0.7            
pytorch-lightning             1.6.5            
pytz                          2023.3           
PyYAML                        6.0              
pyzmq                         25.1.0           
qtconsole                     5.4.3            
QtPy                          2.3.1            
requests                      2.31.0           
requests-oauthlib             1.3.1            
requests-toolbelt             1.0.0            
rfc3339-validator             0.1.4            
rfc3986-validator             0.1.1            
rich                          11.2.0           
rsa                           4.9              
scikit-learn                  1.1.2            
scikit-optimize               0.9.0+unofficial 
scipy                         1.8.1            
semantic-version              2.10.0           
Send2Trash                    1.8.2            
setuptools                    44.0.0           
shap                          0.41.0           
six                           1.16.0           
slicer                        0.0.7            
smmap                         5.0.0            
sniffio                       1.3.0            
snowballstemmer               2.2.0            
soupsieve                     2.4.1            
sphinx                        7.0.1            
sphinx-gallery                0.11.1           
sphinxcontrib-applehelp       1.0.4            
sphinxcontrib-devhelp         1.0.2            
sphinxcontrib-htmlhelp        2.0.1            
sphinxcontrib-jsmath          1.0.1            
sphinxcontrib-qthelp          1.0.3            
sphinxcontrib-serializinghtml 1.1.5            
SQLAlchemy                    2.0.15           
stack-data                    0.6.2            
stevedore                     5.1.0            
tabulate                      0.8.10           
tenacity                      8.2.2            
tensorboard                   2.13.0           
tensorboard-data-server       0.7.0            
terminado                     0.17.1           
threadpoolctl                 3.1.0            
tinycss2                      1.2.1            
toml                          0.10.2           
tomli                         2.0.1            
tomlkit                       0.11.8           
torch                         1.10.2           
torchinfo                     1.6.3            
torchmetrics                  0.7.2            
tornado                       6.3.2            
tqdm                          4.65.0           
traitlets                     5.9.0            
typing-extensions             4.4.0            
urllib3                       2.0.2            
virtualenv                    20.23.0          
wcwidth                       0.2.6            
webencodings                  0.5.1            
websocket-client              1.5.2            
Werkzeug                      2.3.4            
wheel                         0.40.0           
widgetsnbextension            4.0.7            
wrapt                         1.15.0           
wurlitzer                     3.0.3            
yarl                          1.9.2            
zipp                          3.15.0       

@RonnyPfannschmidt
Copy link
Member

do the hangups also happen with pytest -s

@rabyj
Copy link
Author

rabyj commented Jul 5, 2023

yup. Traceback is not very helpful, since it's just the timeout one, but I'll still share for reference

Traceback: Stack of Threads

+++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-6 (139742417913600) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-5 (139742283679488) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 528, in _handle_tasks
    for taskseq, set_length in iter(taskqueue.get, None):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-4 (139739615721216) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 519, in _handle_workers
    cls._wait_for_updates(current_sentinels, change_notifier)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 499, in _wait_for_updates
    wait(sentinels, timeout=timeout)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)

+++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++

FAILURES

============================================== FAILURES ===============================================
_______________________________________ test_binary_classifier ________________________________________

test_dir = PosixPath('/tmp/pytest-of-rabyj/pytest-98/other_estimators1')

    @pytest.mark.timeout(120)
    def test_binary_classifier(test_dir: Path):
        """Test if binary models are properly supported."""
        # Setting up environment variables
        # import multiprocessing as mp  # pylint: disable=import-outside-toplevel
    
        # mp.set_start_method("spawn", force=True)
    
        os.environ["MIN_CLASS_SIZE"] = "1"
        os.environ["LABEL_LIST"] = '["female", "male"]'
        epi_ml.core.estimators.NFOLD_PREDICT = 2
    
        # Loading paths
        datasource = EpiAtlasTreatmentTestData.default_test_data(
            label_category="sex"
        ).epiatlas_dataset.datasource
    
        current_dir = Path(__file__).parent.resolve()
    
        hparams_file = current_dir.parent / "fixtures" / "other_estimators_hparams.json"
    
        sys.argv = [
            "other_estimators.py",
            "--predict",
            "sex",
            f"{datasource.hdf5_file}",
            f"{datasource.chromsize_file}",
            f"{datasource.metadata_file}",
            str(test_dir),
            "--hyperparams",
            f"{hparams_file}",
        ]
    
>       main_module()

mains/other_estimators_test.py:80: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../epi_ml/other_estimators.py:226: in main
    estimators.run_predictions(ea_handler, estimator, model_name, cli.logdir)
../epi_ml/core/estimators.py:400: in run_predictions
    pool.starmap(func, items)
/usr/lib/python3.8/multiprocessing/pool.py:372: in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
/usr/lib/python3.8/multiprocessing/pool.py:765: in get
    self.wait(timeout)
/usr/lib/python3.8/multiprocessing/pool.py:762: in wait
    self._event.wait(timeout)
/usr/lib/python3.8/threading.py:558: in wait
    signaled = self._cond.wait(timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Condition(<unlocked _thread.lock object at 0x7f1850acbdb0>, 0)>, timeout = None

    def wait(self, timeout=None):
        """Wait until notified or until a timeout occurs.
    
        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.
    
        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.
    
        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout for the operation in seconds
        (or fractions thereof).
    
        When the underlying lock is an RLock, it is not released using its
        release() method, since this may not actually unlock the lock when it
        was acquired multiple times recursively. Instead, an internal interface
        of the RLock class is used, which really unlocks it even when it has
        been recursively acquired several times. Another internal interface is
        then used to restore the recursion level when the lock is reacquired.
    
        """
        if not self._is_owned():
            raise RuntimeError("cannot wait on un-acquired lock")
        waiter = _allocate_lock()
        waiter.acquire()
        self._waiters.append(waiter)
        saved_state = self._release_save()
        gotit = False
        try:    # restore state no matter what (e.g., KeyboardInterrupt)
            if timeout is None:
>               waiter.acquire()
E               Failed: Timeout >120.0s

/usr/lib/python3.8/threading.py:302: Failed

@nicoddemus nicoddemus added the type: question general question, might be closed after 2 weeks of inactivity label Jul 5, 2023
@nicoddemus
Copy link
Member

I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!

Not sure either, but perhaps using a different timeout method might yield different results?

@rabyj
Copy link
Author

rabyj commented Jul 5, 2023

perhaps using a different timeout method might yield different results?

The timeout was to avoid the specific test hanging forever, so the test suite can finish. I actually installed a timeout module (pytest-timeout: 2.1.0) BECAUSE it was hanging forever and causing my test suite to not be able to finish.

With this in mind, I honestly don't understand how the timeout method would be related to the bug :/

@nicoddemus
Copy link
Member

With this in mind, I honestly don't understand how the timeout method would be related to the bug :/

Out of insights then, sorry.

@lorinkoz
Copy link

Same is happening for me with multiprocessing.pool.ThreadPool, but the fix is not working in that case. I assume it's because "spawn" is not being called in this context. Any suggestions? Thanks!

@rabyj
Copy link
Author

rabyj commented Dec 12, 2023

@lorinkoz
Recently, I thought the problem had came back for me, but by using pytest -s I noticed something else entirely was happening. So do try to debug using that.

Otherwise, make sure the following code happens before any multiprocessing import, so make sure there's no way the multiprocessing module is imported before your individual test (e.g. imported through another import)

import multiprocessing as mp
mp.set_start_method("spawn", force=True)

Those are the only two things that come to mind, hope it helps.

@peterdudfield
Copy link

Thanks you this helped me

@jfecunha
Copy link

jfecunha commented Feb 1, 2024

TL;DR: If you're experiencing a hanging issue during tests with a program using multiprocessing, try adding the following code at the beginning of your problematic test function:

def a_test():
    import multiprocessing as mp
    mp.set_start_method("spawn", force=True)
    # [... rest of test ...]

Description

I've encountered an odd test hanging issue with a program that uses multiprocessing. The program completes:

  • When run individually from the command line
  • As part of a test function invoked alone via pytest

The program hangs indefinitely when run as part of a full suite of tests.

I wanted to at least document what I experienced even if I can't create a simple minimal example, since I found a workaround that could help others, and no other issue mentioning something like this. All issues are for single tests.

Environment

python: 3.8.10 pytest: 7.2.0 pytest-mock: 3.10.0 OS: Ubuntu 20.04.6 LTS

Details

The problem occurs with a test that runs a program to completion, which internally uses multiprocessing (via pool.starmap on a generator). When the test is run individually, it completes without problems within roughly 5 seconds. However, when the test is included in a full test suite and run with pytest, it hangs indefinitely (for over 6 minutes at least).

The issue persists even when using concurrent.futures in place of multiprocessing (without a partial as seen below).

My workaround is using mp.set_start_method("spawn", force=True) at the start of the test.

Fix in immediate context:

    func = partial(run_prediction, estimator=estimator, name=name, logdir=logdir)
    items = enumerate(ea_handler.yield_split())

    mp.set_start_method("spawn", force=True) # --> the fix, also works by being at the complete start of the test
    with mp.Pool(nb_workers) as pool:
        pool.starmap(func, items)

While the fix makes the test run slower (about 20 seconds), at least it doesn't hang indefinitely.

I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!

Complete pip Environment

pip list output - click me

@rabyj thanks for sharing this. I had a similar problem using the pytest-console-scripts library to test some scripts. In my case, it was happening on the GitlabCI, the test was stuck until the job reached the timeout. Using the multiprocessing spawn allowed the test to work without issues.

@marvin-jens
Copy link

marvin-jens commented Feb 16, 2024

EDIT: I found that if I make sure to explicitly close all files that are opened in a sub-process, I no longer need to set mp.set_start_method("spawn", force=True.

I ran into this problem as well. However, mp.set_start_method("spawn", force=True) causes pickle errors in my code ("object is not the same..."), which do not happen with the defaults. Are there alternative ways to get around the hanging sub-process issue that avoid using spawn ?

@nicoddemus
Copy link
Member

Closing this for now then.

For the record: pytest just runs each test function in sequence, if the test function is blocking somehow, probably the code being tested needs some adjustment -- my point is that pytest does not have a "known issue" with multiprocessing at all, it all depends on the user code running inside the tests.

If this issue comes up again, it is probably best to work on a MWE so a proper solution can be discussed.

@pierrebarbera
Copy link

for future visitors: I ran into this issue and the spawn vs fork change did not resolve it, but updating freezegun did!

@rabyj
Copy link
Author

rabyj commented Aug 23, 2024

@pierrebarbera freezegun was not part of my modules, seems like a package specific problem.

Then again, maybe MY problem was package specific but I never found out the culprit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: question general question, might be closed after 2 weeks of inactivity
Projects
None yet
Development

No branches or pull requests

8 participants