
Joblib makes pandas' data manipulation hang with multiprocessing.Pool. #980

Open
fx-kirin opened this issue Dec 27, 2019 · 1 comment

fx-kirin commented Dec 27, 2019

Joblib 0.14.1 makes pandas data operations in a child process hang when using multiprocessing.Pool.map. With Joblib 0.13.2, or with import joblib removed, the problem does not occur.

It seems to depend on how much data is handled in the main process and in the child process: with less data than in the sample code, no deadlock occurs. I'm not sure why, but something appears to block and cause the deadlock.

I put the code and data here.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8

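import pandas as pd
import multiprocessing

# joblib is never used directly: per this report, merely importing
# joblib 0.14.1 is enough to make the Pool.map call below hang.
import joblib


def method(df_path):
    """Worker run in the child process: load the pickled DataFrame and copy it."""
    print("starting method")
    df = pd.read_pickle(df_path)
    print("pickle loaded")
    df.copy()  # result discarded; the call only exercises pandas in the child
    print("exit method")


def main():
    # Load and copy the same DataFrame in the parent before starting the pool
    df = pd.read_pickle("df")
    df.copy()

    print("Starting pool")
    pool = multiprocessing.Pool(1)
    # With joblib 0.14.1 imported, this call hangs
    pool.map(method, ["df"] * 10)
    pool.close()
    pool.join()


if __name__ == "__main__":
    main()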
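Because the symptom is a silent hang, a variant of the repro that fails loudly can make the problem easier to detect, for example in CI. This is a hedged sketch, not part of the original report: it swaps Pool.map for Pool.map_async(...).get(timeout), which raises multiprocessing.TimeoutError if the results do not arrive in time. map_with_timeout, square and slow are hypothetical names; to apply it to the repro above, pass method and the list of "df" paths instead.

```python
import multiprocessing
import time


def square(x):
    # Hypothetical fast worker, standing in for the issue's `method`
    return x * x


def slow(x):
    # Hypothetical worker that outlives the timeout, simulating the hang
    time.sleep(10)
    return x


def map_with_timeout(func, items, processes=1, timeout=60.0):
    """Like Pool.map, but raise multiprocessing.TimeoutError instead of
    blocking forever when results do not arrive within `timeout` seconds."""
    # Leaving the with-block calls pool.terminate(), so a stuck worker is
    # killed rather than left running after the timeout fires.
    with multiprocessing.Pool(processes) as pool:
        async_result = pool.map_async(func, items)
        return async_result.get(timeout)


if __name__ == "__main__":
    print(map_with_timeout(square, range(5)))  # [0, 1, 4, 9, 16]
    try:
        map_with_timeout(slow, [1], timeout=1.0)
    except multiprocessing.TimeoutError:
        print("worker did not finish in time")
```

This only detects the hang; it does not fix it. Terminating the pool on timeout keeps a stuck worker from holding the script open indefinitely.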

Environment

Ubuntu 18.04
Python 3.6.7

$ pipdeptree
asn1crypto==1.2.0
conda==4.8.0
  - pycosat [required: >=0.6.3, installed: 0.6.3]
  - requests [required: >=2.12.4, installed: 2.22.0]
    - certifi [required: >=2017.4.17, installed: 2019.11.28]
    - chardet [required: >=3.0.2,<3.1.0, installed: 3.0.4]
    - idna [required: >=2.5,<2.9, installed: 2.8]
    - urllib3 [required: >=1.21.1,<1.26,!=1.25.1,!=1.25.0, installed: 1.25.7]
  - ruamel-yaml [required: >=0.11.14, installed: 0.11.14]
conda-package-handling==1.6.0
  - six [required: Any, installed: 1.13.0]
ipdb==0.12.3
  - ipython [required: >=5.1.0, installed: 7.10.2]
    - backcall [required: Any, installed: 0.1.0]
    - decorator [required: Any, installed: 4.4.1]
    - jedi [required: >=0.10, installed: 0.15.2]
      - parso [required: >=0.5.2, installed: 0.5.2]
    - pexpect [required: Any, installed: 4.7.0]
      - ptyprocess [required: >=0.5, installed: 0.6.0]
    - pickleshare [required: Any, installed: 0.7.5]
    - prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.2]
      - wcwidth [required: Any, installed: 0.1.7]
    - pygments [required: Any, installed: 2.5.2]
    - setuptools [required: >=18.5, installed: 42.0.2.post20191203]
    - traitlets [required: >=4.2, installed: 4.3.3]
      - decorator [required: Any, installed: 4.4.1]
      - ipython-genutils [required: Any, installed: 0.2.0]
      - six [required: Any, installed: 1.13.0]
  - setuptools [required: Any, installed: 42.0.2.post20191203]
mkl-fft==1.0.15
  - numpy [required: Any, installed: 1.17.4]
mkl-random==1.1.0
  - numpy [required: Any, installed: 1.17.4]
mkl-service==2.3.0
  - six [required: Any, installed: 1.13.0]
mysqlclient==1.4.6
packaging==19.2
  - pyparsing [required: >=2.0.2, installed: 2.4.5]
  - six [required: Any, installed: 1.13.0]
pandas==0.25.3
  - numpy [required: >=1.13.3, installed: 1.17.4]
  - python-dateutil [required: >=2.6.1, installed: 2.8.1]
    - six [required: >=1.5, installed: 1.13.0]
  - pytz [required: >=2017.2, installed: 2019.3]
pipdeptree==0.13.2
  - pip [required: >=6.0.0, installed: 19.3.1]
pycrypto==2.6.1
pynvim==0.4.0
  - greenlet [required: Any, installed: 0.4.15]
  - msgpack [required: >=0.5.0, installed: 0.6.2]
pyOpenSSL==19.1.0
  - cryptography [required: >=2.8, installed: 2.8]
    - cffi [required: >=1.8,!=1.11.3, installed: 1.13.2]
      - pycparser [required: Any, installed: 2.19]
    - six [required: >=1.4.1, installed: 1.13.0]
  - six [required: >=1.5.2, installed: 1.13.0]
PySocks==1.7.1
PyYAML==3.12
rpdb==0.1.6
scikit-learn==0.21.3
  - joblib [required: >=0.11, installed: 0.14.1]
  - numpy [required: >=1.11.0, installed: 1.17.4]
  - scipy [required: >=0.17.0, installed: 1.1.0]
SQLAlchemy==1.3.12
tqdm==4.40.2
wheel==0.33.6
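Since hangs involving multiprocessing may depend on how child processes are started, a report like this one could also include the start method alongside the package list. A minimal stdlib sketch; the function name environment_summary is hypothetical. On Linux the default start method is normally fork for the Python versions discussed here.

```python
import multiprocessing
import platform


def environment_summary():
    """Collect runtime details relevant to a multiprocessing hang report."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Note: if no start method was set yet, this call fixes it to the
        # platform default, so run it in a standalone diagnostic script.
        "start_method": multiprocessing.get_start_method(),
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```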
fx-kirin changed the title from "Joblib make pandas' data manipulation hang with multiprocessing.Pool." to "Joblib makes pandas' data manipulation hang with multiprocessing.Pool." on Dec 27, 2019

brcharron commented Jan 10, 2020

I had the same issue (a hang with pandas and multiprocessing.Pool.map), although I had not imported, or even explicitly installed, joblib. Pinning joblib to 0.13.2 solved it. Thanks a lot for the report; otherwise I would never have found the cause.
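As this comment shows, joblib can arrive as a transitive dependency; in the pipdeptree output above, scikit-learn 0.21.3 requires joblib>=0.11, which 0.13.2 also satisfies. A hedged sketch for checking which joblib is installed without importing it (importing is what the report says triggers the hang). It assumes Python 3.8+ for importlib.metadata; the reporter's Python 3.6.7 would need the importlib_metadata backport instead. The constant and function names are hypothetical, and the version strings are only the ones mentioned in this thread, not a complete list of affected releases.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# Versions mentioned in this thread; not a complete list of affected releases.
REPORTED_HANGING = "0.14.1"
REPORTED_WORKING = "0.13.2"


def installed_version(package="joblib"):
    """Return the installed version string, or None if not installed.

    Reads package metadata only, so the package itself is never imported."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("joblib is not installed")
    elif found == REPORTED_HANGING:
        print(f"joblib {found} is the version reported to hang here; "
              f"pinning to {REPORTED_WORKING} was the workaround")
    else:
        print(f"joblib {found} is installed")
```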
