
do not enable Parallel if multiprocessing.synchronize is N/A #39

Closed
wants to merge 1 commit into from

Conversation

@yarikoptic (Contributor)

Having finally tracked down the non-working multiprocessing.cpu_count on kfreebsd Debian systems, I ended up with
https://buildd.debian.org/status/fetch.php?pkg=statsmodels&arch=kfreebsd-i386&ver=0.4.2-1&stamp=1342080335
(see the bottom of the log).

So I wondered: shouldn't I also patch joblib to require 'import multiprocessing.synchronize' to complete successfully as the check for multiprocessing availability?

…herwise

The problem is currently present with Python on BSD systems.
@GaelVaroquaux (Member)

Having finally tracked down not working multiprocessing.cpu_count

Thanks a lot!

on kfreebsd Debian systems I ended up with
https://buildd.debian.org/status/fetch.php?pkg=statsmodels&arch=kfreebsd-i386&ver=0.4.2-1&stamp=1342080335
(see the bottom of the log)

So I wondered: shouldn't I also patch joblib to require 'import multiprocessing.synchronize' to complete successfully as the check for multiprocessing availability?

Another option would be to provide only limited parallel functionality, for instance having joblib.cpu_count just return 1 with a warning. Then 'n_jobs=-1' wouldn't work, but 'n_jobs=2' would.

Would that solve the problem on kfreebsd?

Gael
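The degraded-mode fallback Gael describes could be sketched like this (a hypothetical illustration, not joblib's actual code; the module-level `multiprocessing = None` flag mirrors joblib's existing availability check):

```python
import warnings

# Mirror of the availability check: the name is bound to None
# when the multiprocessing package cannot be imported.
try:
    import multiprocessing
except ImportError:
    multiprocessing = None

def cpu_count():
    """Return the number of CPUs, degrading to 1 with a warning
    when multiprocessing is unavailable."""
    if multiprocessing is None:
        warnings.warn("multiprocessing unavailable: assuming a single CPU")
        return 1
    return multiprocessing.cpu_count()
```

With this scheme, n_jobs=-1 ("use all CPUs") resolves to a single job, while an explicit n_jobs=2 would still attempt to start two worker processes.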

@yarikoptic (Contributor, Author)

Sorry Gael -- I am confused...

For a moment I thought there was some other implementation in Parallel that allows avoiding multiprocessing.Pool, but that seems not to be the case, so what did you mean by "limited parallel functionality"?

If joblib.cpu_count just returns 1, that would indeed take effect when no explicit n_jobs is given, but Parallel would still fail for both n_jobs=-1 and n_jobs=2.

@GaelVaroquaux (Member)

On Fri, Jul 13, 2012 at 11:21:08AM -0700, Yaroslav Halchenko wrote:

For a moment I thought there was some other implementation in Parallel that allows avoiding multiprocessing.Pool, but that seems not to be the case, so what did you mean by "limited parallel functionality"?

Indeed, it is not the case. Are you saying that multiprocessing.Pool is
completely broken on kfreebsd?

G

@yarikoptic (Contributor, Author)

To tell the truth -- I do not know for sure (I am not a FreeBSD user/developer, yet ;)) -- but that is what the error message/bug report suggests:
http://bugs.python.org/issue3770

@yarikoptic (Contributor, Author)

I guess there should be no harm if I push this patch into the Debian package (I need it to fix the FTBFS for statsmodels), right? ;-)

@GaelVaroquaux (Member)

Sorry for the delay, Yarik. I am merging this in.

Do you need me to quickly cut a bug-fix release, or can it wait a bit? I'd like to merge a pull request by Olivier Grisel before the next release, but it's not quite ready.

@yarikoptic (Contributor, Author)

And I was slow myself: only yesterday I discovered that this fix is not effective with Python 2.6 (it works fine with 2.7) -- I will look into it shortly.
No release is necessary for my purposes -- I can always carry such patches in Debian, and a new release (unless it is purely a bugfix) might be trickier to persuade the release team to let in.

@GaelVaroquaux (Member)

And I was slow myself: only yesterday I discovered that this fix is not effective with Python 2.6 (it works fine with 2.7) -- I will look into it shortly.

OK, just do a new pull request, and I'll try to be quicker to merge it.

@yarikoptic (Contributor, Author)

Not sure what would be the clean way now... I see two possibilities:

  1. Check the type of system and disable multiprocessing on BSD systems.

     pros -- descriptive and quick

     cons -- BSDs might gain a working locking mechanism later on, so for transparent support you would need another solution (e.g. 2.) anyway

  2. Elaborate the availability test to actually try constructing a dummy pool, e.g. something like (just typing here, untested):

try:
    _ = mp.Pool()
except (ImportError, OSError):
    # ImportError is raised by Python >= 2.7 (including 3.x),
    # OSError by Python 2.6
    multiprocessing = False

What do you think? Should I proceed with 2?
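A self-contained version of that option-2 probe might look like the following (a sketch under the thread's assumptions; the `mp_works` flag name is illustrative, not joblib's actual variable):

```python
try:
    import multiprocessing as mp
except ImportError:
    mp = None

mp_works = False
if mp is not None:
    try:
        pool = mp.Pool(1)  # fails on platforms without a working sem_open
    except (ImportError, OSError):
        # ImportError on Python >= 2.7 (including 3.x), OSError on 2.6
        pass
    else:
        pool.close()  # shut the probe pool down immediately...
        pool.join()   # ...so no zombie worker processes are left behind
        mp_works = True
```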

@GaelVaroquaux (Member)

On Mon, Jul 23, 2012 at 10:31:29AM -0700, Yaroslav Halchenko wrote:

  1. Elaborate the availability test to actually try constructing a dummy pool, e.g. something like (just typing here, untested):

try:
    _ = mp.Pool()
except (ImportError, OSError):
    # ImportError is raised by Python >= 2.7 (including 3.x),
    # OSError by Python 2.6
    multiprocessing = False

I think that 2 guarantees we always have a workable system and do not prevent
multiprocessing when it could work. One problem I see is that it would
significantly slow down the import, and I know that some people care
about this. I think that we should probably either run this test only on
systems that require it, or try to test a more atomic error. Would you
mind pasting the full traceback that this creates on BSD with Python 2.6?

Also, if we go down that route, we need to make really sure that the
temporary pool gets closed immediately, to avoid leaving zombie
processes.

Thanks a lot for your efforts!

@yarikoptic (Contributor, Author)

I think that 2 guarantees we always have a workable system and do not prevent
multiprocessing when it could work. One problem I see is that it would
significantly slow down the import, and I know that some people care
about this. I think that we should probably either run this test only on
systems that require it

Good idea -- so 1. + 2. then ;) (i.e. run test 2. only on BSD systems)

or try to test a more atomic error.

There might indeed be a better check -- I just failed to find it, since it
is somewhere deeper in the _multiprocessing extension.

Would you mind pasting the full traceback that this creates on BSD with Python 2.6?

yoh@nebsd:~/deb/statsmodels-0.4.2$ python2.6 -c 'import multiprocessing as mp; _ = mp.Pool()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.6/multiprocessing/__init__.py", line 227, in Pool
    return Pool(processes, initializer, initargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 84, in __init__
    self._setup_queues()
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 131, in _setup_queues
    self._inqueue = SimpleQueue()
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 328, in __init__
    self._rlock = Lock()
  File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 117, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
  File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 49, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 78] Function not implemented

Also, if we go down that route, we need to make really sure that the
temporary pool gets closed immediately, to avoid leaving zombie
processes.

Do you mean _.close() for that pool? (if it succeeded to be created, of course ;))

Yaroslav O. Halchenko
Postdoctoral Fellow, Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik

@GaelVaroquaux (Member)

On Mon, Jul 23, 2012 at 10:42:42AM -0700, Yaroslav Halchenko wrote:

There might indeed be a better check -- I just failed to find it, since it
is somewhere deeper in the _multiprocessing extension.

yoh@nebsd:~/deb/statsmodels-0.4.2$ python2.6 -c 'import multiprocessing as mp; _ = mp.Pool()'
[...]
OSError: [Errno 78] Function not implemented

How about trying to instantiate a multiprocessing.Semaphore? It is part
of the public API and should be significantly faster to instantiate than
a full 'Pool'. The question is: will it fail when a Pool fails?

Also, if we go down that route, we need to make really sure that that
temporary pool gets closed immediately, to avoid leaving zombie
processes.

do you mean _.close() for that pool? (if it succeeded to be created, of course ;))

Yes, join and close.
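The clean-up order for such a throwaway pool would be close() first, then join() (a generic multiprocessing sketch, not joblib code):

```python
import multiprocessing as mp

pool = mp.Pool(processes=2)
try:
    # any small picklable task works as a smoke test of the pool
    result = pool.map(abs, [-1, -2, 3])
finally:
    pool.close()  # stop accepting new work
    pool.join()   # wait for the workers to exit, avoiding zombie processes
```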

@yarikoptic (Contributor, Author)

On Mon, 23 Jul 2012, Gael Varoquaux wrote:

File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 49, in __init__
sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 78] Function not implemented

How about trying to instantiate a multiprocessing.Semaphore? It is part
of the public API and should be significantly faster to instantiate than
a full 'Pool'. The question is: will it fail when a Pool fails?

Good call -- it does fail the same way:

yoh@nebsd:~/deb/statsmodels-0.4.2$ python2.6 -c 'import multiprocessing as mp; _ = mp.Semaphore()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.6/multiprocessing/__init__.py", line 192, in Semaphore
    return Semaphore(value)
  File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 81, in __init__
    SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX)
  File "/usr/lib/python2.6/multiprocessing/synchronize.py", line 49, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 78] Function not implemented

yoh@nebsd:~/deb/statsmodels-0.4.2$ python2.7 -c 'import multiprocessing as mp; _ = mp.Semaphore()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 196, in Semaphore
    from multiprocessing.synchronize import Semaphore
  File "/usr/lib/python2.7/multiprocessing/synchronize.py", line 59, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.

yoh@nebsd:~/deb/statsmodels-0.4.2$ python3.2 -c 'import multiprocessing as mp; _ = mp.Semaphore()'
Traceback (most recent call last):
  File "/usr/lib/python3.2/multiprocessing/synchronize.py", line 54, in <module>
    from _multiprocessing import SemLock
ImportError: cannot import name SemLock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.2/multiprocessing/__init__.py", line 195, in Semaphore
    from multiprocessing.synchronize import Semaphore
  File "/usr/lib/python3.2/multiprocessing/synchronize.py", line 59, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.

Also, if we go down that route, we need to make really sure that the
temporary pool gets closed immediately, to avoid leaving zombie
processes.

do you mean _.close() for that pool? (if it succeeded to be created, of course ;))

Yes, join and close.

With a semaphore it should be easier... would it be ok if I just del it? i.e.

In [17]: sem = multiprocessing.Semaphore()
In [18]: del sem

to leave no trace?


@GaelVaroquaux (Member)

On Mon, Jul 23, 2012 at 10:53:51AM -0700, Yaroslav Halchenko wrote:

How about trying to instantiate a multiprocessing.Semaphore? It is part
of the public API and should be significantly faster to instantiate than
a full 'Pool'. The question is: will it fail when a Pool fails?

Good call -- it does fail the same way:

I do believe that this is the root of our problem. Looks like we have a
winner.

On my box, it takes about 30 µs to create a Semaphore:

In [1]: import multiprocessing

In [2]: %timeit multiprocessing.Semaphore()
10000 loops, best of 3: 28.9 us per loop

Thus, I think that we can put this check on every OS, which avoids ugly
code.
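Putting the thread's conclusion together, the cheap unconditional check could be sketched as follows (the `mp_available` flag name is illustrative, not joblib's actual variable):

```python
try:
    import multiprocessing as mp
    # A Semaphore exercises the same SemLock machinery a Pool needs,
    # but costs only tens of microseconds to create.
    sem = mp.Semaphore()
    del sem  # a Semaphore spawns no child process, so del leaves no trace
    mp_available = True
except (ImportError, OSError):
    # ImportError on Python >= 2.7 (including 3.x), OSError on Python 2.6,
    # on platforms lacking a working sem_open (python.org issue 3770)
    mp = None
    mp_available = False
```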

@yarikoptic (Contributor, Author)

BTW -- while unit-testing the new "fix" on the kfreebsd box (I will send a PR
after testing it enough ;)) on top of current master, I got the following
failure:

======================================================================
FAIL: joblib.test.test_disk.test_disk_used
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/deb/joblib/joblib/test/test_disk.py", line 41, in test_disk_used
    nose.tools.assert_true(disk_used(cachedir) < target_size + 12)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 294 tests in 23.476s

FAILED (failures=1)

and that is because

print "DISK USED: %r target_size: %r" % (disk_used(cachedir), target_size)

gives

DISK USED: 1042 target_size: 1024


@yarikoptic (Contributor, Author)

btw -- is Python 3 fully supported? I am getting some failures and am not
sure whether that is just my sloppy way of testing (in which case I should
not whine and should not bother about Python 3 for now) or joblib's problem... ?

e.g. this stalls for me:

> nosetests3 -s -v /home/yoh/deb/gits/joblib/build/py3k/joblib
...
joblib.test.test_my_exceptions.test_inheritance ... ok
Failure: NameError (name 'file' is not defined) ... ERROR
joblib.test.test_parallel.test_cpu_count ... ok
joblib.test.test_parallel.test_simple_parallel([0, 1, 4, 9, 16], [0, 1, 4, 9, 16]) ... ok
joblib.test.test_parallel.test_simple_parallel([0, 1, 4, 9, 16], [0, 1, 4, 9, 16]) ... ok
joblib.test.test_parallel.test_simple_parallel([0, 1, 4, 9, 16], [0, 1, 4, 9, 16]) ... ok
joblib.test.test_parallel.test_simple_parallel([0, 1, 4, 9, 16], [0, 1, 4, 9, 16]) ... ok
Unhandled exception in thread started by 


vene pushed a commit to vene/joblib that referenced this pull request Jul 18, 2014
@tlevine tlevine mentioned this pull request Oct 26, 2016