
util.apply_parallel does not have "compute('processes')" embedded #5409

Open
alexdesiqueira opened this issue May 25, 2021 · 4 comments

@alexdesiqueira
Member

Description

Saw this question on Stack Overflow today. The answer says:

I looked at the source code, and apply_parallel relies on this Dask command
(...)
But I found that it needs .compute('processes') at the end of it to guarantee multiple cpu's. So now I'm just using Dask itself

Is there something missing on our end, or is this something we would solve with documentation?

Way to reproduce

The example from that SO question...

from numpy import random, ones, zeros_like
from skimage.util import apply_parallel

def f3(im):
    print('here')
    for _ in range(10000):
        u = random.random(100000)

    return zeros_like(im)

if __name__ == '__main__':

    im = ones((2, 4))

    f = lambda img: f3(img)
    im2 = apply_parallel(f, im, chunks=1)

... and the code that "solves" it:

import dask.array as da
im2 = da.from_array(im, chunks=2)
proc = im2.map_overlap(f, depth=0).compute(scheduler='processes')
@alexdesiqueira
Member Author

It seems that this could be solved using compute=True in apply_parallel:

compute : bool, optional
    If ``True``, compute eagerly returning a NumPy Array.
    If ``False``, compute lazily returning a Dask Array.
    If ``None`` (default), compute based on array type provided

@grlee77
Contributor

grlee77 commented May 26, 2021

Yes, if compute is False (or None when the input is a Dask array) you will need to call compute on the value returned by apply_parallel.

im2 = im2.compute(scheduler='processes')

Depending on the function being accelerated, you may also want to try scheduler='threads', as it has less overhead. If the code being run does not release the GIL, though, only one thread will run at a time. Using processes gets around limitations related to the GIL, but has the overhead of launching multiple Python processes.

If you use compute=True, apply_parallel will call compute internally. To control the scheduler used in that case, one can use dask.config.set.

@rfezzani
Member

@alexdesiqueira and @grlee77, can we consider this issue solved or do we plan any other action?

@grlee77
Contributor

grlee77 commented Oct 18, 2021

I think it can be resolved by improving the docs. @alexdesiqueira started a PR for that in #5407
