
ENH: Numpy arrays shareable among related processes. #7533

Closed
wants to merge 11 commits

Conversation

@matejak

@matejak matejak commented Apr 9, 2016

This pull request introduces work that has been seen here earlier:
https://bitbucket.org/cleemesser/numpy-sharedmem/overview (shmarray.py)
Originally written by David Baddeley, then maintained by Chris Lee-Messer, and finally treated by me. The license is a BSD one.

This pull request introduces an np.shm module with empty, ones, zeros, and copy functions that behave the same as their ordinary NumPy counterparts, with one exception: the output array can be shared between processes, allowing for streamlined parallel data processing with low overhead.

This PR also features a test suite and documentation.
Accepting it would open the door to parallelizing complicated data processing (for instance, processing that uses external modules) whenever the problem has a well-defined array-in, array-out interface.
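For context, the kind of allocation the proposed np.shm module performs can be sketched on top of the standard library. This is an illustrative assumption, not the PR's actual code: the function name shm_zeros and its implementation are stand-ins.

```python
import numpy as np
from multiprocessing import sharedctypes

def shm_zeros(shape, dtype=np.float64):
    """Illustrative stand-in for the proposed np.shm.zeros: allocate the
    backing buffer in anonymous shared memory, so processes forked later
    see the same physical pages instead of private copies."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # RawArray allocates zero-initialized anonymous shared memory.
    raw = sharedctypes.RawArray('b', nbytes)
    return np.frombuffer(raw, dtype=dtype).reshape(shape)

a = shm_zeros((2, 3))
a[0, 0] = 7.0  # would be visible to any process forked after this point
```

The array behaves like an ordinary ndarray; only the location of its buffer differs.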

What I am not too sure about:

  • Where to place the shm.py file,
  • Does it really work on non-Unix platforms?
  • Contents of the __new__ method of the shmarray class are still a mystery to me, although I have read the "subclassing ndarray" guide. Could you please cast an eye over it?

@seberg
Member

seberg commented Apr 10, 2016

Can you send a mail to the mailing list about this? My first observations:

  • This does not work on Windows, I assume (see the test errors)?
  • The name(s) seem unnecessarily shortened.
  • Can you explain why the ndarray subclass is needed? Subclasses can be rather annoying to get right, and for other reasons as well.

In the end the big question is probably whether or not to include it into NumPy proper (especially if it only works on some systems) and the answer to that should be discussed on the list.

@charris
Member

charris commented Apr 10, 2016

Note that if tests are in numpy/core, the implementation should also be there.

@matejak
Author

matejak commented Apr 10, 2016

Thank you for your suggestions. I will certainly post to the mailing list, but polishing the request a bit first seems like a good idea to me.

Anyway, good news everyone --- I have checked today and I have been able to get it working on MS Windows too.

to stay the same even if you use the :mod:`threading` module due to a
CPython feature called `GIL <https://wiki.python.org/moin/GlobalInterpreterLock>`_
(global interpreter lock). GIL ensures that only one thread is active
at a time, so threre is no true multitasking.
@shoyer
Member

This is not quite right. Many operations in NumPy/SciPy/pandas release the GIL, which makes the benefit of multi-threading quite variable. IO also generally releases the GIL. So multi-threading is only non-viable if the inner loop of your code is in pure Python.
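To illustrate this point: large BLAS-backed operations run with the GIL released, so plain threads already overlap them. A minimal sketch (the sizes and thread count here are arbitrary):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
mats = [rng.standard_normal((200, 200)) for _ in range(4)]

# The matrix products execute inside BLAS with the GIL released,
# so the four workers can run concurrently despite CPython's lock.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda m: m @ m, mats))
```

If the per-item work were pure Python instead, the threads would serialize on the GIL and shared memory between processes would be the more attractive route.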

@Nodd
Contributor

Also, there is a typo in "threre".

@matejak
Author

@Nodd The typo is fixed.
@shoyer As I explained on the ML, I see the main benefit of this PR in use cases involving third-party modules that do complicated calculations (involving creation of Python objects) behind a NumPy array interface.
I totally agree with you that NumPy/SciPy/pandas don't fall into this category.

@shoyer
Member

shoyer commented Apr 11, 2016

This needs tests and justification for the custom pickling methods, which are not used in any of the current examples. I'm pretty sure the reason they exist is so you can pass these arrays into methods like multiprocessing.Pool.apply_async, concurrent.futures.Executor.submit or joblib.Parallel as an argument. If that works, I would definitely use it for the examples -- it's much cleaner to pass arrays around as arguments than to rely on global variables.

I do see some value in providing a canonical right way to construct shared-memory arrays in NumPy, but I'm not very happy with this solution, because it appears to require convoluted and error-prone setup (you definitely need to create shared-memory arrays in the parent process before launching any subprocesses) and terrible code organization (with the global variables). Frankly, I would switch to something like Numba or Cython for my inner loop (or another programming language without the GIL) before I would suggest adopting the current approach.

If there's some way we can paper over the boilerplate such that users can use it without understanding the arcana of multiprocessing, then yes, that would be great. But otherwise I'm not sure there's anything to be gained by putting it in a library rather than referring users to the examples on StackOverflow [1] [2].

In my mind, this is roughly the appropriate level of boilerplate for user code:

def do_something(shape, args):
    shared_array = np.shared_empty_array(shape)
    joblib.Parallel()(delayed(func)(shared_array, other, ...) for other in args)

Finally, even if this can be done cleanly, it's not clear to me that this belongs in NumPy. Joblib already solves exactly this sort of problem. Because it writes temporary files to /dev/shm/ if available, this is basically a solved problem on Linux.

[1] http://stackoverflow.com/questions/10721915/shared-memory-objects-in-python-multiprocessing
[2] http://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing
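The StackOverflow pattern referred to above boils down to wrapping a multiprocessing.Array in an ndarray view on both sides of a fork. A minimal sketch, assuming the Unix-only "fork" start method (the function names here are illustrative):

```python
import multiprocessing as mp
import numpy as np

def fill_row(shared, shape, row, value):
    # Re-wrap the shared buffer as an ndarray inside the child process.
    arr = np.frombuffer(shared.get_obj()).reshape(shape)
    arr[row, :] = value

def parallel_fill(shape):
    ctx = mp.get_context("fork")                   # children inherit the mapping
    shared = ctx.Array('d', shape[0] * shape[1])   # float64 buffer plus a lock
    procs = [ctx.Process(target=fill_row, args=(shared, shape, r, float(r)))
             for r in range(shape[0])]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent sees the children's writes through the same buffer.
    return np.frombuffer(shared.get_obj()).reshape(shape)

grid = parallel_fill((4, 5))  # row r holds the value r
```

Each child writes a disjoint row, so no locking is needed here; overlapping writes would need `shared.get_lock()`.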

@matejak
Author

matejak commented Apr 11, 2016

The discussion not concerning implementation details is likely to move to the ML:
https://mail.scipy.org/pipermail/numpy-discussion/2016-April/075342.html

@matejak
Author

matejak commented Apr 18, 2016

I have added a fairly complete example of a module that wraps multiprocessing.Pool and can also perform progress notification. I believe that some of the boilerplate can be avoided by introducing factory functions; would you like me to look into that?

@charris
Member

charris commented May 9, 2016

The concurrent module is new in Python 3.2, so it is not available in 2.7. The tests and whatever else uses it need to be fixed to account for that.

@charris
Member

charris commented May 9, 2016

Please squash the commits into relevant bits and follow the commit message format in doc/source/dev/gitwash/development_workflow.rst.

Also, the name shm is not informative, perhaps a simple shared_memory or something along those lines?

@charris charris changed the title Numpy arrays shareable among related processes. ENH: Numpy arrays shareable among related processes. May 9, 2016
@matejak
Author

matejak commented May 16, 2016

@charris Can I assume that merging this to numpy is not ruled out? I will gladly work on improving this PR, but obviously not if it is guaranteed that it won't be accepted.

@charris
Member

charris commented May 16, 2016

It is not my area, but the comments on the mailing list were somewhat skeptical. I'd be inclined to leave this out unless several people make a strong case for its inclusion. @njsmith @sturlamolden Thoughts?

@shoyer
Member

shoyer commented May 16, 2016

@matejak please take a look at the recent mailing list discussion. That's where you need to convince (some) people that this is useful. But my opinion is that this abstraction is too leaky to be a good fit for NumPy.

@sturlamolden
Contributor

It might fit better in SciPy, if there were such a package as scipy.parallel, but there isn't. On the other hand, joblib is in a package of its own, so maybe it fits better outside of everything? I don't know.

@David-Baddeley

As the original author of shmarray, I think it's unfortunately too much of a kludge to go in either NumPy or SciPy. It's useful for what I do, and I'd love to see an easy and robust way of handling shared NumPy arrays, but while shmarray is easy, it's not robust or particularly intuitive, and you really need to know its limitations and internal architecture to be able to use it.

To clear up a few of the points made above and add some context:

  • joblib is not really an option if you want shared memory under Windows.
  • I'd only ever use this approach when it's unavoidable. In my use case, the inner loop was already coded in C and called a C library which is not thread-safe.
  • Global variables are not necessary. The "canonical" test case (i.e. the use case I wrote the module for) does not use globals (see the snippet below, or the full file at https://bitbucket.org/david_baddeley/python-microscopy/src/tip/PYME/LMVis/visHelpers.py). The args to multiprocessing.Process are, however, pickled within the multiprocessing module before being sent to the child processes, hence the need for the custom pickling behaviour.
  • when writing the module I'd originally hoped that the custom pickling would also allow the object to be passed into Queues and Pipes, but this doesn't work (it's been a while, but I think that the memory needs to be allocated before the 'fork' call).
def rendJitTri(im, x, y, jsig, mcp, imageBounds, pixelSize, n=1):
    '''Helper function which runs on each spawned process'''
    for i in range(n):
        scipy.random.seed()

        Imc = scipy.rand(len(x)) < mcp

        if isinstance(jsig, numpy.ndarray):
            jsig2 = jsig[Imc]
        else:
            jsig2 = float(jsig)

        T = delaunay.Triangulation(x[Imc] +  jsig2*scipy.randn(Imc.sum()), y[Imc] +  jsig2*scipy.randn(Imc.sum()))

        rendTri(T, imageBounds, pixelSize, im=im)


def rendJitTriang(x,y,n,jsig, mcp, imageBounds, pixelSize):
    '''Perform kernel density estimation using jittered triangulation'''
    sizeX = int((imageBounds.x1 - imageBounds.x0)/pixelSize)
    sizeY = int((imageBounds.y1 - imageBounds.y0)/pixelSize)

    im = shmarray.zeros((sizeX, sizeY))


    x = shmarray.create_copy(x)
    y = shmarray.create_copy(y)
    if type(jsig) == numpy.ndarray:
        jsig = shmarray.create_copy(jsig)


    nCPUs = multiprocessing.cpu_count()

    tasks = (n/nCPUs)*numpy.ones(nCPUs, 'i')
    tasks[:(n%nCPUs)] += 1

    processes = [multiprocessing.Process(target = rendJitTri, args=(im, x, y, jsig, mcp, imageBounds, pixelSize, nIt)) for nIt in tasks]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

    return im/n

@sturlamolden
Contributor

sturlamolden commented May 19, 2016

> when writing the module I'd originally hoped that the custom pickling would also allow the object to be passed into Queues and Pipes, but this doesn't work (it's been a while, but I think that the memory needs to be allocated before the 'fork' call).

Anonymous shared memory must be allocated before the fork call. Named shared memory can be allocated after the fork call. multiprocessing.Array allocates anonymous shared memory. It must be instantiated before forking, and can only be shared by inheritance (not by pickling).

We have to use named segments (System V IPC shmget, POSIX shm_open, or mmap of a file in /tmp on Linux) in order for multiprocessing.Queue to work.
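Named segments are exactly what the later multiprocessing.shared_memory module (added in Python 3.8, so after this discussion) provides: another process can attach purely by name, and the name is a plain string that pickles trivially through a Queue. A minimal single-process sketch:

```python
import numpy as np
from multiprocessing import shared_memory

# Create a named segment and view it as a float64 array of length 4.
seg = shared_memory.SharedMemory(create=True, size=4 * 8)
a = np.ndarray((4,), dtype=np.float64, buffer=seg.buf)
a[:] = [1.0, 2.0, 3.0, 4.0]

# Attaching by name is all another process would need to do.
peer = shared_memory.SharedMemory(name=seg.name)
b = np.ndarray((4,), dtype=np.float64, buffer=peer.buf)
b[0] = 42.0                  # visible through the original view

shared_value = a[0]          # read back through the original mapping

# Views must be dropped before the segments can be closed and unlinked.
del a, b
peer.close()
seg.close()
seg.unlink()
```

The second attachment here happens in the same process for brevity, but the mechanism is identical across processes.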

@sturlamolden
Contributor

joblib is not really an option if you want shared memory under windows

Joblib is also crippled on Mac and FreeBSD, since /tmp is not backed by tmpfs on these systems. In practice, joblib only provides shared memory on Linux.

@charris
Member

charris commented May 19, 2016

Taking all this together, I'm going to close this. @matejak If you want to pursue the matter, feel free to yell.

@charris charris closed this May 19, 2016
@jakirkham
Contributor

Not to necro an old thread, but more to share something with those who may look for this sort of thing.

One can build a custom ctypes array that has all the advantages of multiprocessing's RawArray. Plus it keeps its metadata (shape, type, etc.) even when being pickled, which makes it easy to share between processes. One can also easily use the buffer of this object with NumPy. Here's an example. The code needed to implement this is fairly simple, as it has multiprocessing do all the heavy lifting.
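The core of that idea can be sketched as a small wrapper that keeps the shape and dtype next to the RawArray, so the ndarray view can be rebuilt wherever the buffer ends up. The class name SharedNDArray here is a hypothetical illustration, and the custom pickling via multiprocessing's reduction machinery from the linked example is omitted:

```python
import numpy as np
from multiprocessing import sharedctypes

class SharedNDArray:
    """Sketch: a RawArray plus the metadata needed to rebuild an ndarray view.
    The linked example additionally customizes pickling; not shown here."""

    def __init__(self, shape, dtype=np.float64):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        nbytes = int(np.prod(self.shape)) * self.dtype.itemsize
        self._raw = sharedctypes.RawArray('b', nbytes)

    def asarray(self):
        # Every call returns a fresh view over the same shared buffer.
        return np.frombuffer(self._raw, dtype=self.dtype).reshape(self.shape)

s = SharedNDArray((2, 2))
s.asarray()[:] = [[1.0, 2.0], [3.0, 4.0]]
view = s.asarray()   # independent view, same underlying memory
```

Writes made through any one view are visible through all the others, since they alias the same RawArray.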
