
WIP: Shared arrays #43

Closed
wants to merge 26 commits

Conversation

@ogrisel (Contributor) commented Jul 23, 2012

Early pull request to introduce a new data structure that blends the good features of np.memmap and multiprocessing.Array, for working with joblib.Parallel without exhausting memory when dealing with large data arrays.

This can be considered an alternative or complementary solution to PR #40 for issue #38.

TODO:

  • more tests (let's reach 99% coverage)
  • write docstrings
  • what should be SharedArray(10) + 3? A regular numpy array? If so, implement it and test it.
  • implement an as_shared_datastructure helper to reallocate scipy.sparse matrices and other nested data structures holding arrays, to make them easier to use in a multiprocessing context
  • integration with joblib.load and joblib.Memory.cache (maybe with a shared=True option)?
  • add a share_memory option to joblib.Parallel that calls as_shared_datastructure on the args?
  • some narrative documentation (once everything else is done)

@GaelVaroquaux (Member)

Nitpicks (I am starting with the nitpicks, because they don't require an understanding of the code):

  • I'd prefer the file to be named share_array.py
  • assharedarray -> as_shared_array. That way I don't read 'ass hared array'
  • In the tests, I'd like to see a test using Parallel that checks that we are indeed having a view.

Now to the big picture:

  • Do we want to have the same object to do anonymous and file-based memmap? I think that focusing on anonymous memmaps would simplify the code
  • Is there a reason to duplicate file-based memmaps functionality from numpy?
  • Right now, I believe that any array created from a SharedArray will be a SharedArray:
In [1]: from joblib import sharedarray

In [2]: a = np.zeros((10, 10))

In [3]: b = sharedarray.assharedarray(a)

In [4]: b
Out[4]: 
SharedArray([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [5]: b[:2]
Out[5]: 
SharedArray([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [6]: b[:2] + 3
Out[6]: 
SharedArray([[ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.]])

However, the way that you have coded it, there is memmapping going on for Out[5] but not for Out[6] (because of the check you do). Thus the SharedArray in Out[6] is not actually usable in a multiprocessing context.

I do believe that this is a desired feature. If you code it the other way around, as people do their computations they will end up with heaps of shared arrays. Each one of these has an associated file descriptor, and at the end of the day you run out of file descriptors (the infamous 'too many open files' error). This happens to me when I work with memmapped arrays.

So, we want this feature that daughter arrays do not rely on memmapping, but right now it is confusing to the user, who has the impression that he has arrays that can be shared across processes. I am not sure how to address this problem, but I think that by using the priority mechanism and the inheritance model of numpy we can improve things. At the very least, we can probably avoid grand-daughter arrays being SharedArrays.

In this light, I suggest setting __array_priority__ to -9999: this means that when operating with other array subclasses, this subclass will always lose in the subclass-coercion mechanism (see http://docs.scipy.org/doc/numpy/reference/arrays.classes.html).

Also, I wonder if __array_prepare__ and __array_wrap__ could not be used in a clever way, so that in the cases where _mmap is None, standard ndarrays are created instead of SharedArrays.
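
To make this concrete, here is a minimal sketch (not the actual code of this PR) of what the suggestion could look like; the exact behaviour is an assumption drawn from this discussion:

import numpy as np

class SharedArray(np.ndarray):
    # Always lose against other subclasses in the coercion mechanism.
    __array_priority__ = -9999.0

    def __array_wrap__(self, out_arr, context=None):
        # Ufunc outputs are freshly allocated, hence not backed by the
        # shared mmap buffer: return a plain ndarray so the user is not
        # fooled into thinking the result can be shared across processes.
        return np.asarray(out_arr)

a = np.zeros((10, 10)).view(SharedArray)
print(type(a + 3))  # <class 'numpy.ndarray'>, not a fake SharedArray

With this pattern, arrays derived through operators degrade to regular ndarrays, while plain slicing still yields a SharedArray view of the same buffer.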

@ogrisel (Contributor, Author) commented Jul 24, 2012

I'd prefer the file to be named share_array.py

I suppose you meant shared_array.py

assharedarray -> as_shared_array. That way I don't read 'ass hared array'

Alright, I just wanted to be consistent with np.asanyarray

In the tests, I'd like to see a test using Parallel that checks that we are indeed having a view.

Yes this is planned. The lock feature is missing too.

Now to the big picture:

Do we want to have the same object to do anonymous and file-based memmap?
I think that focusing on anonymous memmaps would simplify the code

Indeed but they would share a lot of common code with the file-based variant.

Is there a reason to duplicate file-based memmaps functionality from numpy?

Yes, precisely to avoid the memory copy of memmaps when they are passed to multiprocessing (and to add the lock feature to them). This was my initial use case (the anonymous mode is an almost free bonus of this refactoring).

I will try to experiment with array priorities and do some open file descriptor profiling tonight. Thanks for this first review.

@ogrisel (Contributor, Author) commented Jul 24, 2012

Good news: for anonymous shared arrays, there is no attached file descriptor:

In [1]: from joblib.sharedarray import assharedarray

In [2]: import numpy as np

In [3]: a = np.zeros(10)

In [6]: %time l = [assharedarray(a) for _ in range(10000)]
CPU times: user 1.35 s, sys: 0.10 s, total: 1.45 s
Wall time: 1.46 s

In [7]: l[0] is l[1]
Out[7]: False

In [8]: l[0]
Out[8]: SharedArray([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

Before running this:

~> sudo lsof  | wc -l
    3445

After running this:

~> sudo lsof  | wc -l
    3449

@ogrisel (Contributor, Author) commented Jul 25, 2012

I decided to split the file-based and anonymous memory cases into distinct classes as you requested. I have also tried to summarize what remains to be done in the description of this PR, based on your feedback and other considerations.

@GaelVaroquaux (Member)

I decided to split the file-based and anonymous memory cases into distinct classes as you requested.

That was a suggestion, and not a request. I can be convinced otherwise.

I have also tried to summarize what remains to be done in the description of this PR, based on your feedback and other considerations.

Cool. I am not sure what the shared=True option in cache/load would be for.

@ogrisel (Contributor, Author) commented Jul 25, 2012

That was a suggestion, and not a request. I can be convinced otherwise.

The code is now much simpler so I think it's better this way :)

I am not sure what the shared=True option in cache/load would be for.

The goal would be to avoid a useless memory allocation of cached, deserialized results into a non-shareable array before feeding them to a Parallel call.

@ogrisel (Contributor, Author) commented Aug 1, 2012

Just a quick progress note on this. The coverage of the current implementation is good but there are still two outstanding issues:

  • I need to find out how to override __array_priority__ and maybe __array_wrap__ to make the SharedArray class behave correctly with operators (e.g. SharedArray(10) + 3 should return a regular numpy array, to make it explicit that a memory copy happens, instead of returning a fake SharedArray instance as is currently the case).
  • the current design can segfault if the original SharedArray is garbage collected while there are still pickled versions of it waiting, for instance, in a queue of a multiprocessing.Pool. This issue is probably best handled by spawning a custom multiprocessing.Manager that handles the allocation of the original shared memory and the reference counts from any SharedArray instance or pickle that needs it. More prototyping work is required to devise what is best and robust.

@GaelVaroquaux (Member)

OK, I think that I am going to do a new minor release of joblib without waiting for this PR to be merged. That way we can get out a bugfix-only release of joblib.

@ogrisel (Contributor, Author) commented Aug 1, 2012

No problem. Don't wait for me, this is still a WIP.

@glouppe commented Aug 21, 2012

@ogrisel Have you made progress on this? I am not so familiar with the joblib codebase, but if I can be of any help, please tell me! (I can dive into it.) Shared arrays are definitely something I'd like to see properly implemented.

@travisbot

This pull request fails (merged 54749eb into ad5fd41).

@ogrisel (Contributor, Author) commented Aug 21, 2012

@glouppe yes, I decided to stop using anonymous mmap, as it would make it much too complex to implement proper multiprocess garbage collection, and to use temporary files instead. That might incur some overhead though. I still have to implement early garbage collection + gc tests with pre-allocated multiprocessing pools + operator priorities.
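
For reference, a minimal sketch of the tempfile-backed idea (the helper name as_shared_memmap is hypothetical, not the API of this PR):

import os
import tempfile

import numpy as np

def as_shared_memmap(a, dir=None):
    """Copy array `a` into a temporary file and return a memmap view of it."""
    fd, filename = tempfile.mkstemp(suffix='.mmap', dir=dir)
    os.close(fd)
    m = np.memmap(filename, dtype=a.dtype, shape=a.shape, mode='w+')
    m[:] = a  # one copy at allocation time; afterwards the data lives in the file
    return m

Note that nothing deletes the temporary file here: that is exactly the early garbage collection problem mentioned above and discussed further below.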

@travisbot

This pull request fails (merged cf24d21 into ad5fd41).

@travisbot

This pull request fails (merged 8595102c into ad5fd41).

@travisbot

This pull request fails (merged d24e5d1 into 8aa6e48).

@glouppe commented Aug 27, 2012

@ogrisel How can I help you with any of these things? Feel free to delegate some work. I'd be glad to help.

@ogrisel (Contributor, Author) commented Aug 27, 2012

I have given it some thought this weekend and I cannot come up with a good solution anymore. The current code has two issues:

  • it leaks temporary files until the process exits, even if all the pickled and live instances are collected. One could add a method to let the user explicitly collect the tempfile when he / she knows that no more running or queued processes will need it in the future. I had started to implement some reference counting in shared memory for multiple processes, but this is too complicated to implement, or even impossible if the multiprocessing Pool is forked before the allocation of the shared array submitted to the pool queue.
  • overriding the __reduce__ method will make the joblib memoizer (the Memory.cache method) fail to detect changes in the actual data. It would need to be aware that the digest should be computed on the data buffer instead of on the result of a pickle (I need to check how it's implemented to know whether this is already the case or not). Anyway, that might break other usages of the pickler.

I am thinking that the best way to go would be not to use a custom __reduce__ method on the SharedArray and memmap classes, but instead to find a way to make our (joblib's) multiprocessing Pool instances use a Queue implementation that has a customized pickler.
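
To illustrate the custom-reducer idea, here is a sketch assuming Python 3, where multiprocessing queues serialize through ForkingPickler (on the Python 2 of the time, the registration would have to live in a custom Queue / pickler subclass, which is what is proposed above). This is not the code that eventually landed in joblib:

import numpy as np
from multiprocessing.reduction import ForkingPickler

def _rebuild_memmap(filename, dtype, shape, order, offset):
    # Re-open the backing file in the worker process: only metadata was pickled.
    return np.memmap(filename, dtype=dtype, shape=shape,
                     order=order, mode='r+', offset=offset)

def _reduce_memmap(m):
    order = 'F' if m.flags['F_CONTIGUOUS'] else 'C'
    return _rebuild_memmap, (m.filename, m.dtype, m.shape, order, m.offset)

# np.memmap arguments sent to worker processes are now serialized as a small
# (filename, dtype, shape, order, offset) tuple instead of a copy of the buffer.
ForkingPickler.register(np.memmap, _reduce_memmap)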

@GaelVaroquaux (Member)

@ogrisel: I cannot allocate time on this before the sprint, but I'd love to hash out these problems at the sprint.

@ogrisel (Contributor, Author) commented Aug 30, 2012

I have started to work on multiprocessing.Pool + multiprocessing.queues.SimpleQueue subclasses that make it possible to register custom reducers, and thus to handle mmap'ed arrays without subclassing them. That will be much cleaner IMHO. I hope I will have time to work on this this weekend and start experimenting on a branch for sklearn integration to address the RandomForest use case.

@ogrisel (Contributor, Author) commented Sep 9, 2012

I am closing this PR as the approach in #44 looks much better.
