Adding parallel_voxel_fit decorator #1418
Conversation
I didn't really check the code because I'm not too sure what it'll do, but here are some minor things.
```python
        return SillyFit(self, data)

    def predict(self, S0):
        return np.ones(10) * S0
```
Why not use `numpy.full()`? Or, if `S0` is an array, `10 * S0`?
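For context, a small sketch of the alternative the review suggests, assuming `S0` is a scalar baseline signal (the value here is hypothetical):

```python
import numpy as np

S0 = 100.0  # hypothetical scalar baseline signal

# Pattern from the diff: allocate an array of ones, then multiply
a = np.ones(10) * S0

# Reviewer's suggestion: allocate and fill in a single step
b = np.full(10, S0)

assert np.array_equal(a, b)
```

`np.full` avoids the intermediate ones array and states the intent directly.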
```python
def test_parallel_voxel_fit():
    voxel_fit(SillyParallelModel, SillyFit)
    model = SillyParallelModel()
    # Test with a mask and few processors
```
Do you mean 'processes'?
`dipy/reconst/multi_voxel.py` (Outdated)
```python
# Get non null index from mask
indexes = np.argwhere(mask)
# convert indexes to tuple
```
You sometimes start with a

```python
# Majuscule
# minuscule
```

Could you please keep it standard? Here and elsewhere.
`dipy/reconst/multi_voxel.py` (Outdated)
```python
# Get number of processes
nb_processes = int(kwargs['nb_processes']) if 'nb_processes' in kwargs else cpu_count()
nb_processes = cpu_count() if nb_processes < 1 else nb_processes
```
I think it's clearer this way:

```python
nb_processes = int(kwargs.get('nb_processes', '0'))
nb_processes = nb_processes if nb_processes >= 1 else cpu_count()
```
`dipy/reconst/multi_voxel.py` (Outdated)
```python
if nb_processes == 1:
    return single_voxel_fit(model, data, *args, **kwargs)

# Get non null index from mask
```
Not singular: "indexes" or "indices".
`dipy/reconst/multi_voxel.py` (Outdated)
```python
"""
Each pool process calls this initializer.
Load the array to be populated into
that process's global namespace
```
Why the \n here? The docstring can also reach 79 characters.
`dipy/reconst/multi_voxel.py` (Outdated)
```python
    numpy ndarray that you want to convert.
lock : boolean
    controls the access to the shared array. When you shared
    array is a read only access, you do not need lock. Otherwise,
```
When your shared array has a ... ? need a lock?
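For reference on what the `lock` parameter controls, here is a minimal sketch (not the PR code) of how `multiprocessing.Array` behaves with and without a lock; with `lock=False` you get a raw, unsynchronized array, which is fine when processes only read from it:

```python
import multiprocessing as mp
import numpy as np

n = 10

# lock=True (the default): a synchronized wrapper exposing .get_lock()
locked = mp.Array('d', n, lock=True)
with locked.get_lock():          # explicit synchronization for writers
    locked[0] = 1.0

# lock=False: a raw ctypes array with no lock; safe for read-only sharing
raw = mp.Array('d', n, lock=False)
raw[1] = 2.0

# The raw array supports the buffer protocol, so NumPy can view it
# without copying
view = np.frombuffer(raw, dtype=np.float64)
```

A write under `lock=False` from several processes at once would race; that is the trade-off the docstring is describing.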
Thank you @nilgoyette for the review, I made the changes.
Hello @skoudoro, thank you for updating!
Comment last updated on August 21, 2018 at 16:52 UTC
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1418      +/-   ##
==========================================
- Coverage   87.32%   87.22%   -0.11%
==========================================
  Files         246      248       +2
  Lines       31806    31936     +130
  Branches     3450     3467      +17
==========================================
+ Hits        27775    27855      +80
- Misses       3210     3249      +39
- Partials      821      832      +11
```

Continue to review full report at Codecov.
Oh, damn, I wasn't aware that you can't use …
Or maybe we should update our NumPy minimal version. I wonder what the NumPy version of CentOS is.
Generally looks good. I had a couple of comments.
Have you had a chance to profile this? Does it really lead to any improvement in a realistic test-case?
Also - could you please rebase this on master? I think that at least some of the test failures should be resolved now.
`dipy/reconst/multi_voxel.py` (Outdated)
```python
global shared_arr


def shm_as_ndarray(mp_array, shape=None):
```
I wonder about the naming here: there's nothing specific to spherical harmonics here, is there? Or does "shm" stand for something else here?
Right, bad naming: `shm` for "shared multiprocessing array". I will change that.
`dipy/reconst/multi_voxel.py` (Outdated)
```python
return np.asarray(result)


def ndarray_to_shm(arr, lock=False):
```
And same here
`dipy/reconst/multi_voxel.py` (Outdated)
```python
# shared_arr = np.ctypeslib.as_array(arr_to_populate)
# shared_arr = shared_arr.reshape(shape)
shared_arr = shm_as_ndarray(arr_to_populate, shape)
```
I would love to have some unit tests for the functions above. If I understand correctly, right now they are tested only through the decorator.
That's true, I will add them. BTW, I still wonder if these functions (`shm_as_ndarray` and `ndarray_to_shm`) are in the right place, because they can be used for other purposes like `peaks_from_model`.
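A hedged sketch of the kind of round-trip unit test being suggested, assuming the helpers essentially pair `multiprocessing.Array` with `np.frombuffer` (the helper names and signatures here are illustrative, not the PR's actual code):

```python
import multiprocessing as mp
import numpy as np


def ndarray_to_mparray(arr, lock=False):
    """Hypothetical helper: copy a float64 ndarray into a shared mp.Array."""
    shared = mp.Array('d', arr.size, lock=lock)
    np.frombuffer(shared, dtype=np.float64)[:] = arr.ravel()
    return shared


def mparray_to_ndarray(shared, shape=None):
    """Hypothetical helper: view a shared mp.Array as an ndarray (no copy)."""
    out = np.frombuffer(shared, dtype=np.float64)
    return out.reshape(shape) if shape is not None else out


def test_round_trip():
    data = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
    shared = ndarray_to_mparray(data)
    back = mparray_to_ndarray(shared, data.shape)
    np.testing.assert_array_equal(back, data)


test_round_trip()
```

A test like this exercises the conversions directly, instead of only through the decorator.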
Maybe in a new `dipy.core.parallel` module?
Force-pushed aee0aa0 to 554be01 (compare)
…work to do: improve memory management (it is growing too much); improve speed ("only" 2 times faster on my machine). We now use a shared array.
Force-pushed ba0063d to 1bfb729 (compare)
… has never passed on windows
```python
    return a list of tuple(voxel index, model fitted instance)
    """
    model, chunks = arguments
    return [(idx, model.fit(shared_arr[idx]))
```
Check list in case of old memory issue.
`dipy/reconst/multi_voxel.py` (Outdated)
```python
raise ValueError("mask and data shape do not match")

# Get number of processes
nb_processes = int(kwargs.pop('nb_processes', '0'))
```
Maybe the default should be None for all available CPUs.
```python
          for i in range(1, len(chunks_spacing))]

# Create shared array
shared_arr_in = ndarray_to_mparray(data)
```
Add in the comment that you make a single copy of the data here (if you do make a copy).
```python
    copied shared array
    """

    array1d = arr.ravel(order='A')
```
This copy is not needed; use `arr` directly.
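As a side note on the copy question: `ravel` only copies when the memory layout forces it to, which can be checked with `np.shares_memory` (a small illustration, not the PR code):

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# Contiguous input: ravel(order='A') returns a view, not a copy
flat = arr.ravel(order='A')
assert np.shares_memory(arr, flat)

# A non-contiguous slice forces ravel to allocate and copy
sliced = arr[:, ::2]
flat_copy = sliced.ravel(order='A')
assert not np.shares_memory(sliced, flat_copy)
```

So for a contiguous input array the `ravel` call is cheap, but passing `arr` directly avoids the question entirely.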
```python
    return a list of tuple(voxel index, model fitted instance)
    """
    model, chunks = arguments
    return [(idx, model.fit(shared_arr[idx]))
```
Write it in a single line?
I do not see the benefit; it will be harder to read and understand if I change that.
```python
                2614.87695312, 2316.55371094, 2267.7722168])

noisy_multi = np.zeros((2, 2, 1, len(gtab.bvals)))
noisy_multi[0, 1, 0] = noisy_multi[
```
This line break is not very good for reading...
I agree, I will change that
I would expect an enormous overhead on either platform - would it be worth trying to parallelize using threading? (That said, I have externally tried with threads, and it doesn't seem to yield any gains, which is surprising, to say the least.)

EDIT: to add more information, it seems to me that some models (I have tried only with DKI) may have a time floor, so unless some operation that takes time (regardless of the amount of data being passed in) is parallelized, I don't really see any gains in passing more or less data.
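A rough sketch of the threading experiment described here (a hypothetical per-voxel workload, not the PR code): NumPy releases the GIL inside large linear-algebra calls, so a thread pool can in principle overlap fits, and the results must match the sequential ones regardless:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np


def fit_voxel(signal):
    # Stand-in for a per-voxel model fit: an ordinary least-squares solve
    design = np.random.default_rng(0).standard_normal((len(signal), 4))
    coef, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return coef


rng = np.random.default_rng(42)
voxels = [rng.standard_normal(32) for _ in range(100)]

sequential = [fit_voxel(v) for v in voxels]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fit_voxel, voxels))

# Parallelization must not change the fitted coefficients
for s, t in zip(sequential, threaded):
    np.testing.assert_allclose(s, t)
```

If a timing comparison on a harness like this shows no speedup, the per-voxel work is likely dominated by Python-level overhead rather than GIL-releasing NumPy calls, which matches the "time floor" observation above.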
See also discussion in nipreps/eddymotion#101. Indeed, some models (DTI and DKI, in particular) will not benefit from more parallelization, because we're already using multi-threaded matrix operations at the numpy level. This kind of parallelization would be most beneficial for models that can't utilize that kind of parallelization, or at least can't do that easily.
Incidentally, is this PR superseded by #2539?
I don't think that forking for every voxel will be of great benefit. An ITK approach (chunking long arrays of voxels into sections) may be more effective with subprocesses. I will check on the project linked by Ariel and report back.
This PR is an updated version of PR #1135 and #642. I have simplified and refactored the code of @MrBago and @sahmed95.
You just need to add this decorator (`@parallel_voxel_fit`) to your model to split your data into small chunks and run them through several processes. To define the number of processes, you just need to do:

```python
model.fit(data, nb_processes=6)
```
Currently, all the code is in `dipy/reconst/multi_voxel.py`, but I suppose that this is not the right place, and any ideas are welcome. Can you have a look @nilgoyette @MrBago @arokem @sahmed95 @Garyfallidis?
Thanks!
Note: in order for multiprocessing to be beneficial, the fitting task needs to be significant, meaning bigger than the overhead created by multiprocessing during inter-process communication. If that is not the case, use the `multi_voxel_fit` decorator.

Note: `multiprocessing` uses `fork()` on Linux when it starts a new child process. On Windows -- which does not support `fork()` -- multiprocessing uses the Win32 API call `CreateProcess`, which creates an entirely new process from a given executable (and is quite slow).
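The fork/CreateProcess difference described above maps onto multiprocessing "start methods", which can be probed and pinned explicitly (a small illustration, not part of the PR):

```python
import multiprocessing as mp

# 'spawn' is available on every platform; 'fork' only on POSIX systems
# (and is the historical default on Linux). Windows offers only 'spawn'.
methods = mp.get_all_start_methods()
assert 'spawn' in methods

# A context pins a start method locally instead of changing the global default
ctx = mp.get_context('spawn')


def square(x):
    return x * x


if __name__ == '__main__':
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))   # prints [1, 4, 9]
```

Using a `spawn` context even on Linux is a useful way to reproduce Windows-like startup overhead when benchmarking this decorator.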