
Adding parallel_voxel_fit decorator #1418

Closed
wants to merge 21 commits into master from parallel-voxel-fit

Conversation

Member

@skoudoro skoudoro commented Feb 6, 2018

This PR is an updated version of PRs #1135 and #642. I have simplified and refactored the code of @MrBago and @sahmed95.

You just need to add this decorator (@parallel_voxel_fit) to your model to split your data into small chunks and run them through several processes.

To set the number of processes, you just need to call: model.fit(data, nb_processes=6)
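
For illustration, a toy model using the decorator might look like this. This is only a sketch: the MeanModel/MeanFit names are hypothetical, and only the decorator and the nb_processes keyword come from this PR.

import numpy as np
from dipy.reconst.multi_voxel import parallel_voxel_fit  # location per this PR; may move


class MeanFit:
    def __init__(self, mean):
        self.mean = mean


class MeanModel:
    @parallel_voxel_fit
    def fit(self, single_voxel_data):
        # Receives one voxel's 1D signal; the decorator handles chunking
        # the volume and dispatching the chunks to worker processes.
        return MeanFit(single_voxel_data.mean())


if __name__ == "__main__":  # required on Windows, where children re-import this module
    model = MeanModel()
    data = np.random.rand(10, 10, 10, 64)  # (x, y, z, n_directions)
    fit = model.fit(data, nb_processes=6)  # voxels fitted across six processes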

Currently, all the code is in reconst/multi_voxel.py, but I suppose this is not the right place; any ideas are welcome.

Can you have a look @nilgoyette @MrBago @arokem @sahmed95 @Garyfallidis?

Thanks!

Note: for multiprocessing to be beneficial, the fitting task must be significant, meaning its cost must exceed the overhead multiprocessing adds through inter-process communication. If that is not the case, use the multi_voxel_fit decorator instead.

Note: on Linux, multiprocessing starts new child processes with fork(). On Windows, which does not support fork(), multiprocessing uses the win32 API call CreateProcess, which creates an entirely new process from a given executable and is quite slow.
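
For reference, the start method can be inspected and pinned explicitly; this is a sketch using only stock multiprocessing API, not something this PR adds.

import multiprocessing as mp

if __name__ == "__main__":  # the guard is mandatory under spawn
    print(mp.get_start_method())  # 'fork' on Linux, 'spawn' on Windows

    # A context pins the start method without touching global state
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=4) as pool:
        print(pool.map(abs, [-3, -2, -1]))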

Contributor

@nilgoyette nilgoyette left a comment

I didn't really check the code because I'm not too sure what it'll do, but here are some minor things.

return SillyFit(self, data)

def predict(self, S0):
    return np.ones(10) * S0
Contributor

Why not use numpy.full()? Or if S0 is an array, 10 * S0?
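
For comparison, a quick sketch of the two forms (np.full requires NumPy >= 1.8):

import numpy as np

S0 = 100.0
a = np.ones(10) * S0  # current form: allocate ones, then multiply
b = np.full(10, S0)   # suggested form: allocate already filled
assert np.array_equal(a, b)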

def test_parallel_voxel_fit():
    voxel_fit(SillyParallelModel, SillyFit)
    model = SillyParallelModel()
    # Test with a mask and few processors
Contributor

Do you mean 'processes'?


# Get non null index from mask
indexes = np.argwhere(mask)
# convert indexes to tuple
Contributor

You sometimes start with a

# Capitalized
# lowercase

Could you please keep it consistent? Here and elsewhere.


# Get number of processes
nb_processes = int(kwargs['nb_processes']) if 'nb_processes' in kwargs else cpu_count()
nb_processes = cpu_count() if nb_processes < 1 else nb_processes
Contributor

I think it's clearer this way.

nb_processes = int(kwargs.get('nb_processes', '0'))
nb_processes = nb_processes if nb_processes >= 1 else cpu_count()

if nb_processes == 1:
    return single_voxel_fit(model, data, *args, **kwargs)

# Get non null index from mask
Contributor

Not singular, indexes or indices.

"""
Each pool process calls this initializer.
Load the array to be populated into
that process's global namespace
Contributor

Why the \n here? The doc can also reach 79 characters.

numpy ndarray that you want to convert.
lock : boolean
controls the access to the shared array. When you shared
array is a read only access, you do not need lock. Otherwise,
Contributor

When your shared array has a ... ? need a lock?
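
For context, the lock flag maps directly onto the standard-library multiprocessing.Array API; a sketch, not code from this PR:

import ctypes
import multiprocessing as mp

n = 1000
# Read-only sharing: no lock needed, and access is faster
ro = mp.Array(ctypes.c_double, n, lock=False)

# Concurrent writers: keep the lock and hold it while writing
rw = mp.Array(ctypes.c_double, n, lock=True)
with rw.get_lock():
    rw[0] = 1.0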

@skoudoro
Member Author

skoudoro commented Feb 7, 2018

Thank you @nilgoyette for the review, I made the changes.

@skoudoro skoudoro closed this Apr 30, 2018
@skoudoro skoudoro deleted the parallel-voxel-fit branch April 30, 2018 17:07
@skoudoro skoudoro restored the parallel-voxel-fit branch April 30, 2018 17:07
@skoudoro skoudoro reopened this May 8, 2018
@pep8speaks

pep8speaks commented May 8, 2018

Hello @skoudoro, thank you for updating!

Line 23:81: E501 line too long (91 > 80 characters)

Line 99:15: E221 multiple spaces before operator
Line 104:81: E501 line too long (82 > 80 characters)
Line 128:1: W293 blank line contains whitespace

Comment last updated on August 21, 2018 at 16:52 UTC

@codecov-io

codecov-io commented May 8, 2018

Codecov Report

Merging #1418 into master will decrease coverage by 0.1%.
The diff coverage is 88.83%.


@@            Coverage Diff             @@
##           master    #1418      +/-   ##
==========================================
- Coverage   87.32%   87.22%   -0.11%     
==========================================
  Files         246      248       +2     
  Lines       31806    31936     +130     
  Branches     3450     3467      +17     
==========================================
+ Hits        27775    27855      +80     
- Misses       3210     3249      +39     
- Partials      821      832      +11
Impacted Files Coverage Δ
dipy/reconst/tests/test_mapmri.py 98.54% <ø> (-0.01%) ⬇️
dipy/core/parallel.py 100% <100%> (ø)
dipy/reconst/tests/test_cross_validation.py 100% <100%> (ø) ⬆️
dipy/reconst/tests/test_dsi_metrics.py 92% <100%> (+0.33%) ⬆️
dipy/reconst/tests/test_dsi.py 97.67% <100%> (+0.02%) ⬆️
dipy/tracking/life.py 97.8% <100%> (ø) ⬆️
dipy/reconst/forecast.py 91.7% <100%> (-0.52%) ⬇️
dipy/reconst/ivim.py 79.5% <100%> (ø) ⬆️
dipy/reconst/fwdti.py 83.01% <100%> (-11.33%) ⬇️
dipy/reconst/mapmri.py 90.02% <100%> (-0.26%) ⬇️
... and 16 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6e1da86...d7845b5.

@nilgoyette
Contributor

Oh, damn, I wasn't aware that you can't use np.full! I'll stop advising people to use it in DiPy.

@skoudoro
Member Author

skoudoro commented May 8, 2018

Or maybe we should update our minimal NumPy version. I wonder which NumPy version CentOS ships.

Contributor

@arokem arokem left a comment

Generally looks good. I had a couple of comments.

Have you had a chance to profile this? Does it really lead to any improvement in a realistic test-case?

Also - could you please rebase this on master? I think that at least some of the test failures should be resolved now.

global shared_arr


def shm_as_ndarray(mp_array, shape=None):
Contributor

I wonder about the naming here: there's nothing specific to spherical harmonics here, is there? Or does "shm" stand for something else here?

Member Author

Right, bad naming: "shm" here stands for shared multiprocessing array. I will change that.

return np.asarray(result)


def ndarray_to_shm(arr, lock=False):
Contributor

And same here

# shared_arr = np.ctypeslib.as_array(arr_to_populate)
# shared_arr = shared_arr.reshape(shape)
shared_arr = shm_as_ndarray(arr_to_populate, shape)

Contributor

I would love to have some unit tests for the functions above. If I understand correctly, right now they are tested only through the decorator.

Member Author

That's true, I will add them.

BTW, I still wonder whether these functions (shm_as_ndarray and ndarray_to_shm) are in the right place, because they could be used for other purposes like peaks_from_model.
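
A round-trip test for those helpers might look like this; a sketch assuming the names and signatures shown in the diff:

import numpy as np
from numpy.testing import assert_array_equal


def test_shared_array_round_trip():
    data = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
    # ndarray -> shared multiprocessing array -> ndarray
    shared = ndarray_to_shm(data, lock=False)
    result = shm_as_ndarray(shared, shape=data.shape)
    assert_array_equal(result, data)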

Contributor

Maybe in a new dipy.core.parallel module?

return a list of tuple(voxel index, model fitted instance)
"""
model, chunks = arguments
return [(idx, model.fit(shared_arr[idx]))
Contributor

Check list in case of old memory issue.

raise ValueError("mask and data shape do not match")

# Get number of processes
nb_processes = int(kwargs.pop('nb_processes', '0'))
Contributor

Maybe the default should be None, meaning all available CPUs.
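
That is, something along these lines; a sketch of the suggestion, with resolve_nb_processes as a hypothetical helper name:

from multiprocessing import cpu_count


def resolve_nb_processes(kwargs):
    # None (or any value < 1) means "use all available CPUs"
    nb_processes = kwargs.pop('nb_processes', None)
    if nb_processes is None or nb_processes < 1:
        nb_processes = cpu_count()
    return nb_processes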

for i in range(1, len(chunks_spacing))]

# Create shared array
shared_arr_in = ndarray_to_mparray(data)
Contributor

Add a note in the comment that you make a single copy of the data here (if you do make a copy).

copied shared array
"""

array1d = arr.ravel(order='A')
Contributor

This copy is not needed. Use arr directly.
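
For context: ravel only copies when the memory layout forces it, which np.shares_memory can confirm; a quick check, not code from this PR:

import numpy as np

arr = np.arange(12).reshape(3, 4)
flat = arr.ravel(order='A')
print(np.shares_memory(arr, flat))  # True: contiguous input, so ravel returns a view

strided = arr[:, ::2]
print(np.shares_memory(strided, strided.ravel()))  # False: non-contiguity forces a copy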

return a list of tuple(voxel index, model fitted instance)
"""
model, chunks = arguments
return [(idx, model.fit(shared_arr[idx]))
Contributor

Write it in a single line?

Member Author

I do not see the benefit. It will be harder to read and understand if I change that.

2614.87695312, 2316.55371094, 2267.7722168])

noisy_multi = np.zeros((2, 2, 1, len(gtab.bvals)))
noisy_multi[0, 1, 0] = noisy_multi[
Contributor

This line break is not very good for readability...

Member Author

I agree, I will change that

@skoudoro skoudoro removed this from the 1.0 milestone Jul 25, 2019
@skoudoro skoudoro linked an issue Jun 22, 2021 that may be closed by this pull request
@oesteban
Contributor

oesteban commented Dec 2, 2022

Note: on Linux, multiprocessing starts new child processes with fork(). On Windows, which does not support fork(), multiprocessing uses the win32 API call CreateProcess, which creates an entirely new process from a given executable and is quite slow.

I would expect an enormous overhead on either platform. Would it be worth trying to parallelize using threading? (That said, I have tried with threads externally, and it doesn't seem to yield any gains, which is surprising to say the least.)

EDIT: to add more information, it seems to me that some models (I have only tried DKI) may have a time floor: unless some operation that takes time regardless of the amount of data passed in is itself parallelized, I don't see any gains from passing more or less data.

@arokem
Contributor

arokem commented Dec 3, 2022

See also discussion in nipreps/eddymotion#101. Indeed, some models (DTI and DKI, in particular) will not benefit from more parallelization, because we're already using multi-threaded matrix operations at the numpy level. This kind of parallelization would be most beneficial for models that can't utilize that kind of parallelization, or at least can't do that easily.
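
For anyone profiling this: the BLAS-level threading mentioned above can be inspected and capped with threadpoolctl (a third-party package, not part of this PR); a sketch:

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

print(threadpool_info())  # shows the BLAS backend and its current thread count

a = np.random.rand(2000, 2000)
with threadpool_limits(limits=1):  # pin BLAS to a single thread
    b = a @ a  # a fairer baseline when benchmarking process-level parallelism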

@arokem
Contributor

arokem commented Dec 3, 2022

Incidentally, is this PR superseded by #2539?

@oesteban
Contributor

oesteban commented Dec 4, 2022

I don't think that forking for every voxel will be of great benefit. An ITK-style approach (chunking long arrays of voxels into sections) may be more effective with subprocesses. I will check the project linked by Ariel and report back.

@skoudoro
Member Author

Incidentally, is this PR superseded by #2539?

Yes, closing in favor of #2539

@skoudoro skoudoro closed this Dec 13, 2022
Development

Successfully merging this pull request may close these issues.

Multprocessing the multivoxel fit
8 participants