Adding parallel_voxel_fit decorator #1418

Open · skoudoro wants to merge 21 commits into master from skoudoro:parallel-voxel-fit
@skoudoro
Member

skoudoro commented Feb 6, 2018

This PR is an updated version of PR #1135 and #642. I have simplified and refactored the code of @MrBago and @sahmed95.

You just need to add this decorator (@parallel_voxel_fit) to your model to split your data into small chunks and run them across several processes.

To set the number of processes, you just need to do: model.fit(data, nb_processes=6)
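A minimal, self-contained sketch of the idea (hypothetical code, not the PR's implementation; a thread pool stands in for the PR's process pool so the snippet runs anywhere):

```python
import numpy as np
from multiprocessing.pool import ThreadPool

def parallel_voxel_fit(single_voxel_fit):
    # Sketch of a parallel_voxel_fit-style decorator: apply the wrapped
    # single-voxel fit to every voxel, dispatched to a worker pool.
    def new_fit(model, data, nb_processes=1):
        spatial_shape = data.shape[:-1]
        flat = data.reshape(-1, data.shape[-1])
        with ThreadPool(max(int(nb_processes), 1)) as pool:
            results = pool.map(lambda sig: single_voxel_fit(model, sig),
                               flat)
        return np.asarray(results).reshape(spatial_shape)
    return new_fit

class SillyModel:
    @parallel_voxel_fit
    def fit(self, single_voxel_signal):
        # A trivial "fit": the mean of the voxel's signal.
        return single_voxel_signal.mean()

model = SillyModel()
fitted = model.fit(2 * np.ones((2, 3, 4, 5)), nb_processes=2)
```

The decorated `fit` keeps the `model.fit(data, nb_processes=...)` call signature described above; `fitted` has one value per voxel, i.e. the spatial shape of the input.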

Currently, all the code is in reconst/multi_voxel.py, but I suspect this is not the right place; any ideas are welcome.

Can you have a look @nilgoyette @MrBago @arokem @sahmed95 @Garyfallidis?

Thanks!

Note: for multiprocessing to be beneficial, the per-voxel fitting task must be expensive enough to outweigh the overhead multiprocessing adds for inter-process communication. If it is not, use the multi_voxel_fit decorator instead.

Note: on Linux, multiprocessing starts each new child process with fork(). Windows does not support fork(), so multiprocessing uses the Win32 CreateProcess call instead, which builds an entirely new process from a given executable (and is quite slow).
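The practical consequence of the fork()/CreateProcess difference is the familiar `__main__` guard; a minimal illustration (generic multiprocessing usage, not code from this PR):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # On Windows (and with the "spawn" start method generally) each
    # worker re-imports this module, so pool creation must be guarded
    # by __main__, or workers would recursively spawn workers.
    with mp.Pool(processes=2) as pool:
        results = pool.map(square, [0, 1, 2, 3])
```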

@nilgoyette

I didn't really check the code because I'm not too sure what it'll do, but here are some minor things.

        return SillyFit(self, data)

    def predict(self, S0):
        return np.ones(10) * S0


@nilgoyette

nilgoyette Feb 7, 2018

Contributor

Why not use numpy.full()? Or if S0 is an array, 10 * S0?
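The reviewer's suggestion, illustrated: `np.full` allocates and fills in one step, avoiding the temporary array created by `np.ones(10) * S0`.

```python
import numpy as np

S0 = 100.0
a = np.ones(10) * S0   # original: allocate ones, then multiply
b = np.full(10, S0)    # suggested: single allocation, same result
```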

def test_parallel_voxel_fit():
    voxel_fit(SillyParallelModel, SillyFit)
    model = SillyParallelModel()
    # Test with a mask and few processors


@nilgoyette

nilgoyette Feb 7, 2018

Contributor

Do you mean 'processes'?

# Get non null index from mask
indexes = np.argwhere(mask)
# convert indexes to tuple


@nilgoyette

nilgoyette Feb 7, 2018

Contributor

You sometimes start with a

# Majuscule
# minuscule

Could you please keep it consistent? Here and elsewhere.

# Get number of processes
nb_processes = int(kwargs['nb_processes']) if 'nb_processes' in kwargs else cpu_count()
nb_processes = cpu_count() if nb_processes < 1 else nb_processes


@nilgoyette

nilgoyette Feb 7, 2018

Contributor

I think it's clearer this way.

nb_processes = int(kwargs.get('nb_processes', '0'))
nb_processes = nb_processes if nb_processes >= 1 else cpu_count()
if nb_processes == 1:
return single_voxel_fit(model, data, *args, **kwargs)
# Get non null index from mask


@nilgoyette

nilgoyette Feb 7, 2018

Contributor

Not singular, indexes or indices.

"""
Each pool process calls this initializer.
Load the array to be populated into
that process's global namespace


@nilgoyette

nilgoyette Feb 7, 2018

Contributor

Why the \n here? The doc can also reach 79 characters.

numpy ndarray that you want to convert.
lock : boolean
controls the access to the shared array. When you shared
array is a read only access, you do not need lock. Otherwise,


@nilgoyette

nilgoyette Feb 7, 2018

Contributor

When your shared array has a ... ? need a lock?

@skoudoro


Member

skoudoro commented Feb 7, 2018

Thank you @nilgoyette for the review, I made the changes.

@skoudoro skoudoro closed this Apr 30, 2018

@skoudoro skoudoro deleted the skoudoro:parallel-voxel-fit branch Apr 30, 2018

@skoudoro skoudoro restored the skoudoro:parallel-voxel-fit branch Apr 30, 2018

@skoudoro skoudoro reopened this May 8, 2018

@pep8speaks


pep8speaks commented May 8, 2018

Hello @skoudoro, thank you for updating!

Line 23:81: E501 line too long (91 > 80 characters)

Line 99:15: E221 multiple spaces before operator
Line 104:81: E501 line too long (82 > 80 characters)
Line 128:1: W293 blank line contains whitespace

Comment last updated on August 21, 2018 at 16:52 Hours UTC
@codecov-io


codecov-io commented May 8, 2018

Codecov Report

Merging #1418 into master will decrease coverage by 0.1%.
The diff coverage is 88.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1418      +/-   ##
==========================================
- Coverage   87.32%   87.22%   -0.11%     
==========================================
  Files         246      248       +2     
  Lines       31806    31936     +130     
  Branches     3450     3467      +17     
==========================================
+ Hits        27775    27855      +80     
- Misses       3210     3249      +39     
- Partials      821      832      +11
Impacted Files Coverage Δ
dipy/reconst/tests/test_mapmri.py 98.54% <ø> (-0.01%) ⬇️
dipy/reconst/csdeconv.py 89.93% <100%> (ø) ⬆️
dipy/reconst/tests/test_dsi_deconv.py 96.22% <100%> (+0.07%) ⬆️
dipy/reconst/shore.py 85.15% <100%> (ø) ⬆️
dipy/reconst/tests/test_cross_validation.py 100% <100%> (ø) ⬆️
dipy/reconst/fwdti.py 83.01% <100%> (-11.33%) ⬇️
dipy/core/parallel.py 100% <100%> (ø)
dipy/tracking/life.py 97.8% <100%> (ø) ⬆️
dipy/reconst/ivim.py 79.5% <100%> (ø) ⬆️
dipy/reconst/dsi.py 81.42% <100%> (+0.1%) ⬆️
... and 16 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e1da86...d7845b5. Read the comment docs.

@nilgoyette


Contributor

nilgoyette commented May 8, 2018

Oh, damn, I wasn't aware that you can't use np.full! I'll stop advising people to use it in DiPy.

@skoudoro


Member

skoudoro commented May 8, 2018

Or maybe we should bump our minimum Numpy version. I wonder which Numpy version CentOS ships.

@arokem

Generally looks good. I had a couple of comments.

Have you had a chance to profile this? Does it really lead to any improvement in a realistic test-case?

Also - could you please rebase this on master? I think that at least some of the test failures should be resolved now.

global shared_arr
def shm_as_ndarray(mp_array, shape=None):


@arokem

arokem Jun 12, 2018

Member

I wonder about the naming here: there's nothing specific to spherical harmonics here, is there? Or does "shm" stand for something else here?


@skoudoro

skoudoro Jun 12, 2018

Member

Right, bad naming: "shm" stands for shared multiprocessing array. I will change that.

return np.asarray(result)
def ndarray_to_shm(arr, lock=False):


@arokem

arokem Jun 12, 2018

Member

And same here

# shared_arr = np.ctypeslib.as_array(arr_to_populate)
# shared_arr = shared_arr.reshape(shape)
shared_arr = shm_as_ndarray(arr_to_populate, shape)


@arokem

arokem Jun 12, 2018

Member

I would love to have some unit tests for the functions above. If I understand correctly, right now they are tested only through the decorator.


@skoudoro

skoudoro Jun 12, 2018

Member

That's true, I will add them.

BTW, I still wonder whether these functions (shm_as_ndarray and ndarray_to_shm) are in the right place, because they can be used for other purposes like peaks_from_model.
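The unit tests requested above could start with a simple round trip. A hedged sketch (function names taken from the diff; the signatures and dtype handling are assumptions, not the PR's actual code):

```python
import multiprocessing as mp
import numpy as np

def ndarray_to_shm(arr, lock=False):
    # Copy a float64 ndarray into a shared multiprocessing Array.
    shm = mp.Array('d', arr.size, lock=lock)
    buf = shm.get_obj() if lock else shm
    np.frombuffer(buf, dtype=np.float64)[:] = arr.ravel()
    return shm

def shm_as_ndarray(mp_array, shape=None, lock=False):
    # View a shared Array as an ndarray without copying.
    buf = mp_array.get_obj() if lock else mp_array
    out = np.frombuffer(buf, dtype=np.float64)
    return out.reshape(shape) if shape is not None else out

# Round trip: ndarray -> shared Array -> ndarray view.
data = np.arange(6, dtype=np.float64).reshape(2, 3)
shm = ndarray_to_shm(data)
view = shm_as_ndarray(shm, shape=(2, 3))
```

Writes through the returned view reach the shared buffer, which is exactly what the decorator relies on when child processes populate the output array.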


@arokem

arokem Jun 14, 2018

Member

Maybe in a new dipy.core.parallel module?

@skoudoro skoudoro force-pushed the skoudoro:parallel-voxel-fit branch from aee0aa0 to 554be01 Jun 12, 2018

@skoudoro skoudoro force-pushed the skoudoro:parallel-voxel-fit branch from ba0063d to 1bfb729 Jul 27, 2018

return a list of tuple(voxel index, model fitted instance)
"""
model, chunks = arguments
return [(idx, model.fit(shared_arr[idx]))


@Garyfallidis

Garyfallidis Aug 8, 2018

Member

Check list in case of old memory issue.

raise ValueError("mask and data shape do not match")
# Get number of processes
nb_processes = int(kwargs.pop('nb_processes', '0'))


@Garyfallidis

Garyfallidis Aug 8, 2018

Member

Maybe the default should be None for all available cpus.

for i in range(1, len(chunks_spacing))]
# Create shared array
shared_arr_in = ndarray_to_mparray(data)


@Garyfallidis

Garyfallidis Aug 8, 2018

Member

Add in the comment that you make a single copy of the data here. If you make a copy.

copied shared array
"""
array1d = arr.ravel(order='A')


@Garyfallidis

Garyfallidis Aug 8, 2018

Member

This copy is not needed. Use directly arr.
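For a contiguous input, `ravel(order='A')` already returns a view rather than a copy, which supports the point that the intermediate variable is unnecessary (a small illustration, not the PR's code):

```python
import numpy as np

arr = np.arange(6.0).reshape(2, 3)
# order='A' on a contiguous array copies nothing: flat is a view.
flat = arr.ravel(order='A')
```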
