Optimize tensor fitting #727

Merged
merged 16 commits into from Oct 17, 2015

Conversation

@dimrozakis
Contributor

dimrozakis commented Oct 7, 2015

Speed up certain tensor fitting functions by vectorizing calculations.
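
For readers unfamiliar with the approach, the core idea can be sketched as follows. This is a minimal, hypothetical illustration (array names, shapes, and the random data are assumptions, not the PR's actual code): instead of solving one least-squares problem per voxel in a Python loop, the design-matrix pseudoinverse is computed once and applied to all voxels with a single array operation.

```python
import numpy as np

# Hypothetical shapes and data for the sketch: B is the
# (n_gradients, n_params) design matrix, log_s holds log-signals
# for a small volume of voxels.
rng = np.random.default_rng(0)
B = rng.standard_normal((39, 7))
log_s = rng.standard_normal((8, 8, 4, 39))

# Looped OLS: one least-squares solve per voxel.
looped = np.empty(log_s.shape[:-1] + (7,))
for idx in np.ndindex(log_s.shape[:-1]):
    looped[idx] = np.linalg.lstsq(B, log_s[idx], rcond=None)[0]

# Vectorized OLS: compute pinv(B) once, apply it to every voxel at once.
vectorized = log_s @ np.linalg.pinv(B).T

assert np.allclose(looped, vectorized)
```

Both paths compute the same parameters; the vectorized path simply moves the per-voxel loop into numpy.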

@Garyfallidis

Member

Garyfallidis commented Oct 7, 2015

Hi @dimrozakis and thank you for your contribution. Can you give us an idea of what is the actual speedup before and after your change?

@dimrozakis

Contributor

dimrozakis commented Oct 7, 2015

With a given set of unmasked dMRI DICOM files (128, 128, 38, 39) I get the following measurements on master:

```
OLS fitting completed in 40.53 secs.
WLS fitting completed in 135.74 secs.
```

and the following in the branch of this PR (the diff measurements are the absolute difference between the tensors calculated in master and in the PR branch for the same exam with the same fitting method):

```
OLS fitting completed in 4.72 secs.
Diff min/mean/max: 0.0, 2.0502642549085253e-19, 1.474514954580286e-17
WLS fitting completed in 45.56 secs.
Diff min/mean/max: 0.0, 1.2172205038436504e-16, 7.6679114446864816e-15
```
@Garyfallidis

Member

Garyfallidis commented Oct 7, 2015

Oh nice! :) That's a good speedup 👍

@Garyfallidis

Member

Garyfallidis commented Oct 7, 2015

The travis bot with numpy 1.6.0 and scipy 0.9.0 is failing. Could it be that some of the vectorization you are using is only available in more recent numpy versions?
https://travis-ci.org/nipy/dipy/builds/84175253

@dimrozakis

Contributor

dimrozakis commented Oct 7, 2015

This:

```
  File "/home/travis/build/nipy/dipy/venv/lib/python2.7/site-packages/dipy/utils/arrfuncs.py", line 38, in pinv_vec
    u, s, v = np.linalg.svd(a, full_matrices=False)
  File "/home/travis/build/nipy/dipy/venv/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1271, in svd
    _assertRank2(a)
  File "/home/travis/build/nipy/dipy/venv/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 155, in _assertRank2
    'two-dimensional' % len(a.shape)
numpy.linalg.linalg.LinAlgError: 3-dimensional array given. Array must be two-dimensional
```

seems to indicate that numpy.linalg.svd can't accept arrays with more than 2 dimensions in numpy 1.6.0. @Garyfallidis how do you suggest we handle this?

@dimrozakis

Contributor

dimrozakis commented Oct 7, 2015

I guess we could either set a condition based on the installed numpy version, or catch a numpy.linalg.LinAlgError exception and fall back to iterating numpy.linalg.svd over each 2D array contained in the ND array. I don't think that implementing a vectorized version of numpy.linalg.svd in the dipy code base just so it can work with numpy 1.6.0 would be a great idea (or a practical one). @Garyfallidis, your thoughts?
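
Either fallback can be sketched in a few lines. A rough, hypothetical illustration of the catch-and-iterate option (not the PR's actual implementation; `pinv_vec` here only names the idea):

```python
import numpy as np

def pinv_vec(a, rcond=1e-15):
    """Sketch: pseudoinverse over the last two axes, with a loop
    fallback for numpy builds whose svd rejects >2D input."""
    a = np.asarray(a)
    if a.ndim <= 2:
        return np.linalg.pinv(a, rcond)
    try:
        # numpy >= 1.8: svd broadcasts over the leading dimensions.
        u, s, vt = np.linalg.svd(a, full_matrices=False)
        cutoff = rcond * s.max(axis=-1, keepdims=True)
        s_inv = np.zeros_like(s)
        large = s > cutoff
        s_inv[large] = 1.0 / s[large]
        # pinv = V @ diag(1/s) @ U^T for every matrix in the stack.
        return np.einsum('...kn,...k,...mk->...nm', vt, s_inv, u)
    except np.linalg.LinAlgError:
        # Older numpy: one 2D pinv per nested matrix.
        flat = a.reshape((-1,) + a.shape[-2:])
        out = np.array([np.linalg.pinv(m, rcond) for m in flat])
        return out.reshape(a.shape[:-2] + (a.shape[-1], a.shape[-2]))
```

The version-check option would replace the `try`/`except` with an explicit comparison against the installed numpy version.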

@Garyfallidis

Member

Garyfallidis commented Oct 7, 2015

What you suggest seems reasonable (the first two options). Because the dti part is a core part of the reconstruction module, it would be great to hear what other people think about this change. Also, we should look into using the svd from scipy rather than numpy; maybe this was already resolved there (just brainstorming here). Finally, this vectorization strategy may be interesting in other parts of Dipy too. More tomorrow... keep hacking :)

@dimrozakis dimrozakis force-pushed the advantis-io:opt_tensor_fit branch 3 times, most recently from bc28349 to 0fc60d9 Oct 8, 2015

@dimrozakis

Contributor

dimrozakis commented Oct 8, 2015

Finally all tests pass!

The main issue was that both numpy.linalg.eigh and numpy.linalg.svd only became vectorized (able to run on arrays with more than 2 dimensions) in numpy 1.8.0. I worked around this by iterating them over every nested 2D array if the installed numpy version is less than 1.8.0. Obviously in that case there's no speedup. @Garyfallidis I looked into the scipy implementations, but it seems that none of scipy.linalg.svd, scipy.linalg.pinv or scipy.linalg.pinv2 are vectorized.
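
The iterate-per-2D-array workaround described here can be sketched like so (a simplified, hypothetical wrapper; `eigh_iter` is an illustrative name, and dipy's actual code differs):

```python
import numpy as np

def eigh_iter(a, UPLO='L'):
    """Sketch of the workaround: use np.linalg.eigh directly where it
    broadcasts (numpy >= 1.8), otherwise iterate it over every nested
    2D array."""
    a = np.asarray(a)
    if a.ndim <= 2:
        return np.linalg.eigh(a, UPLO)
    try:
        return np.linalg.eigh(a, UPLO)  # vectorized path
    except np.linalg.LinAlgError:
        flat = a.reshape((-1,) + a.shape[-2:])
        evals = np.empty((flat.shape[0], flat.shape[-1]))
        evecs = np.empty(flat.shape)
        for i, m in enumerate(flat):
            evals[i], evecs[i] = np.linalg.eigh(m, UPLO)
        return evals.reshape(a.shape[:-1]), evecs.reshape(a.shape)
```

On old numpy the loop path gives correct results with no speedup, matching the behaviour described above.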

@Garyfallidis

Member

Garyfallidis commented Oct 9, 2015

Bravo Dimitri! :) Can the other core devs have a look at this and give some thumbs up? @MrBago, @arokem ?

@arokem

Member

arokem commented Oct 9, 2015

Will look, but probably not before early next week!

@Garyfallidis

Member

Garyfallidis commented Oct 9, 2015

Okay if @MrBago can help here and give me some thumbs up I can go ahead and merge.

@dimrozakis dimrozakis force-pushed the advantis-io:opt_tensor_fit branch from 86fe603 to 23a62be Oct 9, 2015

```python
        except KeyError:
            raise ValueError('"' + str(fit_method) + '" is not a known fit '
                             'method, the fit method should either be a '
                             'function or one of the common fit methods')
        self.fit_method = fit_method
```

@arokem

arokem Oct 12, 2015

Member

Nice catch! Would you mind adding a test that fails without this change, so that we don't end up reverting without noticing?

```python
    """Vectorized version of numpy.linalg.pinv"""
    a = np.asarray(a)
    if np.version.version.split('.') < ['1', '8']:
```

@arokem

arokem Oct 12, 2015

Member

Use LooseVersion to compare versions. Like here: https://github.com/nipy/dipy/blob/master/dipy/core/optimize.py#L14
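
The reason the `split('.')` comparison needs replacing: string components compare lexicographically, so `'10' < '8'`. A quick self-contained demonstration (the `version_tuple` helper below is only illustrative; LooseVersion is the fix actually requested):

```python
# Lexicographic comparison of split version strings breaks on two-digit
# components: numpy 1.10 would compare as *older* than 1.8.
assert '1.10.0'.split('.') < '1.8.0'.split('.')  # True, but wrong

# Parsing components into integers (which is effectively what
# LooseVersion does) gives the right answer.
def version_tuple(v):
    return tuple(int(part) for part in v.split('.'))

assert version_tuple('1.10.0') > version_tuple('1.8.0')
```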

```
@@ -28,3 +28,45 @@ def as_native_array(arr):
        return arr
    return arr.byteswap().newbyteorder()

def pinv_vec(a, rcond=1e-15):
    """Vectorized version of numpy.linalg.pinv"""
```

@arokem

arokem Oct 12, 2015

Member

Documentation should follow the numpy docstring standard: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
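
For reference, a docstring following that standard might look roughly like this (the section wording is a sketch, not dipy's final text):

```python
def pinv_vec(a, rcond=1e-15):
    """Vectorized version of numpy.linalg.pinv.

    Parameters
    ----------
    a : array_like (..., M, N)
        Stack of matrices to pseudo-invert.
    rcond : float, optional
        Cutoff for small singular values, relative to the largest one.

    Returns
    -------
    B : ndarray (..., N, M)
        Pseudoinverse of each matrix in the stack.
    """
```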

```python
def eigh(a, UPLO='L'):
    """Iterate over eigh if it doesn't support vectorized operation"""
```

@arokem

arokem Oct 12, 2015

Member

Same here about the documentation - please follow the numpy docstring standard (Parameters, Returns, etc.)

```python
def eigh(a, UPLO='L'):
    """Iterate over eigh if it doesn't support vectorized operation"""
    a = np.asarray(a)
    if a.ndim > 2 and np.version.version.split('.') < ['1', '8']:
```

@arokem

arokem Oct 12, 2015

Member

Use LooseVersion here as well.

```
@@ -1151,6 +1158,28 @@ def predict(self, gtab, S0=1):
        return tensor_prediction(self.model_params[0:12], gtab, S0=S0)

def tensor6_to_dtiparams(tensor, min_diffusivity=0):
```

@arokem

arokem Oct 12, 2015

Member

This function is already implemented as eig_from_lo_tri

@arokem

arokem Oct 12, 2015

Member

But it might be worth refactoring that function. For example, it currently does not take into account min_diffusivity

```python
    for i, item in enumerate(a):
        result[i] = np.linalg.pinv(item, rcond)
    return result.reshape(shape + (a.shape[2], a.shape[1]))
```

@arokem

arokem Oct 12, 2015

Member

For clarity, you might want to put the following in an else clause

@arokem

Member

arokem commented Oct 12, 2015

Very nice! Just a few comments.

@dimrozakis

Contributor

dimrozakis commented Oct 12, 2015

@arokem thanks for the comments, I'll look into them.

Btw, I just noticed that both ols_fit_tensor and wls_fit_tensor claim in their docstrings to return an (evals, evecs) tuple, whereas in fact they return a single array whose last dimension equals 12 (that's true both on the master branch and in this PR). Should the return type or the docstring change so that they match? Perhaps have them return a 2-tuple and have TensorModel/TensorFit concatenate them if given as a 2-tuple?
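
For context, the 12-component layout can be illustrated like this (shapes are assumptions for the sketch; the first 3 entries per voxel hold the eigenvalues and the remaining 9 the flattened eigenvectors):

```python
import numpy as np

# Hypothetical per-voxel decomposition results for a small volume:
# 3 eigenvalues and a 3x3 eigenvector matrix per voxel.
rng = np.random.default_rng(3)
evals = rng.standard_normal((8, 8, 4, 3))
evecs = rng.standard_normal((8, 8, 4, 3, 3))

# Pack both into the single (..., 12) array the fit functions return.
dti_params = np.concatenate(
    [evals, evecs.reshape(evecs.shape[:-2] + (9,))], axis=-1)

assert dti_params.shape == (8, 8, 4, 12)
```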

@dimrozakis dimrozakis force-pushed the advantis-io:opt_tensor_fit branch 4 times, most recently from d8a6526 to 2cd37e0 Oct 12, 2015

@dimrozakis

Contributor

dimrozakis commented Oct 12, 2015

@Garyfallidis, @arokem something is causing the tests for some python versions to fail. I thought it was me after pushing new commits, but that doesn't seem to be the case. After several attempts, I have ended up with the following situation in this branch: the second-to-last commit (23a62be, 3 days ago) passes all tests. The last commit (2cd37e0), however, which simply touches a file, fails most tests. The failing tests show a dtype error. Any idea as to what may be causing this?

@arokem

Member

arokem commented Oct 12, 2015

I am not sure this has anything to do with it, but did you rebase on master? By the way, you don't need to make a commit to trigger the testing on travis. There's a "rerun" button there.

@arokem arokem reopened this Oct 15, 2015

@arokem

Member

arokem commented Oct 15, 2015

Sorry - wrong button :-)

dimrozakis added some commits Oct 4, 2015

Properly set fit_method attr in TensorModel
If fit_method was a callable, self.fit_method wasn't being set at all.
Vectorize decompose_tensor
It can accept a diffusion tensor with dimensions (..., 3, 3) and returns
eigenvalues and eigenvectors arrays with matching dimensions.
Vectorize ols_fit_method
Significantly improves speed
Vectorize wls_fit_method
Significantly improves speed
Use np.linalg.pinv in pinv_vec for old numpy
Function pinv_vec, the vectorized version of numpy.linalg.pinv, requires a
vectorized numpy.linalg.svd, which was added in numpy 1.8.0. If using an
older numpy version, pinv_vec now falls back to iterating over
np.linalg.pinv (instead of trying to use np.linalg.svd with 3+
dimensional arrays and failing with a np.linalg.LinAlgError).
Wrap np.linalg.eigh in dipy.utils.arrfuncs.eigh
Function np.linalg.eigh can accept input arrays with more than 2
dimensions as of version 1.8.0. For newer versions, the wrapper just
calls np.linalg.eigh; for older versions, it iterates np.linalg.eigh
over every 2D array contained in the ND array.

@dimrozakis dimrozakis force-pushed the advantis-io:opt_tensor_fit branch from 2cd37e0 to 64579d5 Oct 15, 2015

@dimrozakis

Contributor

dimrozakis commented Oct 15, 2015

I have rebased on master and I think I have addressed all the issues raised in the comments above. @arokem care to take another look?

@arokem

Member

arokem commented Oct 15, 2015

Yep - looks good to me. If no one objects, I will merge this in some time this weekend. Thanks for the nice contribution!

```
Parameters
----------
a : (..., M, N) array_like
```

@Garyfallidis

Garyfallidis Oct 16, 2015

Member

Just a stylistic issue here. In Dipy we first write array_like and then the shape. Please change everywhere.

```python
for i in range(4):
    for j in range(4):
        for k in range(4):
            assert_array_almost_equal(_pinv[i, j, k],
```

@Garyfallidis

Garyfallidis Oct 16, 2015

Member

Cool! 👍 Good test!

@Garyfallidis

Member

Garyfallidis commented Oct 16, 2015

Apart from this minor stylistic change this is ready to be merged.

@dimrozakis

Contributor

dimrozakis commented Oct 16, 2015

> Just a stylistic issue here. In Dipy we first write array_like and then the shape. Please change everywhere.

@Garyfallidis I fixed this in 6ef3d4b.

arokem added a commit that referenced this pull request Oct 17, 2015

@arokem arokem merged commit d08a849 into nipy:master Oct 17, 2015

@Garyfallidis

Member

Garyfallidis commented Oct 18, 2015

Bravo @dimrozakis ! Your PR is merged. Thank you for discovering and contributing to DIPY. And welcome! 👍 Hopefully see you in Athens ;)

@dimrozakis dimrozakis deleted the advantis-io:opt_tensor_fit branch Nov 2, 2015
