
ENH: Add support for ArraySequence in length function #1076

Merged
6 commits merged from MarcCote:enh_support_arraysequence_for_length into master on Jan 10, 2017

Conversation

MarcCote
Contributor

@MarcCote MarcCote commented Jun 9, 2016

This small PR is the first of a series of PRs aiming to support Nibabel's ArraySequence objects.

Specifically, this PR improves dipy.tracking.streamlinespeed.length to support ArraySequence objects. This gives a pretty good speedup even over the current Cython version (all speedups are with respect to the Python implementation):

Timing length() with 20,000 streamlines.
Python time: 3.9sec
Cython time: 0.46sec
Speed up of 8.41x
Cython time (ArraySequence): 0.08sec
Speed up of 48.38x

Edit: of course this needs the master branch of Nibabel or eventually v2.1.
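For anyone who wants to try it, usage looks roughly like this (a toy sketch only; it assumes nibabel >= 2.1 or current master for the streamlines module, and the random data is purely illustrative):

import numpy as np
from nibabel.streamlines import ArraySequence  # needs nibabel >= 2.1 (or master)
from dipy.tracking.streamline import length

# Toy data: 20,000 short random streamlines packed into one ArraySequence.
rng = np.random.RandomState(42)
streamlines = ArraySequence(rng.rand(rng.randint(10, 100), 3)
                            for _ in range(20000))

lengths = length(streamlines)  # one arc length per streamline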

@MarcCote changed the title from "Add support for ArraySequence in length function" to "ENH: Add support for ArraySequence in length function" on Jun 9, 2016
@MarcCote
Contributor Author

@arokem @Garyfallidis: I guess we should support older NiBabel versions, right? We could add a check for whether the ArraySequence class exists, like we do for vtk (the have_vtk variable/flag).
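Something along these lines (a rough sketch only; the flag name here is illustrative):

try:
    from nibabel.streamlines import ArraySequence
    have_array_sequence = True
except ImportError:
    ArraySequence = None
    have_array_sequence = False

# Callers could then branch on the flag, e.g.:
# if have_array_sequence and isinstance(streamlines, ArraySequence):
#     ...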

@arokem
Contributor

arokem commented Jun 14, 2016

Yes. We'll want to support older versions of nibabel for now. I think that we should maintain backwards compatibility for at least a year after these new features are in a nibabel release. Possibly even longer -- @Garyfallidis: what are your thoughts on the duration of back-compatibility in this case?

Your method seems reasonable to me, but you can also directly query the nibabel version string with LooseVersion (see https://github.com/nipy/dipy/blob/master/dipy/core/optimize.py#L14).
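For example, something along these lines (just a sketch; the flag name is up to you):

from distutils.version import LooseVersion
import nibabel

# True when the installed nibabel predates the new streamlines API
NIBABEL_LESS_2_1 = LooseVersion(nibabel.__version__) < LooseVersion('2.1')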

@arokem
Contributor

arokem commented Jun 14, 2016

That all said, it's a good thing to start integrating this! Users living on the cutting edge should be able to start reaping the benefits of all your hard work!

@MarcCote
Contributor Author

Thanks @arokem, I will definitely have a look at LooseVersion. It seems to be the right approach.

@MarcCote
Contributor Author

I added a check for Nibabel's version, but currently Travis only installs Nibabel 2.0.2 and not master. @arokem, do you know an easy way to add new Travis bots that will use Nibabel's master instead? Maybe one for Python 2 and another for Python 3.

@arokem
Contributor

arokem commented Jun 16, 2016

I think that the best way would be to have a nibabel release before merging this work into dipy.

@matthew-brett : any plans for upcoming nibabel release?

@matthew-brett
Contributor

I plan to do a release in the next few weeks.

@MarcCote
Contributor Author

@arokem, but we still need additional Travis bots to test that Dipy works properly with an older Nibabel, right?

@arokem
Contributor

arokem commented Jun 16, 2016

Yes, we already have a bot devoted to testing old versions of our dependencies: https://github.com/nipy/dipy/blob/master/.travis.yml#L32, which specifies the oldest version of each of these libraries that we support.

@coveralls

coveralls commented Jun 16, 2016

Coverage Status

Coverage decreased (-0.5%) to 82.389% when pulling cadf229 on MarcCote:enh_support_arraysequence_for_length into 3abb871 on nipy:master.

@codecov-io

codecov-io commented Jun 16, 2016

Current coverage is 85.87% (diff: 100%)

Merging #1076 into master will increase coverage by 0.01%

@@             master      #1076   diff @@
==========================================
  Files           214        214          
  Lines         25862      25882    +20   
  Methods           0          0          
  Messages          0          0          
  Branches       2642       2642          
==========================================
+ Hits          22206      22226    +20   
  Misses         3003       3003          
  Partials        653        653          

Powered by Codecov. Last update d489c86...a531a7a

@Garyfallidis
Contributor

Hi all,

Because the new nibabel release will solve a critical issue with the memory management of generated or loaded streamlines, I don't think we can do anything other than change our minimum nibabel requirement to 2.1 immediately after nibabel gets released.

Remember that with what we currently have, someone who has generated 5 GB of streamlines as lists of numpy arrays (currently the only option) will actually be using 10+ GB of memory. This memory will stay around even if the user deletes the streamlines, making the program extremely hungry for resources. We need to inform our users about that and ask them to update nibabel. The new Streamlines API is designed to completely replace our current representation of streamlines without changing the user experience.

If we continue supporting older versions, that means we will still carry this critical memory issue for a long time, as the memory will not be freed properly when loading the data with nibabel.

I don't think this is in our hands any more; we have to change our minimum requirements ASAP. People these days generate bigger and bigger tractograms, and we need to be careful with memory management.

In summary, I think it would be a bad idea to continue supporting versions of nibabel older than 2.1 (the next release).

We also need to make sure that the new Streamlines API replaces all existing usages of streamlines as lists of arrays. A true call to action!

Finally, I suggest that this PR add a couple of Travis bots that use the master version of nibabel until nibabel gets released. That will help us move faster with the necessary API changes inside DIPY.
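For those unfamiliar with the new container: ArraySequence keeps all points in a single contiguous buffer plus per-streamline offsets and lengths (the _data, _offsets and _lengths attributes that come up later in this thread), which is what avoids the per-array overhead of a plain list. A rough illustration of the idea, not nibabel's actual implementation:

import numpy as np

# All points from all streamlines live in one contiguous 2D buffer ...
points = np.array([[0., 0., 0.], [1., 0., 0.], [2., 0., 0.],   # streamline 0
                   [0., 1., 0.], [0., 2., 0.]])                # streamline 1
# ... and each streamline is just an (offset, length) pair into that buffer.
offsets = np.array([0, 3], dtype=np.intp)
lengths = np.array([3, 2], dtype=np.intp)

first = points[offsets[0]:offsets[0] + lengths[0]]  # a view, no copy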

@arokem
Contributor

arokem commented Jun 17, 2016

What you propose seems problematic to me for a few reasons (in increasing order of realism):

  1. There might be people relying on Dipy who are not ready to put in the effort to move forward at this point.
  2. New functionality in nibabel might still need to be battle-tested before we're ready to adopt it whole hog. What do we do if the nibabel API changes, because we discover unanticipated problems with the current API? For this reason, I would also think that we need at least one more release supporting only the old API before we make a release that incorporates this API. In other words, slow and steady with this PR.
  3. I don't believe we have the resources to respond to your "call to action". I think that realistically it will take time until we can move everything here to the new API. We are mostly really busy with GSoC projects and with the PRs that are already in the review process. To wit, even tiny PRs proposing only a few lines of code changes (Drop26 #1075, Faster dti odf #1064, Conda install #1052, BF - peaks_from_model with nbr_processes <= 0 #1046, RF - move direction getters to dipy/direction/ #1042, and Slicer fix #944 are a few examples) can languish for months before they get reviewed and merged.

I understand the importance of moving forward with this, and I would definitely want to make this an option for users who can move forward quickly with their own code (some of my collaborators would definitely enjoy this!), but I think that we need to consider what effort we can realistically make, and what the potential pitfalls are.

How about the following plan: it's been a while since we made a release, so as soon as we can merge a few of the long-languishing PRs (including #740, ...), we should make a release. Then we can move ahead with GSoC, merging that work towards the beginning of fall and making another release. Then, in the fall, we start working on this forward thrust, taking a long release cycle to focus on this and the ensuing changes, really testing things out on our own master branch and figuring the kinks out. If that all goes well, we should be able to set more radical requirements in early 2017 (such as dropping support for nibabel < 2.1), with sufficient time for people to get used to these ideas.

@MarcCote
Contributor Author

I restarted the build, but Travis still doesn't pick up the latest Nibabel 2.1 release. @matthew-brett, any idea why? See https://travis-ci.org/nipy/dipy/jobs/138160759#L211

@Garyfallidis
Contributor

Hey @matthew-brett and @arokem, something does not allow Travis to use the latest released version of nibabel, which is 2.1. Do you know why this is happening?

@MarcCote
Contributor Author

MarcCote commented Nov 2, 2016

Should I use https://github.com/matthew-brett/travis-wheel-builder to update the Travis wheels, or does it have nothing to do with that?

@matthew-brett
Contributor

Yes, that is what you'd need to do - the build script only looks for wheels at http://travis-wheels.scikit-image.org/ - so you need to build and upload wheels for new versions.

@MarcCote
Contributor Author

MarcCote commented Nov 2, 2016

@matthew-brett I'm not sure of the procedure. I forked your repo, made a modification (TO_BUILD="nibabel") and pushed the changes to my fork (which has Travis enabled), but Travis is failing to deploy:
https://travis-ci.org/MarcCote/travis-wheel-builder/jobs/172778101#L397

Should I make a PR to your repo?
Edit: Yes, I needed to make a PR! 👍

@MarcCote MarcCote force-pushed the enh_support_arraysequence_for_length branch from cadf229 to 70c2235 on November 2, 2016 at 23:54
@coveralls

coveralls commented Nov 3, 2016

Coverage Status

Coverage increased (+0.6%) to 87.993% when pulling 70c2235 on MarcCote:enh_support_arraysequence_for_length into 0548479 on nipy:master.

@MarcCote
Contributor Author

MarcCote commented Nov 3, 2016

The decrease in coverage is due to the benchmark not being executed. There is also missing coverage for other benchmarks in Dipy. See PR #1141.

@MarcCote MarcCote force-pushed the enh_support_arraysequence_for_length branch from 70c2235 to 34e1814 on November 3, 2016 at 14:43
@coveralls

Coverage Status

Coverage increased (+0.01%) to 87.993% when pulling 34e1814 on MarcCote:enh_support_arraysequence_for_length into e8eeb70 on nipy:master.

@arokem
Contributor

arokem commented Nov 3, 2016

Looks like that's fine now, with the merge of #1141. You rebased?

@MarcCote
Contributor Author

MarcCote commented Nov 3, 2016

@arokem yep.

@arokem
Contributor

arokem commented Dec 13, 2016

Sorry - this has languished here for a while. Looks ready to go from my point of view. @MarcCote: does anything more need to be done here before I merge?

@MarcCote
Contributor Author

It's all good, unless you have some suggestions about how ArraySequences are handled (it is done in its own function) in https://github.com/nipy/dipy/pull/1076/files#diff-7bb0bef654a39db09e6f7a55d43b41ba, because there will be more of these coming (one each for compress and set_number_of_points).

@arokem
Contributor

arokem commented Dec 16, 2016

I guess we can get rid of the checks for nibabel < 2.1, if we go ahead with #1157 as is.

@MarcCote
Contributor Author

That would be really nice. Indeed :).

@arokem arokem left a comment

I guess you can remove all this stuff now :-)


import nibabel

NIBABEL_LESS_2_1 = LooseVersion(nibabel.__version__) < '2.1'

Seems like this is no longer necessary

from dipy.tracking.streamline import (set_number_of_points,
length,
compress_streamlines)
from dipy.tracking.tests.test_streamline import (set_number_of_points_python,
length_python,
compress_streamlines_python)

if not NIBABEL_LESS_2_1:

Same here

assert_array_equal([length_python(s) for s in DATA["streamlines"]],
length(DATA["streamlines"]))

if not NIBABEL_LESS_2_1:

And here

from libc.math cimport sqrt

from dipy.tracking import NIBABEL_LESS_2_1

This is as well

True
>>> length([])
0.0
>>> length(np.array([[1, 2, 3]]))
0.0

'''
if not NIBABEL_LESS_2_1 and isinstance(streamlines, Streamlines):

And this

@@ -11,9 +11,10 @@
from numpy.testing import (assert_array_equal, assert_array_almost_equal,
assert_raises, run_module_suite)

from dipy.tracking import NIBABEL_LESS_2_1

And this

@@ -24,6 +25,9 @@
orient_by_rois,
values_from_volume)

if not NIBABEL_LESS_2_1:

And this


for i, s in enumerate(streamlines_64bit):
length_streamline_python = length_python(s)
assert_array_almost_equal(length_streamlines_cython[i],
length_streamline_python)

# ArraySequence
if not NIBABEL_LESS_2_1:

And all of this

@arokem
Contributor

arokem commented Jan 7, 2017

Awesome. Thanks for doing this. I will let Travis do its thing, but if all goes well, and if no one objects, I'll merge this on Monday.

from dipy.data import get_data
from nibabel import trackvis as tv

from dipy.tracking import NIBABEL_LESS_2_1

Whoops - missed this one!

DATA['streamlines'] = generate_streamlines(nb_streamlines,
min_nb_points, max_nb_points,
rng=rng)
if not NIBABEL_LESS_2_1:

And this one too!

@MarcCote MarcCote force-pushed the enh_support_arraysequence_for_length branch from 930c73d to 7a78481 on January 7, 2017 at 23:44
@MarcCote
Contributor Author

MarcCote commented Jan 7, 2017

Yep, I missed two. I also rebased just to be sure.

@coveralls

Coverage Status

Coverage increased (+0.009%) to 88.397% when pulling a531a7a on MarcCote:enh_support_arraysequence_for_length into d489c86 on nipy:master.

@arokem arokem merged commit e414987 into dipy:master Jan 10, 2017
@MarcCote MarcCote deleted the enh_support_arraysequence_for_length branch January 11, 2017 20:48
@arokem
Contributor

arokem commented Jan 24, 2017

@MarcCote -- the following bot failure might be related to this PR: http://nipy.bic.berkeley.edu/builders/dipy-bdist64-27/builds/113/steps/shell_12/logs/stdio

Do you understand what's going on there?

@MarcCote
Contributor Author

@arokem oops, I didn't see your last comment. It probably has to do with the fact that we are now using np.intp for ArraySequence._lengths and ._offsets, and that wasn't the case when I made this PR in June :-/. I'll make a patch.

@MarcCote
Contributor Author

@arokem @matthew-brett
Changing the signature of c_arclengths_from_arraysequence (https://github.com/nipy/dipy/blob/master/dipy/tracking/streamlinespeed.pyx#L39) like this

cdef void c_arclengths_from_arraysequence(Streamline points,
                                          np.npy_intp[:] offsets, np.npy_intp[:] lengths,
                                          double[:] arclengths) nogil:
...

doesn't seem to work on my Linux machine (64 bits) :/.

Now, I'm getting the same kind of error as the Windows machine.

ERROR: dipy.tracking.tests.test_streamline.test_length
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/marc/.local/env/p2/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/data/research/src/dipy/dipy/tracking/tests/test_streamline.py", line 374, in test_length
    length_streamline_arrseq = length(Streamlines([streamline]))
  File "dipy/tracking/streamlinespeed.pyx", line 109, in dipy.tracking.streamlinespeed.length (dipy/tracking/streamlinespeed.c:2607)
    streamlines._offsets,
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

It seems np.npy_intp (the Cython identifier) is not the same as np.intp. Am I missing something? Maybe I should try reinstalling Cython and numpy (I'm using virtualenv; maybe the two libraries are out of sync with each other)?

Here's the definition of an ArraySequence:
https://github.com/MarcCote/nibabel/blob/master/nibabel/streamlines/array_sequence.py#L77

@matthew-brett
Contributor

I think Cython's intp is always the same as numpy's intp, but intp might not be the same as int64 (for example). On my Mac, the offsets for the test have int64 data type rather than intp - could that be the problem?

@MarcCote
Contributor Author

On my machine intp is int64 but Cython's np.npy_intp seems to be int32 :-/.

In [10]: np.intp
Out[10]: numpy.int64
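
Here is the kind of quick check I'm running to compare the dtypes involved (a hypothetical session; the output depends on the platform, the numpy build and how nibabel allocates the arrays; _offsets and _lengths are the private attributes mentioned above):

import numpy as np
from nibabel.streamlines import ArraySequence

seq = ArraySequence([np.zeros((3, 3))])
print(np.dtype(np.intp))   # e.g. int64 on a 64-bit build
print(seq._offsets.dtype)  # dtype nibabel actually uses for the offsets
print(seq._lengths.dtype)  # dtype nibabel actually uses for the lengths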

@matthew-brett
Contributor

Hmm, yes, it works correctly for me, but only for Cython >= 0.25.1.

This is not relevant here, I think, but it's also safer to use libc.stdlib for the imports of malloc and free:

diff --git a/dipy/tracking/streamlinespeed.pyx b/dipy/tracking/streamlinespeed.pyx
index 06af294..3006b73 100644
--- a/dipy/tracking/streamlinespeed.pyx
+++ b/dipy/tracking/streamlinespeed.pyx
@@ -4,6 +4,7 @@
 import cython
 import numpy as np
 from libc.math cimport sqrt
+from libc.stdlib cimport malloc, free
 
 cimport numpy as np
 
@@ -13,11 +14,6 @@ from dipy.tracking import Streamlines
 cdef extern from "dpy_math.h" nogil:
     bint dpy_isnan(double x)
 
-cdef extern from "stdlib.h" nogil:
-    ctypedef unsigned long size_t
-    void free(void * ptr)
-    void * malloc(size_t size)
-
 
 cdef double c_length(Streamline streamline) nogil:
     cdef:
@@ -37,7 +33,8 @@ cdef double c_length(Streamline streamline) nogil:
 
 
 cdef void c_arclengths_from_arraysequence(Streamline points,
-                                          long[:] offsets, long[:] lengths,
+                                          np.npy_intp[:] offsets,
+                                          np.npy_intp[:] lengths,
                                           double[:] arclengths) nogil:
     cdef:
         np.npy_intp i, j, k

@MarcCote
Contributor Author

MarcCote commented Apr 19, 2017

Ok, I've reinstalled Cython 0.21 (the version I currently have), did a make clean and a make ext. The tests are passing now on my machine. I'll make a PR.
EDIT: apparently I didn't do a full make clean :/
