
ENH: Add support for ArraySequence in length function #1076

Merged
6 commits merged from MarcCote:enh_support_arraysequence_for_length into master on Jan 10, 2017

Conversation

MarcCote
Contributor

@MarcCote MarcCote commented Jun 9, 2016

This small PR is the first of a series of PRs aiming to support Nibabel's ArraySequence objects.

Specifically, this PR improves dipy.tracking.streamlinespeed.length to support ArraySequence objects. This gives a pretty good speedup even over the current Cython version (all speedups are with respect to the Python implementation):

Timing length() with 20,000 streamlines.
Python time: 3.9sec
Cython time: 0.46sec
Speed up of 8.41x
Cython time (ArraySequence): 0.08sec
Speed up of 48.38x

Edit: of course this needs the master branch of Nibabel or eventually v2.1.
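For anyone who wants to try it, usage looks roughly like this (a toy sketch only; it assumes nibabel >= 2.1 or current master for the streamlines module, and the random data is purely illustrative):

import numpy as np
from nibabel.streamlines import ArraySequence  # needs nibabel >= 2.1 (or master)
from dipy.tracking.streamline import length

# Toy data: 20,000 short random streamlines packed into one ArraySequence.
rng = np.random.RandomState(42)
streamlines = ArraySequence(rng.rand(rng.randint(10, 100), 3)
                            for _ in range(20000))

lengths = length(streamlines)  # one arc length per streamline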

@MarcCote changed the title from "Add support for ArraySequence in length function" to "ENH: Add support for ArraySequence in length function" on Jun 9, 2016
@MarcCote
Contributor Author

@arokem @Garyfallidis: I guess we should support older NiBabel versions, right? We could add a check for whether the ArraySequence class exists, like we do for vtk (the have_vtk variable/flag).
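Something along these lines (a rough sketch only; the flag name here is illustrative):

try:
    from nibabel.streamlines import ArraySequence
    have_array_sequence = True
except ImportError:
    ArraySequence = None
    have_array_sequence = False

# Callers could then branch on the flag, e.g.:
# if have_array_sequence and isinstance(streamlines, ArraySequence):
#     ...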

@arokem
Contributor

arokem commented Jun 14, 2016

Yes. We'll want to support older versions of nibabel for now. I think that we should maintain backwards compatibility for at least a year after these new features are in a nibabel release. Possibly even longer -- @Garyfallidis: what are your thoughts on the duration of back-compatibility in this case?

Your method seems reasonable to me, but you can also directly query the nibabel version string with LooseVersion (see https://github.com/nipy/dipy/blob/master/dipy/core/optimize.py#L14).
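For example, something along these lines (just a sketch; the flag name is up to you):

from distutils.version import LooseVersion
import nibabel

# True when the installed nibabel predates the new streamlines API
NIBABEL_LESS_2_1 = LooseVersion(nibabel.__version__) < LooseVersion('2.1')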

@arokem
Contributor

arokem commented Jun 14, 2016

That all said, it's a good thing to start integrating this! Users living on the cutting edge should be able to start reaping the benefits of all your hard work!

@MarcCote
Contributor Author

Thanks @arokem, I will definitely have a look at LooseVersion. It seems to be the right approach.

@MarcCote
Contributor Author

I added a check for Nibabel's version, but currently Travis only installs Nibabel 2.0.2 and not master. @arokem, do you know an easy way to add new Travis bots that will use Nibabel's master instead? Maybe one for Python 2 and another for Python 3.

@arokem
Contributor

arokem commented Jun 16, 2016

I think that the best way would be to have a nibabel release before merging this work into dipy.

@matthew-brett : any plans for upcoming nibabel release?

@matthew-brett
Contributor

I plan to do a release in the next few weeks.

@MarcCote
Contributor Author

@arokem, but we still need additional Travis bots to test that Dipy works properly with an older Nibabel, right?

@arokem
Contributor

arokem commented Jun 16, 2016

Yes, we already have a bot devoted to testing old versions of our dependencies: https://github.com/nipy/dipy/blob/master/.travis.yml#L32, which specifies the oldest version of each of these libraries that we support.

@coveralls

coveralls commented Jun 16, 2016

Coverage Status

Coverage decreased (-0.5%) to 82.389% when pulling cadf229 on MarcCote:enh_support_arraysequence_for_length into 3abb871 on nipy:master.

@codecov-io

codecov-io commented Jun 16, 2016

Current coverage is 85.87% (diff: 100%)

Merging #1076 into master will increase coverage by 0.01%

@@             master      #1076   diff @@
==========================================
  Files           214        214          
  Lines         25862      25882    +20   
  Methods           0          0          
  Messages          0          0          
  Branches       2642       2642          
==========================================
+ Hits          22206      22226    +20   
  Misses         3003       3003          
  Partials        653        653          

Powered by Codecov. Last update d489c86...a531a7a

@Garyfallidis
Contributor

Hi all,

Because the new nibabel release will solve a critical issue with the memory management of generated or loaded streamlines, I don't think we can do anything other than change our minimum nibabel requirement to 2.1 immediately after nibabel gets released.

Remember that with what we currently have, someone who has generated 5 GB of streamlines as lists of numpy arrays (currently the only option) will actually be using 10+ GB of memory. This memory will stay around even if the user deletes the streamlines, making the program extremely hungry for resources. We need to inform our users about that and ask them to update nibabel. The new Streamlines API is designed to completely replace our current representation of streamlines without changing the user experience.

If we continue supporting older versions, that means we will still carry this critical memory issue for a long time, as the memory will not be freed properly when loading the data with nibabel.

I don't think this is in our hands any more; we have to change our minimum requirements ASAP. People these days generate bigger and bigger tractograms, and we need to be careful with memory management.

In summary, I think it would be a bad idea to continue supporting versions of nibabel older than 2.1 (the next release).

We also need to make sure that the new Streamlines API replaces all existing usages of streamlines as lists of arrays. A true call to action!

Finally, I suggest that this PR add a couple of Travis bots that use the master version of nibabel until nibabel gets released. That will help us move faster with the necessary API changes inside DIPY.
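For those unfamiliar with the new container: ArraySequence keeps all points in a single contiguous buffer plus per-streamline offsets and lengths (the _data, _offsets and _lengths attributes that come up later in this thread), which is what avoids the per-array overhead of a plain list. A rough illustration of the idea, not nibabel's actual implementation:

import numpy as np

# All points from all streamlines live in one contiguous 2D buffer ...
points = np.array([[0., 0., 0.], [1., 0., 0.], [2., 0., 0.],   # streamline 0
                   [0., 1., 0.], [0., 2., 0.]])                # streamline 1
# ... and each streamline is just an (offset, length) pair into that buffer.
offsets = np.array([0, 3], dtype=np.intp)
lengths = np.array([3, 2], dtype=np.intp)

first = points[offsets[0]:offsets[0] + lengths[0]]  # a view, no copy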

@arokem
Contributor

arokem commented Jun 17, 2016

What you propose seems problematic to me for a few reasons (in increasing order of realism):

  1. There might be people relying on Dipy who are not ready to put in the effort to move forward at this point.
  2. New functionality in nibabel might still need to be battle-tested before we're ready to adopt it whole hog. What do we do if the nibabel API changes, because we discover unanticipated problems with the current API? For this reason, I would also think that we need at least one more release supporting only the old API before we make a release that incorporates this API. In other words, slow and steady with this PR.
  3. I don't believe we have the resources to respond to your "call to action". I think that realistically it will take time until we can move everything here to the new API. We are mostly really busy with GSoC projects and with the PRs that are already in the review process. To wit, even tiny PRs proposing only a few lines of code changes (Drop26 #1075, Faster dti odf #1064, Conda install #1052, BF - peaks_from_model with nbr_processes <= 0 #1046, RF - move direction getters to dipy/direction/ #1042, and Slicer fix #944 are a few examples) can languish for months before they get reviewed and merged.

I understand the importance of moving forward with this, and I would definitely want to make this an option for users who can move forward quickly with their own code (some of my collaborators would definitely enjoy this!), but I think that we need to consider what effort we can realistically make, and what the potential pitfalls are.

How about the following plan: it's been a while since we made a release, so as soon as we can merge a few of the long-languishing PRs (including #740, ...), we should make a release. Then we can move ahead with GSoC, merging that work towards the beginning of fall and making another release. Then, in the fall, we start working on this forward thrust, taking a long release cycle to focus on this and the ensuing changes, really testing things out on our own master branch and figuring the kinks out. If that all goes well, we should be able to set more radical requirements in early 2017 (such as dropping support for nibabel < 2.1), with sufficient time for people to get used to these ideas.

@MarcCote
Contributor Author

I restarted the build, but Travis still doesn't pick up the latest Nibabel 2.1 release. @matthew-brett, any idea why? See https://travis-ci.org/nipy/dipy/jobs/138160759#L211

@Garyfallidis
Contributor

Hey @matthew-brett and @arokem, something does not allow Travis to use the latest released version of nibabel, which is 2.1. Do you know why this is happening?

@MarcCote
Contributor Author

MarcCote commented Nov 2, 2016

Should I use https://github.com/matthew-brett/travis-wheel-builder to update the Travis wheels, or does it have nothing to do with that?

@matthew-brett
Contributor

Yes, that is what you'd need to do - the build script only looks for wheels at http://travis-wheels.scikit-image.org/ - so you need to build and upload wheels for new versions.

@MarcCote
Contributor Author

MarcCote commented Nov 2, 2016

@matthew-brett I'm not sure of the procedure. I forked your repo, made a modification (TO_BUILD="nibabel") and pushed the changes to my fork (which has Travis enabled), but Travis is failing to deploy:
https://travis-ci.org/MarcCote/travis-wheel-builder/jobs/172778101#L397

Should I make a PR to your repo?
Edit: Yes, I needed to make a PR! 👍

@MarcCote MarcCote force-pushed the enh_support_arraysequence_for_length branch from cadf229 to 70c2235 on November 2, 2016 at 23:54
@coveralls

coveralls commented Nov 3, 2016

Coverage Status

Coverage increased (+0.6%) to 87.993% when pulling 70c2235 on MarcCote:enh_support_arraysequence_for_length into 0548479 on nipy:master.

@MarcCote
Contributor Author

MarcCote commented Nov 3, 2016

The decrease in coverage is due to the benchmark not being executed. There is also missing coverage for other benchmarks in Dipy. See PR #1141.

@MarcCote MarcCote force-pushed the enh_support_arraysequence_for_length branch from 70c2235 to 34e1814 on November 3, 2016 at 14:43
@coveralls

Coverage Status

Coverage increased (+0.01%) to 87.993% when pulling 34e1814 on MarcCote:enh_support_arraysequence_for_length into e8eeb70 on nipy:master.

@arokem
Contributor

arokem commented Nov 3, 2016

Looks like that's fine now, with the merge of #1141. You rebased?

@MarcCote
Contributor Author

MarcCote commented Nov 3, 2016

@arokem yep.

@arokem
Contributor

arokem commented Dec 13, 2016

Sorry - this has languished here for a while. Looks ready to go from my point of view. @MarcCote: does anything more need to be done here before I merge?

@MarcCote
Contributor Author

It's all good, unless you have some suggestions about how ArraySequences are handled (it is done in its own function) in https://github.com/nipy/dipy/pull/1076/files#diff-7bb0bef654a39db09e6f7a55d43b41ba, because there will be more of these coming (one each for compress and set_number_of_points).

@arokem
Contributor

arokem commented Dec 16, 2016

I guess we can get rid of the checks for nibabel < 2.1, if we go ahead with #1157 as is.

@MarcCote
Contributor Author

That would be really nice. Indeed :).

@arokem arokem left a comment

I guess you can remove all this stuff now :-)


import nibabel

NIBABEL_LESS_2_1 = LooseVersion(nibabel.__version__) < '2.1'

Seems like this is no longer necessary

from dipy.tracking.streamline import (set_number_of_points,
length,
compress_streamlines)
from dipy.tracking.tests.test_streamline import (set_number_of_points_python,
length_python,
compress_streamlines_python)

if not NIBABEL_LESS_2_1:

Same here

assert_array_equal([length_python(s) for s in DATA["streamlines"]],
length(DATA["streamlines"]))

if not NIBABEL_LESS_2_1:

And here

from libc.math cimport sqrt

from dipy.tracking import NIBABEL_LESS_2_1

This is as well

True
>>> length([])
0.0
>>> length(np.array([[1, 2, 3]]))
0.0

'''
if not NIBABEL_LESS_2_1 and isinstance(streamlines, Streamlines):

And this

@@ -11,9 +11,10 @@
from numpy.testing import (assert_array_equal, assert_array_almost_equal,
assert_raises, run_module_suite)

from dipy.tracking import NIBABEL_LESS_2_1

And this

@@ -24,6 +25,9 @@
orient_by_rois,
values_from_volume)

if not NIBABEL_LESS_2_1:

And this


for i, s in enumerate(streamlines_64bit):
length_streamline_python = length_python(s)
assert_array_almost_equal(length_streamlines_cython[i],
length_streamline_python)

# ArraySequence
if not NIBABEL_LESS_2_1:

And all of this

@arokem
Contributor

arokem commented Jan 7, 2017

Awesome. Thanks for doing this. I will let Travis do its thing, but if all goes well, and if no one objects, I'll merge this on Monday.

from dipy.data import get_data
from nibabel import trackvis as tv

from dipy.tracking import NIBABEL_LESS_2_1

Whoops - missed this one!

DATA['streamlines'] = generate_streamlines(nb_streamlines,
min_nb_points, max_nb_points,
rng=rng)
if not NIBABEL_LESS_2_1:

And this one too!

@MarcCote MarcCote force-pushed the enh_support_arraysequence_for_length branch from 930c73d to 7a78481 on January 7, 2017 at 23:44
@MarcCote
Contributor Author

MarcCote commented Jan 7, 2017

Yep, I missed two. I also rebased just to be sure.

@coveralls

Coverage Status

Coverage increased (+0.009%) to 88.397% when pulling a531a7a on MarcCote:enh_support_arraysequence_for_length into d489c86 on nipy:master.

@arokem arokem merged commit e414987 into dipy:master Jan 10, 2017
@MarcCote MarcCote deleted the enh_support_arraysequence_for_length branch January 11, 2017 20:48
@arokem
Contributor

arokem commented Jan 24, 2017

@MarcCote -- the following bot failure might be related to this PR: http://nipy.bic.berkeley.edu/builders/dipy-bdist64-27/builds/113/steps/shell_12/logs/stdio

Do you understand what's going on there?

@MarcCote
Contributor Author

@arokem oops, I didn't see your last comment. It probably has to do with the fact that we are now using np.intp for ArraySequence._lengths and ._offsets, and that wasn't the case when I made this PR in June :-/. I'll make a patch.

@MarcCote
Contributor Author

@arokem @matthew-brett
Changing the signature of c_arclengths_from_arraysequence (https://github.com/nipy/dipy/blob/master/dipy/tracking/streamlinespeed.pyx#L39) like this

cdef void c_arclengths_from_arraysequence(Streamline points,
                                          np.npy_intp[:] offsets, np.npy_intp[:] lengths,
                                          double[:] arclengths) nogil:
...

doesn't seem to work on my Linux machine (64 bits) :/.

Now, I'm getting the same kind of error as the Windows machine.

ERROR: dipy.tracking.tests.test_streamline.test_length
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/marc/.local/env/p2/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/data/research/src/dipy/dipy/tracking/tests/test_streamline.py", line 374, in test_length
    length_streamline_arrseq = length(Streamlines([streamline]))
  File "dipy/tracking/streamlinespeed.pyx", line 109, in dipy.tracking.streamlinespeed.length (dipy/tracking/streamlinespeed.c:2607)
    streamlines._offsets,
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

It seems np.npy_intp (the Cython identifier) is not the same as np.intp. Am I missing something? Maybe I should try reinstalling Cython and numpy (I'm using virtualenv; maybe the two libraries are out of sync with each other)?

Here's the definition of an ArraySequence:
https://github.com/MarcCote/nibabel/blob/master/nibabel/streamlines/array_sequence.py#L77

@matthew-brett
Contributor

I think Cython's intp is always the same as numpy's intp, but intp might not be the same as int64 (for example). On my Mac, the offsets for the test have int64 data type rather than intp - could that be the problem?

@MarcCote
Contributor Author

On my machine intp is int64 but Cython's np.npy_intp seems to be int32 :-/.

In [10]: np.intp
Out[10]: numpy.int64
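
Here is the kind of quick check I'm running to compare the dtypes involved (a hypothetical session; the output depends on the platform, the numpy build and how nibabel allocates the arrays; _offsets and _lengths are the private attributes mentioned above):

import numpy as np
from nibabel.streamlines import ArraySequence

seq = ArraySequence([np.zeros((3, 3))])
print(np.dtype(np.intp))   # e.g. int64 on a 64-bit build
print(seq._offsets.dtype)  # dtype nibabel actually uses for the offsets
print(seq._lengths.dtype)  # dtype nibabel actually uses for the lengths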

@matthew-brett
Contributor

Hmm, yes, it works correctly for me, but only for Cython >= 0.25.1.

This is not relevant here, I think, but it's also safer to use libc.stdlib for the imports of malloc and free:

diff --git a/dipy/tracking/streamlinespeed.pyx b/dipy/tracking/streamlinespeed.pyx
index 06af294..3006b73 100644
--- a/dipy/tracking/streamlinespeed.pyx
+++ b/dipy/tracking/streamlinespeed.pyx
@@ -4,6 +4,7 @@
 import cython
 import numpy as np
 from libc.math cimport sqrt
+from libc.stdlib cimport malloc, free
 
 cimport numpy as np
 
@@ -13,11 +14,6 @@ from dipy.tracking import Streamlines
 cdef extern from "dpy_math.h" nogil:
     bint dpy_isnan(double x)
 
-cdef extern from "stdlib.h" nogil:
-    ctypedef unsigned long size_t
-    void free(void * ptr)
-    void * malloc(size_t size)
-
 
 cdef double c_length(Streamline streamline) nogil:
     cdef:
@@ -37,7 +33,8 @@ cdef double c_length(Streamline streamline) nogil:
 
 
 cdef void c_arclengths_from_arraysequence(Streamline points,
-                                          long[:] offsets, long[:] lengths,
+                                          np.npy_intp[:] offsets,
+                                          np.npy_intp[:] lengths,
                                           double[:] arclengths) nogil:
     cdef:
         np.npy_intp i, j, k

@MarcCote
Contributor Author

MarcCote commented Apr 19, 2017

Ok, I've reinstalled Cython 0.21 (the version I currently have), did a make clean and a make ext. The tests are passing now on my machine. I'll make a PR.
EDIT: apparently I didn't do a full make clean :/
