New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add support for ArraySequence in`set_number_of_points` function #1348

Merged
merged 6 commits into from Oct 15, 2017

Conversation

Projects
None yet
4 participants
@MarcCote
Contributor

MarcCote commented Oct 5, 2017

Second small PR that aims to support Nibabel's ArraySequenceobjects (the first one was #1076).

Specifically, this PR improves dipy.tracking.streamlinespeed.set_number_of_points to support ArraySequence objects. This gives pretty good speedup even over the current Cython version (all speedups are with respect to the Python implementation):

Timing set_number_of_points() in 100,000 streamlines.
Cython time: 0.590 sec
Python time: 61.76 sec
Speed up of 104.68x
Cython time (ArrSeq): 0.13 sec
Speed up of 475.08x

@MarcCote MarcCote requested a review from jchoude Oct 5, 2017

MarcCote added some commits Oct 5, 2017

@MarcCote MarcCote force-pushed the MarcCote:enh_support_arraysequence_set_number_of_points branch from ac87cbf to 56c943b Oct 5, 2017

@codecov-io

This comment has been minimized.

codecov-io commented Oct 5, 2017

Codecov Report

Merging #1348 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1348      +/-   ##
==========================================
- Coverage   87.02%   86.98%   -0.05%     
==========================================
  Files         228      228              
  Lines       28841    29074     +233     
  Branches     3100     3131      +31     
==========================================
+ Hits        25099    25289     +190     
- Misses       3037     3072      +35     
- Partials      705      713       +8
Impacted Files Coverage Δ
dipy/tracking/tests/test_streamline.py 98.16% <100%> (+0.02%) ⬆️
dipy/testing/__init__.py 77.77% <0%> (-6.84%) ⬇️
dipy/viz/ui.py 90.24% <0%> (-2.11%) ⬇️
dipy/viz/tests/test_ui.py 83.53% <0%> (-1.7%) ⬇️
dipy/core/graph.py 73.8% <0%> (-1.2%) ⬇️
dipy/tracking/life.py 97.8% <0%> (+0.01%) ⬆️
dipy/tracking/streamline.py 97.95% <0%> (+0.01%) ⬆️
dipy/core/geometry.py 90.29% <0%> (+0.03%) ⬆️
dipy/tracking/utils.py 86.93% <0%> (+0.03%) ⬆️
dipy/viz/interactor.py 98.12% <0%> (+0.07%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea3a999...8df3d17. Read the comment docs.

@skoudoro

This comment has been minimized.

Member

skoudoro commented Oct 9, 2017

Hi @MarcCote ,

I got this error when I run the tests on my environment : Windows 10, Python 3.6, anaconda, Cython 0.25.2, numpy 1.12.1

=================================================================
ERROR: dipy.tracking.tests.test_streamline.test_set_number_of_points
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\skoudoro\AppData\Local\Continuum\Anaconda3\envs\dipy-dev-3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Users\skoudoro\Devel\dipy\dipy\tracking\tests\test_streamline.py", line 239, in test_set_number_of_points
    new_streamlines_as_seq_cython = set_number_of_points(arrseq, nb_points)
  File "dipy\tracking\streamlinespeed.pyx", line 335, in dipy.tracking.streamlinespeed.set_number_of_points
    streamlines._data, streamlines._offsets,
ValueError: Buffer dtype mismatch, expected 'npy_intp' but got 'long'

if dtype == np.float32:
c_set_number_of_points_from_arraysequence[float2d](
streamlines._data, streamlines._offsets,

This comment has been minimized.

@skoudoro

skoudoro Oct 9, 2017

Member

It seems you are using the protected variables in many places. So, I just wonder why there is not get/set property for these variables on Nibabel?

@skoudoro

This comment has been minimized.

Member

skoudoro commented Oct 9, 2017

Hi @MarcCote ,

I try to run the benchmarks, but it fails. Did I do something wrong ?

In [1]: import dipy.tracking as dt

In [2]: dt.bench()
Running benchmarks for dipy.tracking
NumPy version 1.13.1
NumPy relaxed strides checking option: True
NumPy is installed in C:\Users\skoudoro\AppData\Local\Continuum\Anaconda3\envs\dipy-dev-3\lib\site-packages\numpy
Python version 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)]
nose version 1.3.7
Timing set_number_of_points() with 100,000 streamlines.
Cython time: 0.430 sec
ETiming length() with 200,000 streamlines.
Python time: 2.03 sec
Cython time: 0.280 sec
Speed up of 7.25x
FTiming compress_streamlines() in Cython (300 streamlines)
Cython time: 0.02sec
Python time: 0.0sec
Speed up of 0.0x
.
======================================================================
ERROR: dipy.tracking.benchmarks.bench_streamline.bench_set_number_of_points
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\skoudoro\AppData\Local\Continuum\Anaconda3\envs\dipy-dev-3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Users\skoudoro\Devel\dipy\dipy\tracking\benchmarks\bench_streamline.py", line 69, in bench_set_number_of_points
    " for s in streamlines]", repeat)
  File "C:\Users\skoudoro\AppData\Local\Continuum\Anaconda3\envs\dipy-dev-3\lib\site-packages\numpy\testing\utils.py", line 1315, in measure
    exec(code, globs, locs)
  File "Test name: None ", line 1, in <module>
    """ Benchmarks for functions related to streamline
  File "Test name: None ", line 1, in <listcomp>
    """ Benchmarks for functions related to streamline
NameError: name 'nb_points' is not defined

======================================================================
FAIL: dipy.tracking.benchmarks.bench_streamline.bench_length
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\skoudoro\AppData\Local\Continuum\Anaconda3\envs\dipy-dev-3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Users\skoudoro\Devel\dipy\dipy\tracking\benchmarks\bench_streamline.py", line 103, in bench_length
    length(DATA["streamlines"]))
  File "C:\Users\skoudoro\AppData\Local\Continuum\Anaconda3\envs\dipy-dev-3\lib\site-packages\numpy\testing\utils.py", line 854, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "C:\Users\skoudoro\AppData\Local\Continuum\Anaconda3\envs\dipy-dev-3\lib\site-packages\numpy\testing\utils.py", line 778, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

(mismatch 57.825%)
 x: array([ 36.168137,  14.202025,  31.208102, ...,  30.387572,  26.99025 ,
        10.31078 ])
 y: array([ 36.168137,  14.202025,  31.208102, ...,  30.387572,  26.99025 ,
        10.31078 ])

----------------------------------------------------------------------
Ran 3 tests in 3.124s

FAILED (errors=1, failures=1)
Out[2]: False
@MarcCote

This comment has been minimized.

Contributor

MarcCote commented Oct 10, 2017

@skoudoro thanks, I forgot to cast the offsets and lengths to np.intp. Regarding accessing those protected variables, you are right. We should provide get/set methods for those.

Regarding the benchmark function, I'm a bit confused. I don't understand why it says nb_points no defined :(. I don't have this problem on my linux machine. I use the following command to run the benchmark: nosetests -s --match '(?:^|[\\b_\\.//-])[Bb]ench' dipy/tracking/benchmarks/bench_streamline.py:bench_set_number_of_points.

The array mismatch error when benchmarking the length function seems to be a precision error (maybe a Windows vs. Linux thing).

@MarcCote

This comment has been minimized.

Contributor

MarcCote commented Oct 10, 2017

@matthew-brett any suggestion on the best way to expose ArraySequence._data, ArraySequence._offsets and ArraySequence._lengths in NiBabel? Should we just make them public? The main reason why I made them private in the first place was to present a cleaner API to the user, but clearly it can be useful to play with those attributes directly (for advanced users let's say).

@skoudoro

Hi @MarcCote , Thanks for the correction! All tests passed on Windows now.

  • Concerning benchmark, it works with IPython. I do not know why it does not work with nosetests command line on windows. that's really weird. Anyway, I obtain these values:
Cython time: 0.450 sec
Python time: 34.37 sec
Speed up of 76.38x
Cython time (ArrSeq): 0.120 sec
Speed up of 286.42x
  • Concerning array mismatch, I made a comment below
  • And concerning the protected variable, thanks for this information
@@ -98,7 +104,7 @@ def bench_length():

This comment has been minimized.

@skoudoro

skoudoro Oct 11, 2017

Member

Because of windows error, can you replace it by
assert_array_almost_equal([length_python(s) for s in DATA["length(DATA["streamlines"]), decimal=12)

This comment has been minimized.

@MarcCote

MarcCote Oct 11, 2017

Contributor

It should be fine now. I used assert_array_almost_equal without default value for decimal as we do in the tests.

This comment has been minimized.

@skoudoro

skoudoro Oct 11, 2017

Member

perfect ! Thanks !

@@ -305,7 +363,7 @@ def set_number_of_points(streamlines, nb_points=3):
raise ValueError("All streamlines must have at least 2 points.")

# Allocate memory for each modified streamline
modified_streamlines = []
new_streamlines = []

This comment has been minimized.

@Garyfallidis

Garyfallidis Oct 14, 2017

Member

@MarcCote shouldn't this be Streamlines rather than []?

This comment has been minimized.

@MarcCote

MarcCote Oct 15, 2017

Contributor

This part of the code handles the case where a list of ndarrays was provided. In other words, if the user calls set_number_of_points on a list of ndarrays, we return a list of modified ndarrays. If the user provides an ArraySequence (a.k.a. dipy.tracking.Streamlines) object, we return a modified ArraySequence. Let me know if that behavior is not what you were expecting.

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 14, 2017

Hi @MarcCote! This looks ready to be merged apart from a minor issue. It seems there are still some lists rather than Streamlines (ArraySequence) in streamlinespeed.pyx. Can you check?

@Garyfallidis

This comment has been minimized.

Member

Garyfallidis commented Oct 15, 2017

Okay thank you for the explanation. It makes sense to have a plan B. We may want to enforce later having only Streamlines as input. Doesn't need to happen now.

@Garyfallidis Garyfallidis merged commit 418108f into nipy:master Oct 15, 2017

3 checks passed

codecov/patch 100% of diff hit (target 87.02%)
Details
codecov/project Absolute coverage decreased by -0.04% but relative coverage increased by +12.97% compared to ea3a999
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@MarcCote MarcCote deleted the MarcCote:enh_support_arraysequence_set_number_of_points branch Oct 17, 2017

ShreyasFadnavis pushed a commit to ShreyasFadnavis/dipy that referenced this pull request Sep 20, 2018

Merge pull request nipy#1348 from MarcCote/enh_support_arraysequence_…
…set_number_of_points

ENH: Add support for ArraySequence in`set_number_of_points` function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment