Change the writeable flag back to its original state when finished. #431

Merged
merged 7 commits into nipy:master from MarcCote:fix_streamlinespeed on Oct 9, 2014

Conversation

@MarcCote
Contributor

MarcCote commented Oct 1, 2014

Right now, using the Cython versions of length and set_number_of_points on read-only NumPy arrays (as provided by NiBabel) fails. The reason is that a memoryview can only be used with a writable NumPy array (for more details, see https://mail.python.org/pipermail/cython-devel/2013-February/003394.html).

This PR makes sure those Cython functions set the writeable flag of the NumPy arrays to True before processing them and change it back to its original state once done.

streamlines_length[i] = _length[double2d](streamlines[i])
streamline.setflags(write=writeable)
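As a rough illustration, the save/restore pattern looks like the pure-Python sketch below (run_on and its sum() body are hypothetical stand-ins for the actual Cython kernels). As the discussion later notes, setflags(write=True) can itself raise ValueError for views of read-only arrays, so this only works when the array owns its memory.

```python
import numpy as np

def run_on(arr):
    # Hypothetical stand-in for length/set_number_of_points: remember the
    # original writeable state, enable writing so the Cython memoryview
    # will accept the buffer, then restore the flag when done.
    was_writeable = arr.flags.writeable
    arr.setflags(write=True)
    try:
        total = float(arr.sum())  # placeholder for the real kernel
    finally:
        arr.setflags(write=was_writeable)
    return total

streamline = np.ones((100, 3), dtype=np.float32)
streamline.setflags(write=False)   # e.g. an array marked read-only on load
print(run_on(streamline))          # 300.0
print(streamline.flags.writeable)  # False again afterwards
```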


@MrBago

MrBago Oct 1, 2014

Contributor

Sorry this is probably me just being stupid, but what is streamline here? It doesn't seem like the streamline variable is being updated in the loop.


@MarcCote

MarcCote Oct 1, 2014

Contributor

Woa, no you are not, I am. It should be streamlines[i] instead of streamline.

@MrBago


Contributor

MrBago commented Oct 1, 2014

My issue with this approach is that it doesn't solve the problem in general. Read-only arrays sometimes raise an error if you try to make them writable; in particular, a view of a read-only array cannot be made writeable. Take this example:

import numpy as np
a = np.zeros((10, 3))
a.flags.writeable = False
b = a[:]
b.setflags(write=True)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-b316630e639a> in <module>()
----> 1 b.setflags(write=True)

ValueError: cannot set WRITEABLE flag to True of this array

I think the only way to solve this problem in general is to copy the array by doing something like this:

streamline = streamline if streamline.flags.writeable else streamline.copy()

That being said, I think this is an OK solution to the nibabel issue if we make it clear in the comments that that's the problem it's meant to solve.

Another approach to solving the nibabel problem (though it may have its own issues) is to have trackvis.read use np.fromstring instead of np.ndarray(..., buffer=string).
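The difference between the two constructions can be seen in a small sketch (variable names are illustrative; since np.fromstring's binary mode is deprecated in recent NumPy, the copying behaviour is shown via frombuffer().copy()):

```python
import numpy as np

data = np.arange(6, dtype=np.float32).tobytes()  # e.g. raw bytes from a .trk file

# Zero-copy view over an immutable bytes object: read-only.
view = np.ndarray((2, 3), dtype=np.float32, buffer=data)

# An explicit copy owns its memory, so it is writeable.
copied = np.frombuffer(data, dtype=np.float32).copy().reshape(2, 3)

print(view.flags.writeable)    # False
print(copied.flags.writeable)  # True
```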

@Garyfallidis


Member

Garyfallidis commented Oct 1, 2014

Streamlines are usually very large datasets (>1.5 GB). If we start copying them we will definitely get MemoryErrors.

@MrBago


Contributor

MrBago commented Oct 1, 2014

You only need to copy one streamline at a time; each copy is freed as soon as you copy the next one. Even a streamline with thousands of points only takes up a few KB.
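A quick back-of-the-envelope check of the "few KB" claim (the 1000-point streamline size is an assumption for illustration):

```python
import numpy as np

# Peak extra memory when copying one streamline at a time is a single
# streamline's buffer: here, 1000 points x 3 float32 coordinates.
points, dims = 1000, 3
bytes_per_copy = points * dims * np.dtype(np.float32).itemsize
print(bytes_per_copy)  # 12000 bytes, i.e. ~12 KB
```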


@Garyfallidis


Member

Garyfallidis commented Oct 1, 2014

Okay, this was not clear in your previous post, but I see what you mean now. Copying memory in small blocks takes time too, though. So before you change this, please profile it first with a few million streamlines, and let's see how it does.

@MarcCote


Contributor

MarcCote commented Oct 2, 2014

Following the idea proposed by @MrBago, I used astype before calling the inner functions _length and _set_number_of_points, instead of changing the writeable flag, when needed.

I benchmarked the change using 1k streamlines of 100 points for set_number_of_points and 10k for length.

bench_resample ... Timing set_number_of_points() in Cython
New version time: 11.0sec
Old version time: 10.1sec
Speed up of 0.920765027322x

bench_length ... Timing length() in Cython
New version time: 39.2sec
Old version time: 32.9sec
Speed up of 0.838734371013x

What do you think of this slowdown? It is still faster than the Python version though.

Edit: This benchmarking was done on streamlines that are writeable and in float32 (I think the most useful use case for those functions). But it's worth mentioning that the new version is more flexible, as it accepts lists of streamlines with different dtypes and different writeable flag statuses. Basically, before calling the inner Cython function (e.g. _length), I cast each streamline to float32 or float64 (using astype) if its dtype is an integer type (int8/int16/int32/int64) or if its writeable flag is False.

@Garyfallidis


Member

Garyfallidis commented Oct 2, 2014

Can you try it with millions of streamlines please?

@MrBago


Contributor

MrBago commented Oct 2, 2014

Keep in mind the difference should only affect read-only arrays.


@MarcCote


Contributor

MarcCote commented Oct 2, 2014

@MrBago with the modifications I made (called the flexible version), that is not the case anymore. Even for arrays that are not read-only there will be a difference, because I am now doing more comparisons in order to handle lists of streamlines with mixed dtypes/writeable flags. The code is cleaner and shorter this way, but it adds a small cost proportional to the number of streamlines.

@Garyfallidis I tested with a million streamlines; I forgot to mention that I was repeating the functions 100 times in the benchmark. Here are the results for 1M streamlines and repeat set to 1:

Timing length() in Cython
Flexible version time: 7.3sec
MrBago version time: 3.45sec <-- Caching caused this (see next post instead).
Speed up of 0.472602739726x

Timing set_number_of_points() in Cython
Flexible version time: 17.8sec
MrBago version time: 15.5sec
Speed up of 0.874507597074x

It seems the overhead has a bigger impact on the length function.
Edit: I put clearer names for the versions being benchmarked.
Edit2: see the next post for up-to-date benchmarking results.

@MarcCote


Contributor

MarcCote commented Oct 2, 2014

I guess we don't really need those functions to be more flexible. In the end, the normal use case for those functions is a list of streamlines that all share the same dtype and writeable flag status.

I should have started by showing you this benchmark: repeat=10, 1M streamlines, 100 points each, writeable=True, dtype=float32.

Timing length() in Cython
Original version time: 14.8sec
MrBago version time: 15.8sec (0.933x)
Flex version time: 19.7sec (0.748x)
MrBago (read-only) version time: 28.0sec (0.528x)
Flex (read-only) version time: 36.9sec (0.4x)
Timing set_number_of_points() in Cython
Old version time: 62.4sec
MrBago version time: 67.6sec (0.923x)
Flex version time: 72.9sec (0.857x)
MrBago (read-only) version time: 81.7sec (0.764x)
Flex (read-only) version time: 84.0sec (0.743x)

Edit: It appears caching was impacting the benchmark, so I've redone it, making sure new streamlines are recreated before each call to the function.

MarcCote added some commits Oct 2, 2014

Make a new benchmarking function to compare the Original, MrBago and the flexible versions of the functions length and set_number_of_points in Cython
@Garyfallidis


Member

Garyfallidis commented Oct 3, 2014

Okay, I get the same timings as you, @MarcCote. Switch to what @MrBago is suggesting and I will do the merge.

@MarcCote


Contributor

MarcCote commented Oct 8, 2014

Okay @Garyfallidis, I made some modifications to the flexible version. The flexible version (the one included in this PR) still supports mixed dtypes and writeable flag statuses. @MrBago's version supports mixed writeable flag statuses. The original version (the Cythonized one, that is, not the Python ones in dipy.tracking.metrics) supports only writeable arrays with homogeneous dtypes.

The following benchmark was produced on 100k streamlines, each containing 100 points (float32) and all writeable. When (read-only) is specified, it means the streamlines were set read-only before processing them. The slowdown (x factor) is reported against the original version.

Timing length() in Cython
Original version time: 3.48sec
MrBago version time: 3.85sec (0.904x)
Flex version time: 3.83sec (0.909x)
MrBago (read-only) version time: 7.68sec (0.453x)
Flex (read-only) version time: 8.04sec (0.433x)
Timing set_number_of_points() in Cython
Old version time: 12.5sec
MrBago version time: 13.2sec (0.953x)
Flex version time: 13.9sec (0.903x)
MrBago (read-only) version time: 15.7sec (0.798x)
Flex (read-only) version time: 15.8sec (0.793x)
@Garyfallidis


Member

Garyfallidis commented Oct 9, 2014

@MrBago are you good with @MarcCote corrections? Should we go ahead and merge or you still see some issues?

@MrBago


Contributor

MrBago commented Oct 9, 2014

I think this is OK to merge. I haven't reviewed the tests as closely as I probably should have, mainly because there seem to be a lot of changes that aren't directly related to this PR.

The one thing I'd like to add at some point is support for streamlines from a generator (which does not have len), but I don't think it should block this PR. If you guys think that's a good idea make an issue and assign it to me. I'll get around to it eventually (it shouldn't be that hard).
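One possible shape for that generator support is sketched below (lengths_from_generator is a hypothetical name, not dipy API): since a generator has no len(), accumulate per-streamline results in a list instead of preallocating the output array.

```python
import numpy as np

def lengths_from_generator(streamlines):
    # No len() available on a generator, so append each streamline's
    # length (sum of segment norms) and convert to an array at the end.
    out = []
    for s in streamlines:
        pts = np.asarray(s, dtype=np.float64)
        segs = np.diff(pts, axis=0)
        out.append(np.sqrt((segs ** 2).sum(axis=1)).sum())
    return np.asarray(out)

gen = (np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=np.float32)
       for _ in range(3))
print(lengths_from_generator(gen))  # [2. 2. 2.]
```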

@Garyfallidis


Member

Garyfallidis commented Oct 9, 2014

Cool, added the issue as a reminder.

Thx Marc!

Garyfallidis added a commit that referenced this pull request Oct 9, 2014

Merge pull request #431 from MarcCote/fix_streamlinespeed
Change the writeable flag back to its original state when finished.

@Garyfallidis Garyfallidis merged commit 3ed4501 into nipy:master Oct 9, 2014

1 check passed

continuous-integration/travis-ci — The Travis CI build passed
@MarcCote MarcCote deleted the MarcCote:fix_streamlinespeed branch Oct 9, 2014
