New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cythonize DirectionGetter and whatnot #1342

Merged
merged 8 commits into from Nov 21, 2017

Conversation

Projects
None yet
4 participants
@nilgoyette
Contributor

nilgoyette commented Sep 26, 2017

Disclaimer: there are 2 failing tests.

I discovered after much testing that calling def pyhon code from cython is slow. Much slower than one could think! I tested on LocalTracking and PFT Tracking, though pft doesn't appear in this PR. in both cases, the speedup is around 2x, less for LocalTracking, more for PFT. One could do better by removing checks and cythonizing more but I concentrated on the important parts: DirectionGetter::get_direction, PmfGen::get_pmf and calling C code as much as possible.

What you might not like:

  • I removed some checks and I would remove more! The verifications should be done one time at the start of the algo, not in the heart of the cython code!
  • I removed many calls to numpy and coded the functions myself. Calling numpy has a cost and it's now worth it, IMO. Of course, I don't like duplicating code from numpy... :(
  • I often have a def name() function that simply calls a cdef name_c() function, which is a fast version without check. This can cause some problems, and it is indeed causing a problem :) 2 LocalTracking tests are failing because of this. My cython code is directly calling the C version and the test overrides the python version, so SimpleTissueClassifier and SimpleDirectionGetter are ignored. Of course, I could call the def or cpdef version but there's a cost to this! To discuss.

What I don't like:

  • cdf = self._adj_matrix[tuple(*direction)] This will always be slow and (!) it's using 3 floats as a dictionnary key :( I don't know the code enough to fix it but it would be much better/faster if it was a simple integer. Maybe a struct: int index, double[3] direction?
  • Using np.random() or random(), not because it's random but because it's the only remaining slow call! I was tempted to use the old C rand() but I've heard so much bad things about it that I didn't dare. Do you think it's worth it?
@nilgoyette

This comment has been minimized.

Contributor

nilgoyette commented Sep 26, 2017

BTW, I think the right merge order should be 1) this PR 2) PFT (merge master, fix conflits) 3) parallel local/pft tracking

@gabknight

Thank you for this PR. 2X is super!! See my comments, mostly PEP8 comments. This makes sense to recode those small numpy functions here, as they are called at every tracking step (sometimes more than once per step for PFT).

  • I don't grasp the problem with def name()versus cdef name_c(). Can we modify the tests here?

  • cdf = self._adj_matrix[tuple(*direction)]. Those 3 float could be changed to a single int. In fact, the direction is always one of the vertex of the sphere. To do that, we would need to keep track of the index of the vertex on the sphere (and the direction), instead of solely the direction. This is fine for all pmf-based model as they all used a discrete sphere. However, this is not the case for the direction of EuDX (PeaksAndMetricsDirectionGetter). This might need a bit of work to be done nicely. I suggest keeping this for another PR.

@@ -202,14 +209,15 @@ def _set_adjacency_matrix(self, sphere, cos_similarity):
each value is a boolean array indicating which directions are less than
max_angle degrees from the key"""
matrix = np.dot(sphere.vertices, sphere.vertices.T)
matrix = abs(matrix) >= cos_similarity
matrix = (abs(matrix) >= cos_similarity).astype('uint8')

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

Why uint8 here?

This comment has been minimized.

@nilgoyette

nilgoyette Oct 20, 2017

Contributor

Because cython doesn't recognize bool. Numpy does so the line was "ok" but there are warning when we try to use it later in cython.

This comment has been minimized.

@gabknight

gabknight Oct 23, 2017

Contributor

ok, thx.

cpdef trilinear_interpolate4d(double[:, :, :, :] data, double[:] point,
np.ndarray out=*)
cpdef trilinear_interpolate4d(
double[:, :, :, :] data,

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

I think here the function parameters should be aligned with the (.

This comment has been minimized.

@nilgoyette

nilgoyette Oct 20, 2017

Contributor

I don't think PEP8 says anything on this subject. I can revert the [style] change but I personally find it harder to read because of the random numbers of vars on each line.

Idem for all other alignment comments.

cdef int _trilinear_interpolate_c_4d(double[:, :, :, :] data, double[:] point,
double[::1] result) nogil:
cdef int trilinear_interpolate4d_c(
double[:, :, :, :] data,

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

alignment here?

for i in range(3):
b[i] = a[i]
def local_tracker(
DirectionGetter dg, TissueClassifier tc,

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

alignment here?

np.ndarray[np.float_t, ndim=2, mode='c'] streamline,
double stepsize,
int fixedstep):
cdef int _local_tracker(

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

alignment here?

double point[3], dir[3], vs[3], voxdir[3]
double[::1] pview = point, dview = dir
void (*step)(double*, double*, double) nogil
void (*step)(double * , double*, double) nogil

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

The original line was OK, no?

This comment has been minimized.

@nilgoyette

nilgoyette Oct 20, 2017

Contributor

Err, yes, sorry, I accidently added a space :) Will remove.

cdef:
double result
int err
err = _trilinear_interpolate_c_4d(self.metric_map[..., None], point,
self.interp_out_view)
err = trilinear_interpolate4d_c(

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

alignment here?

include_err = _trilinear_interpolate_c_4d(self.include_map[..., None],
point, self.interp_out_view)
include_err = trilinear_interpolate4d_c(
self.include_map[..., None],

This comment has been minimized.

@gabknight

gabknight Oct 20, 2017

Contributor

alignment here?

@nilgoyette

This comment has been minimized.

Contributor

nilgoyette commented Oct 20, 2017

Thank you for looking at this @gabknight

My explanation of def name() and def name_c() was a little fast. Let me try again :) Take TissueClassifier for example:

cdef class TissueClassifier:
    cpdef TissueClass check_point(self, double[::1] point):
        if point.shape[0] != 3:
            raise ValueError("Point has wrong shape")

        return self.check_point_c(&point[0])

    cdef TissueClass check_point_c(self, double* point):
         pass

The tests define their own TissueClassifier::check_point (the python version, not _c). My new cython code directly calls the _c version, thus ignoring the new test version.

  • I can't call the python version because ... that's the point of this PR! We should never call Python code when running Cython if we want to be fast.
  • I could put the new tests definition in the .pxd/.pyx but, meh, it's ugly. I'm not sure what else I can do though.
  • Well, yes, we could use cpdef instead but I read that it's quite costly. And I can't use pointers in cpdef function.

I think I need to put more emphasis on this. The def name()/def name_c() method is indeed fast but you can't override those method anymore in normal python code.

@gabknight

This comment has been minimized.

Contributor

gabknight commented Oct 25, 2017

OK, thx @nilgoyette. I get a better picture now. I think we will always want to override those methods in cython anyway.

I notice that line 94 in /reconst/peak_direction_getter.pyx should read:
cdef int get_direction_c(self, double* point, double* direction):
instead of:
cpdef int get_direction(self, double[::1] point, double[::1] direction) except -1:
This caused PeaksAndMetricsDirectionGetter to produce straight line only. This was not covered by tests.

I could do a PR on your branch to modify the failing tests and add the missing test for PeaksAndMetricsDirectionGetter.

@nilgoyette

This comment has been minimized.

Contributor

nilgoyette commented Oct 25, 2017

@gabknight, if you want and have the time to do it, sure! Thank you.

@skoudoro

Thanks for this PR @nilgoyette!
I just made some cosmetic comment in your PR.

I removed some checks and I would remove more! ...

totally agree

I removed many calls to numpy and coded the functions myself.

I think for this case, we should put these functions in dipy.utils and document it like we did in dipy.utils.six. It can be quite confusing if we let it there. Moreover, if for any reason, someone wants to use this function somewhere else, it's quite ugly to call them from this package. What do you think? What is the good way for that @arokem ? any rules?

I often have a def name() function that simply calls a cdef name_c() function, which is a fast version without check. This can cause some problems....

I need to think more about it

Using np.random() or random(), .....

I think you should implement your own if you think that it will really improve your performance. Like you, I heard too many bad things about the old C implementation so +1 to avoid it.

@@ -137,7 +137,7 @@ def peak_directions(odf, sphere, relative_peak_threshold=.5,
elif n == 1:
return sphere.vertices[indices], values, indices
odf_min = odf.min()
odf_min = np.min(odf)

This comment has been minimized.

@skoudoro

skoudoro Oct 25, 2017

Member

I was curious about the difference here, so I made the small test below and it seems that odf.min() is faster than np.min(odf) in general. (Numpy 1.13.1, Python 3.6). I do not know if it is true for other versions of Numpy or Python. Any idea?

import numpy as np
a = np.random.rand(3000,160000)
%timeit a.min()
172 ms ± 2.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit np.min(a)
177 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This comment has been minimized.

@nilgoyette

nilgoyette Oct 25, 2017

Contributor

Haha, thank you for the test :) My goal, for once, wasn't to be faster. I changed it because this method is called from Cython and odf is not a np.array anymore, it's a memoryview. A memoryview doesn't have any of the numpy method (well, yes, some but now much).

tldr np.func(data) accepts np.array AND memoryview. data.func() doesn't.

This comment has been minimized.

@skoudoro

skoudoro Oct 25, 2017

Member

Oh ok ! Good to know, Thanks !

trilinear_interpolate4d_c(self.shcoeff, point, self.coeff)
for i in range(len_pmf):
_sum = 0
for j in range(self.B.shape[1]):

This comment has been minimized.

@skoudoro

skoudoro Oct 25, 2017

Member

Like you did above, you should avoid to call shape function on each iteration so I think it will be better for your performance to put self.B.shape[1] in a size_t variable.

raise ValueError("%s is not a known basis type." % basis_type)
self._B, m, n = basis(sh_order, sphere.theta, sphere.phi)
import numpy as np
cimport numpy as np

This comment has been minimized.

@skoudoro

skoudoro Oct 25, 2017

Member

it can be confusing. Can you use cnp as an alias?

This comment has been minimized.

@nilgoyette

nilgoyette Oct 25, 2017

Contributor

Yes, I can, but I'll wait for another request because there's no name collision and imo it's not confusing. The cimport version is only used for the variable type, so in cdef and argument definition. For example np.float_t clearly use the cimport, while np.dot doesn't.

As I said, I'll change it to cnp if it confuses the team!

@nilgoyette

This comment has been minimized.

Contributor

nilgoyette commented Oct 25, 2017

I think for this case, we should put these functions in dipy.utils

Yes, sure! I put them just above my code and, of course, I shouldn't do that :) I think I even duplicated them 2 times in different files. Having a fast_numpy pyx library file is indeed a good idea.

Using np.random() or random()
I think you should implement your own if you think that it will really improve your performance.

Well, no, I don't actually want to code my own random! My question was more: do we have actual proofs that rand() sucks or any reason why we shouldn't use it? My research on this matter wasn't conclusive. I think it provides an uniform distribution but the interval is divided by RAND_MAX, which can be as "low" as 32767. Is this a problem? Any connoiseur here?

Merge pull request #2 from gabknight/TST_stop_conditions
Fixed tracking tests and PeaksAndMetricsDirectionGetter
@codecov-io

This comment has been minimized.

codecov-io commented Oct 27, 2017

Codecov Report

Merging #1342 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1342      +/-   ##
==========================================
- Coverage   87.01%   86.97%   -0.04%     
==========================================
  Files         228      227       -1     
  Lines       29086    28977     -109     
  Branches     3131     3119      -12     
==========================================
- Hits        25309    25204     -105     
+ Misses       3068     3067       -1     
+ Partials      709      706       -3
Impacted Files Coverage Δ
dipy/tracking/local/tests/test_local_tracking.py 97.32% <100%> (-0.13%) ⬇️
dipy/direction/peaks.py 79.32% <100%> (ø) ⬆️
dipy/tracking/local/__init__.py 100% <100%> (ø) ⬆️
dipy/core/graph.py 75% <0%> (+1.19%) ⬆️
...ipy/tracking/local/tests/test_tissue_classifier.py 95.74% <0%> (+2.12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f205f3...573e7e5. Read the comment docs.

@gabknight

This comment has been minimized.

Contributor

gabknight commented Oct 31, 2017

@nilgoyette I added an issue for the dictionary keys #1372.

@gabknight

This looks good to me. I will update PR #1340 once merged.

@gabknight

This comment has been minimized.

Contributor

gabknight commented Nov 21, 2017

@skoudoro do you have further comments on this? If not, I will go ahead merge this PR and move forward with PR #1340 and #1184.

@skoudoro

This comment has been minimized.

Member

skoudoro commented Nov 21, 2017

Yes, sure! I put them just above my code and, of course, I shouldn't do that :) I think I even duplicated them 2 times in different files. Having a fast_numpy pyx library file is indeed a good idea.

I just wonder if we should do it now or in a new PR. What do you think @gabknight @nilgoyette ?

Otherwise, LGTM, you can go ahead @gabknight

@nilgoyette

This comment has been minimized.

Contributor

nilgoyette commented Nov 21, 2017

@skoudoro the move is done. You can merge, if you think everything is done.

@skoudoro

This comment has been minimized.

Member

skoudoro commented Nov 21, 2017

@skoudoro skoudoro merged commit 9751eb2 into nipy:master Nov 21, 2017

3 checks passed

codecov/patch 100% of diff hit (target 87.01%)
Details
codecov/project Absolute coverage decreased by -0.03% but relative coverage increased by +12.98% compared to 6f205f3
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

ShreyasFadnavis pushed a commit to ShreyasFadnavis/dipy that referenced this pull request Sep 20, 2018

Merge pull request nipy#1342 from nilgoyette/cythonize_dg
Cythonize DirectionGetter and whatnot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment