
(Not so) minor optimizations #1012

Merged

adler-j merged 13 commits into odlgroup:master from minor_optimizations on Jun 6, 2017
Conversation

adler-j (Member) commented May 29, 2017

This started out small, but in the end I achieved a 2x speedup for small tomographic inversion problems, which is clearly nice.

Improvements include:

  • Minor optimizations on hot paths
    • Remove unnecessary "isinstance" calls
    • Remove repeated "size" etc. calls
  • Cache some often re-used stuff (like proximals) using functools.lru_cache
  • Reduce the number of "element" calls by using proper in-place arithmetic in the Douglas-Rachford solver
  • Performed some (rudimentary) performance tests; we now have three "levels" of lincomb, each optimal for its own case:
    • size < 100: Fall back to pure NumPy
    • 100 <= size <= 50000: Use BLAS-style arithmetic, but without BLAS
    • 50000 < size: Use BLAS
  • Renamed _lincomb to _lincomb_impl since the previous name did not play well with the Spyder profiler
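A minimal sketch of the three-level dispatch described above. The thresholds are the ones quoted in this description; the constant and function names are illustrative, not ODL's actual `_lincomb_impl`:

```python
import numpy as np

# Illustrative names for the thresholds quoted above
PURE_NUMPY_MAX_SIZE = 100
BLAS_MIN_SIZE = 50000

def lincomb_sketch(a, x1, b, x2, out):
    """Compute out <- a*x1 + b*x2, picking a strategy by array size."""
    size = out.size
    if size < PURE_NUMPY_MAX_SIZE:
        # Tiny arrays: a single NumPy expression beats any call overhead
        out[:] = a * x1 + b * x2
    elif size < BLAS_MIN_SIZE:
        # Medium arrays: BLAS-style in-place updates, but without BLAS
        out[:] = x1
        out *= a
        out += b * x2
    else:
        # Large arrays: hand off to real BLAS routines (scal/axpy)
        from scipy.linalg.blas import get_blas_funcs
        scal, axpy = get_blas_funcs(['scal', 'axpy'], arrays=(out,))
        out[:] = x1
        scal(a, out)
        axpy(x2, out, a=b)
    return out
```

The point of the middle level is that in-place NumPy operations avoid both temporary allocations and the per-call overhead of going through BLAS wrappers.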

adler-j (Member, Author) commented May 29, 2017

Added several new optimizations, squeezing out about 20% more performance in the aforementioned small examples.

A short guide to how I profiled this:

python -m cProfile -o profiling_data.pyprof profile_mri.py
pyprof2calltree -i profiling_data.pyprof

Here, cProfile ships with Python, and pyprof2calltree can be installed via pip install pyprof2calltree.

I then visualized the result using KCacheGrind.

Overall this was quite easy, and I no longer have to feel as ashamed of my reconstruction times when I present. We should do this more often.
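For a quick look at the numbers without KCacheGrind, the same data can be inspected with the stdlib pstats module. A sketch with a toy workload standing in for profile_mri.py:

```python
import cProfile
import io
import pstats

# Profile a toy workload in-process instead of via "python -m cProfile"
pr = cProfile.Profile()
pr.enable()
sum(i * i for i in range(100000))  # stand-in for the real reconstruction script
pr.disable()

# Print the 5 most time-consuming functions, sorted by cumulative time
buf = io.StringIO()
stats = pstats.Stats(pr, stream=buf)
stats.sort_stats('cumulative').print_stats(5)
report = buf.getvalue()
print(report)
```

pstats.Stats can also load the dump file directly (pstats.Stats('profiling_data.pyprof')), so the command-line workflow above and this in-process variant are interchangeable.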


if _blas_is_applicable(x1, x2, out):
# If data is very big, use BLAS if possible
if size >= 50000 and _blas_is_applicable(x1, x2, out):
adler-j (Author):
Perhaps add names for the constants 100 and 50000

kohr-h (Member):
Yes, please

tmp_2 = sum(Li.adjoint(wi) for Li, wi in zip(L, w2))
z1.lincomb(1.0, w1, - (tau / 2.0), tmp_2)
x += lam(k) * (z1 - p1)
# Compute tmp_domain = sum(Li.adjoint(wi) for Li, w2i in zip(L, w2))
adler-j (Author):
wi -> w2i

kohr-h (Member) left a review comment:
The optimizations look fine. I tried to eyeball them but didn't run any checks. I guess tests / examples will tell if this code is equivalent.
Check the points I raised, then go ahead.

@@ -535,7 +535,6 @@ class DiscretizedSpaceElement(DiscretizedSetElement, FnBaseVector):

def __init__(self, space, data):
"""Initialize a new instance."""
assert isinstance(space, DiscretizedSpace)
kohr-h (Member):
Fully agree, I think I remove all of these in the #861 PR

# Cache for efficiency instead of re-computing
try:
strd = self.__stride
except AttributeError:
kohr-h (Member):
I don't like this pattern of trying to access some attribute of self and react on failure. I would clearly prefer to initialize the attribute to some nonsense ("sentinel") value like None and check for that instead of checking if it's there at all. The check a is None should incur no cost whatsoever and make this more robust.

self.__stride = np.array(strd)
return self.__stride.copy()
else:
return strd.copy()
kohr-h (Member):
Why copy? I can see this as a general pattern, but that would require changing similar code in lots of places.

adler-j (Author):
Well because otherwise there is a severe risk of stuff like this:

strides = grid.strides()
strides += 2

# lots of code

strides = grid.strides()  # returns bullshit

This (with the copy) is equivalent to the old code.
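Both points — the None sentinel instead of try/except AttributeError, and returning a copy so callers cannot corrupt the cache — can be combined in one sketch. Class and attribute names here are illustrative, not the actual ODL code:

```python
import numpy as np

class GridSketch:
    """Toy stand-in for the grid class discussed above."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self.__stride = None  # sentinel: cache not yet filled

    def strides(self):
        """C-order strides in elements, computed once and cached."""
        if self.__stride is None:
            # A cheap 'is None' check replaces the try/except AttributeError dance
            self.__stride = np.cumprod([1] + list(self.shape[:0:-1]))[::-1]
        # Return a copy so in-place edits by callers cannot corrupt the cache
        return self.__stride.copy()
```

With the copy in place, the hazard from the snippet above (a caller doing strides += 2 and later reads returning garbage) goes away.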

factor = factor * arr[slc]

out *= factor

# Finally multiply by the remaining 1d array
slc = [None] * out.ndim
slc[last_ax] = np.s_[:]
slc[last_ax] = slice(None)
kohr-h (Member):
Good change, now that we understand what's going on :-)
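For reference, np.s_[:] and slice(None) build the same object; the pattern in the hunk above uses None entries to insert broadcast axes and slice(None) to keep one full axis. The array names below are illustrative:

```python
import numpy as np

# The two spellings are interchangeable
assert np.s_[:] == slice(None)

out = np.ones((2, 3, 4))
vec = np.array([1.0, 2.0, 3.0, 4.0])  # 1d array to multiply along the last axis

last_ax = out.ndim - 1
slc = [None] * out.ndim     # None acts as np.newaxis when indexing
slc[last_ax] = slice(None)  # keep the full extent along last_ax
# vec[tuple(slc)] has shape (1, 1, 4) and broadcasts against out
out *= vec[tuple(slc)]
```

Using slice(None) makes the intent explicit: every other entry of slc really is np.newaxis, and only the chosen axis is a genuine slice.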

"""Decorate function to cache the result with given arguments.

This is equivalent to `functools.lru_cache` with Python 3, and currently
does nothing with Python 2 but this may change at some later point.
kohr-h (Member):
WTF?? Okay, I guess this is still better than depending on some backport package for Python 2. The "later point" at which this changes will be the point when we drop Python 2 compatibility.

In the future library, this seems to be available anyway. OK, not without an extra pip install.

adler-j (Author):
I'll leave it for now. Python 2 is mostly supported because we have to, in a sense. Minor optimizations of 10% should not really be expected on legacy platforms.

lam_k = lam(k)

# Compute tmp_domain = sum(Li.adjoint(vi) for Li, vi in zip(L, v))
L[0].adjoint(v[0], out=tmp_domain)
kohr-h (Member):
Do we reach this point if L is empty?

adler-j (Author):
Do we allow an empty L? We did not before, anyway.

I guess we can add an if statement for it, but I'm not even sure if the algorithm is valid in that case.
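A guard along those lines, sketched with NumPy matrices standing in for the operators — M.T @ v plays the role of Li.adjoint(vi), and the function name is illustrative:

```python
import numpy as np

def sum_adjoints(mats, vecs, out):
    """out <- sum(M.T @ v for M, v in zip(mats, vecs)), guarding the empty case.

    Mirrors the pattern above: the first adjoint is written directly into
    ``out`` (no temporary element), and the rest are accumulated in place.
    """
    if not mats:
        out[:] = 0  # an empty sum is zero, if the algorithm even allows this
        return out
    np.matmul(mats[0].T, vecs[0], out=out)
    for M, v in zip(mats[1:], vecs[1:]):
        out += M.T @ v
    return out
```

Whether the degenerate case should be zero or an error is exactly the open question above; the guard just makes the choice explicit instead of failing on L[0].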

adler-j (Member, Author) commented Jun 3, 2017

Fixed the comments, will merge.

@adler-j adler-j merged commit 7118cb1 into odlgroup:master Jun 6, 2017
@adler-j adler-j deleted the minor_optimizations branch June 6, 2017 02:00
@adler-j adler-j mentioned this pull request Jun 9, 2017