(Not so) minor optimizations #1012
Conversation
Added several new optimizations, squeezing out about 20% more performance in the aforementioned small examples. A short guide on how I profiled to do this is here: cProfile ships with Python, and I then visualized the result using KCachegrind. Overall this was quite easy, and I don't have to feel as ashamed of my reconstruction times when I present. We should do it more often.
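The workflow described can be sketched roughly as follows. The script and function names are placeholders, not ODL code; note that KCachegrind reads the callgrind format, so a converter such as pyprof2calltree (a separate install) is typically needed in between:

```python
import cProfile
import pstats

def reconstruct():
    """Placeholder for the actual reconstruction being profiled."""
    return sum(i * i for i in range(100000))

# Profile the call and dump stats to a file that viewers/converters can read.
profiler = cProfile.Profile()
profiler.enable()
reconstruct()
profiler.disable()
profiler.dump_stats('recon.prof')

# Quick text summary; for KCachegrind, convert the dump first, e.g. with
# `pyprof2calltree -i recon.prof -o callgrind.recon` (hypothetical filenames).
stats = pstats.Stats('recon.prof')
stats.sort_stats('cumulative').print_stats(5)
```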
odl/space/npy_ntuples.py
-    if _blas_is_applicable(x1, x2, out):
+    # If data is very big, use BLAS if possible
+    if size >= 50000 and _blas_is_applicable(x1, x2, out):
Perhaps add names for the constants 100 and 50000
Yes, please
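A minimal sketch of what naming these thresholds could look like; the constant names, values' semantics, and the selector function are hypothetical illustrations, not the PR's actual code:

```python
# Hypothetical module-level names for the magic numbers 100 and 50000;
# the names and exact semantics in the PR may differ.
THRESHOLD_SMALL = 100     # below this, plain loops tend to beat vectorized calls
THRESHOLD_MEDIUM = 50000  # above this, BLAS call overhead pays off

def _blas_is_applicable(*args):
    # Stand-in for the real check in odl/space/npy_ntuples.py.
    return True

def choose_impl(size):
    """Pick an implementation based on the data size."""
    if size < THRESHOLD_SMALL:
        return 'scalar'
    elif size >= THRESHOLD_MEDIUM and _blas_is_applicable():
        return 'blas'
    else:
        return 'numpy'
```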
    tmp_2 = sum(Li.adjoint(wi) for Li, wi in zip(L, w2))
    z1.lincomb(1.0, w1, - (tau / 2.0), tmp_2)
    x += lam(k) * (z1 - p1)
+   # Compute tmp_domain = sum(Li.adjoint(wi) for Li, w2i in zip(L, w2))
wi -> w2i
The optimizations look fine. I tried to eyeball them but didn't run any checks. I guess tests / examples will tell if this code is equivalent.
Check the points I raised, then go ahead.
@@ -535,7 +535,6 @@ class DiscretizedSpaceElement(DiscretizedSetElement, FnBaseVector):

     def __init__(self, space, data):
         """Initialize a new instance."""
-        assert isinstance(space, DiscretizedSpace)
Fully agree. I think I removed all of these in the #861 PR.
odl/discr/grid.py
        # Cache for efficiency instead of re-computing
        try:
            strd = self.__stride
        except AttributeError:
I don't like this pattern of trying to access some attribute of self and reacting on failure. I would clearly prefer to initialize the attribute to some nonsense ("sentinel") value like None and check for that instead of checking if it's there at all. The check a is None should incur no cost whatsoever and makes this more robust.
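A minimal sketch of the sentinel-based caching the reviewer suggests. The class name and the stride formula are hypothetical stand-ins, not ODL's actual grid code:

```python
import numpy as np

class Grid:
    """Minimal stand-in for a grid; only the caching pattern matters."""

    def __init__(self, coord_vectors):
        self.coord_vectors = coord_vectors
        # Sentinel instead of a missing attribute: the attribute always
        # exists, and the `is None` check is essentially free.
        self.__stride = None

    def stride(self):
        if self.__stride is None:
            # Compute once; hypothetical stride definition for illustration.
            self.__stride = np.array(
                [v[1] - v[0] if len(v) > 1 else 0.0
                 for v in self.coord_vectors])
        # Return a copy so callers cannot mutate the cache.
        return self.__stride.copy()
```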
odl/discr/grid.py
            self.__stride = np.array(strd)
            return self.__stride.copy()
        else:
            return strd.copy()
Why copy? I can see this as a general pattern, but that would require changing similar code in lots of places.
Well because otherwise there is a severe risk of stuff like this:
strides = grid.strides()
strides += 2
# lots of code
strides = grid.strides() # returns bullshit
This (with the copy) is equivalent to the old code
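The risk can be demonstrated directly. A sketch under the assumption that the cached value is a NumPy array; the class and method names here are stand-ins for illustration, not ODL's API:

```python
import numpy as np

class Grid:
    """Stand-in illustrating why the cached array must be copied."""

    def __init__(self, stride):
        self.__stride = np.asarray(stride, dtype=float)

    def strides_no_copy(self):
        return self.__stride         # hands out the cache itself

    def strides_copy(self):
        return self.__stride.copy()  # caller gets an independent array

g = Grid([1.0, 0.5])
s = g.strides_no_copy()
s += 2                      # in-place mutation ...
print(g.strides_no_copy())  # ... has corrupted the cache

g2 = Grid([1.0, 0.5])
s2 = g2.strides_copy()
s2 += 2                     # mutating the copy ...
print(g2.strides_copy())    # ... leaves the cache intact
```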
        factor = factor * arr[slc]

    out *= factor

    # Finally multiply by the remaining 1d array
    slc = [None] * out.ndim
-   slc[last_ax] = np.s_[:]
+   slc[last_ax] = slice(None)
Good change, now that we understand what's going on :-)
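For context, `np.s_[:]` evaluates to exactly `slice(None)`, so the change is purely about readability. A sketch of the broadcasting trick the snippet relies on; the wrapper function itself is hypothetical:

```python
import numpy as np

# The two spellings produce the same slice object value.
assert np.s_[:] == slice(None)

def multiply_1d_along_axis(out, arr_1d, last_ax):
    """Multiply `out` in place by a 1d array along axis `last_ax`."""
    # Index with None everywhere except the target axis: the 1d array
    # gets singleton dimensions so it broadcasts along that axis only.
    slc = [None] * out.ndim
    slc[last_ax] = slice(None)
    out *= arr_1d[tuple(slc)]
    return out
```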
"""Decorate function to cache the result with given arguments. | ||
|
||
This is equivalent to `functools.lru_cache` with Python 3, and currently | ||
does nothing with Python 2 but this may change at some later point. |
WTF?? Okay, I guess this is still better than depending on some backport package for Python 2. The "later point" at which this changes will be the point when we drop Python 2 compatibility.
In the future library, this seems to be available anyway: check here. OK, not without an extra pip install.
I'll leave it for now. Python 2 is mostly supported because we have to in a sense. Minor optimizations of 10% should not really be expected on legacy platforms.
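A hedged sketch of the kind of shim being discussed: the real `lru_cache` on Python 3, a no-op decorator on Python 2. ODL's actual helper may be structured differently:

```python
try:
    # Available in the standard library on Python 3 (>= 3.2).
    from functools import lru_cache
except ImportError:
    def lru_cache(maxsize=128):
        """No-op fallback for Python 2: return the function unchanged."""
        def decorator(func):
            return func
        return decorator

@lru_cache(maxsize=None)
def fib(n):
    """Toy function whose recursion benefits massively from caching."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```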
    lam_k = lam(k)

    # Compute tmp_domain = sum(Li.adjoint(vi) for Li, vi in zip(L, v))
    L[0].adjoint(v[0], out=tmp_domain)
Do we reach this point if L is empty?
Do we allow L empty? We did not before anyway.
I guess we can add an if statement for it, but I'm not even sure if the algorithm is valid in that case.
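A sketch of the in-place accumulation pattern under discussion, with an explicit guard for an empty operator list. NumPy matrices stand in for ODL operators (adjoint = transpose here), and the guard's behavior for empty L is an assumption, not the PR's decision:

```python
import numpy as np

def sum_adjoints(L, v, tmp_domain):
    """Accumulate sum(Li.adjoint(vi)) into tmp_domain without temporaries.

    `L` is a list of matrices, `v` a matching list of vectors.  An empty
    `L` is handled explicitly rather than assumed away.
    """
    if not L:
        tmp_domain[:] = 0  # or raise, if an empty L is disallowed
        return tmp_domain
    # The first term initializes the buffer, the rest accumulate in place.
    np.dot(L[0].T, v[0], out=tmp_domain)
    for Li, vi in zip(L[1:], v[1:]):
        tmp_domain += Li.T @ vi
    return tmp_domain
```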
Fixed the comments, will merge.
Force-pushed from a70a2fc to 9a4f7fc.
This started out small, but in the end I achieved a 2x speedup for small tomographic inversion problems, clearly nice.
Improvements include:
- Caching via functools.lru_cache
- Renamed _lincomb to _lincomb_impl, since the previous name did not play well with the Spyder profiler