New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TS map computation performance #506

Merged
merged 5 commits into from Apr 19, 2016

Conversation

Projects
None yet
2 participants
@adonath
Member

adonath commented Apr 9, 2016

This PR includes some further performance improvements to the computation of TS maps, that have been laying around on my hard disk for a while. An interesting lesson I've learned is that on most machines np.log2(x) * 0.69314718055994529 seems to be faster than just np.log(x), because of different underlying implementations in glibc's math.h (here's a link to a discussion on stackoverflow). While this might not be the case on any platform, I definitely see a significant (~50%) speed-up on my Linux machines. So I'm keeping the change.

Furthermore I've added a leastsq iter method to the TS computation function. Right now this is not particularly useful, the performance and accuracy is comparable with the other methods, but it is still a nice cross check to the standard root finding methods.

I don't request any feedback, I'll wait for Travis-CI to pass and then merge.

@adonath

This comment has been minimized.

Member

adonath commented Apr 9, 2016

OK, this fails with a linking error for log2 on AppVeyor and Python 2.7, because log2 was only added in Visual studio 2013 (see here) @cdeil Any opinion? Merge or not?

@adonath adonath added the feature label Apr 9, 2016

cdef unsigned int i, j, ni, nj
ni = counts.shape[1]
nj = counts.shape[0]
sum = 0

This comment has been minimized.

@cdeil

cdeil Apr 11, 2016

Member

Is it possible to initialise variables directly in the cdef statement?

"""Helper function to extract parts of a larger array.
Simple implementation of an array extract function , because
`~astropy.ndata.utils.extract_array` introduces to much overhead.`

This comment has been minimized.

@cdeil

cdeil Apr 11, 2016

Member

typo: "to" -> "too"
There's a spurious backtick at the end of the line that should be removed.

@@ -346,7 +346,7 @@ def reproject(self, refheader, mode='interp', *args, **kwargs):
input_data=self.to_image_hdu(),
output_projection=refheader,
*args,
**kwargs,
**kwargs

This comment has been minimized.

@cdeil

cdeil Apr 11, 2016

Member

This bug was fixed just now in another PR.
Please rebase this one.

@cdeil

This comment has been minimized.

Member

cdeil commented Apr 11, 2016

After thinking about it a bit: if the effort is not too extensive, maybe exposing this optimisation to the end-user would be nice, so that I can e.g. try it easily on Mac with clang.

For the test, you'd then either have to call the old version always, or if you do call the new version, skip that on Windows.

@cdeil cdeil added this to the 0.4 milestone Apr 14, 2016

@cdeil

This comment has been minimized.

Member

cdeil commented Apr 14, 2016

@adonath - Can you please finish this up?

(I'd like once more to attempt to make a Gammapy 0.4 release, maybe Monday.)

@adonath adonath force-pushed the adonath:ts_leastsq_iter branch 6 times, most recently from 8419b95 to c3df47c Apr 18, 2016

@adonath

This comment has been minimized.

Member

adonath commented Apr 18, 2016

@cdeil I implemented a fall back for windows. Once the AppVeyor build pass this is ready to merge.

@cdeil

This comment has been minimized.

Member

cdeil commented Apr 18, 2016

Appveyor tests pass ... merge if you like.

If you have time, my feedback would be:

  1. add a comment somewhere that the log2 shenanigans were added because they give better speed with gcc XX on your Linux YY.
  2. can't you keep the code simple by using log (or call it log_fast) and define it like this at the top?

On Linux cdef log(x): return log2(x) * LOG2_TO_E and maybe inline this (hard to imagine that the compiler doesn't do this optimally by itself).
On Windows just keep using log directly.

This should give you the same speed improvement on Linux, but cleaner code, no?

@cdeil

This comment has been minimized.

Member

cdeil commented Apr 18, 2016

I couldn't resist and read a bit at http://stackoverflow.com/questions/33809789/why-are-log2-and-log1p-so-much-faster-than-log-and-log10 ... what a complicated mess.
My preference would be that we write clean code and let compilers / libraries handle micro-optimisations like this one (Gammalib has lots of them in all formulae, plus lots of caching for model evaluation, and I find it really adds up to hard to understand code).

On my computer, log2 and log are equally fast.

But I'll admit ... it's also fascinating and in the past I've toyed with micro-optimisations also.

@adonath adonath force-pushed the adonath:ts_leastsq_iter branch from c3df47c to 06d7a48 Apr 19, 2016

@adonath

This comment has been minimized.

Member

adonath commented Apr 19, 2016

I agree we should rather focus on clean code, than micro optimization of performance. And the code is already fast enough, even for my use case. So I removed the log2 optimization again and will merge now.

@adonath adonath merged commit 32c2904 into gammapy:master Apr 19, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@cdeil cdeil changed the title from TS performance improvements to Improve TS map computation performance Apr 20, 2016

@adonath adonath deleted the adonath:ts_leastsq_iter branch Nov 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment