Improve TS map computation performance #506

adonath · 2016-04-09T10:55:24Z

This PR includes some further performance improvements to the computation of TS maps, that have been laying around on my hard disk for a while. An interesting lesson I've learned is that on most machines np.log2(x) * 0.69314718055994529 seems to be faster than just np.log(x), because of different underlying implementations in glibc's math.h (here's a link to a discussion on stackoverflow). While this might not be the case on any platform, I definitely see a significant (~50%) speed-up on my Linux machines. So I'm keeping the change.

Furthermore I've added a leastsq iter method to the TS computation function. Right now this is not particularly useful, the performance and accuracy is comparable with the other methods, but it is still a nice cross check to the standard root finding methods.

I don't request any feedback, I'll wait for Travis-CI to pass and then merge.

adonath · 2016-04-09T11:50:37Z

OK, this fails with a linking error for log2 on AppVeyor and Python 2.7, because log2 was only added in Visual studio 2013 (see here) @cdeil Any opinion? Merge or not?

cdeil · 2016-04-11T10:24:56Z

gammapy/detect/_test_statistics_cython.pyx

+    cdef unsigned int i, j, ni, nj
+    ni = counts.shape[1]
+    nj = counts.shape[0]
+    sum = 0


Is it possible to initialise variables directly in the cdef statement?

cdeil · 2016-04-11T10:28:38Z

After thinking about it a bit: if the effort is not too extensive, maybe exposing this optimisation to the end-user would be nice, so that I can e.g. try it easily on Mac with clang.

For the test, you'd then either have to call the old version always, or if you do call the new version, skip that on Windows.

cdeil · 2016-04-14T13:10:42Z

@adonath - Can you please finish this up?

(I'd like once more to attempt to make a Gammapy 0.4 release, maybe Monday.)

adonath · 2016-04-18T19:05:38Z

@cdeil I implemented a fall back for windows. Once the AppVeyor build pass this is ready to merge.

cdeil · 2016-04-18T19:25:33Z

Appveyor tests pass ... merge if you like.

If you have time, my feedback would be:

add a comment somewhere that the log2 shenanigans were added because they give better speed with gcc XX on your Linux YY.
can't you keep the code simple by using log (or call it log_fast) and define it like this at the top?

On Linux cdef log(x): return log2(x) * LOG2_TO_E and maybe inline this (hard to imagine that the compiler doesn't do this optimally by itself).
On Windows just keep using log directly.

This should give you the same speed improvement on Linux, but cleaner code, no?

cdeil · 2016-04-18T19:44:17Z

I couldn't resist and read a bit at http://stackoverflow.com/questions/33809789/why-are-log2-and-log1p-so-much-faster-than-log-and-log10 ... what a complicated mess.
My preference would be that we write clean code and let compilers / libraries handle micro-optimisations like this one (Gammalib has lots of them in all formulae, plus lots of caching for model evaluation, and I find it really adds up to hard to understand code).

On my computer, log2 and log are equally fast.

But I'll admit ... it's also fascinating and in the past I've toyed with micro-optimisations also.

adonath · 2016-04-19T11:42:32Z

I agree we should rather focus on clean code, than micro optimization of performance. And the code is already fast enough, even for my use case. So I removed the log2 optimization again and will merge now.

adonath added the feature label Apr 9, 2016

cdeil reviewed Apr 11, 2016
View reviewed changes

cdeil added this to the 0.4 milestone Apr 14, 2016

cdeil assigned adonath Apr 14, 2016

adonath force-pushed the ts_leastsq_iter branch 6 times, most recently from 8419b95 to c3df47c Compare April 18, 2016 17:49

adonath added 5 commits April 19, 2016 12:49

add iterative least sqares to ts computation

fdfc5e5

Minor performace improvements to TS map computation

4098fe8

Disable boundscheck in cython functions

b55676e

Add windows fallback

8a71598

Remove log2 ts optimization

06d7a48

adonath force-pushed the ts_leastsq_iter branch from c3df47c to 06d7a48 Compare April 19, 2016 10:50

adonath merged commit 32c2904 into gammapy:master Apr 19, 2016

cdeil changed the title ~~TS performance improvements~~ Improve TS map computation performance Apr 20, 2016

adonath deleted the ts_leastsq_iter branch November 20, 2018 08:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve TS map computation performance #506

Improve TS map computation performance #506

adonath commented Apr 9, 2016

adonath commented Apr 9, 2016

cdeil Apr 11, 2016

cdeil commented Apr 11, 2016

cdeil commented Apr 14, 2016

adonath commented Apr 18, 2016

cdeil commented Apr 18, 2016

cdeil commented Apr 18, 2016

adonath commented Apr 19, 2016

Improve TS map computation performance #506

Improve TS map computation performance #506

Conversation

adonath commented Apr 9, 2016

adonath commented Apr 9, 2016

cdeil Apr 11, 2016

Choose a reason for hiding this comment

cdeil commented Apr 11, 2016

cdeil commented Apr 14, 2016

adonath commented Apr 18, 2016

cdeil commented Apr 18, 2016

cdeil commented Apr 18, 2016

adonath commented Apr 19, 2016