New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TS map computation performance #506

Merged
merged 5 commits into from Apr 19, 2016

Conversation

Projects
None yet
2 participants
@adonath
Member

adonath commented Apr 9, 2016

This PR includes some further performance improvements to the computation of TS maps, that have been laying around on my hard disk for a while. An interesting lesson I've learned is that on most machines np.log2(x) * 0.69314718055994529 seems to be faster than just np.log(x), because of different underlying implementations in glibc's math.h (here's a link to a discussion on stackoverflow). While this might not be the case on any platform, I definitely see a significant (~50%) speed-up on my Linux machines. So I'm keeping the change.

Furthermore I've added a leastsq iter method to the TS computation function. Right now this is not particularly useful, the performance and accuracy is comparable with the other methods, but it is still a nice cross check to the standard root finding methods.

I don't request any feedback, I'll wait for Travis-CI to pass and then merge.

@adonath

This comment has been minimized.

Show comment
Hide comment
@adonath

adonath Apr 9, 2016

Member

OK, this fails with a linking error for log2 on AppVeyor and Python 2.7, because log2 was only added in Visual studio 2013 (see here) @cdeil Any opinion? Merge or not?

Member

adonath commented Apr 9, 2016

OK, this fails with a linking error for log2 on AppVeyor and Python 2.7, because log2 was only added in Visual studio 2013 (see here) @cdeil Any opinion? Merge or not?

@adonath adonath added the feature label Apr 9, 2016

Show outdated Hide outdated gammapy/detect/test_statistics.py Outdated
Show outdated Hide outdated gammapy/image/maps.py Outdated
@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Apr 11, 2016

Member

After thinking about it a bit: if the effort is not too extensive, maybe exposing this optimisation to the end-user would be nice, so that I can e.g. try it easily on Mac with clang.

For the test, you'd then either have to call the old version always, or if you do call the new version, skip that on Windows.

Member

cdeil commented Apr 11, 2016

After thinking about it a bit: if the effort is not too extensive, maybe exposing this optimisation to the end-user would be nice, so that I can e.g. try it easily on Mac with clang.

For the test, you'd then either have to call the old version always, or if you do call the new version, skip that on Windows.

@cdeil cdeil added this to the 0.4 milestone Apr 14, 2016

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Apr 14, 2016

Member

@adonath - Can you please finish this up?

(I'd like once more to attempt to make a Gammapy 0.4 release, maybe Monday.)

Member

cdeil commented Apr 14, 2016

@adonath - Can you please finish this up?

(I'd like once more to attempt to make a Gammapy 0.4 release, maybe Monday.)

@adonath

This comment has been minimized.

Show comment
Hide comment
@adonath

adonath Apr 18, 2016

Member

@cdeil I implemented a fall back for windows. Once the AppVeyor build pass this is ready to merge.

Member

adonath commented Apr 18, 2016

@cdeil I implemented a fall back for windows. Once the AppVeyor build pass this is ready to merge.

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Apr 18, 2016

Member

Appveyor tests pass ... merge if you like.

If you have time, my feedback would be:

  1. add a comment somewhere that the log2 shenanigans were added because they give better speed with gcc XX on your Linux YY.
  2. can't you keep the code simple by using log (or call it log_fast) and define it like this at the top?

On Linux cdef log(x): return log2(x) * LOG2_TO_E and maybe inline this (hard to imagine that the compiler doesn't do this optimally by itself).
On Windows just keep using log directly.

This should give you the same speed improvement on Linux, but cleaner code, no?

Member

cdeil commented Apr 18, 2016

Appveyor tests pass ... merge if you like.

If you have time, my feedback would be:

  1. add a comment somewhere that the log2 shenanigans were added because they give better speed with gcc XX on your Linux YY.
  2. can't you keep the code simple by using log (or call it log_fast) and define it like this at the top?

On Linux cdef log(x): return log2(x) * LOG2_TO_E and maybe inline this (hard to imagine that the compiler doesn't do this optimally by itself).
On Windows just keep using log directly.

This should give you the same speed improvement on Linux, but cleaner code, no?

@cdeil

This comment has been minimized.

Show comment
Hide comment
@cdeil

cdeil Apr 18, 2016

Member

I couldn't resist and read a bit at http://stackoverflow.com/questions/33809789/why-are-log2-and-log1p-so-much-faster-than-log-and-log10 ... what a complicated mess.
My preference would be that we write clean code and let compilers / libraries handle micro-optimisations like this one (Gammalib has lots of them in all formulae, plus lots of caching for model evaluation, and I find it really adds up to hard to understand code).

On my computer, log2 and log are equally fast.

But I'll admit ... it's also fascinating and in the past I've toyed with micro-optimisations also.

Member

cdeil commented Apr 18, 2016

I couldn't resist and read a bit at http://stackoverflow.com/questions/33809789/why-are-log2-and-log1p-so-much-faster-than-log-and-log10 ... what a complicated mess.
My preference would be that we write clean code and let compilers / libraries handle micro-optimisations like this one (Gammalib has lots of them in all formulae, plus lots of caching for model evaluation, and I find it really adds up to hard to understand code).

On my computer, log2 and log are equally fast.

But I'll admit ... it's also fascinating and in the past I've toyed with micro-optimisations also.

@adonath

This comment has been minimized.

Show comment
Hide comment
@adonath

adonath Apr 19, 2016

Member

I agree we should rather focus on clean code, than micro optimization of performance. And the code is already fast enough, even for my use case. So I removed the log2 optimization again and will merge now.

Member

adonath commented Apr 19, 2016

I agree we should rather focus on clean code, than micro optimization of performance. And the code is already fast enough, even for my use case. So I removed the log2 optimization again and will merge now.

@adonath adonath merged commit 32c2904 into gammapy:master Apr 19, 2016

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@cdeil cdeil changed the title from TS performance improvements to Improve TS map computation performance Apr 20, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment