Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve TS map computation performance #506

Merged
merged 5 commits into from Apr 19, 2016

Conversation

adonath
Copy link
Member

@adonath adonath commented Apr 9, 2016

This PR includes some further performance improvements to the computation of TS maps, that have been laying around on my hard disk for a while. An interesting lesson I've learned is that on most machines np.log2(x) * 0.69314718055994529 seems to be faster than just np.log(x), because of different underlying implementations in glibc's math.h (here's a link to a discussion on stackoverflow). While this might not be the case on any platform, I definitely see a significant (~50%) speed-up on my Linux machines. So I'm keeping the change.

Furthermore I've added a leastsq iter method to the TS computation function. Right now this is not particularly useful, the performance and accuracy is comparable with the other methods, but it is still a nice cross check to the standard root finding methods.

I don't request any feedback, I'll wait for Travis-CI to pass and then merge.

@adonath
Copy link
Member Author

adonath commented Apr 9, 2016

OK, this fails with a linking error for log2 on AppVeyor and Python 2.7, because log2 was only added in Visual studio 2013 (see here) @cdeil Any opinion? Merge or not?

@adonath adonath added the feature label Apr 9, 2016
cdef unsigned int i, j, ni, nj
ni = counts.shape[1]
nj = counts.shape[0]
sum = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to initialise variables directly in the cdef statement?

@cdeil
Copy link
Contributor

cdeil commented Apr 11, 2016

After thinking about it a bit: if the effort is not too extensive, maybe exposing this optimisation to the end-user would be nice, so that I can e.g. try it easily on Mac with clang.

For the test, you'd then either have to call the old version always, or if you do call the new version, skip that on Windows.

@cdeil cdeil added this to the 0.4 milestone Apr 14, 2016
@cdeil
Copy link
Contributor

cdeil commented Apr 14, 2016

@adonath - Can you please finish this up?

(I'd like once more to attempt to make a Gammapy 0.4 release, maybe Monday.)

@adonath adonath force-pushed the ts_leastsq_iter branch 6 times, most recently from 8419b95 to c3df47c Compare April 18, 2016 17:49
@adonath
Copy link
Member Author

adonath commented Apr 18, 2016

@cdeil I implemented a fall back for windows. Once the AppVeyor build pass this is ready to merge.

@cdeil
Copy link
Contributor

cdeil commented Apr 18, 2016

Appveyor tests pass ... merge if you like.

If you have time, my feedback would be:

  1. add a comment somewhere that the log2 shenanigans were added because they give better speed with gcc XX on your Linux YY.
  2. can't you keep the code simple by using log (or call it log_fast) and define it like this at the top?

On Linux cdef log(x): return log2(x) * LOG2_TO_E and maybe inline this (hard to imagine that the compiler doesn't do this optimally by itself).
On Windows just keep using log directly.

This should give you the same speed improvement on Linux, but cleaner code, no?

@cdeil
Copy link
Contributor

cdeil commented Apr 18, 2016

I couldn't resist and read a bit at http://stackoverflow.com/questions/33809789/why-are-log2-and-log1p-so-much-faster-than-log-and-log10 ... what a complicated mess.
My preference would be that we write clean code and let compilers / libraries handle micro-optimisations like this one (Gammalib has lots of them in all formulae, plus lots of caching for model evaluation, and I find it really adds up to hard to understand code).

On my computer, log2 and log are equally fast.

But I'll admit ... it's also fascinating and in the past I've toyed with micro-optimisations also.

@adonath
Copy link
Member Author

adonath commented Apr 19, 2016

I agree we should rather focus on clean code, than micro optimization of performance. And the code is already fast enough, even for my use case. So I removed the log2 optimization again and will merge now.

@adonath adonath merged commit 32c2904 into gammapy:master Apr 19, 2016
@cdeil cdeil changed the title TS performance improvements Improve TS map computation performance Apr 20, 2016
@adonath adonath deleted the ts_leastsq_iter branch November 20, 2018 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants