Improve TS map computation performance #506
Conversation
cdef unsigned int i, j, ni, nj
ni = counts.shape[1]
nj = counts.shape[0]
sum = 0
Is it possible to initialise variables directly in the cdef statement?
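For reference, Cython does allow initialisers directly in a cdef line, so the declarations above could be combined. A minimal sketch, assuming `counts` is a typed 2D buffer as in the surrounding function (the name `total` is a suggested rename, not from the diff):

```cython
# Declaration and initialisation can be combined in a single cdef statement:
cdef unsigned int i = 0, j = 0
cdef unsigned int ni = counts.shape[1]
cdef unsigned int nj = counts.shape[0]
cdef double total = 0.0  # typed accumulator; avoids shadowing the Python builtin `sum`
```

Typing the accumulator as a C double also keeps the inner loop free of Python-object arithmetic.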
After thinking about it a bit: if the effort is not too extensive, it might be nice to expose this optimisation to the end user, so that I can e.g. try it easily on a Mac with clang. For the test, you'd then either have to always call the old version, or, if you do call the new version, skip that test on Windows.
@adonath - Can you please finish this up? (I'd like to attempt a Gammapy 0.4 release once more, maybe Monday.)
Force-pushed from 8419b95 to c3df47c
@cdeil I implemented a fallback for Windows. Once the AppVeyor build passes, this is ready to merge.
AppVeyor tests pass ... merge if you like. If you have time, my feedback would be:
This should give you the same speed improvement on Linux, but cleaner code, no?
I couldn't resist and read a bit at http://stackoverflow.com/questions/33809789/why-are-log2-and-log1p-so-much-faster-than-log-and-log10 ... what a complicated mess. On my computer, But I'll admit ... it's also fascinating, and in the past I've toyed with micro-optimisations too.
Force-pushed from c3df47c to 06d7a48
I agree we should focus on clean code rather than micro-optimising for performance. And the code is already fast enough, even for my use case. So I removed the
This PR includes some further performance improvements to the computation of TS maps, which have been lying around on my hard disk for a while. An interesting lesson I've learned is that on most machines `np.log2(x) * 0.69314718055994529` seems to be faster than just `np.log(x)`, because of different underlying implementations in glibc's `math.h` (here's a link to a discussion on stackoverflow). While this might not be the case on every platform, I definitely see a significant (~50%) speed-up on my Linux machines. So I'm keeping the change.

Furthermore, I've added a `leastsq iter` method to the TS computation function. Right now this is not particularly useful; the performance and accuracy are comparable with the other methods, but it is still a nice cross check for the standard root-finding methods.

I don't request any feedback; I'll wait for Travis-CI to pass and then merge.
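The log2 trick rests on the identity log(x) = log2(x) · ln(2), where 0.69314718055994529 is ln(2). A minimal sketch checking the numerical equivalence (the sample array `x` is made up for illustration):

```python
import math
import numpy as np

# log(x) = log2(x) * ln(2); the magic constant in the PR is ln(2)
LN2 = math.log(2.0)  # 0.69314718055994529

x = np.linspace(0.001, 1000.0, 100_000)  # made-up strictly positive samples
direct = np.log(x)
via_log2 = np.log2(x) * LN2

print(np.allclose(direct, via_log2))  # True, up to floating-point rounding
```

Whether the log2 route is actually faster depends on the platform's libm, so any speed-up claim should be re-benchmarked locally rather than assumed.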