Implement sparse summed fit statistics in cython #2104
This PR is an "experimental" change: I tried to implement a "sparse" evaluation for the summed fit statistics.
Here is the runtime of the affected notebooks with and without this change:
And the current master:
It's clear that this is not the bottleneck in our likelihood evaluation; still, it seems we spend a non-negligible amount of the computing time evaluating the likelihood for empty bins.
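For context, a minimal pure-numpy sketch of the idea (assuming the usual form of the Cash statistic, C = 2 (μ − n ln μ), evaluated per pixel and summed; this is only an illustration, not the Cython code from this PR):

```python
import numpy as np

def cash_sum_dense(counts, npred):
    """Summed Cash statistic, with the log evaluated for every pixel."""
    return 2.0 * np.sum(npred - counts * np.log(npred))

def cash_sum_sparse(counts, npred):
    """Summed Cash statistic, with the log restricted to non-empty pixels.

    For counts == 0 the per-pixel term reduces to 2 * npred, so the
    (comparatively expensive) log only needs to be computed where counts > 0.
    """
    idx = counts > 0
    return 2.0 * (np.sum(npred) - np.sum(counts[idx] * np.log(npred[idx])))

# Typical counts maps are mostly empty, so the second variant calls
# np.log on far fewer elements while giving the same result:
rng = np.random.default_rng(0)
npred = rng.uniform(0.01, 0.1, size=1_000_000)
counts = rng.poisson(npred).astype(float)
assert np.allclose(cash_sum_dense(counts, npred), cash_sum_sparse(counts, npred))
```

In pure numpy the boolean mask itself allocates temporaries, which is presumably why doing this in a Cython loop (skipping empty bins without building index arrays) is the more promising route.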
@adonath - Such a PR on performance is really hard to review.
You show timings for a few high-level analyses - that's good. But a few got a bit faster, and at least one became slower, and in all cases we don't really know why. It's hard to reproduce or check as a reviewer, because you don't add the new & fast implementation as an extra option, but replace the old one.
Overall I would find it surprising if we spend a significant amount of time in
For the change in
So @adonath - overall I'm not sure how to continue.
One idea could be to simplify as much as possible first:
And then to get a well-defined test case and insight into performance:
Overall I wouldn't have suggested working on this performance optimisation now - but if what comes out is some insight into where we spend our time during map fitting, or better handling / caching of masks on
Thanks @cdeil, I'd like to make a few comments and add a little bit more information.
Here is the
I was also surprised that the calls to
The main point you miss, e.g. when comparing to the spatial model evaluation calls, is the number of pixels the computation is made for. There are ~O(1e6) pixels in a typical map dataset, but the spatial models are evaluated on a cutout and using factorisation, so they are only evaluated for ~O(1e4) pixels, which is two orders of magnitude fewer. The Cash likelihood is evaluated for all ~O(1e6) pixels in each iteration. Assuming 3 datasets and 300 function evaluations until the likelihood fit converges (as in the joint analysis notebook), you spend ~20 s just to evaluate the
Which is basically the difference in runtime we see in the
In the MCMC notebook, because it uses a simulated dataset with high statistics, the fraction of empty pixels is ~50%. This is probably where the additional
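For reference, the back-of-the-envelope estimate above can be reproduced with a small timing sketch along these lines (the pixel count, number of datasets and number of iterations are the ones quoted above; the measured per-call time of course depends on machine and dtype):

```python
import timeit
import numpy as np

# Counts / npred maps with ~O(1e6) pixels, as in a typical map dataset
npix = 1_000_000
rng = np.random.default_rng(42)
npred = rng.uniform(0.01, 1.0, size=npix)
counts = rng.poisson(npred).astype(float)

def cash_sum(counts, npred):
    return 2.0 * np.sum(npred - counts * np.log(npred))

n_repeat = 20
t_per_call = timeit.timeit(lambda: cash_sum(counts, npred), number=n_repeat) / n_repeat

# ~3 datasets and ~300 function evaluations until the fit converges,
# as quoted for the joint analysis notebook
n_datasets, n_evaluations = 3, 300
print(f"per call: {t_per_call * 1e3:.1f} ms")
print(f"estimated total spent in the likelihood: {t_per_call * n_datasets * n_evaluations:.1f} s")
```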
Yes, but I think it's not really a good idea to make this an option and expose it in the API - how would the user decide whether to use it or not? I think in this case it's rather a non-option, and just an example of where we waste computation time because of the limitations we have with numpy.
The main question for me is what cost this change comes at. Given that our data is typically sparse, is it acceptable to introduce a custom likelihood evaluation for this case (basically reimplementing ~20 lines of fit statistics in Cython for a ~20% performance gain) or not? I think the decision is not completely obvious. Implementing this change took basically the same time as writing this comment, so I don't feel strongly about whether this PR is merged or not, but I think it's good to have a place for this kind of discussion.
I definitely don't want to make performance optimisation a priority for Gammapy now. Doing extensive performance studies and setting up a benchmark system is definitely future work (even after 1.0...). But in general I'm open to accepting this kind of little improvement (that probably every developer enjoys), if it comes at little cost.
@adonath - OK to merge this in soon if you like. Would suggest adding the test with the edge cases for zero counts or npred that shows that the results of our two cash functions are the same (needs four array entries, I guess).
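A hedged sketch of what such a reference test could look like; `cash_sum_numpy` and `cash_sum_sparse` below only stand in for the two implementations being compared, and the guard against log(0) (here a hypothetical truncation value) depends on what the code in this PR actually does:

```python
import numpy as np
from numpy.testing import assert_allclose

TRUNCATION = 1e-25  # hypothetical guard against log(0); the real value is implementation-defined

def cash_sum_numpy(counts, npred):
    npred = np.maximum(npred, TRUNCATION)
    return 2.0 * np.sum(npred - counts * np.log(npred))

def cash_sum_sparse(counts, npred):
    npred = np.maximum(npred, TRUNCATION)
    idx = counts > 0
    return 2.0 * (np.sum(npred) - np.sum(counts[idx] * np.log(npred[idx])))

def test_cash_sum_edge_cases():
    # four entries covering the edge cases: both zero, zero counts with
    # non-zero npred, non-zero counts with zero npred, and a regular bin
    counts = np.array([0.0, 0.0, 2.0, 3.0])
    npred = np.array([0.0, 1.5, 0.0, 2.5])
    assert_allclose(cash_sum_sparse(counts, npred), cash_sum_numpy(counts, npred))
```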
Discussed with @adonath offline just now.
A general note: results are very much dtype-dependent, for me like this:
So basically a log call is only as costly as two additions?!
So checking which dtypes are used in our datasets is also important and has the potential to speed computations up by a factor of ~2.
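A quick way to check this locally, as a sketch (the actual factor between float32 and float64 depends on the CPU and the numpy build):

```python
import timeit
import numpy as np

x64 = np.random.default_rng(0).uniform(0.1, 10.0, size=1_000_000)
x32 = x64.astype(np.float32)

for label, x in [("float64", x64), ("float32", x32)]:
    # time np.log on the same values in both precisions
    t = timeit.timeit(lambda: np.log(x), number=50) / 50
    print(f"np.log on {label}: {t * 1e3:.2f} ms")
```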
Thanks @cdeil for the additional comments. I added a reference test and a TODO comment to move the mask handling to the Cython functions too. I'll address this, when refactoring the parameter and mask handling in our dataset classes. I'll merge this PR now.
@adonath and all - The C++ HPC lectures from Pierre this week were very interesting: There are many different effects and tricks one can use to make something like the Cash function faster.
This PR introduced several changes and overall got a speedup - but it's not quite clear what is going on. Specifically, this suggests that introducing an
Now in our case, this has to be weighed against the cost of a
So overall I think if one really wants to optimise