
Performance Improvements #442

Open
dtnguyen2 opened this issue Apr 12, 2018 · 10 comments

Comments

@dtnguyen2
Contributor

Model evaluation

  • Model caching, so that models with frozen parameters are not recomputed
  • Parallelization with intelligent parsing of the model expression

Slowness in fitting X-ray spectra

  • Issues in simultaneous fitting of multiple data sets (more general): do not re-evaluate models with the same parameters multiple times

Parallelization for MonCar and Nelder-Mead methods:

  • Lower priority until the two items above are done
@JohannesBuchner
Contributor

Model evaluation

  • Model caching, so that models with frozen parameters are not recomputed
  • Parallelization with intelligent parsing of the model expression

Slowness in fitting X-ray spectra

  • Issues in simultaneous fitting of multiple data sets (more general): do not re-evaluate models with the same parameters multiple times

These would be great. Another low-hanging fruit improvement would be to not recompute the apec (and other XSPEC model) shapes if only the normalization parameter was changed.

We are fitting a sum of 10 apecs with the temperatures fixed in logarithmic steps, where the normalisations are linked to a formula whose parameters are varied. There is no real reason why this should be slow.

So I'd propose model caching, so that models with unchanged parameters are not recomputed. This can be achieved with memoization.
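As an illustration only, here is a minimal sketch of the memoization idea, independent of how Sherpa implements it (the cached_eval helper and the model callable are hypothetical placeholders, not Sherpa API):

import hashlib
import numpy as np

_cache = {}

def cached_eval(model, pars, xlo, xhi):
    # Evaluate model(pars, xlo, xhi), reusing the stored result when the
    # parameter values and the evaluation grid are unchanged.
    # Hash the parameter values and grid to get a compact cache key.
    key = hashlib.sha256(np.concatenate([pars, xlo, xhi]).tobytes()).hexdigest()
    if key not in _cache:
        _cache[key] = model(pars, xlo, xhi)
    return _cache[key]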

I placed a practical benchmark here: https://github.com/JohannesBuchner/xray-fit-benchmark

@hamogu
Contributor

hamogu commented Nov 24, 2023

In general, models are cached in Sherpa using the modelCacher1d(func) decorator.

I'm not sure what @dtnguyen2 had in mind when he wrote that, but the first bullet should be solved by now.

I agree with

Another low-hanging fruit improvement would be to not recompute the apec (and other XSPEC model) shapes if only the normalization parameter was changed.

Unfortunately, that's a little harder to implement in a general sense: for all XSPEC models, we pass all model parameters to the XSPEC function. There is no special markup indicating which of those is the norm and which require a re-compute. We could make an informed guess that any parameter with the substring "norm" in it may not require a recomputation, but that seems brittle (it would work for APEC, though).
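For illustration, a sketch of what such a guess could look like, splitting a component's parameters into "shape" and "norm-like" groups (split_pars is a hypothetical helper; model.pars and par.name are the standard Sherpa attributes):

def split_pars(model):
    # Heuristic: treat any parameter whose name contains "norm" as a pure
    # scaling and everything else as a shape parameter (brittle, as noted).
    shape, norms = [], []
    for par in model.pars:
        (norms if "norm" in par.name.lower() else shape).append(par)
    return shape, norms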

However, you should be able to achieve the same effect by changing your model definitions; instead of set_source(apec.a1 + apec.a2) you could do

set_source(scale1d.norm1 * apec.a1 + scale1d.norm2 * apec.a2)
a1.norm = 1
a2.norm = 1
# freeze all the apec parameters ...

Because none of the parameters of the apec components a1 and a2 change in the fitting loop now, they should not be evaluated again.

@JohannesBuchner
Contributor

Thanks @hamogu!

Unfortunately, that's a little harder to implement in a general sense: for all XSPEC models, we pass all model parameters to the XSPEC function. There is no special markup indicating which of those is the norm and which require a re-compute. We could make an informed guess that any parameter with the substring "norm" in it may not require a recomputation, but that seems brittle (it would work for APEC, though).

Maybe Sherpa could maintain a list of additive models that have a "norm" parameter for which this is safe, and automatically wrap those, i.e., add a scale1d norm and keep the XSPEC norm fixed at 1.
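A rough sketch of what such automatic wrapping could look like (SAFE_ADDITIVE and wrap_with_norm are hypothetical, not existing Sherpa API; Scale1D is Sherpa's scaling model):

from sherpa.models.basic import Scale1D

# Hypothetical whitelist of additive XSPEC models whose norm is a pure scaling.
SAFE_ADDITIVE = {"xsapec", "xsvapec", "xspowerlaw"}

def wrap_with_norm(component):
    # Keep the internal XSPEC norm fixed at 1 and expose an external scale,
    # so changing the normalisation never invalidates the cached shape.
    if type(component).__name__.lower() not in SAFE_ADDITIVE:
        return component
    component.norm.val = 1
    component.norm.freeze()
    scale = Scale1D(component.name + "_norm")
    return scale * component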

With your workaround, the speed increased from 15 model evaluations per second to 20 model evaluations per second.

I could not apply modelCacher1d to the xspec table model:

Traceback (most recent call last):
  File "generateconst.py", line 28, in <module>
    mymodel = xstbabs.galabso * (scale1d.torusnorm * modelCacher1d(torus) + scale1d.omninorm * modelCacher1d(omni))
  File "/home/user/bin/ciao-4.13/lib/python3.7/site-packages/sherpa-4.13.0-py3.7-linux-x86_64.egg/sherpa/models/model.py", line 86, in modelCacher1d
    cache_model.__name__ = func.__name__
  File "/home/user/bin/ciao-4.13/lib/python3.7/site-packages/sherpa-4.13.0-py3.7-linux-x86_64.egg/sherpa/models/model.py", line 203, in __getattr__
    NoNewAttributesAfterInit.__getattribute__(self, name)
AttributeError: 'XSTableModel' object has no attribute '__name__'

@DougBurke
Contributor

DougBurke commented Nov 24, 2023 via email

@hamogu
Contributor

hamogu commented Nov 25, 2023

@JohannesBuchner The caching for XSPEC models should already be applied without you doing anything. @DougBurke is the master of knowing how it's done, but the goal is that, as a user, you don't have to do anything because we apply sensible caching by default.

@DougBurke
Contributor

So, the cache code is complicated to understand: there are some complexities I don't quite understand and have never had time to track down (the last time I looked at how well the cache was working, the results didn't make sense to me; perhaps it's time to look again).

One thing I will say, and it echoes the suggestion in #442 (comment), is that the current cache code works at the Python model instance level; that is, if you have

mdl1 = XSapec("m1")
mdl2 = XSapec("m2")

then there are separate caches for mdl1 and mdl2, so even if the parameters are exactly the same, the XSPEC function will be called twice. This means that you want to use the same model instance as much as possible (with an extra constant-type term where necessary) to benefit from the cache code. I added the cache_status method for models to give some indication of whether the cache is being used - e.g.

sherpa In [15]: clean()

sherpa In [16]: create_model_component("xsvapec", "m1")
Out[16]: <XSvapec model instance 'xsvapec.m1'>

sherpa In [17]: m1.cache_status()
 xsvapec.m1                 size:    1  hits:     0  misses:     0  check:     0

sherpa In [18]: m1([0.1, 0.2, 0.3], [0.2, 0.3, 0.4])
Out[18]: array([0.32428567, 0.16706778, 0.10965365])

sherpa In [19]: m1.cache_status()
 xsvapec.m1                 size:    1  hits:     0  misses:     1  check:     1

sherpa In [20]: m1([0.1, 0.2, 0.3], [0.2, 0.3, 0.4])
Out[20]: array([0.32428567, 0.16706778, 0.10965365])

sherpa In [21]: m1.cache_status()
 xsvapec.m1                 size:    1  hits:     1  misses:     1  check:     2

sherpa In [22]: m1([0.1, 0.2, 0.3], [0.2, 0.3, 0.4])
Out[22]: array([0.32428567, 0.16706778, 0.10965365])

sherpa In [23]: m1.cache_status()
 xsvapec.m1                 size:    1  hits:     2  misses:     1  check:     3

sherpa In [24]: m1.kt = 2

sherpa In [25]: m1([0.1, 0.2, 0.3], [0.2, 0.3, 0.4])
Out[25]: array([0.53666901, 0.22059353, 0.15235161])

sherpa In [26]: m1.cache_status()
 xsvapec.m1                 size:    1  hits:     2  misses:     2  check:     4

@JohannesBuchner
Contributor

Even with independent caches, since my model keeps all parameters except for the normalisation the same, it should not be so slow.

I cannot find where create_model_component is defined (it does not show up in the GitHub search).

If that wrapped some selected models and handled the norm internally, this would be solved.

Another idea I had is that projection through the response could potentially be cached, if one applied a distributing transformation to the model tree (Resp * (a * ma + b * mb + c * mc) -> a * Resp * ma + b * Resp * mb + c * Resp * mc), which would be trivially fast to compute if only normalisations change. This can then be used in combination with block sampling.
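As an illustration of that idea, here is a minimal sketch assuming the response is linear and the component shapes stay fixed (FoldedSum, resp and components are hypothetical placeholders, not Sherpa API):

class FoldedSum:
    # Exploit linearity of the response: fold each unit-norm component once,
    # then rebuild Resp(sum_i n_i * m_i) = sum_i n_i * Resp(m_i) cheaply.

    def __init__(self, resp, components):
        # resp: callable applying the (linear) response to a model spectrum
        # components: callables returning each unit-norm model spectrum
        self.folded = [resp(m()) for m in components]

    def __call__(self, norms):
        # Only this cheap weighted sum is redone when the norms change.
        return sum(n * f for n, f in zip(norms, self.folded))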

Is there a way to use sherpa in a pedestrian fashion? Can I do something like:

mymodel = get_model(myid)
spec = get_data(myid)
bkg = get_bkg(myid)
statistic = compute_cstat(mymodel.projected_model_spectrum, spec.counts, bkg.counts, bkg.scale)

What is the right way for that last line?
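(For what it's worth, one possible route for that last line via the UI layer would be the existing set_stat/calc_stat calls; this is a hedged sketch, not necessarily the recommended way:)

from sherpa.astro.ui import set_stat, calc_stat

set_stat("cstat")            # or "wstat" to account for the background counts
statistic = calc_stat(myid)  # statistic of data set myid for its current model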

@DougBurke
Contributor

FYI I don't have much time to spend time on so it'll take some time for me to respond.

@hamogu
Contributor

hamogu commented Dec 7, 2023

FYI I don't have much time to spend time on so it'll take some time for me to respond.

It took me some time to count how many times the word "time" can be used in a sentence with < 20 words..

@hamogu
Contributor

hamogu commented Dec 7, 2023

But seriously, there is a trade-off between developer time and run-time. The suggestion in #442 (comment) should work in principle, but it's not simple in any sense.
It requires treating certain model parameters specially, starting with XSPEC model norms. But why stop there? How about norms in Sherpa models? Or in user-defined models?
I'm not saying it won't work, or that it's not worthwhile to do in Sherpa itself, or at least as a documented work-around for users that need that speed-up. What I am saying is that I don't know if that's the most efficient use of limited developer time, or if there are other bottlenecks that we should address first that are either easier to fix or more general and thus bring benefit to more users.

For that, it would be good to actually write down a few end-to-end workflows that we can benchmark and profile to identify where we should focus our efforts. Is it caching as discussed here? Parallelization? Avoiding temporary copies of big arrays? Moving more Python code to C? Avoiding the overhead of calling C by moving more code to Python/Numpy? Just-in-time compilation of Python code with Numba / JAX / ...? Finding better optimizers / using analytical gradients / autograd to reduce the number of model evaluations needed to get a good fit?

Any of those can help in specific cases, but without profiling a set of representative use cases, I don't know which of those are high or low priority or have a high or low likelihood of success.
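As one possible starting point, a representative workflow could be profiled with the standard library profiler; the data file and source expression below are placeholders for a real benchmark case:

import cProfile
import pstats

from sherpa.astro.ui import load_pha, set_source, fit

load_pha("example.pha")                    # placeholder spectrum
set_source("xsphabs.abs1 * xsapec.therm")  # placeholder model expression

# Profile a single fit and report the 20 most expensive calls by cumulative time.
cProfile.run("fit()", "fit.prof")
pstats.Stats("fit.prof").sort_stats("cumulative").print_stats(20)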
