
Use cache for predictions in likelihood #80

Merged · 7 commits · Mar 15, 2019

Conversation

peterstangl (Collaborator)

This PR implements caching for predictions of observables inside a MeasurementLikelihood instance.
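The caching idea can be sketched as follows. This is a minimal, self-contained Python sketch, not flavio's actual API: the class and method names (`MeasurementLikelihoodSketch`, `get_predictions`, the injected `predict` callable) are illustrative. The key point is that a `predictions_key` derived from the parameter and Wilson-coefficient hashes decides whether cached predictions can be reused.

```python
class MeasurementLikelihoodSketch:
    """Hypothetical sketch of prediction caching inside a likelihood.

    Predictions for all observables are recomputed only when the
    (parameters, Wilson coefficients) pair changes.
    """

    def __init__(self, observables, predict):
        self.observables = observables
        self.predict = predict       # callable(obs, par_key, wc_key) -> float
        self._cache_key = None       # key under which predictions were cached
        self._cache_value = None     # cached dict {observable: prediction}

    def get_predictions(self, par_key, wc_key):
        # hash of the combined inputs identifies a point in parameter space
        predictions_key = hash((par_key, wc_key))
        if predictions_key != self._cache_key:
            # cache miss: recompute all observables and remember the key
            self._cache_value = {obs: self.predict(obs, par_key, wc_key)
                                 for obs in self.observables}
            self._cache_key = predictions_key
        return self._cache_value
```

A repeated call with the same inputs then skips the observable computation entirely, which is what makes the cost of hashing negligible for likelihoods with many or slow observables.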

@coveralls commented Mar 15, 2019

Coverage decreased (-0.003%) to 94.012% when pulling 7b08a5c on peterstangl:predictions_cache into 5a9b64e on flav-io:master.

@DavidMStraub (Contributor)

That's a brilliant idea!

Do I understand correctly that the idea is that the cost of hashing is negligible since it is done only once and is small compared to the calculation of all observables? Did you check how long it roughly takes (just to understand if it would affect e.g. a MeasurementLikelihood with just a single, very fast observable)?

We should also add a comment to the doc string about the existence of caching and a comment in the code as to what the line defining predictions_key is doing.

Concerning the hashing, I see a potential problem with hashing wc_obj. The problem is that, for historical reasons, WilsonCoefficients was initialized empty and WCs were set afterwards with set_initial. But the default hash of a user class will stay the same over the lifetime of the object, so it won't change when the WCs change.

For Wilson, this would be less of an issue, as WCs are set on instantiation. Nevertheless, it would be better to have a custom __hash__ that also looks at the config dictionary etc. Since this is probably not too relevant for wilson, this could be done at the level of WilsonCoefficients. The question is how to get a fast hash. I will play around a bit.

@peterstangl (Collaborator, Author)

> Do I understand correctly that the idea is that the cost of hashing is negligible since it is done only once and is small compared to the calculation of all observables? Did you check how long it roughly takes (just to understand if it would affect e.g. a MeasurementLikelihood with just a single, very fast observable)?

The hashing takes around 30 μs on my laptop. How fast is the fastest observable? Probably still some orders of magnitude slower?

> We should also add a comment to the doc string about the existence of caching and a comment in the code as to what the line defining predictions_key is doing.

OK, I will add some comments.

> Concerning the hashing, I see a potential problem with hashing wc_obj. The problem is that, for historical reasons, WilsonCoefficients was initialized empty and WCs were set afterwards with set_initial. But the default hash of a user class will stay the same over the lifetime of the object, so it won't change when the WCs change.
>
> For Wilson, this would be less of an issue, as WCs are set on instantiation. Nevertheless, it would be better to have a custom __hash__ that also looks at the config dictionary etc. Since this is probably not too relevant for wilson, this could be done at the level of WilsonCoefficients. The question is how to get a fast hash. I will play around a bit.

It might be good to actually recompute the hash of a wc_obj each time the WCs are changed and save this hash in the wc_obj such that it can be retrieved very fast. It would be nice if the hash were constructed only from the defining wc_dict, the scale and the basis, with

hash((frozenset(wc.dict.items()), wc.basis, wc.scale))

Using such a hash, the hash for two different wc_obj would be the same if they describe the same point in an EFT basis.
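For illustration, a value-based hash along these lines makes two distinct objects describing the same EFT point hash equal. This is a minimal sketch under stated assumptions: `WCPoint` is a hypothetical class, not flavio's or wilson's actual API, and the dictionary values are assumed hashable (e.g. floats or complex numbers).

```python
class WCPoint:
    """Hypothetical Wilson-coefficient container hashed by value."""

    def __init__(self, wc_dict, basis, scale):
        self.dict = wc_dict    # {coefficient name: value}
        self.basis = basis
        self.scale = scale

    def __hash__(self):
        # frozenset makes the dict items order-independent and hashable
        return hash((frozenset(self.dict.items()), self.basis, self.scale))

a = WCPoint({'C9': -1.0, 'C10': 0.5}, 'WET', 4.8)
b = WCPoint({'C10': 0.5, 'C9': -1.0}, 'WET', 4.8)
assert hash(a) == hash(b)  # same EFT point, same hash
```

Note that two objects built independently from the same values hash equal, which is exactly the property needed for the prediction cache.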

@DavidMStraub (Contributor)

So for me, hashing the par_dict takes about 60 µs on my machine, hashing a general wcxf.WC.dict about 230 µs. But there are some fast observables that only take about O(10 µs). However, I guess it is OK to assume wcxf.WC instances to remain unchanged during their lifetime.

@DavidMStraub (Contributor) commented Mar 15, 2019

OK so in conclusion I think we can merge your PR already, with only

  • doc string
  • code comment
  • wc_obj.__hash__() (this is a private method) → hash(wc_obj)

I can then separately implement WilsonCoefficients.__hash__. Assuming wcxf.WC to be immutable, this would essentially amount to

hash((self.wc, frozenset(self._options)))

if I am not mistaken.

@DavidMStraub (Contributor)

... actually, since this is just 3 lines, you can just add this to your PR. In WilsonCoefficients:

    def __hash__(self):
        """Return a hash of the `WilsonCoefficient` instance.
        This assumes that `self.wc` is not modified over its lifetime. The hash only changes when options are modified."""
        return hash((self.wc, frozenset(self._options)))

@DavidMStraub (Contributor)

Sorry, this docstring is misleading, as the attribute self.wc can indeed change, just not the instances themselves. Better:

    """Return a hash of the `WilsonCoefficient` instance.

    The hash changes when Wilson coefficient values or options are modified.
    It assumes that `wcxf.WC` instances are not modified after instantiation."""
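Putting the pieces together, the full method with the corrected docstring might look like the following. This is a hedged sketch, not flavio's actual implementation: `WilsonCoefficientsSketch` is an illustrative stand-in, and hashing `self._options.items()` (rather than just the keys) assumes the option values are hashable.

```python
class WilsonCoefficientsSketch:
    """Illustrative stand-in for WilsonCoefficients (not flavio's actual class)."""

    def __init__(self, wc):
        self.wc = wc        # assumed to behave like an immutable wcxf.WC
        self._options = {}  # set to {} on init so that hashing always works

    def __hash__(self):
        """Return a hash of the instance.

        The hash changes when Wilson coefficient values or options are
        modified. It assumes that `wcxf.WC` instances are not modified
        after instantiation."""
        # items() rather than the bare dict, so that changing an option
        # *value* also changes the hash; assumes option values are hashable
        return hash((self.wc, frozenset(self._options.items())))
```

Initializing `_options` to `{}` sidesteps the problem raised below that hashing fails when the attribute is unset.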

@DavidMStraub (Contributor)

I realized that this solution as of now has a memory leak: it caches all calls, which will quickly eat up all memory in a scan.

So either we need a LRU cache or, much simpler, just cache the last value called. This should be sufficient for our use case.
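The "cache only the last value" option can be sketched like this. The helper name is hypothetical, and the sketch assumes the cache key is hashable and comparable; memory use stays constant no matter how many points a scan visits.

```python
_SENTINEL = object()  # distinct from any user-supplied key, including None

def make_last_value_cache(compute):
    """Wrap `compute` so that only the most recent (key, value) pair is kept."""
    state = {'key': _SENTINEL, 'value': None}

    def cached(key):
        if state['key'] != key:
            # new key: recompute and overwrite the single cached entry
            state['value'] = compute(key)
            state['key'] = key
        return state['value']

    return cached
```

A bounded alternative would be `functools.lru_cache(maxsize=1)` on a function of hashable arguments; the manual version above is shown because it also works when the cached object itself should not be passed through `lru_cache`.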

@peterstangl (Collaborator, Author)

> ... actually, since this is just 3 lines, you can just add this to your PR. In WilsonCoefficients:
>
>     def __hash__(self):
>         """Return a hash of the `WilsonCoefficient` instance.
>         This assumes that `self.wc` is not modified over its lifetime. The hash only changes when options are modified."""
>         return hash((self.wc, frozenset(self._options)))

This actually does not work since self._options is not set on initialization. Should I just set it to None on init?

@peterstangl (Collaborator, Author)

Actually, None does not work either, but I could set it to {}.

@peterstangl (Collaborator, Author)

@DavidMStraub I think this is ready to be merged.

@DavidMStraub DavidMStraub merged commit 48347a1 into flav-io:master Mar 15, 2019
@peterstangl peterstangl deleted the predictions_cache branch March 15, 2019 19:36