Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defining the watching and triggering semantics #269

Closed
jlstevens opened this issue Sep 20, 2018 · 13 comments
Closed

Defining the watching and triggering semantics #269

jlstevens opened this issue Sep 20, 2018 · 13 comments
Labels
component: depends/watch status: discussion Discussion. Not yet a specific feature/bug. Likely to result in multiple PRs/issues.

Comments

@jlstevens
Copy link
Contributor

jlstevens commented Sep 20, 2018

Param has recently been given the ability to 'watch' parameters, triggering callbacks when parameters are set. There is an important subtlety to think about as currently triggering occurs when parameters are set regardless of their value.

This is rather surprising: your watching callback will be triggered even if you keep setting the parameters to the same values (either with attribute setting or set_param). In my opinion, what you really want to do is watch parameters for changes.

Given these semantics, it would be nice to let users trigger watch callbacks without value changes should they decide to. Discussing this with @jbednar, we propose the following that avoids adding a boolean flag to the watch API:

  1. Update the semantics of the watch mechanism to be specific to parameter changes.
  2. Add a trigger method that takes a list of parameter names to trigger the (applicable) watch callbacks using the current set of parameter values. E.g foo.trigger('a','b') would pass value Change objects to the applicable watch callbacks for the current values of parameters a and b.

Unfortunately, updating the watch mechanism is itself a non trivial task as it is hard to know when something (mutable) has changed for anything other than simple types. This is the same problem we face if we want to generalize the memoization machinery in HoloViews so it can be used in param.

Ideally, I think we would tackle the following issues in param in this order:

  1. Move the memoization/equality comparison machinery out of holoviews and expose it in a useful way from param.
  2. Use this machinery to implement trigger and update watch with the new semantics.
  3. Continue work on batching parameter changes on top of the updated semantics.

In practice, I think we can skip item 1 for now and partially implement item 2, with an explicit note that the equality checking is not sophisticated for complex mutable types. Then I can continue working on item 3 which is the next priority. When we have a chance, we can then implement item 1 and finish off item 2 to handle more complex types.

Does this seem reasonable?

@jlstevens
Copy link
Contributor Author

I'll note that equality testing is easy for the other attributes (what) supported by the watch mechanism as they are simple immutable literals e.g constant is a boolean, bounds is a numeric tuple etc. It is detecting the parameter value changes that is particularly tricky.

@jbednar
Copy link
Member

jbednar commented Sep 20, 2018

That all sounds good to me.

@philippjfr
Copy link
Member

philippjfr commented Sep 21, 2018

I'm not at all fond of the hashing code in holoviews and would be very hesitant porting that code to param. For pretty much any array or dataframe data that code is extremely slow, so much so that I've basically had to bypass it entirely to get usable performance out of any data based Stream classes. In fact, in practice I've found that in a vast majority of cases the hashing is so slow that it actually provides no benefit over just disabling the memoization.

So I agree with the order you lay out at the end, starting by implementing this for cases where simple equality works and can decide to expand that to more complex types at a later point, hopefully with a cleaner/better/faster approach than currently implemented in holoviews.

@philippjfr
Copy link
Member

philippjfr commented Sep 21, 2018

Just to get my thoughts recorded somewhere, basically I believe any approach exclusively based on hashing is infeasible, because it requires the whole data to be hashed before any comparison can happen. The only thing I can see being feasible is a hybrid approach that performs some simple checks before a full hash is generated, e.g. things like:

  • Are the objects the same type?
  • Does the first value of the array match?
  • Do the dataframe column names match?
  • Does the array shape match?

These kinds of checks would bypass a full hash in the vast majority of cases, and only in the worst case would it fall back. The drawback is that any such code will get ugly very quickly.

@jbednar
Copy link
Member

jbednar commented Sep 21, 2018

These kinds of checks would bypass a full hash in the vast majority of cases

I agree that simple checks like that should be done where feasible. However, isn't it necessary to do a full, deep comparison before one can make use of anything that's been memoized? So it seems like at best such checks can help make memoization less costly when it's not useful, and they won't help any for when it is useful.

@philippjfr
Copy link
Member

That's a good point, doing this is always going to be costly and the cost of doing it can often outweigh any benefit gained by not executing the function. Ideally you would only activate it in cases where it's actually preventing large computations from being rerun.

@philippjfr
Copy link
Member

It'll be worth looking at joblib.Memory and more specifically MemorizedFunc which uses on-disk caching but promises to have a very fast hashing approach that we might be able to copy.

@jbednar
Copy link
Member

jbednar commented Sep 21, 2018

Ideally it would just be a hash on the underlying contents of memory -- simple to calculate, and may falsely require recomputation but would always be truly accurate when it says they match. But that's out of scope for anything we implement ourselves, so hopefully that's what joblib is doing.

@jlstevens
Copy link
Contributor Author

Point 1 where the semantics of watch has now been addressed in #271.

As for fast equality checking, I don't think there are any easy answers. I certainly don't disagree with anything said here and I remember not being particularly happy with any approach when I implemented it. The options are:

  1. Don't support memoization for non-trivial types.
  2. Accept that equality checks can be expensive (normally due to serialization) and hope that it is generally faster than the recomputation you are avoiding by memoizing. This is what we opted for in holoviews.
  3. Somehow hash the memory as Jim just mentioned. I don't know any easy way to do this is Python but maybe some library out there can handle fast, deep equality tests for arbitrary Python objects. Even if such a library exists, it would have to become a core dependency for either holoviews/param which means even this wouldn't be an ideal solution.

One approach (which I am not necessarily recommending!) would be to say parameters shouldn't be large, complex chunks of data and should only be relatively simple literals. This is normally true but we do know that there are cases where we do want parameters that hold a fair bit of state.

@philippjfr
Copy link
Member

philippjfr commented Sep 25, 2018

Currently my thinking is that we should probably just provide an extensible way to define equality checks for arbitrary types. If a user wants their custom object to work with it they can register an equality function. Testing for equality is always going to be faster than hashing and then comparing hashes since there is large (often huge) overhead in serialization, which is, I think, entirely unnecessary here because we have both the old and new value available for comparison. Just as a simple example here's the difference between HoloViews hashing comparisons and a simple numpy.array_equal:

screen shot 2018-09-25 at 2 43 49 am

I think we can all agree that 1.8 seconds (or a 3600x!!! difference to the equality check) for a fairly small array is pretty much unusable.

Defining equality function to cover the common cases such as literals, functions, ndarrays and dataframes etc. (basically all the types that param defines explicit Parameters for) won't be difficult and anyone else can define their own equality checks for custom objects they use.

@jbednar
Copy link
Member

jbednar commented Sep 25, 2018

That sounds good...

@jlstevens
Copy link
Contributor Author

Just to say that a first cut at triggering was introduced in #283 and batched watching has also been merged (though that also needs to support what properly).

@ceball ceball added component: depends/watch status: discussion Discussion. Not yet a specific feature/bug. Likely to result in multiple PRs/issues. labels Apr 13, 2020
@tonyfast
Copy link
Contributor

@philippjfr @jlstevens has this all been dealt with? could y'all open a new issue if there are any small changes to address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: depends/watch status: discussion Discussion. Not yet a specific feature/bug. Likely to result in multiple PRs/issues.
Projects
None yet
Development

No branches or pull requests

5 participants