-
-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defining the watching and triggering semantics #269
Comments
I'll note that equality testing is easy for the other attributes ( |
That all sounds good to me. |
I'm not at all fond of the hashing code in holoviews and would be very hesitant porting that code to param. For pretty much any array or dataframe data that code is extremely slow, so much so that I've basically had to bypass it entirely to get usable performance out of any So I agree with the order you lay out at the end, starting by implementing this for cases where simple equality works and can decide to expand that to more complex types at a later point, hopefully with a cleaner/better/faster approach than currently implemented in holoviews. |
Just to get my thoughts recorded somewhere, basically I believe any approach exclusively based on hashing is infeasible, because it requires the whole data to be hashed before any comparison can happen. The only thing I can see being feasible is a hybrid approach that performs some simple checks before a full hash is generated, e.g. things like:
These kinds of checks would bypass a full hash in the vast majority of cases, and only in the worst case would it fall back. The drawback is that any such code will get ugly very quickly. |
I agree that simple checks like that should be done where feasible. However, isn't it necessary to do a full, deep comparison before one can make use of anything that's been memoized? So it seems like at best such checks can help make memoization less costly when it's not useful, and they won't help any for when it is useful. |
That's a good point, doing this is always going to be costly and the cost of doing it can often outweigh any benefit gained by not executing the function. Ideally you would only activate it in cases where it's actually preventing large computations from being rerun. |
It'll be worth looking at |
Ideally it would just be a hash on the underlying contents of memory -- simple to calculate, and may falsely require recomputation but would always be truly accurate when it says they match. But that's out of scope for anything we implement ourselves, so hopefully that's what joblib is doing. |
Point 1 where the semantics of As for fast equality checking, I don't think there are any easy answers. I certainly don't disagree with anything said here and I remember not being particularly happy with any approach when I implemented it. The options are:
One approach (which I am not necessarily recommending!) would be to say parameters shouldn't be large, complex chunks of data and should only be relatively simple literals. This is normally true but we do know that there are cases where we do want parameters that hold a fair bit of state. |
Currently my thinking is that we should probably just provide an extensible way to define equality checks for arbitrary types. If a user wants their custom object to work with it they can register an equality function. Testing for equality is always going to be faster than hashing and then comparing hashes since there is large (often huge) overhead in serialization, which is, I think, entirely unnecessary here because we have both the old and new value available for comparison. Just as a simple example here's the difference between HoloViews hashing comparisons and a simple I think we can all agree that 1.8 seconds (or a 3600x!!! difference to the equality check) for a fairly small array is pretty much unusable. Defining equality function to cover the common cases such as literals, functions, ndarrays and dataframes etc. (basically all the types that param defines explicit Parameters for) won't be difficult and anyone else can define their own equality checks for custom objects they use. |
That sounds good... |
Just to say that a first cut at triggering was introduced in #283 and batched watching has also been merged (though that also needs to support |
@philippjfr @jlstevens has this all been dealt with? could y'all open a new issue if there are any small changes to address. |
Param has recently been given the ability to 'watch' parameters, triggering callbacks when parameters are set. There is an important subtlety to think about as currently triggering occurs when parameters are set regardless of their value.
This is rather surprising: your watching callback will be triggered even if you keep setting the parameters to the same values (either with attribute setting or
set_param
). In my opinion, what you really want to do is watch parameters for changes.Given these semantics, it would be nice to let users trigger watch callbacks without value changes should they decide to. Discussing this with @jbednar, we propose the following that avoids adding a boolean flag to the
watch
API:watch
mechanism to be specific to parameter changes.trigger
method that takes a list of parameter names to trigger the (applicable) watch callbacks using the current set of parameter values. E.gfoo.trigger('a','b')
would pass valueChange
objects to the applicable watch callbacks for the current values of parametersa
andb
.Unfortunately, updating the watch mechanism is itself a non trivial task as it is hard to know when something (mutable) has changed for anything other than simple types. This is the same problem we face if we want to generalize the memoization machinery in HoloViews so it can be used in param.
Ideally, I think we would tackle the following issues in param in this order:
trigger
and updatewatch
with the new semantics.In practice, I think we can skip item 1 for now and partially implement item 2, with an explicit note that the equality checking is not sophisticated for complex mutable types. Then I can continue working on item 3 which is the next priority. When we have a chance, we can then implement item 1 and finish off item 2 to handle more complex types.
Does this seem reasonable?
The text was updated successfully, but these errors were encountered: