Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Callback memoization breaks for non-hashable stream values #1067

Closed
philippjfr opened this Issue Jan 17, 2017 · 11 comments

Comments

Projects
None yet
3 participants
@philippjfr
Copy link
Contributor

philippjfr commented Jan 17, 2017

In #1063 we added memoization for stream values by generating a key from the callback args, kwargs and all stream values. This is fragile because certain types like lists cannot be hashed. We can either cast to hashable types or simply use the string repr of the key.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 17, 2017

I'm not super comfortable using repr for this purpose although it should work. Probably the best thing is to implement the following behavior:

  1. Assume the key is hashable and try to use it.
  2. Fallback to the repr because something in the key is unhashable.
  3. Skip memoisation (maybe the repr can fail).

In no circumstance should there be an error - the worst case should be that you lose the advantage of memoization because the repr of the unhashable object is unstable.

The other solution is to limit the types of allowed stream parameter - but that is also tricky. If you specify a param.List, the elements of the list could be anything. It probably isn't worth restricting what streams can do for the sake of a performance optimization.

The only other thing we could do is warn when falling back to repr. The idea would be to recommend that the user uses hashable stream values (e.g tuples instead of lists). Then again, if you want to use dictionaries as the stream value, there is no easy hashable alternative (a tuple of tuples?). If only Python had a frozendict....

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 17, 2017

I'm not a huge fan of having it fall back to a different behavior. I do think it should a single approach, although I do recognize your worries about stability of the repr so I'm wondering whether we should just traverse the key and replace unhashable objects:

  • list -> tuple
  • dict -> tuple(sorted(dict.items()))
  • set -> frozenset

What else would be left? I believe most other types should hash correctly or am I wrong?

@jbednar

This comment has been minimized.

Copy link
Contributor

jbednar commented Jan 17, 2017

I would think that solid handling for lists and dicts is very important, and I don't see any reason why converting them to tuples or tuples of tuples, respectively, would be an issue, apart from needing to do so recursively.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 17, 2017

We would at least want to support OrderedDict as well as dict. I prefer the idea of recursive replacement with hashables but I am still wondering whether our suggested replacements cover everything. Do we ever use defaultdicts with param.Dict?

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 17, 2017

Both OrderedDict and defaultdict are dict subclasses so are covered by dict.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 17, 2017

I wouldn't have bet that cyordereddict also being a subclass of dict (being cython and all) but thankfully it is.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 17, 2017

More importantly, neither numpy arrays nor pandas Dataframes are hashable. I wouldn't recommend using them as stream parameters, but there is nothing stopping you from trying.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 17, 2017

In fact, numpy arrays are a good example of why it is worth avoiding string reprs. As the repr of a large enough numpy array gets truncated with ... in the middle, two arrays with the same values around the edges but different values in the middle would map to the same key.

@philippjfr

This comment has been minimized.

Copy link
Contributor Author

philippjfr commented Jan 17, 2017

For numpy arrays a SHA might work not sure what to do about dataframes and other types.

@jbednar

This comment has been minimized.

Copy link
Contributor

jbednar commented Jan 17, 2017

Seems like a SHA is a good fallback, because it's better to have it mistakenly report that something has changed when it hasn't (e.g. if something trivial is reordered inside in a way that doesn't actually matter) than to not update the plot when something truly has changed.

@jlstevens

This comment has been minimized.

Copy link
Contributor

jlstevens commented Jan 18, 2017

Addressed in the above PR. Closing.

@jlstevens jlstevens closed this Jan 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.