New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure multiple callbacks do not bleed wrong plot state #1034

Merged
merged 18 commits into from Jan 7, 2017

Conversation

Projects
None yet
3 participants
@philippjfr
Member

philippjfr commented Jan 4, 2017

Currently there is a bug in the code that handles multiple callbacks being attached to one bokeh plot object. The issue arises when two streams are attached to one plot object, this can happen when one axis is linked between two plots. In these cases the RangeXY stream of one plot can end up with the y-range of another plot, which has a linked and therefore shared x-axis. The solution is to keep track of the IDs of the handles each stream is attached to, the frontend can send the matching ids along with the updated values and we can then check that only the appropriate data is actually forwarded to the stream.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 4, 2017

Does this fix pyviz/datashader#250?

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 4, 2017

The immediate problem there is already fixed, this is a more subtle issue that occurs under certain circumstances. In the case that it came up it was in a scenario where there are two datashader plots with RangeXY callbacks, which have one linked axis and one independent axis, e.g. two different quantities over time.

@jbednar

This comment has been minimized.

Member

jbednar commented Jan 4, 2017

Works well for me.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 5, 2017

Some other issue happening here where callback events are clobbering each other and it times out.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 5, 2017

Had to make some further changes to the ACK message protocol. Previously when multiple callbacks fired consecutively it could happen that some messages were getting throttled causing one particular comm to never receive an ACK message that would unblock it. Therefore callback event messages and the response may now contain the comms_target ID, ensuring that all comms are appropriately unblocked after an event has been sent.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 5, 2017

Need to add more tests and update existing tests.

@philippjfr philippjfr requested a review from jlstevens Jan 5, 2017

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 5, 2017

@jlstevens I believe this is now ready for review. Both issues that I fixed here are pretty subtle and caused strange bugs. Here's the example I've been using to test the fix:

import holoviews as hv
import numpy as np
import pandas as pd
from holoviews.operation.datashader import datashade, aggregate, shade

hv.notebook_extension('bokeh')

%opts DynamicMap [width=900, tools=['xwheel_zoom']] Layout [shared_datasource=False]

n=100000
def f(n):
    return np.cumsum(np.random.randn(n))
df = pd.DataFrame({'t': range(n), 'y1': f(n), 'y2': f(n), 'y3' : .10 * f(n)} )

def p(y):
    return hv.Curve(df, kdims=['t'], vdims=[y])

layout = datashade(p('y1')) * datashade(p('y2')) + datashade(p('y3'))

layout.cols(1)

A rough explanation of the changes implemented here:

  1. Since a callback can be shared across multiple streams, and two callbacks can share multiple handles you can end up in situations where a bokeh JS callback sends data to two different callbacks. In the example above the x-range and y-range on the top plot share a JS callback, and the x-range and y-range on the bottom plot share a JS callback. Additionally since the x-axis is linked between the two plots, the x-range model is also shared between the plots. This means that the JS callback for one plot ends up sending the x-range and the y-range to both callbacks. However since the y-range is not shared this ends up supplying the wrong y-range to the callback on the other plot. The solution is to a) keep track of the IDs of the plot handles a particular stream is attached to and b) send the ID along with the value when an event is generated. Now the callback can check that the ID of the value its been given matches the IDs of the handles for a particular stream ensuring that only the correct values are sent to the stream.

  2. Every time a message is sent to Python, it has to send a message back. This acknowledgement (ACK) message is used to unblock a comm for future events. In the case above it may end up unblocking the same comm twice, leaving the second comm blocked. To get around this it now sends the unique ID of the comm in the message to Python, allowing the ACK message to send that ID back, ensuring that the correct comm can be unblocked.

@@ -10,7 +10,7 @@
from ..comms import JupyterCommJS
def attributes_js(attributes):
def attributes_js(attributes, handles):
"""

This comment has been minimized.

@jlstevens

jlstevens Jan 5, 2017

Member

Would be good to see the docstring updated with an example showing how handles is involved...

This comment has been minimized.

@philippjfr

philippjfr Jan 5, 2017

Member

Agreed.

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

This docstring still needs to be updated I think.

This comment has been minimized.

@philippjfr

philippjfr Jan 7, 2017

Member

Yes, still need to do that.

# Gather the ids of the plotting handles attached to this callback
# This allows checking that a stream is not given the state
# of a plotting handle it wasn't attached to
stream_handle_ids = defaultdict(list)

This comment has been minimized.

@jlstevens

jlstevens Jan 5, 2017

Member

Might be good to turn this bit of code into a method (get_handle_ids?) and turn the comment into a docstring.

This comment has been minimized.

@philippjfr

philippjfr Jan 5, 2017

Member

Agreed.

for stream in self.streams:
stream.update(trigger=False, **msg)
ids = self.stream_handles[stream]
sanitized_msg = {}

This comment has been minimized.

@jlstevens

jlstevens Jan 5, 2017

Member

I think this bit sounds like a 'message filter' (as opposed to sanitization) and could be its own method (with docstring). Something like message_filter(msg, ids)...

This comment has been minimized.

@philippjfr

philippjfr Jan 5, 2017

Member

Yes, sounds good.

else:
handle.callback.code += code
else:
self.stream_handles.update(stream_handle_ids)

This comment has been minimized.

@jlstevens

jlstevens Jan 5, 2017

Member

Looks like the contents of self.stream_handles can keep growing between set_customjs calls with no way to reset/clear it? Is this more state that hangs around when a visualization is removed (e.g a notebook cell is deleted?).

This comment has been minimized.

@philippjfr

philippjfr Jan 5, 2017

Member

Yes, it'll stick around but compared to all the plotting state I'm not worried about a few IDs.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 5, 2017

To summarize private discussion I've had directly with Philipp, I'll just say that I think the general strategy used in this PR looks sound - labelling comms with an id and keeping track of plotting handles.

I intend to use inline comments for suggestions specific to this PR and here are some of the more general (future) plans that we discussed:

  • There is code in callbacks that references jupyter comms specifically. This should be generalized to work with any comms type (e.g a websocket comms implementation).
  • I would suggest renaming 'comms_target' to 'comm_id'.
  • We want to document our message protocol and ideally encapsulate it all in one location (ie. in a class, maybe called Protocol that lives alongside the comm classes).
  • We should aim to generalize our message protocol. E.g always have a 'command' or 'msg_type' field. Maybe all messages could have 'msg_type' and 'data' fields that can be dispatched appropriately in the centralized message protocol class. Then we need something for 'ACK' and 'ERROR' messages...
@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 7, 2017

@jordansamuels I just updated the PR fixing the issue you reported when resetting, could you test it as well?

philippjfr added some commits Jan 7, 2017

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 7, 2017

@jlstevens Ready for review again.

@@ -152,39 +164,84 @@ def __init__(self, plot, streams, source, **params):
self.streams = streams
self.comm = self._comm_type(plot, on_msg=self.on_msg)
self.source = source
self.handle_ids = defaultdict(list)
def initialize(self):
plots = [self.plot]
if self.plot.subplots:
plots += list(self.plot.subplots.values())

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

Looks like the code below could be made into a small get_handles (or get_plot_handles) method. Alternatively, it could be a function in utils so the code below could be:

handle_ids = self._get_handle_ids(util.get_handles(plots))
self.handle_ids.update(handle_ids)

Which I think is clearer.

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

I am also assuming there is a good reason to use a dictionary update... i.e there may be other handle_ids on the Callback other than the ones currently returned by _get_handle_ids. If that is true, it is a bit strange to see this in an initialize method which sounds like it should only happen once.

This comment has been minimized.

@philippjfr

philippjfr Jan 7, 2017

Member

Looks like the code below could be made into a small get_handles (or get_plot_handles) method.

Originally had one, didn't seem worth it though.

If that is true, it is a bit strange to see this in an initialize method which sounds like it should only happen once.

That's true, the handles get merged if multiple streams end up being attached to the same callback, in that case the set_customjs method below merges the callbacks:

for k, v in self.handle_ids.items():
    cb.handle_ids[k] += v

I could just assign in initialize instead of using the dictionary update, but I wanted to indicate that the handle_ids may be modified by another callback.

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

Originally had one, didn't seem worth it though.

I agree it is a small bit of code and I was also wondering if it was worth it. On balance, I felt making the intent clearer for this block of code was more valuable.

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

I'm still a bit confused about self.handle_ids - I see it mentioned in four place:

  1. When it is declared.
  2. The bit in initialize above.
  3. What looks like an a dictionary access in on_msg.
  4. The iteration over k, v you just mentioned.

In none of these places do I see self.handle_ids being changed after it is initialized. If you mean that self.handle_ids is being changed somewhere else in the code, then I wouldn't worry about using update here.

In that case, I would also want to look at where the merging of self.handle_ids is happening to consider whether that should become part of the API of Callback.

This comment has been minimized.

@philippjfr

philippjfr Jan 7, 2017

Member

I posted where it's being changed above, in set_customjs a callback will merge its own handle_ids with another callback's handle_ids if that callback has already been attached to the plot handle (since you can't attach multiple callbacks on one handle).

for k, v in self.handle_ids.items():
    cb.handle_ids[k] += v

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

I see, I was looking for self.handle_ids but cb here is another Callback instance. In that case, how about merging to a different place e.g cb.joint_handle_ids? If it is too complicated, then I don't mind leaving it as it is.

This comment has been minimized.

@philippjfr

philippjfr Jan 7, 2017

Member

No point keeping both around imo.

found = []
for plot in plots:
for handle in self.handles:
if handle not in plot.handles or handle in found:
continue
self.set_customjs(plot.handles[handle])
self.set_customjs(plot.handles[handle], handles)

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

If I understand right, found is a list of handles that are in some sense 'unique'. So maybe it could be something like:

unique_handles = self.unique_handles(plots)
for unique_handle in unique_handles:
   self.set_customjs(plot.handles[handle], handles)

Now you might want to call unique_handles something like filtered_handles instead and this variable would then replace found.

return msg
def set_customjs(self, handle):
def _get_handle_ids(self, handles):

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

I would like a name that conveys the filtering operation that is also happening here. Maybe _get_attached_ids or _get_attached_handle_ids?

data['x_range'] = (msg['x0'], msg['x1'])
if 'y0' in msg and 'y1' in msg:
data['y_range'] = (msg['y0'], msg['y1'])
return data
class RangeXCallback(Callback):

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

All of the new bits of code below seems to have the form:

if predicate(msg):
   return dictionary_from_msg(msg)
else:
  return {}

If you agree this is a general pattern, maybe we can just have an applicable predicate method with the baseclass checking the predicate value. E.g

class RangeXCallback(Callback):
      handles = ['x_range']

     def applicable(msg):
          return 'x0' in msg and 'x1' in msg
  
      def _process_msg(self, msg):
         return {'x_range': (msg['x0'], msg['x1'])}

And of course applicable can return True by default.

This comment has been minimized.

@philippjfr

philippjfr Jan 7, 2017

Member

What if there are multiple predicates, such as in RangeXY? I'd prefer not to complicate this for now, although I do agree something like this is worth considering.

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

RangeXY is really the union of RangeX and RangeY. We might want to consider generalizing this idea of a union so we can build things like RangeXY out of the component pieces.

This comment has been minimized.

@jlstevens

jlstevens Jan 7, 2017

Member

If you agree with this suggestion, maybe it should be made into a new issue (feature request)? I don't think it would be hard to implement later.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 7, 2017

Okay, implemented most of your new suggestions. Need to update the attribute_js docstring and will try to add a few unit tests for the various Callback methods.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Jan 7, 2017

@jlstevens Added a unit test and revised the docstring. We can settle what to do about the callback _process_msg implementations in another issue as you suggest. Unless you object I think this is ready to merge once tests pass.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 7, 2017

I'm happy to merge now although looking at the things we discussed, there might be one thing that could still be addressed in this PR:

  • I would suggest renaming 'comms_target' to 'comm_id'.

Here are the other suggestions made (that we should turn into issues):

  • As you agreed, we should file an issue to define Callbacks with an applicable predicate and introducing unions (so PositionXY would be the union of PositionX and PositionY).
  • There is code in callbacks that references jupyter comms specifically. This should be generalized to work with any comms type (e.g a websocket comms implementation).
  • We want to document our message protocol and ideally encapsulate it all in one location (ie. in a class, maybe called Protocol that lives alongside the comm classes).
  • We should aim to generalize our message protocol. E.g always have a 'command' or 'msg_type' field. Maybe all messages could have 'msg_type' and 'data' fields that can be dispatched appropriately in the centralized message protocol class. Then we need something for 'ACK' and 'ERROR' messages...
@jlstevens

This comment has been minimized.

Member

jlstevens commented Jan 7, 2017

Thanks for doing the renaming (which looked a bit tricky)!

Merging.

@jlstevens jlstevens merged commit 7806464 into master Jan 7, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.002%) to 76.672%
Details
s3-reference-data-cache Test data is cached.
Details

@philippjfr philippjfr deleted the duplicate_callback_fix branch Jan 7, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment