Ensure multiple callbacks do not bleed wrong plot state #1034

philippjfr · 2017-01-04T20:38:38Z

Currently there is a bug in the code that handles multiple callbacks attached to one bokeh plot object. The issue arises when two streams are attached to one plot object, which can happen when one axis is linked between two plots. In these cases the RangeXY stream of one plot can end up with the y-range of another plot that has a linked, and therefore shared, x-axis. The solution is to keep track of the IDs of the handles each stream is attached to. The frontend can then send the matching IDs along with the updated values, and we can check that only the appropriate data is actually forwarded to the stream.

jbednar · 2017-01-04T20:56:00Z

Does this fix holoviz/datashader#250?

philippjfr · 2017-01-04T20:59:08Z

The immediate problem there is already fixed; this is a more subtle issue that occurs only under certain circumstances. It came up in a scenario with two datashader plots with RangeXY callbacks that have one linked axis and one independent axis, e.g. two different quantities over time.

jbednar · 2017-01-04T21:18:57Z

Works well for me.

philippjfr · 2017-01-05T00:11:18Z

There is some other issue happening here where callback events are clobbering each other and it times out.

philippjfr · 2017-01-05T12:35:10Z

Had to make some further changes to the ACK message protocol. Previously, when multiple callbacks fired consecutively, some messages could get throttled, so that one particular comm never received the ACK message that would unblock it. Callback event messages and their responses may therefore now contain the comms_target ID, ensuring that all comms are appropriately unblocked after an event has been sent.

philippjfr · 2017-01-05T13:03:41Z

Need to add more tests and update existing tests.

philippjfr · 2017-01-05T14:10:15Z

@jlstevens I believe this is now ready for review. Both issues that I fixed here are pretty subtle and caused strange bugs. Here's the example I've been using to test the fix:

import holoviews as hv
import numpy as np
import pandas as pd
from holoviews.operation.datashader import datashade, aggregate, shade

hv.notebook_extension('bokeh')

%opts DynamicMap [width=900, tools=['xwheel_zoom']] Layout [shared_datasource=False]

n = 100000

def f(n):
    # Random walk with n steps
    return np.cumsum(np.random.randn(n))

df = pd.DataFrame({'t': range(n), 'y1': f(n), 'y2': f(n), 'y3': .10 * f(n)})

def p(y):
    return hv.Curve(df, kdims=['t'], vdims=[y])

layout = datashade(p('y1')) * datashade(p('y2')) + datashade(p('y3'))

layout.cols(1)

A rough explanation of the changes implemented here:

Since a callback can be shared across multiple streams, and two callbacks can share multiple handles, you can end up in situations where a bokeh JS callback sends data to two different callbacks. In the example above the x-range and y-range on the top plot share a JS callback, and the x-range and y-range on the bottom plot share a JS callback. Additionally, since the x-axis is linked between the two plots, the x-range model is also shared between the plots. This means that the JS callback for one plot ends up sending the x-range and the y-range to both callbacks. However, since the y-range is not shared, this supplies the wrong y-range to the callback on the other plot. The solution is to a) keep track of the IDs of the plot handles a particular stream is attached to and b) send the ID along with the value when an event is generated. The callback can then check that the ID of each value it has been given matches the IDs of the handles for a particular stream, ensuring that only the correct values are sent to the stream.
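
As a rough illustration of the ID check described above (not the code in this PR), the filtering could look something like the sketch below. The message layout, where each attribute carries an `id` and a `value`, and the names `filter_msg`, `top_stream_ids` and `bottom_stream_ids` are assumptions made up for the example.

```python
def filter_msg(msg, attached_ids):
    """
    Keep only the entries of msg whose model ID belongs to a handle the
    stream is attached to. The {'id': ..., 'value': ...} layout is a
    hypothetical message format used for illustration only.
    """
    filtered = {}
    for key, entry in msg.items():
        if entry['id'] in attached_ids:
            filtered[key] = entry['value']
    return filtered


# The JS callback of the top plot sends its x-range and y-range to both
# Python callbacks because the x-range model is shared.
msg = {
    'x_range': {'id': 'shared-x', 'value': (0, 10)},
    'y_range': {'id': 'top-y', 'value': (-1, 1)},
}

top_stream_ids = {'shared-x', 'top-y'}
bottom_stream_ids = {'shared-x', 'bottom-y'}

top_update = filter_msg(msg, top_stream_ids)        # both ranges kept
bottom_update = filter_msg(msg, bottom_stream_ids)  # foreign y-range dropped
```

With this check the bottom plot's stream still receives the shared x-range but never sees the top plot's y-range.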
Every time a message is sent to Python, Python has to send a message back. This acknowledgement (ACK) message is used to unblock a comm for future events. In the case above it may end up unblocking the same comm twice, leaving the second comm blocked. To get around this, the unique ID of the comm is now sent in the message to Python, and the ACK message sends that ID back, ensuring that the correct comm is unblocked.
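
The ACK round trip can be sketched in the same spirit. Everything here is a stand-in written for illustration: `FakeComm`, `handle_event`, `handle_ack` and the `comm_id` field name are hypothetical and do not reproduce the actual comm classes or JS code.

```python
class FakeComm:
    """Stand-in for a frontend comm that blocks until it is acknowledged."""

    def __init__(self, comm_id):
        self.id = comm_id
        self.blocked = False

    def send_event(self, data):
        # Block further events until an ACK carrying this comm's ID arrives
        self.blocked = True
        return {'comm_id': self.id, 'data': data}


def handle_event(msg):
    """Python side: process the event, then echo the comm ID in the ACK."""
    return {'msg_type': 'ACK', 'comm_id': msg['comm_id']}


def handle_ack(ack, comms):
    """Frontend side: unblock only the comm named in the ACK."""
    comms[ack['comm_id']].blocked = False


comms = {c.id: c for c in (FakeComm('comm-a'), FakeComm('comm-b'))}

# Two callbacks fire back to back, one event per comm
events = [comms['comm-a'].send_event({'x0': 0, 'x1': 1}),
          comms['comm-b'].send_event({'x0': 0, 'x1': 1})]

for event in events:
    handle_ack(handle_event(event), comms)
```

Because each ACK names its comm, neither comm can be left blocked by an ACK that was meant for the other.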

jlstevens · 2017-01-05T14:52:41Z

holoviews/plotting/bokeh/callbacks.py

@@ -10,7 +10,7 @@
 from ..comms import JupyterCommJS


-def attributes_js(attributes):
+def attributes_js(attributes, handles):
    """


Would be good to see the docstring updated with an example showing how handles is involved...

This docstring still needs to be updated I think.

Yes, still need to do that.

jlstevens · 2017-01-05T15:02:13Z

holoviews/plotting/bokeh/callbacks.py

+        # Gather the ids of the plotting handles attached to this callback
+        # This allows checking that a stream is not given the state
+        # of a plotting handle it wasn't attached to
+        stream_handle_ids = defaultdict(list)


Might be good to turn this bit of code into a method (get_handle_ids?) and turn the comment into a docstring.

jlstevens · 2017-01-05T15:05:17Z

holoviews/plotting/bokeh/callbacks.py

        for stream in self.streams:
-            stream.update(trigger=False, **msg)
+            ids = self.stream_handles[stream]
+            sanitized_msg = {}


I think this bit sounds like a 'message filter' (as opposed to sanitization) and could be its own method (with docstring). Something like message_filter(msg, ids)...

Yes, sounds good.

jlstevens · 2017-01-05T15:12:09Z

holoviews/plotting/bokeh/callbacks.py

            else:
                handle.callback.code += code
        else:
+            self.stream_handles.update(stream_handle_ids)


Looks like the contents of self.stream_handles can keep growing between set_customjs calls with no way to reset or clear it? Is this more state that hangs around when a visualization is removed (e.g. when a notebook cell is deleted)?

Yes, it'll stick around but compared to all the plotting state I'm not worried about a few IDs.

jlstevens · 2017-01-05T15:18:10Z

To summarize private discussion I've had directly with Philipp, I'll just say that I think the general strategy used in this PR looks sound - labelling comms with an id and keeping track of plotting handles.

I intend to use inline comments for suggestions specific to this PR and here are some of the more general (future) plans that we discussed:

There is code in callbacks that references jupyter comms specifically. This should be generalized to work with any comms type (e.g. a websocket comms implementation).
I would suggest renaming 'comms_target' to 'comm_id'.
We want to document our message protocol and ideally encapsulate it all in one location (i.e. in a class, maybe called Protocol, that lives alongside the comm classes).
We should aim to generalize our message protocol, e.g. always have a 'command' or 'msg_type' field. Maybe all messages could have 'msg_type' and 'data' fields that can be dispatched appropriately in the centralized message protocol class. Then we need something for 'ACK' and 'ERROR' messages...
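
To make the last two points a little more concrete, here is one possible shape for such a class. It is only a sketch of the idea: the `Protocol` class, its method names and the exact fields are hypothetical, and nothing like it exists in this PR.

```python
class Protocol:
    """Hypothetical central place for encoding and dispatching messages."""

    def __init__(self):
        self._handlers = {}

    def register(self, msg_type, handler):
        """Associate a handler callable with a message type."""
        self._handlers[msg_type] = handler

    @staticmethod
    def encode(msg_type, comm_id, data=None):
        """Every message carries the same three fields."""
        return {'msg_type': msg_type, 'comm_id': comm_id, 'data': data or {}}

    def dispatch(self, msg):
        """Route a message to its handler and reply with ACK or ERROR."""
        handler = self._handlers.get(msg['msg_type'])
        if handler is None:
            reason = 'unknown msg_type %r' % msg['msg_type']
            return self.encode('ERROR', msg.get('comm_id'), {'reason': reason})
        handler(msg['data'])
        return self.encode('ACK', msg.get('comm_id'))


received = []
protocol = Protocol()
protocol.register('event', received.append)

ack = protocol.dispatch(Protocol.encode('event', 'comm-a', {'x0': 0, 'x1': 1}))
err = protocol.dispatch(Protocol.encode('resize', 'comm-a'))
```

Keeping encoding and dispatch in one class would give the ACK and ERROR replies a single documented format, whichever comm implementation carries them.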

philippjfr · 2017-01-07T00:13:34Z

@jordansamuels I just updated the PR fixing the issue you reported when resetting, could you test it as well?

philippjfr · 2017-01-07T02:51:57Z

@jlstevens Ready for review again.

jlstevens · 2017-01-07T12:30:59Z

holoviews/plotting/bokeh/callbacks.py

@@ -152,39 +164,84 @@ def __init__(self, plot, streams, source, **params):
        self.streams = streams
        self.comm = self._comm_type(plot, on_msg=self.on_msg)
        self.source = source
+        self.handle_ids = defaultdict(list)


    def initialize(self):
        plots = [self.plot]
        if self.plot.subplots:
            plots += list(self.plot.subplots.values())



Looks like the code below could be made into a small get_handles (or get_plot_handles) method. Alternatively, it could be a function in utils so the code below could be:

handle_ids = self._get_handle_ids(util.get_handles(plots))
self.handle_ids.update(handle_ids)

Which I think is clearer.

I am also assuming there is a good reason to use a dictionary update, i.e. there may be other handle_ids on the Callback other than the ones currently returned by _get_handle_ids. If that is true, it is a bit strange to see this in an initialize method, which sounds like it should only happen once.

> Looks like the code below could be made into a small get_handles (or get_plot_handles) method.

Originally had one, didn't seem worth it though.

> If that is true, it is a bit strange to see this in an initialize method which sounds like it should only happen once.

That's true, the handles get merged if multiple streams end up being attached to the same callback, in that case the set_customjs method below merges the callbacks:

for k, v in self.handle_ids.items():
    cb.handle_ids[k] += v

I could just assign in initialize instead of using the dictionary update, but I wanted to indicate that the handle_ids may be modified by another callback.

> Originally had one, didn't seem worth it though.

I agree it is a small bit of code and I was also wondering if it was worth it. On balance, I felt making the intent clearer for this block of code was more valuable.

I'm still a bit confused about self.handle_ids - I see it mentioned in four places:

When it is declared.

The bit in initialize above.

What looks like a dictionary access in on_msg.

The iteration over k, v you just mentioned.

In none of these places do I see self.handle_ids being changed after it is initialized. If you mean that self.handle_ids is being changed somewhere else in the code, then I wouldn't worry about using update here.

In that case, I would also want to look at where the merging of self.handle_ids is happening to consider whether that should become part of the API of Callback.

I posted where it's being changed above, in set_customjs a callback will merge its own handle_ids with another callback's handle_ids if that callback has already been attached to the plot handle (since you can't attach multiple callbacks on one handle).

for k, v in self.handle_ids.items():
    cb.handle_ids[k] += v
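
A toy version of that merge, stripped of everything except the bookkeeping, may make the behaviour easier to see. The `Callback` stand-in, the `merge_into` helper and the string keys are hypothetical; only the `defaultdict(list)` and the loop come from the snippets quoted in this thread.

```python
from collections import defaultdict


class Callback:
    """Toy stand-in that holds only the handle_ids bookkeeping."""

    def __init__(self, handle_ids):
        self.handle_ids = defaultdict(list)
        self.handle_ids.update(handle_ids)

    def merge_into(self, cb):
        # Hypothetical helper wrapping the loop quoted above: hand this
        # callback's IDs over to the callback already attached to the handle
        for k, v in self.handle_ids.items():
            cb.handle_ids[k] += v


# Keys stand in for streams, values for the model IDs they are attached to
attached = Callback({'stream_a': ['x-id', 'y-id']})
newcomer = Callback({'stream_a': ['z-id'], 'stream_b': ['x-id']})

newcomer.merge_into(attached)
merged = dict(attached.handle_ids)
```

After the merge `attached` knows about both streams, while `newcomer.handle_ids` is left unchanged.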

I see, I was looking for self.handle_ids but cb here is another Callback instance. In that case, how about merging to a different place, e.g. cb.joint_handle_ids? If it is too complicated, then I don't mind leaving it as it is.

No point keeping both around imo.

jlstevens · 2017-01-07T12:40:27Z

holoviews/plotting/bokeh/callbacks.py

        found = []
        for plot in plots:
            for handle in self.handles:
                if handle not in plot.handles or handle in found:
                    continue
-                self.set_customjs(plot.handles[handle])
+                self.set_customjs(plot.handles[handle], handles)


If I understand right, found is a list of handles that are in some sense 'unique'. So maybe it could be something like:

unique_handles = self.unique_handles(plots)
for unique_handle in unique_handles:
    self.set_customjs(unique_handle, handles)

Now you might want to call unique_handles something like filtered_handles instead and this variable would then replace found.
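
For what it is worth, a helper along those lines could look roughly like the sketch below. It assumes that a handle name is recorded in `found` once it has been attached, which the diff excerpt does not show, and both the `Plot` stand-in and the `filtered_handles` name are hypothetical.

```python
class Plot:
    """Minimal stand-in exposing only a handles dictionary."""

    def __init__(self, handles):
        self.handles = handles


def filtered_handles(plots, requested):
    """
    Return the first handle object found for each requested handle name,
    scanning the plots in order and skipping names already seen.
    """
    found = []
    result = []
    for plot in plots:
        for name in requested:
            if name not in plot.handles or name in found:
                continue
            found.append(name)
            result.append(plot.handles[name])
    return result


plots = [Plot({'x_range': 'x-model'}),
         Plot({'x_range': 'x-model', 'y_range': 'y-model'})]

handles = filtered_handles(plots, ['x_range', 'y_range'])
```

The calling loop then only has to iterate over the returned handles and attach the JS callback to each one.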

jlstevens · 2017-01-07T12:43:39Z

holoviews/plotting/bokeh/callbacks.py

        return msg


-    def set_customjs(self, handle):
+    def _get_handle_ids(self, handles):


I would like a name that conveys the filtering operation that is also happening here. Maybe _get_attached_ids or _get_attached_handle_ids?

jlstevens · 2017-01-07T12:48:42Z

holoviews/plotting/bokeh/callbacks.py

+            data['x_range'] = (msg['x0'], msg['x1'])
+        if 'y0' in msg and 'y1' in msg:
+            data['y_range'] = (msg['y0'], msg['y1'])
+        return data


 class RangeXCallback(Callback):


All of the new bits of code below seem to have the form:

if predicate(msg):
    return dictionary_from_msg(msg)
else:
    return {}

If you agree this is a general pattern, maybe we can just have an applicable predicate method, with the baseclass checking the predicate value. E.g.:

class RangeXCallback(Callback):

    handles = ['x_range']

    def applicable(self, msg):
        return 'x0' in msg and 'x1' in msg

    def _process_msg(self, msg):
        return {'x_range': (msg['x0'], msg['x1'])}

And of course applicable can return True by default.
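
A self-contained sketch of how the baseclass side of this could work is shown below. The `process` entry point is a hypothetical name, and this `Callback` is a bare stand-in rather than the real class.

```python
class Callback:
    """Bare stand-in for the baseclass; only the predicate logic is shown."""

    handles = []

    def applicable(self, msg):
        # Default: every message is considered applicable
        return True

    def _process_msg(self, msg):
        return msg

    def process(self, msg):
        # Hypothetical entry point: the baseclass checks the predicate so
        # that subclasses never need their own if/else
        return self._process_msg(msg) if self.applicable(msg) else {}


class RangeXCallback(Callback):

    handles = ['x_range']

    def applicable(self, msg):
        return 'x0' in msg and 'x1' in msg

    def _process_msg(self, msg):
        return {'x_range': (msg['x0'], msg['x1'])}


cb = RangeXCallback()
hit = cb.process({'x0': 0, 'x1': 5})
miss = cb.process({'y0': 0, 'y1': 1})
```

Here `hit` holds the x-range tuple and `miss` is an empty dictionary.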

What if there are multiple predicates, such as in RangeXY? I'd prefer not to complicate this for now, although I do agree something like this is worth considering.

RangeXY is really the union of RangeX and RangeY. We might want to consider generalizing this idea of a union so we can build things like RangeXY out of the component pieces.

If you agree with this suggestion, maybe it should be made into a new issue (feature request)? I don't think it would be hard to implement later.
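
One way the union idea might be expressed, purely as a sketch for a future issue: each component declares its own predicate and the union merges the output of whichever components apply. `RangeX`, `RangeY` and `Union` are hypothetical names used only in this example.

```python
class RangeX:
    def applicable(self, msg):
        return 'x0' in msg and 'x1' in msg

    def process(self, msg):
        return {'x_range': (msg['x0'], msg['x1'])}


class RangeY:
    def applicable(self, msg):
        return 'y0' in msg and 'y1' in msg

    def process(self, msg):
        return {'y_range': (msg['y0'], msg['y1'])}


class Union:
    """Combine components, merging the output of every applicable one."""

    def __init__(self, *parts):
        self.parts = parts

    def process(self, msg):
        data = {}
        for part in self.parts:
            if part.applicable(msg):
                data.update(part.process(msg))
        return data


range_xy = Union(RangeX(), RangeY())
both = range_xy.process({'x0': 0, 'x1': 1, 'y0': 2, 'y1': 3})
only_x = range_xy.process({'x0': 0, 'x1': 1})
```

This also answers the multiple-predicate question for RangeXY: each component keeps a single predicate and the union handles the combination.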

philippjfr · 2017-01-07T13:03:52Z

Okay, implemented most of your new suggestions. Need to update the attributes_js docstring and will try to add a few unit tests for the various Callback methods.

philippjfr · 2017-01-07T13:51:13Z

@jlstevens Added a unit test and revised the docstring. We can settle what to do about the callback _process_msg implementations in another issue as you suggest. Unless you object I think this is ready to merge once tests pass.

jlstevens · 2017-01-07T14:13:30Z

I'm happy to merge now although looking at the things we discussed, there might be one thing that could still be addressed in this PR:

> I would suggest renaming 'comms_target' to 'comm_id'.

Here are the other suggestions made (that we should turn into issues):

As you agreed, we should file an issue to define Callbacks with an applicable predicate and to introduce unions (so PositionXY would be the union of PositionX and PositionY).
There is code in callbacks that references jupyter comms specifically. This should be generalized to work with any comms type (e.g. a websocket comms implementation).
We want to document our message protocol and ideally encapsulate it all in one location (i.e. in a class, maybe called Protocol, that lives alongside the comm classes).
We should aim to generalize our message protocol, e.g. always have a 'command' or 'msg_type' field. Maybe all messages could have 'msg_type' and 'data' fields that can be dispatched appropriately in the centralized message protocol class. Then we need something for 'ACK' and 'ERROR' messages...

jlstevens · 2017-01-07T14:59:08Z

Thanks for doing the renaming (which looked a bit tricky)!

Merging.

philippjfr added the "tag: backend: bokeh" and "type: bug" labels Jan 4, 2017

philippjfr force-pushed the duplicate_callback_fix branch from 4f73b24 to eaf6249 January 4, 2017 21:07

philippjfr added 2 commits January 5, 2017 12:35

Ensure multiple callbacks do not bleed wrong plot state

16e9f92

Added comms_target ID to ACK message protocol

9aa0a0f

philippjfr force-pushed the duplicate_callback_fix branch from 34cb892 to 9aa0a0f January 5, 2017 12:36

philippjfr added 6 commits January 5, 2017 13:12

Fixed JSON decoding of Jupyter comms messages

ad6df39

Fixed stream callback unit test

4ee3623

Small fix to Comm._handle_msg

cccacfc

Added tests for Comms ACK messages

7c602cb

Fixed Python3 bug in Comms json decoding

6556637

Added comments to clarify Comms protocol and callback message sanitization

0665086

philippjfr force-pushed the duplicate_callback_fix branch from e447324 to 0665086 January 5, 2017 13:56

philippjfr requested a review from jlstevens January 5, 2017 13:57

Python3 fix for detecting JSON decode error

80afcc8

jlstevens reviewed Jan 5, 2017

philippjfr added 2 commits January 7, 2017 00:07

Simplified JupyterComm json decoding

0ce5e35

Use correct comm when callback is triggered

eab77cb

philippjfr added 2 commits January 7, 2017 01:01

Refactored bokeh Callback class

3348b11

Minor cleanup of bokeh Callback JS

9b01ca1

jlstevens reviewed Jan 7, 2017

Refactored bokeh Callback class

2c86401

philippjfr added 3 commits January 7, 2017 13:22

Updated bokeh callback attribute_js docstring

ce4ad9c

Added unit test to test handling of callback ids

6a3124d

Only pass requested handles to bokeh callbacks

4234142

philippjfr force-pushed the duplicate_callback_fix branch 2 times, most recently from 17ad6ab to 0dedc3b January 7, 2017 14:38

Renamed Comm.target to Comm.id

89921f1

philippjfr force-pushed the duplicate_callback_fix branch from 0dedc3b to 89921f1 January 7, 2017 14:43

jlstevens merged commit 7806464 into master Jan 7, 2017

philippjfr deleted the duplicate_callback_fix branch January 7, 2017 15:00
