Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make serializers synchronous and guarantee to return a snapshot of the data #1270

Merged
merged 2 commits into from Apr 12, 2017

Conversation

@jasongrout
Copy link
Member

@jasongrout jasongrout commented Apr 7, 2017

WIP, as this causes problems for binary arrays - basically we’ll need an explicit serializer on attributes that have binary arrays now, which can choose to implement the snapshot however they think best. The default serializer will make a copy.

This is work towards solving fundamental problems exposed at #1044.

CC @maartenbreddels - this affects the binary serialization.

…e data.

WIP, as this causes problems for binary arrays - basically we’ll need an explicit serializer on attributes that have binary arrays now, which can choose to implement the snapshot however they think best. The default serializer will make a copy.
@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 7, 2017

Can you explain a bit more what the problem is?

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 7, 2017

This is buried in #1044. Because we have message throttling, state sync messages are not always sent immediately when save_changes is called. The problem is that we serialize the state attributes we are syncing when we send the message, rather than when we request the save. This means that if we sync some attributes, then change the object, the sync message could be sent after the change, which means we sent the wrong values.

What this PR does is serialize the object to get a snapshot of the synced attributes when the save is requested. That means that the serializers must return a snapshot of the state that won't change. It's easy to accomplish this with JSONable values - just stringify and parse to get a cheap deep copy. However, if there are binary arrays in the json state, there's no way to get a snapshot. I imagine that some libraries will not do a snapshot of binary data, but will just sync the current state of the binary array and hope for the best.

I suppose another approach is to make the state sync call explicitly just be a request that the future current state of the object be synced at some point in the future. This means that we then have to be careful to never leave the state of the object in some inconsistent intermediate state, which sounds tenuous at best.

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 7, 2017

Actually, can't we juse the code that I wrote for extracting the buffers to do this? I spend effort not to clone the whole thing, but if we rip that part out, it clones everything (except the arrays..)

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 7, 2017

Sure, the JSON parse/stringify is just an easy way to deep copy an object. We could use a different deep copy implementation instead, though I it's likely to be much slower (maybe we should time it, though). I think the underlying issue with using it to do a deep copy is that then we are condoning that this sort of async bug is expected for binary data.

(I wish we had simple copy-on-write ArrayBuffers, so we could get a real snapshot of the binary data.)

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 7, 2017

Is it an option to remove the throttling, or do it differently?

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 7, 2017

Actually, can't we juse the code that I wrote for extracting the buffers to do this? I spend effort not to clone the whole thing, but if we rip that part out, it clones everything (except the arrays..)

and officially sanction this async problem for binary arrays?

(I wish we had some sort of copy-on-write binary view of an array. I just looked into the Chrome source, and if I understand it correctly, they don't implement copy-on-write copies: https://cs.chromium.org/chromium/src/v8/src/builtins/builtins-arraybuffer.cc?l=281 )

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 7, 2017

Is it an option to remove the throttling, or do it differently?

Good question. I think it makes sense to have some sort of throttling implemented somewhere. A typical usecase is dragging a slider which may cause a lot of computation to be done, maybe drawing a plot. The nice thing about the pending messages throttle is that it dynamically adjusts for the delay - sync messages are sent exactly as often as the kernel can handle them.

What is the problem with snapshotting the data the widget wants to sync?

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 7, 2017

Actually, the throttling/debounding is something I raised a while ago #663 and recently updated with some code (on the python side). Maybe it's sth that needs to go into the interact decorator, and we just handle this purely on the python side.

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 7, 2017

What is the problem with snapshotting the data the widget wants to sync?

We could, it will take up memory, but maybe we should accept that.

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 7, 2017

We could, it will take up memory, but maybe we should accept that.

That's where we maybe look the other way if someone chooses to not hand us back a snapshot of the state. Or rather, widget authors can decide if they really need to hand back a copy or not. But by default, we make a copy.

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 7, 2017

The end result is that if you have an attribute that has binary data, and you don't care about this snapshotting business, you just declare that attribute to have the identity serializer, i.e., hand back the attribute as-is.

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 7, 2017

  1. Actually, thinking about it, I don't think there really will be that much change after all. Already, if you have binary data, you'll probably need a custom serializer to understand how to package up that binary data, so the automatic copying is not done with the current PR. Whether you actually make a copy or not is on your head (it's correct to make a copy, but maybe it's okay to fudge it a bit...).

  2. This is the sort of thing I was talking about on the original binary PR, that we should just treat every message we split up as a copy. However, your implementation will work even if people fudge things a bit and don't return copies. That's maybe a good thing to be robust against that.

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 10, 2017

Ping @maartenbreddels - I think we should be okay with merging this after all. If you didn't care about making a copy, you could just give an identity function as a serializer.

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 10, 2017

Hmm, I don't quite follow when send_sync_message and when serialize are being used, can you explain this a bit.

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 10, 2017

Great question. When someone calls for a sync (e.g., with save_changes()), the sync method gets called. Inside the sync method, we first serialize a message (which should snapshot the current state of the attributes we are going to sync). Then we check to see if there are any pending sync messages still in flight (i.e., a sync message that was sent, but we haven't had a corresponding execution status idle message). If there is no pending sync message still being processed by the kernel, then we directly send the message (calling send_sync_message). If there is a pending sync message still being processed by the kernel, then we store the serialized snapshotted attributes in this.msg_buffer. When we get back a status idle message for the pending message, we call send_sync_message directly to send the message we've been saving up: https://github.com/jasongrout/ipywidgets/blob/20e00848139a83659f80cf027d313cdf6ac00adf/jupyter-js-widgets/src/widget.ts#L243.

Does that answer your question?

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 10, 2017

Ok, thanks that explains a lot. I think I understand the issue now. If you don't give a serializer, it will try to clone your data, and fail if there is a binary buffer (that's fine I guess). In that case you have to give a serializer, which could just do nothing. So in effect the default serializer is a clone using stringify/JSON.parse, if you don't want cloning you need to specify so. I think that is good default behaviour.

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 10, 2017

Exactly! Do you mind looking this over and/or trying out this PR and giving me thumbs up/down? It's a fundamental enough change that I'm hesitant to merge it without someone else looking at it.

@jasongrout jasongrout added this to the 7.0 milestone Apr 11, 2017
@jasongrout jasongrout added this to the 7.0 milestone Apr 11, 2017
@SylvainCorlay
Copy link
Member

@SylvainCorlay SylvainCorlay commented Apr 11, 2017

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 11, 2017

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 12, 2017

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 12, 2017

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 12, 2017

Give me a 4 more hours! 😊

No pressure!

// How should we handle those? One way is to declare a serializer for fields
// that could have binary that copies what fields it can, and makes a decision
// about whether to copy the ArrayBuffer
state[k] = JSON.parse(JSON.stringify(state[k]));
Copy link
Member

@maartenbreddels maartenbreddels Apr 12, 2017

Can we maybe have a warning here, say when an exception occurs, give a hint what may have gone wrong?

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 12, 2017

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 12, 2017

Controller works, code looks, ipyvolume doesn't break. The issue in #1195 is still present though. I can kind of reproduce it, not easy, but we should fix that before 7.0, it is large performance penalty since it sends back (for some reason) especially binary buffers it seems.

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 12, 2017

Yes, exactly, say console.error('serializing failed for ...') and then rethrow the exception would be helpful already.

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 12, 2017

Interesting: JSON.stringify already works for ArrayBuffers:

> JSON.stringify(new Int16Array(4))
"{"0":0,"1":0,"2":0,"3":0}"

@jasongrout
Copy link
Member Author

@jasongrout jasongrout commented Apr 12, 2017

I added a message for context for serialization errors.

@maartenbreddels
Copy link
Member

@maartenbreddels maartenbreddels commented Apr 12, 2017

Excellent 👍

@jasongrout jasongrout merged commit fe6a8f1 into jupyter-widgets:master Apr 12, 2017
1 check passed
@jasongrout jasongrout mentioned this pull request Apr 13, 2017
@jasongrout jasongrout mentioned this pull request Apr 18, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants