Make serializers synchronous and guarantee to return a snapshot of the data #1270

Merged
merged 2 commits into jupyter-widgets:master on Apr 12, 2017

Conversation

3 participants
@jasongrout
Member

jasongrout commented Apr 7, 2017

WIP, as this causes problems for binary arrays - basically we’ll need an explicit serializer on attributes that have binary arrays now, which can choose to implement the snapshot however they think best. The default serializer will make a copy.

This is work towards solving fundamental problems exposed at #1044.

CC @maartenbreddels - this affects the binary serialization.

Make serializers synchronous and guarantee to return a snapshot of the data.

@maartenbreddels

Member

maartenbreddels commented Apr 7, 2017

Can you explain a bit more what the problem is?

@jasongrout

Member

jasongrout commented Apr 7, 2017

This is buried in #1044. Because we have message throttling, state sync messages are not always sent immediately when save_changes is called. The problem is that we serialize the state attributes we are syncing when we send the message, rather than when we request the save. This means that if we sync some attributes and then change the object, the sync message could be sent after the change, which means we send the wrong values.

What this PR does is serialize the object to get a snapshot of the synced attributes when the save is requested. That means that the serializers must return a snapshot of the state that won't change. It's easy to accomplish this with JSONable values - just stringify and parse to get a cheap deep copy. However, if there are binary arrays in the json state, there's no way to get a snapshot. I imagine that some libraries will not do a snapshot of binary data, but will just sync the current state of the binary array and hope for the best.
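
As a minimal sketch of the race and of what the snapshot buys us (illustrative only, not the PR's code):

const state: { value: number } = { value: 3 };

// Snapshot at save time with the cheap JSON deep copy:
const snapshot = JSON.parse(JSON.stringify(state));

// The widget keeps changing while the message waits behind the throttle:
state.value = 4;

// When the throttled message is finally sent, the snapshot still holds the saved value:
console.log(snapshot.value); // 3, even though state.value is now 4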

I suppose another approach is to make the state sync call just a request that the current state of the object be synced at some point in the future. This means that we then have to be careful to never leave the object in an inconsistent intermediate state, which sounds tenuous at best.

@maartenbreddels

Member

maartenbreddels commented Apr 7, 2017

Actually, can't we just use the code that I wrote for extracting the buffers to do this? I spent effort not to clone the whole thing, but if we rip that part out, it clones everything (except the arrays...).

@jasongrout

Member

jasongrout commented Apr 7, 2017

Sure, the JSON parse/stringify is just an easy way to deep copy an object. We could use a different deep copy implementation instead, though I think it's likely to be much slower (maybe we should time it, though). I think the underlying issue with using it to do the deep copy is that we would then be condoning this sort of async bug as expected behavior for binary data.
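
For illustration, a rough sketch of that kind of deep copy (hypothetical code, not the buffer-extraction code from the binary PR): clone plain objects and arrays recursively, but pass binary buffers through by reference.

function deepCopyExceptBuffers(value: any): any {
    if (value instanceof ArrayBuffer || ArrayBuffer.isView(value)) {
        return value;                            // shared by reference, not snapshotted
    }
    if (Array.isArray(value)) {
        return value.map(deepCopyExceptBuffers);
    }
    if (value !== null && typeof value === 'object') {
        const copy: { [key: string]: any } = {};
        for (const key of Object.keys(value)) {
            copy[key] = deepCopyExceptBuffers(value[key]);
        }
        return copy;
    }
    return value;                                // primitives are immutable anyway
}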

(I wish we had simple copy-on-write ArrayBuffers, so we could get a real snapshot of the binary data.)

@maartenbreddels

Member

maartenbreddels commented Apr 7, 2017

Is it an option to remove the throttling, or do it differently?

@jasongrout

Member

jasongrout commented Apr 7, 2017

Actually, can't we just use the code that I wrote for extracting the buffers to do this? I spent effort not to clone the whole thing, but if we rip that part out, it clones everything (except the arrays...).

and officially sanction this async problem for binary arrays?

(I wish we had some sort of copy-on-write binary view of an array. I just looked into the Chrome source, and if I understand it correctly, they don't implement copy-on-write copies: https://cs.chromium.org/chromium/src/v8/src/builtins/builtins-arraybuffer.cc?l=281 )

@jasongrout

Member

jasongrout commented Apr 7, 2017

Is it an option to remove the throttling, or do it differently?

Good question. I think it makes sense to have some sort of throttling implemented somewhere. A typical use case is dragging a slider, which may cause a lot of computation to be done, maybe drawing a plot. The nice thing about the pending-messages throttle is that it dynamically adjusts for the delay: sync messages are sent exactly as often as the kernel can handle them.

What is the problem with snapshotting the data the widget wants to sync?

@maartenbreddels

Member

maartenbreddels commented Apr 7, 2017

Actually, the throttling/debouncing is something I raised a while ago in #663 and recently updated with some code (on the Python side). Maybe it's something that needs to go into the interact decorator, and we just handle this purely on the Python side.

@maartenbreddels

Member

maartenbreddels commented Apr 7, 2017

What is the problem with snapshotting the data the widget wants to sync?

We could, it will take up memory, but maybe we should accept that.

@jasongrout

Member

jasongrout commented Apr 7, 2017

We could, it will take up memory, but maybe we should accept that.

That's where we maybe look the other way if someone chooses not to hand us back a snapshot of the state. Or rather, widget authors can decide whether they really need to hand back a copy or not. But by default, we make a copy.

@jasongrout

Member

jasongrout commented Apr 7, 2017

The end result is that if you have an attribute that has binary data, and you don't care about this snapshotting business, you just declare that attribute to have the identity serializer, i.e., hand back the attribute as-is.
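
For example, roughly (the serializer registration shape here is approximate and buffer_attr is a hypothetical attribute, so treat this as a sketch rather than the library's exact API):

import { WidgetModel } from 'jupyter-js-widgets';

class MyModel extends WidgetModel {
    static serializers = {
        ...WidgetModel.serializers,
        // Identity serializer: hand the value back as-is, no snapshot copy.
        buffer_attr: { serialize: (value: any) => value },
    };
}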

@jasongrout

Member

jasongrout commented Apr 7, 2017

  1. Actually, thinking about it, I don't think there really will be that much change after all. Already, if you have binary data, you'll probably need a custom serializer that understands how to package up that binary data, so the automatic copying does not apply with the current PR. Whether you actually make a copy or not is on your head (it's correct to make a copy, but maybe it's okay to fudge it a bit...).

  2. This is the sort of thing I was talking about on the original binary PR: that we should just treat every message we split up as a copy. However, your implementation will work even if people fudge things a bit and don't return copies. It's maybe a good thing to be robust against that.

@jasongrout

Member

jasongrout commented Apr 10, 2017

Ping @maartenbreddels - I think we should be okay with merging this after all. If you didn't care about making a copy, you could just give an identity function as a serializer.

@maartenbreddels

Member

maartenbreddels commented Apr 10, 2017

Hmm, I don't quite follow when send_sync_message and when serialize are being used; can you explain this a bit?

@jasongrout

Member

jasongrout commented Apr 10, 2017

Great question. When someone calls for a sync (e.g., with save_changes()), the sync method gets called. Inside the sync method, we first serialize a message (which should snapshot the current state of the attributes we are going to sync). Then we check to see if there are any pending sync messages still in flight (i.e., a sync message that was sent, but we haven't had a corresponding execution status idle message). If there is no pending sync message still being processed by the kernel, then we directly send the message (calling send_sync_message). If there is a pending sync message still being processed by the kernel, then we store the serialized snapshotted attributes in this.msg_buffer. When we get back a status idle message for the pending message, we call send_sync_message directly to send the message we've been saving up: https://github.com/jasongrout/ipywidgets/blob/20e00848139a83659f80cf027d313cdf6ac00adf/jupyter-js-widgets/src/widget.ts#L243.
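
A hedged sketch of that flow (a simplification with hypothetical names, not the actual widget.ts code): at most one sync message is in flight, and anything requested in the meantime is buffered, merged, and flushed on idle.

class SyncThrottle {
    private pending = false;                         // a sync message is awaiting the kernel's idle status
    private buffered: { [key: string]: any } | null = null;

    constructor(private sendSyncMessage: (snapshot: { [key: string]: any }) => void) {}

    // Called with the already-serialized snapshot of the attributes to sync.
    save(snapshot: { [key: string]: any }) {
        if (this.pending) {
            // Merge into whatever is already buffered; newest values win.
            this.buffered = { ...(this.buffered ?? {}), ...snapshot };
        } else {
            this.pending = true;
            this.sendSyncMessage(snapshot);
        }
    }

    // Called when the status idle message for the in-flight sync arrives.
    onIdle() {
        if (this.buffered !== null) {
            const next = this.buffered;
            this.buffered = null;
            this.sendSyncMessage(next);              // still pending until the next idle
        } else {
            this.pending = false;
        }
    }
}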

Does that answer your question?

@maartenbreddels

Member

maartenbreddels commented Apr 10, 2017

Ok, thanks, that explains a lot. I think I understand the issue now. If you don't give a serializer, it will try to clone your data, and fail if there is a binary buffer (that's fine, I guess). In that case you have to give a serializer, which could just do nothing. So in effect the default serializer is a clone using JSON.stringify/JSON.parse; if you don't want cloning, you need to specify so. I think that is good default behaviour.

@jasongrout

Member

jasongrout commented Apr 10, 2017

Exactly! Do you mind looking this over and/or trying out this PR and giving me thumbs up/down? It's a fundamental enough change that I'm hesitant to merge it without someone else looking at it.

@jasongrout jasongrout modified the milestone: 7.0 Apr 11, 2017

@jasongrout

Member

jasongrout commented Apr 12, 2017

Give me 4 more hours! 😊

No pressure!

// How should we handle those? One way is to declare a serializer for fields
// that could have binary that copies what fields it can, and makes a decision
// about whether to copy the ArrayBuffer
state[k] = JSON.parse(JSON.stringify(state[k]));

@maartenbreddels

maartenbreddels Apr 12, 2017

Member

Can we maybe have a warning here? Say, when an exception occurs, give a hint about what may have gone wrong.

@maartenbreddels

Member

maartenbreddels commented Apr 12, 2017

Controller works, code looks good, ipyvolume doesn't break. The issue in #1195 is still present, though. I can kind of reproduce it (not easily), but we should fix that before 7.0; it is a large performance penalty, since it sends back (for some reason) especially binary buffers, it seems.

@maartenbreddels

Member

maartenbreddels commented Apr 12, 2017

Yes, exactly; something like console.error('serializing failed for ...') and then rethrowing the exception would already be helpful.
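
Something along those lines, as a sketch of the suggestion (not necessarily the exact code that landed):

// Wrap the snapshot copy so a failure reports which attribute it happened on.
function snapshotAttribute(key: string, value: any): any {
    try {
        return JSON.parse(JSON.stringify(value));
    } catch (e) {
        console.error(`Error serializing state attribute "${key}" for sync`, e);
        throw e;  // rethrow so the caller still sees the failure
    }
}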

@jasongrout

Member

jasongrout commented Apr 12, 2017

Interesting: JSON.stringify already works for typed arrays:

> JSON.stringify(new Int16Array(4))
"{"0":0,"1":0,"2":0,"3":0}"
@jasongrout

Member

jasongrout commented Apr 12, 2017

I added a message to give context for serialization errors.

@maartenbreddels

Member

maartenbreddels commented Apr 12, 2017

Excellent 👍

@jasongrout jasongrout merged commit fe6a8f1 into jupyter-widgets:master Apr 12, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed