
allow plotly.js to accept numpy buffers #1784

Open
jackparmer opened this issue Jun 14, 2017 · 14 comments

Labels
feature something new P3 not pressing

Comments

@jackparmer
Contributor

jackparmer commented Jun 14, 2017

If we allowed all GL plot types to accept numpy buffers for plot x/y/z data, then the Python library could optionally avoid JSON serialization of array data, like @maartenbreddels does in his brilliant project ipyvolume:

https://github.com/maartenbreddels/ipyvolume/blob/master/ipyvolume/serialize.py#L95

Here is the deserialization on the JS side:

https://github.com/maartenbreddels/ipyvolume/blob/master/js/src/serialize.js#L16
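To make the cost of the JSON step concrete, here is a rough sketch comparing the size of the same float64 values encoded as JSON text and as a raw binary buffer. It uses only the standard library (`struct` stands in for numpy's in-memory float64 layout), and the data is arbitrary: the exact ratio depends on the values, and parse time is not measured.

```python
import json
import struct

# Arbitrary illustrative data: 10,000 float64 values
values = [i / 7.0 for i in range(10_000)]

# JSON path: every number becomes decimal text
json_bytes = json.dumps(values).encode("utf-8")

# Binary path: raw little-endian float64, the same layout a numpy float64 array holds
binary_bytes = struct.pack(f"<{len(values)}d", *values)

print(f"JSON:   {len(json_bytes):>7} bytes")
print(f"binary: {len(binary_bytes):>7} bytes")
```

The binary buffer is always exactly 8 bytes per value, whereas the JSON size grows with the number of digits needed to round-trip each float.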

Plotly.js has all of these incredible WebGL figures for scientific computing, but their potential in Python, R, MATLAB, etc. is limited by the JSON serialization step.

In a similar vein, it seems like all GL types should be able to accept Float32Arrays directly instead of untyped JS arrays. Currently, it looks like only the Plotly trace type pointcloud accepts Float32Arrays:

https://codepen.io/plotly/pen/GEoPgv

@jackparmer jackparmer changed the title Allow x,y,z in WebGL plots to accept Float32Array's Idea: Allow plotly.js to accept numpy buffers Jun 14, 2017
@alexcjohnson
Collaborator

Interesting, definitely seems doable, but is the numpy buffer format accessible from other languages? It looks like a fairly straightforward encoding, so it seems like even if it's not natively available elsewhere we could likely generate it - hopefully just translating headers from whatever is natively available.
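As a rough check that the encoding is easy to generate without numpy, here is a minimal sketch that writes NumPy's documented `.npy` (version 1.0) layout by hand: a magic string, a version, a little-endian header length, a Python-literal header dict padded so the data starts on a 64-byte boundary, then the raw data. It assumes "numpy buffer format" means the `.npy` format, handles only little-endian float64 in C order, and `npy_bytes` is a hypothetical helper name.

```python
import struct

def npy_bytes(values, shape):
    """Encode a flat list of floats as an NPY v1.0 payload ('<f8', C order only)."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match number of values")
    # Python tuple syntax: (3,) for 1-D, (2, 3) for 2-D, () for 0-D
    shape_repr = f"({shape[0]},)" if len(shape) == 1 else "(" + ", ".join(map(str, shape)) + ")"
    header = f"{{'descr': '<f8', 'fortran_order': False, 'shape': {shape_repr}, }}"
    # magic (6) + version (2) + header length (2) + header + '\n' must be a multiple of 64
    unpadded = 10 + len(header) + 1
    header += " " * ((64 - unpadded % 64) % 64) + "\n"
    preamble = b"\x93NUMPY" + bytes([1, 0]) + struct.pack("<H", len(header))
    return preamble + header.encode("latin1") + struct.pack(f"<{count}d", *values)
```

If numpy is installed, `np.load(io.BytesIO(npy_bytes(...)))` should read the result back; another language would only need to reproduce the same header string and byte layout.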

@jackparmer
Contributor Author

Here's an R package for writing numpy buffers.

@etpinard
Contributor

Referencing #860

@rreusser
Copy link
Contributor

Very loosely related, only because Mikola wrote so much useful stuff that's not fully utilized: scijs/ndarray#18

There's been talk of letting the scijs/ndarray constructor accept a plain object with format {data: [...], shape: [...], stride: [...], offset: ...}. It's a bit annoying/inefficient to pack and unpack those via ndarray-unpack and ndarray-pack, but it's pretty trivial. For example, a numpy buffer could be unpacked into a scijs ndarray and then sent to plotly as an array of arrays via ndarray-unpack. The annoying part is marshalling to/from typed arrays, but maybe the above code handles that.
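A minimal Python sketch of that plain-object format and of what an unpack step does with it: walk the shape and use the strides and offset to index into a flat buffer. `to_descriptor` and `unpack` are hypothetical names; strides are counted in elements (as in scijs ndarray), and the descriptor only mirrors the {data, shape, stride, offset} object described above, not a finalized constructor API.

```python
def to_descriptor(flat, shape):
    """Build a {data, shape, stride, offset} object for a row-major (C-order) flat list."""
    stride = []
    step = 1
    for dim in reversed(shape):
        stride.append(step)  # elements to skip when this axis advances by one
        step *= dim
    return {"data": flat, "shape": list(shape), "stride": stride[::-1], "offset": 0}


def unpack(desc):
    """Rough Python analogue of ndarray-unpack: descriptor -> nested lists."""
    data, shape, stride = desc["data"], desc["shape"], desc["stride"]

    def build(axis, base):
        if axis == len(shape):
            return data[base]
        return [build(axis + 1, base + i * stride[axis]) for i in range(shape[axis])]

    return build(0, desc["offset"])
```

Because `unpack` only follows strides, a transposed view is just a different stride list over the same data: `{"data": [0, 1, 2, 3, 4, 5], "shape": [3, 2], "stride": [1, 3], "offset": 0}` unpacks to `[[0, 3], [1, 4], [2, 5]]`.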

@maartenbreddels
Copy link

Let me add my 2c on this.
In the ipywidgets ecosystem we have two levels. The higher one, the widget level, is where you define how an object gets serialized into JSON, which is what happens here for instance.

Let me repeat the stripped version here:

def array_to_json(ar, obj=None):
    # memoryview wraps the array's memory without copying; dtype and shape travel as plain JSON
    return {'buffer': memoryview(ar), 'dtype': str(ar.dtype), 'shape': ar.shape}

On a lower level is how to send the JSON (or our extension of JSON) over the wire. In ipywidgets we allow memoryview objects (basically a binary blob; we loosely refer to it as a buffer) to be present in the JSON, which is of course not really valid JSON, so let's call it jsonb. This jsonb then gets split into a real JSON part, a list of buffers, and a list of 'paths' recording where these buffers resided in the JSON structure; on the Python side that happens here. The real JSON part and the buffers are then sent over the wire (websocket) as one binary blob and deserialized on the JS side; most of the JS magic is here.

Here is the example of splitting the jsonb into JSON, paths and buffers:

>>> import numpy as np
>>> from ipywidgets.widgets.widget import _remove_buffers
>>> ar1, ar2 = np.arange(10), np.zeros((10, 10))
>>> state = {'plain': [0, 'text'], 'x': {'ar': memoryview(ar1)}, 'y': {'shape': (10,10), 'data': memoryview(ar2)}}
>>> _remove_buffers(state)
({'plain': [0, 'text'], 'x': {}, 'y': {'shape': (10, 10)}}, [['x', 'ar'], ['y', 'data']],
 [<memory at 0x107ffec48>, <memory at 0x107ffed08>])
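To show what that split (and its inverse) amounts to, here is a small stand-alone sketch. It is a hypothetical re-implementation written for illustration, not the ipywidgets code; in particular, replacing buffers inside lists with a `None` placeholder is this sketch's own choice, made so the remaining list indices stay valid.

```python
BUFFER_TYPES = (memoryview, bytes, bytearray)


def split_buffers(state):
    """Return (json_part, paths, buffers) for a nested dict/list 'jsonb' structure."""
    paths, buffers = [], []

    def walk(node, path):
        if isinstance(node, dict):
            out = {}
            for key, value in node.items():
                if isinstance(value, BUFFER_TYPES):
                    paths.append(path + [key])  # remember where the buffer was
                    buffers.append(value)       # and drop it from the JSON copy
                else:
                    out[key] = walk(value, path + [key])
            return out
        if isinstance(node, list):
            out = []
            for index, value in enumerate(node):
                if isinstance(value, BUFFER_TYPES):
                    paths.append(path + [index])
                    buffers.append(value)
                    out.append(None)  # placeholder keeps later indices unchanged
                else:
                    out.append(walk(value, path + [index]))
            return out
        return node  # scalars, strings and tuples are kept as-is

    return walk(state, []), paths, buffers


def put_buffers(json_part, paths, buffers):
    """Inverse of split_buffers: write each buffer back at its recorded path (in place)."""
    for path, buf in zip(paths, buffers):
        target = json_part
        for key in path[:-1]:
            target = target[key]
        target[path[-1]] = buf
    return json_part
```

The JSON part can then go through an ordinary JSON encoder, while the buffers travel as raw bytes next to it.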

This can be seen as an extension of JSON, and I think this part deserves its own library, which could be useful for many other projects. For instance, I noticed that bokeh (cc @bryevdv) also has binary transfer on their wish list, so maybe some coordination is useful.

Having a jsonb library for Python, JS, R and C++ would be of interest to many more people, I think, beyond ipywidgets, plotly and bokeh.
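A sketch of what the "one binary blob" wire layer of such a jsonb library could look like: an offset table followed by the JSON text and the raw buffers. The layout (a big-endian uint32 part count, then one uint32 offset per part, with the JSON as part 0) is an assumption made for illustration, and `pack_message`/`unpack_message` are hypothetical names.

```python
import json
import struct


def pack_message(json_part, buffers):
    """Pack a JSON-able object plus binary buffers into a single blob.

    Assumed layout: uint32 n | uint32 offsets[n] | JSON (UTF-8) | buffer 1 | ... | buffer n-1
    """
    # bytes(...) copies each buffer for simplicity; a real implementation would stream them
    parts = [json.dumps(json_part).encode("utf-8")] + [bytes(b) for b in buffers]
    header_size = 4 * (1 + len(parts))
    offsets, pos = [], header_size
    for part in parts:
        offsets.append(pos)
        pos += len(part)
    header = struct.pack(f">{1 + len(parts)}I", len(parts), *offsets)
    return header + b"".join(parts)


def unpack_message(blob):
    """Inverse of pack_message: returns (json_part, list of zero-copy memoryview slices)."""
    (n,) = struct.unpack_from(">I", blob, 0)
    offsets = list(struct.unpack_from(f">{n}I", blob, 4)) + [len(blob)]
    view = memoryview(blob)
    parts = [view[offsets[i]:offsets[i + 1]] for i in range(n)]
    return json.loads(bytes(parts[0]).decode("utf-8")), parts[1:]
```

On the receiving side the buffers come back as memoryview slices of the blob, so they can be wrapped (for example with `np.frombuffer`, or a typed array in JS) without further copies.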

What to do with the buffer object on the JS side is, I think, up to the app developer. In ipyvolume I now mostly use typed arrays (such as Float32Array) directly, and ndarray for multi-dimensional cases. I do, however, check on the Python side that the array is 'C_CONTIGUOUS', so I do not have to worry about strides.
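The C_CONTIGUOUS check itself is simple. In numpy it is just `ar.flags['C_CONTIGUOUS']`, and `np.ascontiguousarray(ar)` makes a copy only when one is needed. For other languages, here is a minimal sketch of the same test computed from the shape and byte strides; `is_c_contiguous` is a hypothetical helper, and skipping length-1 axes is a choice made here because their stride never affects addressing.

```python
def is_c_contiguous(shape, strides, itemsize):
    """True if (shape, byte strides) describe a dense row-major layout."""
    if 0 in shape:
        return True  # an empty array has nothing to lay out
    expected = itemsize
    # Walk from the last (fastest-varying) axis to the first
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True
```

For a 3x4 float64 array the C-order strides are (32, 8); its transpose has shape (4, 3) and strides (8, 32), which fails the check.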

(cc @SylvainCorlay @jasongrout )

PS: @jackparmer I don't transfer the full numpy array data any more (that was before ipywidgets 7); I now serialize only the array data and send the dtype and shape separately. I need to remove that code.

@jackparmer
Contributor Author

Thanks @maartenbreddels for these tips - extremely helpful!

We're in the middle of a few other plotly.js projects right now, but are planning to circle back on this in a few weeks.

@SylvainCorlay @jasongrout @bryevdv happy to think about standalone implementations for this that could be universally useful. Feel free to chime in if you think of ideas 🥂

@monfera
Copy link
Contributor

monfera commented Jun 24, 2017

I haven't yet looked into the awesome details here, but we have an older conversation with a similar overall goal in mind (though maybe different context): plotly/plotly.py#550 (comment)

At the time we pondered that maybe the Python side could serialize with np.ndarray.tobytes into a WebSocket of binaryType: "arraybuffer". The resulting array buffer is directly usable with WebGL (regl or gl-vis). This way there is still interprocess communication (I think there must be, at least in the browser), but it is limited to the minimum and should create no intermediary representation or storage, just a direct array->array binary flow. I wonder whether this approach would be slower or faster than the character-based approach above.

@maartenbreddels
Copy link

Hi,

There is no character-based approach in what I describe. Maybe it seems that way since it is (partly) JSON, but all the array data is transferred as binary with a minimal number of copies. Actually, I wouldn't recommend using np.ndarray.tobytes (which makes a copy); the memoryview(ar) strategy we use avoids that extra copy. Hope this clarifies our approach a bit more. On the JS side, indeed the buffer can be passed to typed arrays, which can then be fed directly into the WebGL API with no (or minimal) memory copies.

cheers,

Maarten
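The copy-versus-view difference is easy to see with the standard library's `array.array`, used here as a stand-in for a numpy array (both support the buffer protocol and have a `tobytes()` method): a memoryview keeps pointing at the original memory, while `tobytes()` produces an independent snapshot.

```python
from array import array

data = array("f", [0.5, 1.5, 2.5])  # float32 values, like a numpy float32 array
view = memoryview(data)             # zero-copy: shares data's memory
copied = data.tobytes()             # a new bytes object: an independent copy

data[0] = 42.0                      # mutate the original

snapshot = array("f")
snapshot.frombytes(copied)          # decode the copy back into floats
print(view[0], snapshot[0])         # 42.0 0.5
```

The view sees the change while the copy does not, which is why passing `memoryview(ar)` to the transport avoids the extra allocation that `tobytes()` implies.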

@monfera
Copy link
Contributor

monfera commented Jun 25, 2017

Thanks for the clarification @maartenbreddels - your approach looks like the one to be followed!

@etpinard
Copy link
Contributor

Copying from @jmmease's #2388 (comment), a proposal on how to encode large typed arrays inside JSON:

I assume that a JSON list of numbers should be supported, but this wouldn't really offer efficiency gains in terms of storage size and (de)serialization time. Would it make sense to also support a HEX-string encoding of the typed array buffer?

If there is a shape property alongside type and vals then this same HEX-string approach could also be used to encode multi-dimensional arrays (e.g. by assuming row-major ordering).
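A minimal sketch of that encoding, assuming the field names `type`, `vals` and `shape` from the proposal, little-endian byte order and row-major nesting. The type names in `_FORMATS` and the helpers `encode_hex`/`decode_hex` are hypothetical.

```python
import struct

# Assumed type names mapped to struct format codes (little-endian is applied below)
_FORMATS = {"float32": "f", "float64": "d", "int32": "i", "uint8": "B"}


def encode_hex(values, type_name, shape):
    """Encode a flat row-major list as {'type', 'vals', 'shape'} with a hex payload."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match number of values")
    raw = struct.pack(f"<{count}{_FORMATS[type_name]}", *values)
    return {"type": type_name, "vals": raw.hex(), "shape": list(shape)}


def decode_hex(obj):
    """Inverse of encode_hex: returns nested lists in row-major order."""
    code = _FORMATS[obj["type"]]
    raw = bytes.fromhex(obj["vals"])
    flat = list(struct.unpack(f"<{len(raw) // struct.calcsize(code)}{code}", raw))

    def nest(items, shape):
        if len(shape) <= 1:
            return items
        step = len(items) // shape[0] if shape[0] else 0  # elements per outer row
        return [nest(items[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]

    return nest(flat, obj["shape"])
```

Hex costs two characters per byte; base64 costs four characters per three bytes, so it would shrink the payload by a third compared with hex.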

@antoinerg
Copy link
Contributor

Somewhat related to this thread and the topic of data serialization: I came across Apache Arrow, a cross-language in-memory representation for columnar data, which aims to go from the current inefficient copy & convert model to a much more efficient one.

@alexcjohnson
Copy link
Collaborator

cc @catherinezucker - This came up at gluecon, would be really useful for volume rendering.

@jackparmer
Copy link
Contributor Author

This issue has been tagged with NEEDS SPON$OR

A community PR for this feature would certainly be welcome, but our experience is deeper features like this are difficult to complete without the Plotly maintainers leading the effort.

Sponsorship range: $10k-$15k

What Sponsorship includes:

  • Completion of this feature to the Sponsor's satisfaction, in a manner coherent with the rest of the Plotly.js library and API
  • Tests for this feature
  • Long-term support (continued support of this feature in the latest version of Plotly.js)
  • Documentation at plotly.com/javascript
  • Possibility of integrating this feature with Plotly Graphing Libraries (Python, R, F#, Julia, MATLAB, etc)
  • Possibility of integrating this feature with Dash
  • Feature announcement on community.plotly.com with shout out to Sponsor (or can remain anonymous)
  • Gratification of advancing the world's most downloaded, interactive scientific graphing libraries (>50M downloads across supported languages)

Please include the link to this issue when contacting us to discuss.

@astroboylrx
Copy link

> cc @catherinezucker - This came up at gluecon, would be really useful for volume rendering.

I saw that plotly.graph_objects.Volume generates HTML that stores value, x, y and z as 1D ASCII arrays.
Would it be possible to store them as typed arrays in base64 format instead?
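A sketch of the base64 idea for a column such as `value`: pack the numbers as little-endian float32, then base64-encode the bytes so they can sit inside HTML or JSON. The helper names are hypothetical, and float32 is an assumption (float64 would double the size but keep full precision).

```python
import base64
import struct


def to_base64_float32(values):
    """Pack floats as little-endian float32 and base64-encode them as ASCII text."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def from_base64_float32(text):
    """Decode base64 text back into a list of (float32-rounded) Python floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


values = [i * 0.1 for i in range(1000)]  # hypothetical 'value' column of a Volume trace
print(len(to_base64_float32(values)))    # 5336: 4000 bytes -> 4 * ceil(4000 / 3) characters
```

Note that float32 rounds values such as 0.1, which is fine for rendering but is not an exact round trip of float64 data.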

@gvwilson gvwilson self-assigned this Jun 10, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson changed the title Idea: Allow plotly.js to accept numpy buffers allow plotly.js to accept numpy buffers Aug 8, 2024
@gvwilson gvwilson added feature something new P3 not pressing labels Aug 8, 2024

9 participants