Use msgpack to encode & decode data for channel transfer #204
Perhaps orthogonal to this, but maybe there could be a special case for array buffers, where no encoding is needed and you can send the buffer as-is? We have the same issue in the V8 package: when copying data between R and JS, it gets encoded as JSON, except for ArrayBuffers, which are copied directly into R's "raw vectors" and vice versa:
So this provides a fast low level mechanism to share large binary data between R and JavaScript. It seems the same could apply for
At the moment with webR the
Creating a new R raw vector with In the context of the webR communication channel, I don't think it's a good idea for us to assume that a raw Another option would be to think about how to send standard messages and then follow up with (or have a side channel for) extra raw I think this PR is a good start. I assume that since msgpack advertises efficiency, it copies the
Improves performance when working with `ArrayBuffer`s by avoiding the incorrect creation of a very large array of names containing stringified indexes.
OK thanks. I'm going to 🚀 this into production and see if I run into any issues.
So far things look good. The data export tool is much faster and it also seems to have resolved a mysterious bug where data would sometimes get corrupted in
Nice, and it looks like msgpack handles complex types such as typed arrays out of the box, and is extensible if needed. This is a great improvement.
The @msgpack/msgpack npm library is a JS implementation of the serialisation format described at https://msgpack.org. It is designed to be a fast and efficient binary serialisation scheme, and IMO is a better fit for the types of messages (i.e. containing possibly large raw data) that we send over the webR channels.
Using msgpack, rather than encoding messages with JSON, leads to a significant performance improvement when sending large messages. See, for example, the benchmark given in #203. In that test there is a greater than 10x speedup in transferring data from the main thread to the worker thread with this change.
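To give a rough feel for where the size difference comes from, here is a minimal sketch comparing `JSON.stringify` on a `Uint8Array` with a hand-rolled encoder for the MessagePack `bin` family (bin 8/16/32, as described in the format specification). The `encodeBin` helper is hypothetical and written only for illustration; it is not how `@msgpack/msgpack` or webR is implemented, and it says nothing about the timings in #203.

```typescript
// Hypothetical helper: encode bytes as a MessagePack "bin" value.
// Layout per the spec: a type byte (0xc4, 0xc5 or 0xc6), a big-endian
// length of 1, 2 or 4 bytes, then the payload bytes unchanged.
// Assumes the payload is shorter than 2^32 bytes.
function encodeBin(bytes: Uint8Array): Uint8Array {
  const n = bytes.length;
  let header: number[];
  if (n < 0x100) {
    header = [0xc4, n];
  } else if (n < 0x10000) {
    header = [0xc5, n >>> 8, n & 0xff];
  } else {
    header = [0xc6, (n >>> 24) & 0xff, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff];
  }
  const out = new Uint8Array(header.length + n);
  out.set(header, 0);
  out.set(bytes, header.length);
  return out;
}

const payload = new Uint8Array(100000).fill(255);

// JSON has no binary type: a typed array is serialised as an object
// keyed by stringified indexes, e.g. {"0":255,"1":255,...}.
const jsonSize = JSON.stringify(payload).length; // output is ASCII, so chars == bytes
const binSize = encodeBin(payload).length;       // 5 header bytes + payload

console.log({ jsonSize, binSize });
```

The binary encoding adds only a fixed-size header, whereas the JSON form grows with both the number of elements and the number of digits in each index and value.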
Additionally, a minor change is made to `toWebRData`, fixing an issue where `ArrayBuffer` is treated like an object rather than an array, generating a very large `names` array containing stringified indexes (e.g. `['1', '2', ..., '10485760']`). In my own quick benchmark, this fix takes the time for an `await webR.RRaw(object)` for a 10MB object down from around 10 seconds to 500 ms.
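As an illustration of how this class of problem can arise, the sketch below shows a generic object-to-list conversion producing one stringified name per byte when handed binary data, and a variant that checks for binary data first. The `SketchData` type and both functions are hypothetical and are not webR's actual `toWebRData` code; the example uses a `Uint8Array` view because that is the simplest way to reproduce the effect in plain TypeScript.

```typescript
// Hypothetical intermediate representation, for illustration only.
type SketchData =
  | { kind: "raw"; bytes: Uint8Array }
  | { kind: "list"; names: string[]; values: unknown[] };

// Naive conversion: every object becomes a named list.
// For a typed array, Object.keys returns one string per element.
function naiveConvert(obj: object): SketchData {
  const names = Object.keys(obj);
  const record = obj as Record<string, unknown>;
  return { kind: "list", names, values: names.map((k) => record[k]) };
}

// Variant that recognises binary data before falling back to the
// generic object path, so no names array is built for it.
function fixedConvert(obj: object): SketchData {
  if (obj instanceof ArrayBuffer) {
    return { kind: "raw", bytes: new Uint8Array(obj) };
  }
  if (ArrayBuffer.isView(obj)) {
    return { kind: "raw", bytes: new Uint8Array(obj.buffer, obj.byteOffset, obj.byteLength) };
  }
  return naiveConvert(obj);
}

const view = new Uint8Array(1024);
const naive = naiveConvert(view);
const fixed = fixedConvert(view);

console.log(naive.kind === "list" ? naive.names.length : 0); // one name per byte
console.log(fixed.kind);
```

For a 10MB buffer the naive path would allocate roughly ten million strings before any data is transferred, which is consistent with the kind of slowdown described above.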