Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use msgpack to encode & decode data for channel transfer #204

Merged
merged 2 commits into from May 3, 2023
Merged

Conversation

georgestagg
Copy link
Member

@georgestagg georgestagg commented May 2, 2023

The @msgpack/msgpack npm library is a JS implementation of the serialisation format described at https://msgpack.org. It is designed to be a fast and efficient binary serialisation scheme, and IMO is a better fit for the types of messages (i.e. containing possibly large raw data) that we send over the webR channels.

Using msgpack, rather than encoding messages with JSON, leads to a significant performance improvement when sending large messages. See, for example, the benchmark given in #203. In that test there is a greater than 10x speedup in transferring data from the main thread to the worker thread with this change.


Additionally, a minor change is made to toWebRData fixing an issue where ArrayBuffer is treated like an object rather than an array, generating a very large names array containing stringified indexes (e.g. ['1', '2', ..., '10485760']). In my own quick benchmark, this fix takes the time for an await webR.RRaw(object) for a 10MB object down from around 10 seconds to 500 ms.

@jeroen
Copy link
Contributor

jeroen commented May 2, 2023

Perhaps orthogonal to this, but maybe there could be a special case for array buffers, when no encoding is needed, and you can send the buffer as-is?

We have the same issue in the V8 package: when copying data between R to JS, it gets encoded as json, except for arraybuffers which are directly copied into R's "raw vectors" and vice versa:

So this provides a fast low level mechanism to share large binary data between R and JavaScript. It seems the same could apply for writeFile which already requires a Uint8Array data to begin width?

@georgestagg
Copy link
Member Author

At the moment with webR the writeFile message requires other information in addition to the raw file data,

{
  type: 'writeFile';
  data: {
    path: string;
    data: ArrayBufferView;
    flags?: string;
  };
}

Creating a new R raw vector with new RRaw(data) leads to a similar message structure where the type property encodes a particular operation.

In the context of the webR communication channel, I don't think it's a good idea for us to assume that a raw ArrayBuffer sent without any extra information means any one particular operation (even something as obvious as "turn this into an R raw atomic vector") since we also use the channel for several other purposes. Though, saying that, I am willing to revisit the idea in the future if binary data transfer continues to cause performance problems.

Another option would be to think about how to send standard messages and then follow up with (or have a side channel for) extra raw ArrayBuffer data. That would probably feel similar to using the transfer argument with PostMessage. Again, though, such a setup would require keeping track of some extra state so that the threads know what is going on, making things tricky to get right.

I think this PR is a good start. I assume that since msgpack advertises efficiency, it copies the Uint8Array data into the binary output with good performance. Certainly, it will be a vast improvement over encoding each byte into JSON and then parsing at the other end. Also, right now at least one buffer copy will always be required as part of the messaging scheme that we are using in place of PostMessage in any case.

@georgestagg georgestagg requested a review from lionel- May 2, 2023 09:46
Improves performance when working with `ArrayBuffer`s by avoiding
the incorrect creation of a very large array of names containing
stringified indexes.
jeroen added a commit to r-universe-org/cranlike-server that referenced this pull request May 2, 2023
@jeroen
Copy link
Contributor

jeroen commented May 2, 2023

OK thanks. I'm going to 🚀 this into production and see if I run into any issues.

@jeroen
Copy link
Contributor

jeroen commented May 2, 2023

So far things look good. The data export tool is much faster and it also seems to have resolved a mysterious bug where data would sometimes get corrupted in writeFile.

Copy link
Member

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, and it looks like msgpack handles complex types such as typed arrays out of the box, and is extensible if needed. This is a great improvement.

@georgestagg georgestagg merged commit 407cfba into main May 3, 2023
1 check passed
@georgestagg georgestagg deleted the msgpack branch May 3, 2023 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants