
Allow server to send Erlang Term Format #616

Closed
josevalim opened this issue Feb 6, 2020 · 11 comments

Comments

@josevalim
Member

This will improve serialization time on the server, especially for the format sent over the wire by LiveView (a lot of binaries).

@zookzook

zookzook commented Feb 27, 2020

I took this idea and implemented a small demo. I checked various BERT JavaScript implementations and used them to build a new one, because the browser only needs to encode the JSON part of the push message as an ArrayBuffer. After that I just configured the client and the server to use the BERT serializer.

I did some benchmarks as well. I used the rainbow demo app for this and started the Erlang observer. You can find the code and the results here: demo

I didn't make a PR because I only needed to configure the serializer for the client and server. It seems that the code fits better into the Phoenix framework than into the LiveView framework, because you can use the binary WebSocket for other channels and use cases.

@josevalim
Member Author

Awesome work @zookzook! ❤️ I am glad to see it does reduce scheduler utilization! However, I am a bit curious that the serialized output is larger than JSON's. Do you have any insights into why this is the case?

Regarding the proof of concept, I was thinking only the server would send BERT but not receive it. There are some complications on receiving BERT, such as the user being able to pass in anonymous functions and others, which can open up the server for remote code execution.

And yes, you are correct: this would actually be a feature in Phoenix itself. We also don't need a full-sized BERT implementation: we don't have to support all data types on the client and, as I mentioned, the client will still send JSON.

What do you think?

@OvermindDL1

OvermindDL1 commented Feb 27, 2020

From what I recall, BERT is faster and more capable than JSON. However, JSON tends to be surprisingly smaller than a lot of other formats, simply because the delimiters for its 'types' are so tiny (a single character in most cases), and encoding the data as strings can often be smaller than more precise binary encodings. You don't beat it until you start doing bit-swizzling or use specially made compact formats like ASN.1.
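That overhead is easy to see concretely. A minimal sketch (assuming Node.js; the tag values 131 and 109 come from the Erlang external term format spec, where 109 is BINARY_EXT) comparing a term-format-encoded string against its JSON encoding:

```javascript
// Encode a string as an ETF binary: version byte, BINARY_EXT tag,
// 4-byte big-endian length, then the raw bytes.
function encodeEtfBinary(str) {
  const bytes = new TextEncoder().encode(str);
  const out = new Uint8Array(6 + bytes.length);
  out[0] = 131; // ETF version byte
  out[1] = 109; // BINARY_EXT tag
  new DataView(out.buffer).setUint32(2, bytes.length); // length prefix
  out.set(bytes, 6);
  return out;
}

const payload = '<div class="row">hello</div>';
const etfSize = encodeEtfBinary(payload).length; // fixed 6-byte prefix + raw bytes
const jsonSize = JSON.stringify(payload).length; // 2 quotes + escaped inner quotes
console.log({ etfSize, jsonSize }); // → { etfSize: 34, jsonSize: 32 }
```

For a short, mostly escape-free string like this, JSON's two quote characters plus a couple of backslashes still beat ETF's six-byte prefix, which matches the observation above.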

As an aside: since there is an ASN.1 compiler in Erlang, which takes an ASN.1 definition and generates Erlang from it, it would be very nice to have it as a default Elixir compiler. Remaking it in Elixir would be painful, as it's a rather complicated and large spec, but it makes wonderfully fantastic tiny encodings if used well. ^.^

@josevalim
Member Author

@OvermindDL1 my biggest concern with JSON, though, especially in regard to LiveView, is the processing needed to handle escape characters such as newlines and Unicode.

@OvermindDL1

Those do indeed balloon a little, but are they really that common?

@josevalim
Member Author

For LiveView, since we are sending templates+HTML over the wire: yes. :) But my concern here is not size, rather the cost of processing all strings, while in BERT we just send them as-is.
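A quick illustration of that escaping cost (a sketch, assuming Node.js; the HTML snippet is made up):

```javascript
// JSON must scan every string and escape newlines and quotes,
// while ETF could ship the raw bytes behind a fixed-size length prefix.
const template = '<pre>\n  line one\n  "quoted"\n</pre>';
const rawBytes = new TextEncoder().encode(template).length; // raw UTF-8 size
const jsonBytes = JSON.stringify(template).length;          // after escaping
console.log({ rawBytes, jsonBytes }); // → { rawBytes: 34, jsonBytes: 41 }
```

The size growth is modest, but note that producing `jsonBytes` required inspecting every character of the template; that per-byte scan on large templates is the server-side processing cost being discussed, independent of the payload size.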

@zookzook

> Awesome work @zookzook! ❤️ I am glad to see it does reduce scheduler utilization! However, I am a bit curious that the serialized output is larger than JSON's. Do you have any insights into why this is the case?

As OvermindDL1 already mentioned, JSON is very compact, while in BERT a type tag, the length, and the content are serialized, so the serialized version becomes a little bit longer. But it depends on the content. If you send a lot of binary data (images or sounds), the output size gets smaller, because there is no need for base64 encoding. In my first try I used BJSON, and in that case the payload became bigger, too. So I tried BERT and got the same result for the size.

> Regarding the proof of concept, I was thinking only the server would send BERT but not receive it. There are some complications on receiving BERT, such as the user being able to pass in anonymous functions and others, which can open up the server for remote code execution.

You are right. The first version of the BERT serializer only encoded the push event from the server to the client. But it is very easy to provide a "half" BERT serializer alongside the "full" serializer for people who don't want to take risks. The idea of using BERT encoding to push events from the client to the server is to send binary content like images, so the base64 encoding required by JSON can be omitted.
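The base64 saving mentioned here is easy to quantify (a sketch, assuming Node.js; the 3 kB payload is made up):

```javascript
// base64 expands binary payloads by a factor of 4/3:
// every 3 input bytes become 4 output characters.
const image = Buffer.alloc(3000); // stand-in for a small 3 kB image
const b64 = image.toString("base64");
console.log(image.length, b64.length); // → 3000 4000
```

Sending the bytes directly over a binary WebSocket frame avoids that one-third overhead (plus the encode/decode work on both ends).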

> What do you think?

I will invest some time to examine all use cases. But what I can say now is that serialization using BERT on the server is the main reason for more output and lower scheduler utilisation.

@josevalim
Member Author

> The idea of using BERT encoding to push events from the client to the server is to send binary content like images, so the base64 encoding required by JSON can be omitted.

We were working on a multipart-like upload as part of #104 and it could be used to send large binaries. If we are going to allow BERT from the client, then we would need to always validate it recursively on the server and purge anything that is not a list, binary, map or number.
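On the server that purge would presumably be an Elixir pattern match over the decoded term; the same allow-list idea can be sketched in JavaScript (a hypothetical sketch, not LiveView code; the function name is made up):

```javascript
// Recursively validate a decoded term, keeping only lists (arrays),
// binaries (strings / Uint8Array), maps (plain objects) and numbers.
// Anything else (e.g. a decoded anonymous function) is rejected.
function sanitize(term) {
  if (typeof term === "number" || typeof term === "string") return term;
  if (term instanceof Uint8Array) return term;
  if (Array.isArray(term)) return term.map(sanitize);
  if (term !== null && typeof term === "object" &&
      Object.getPrototypeOf(term) === Object.prototype) {
    const out = {};
    for (const [key, value] of Object.entries(term)) {
      out[key] = sanitize(value);
    }
    return out;
  }
  throw new Error("disallowed term in payload");
}
```

The important property is that validation is recursive and allow-list based: a hostile term nested deep inside an otherwise valid map still gets rejected, rather than relying on a deny-list of known-bad types.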

> I will invest some time to examine all use cases. But what I can say now is that serialization using BERT on the server is the main reason for more output and lower scheduler utilisation.

Wait, output in your example app is not the output size but rather the output rate? If so, I misread it as size and it may be that BERT is more compact after all. :D

@zookzook

> Wait, output in your example app is not the output size but rather the output rate? If so, I misread it as size and it may be that BERT is more compact after all. :D

I checked the serialized data sizes (byte_size(term)/byte_size(json)): TERM is about 1.25 times larger than JSON in the rainbow demo. And the output rate is about 1.25 times higher with TERM.

For another example, where I use a form with three input types, a textarea, and a submit button, the factor byte_size(term)/byte_size(json) dropped below 1 (about 0.94). The more HTML is delivered, the smaller the factor, because JSON needs to escape some characters like ".

The size depends on the content.

@snewcomer
Contributor

This was a phenomenal implementation! Thank you for this. I learned a lot. I'm seeing similar results with BERT: slightly lower (not significant) utilization and ~5 kB less payload (a 3-4% shaving) down the wire in an infinite-scrolling table I'm experimenting with.

@chrismccord
Member

We added binary serialization for uploads, but Erlang Term Format did not appear to show significant gains over JSON in our experiments, and the CPU gains were marginal, so I don't think it's currently worth the added client bundle size and complexity.


5 participants