
Allow server to send Erlang Term Format #616

Closed
josevalim opened this issue Feb 6, 2020 · 11 comments

Comments

@josevalim
Member

This will improve serialization time on the server, especially for the format sent over the wire by LiveView (a lot of binaries).

@zookzook

zookzook commented Feb 27, 2020

I took this idea and implemented a small demo. I checked various BERT JavaScript implementations and used them to build a new one, because the browser only needs to encode the JSON part of the push message as an ArrayBuffer. After that I just configured the client and the server to use the BERT serializer.

I did some benchmarks as well. I used the rainbow demo app for this and started the Erlang observer. You can find the code and the results here: demo

I didn't make a PR because I only needed to configure the serializer for the client and server. It seems that the code fits better into the Phoenix framework than into the LiveView framework, because you can use the binary WebSocket for other channels and use cases.

@josevalim
Member Author

Awesome work @zookzook! ❤️ I am glad to see it does reduce scheduler utilization! However, I am a bit curious that the serialized output is larger than JSON's. Do you have any insights into why this is the case?

Regarding the proof of concept, I was thinking only the server would send BERT but not receive it. There are some complications on receiving BERT, such as the user being able to pass in anonymous functions and others, which can open up the server for remote code execution.

And yes, you are correct: this would actually be a feature in Phoenix itself. We also don't need a full-sized BERT implementation: we don't have to support all data types on the client and, as I mentioned, the client will still send JSON.

What do you think?

@OvermindDL1

OvermindDL1 commented Feb 27, 2020

From what I recall, BERT is faster and more capable than JSON. However, JSON tends to be surprisingly smaller than a lot of other formats, simply because the delimiters for its 'types' are so tiny (a single character in most cases), and encoding the data as strings can often be smaller than more precise binary encodings. You don't beat it until you start doing bit-swizzling or use specially made compact formats like ASN.1.
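That overhead is easy to see concretely. A minimal sketch (assuming Node.js; the tag values 131 and 109 come from the Erlang external term format spec, where 109 is BINARY_EXT) comparing a term-format-encoded string against its JSON encoding:

```javascript
// Encode a string as an ETF binary: version byte, BINARY_EXT tag,
// 4-byte big-endian length, then the raw bytes.
function encodeEtfBinary(str) {
  const bytes = new TextEncoder().encode(str);
  const out = new Uint8Array(6 + bytes.length);
  out[0] = 131; // ETF version byte
  out[1] = 109; // BINARY_EXT tag
  new DataView(out.buffer).setUint32(2, bytes.length); // length prefix
  out.set(bytes, 6);
  return out;
}

const payload = '<div class="row">hello</div>';
const etfSize = encodeEtfBinary(payload).length; // fixed 6-byte prefix + raw bytes
const jsonSize = JSON.stringify(payload).length; // 2 quotes + escaped inner quotes
console.log({ etfSize, jsonSize }); // → { etfSize: 34, jsonSize: 32 }
```

For a short, mostly escape-free string like this, JSON's two quote characters plus a couple of backslashes still beat ETF's six-byte prefix, which matches the observation above.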

As an aside: since there is an ASN.1 compiler in Erlang, which takes an ASN.1 definition and generates Erlang from it, it would be very nice to have it as a default Elixir compiler. Remaking it in Elixir would be painful, as it's a rather complicated and large spec, but it makes wonderfully fantastic tiny encodings if used well. ^.^

@josevalim
Member Author

@OvermindDL1 my biggest concern with JSON, though, especially in regard to LiveView, is the processing needed to handle escape characters such as newlines and Unicode.

@OvermindDL1

Those do indeed balloon a little, but are they really that common?

@josevalim
Member Author

For LiveView, since we are sending templates+HTML over the wire: yes. :) But my concern here is not size, rather the cost of processing all strings, while in BERT we just send them as-is.
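A quick illustration of that escaping cost (a sketch, assuming Node.js; the HTML snippet is made up):

```javascript
// JSON must scan every string and escape newlines and quotes,
// while ETF could ship the raw bytes behind a fixed-size length prefix.
const template = '<pre>\n  line one\n  "quoted"\n</pre>';
const rawBytes = new TextEncoder().encode(template).length; // raw UTF-8 size
const jsonBytes = JSON.stringify(template).length;          // after escaping
console.log({ rawBytes, jsonBytes }); // → { rawBytes: 34, jsonBytes: 41 }
```

The size growth is modest, but note that producing `jsonBytes` required inspecting every character of the template; that per-byte scan on large templates is the server-side processing cost being discussed, independent of the payload size.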

@zookzook

> Awesome work @zookzook! ❤️ I am glad to see it does reduce scheduler utilization! However, I am a bit curious that the serialized output is larger than JSON's. Do you have any insights into why this is the case?

As OvermindDL1 already mentioned, JSON is very compact, while in BERT a type tag, the length, and the content are serialized, so the serialized version becomes a little bit longer. But it depends on the content. If you send a lot of binary data (images or sounds), the output size gets smaller, because there is no need for base64 encoding. In my first try I used BJSON, and in that case the payload became bigger, too. So I tried BERT and got the same result for the size.

> Regarding the proof of concept, I was thinking only the server would send BERT but not receive it. There are some complications on receiving BERT, such as the user being able to pass in anonymous functions and others, which can open up the server for remote code execution.

You are right. The first version of the BERT serializer only encoded the push event from the server to the client. But it is very easy to provide a "half" BERT serializer alongside the "full" serializer for people who don't want to take risks. The idea of using BERT encoding to push events from the client to the server is to send binary content like images, so the base64 encoding required by JSON can be omitted.
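The base64 saving mentioned here is easy to quantify (a sketch, assuming Node.js; the 3 kB payload is made up):

```javascript
// base64 expands binary payloads by a factor of 4/3:
// every 3 input bytes become 4 output characters.
const image = Buffer.alloc(3000); // stand-in for a small 3 kB image
const b64 = image.toString("base64");
console.log(image.length, b64.length); // → 3000 4000
```

Sending the bytes directly over a binary WebSocket frame avoids that one-third overhead (plus the encode/decode work on both ends).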

> What do you think?

I will invest some time to examine all use cases. But what I can say now is that serialization using BERT on the server is the main reason for more output and lower scheduler utilisation.

@josevalim
Member Author

> The idea of using BERT encoding to push events from the client to the server is to send binary content like images, so the base64 encoding required by JSON can be omitted.

We were working on a multipart-like upload as part of #104 and it could be used to send large binaries. If we are going to allow BERT from the client, then we would need to always validate it recursively on the server and purge anything that is not a list, binary, map or number.
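On the server that purge would presumably be an Elixir pattern match over the decoded term; the same allow-list idea can be sketched in JavaScript (a hypothetical sketch, not LiveView code; the function name is made up):

```javascript
// Recursively validate a decoded term, keeping only lists (arrays),
// binaries (strings / Uint8Array), maps (plain objects) and numbers.
// Anything else (e.g. a decoded anonymous function) is rejected.
function sanitize(term) {
  if (typeof term === "number" || typeof term === "string") return term;
  if (term instanceof Uint8Array) return term;
  if (Array.isArray(term)) return term.map(sanitize);
  if (term !== null && typeof term === "object" &&
      Object.getPrototypeOf(term) === Object.prototype) {
    const out = {};
    for (const [key, value] of Object.entries(term)) {
      out[key] = sanitize(value);
    }
    return out;
  }
  throw new Error("disallowed term in payload");
}
```

The important property is that validation is recursive and allow-list based: a hostile term nested deep inside an otherwise valid map still gets rejected, rather than relying on a deny-list of known-bad types.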

> I will invest some time to examine all use cases. But what I can say now is that serialization using BERT on the server is the main reason for more output and lower scheduler utilisation.

Wait, output in your example app is not the output size but rather the output rate? If so, I misread it as size and it may be that BERT is more compact after all. :D

@zookzook

> Wait, output in your example app is not the output size but rather the output rate? If so, I misread it as size and it may be that BERT is more compact after all. :D

I checked the serialized data sizes (byte_size(term)/byte_size(json)): TERM is about 1.25 times larger than JSON in the rainbow demo. And the output rate is about 1.25 times higher with TERM.

For another example, where I use a form with three input types, a textarea, and a submit button, the factor byte_size(term)/byte_size(json) dropped below 1 (about 0.94). The more HTML is delivered, the smaller the factor, because JSON needs to escape some characters like ".

The size depends on the content.

@snewcomer
Contributor

This was a phenomenal implementation! Thank you for this. I learned a lot. I'm seeing similar results with BERT: slightly lower (not significant) utilization and ~5 kB less payload (a 3-4% shaving) down the wire in an infinite-scrolling table I'm experimenting with.

@chrismccord
Member

We added binary serialization for uploads, but Erlang Term Format did not appear to show significant gains over JSON in our experiments, and the CPU gains were marginal, so I don't think it's currently worth the added client bundle size and complexity.


5 participants