size of serialized DOM #151

Open
eoghanmurray opened this issue Nov 25, 2019 · 6 comments

@eoghanmurray eoghanmurray commented Nov 25, 2019

I'm seeing the serialization of the initial DOM state (EventType.FullSnapshot) come out at roughly 10x the character count of a plain HTML representation of the same thing.
Is minimizing the size of this on the agenda as a design goal?

I'm thinking that it could be reduced as follows:

  • simple things like renaming attributes to attrs
  • not storing empty childNodes/attributes lists/objects (making them implicit)
  • removing type: 2 (type: NodeType.Element) and similar, as that can be inferred from presence of childNodes
  • only setting the isSVG/isStyle boolean attributes when they are unusual (i.e. true)

Are there any strong reasons not to do any of the above?
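
A minimal sketch of what these changes could look like, assuming a simplified node shape (the `ElementNode`, `TextNode`, and `Packed*` types below are hypothetical and cover only elements and text nodes, not the full set of node types). One caveat it makes visible: once empty childNodes are omitted, an element can no longer be recognised by the presence of childNodes, so this sketch infers it from `tagName` instead.

```typescript
// Hypothetical, simplified node shapes for illustration only.
type Attrs = Record<string, string>;

interface ElementNode {
  type: 2; // NodeType.Element
  id: number;
  tagName: string;
  attributes: Attrs;
  childNodes: SNode[];
  isSVG?: boolean;
}

interface TextNode {
  type: 3; // assumed value for NodeType.Text
  id: number;
  textContent: string;
  isStyle?: boolean;
}

type SNode = ElementNode | TextNode;

interface PackedElement {
  id: number;
  tagName: string;
  attrs?: Attrs; // omitted when empty
  childNodes?: PackedNode[]; // omitted when empty
  isSVG?: true; // only stored when true
}

interface PackedText {
  id: number;
  textContent: string;
  isStyle?: true; // only stored when true
}

type PackedNode = PackedElement | PackedText;

function pack(node: SNode): PackedNode {
  if (node.type === 3) {
    const packedText: PackedText = { id: node.id, textContent: node.textContent };
    if (node.isStyle) packedText.isStyle = true;
    return packedText;
  }
  const packed: PackedElement = { id: node.id, tagName: node.tagName };
  if (Object.keys(node.attributes).length > 0) packed.attrs = node.attributes;
  if (node.childNodes.length > 0) packed.childNodes = node.childNodes.map(pack);
  if (node.isSVG) packed.isSVG = true;
  return packed;
}

function unpack(packed: PackedNode): SNode {
  // `type` is not stored; an element is recognised by its tagName.
  if ("tagName" in packed) {
    const element: ElementNode = {
      type: 2,
      id: packed.id,
      tagName: packed.tagName,
      attributes: packed.attrs ?? {},
      childNodes: (packed.childNodes ?? []).map(unpack),
    };
    if (packed.isSVG) element.isSVG = true;
    return element;
  }
  const text: TextNode = { type: 3, id: packed.id, textContent: packed.textContent };
  if (packed.isStyle) text.isStyle = true;
  return text;
}

// Usage: compare the JSON length before and after packing.
const example: SNode = {
  type: 2,
  id: 1,
  tagName: "div",
  attributes: {},
  childNodes: [{ type: 3, id: 2, textContent: "hello" }],
};
console.log(JSON.stringify(example).length, JSON.stringify(pack(example)).length);
```

The savings here come only from shorter and omitted keys, so how much this helps will depend on how much of a snapshot is structure rather than text or inlined CSS.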

Contributor

@IMFIL IMFIL commented Nov 26, 2019

Reducing the size of the serialized DOM would be great. Did you want to create a PR?

@bingjie3216 bingjie3216 commented Nov 26, 2019

+1 on this proposal. Another thing we could do:
Provide an option to use relative paths for the styles and CSS instead of downloading the whole content. I feel like the CSS/styles take up too much space.

Contributor

@IMFIL IMFIL commented Nov 27, 2019

@bingjie3216 There's an option to keep absolute CSS paths within the HTML. Even with this option on, the serialized DOM is still sizeable.

Member

@Yuyz0112 Yuyz0112 commented Nov 27, 2019

@eoghanmurray @IMFIL @bingjie3216 Thanks for the feedback!

I believe there is huge potential to reduce the size of the recorded events (I consistently see a 90% size reduction when I gzip the events).
But keeping the data structure explicit is also very important.

So the work on reducing size could consist of the following parts:

  1. Build a sizer tool that shows the distribution of event sizes. This helps to:
    1.1 Find the size bottleneck in any specific situation.
    1.2 Check how the record options affect the size.
    1.3 Check how the pack strategies affect the size.
  2. Provide some pack/unpack strategies, which should be pluggable because they may introduce overhead.
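
As a rough illustration of the sizer idea in item 1, the sketch below groups events by their `type` and sums the JSON length of each group. The `RecordedEvent` shape and the `sizeByType` name are assumptions made for this example, not an existing rrweb API.

```typescript
// Hypothetical, simplified event shape for illustration only.
interface RecordedEvent {
  type: number;
  timestamp: number;
  data: unknown;
}

interface SizeEntry {
  count: number; // number of events of this type
  chars: number; // total JSON.stringify length of those events
}

// Group events by type and sum their serialized length.
function sizeByType(events: RecordedEvent[]): Record<number, SizeEntry> {
  const report: Record<number, SizeEntry> = {};
  for (const event of events) {
    const entry = report[event.type] || { count: 0, chars: 0 };
    entry.count += 1;
    entry.chars += JSON.stringify(event).length;
    report[event.type] = entry;
  }
  return report;
}

// Usage with made-up events; the type values are arbitrary here.
const demoEvents: RecordedEvent[] = [
  { type: 2, timestamp: 1, data: { node: { tagName: "html" } } },
  { type: 3, timestamp: 2, data: { source: 0 } },
  { type: 3, timestamp: 3, data: { source: 0 } },
];
console.log(sizeByType(demoEvents));
```

Running the same report before and after changing a record option or a pack strategy would cover items 1.2 and 1.3.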

Some pack/unpack strategies I know of:

  • The approach @eoghanmurray suggested, or something similar, which is hand-made and rrweb-specific.
  • MessagePack. It's like JSON, but fast and small.
  • pako. A high-speed zlib port to JavaScript.
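
To make "pluggable" concrete, here is one possible shape for a packer interface, together with a toy hand-made packer and a helper that measures its effect. Everything here (`Packer`, `tuplePacker`, `measure`, the simplified `RecordedEvent`) is hypothetical and only meant as a sketch. Packed output is modelled as a string for simplicity, whereas a binary strategy such as MessagePack or pako would more naturally produce bytes.

```typescript
// Hypothetical, simplified event shape for illustration only.
interface RecordedEvent {
  type: number;
  timestamp: number;
  data: unknown;
}

// A pluggable strategy: anything that can pack events and restore them.
interface Packer {
  name: string;
  pack(events: RecordedEvent[]): string;
  unpack(packed: string): RecordedEvent[];
}

// Toy hand-made packer: store each event as a [type, timestamp, data] tuple
// so the repeated key names are not serialized for every event.
const tuplePacker: Packer = {
  name: "tuple",
  pack: (events) =>
    JSON.stringify(events.map((e) => [e.type, e.timestamp, e.data])),
  unpack: (packed) =>
    (JSON.parse(packed) as [number, number, unknown][]).map(
      ([type, timestamp, data]) => ({ type, timestamp, data })
    ),
};

// Compare the packed length with the plain JSON length.
function measure(packer: Packer, events: RecordedEvent[]) {
  return {
    packedSize: packer.pack(events).length,
    size: JSON.stringify(events).length,
  };
}

// Usage with made-up events.
const packerDemo: RecordedEvent[] = [
  { type: 3, timestamp: 1, data: { x: 1 } },
  { type: 3, timestamp: 2, data: { x: 2 } },
];
console.log(tuplePacker.name, measure(tuplePacker, packerDemo));
```

Keeping every strategy behind the same two functions means the overhead of each one can be measured separately and a user can opt out entirely.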

Member

@Yuyz0112 Yuyz0112 commented Jan 5, 2020

Sorry for the late reply.
After finishing a lot of work last month, I've finally got time to start working on rrweb again!

I think this issue is the most important one at the current stage, and I would like to provide a solution in the next major release.

Based on the ideas I outlined above, I have written some proof-of-concept code in this repo.

Currently, I have implemented an analysis framework and several packers:

  1. simple packer. Following @eoghanmurray's comments, this packer shortens the keys and omits keys that can be inferred from the data structure.
  2. msgpack packer. Uses msgpack-javascript to encode and decode events.
  3. pako packer. Uses pako to deflate and inflate events.

The msgpack packer is not working as intended yet, and I'm still checking my implementation. The other two show good results.

I'm benchmarking the packers on two real-world event logs:

  • e1: An event log with a big full snapshot.
  • e2: An event log with many incremental snapshots created by a table-like UI, which means the recorded DOM fragments are similar to one another.

===

packer  log  packedSize  size
simple  e1   1870789     2115468
simple  e2   6023940     10457884
pako    e1   1093306     2115468
pako    e2   1435585     10457884
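
For a quicker read of the numbers above, the relative reduction of each run can be computed as `1 - packedSize / size`. The snippet below is only a convenience sketch over the figures already listed; the `Result` type and `reductionPercent` helper are names made up for this example.

```typescript
interface Result {
  packer: string;
  log: string;
  packedSize: number;
  size: number;
}

// Figures copied from the benchmark results above.
const results: Result[] = [
  { packer: "simple", log: "e1", packedSize: 1870789, size: 2115468 },
  { packer: "simple", log: "e2", packedSize: 6023940, size: 10457884 },
  { packer: "pako", log: "e1", packedSize: 1093306, size: 2115468 },
  { packer: "pako", log: "e2", packedSize: 1435585, size: 10457884 },
];

// Percentage by which packing shrinks the log.
function reductionPercent(r: Result): number {
  return (1 - r.packedSize / r.size) * 100;
}

for (const r of results) {
  console.log(`${r.packer} ${r.log}: ${reductionPercent(r).toFixed(1)}% smaller`);
}
// simple e1: 11.6% smaller
// simple e2: 42.4% smaller
// pako e1: 48.3% smaller
// pako e2: 86.3% smaller
```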

@MillionQW MillionQW commented Jan 6, 2020

Great! Thanks to the author for this contribution. Event size has been bothering us. Can we use the demo in production?
