feat: stabilizing encoding #219

Merged: zxch3n merged 65 commits into main from feat-encode-stable on Jan 2, 2024

@zxch3n zxch3n commented Dec 14, 2023

This PR implements a new encoding schema that is more extensible and more compact. It is also simpler, produces a smaller binary, and takes less effort to maintain. It is inspired by the Automerge Encoding Format.

The main motivation is extensibility. When we integrate a new CRDT algorithm, we don’t want to make a breaking change to the encoding or keep multiple versions of the encoding schema in the code, since that would make our WASM binary much larger. We need a stable and extensible encoding schema for v1.0.

This PR also exposes the ops that compose the current container state. For example, you can now quickly query which operation a given character came from. The new snapshot encoding requires this capability, so it is included in this PR.

Encoding Schema

Header

The header has 22 bytes.

  • (Bytes 0-3) Magic Bytes: The encoding starts with loro as magic bytes.
  • (Bytes 4-19) Checksum: MD5 checksum of the encoded data, covering everything from byte 20 onward (so the encoding method field of the header is included). The checksum is encoded as a 16-byte array. The magic bytes and checksum fields themselves are excluded when calculating the checksum.
  • (Bytes 20-21) Encoding Method (2 bytes, big endian): Multiple encoding methods are available for a specific encoding version.
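As a rough illustration of the layout above, here is a minimal sketch of splitting an encoded blob into its header fields. The names `Header`, `HeaderError`, and `parse_header` are hypothetical and not taken from the Loro codebase, and the sketch deliberately stops short of verifying the MD5 checksum (the standard library has no MD5). It only returns the region the checksum is assumed to cover.

```rust
/// Total header length: 4 (magic) + 16 (MD5 checksum) + 2 (encoding method).
const HEADER_LEN: usize = 22;
/// Magic bytes at the start of every encoded blob.
const MAGIC: &[u8; 4] = b"loro";

/// Hypothetical parsed header; not the type used in the Loro codebase.
#[derive(Debug, PartialEq)]
struct Header {
    checksum: [u8; 16],
    encoding_method: u16,
}

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
}

/// Splits `bytes` into the parsed header and the region assumed to be covered
/// by the checksum: everything from byte 20 onward, i.e. the encoding method
/// field followed by the body.
fn parse_header(bytes: &[u8]) -> Result<(Header, &[u8]), HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if !bytes.starts_with(MAGIC) {
        return Err(HeaderError::BadMagic);
    }
    let mut checksum = [0u8; 16];
    checksum.copy_from_slice(&bytes[4..20]);
    // The encoding method is stored as 2 bytes, big endian.
    let encoding_method = u16::from_be_bytes([bytes[20], bytes[21]]);
    Ok((Header { checksum, encoding_method }, &bytes[20..]))
}
```

A caller would then hash the returned slice with an MD5 implementation of its choice and compare the digest against `checksum` before decoding the body.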

Encode Mode: Updates

In this mode, only the op history is encoded; document state is excluded.

Like Automerge's format, we employ columnar encoding for operations and changes.

Previously, operations were ordered by their Operation ID (OpId) before columnar encoding. However, sorting operations by container first improves compression.
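To illustrate why the sort order matters, here is a small sketch that run-length encodes the container column under both orderings. The `Op` struct is a hypothetical simplification (the op ID is reduced to a single counter), and plain run-length encoding stands in for whatever column encodings the real implementation uses.

```rust
/// Hypothetical, simplified op; real ops carry more fields.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Op {
    container: u32, // index of the container the op targets
    counter: u32,   // stand-in for the op ID ordering
}

/// Run-length encodes a column as (value, run_length) pairs.
fn rle(column: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &v in column {
        if let Some((last, len)) = runs.last_mut() {
            if *last == v {
                *len += 1;
                continue;
            }
        }
        runs.push((v, 1));
    }
    runs
}

/// Container column runs when ops are ordered by op ID only.
fn container_runs_sorted_by_id(ops: &[Op]) -> Vec<(u32, usize)> {
    let mut sorted = ops.to_vec();
    sorted.sort_by_key(|op| op.counter);
    let column: Vec<u32> = sorted.iter().map(|op| op.container).collect();
    rle(&column)
}

/// Container column runs when ops are ordered by container first, then op ID.
fn container_runs_sorted_by_container(ops: &[Op]) -> Vec<(u32, usize)> {
    let mut sorted = ops.to_vec();
    sorted.sort_by_key(|op| (op.container, op.counter));
    let column: Vec<u32> = sorted.iter().map(|op| op.container).collect();
    rle(&column)
}
```

When edits alternate between two containers, ordering by ID yields one run per op in the container column, while ordering by container first collapses the column into one run per container.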

Encode Mode: Snapshot

This mode captures both document state and history. When a snapshot is imported into a new document, the document is initialized directly from the snapshot, with no CRDT-based recalculation.

Unlike previous snapshot encoding methods, the current binary output in snapshot mode is compatible with the updates mode. This enhances the efficiency of importing snapshots into non-empty documents, where initialization via snapshot is infeasible.

Additionally, when feasible, we leverage the sequence of operations to construct state snapshots. In CRDTs, deducing the specific ops constituting the current container state is feasible. These ops are tagged in relation to the container, facilitating direct state reconstruction from them. This approach, pioneered by Automerge, significantly improves compression efficiency.
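The idea can be sketched with a last-writer-wins map, where the current state consists of exactly one winning op per key. Everything here is an illustrative assumption: `SetOp`, `state_op_indices`, and `rebuild_state` are hypothetical names, and the actual tagging scheme in this PR may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical "set key" op in a last-writer-wins map container.
#[derive(Clone, Debug, PartialEq)]
struct SetOp {
    lamport: u32,
    peer: u64,
    key: String,
    value: i64,
}

/// Returns the indices of the ops whose values are visible in the current
/// state: for each key, the op with the greatest (lamport, peer) pair.
fn state_op_indices(ops: &[SetOp]) -> Vec<usize> {
    let mut winners: BTreeMap<&str, usize> = BTreeMap::new();
    for (i, op) in ops.iter().enumerate() {
        let key = op.key.as_str();
        let replace = match winners.get(key) {
            Some(&j) => (op.lamport, op.peer) > (ops[j].lamport, ops[j].peer),
            None => true,
        };
        if replace {
            winners.insert(key, i);
        }
    }
    let mut indices: Vec<usize> = winners.into_values().collect();
    indices.sort_unstable();
    indices
}

/// Rebuilds the map state directly from the tagged ops, without replaying
/// the whole history.
fn rebuild_state(ops: &[SetOp], tagged: &[usize]) -> BTreeMap<String, i64> {
    tagged
        .iter()
        .map(|&i| (ops[i].key.clone(), ops[i].value))
        .collect()
}
```

Because the snapshot already contains the op history, storing only the tags avoids encoding the state values a second time, which is where the compression benefit comes from in this sketch.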

Performance Changes

Speed

Performance change compared to the main branch: https://app.warp.dev/block/PgfIexIuOyIgLvw8rUFd01

WASM binary size change

Name   WASM Size   Compressed WASM Size
Old    1.1MB       434KB
New    1.0MB       392KB

Benchmarks on Drawing Scenario

These benchmarks compare a simulated drawing scenario on this branch against the current main branch.

Note: Built-in compression of updates has been removed in the new version, so exported updates are larger. Users can now apply their preferred compression algorithm in application code. GZip roughly halves the size of Loro's exported snapshots or updates.

New

Commit id: c5a9de9

task action_size peer_num ops_num changes_num snapshot_size updates_size apply_duration encode_snapshot_duration encode_update_duration decode_snapshot_duration decode_update_duration
async draw 100 1 331 1 4331 3249 1.7253749999999999 0.444375 0.136125 0.270375 0.098417
async draw 1000 1 3183 1 41244 30924 8.089292 3.608 0.8943329999999999 1.706042 0.677125
async draw 10000 1 30425 1 412193 304779 51.759791 21.762458 5.855875 10.804625 4.471084
async draw 100 5 331 6 4306 3268 2.621417 0.211708 0.06962499999999999 0.125666 0.049458
async draw 1000 5 3183 40 41700 31300 24.57975 2.176625 0.7471669999999999 1.120708 0.450083
async draw 10000 5 30425 423 399620 298549 259.627834 25.277917000000002 7.533333 11.203167 4.434083
async draw 1000 10 3183 74 42253 31876 62.091167000000006 2.730542 1.007916 1.249708 0.485583
async draw 10000 10 30425 802 409375 307095 706.9671669999999 30.747792 12.96475 12.208833 4.6555
async draw 100000 10 300974 8020 4391001 3277488 7845.397167 412.351958 163.365458 197.313416 71.150375
async draw 100000 10 301649 8025 4393930 3279154 7741.7285 389.887416 164.647583 175.813834 58.29075
realtime draw 100 5 183 19 2611 2009 3.116041 0.158208 0.063375 0.092542 0.03625
realtime draw 1000 5 2224 225 30679 23151 36.664875 2.621333 0.7188329999999999 0.9147919999999999 0.350334
realtime draw 10000 5 19958 2078 279096 207387 340.255042 18.365584000000002 6.899958000000001 8.970832999999999 3.4971249999999996
realtime draw 1000 10 2224 247 30811 23345 88.070708 2.722125 0.7929999999999999 0.916875 0.355667
realtime draw 10000 10 19958 2267 280460 209220 906.6364169999999 28.438458999999998 9.982083 9.192499999999999 3.402541
realtime draw 100000 10 201536 22807 3020382 2250934 9471.699084000002 265.325334 97.806834 112.942625 42.071833000000005
realtime draw 100000 10 200147 22689 2996669 2231459 9421.214875 255.862458 109.221375 111.309667 39.806416999999996

Old

Commit id: 727b5c2

task action_size peer_num ops_num changes_num snapshot_size updates_size apply_duration encode_snapshot_duration encode_update_duration decode_snapshot_duration decode_update_duration
async draw 100 1 331 1 6309 3196 2.3123750000000003 0.152416 0.068583 0.163917 0.043292
async draw 1000 1 3183 1 65346 30412 3.6163749999999997 0.857167 0.563375 1.0254999999999999 0.259208
async draw 10000 1 30425 1 689366 127846 33.938833 9.000291 27.895834 12.676708999999999 3.2680830000000003
async draw 100 5 331 6 5841 3195 2.337041 0.10370900000000001 0.059500000000000004 0.125 0.035042000000000004
async draw 1000 5 3183 40 64579 30964 23.193790999999997 0.886 0.666792 1.04525 0.287791
async draw 10000 5 30425 423 648862 134323 270.331667 10.477291 30.871875 14.26975 4.268083
async draw 1000 10 3183 74 64215 32056 73.371667 1.158 1.283958 1.5499159999999998 0.39008400000000004
async draw 10000 10 30425 802 656302 139013 768.7101250000001 8.794791 34.85925 13.085458 3.876375
async draw 100000 10 300974 8020 7473115 1375519 7457.90225 169.94775 390.182417 186.63025 43.447292
async draw 100000 10 301649 8025 7463445 1374789 7288.698917 144.877208 409.18995800000005 182.909583 54.961083
realtime draw 100 5 183 19 3246 1897 1.795458 0.070958 0.06179199999999999 0.08354199999999999 0.027834
realtime draw 1000 5 2224 225 46273 22680 21.180457999999998 0.574958 0.6417499999999999 0.82675 0.24195799999999998
realtime draw 10000 5 19958 2078 428831 203476 190.994834 7.591584 6.350334 8.236833 2.2846670000000002
realtime draw 1000 10 2224 247 45682 22777 47.032917 0.5868340000000001 0.743958 0.966208 1.161958
realtime draw 10000 10 19958 2267 429854 205141 459.522916 7.437708000000001 9.953875 8.139208 2.444708
realtime draw 100000 10 201536 22807 5020772 1001571 5092.416 98.40154199999999 260.887208 130.19129199999998 39.607583000000005
realtime draw 100000 10 200147 22689 4981825 998571 5026.834375 92.946041 256.480167 109.06908299999999 34.260958

@zxch3n zxch3n marked this pull request as draft December 14, 2023 15:02

@Leeeon233 Leeeon233 left a comment

What’s the change in document size?

@zxch3n zxch3n marked this pull request as ready for review December 28, 2023 10:16
@zxch3n zxch3n changed the title from "WIP: feat: stabilizing encoding" to "feat: stabilizing encoding" Dec 28, 2023
crates/loro-internal/src/encoding.rs (resolved)
crates/loro-internal/src/diff_calc/tree.rs (outdated, resolved)
crates/loro-internal/src/encoding.rs (outdated, resolved)
firedbg/version.toml (outdated, resolved)
crates/loro-internal/src/diff_calc.rs (resolved)
@zxch3n zxch3n merged commit bc27a47 into main Jan 2, 2024
1 check passed
@zxch3n zxch3n deleted the feat-encode-stable branch January 2, 2024 09:03