-
Notifications
You must be signed in to change notification settings - Fork 268
Output DnaWasm as gzip-compressed, base-64 encoded list of Strings #1958
base: develop
Are you sure you want to change the base?
Conversation
o Use DnaWasm type for all DNA WASM encoding, including in `hc package`
Ok(base64::encode(&wasm_buf)) | ||
Ok(DnaWasm::from_bytes(wasm_buf)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were actually not even using DnaWasm's Serialize methods to output WASM; we were just serializing to base-64, and it happened to be compatible...
@@ -21,6 +21,7 @@ serde_derive = "=1.0.89" | |||
serde_json = { version = "=1.0.39", features = ["preserve_order"] } | |||
lazy_static = "=1.2.0" | |||
multihash = "=0.8.0" | |||
flate2 = "=1.0.12" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're already using flate2 in core_types, so size increment should be small.
@pjkundert this sounds like a cool idea, would you be interested in looking into it claims ~3-4x speedup on encode/decode logic by using the AVX2 algo https://docs.rs/radix64/0.6.2/radix64/index.html i'm bringing it up now because it seems topical and also might produce incompatible hashes moving forward if the encoding character set is different |
Hey, I would highly suggest we introduce a binary DNA format instead. Discussed several times already but never got to it because it wasn't a priority. But 99% of the bytes of a DNA are WASM binary code and don't make sense to be viewed in an editor anyway. Making it even bigger with base64 just to be able to stick to JSON, and then having to deal with the problems of that, feels very weird. That is, if this working already, awesome and thanks for proactively taking this on. Just wanted to make visible that there might be another solution we might want to approach long-term anyway. |
@lucksus what makes sense to you for the rest of the JSON data? |
I really like the single .dna.json. Sure, a big blog of binary data in there is an eye-sore, but its a single, user-readable, easy to manage package. Separate binary files will present a more difficult DNA package consistency problem. We seldom need to peer inside the DNA package, so even keeping the one long String is fine (as long as we compress it to keep the size down). We've long had base-64 compressed binary data (see: usenet), and they're not bad. If we need to get that 5/4 expansion back (which is unlikely), it easily compresses away. Mostly, the change from big String to list of Strings is to support the rare occasion when we do need to take a quick look at some of the other properties in the Objects in the JSON, that's all. |
I've left the |
@pjkundert @lucksus i feel like this discussion got lost a bit recently, is there something to merge or chase up here or should we close this? |
I think that the question of what This maintains backward compatibility to load existing DNAs, while compressing and new DNAs in a list of strings, instead of one multi-megabyte long string, which is vastly more useful for any human interaction with the DNA file... I think, most importantly, it restores the usage of the DnaWasm's serialization to control the output of the |
I think its a step improvement in DNA WASM <-> JSON handling, and won't interfere too much with any future improvements we might want to do. |
@pjkundert i'm happy with all that, makes sense to me, this isn't preventing us from doing anything @lucksus is suggesting in some future work so why not merge? :) |
08be816
to
610d5c0
Compare
fd9f885
to
7a969b8
Compare
It would be nice, I think, to get this DNA WASM compression into the product before we get too far into open alpha. The DNA space savings are pretty dramatic, and the produced .dna.json file is quite a bit more user-friendly. |
PR summary
Output .dna.json files are unwieldy and large, because WASM is output as a single base-64 encoded String. It is difficult to view .dna.json files in pagination utilities (eg.
less
), due to the extreme length of the WASM String. Produces, instead, something like this (with a list of 128-symbol JSON strings of base-64, instead of a 6 Megabyte single base-64 string):Also, since WASM contains much redundancy, Gzip-encoding yields ~75% reduction in sizes.
Serializes to a list of Strings, instead of just a big String; small Zomes (< 1KB, eg. for tests) still round-trip to a single String.
Historical single String (eg.
{ "code": "AbC...XyZ==" }
) WASM will load fine, but will not produce an equivalenthc hash
. Sincehc hash
was broken until recently, this should be fine; all futureDnaWasm
s serialized will produce consistenthc package
andhc hash
hashes.testing/benchmarking notes
( if any manual testing or benchmarking was/should be done, add notes and/or screenshots here )
followups
( any new tickets/concerns that were discovered or created during this work but aren't in scope for review here )
changelog
documentation