Sparse Merkle Trees (SMTs) designs #1472
SSZ for general-purpose merkleization is still lacking a few types, of which sparse merkle trees are probably the most different from what is already there.
It has only come up a few times in the specs:
However I do think it is necessary to have if SSZ is to be adopted more widely than the current spec state machines.
There is not one way to implement this, we have many options:
Personally I dislike breaking changes here, and feel like those solutions are too specialized, hard to optimize or simply too incompatible with other parts of the spec.
Full hashing is also something I would like to avoid: 256 hashes per access is computationally unacceptable, even with proof-encoding compression (something proof backings should be optimizing, not sparse merkle trees).
Another possible variant of the compact sparse merkle trees, to make them work somewhat with generalized indices, and to define two missing sparse data structures:
We define a
Now, to access a set or map value with a stable generalized index, we need the generalized index to know where to stop and look for a
A generalized index could be like:
And then add an annotation like "at bit index
Such annotation could be just an index, maybe 2 bytes with the first bit reserved to describe the map case (2**15 = 32768 max merkle depth).
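A minimal sketch of such a 2-byte annotation, under the layout described above (the exact endianness and bit assignment are my assumptions, nothing is spec'd here; function names are hypothetical):

```python
from typing import Tuple

def encode_annotation(bit_index: int, is_map: bool) -> bytes:
    # Top bit flags the map case, the remaining 15 bits hold the bit
    # index, so up to 2**15 = 32768 levels of merkle depth.
    assert 0 <= bit_index < 2**15
    value = (int(is_map) << 15) | bit_index
    return value.to_bytes(2, 'little')  # little-endian is an assumption

def decode_annotation(data: bytes) -> Tuple[int, bool]:
    value = int.from_bytes(data, 'little')
    return value & 0x7FFF, bool(value >> 15)
```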
And one modification could be made to consider the hash with zeroes (
As I am implementing a merkle-tree backing and typed views/partials in Go, I'm starting to be a fan of single-node mix-ins. I.e. a mix-in like the list length is not opt-out: it is always part of a proof about the data. You are forced in the right direction, able to check it in every case, and deal with fewer edge cases.
Similarly, the zeroes as a stopper for the sparse tree work well: they are always part of the proof data.
Edited this out of the main-post. It is a complementary idea to the compact trees that has some efficiency problems (2x proof size of individual leaf), but offers some fun new possibilities. Think of it as a thought-experiment to complete the picture of the different compact merkle trees.
The idea of compact sparse merkle trees has been re-hashed (no pun intended).
Compact tree affinity
One thing that is particularly striking is that prefixes just fit well with binary trees and our generalized indices:
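As a quick illustration of that affinity (not from the original thread): the binary expansion of a generalized index, with its leading 1 dropped, is exactly the root-to-node path, so an n-bit key prefix selects one specific subtree.

```python
def gindex_path(gindex: int) -> list:
    # Strip the anchor 1 bit; each remaining bit is one navigation
    # step from the root: 0 = left child, 1 = right child.
    bits = bin(gindex)[3:]  # bin() gives '0b1...': drop '0b' and the leading 1
    return [int(b) for b in bits]
```

E.g. generalized index 9 (binary `1001`) is reached by going left, left, right from the root.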
Vector commitment mixin
Now take a look at SSZ lists:
Now this new idea is to do the same for sparse trees: separate the vector commitment, and call it the "sparse tree mixin".
And this vector commitment can be a compact sparse tree of keys, optimized exactly the same way.
And since the vector commitment may not change as much (well, in some cases it does, in others never at all), an extra hash for an update there wouldn't be as bad.
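A sketch of the analogy (the `sparse_root` shape is my reading of the proposal, not a spec'd construction; the list case is the standard SSZ length mix-in):

```python
import hashlib

def hash_node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

# SSZ list roots already mix the length into the data root:
def list_root(data_root: bytes, length: int) -> bytes:
    return hash_node(data_root, length.to_bytes(32, 'little'))

# The "sparse tree mixin" would do the same with a 32-byte key
# vector commitment in place of a length (name hypothetical):
def sparse_root(contents_root: bytes, keys_root: bytes) -> bytes:
    return hash_node(contents_root, keys_root)
```

Updating only the contents side then leaves the 32-byte vector commitment untouched, and vice versa.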
Also note that the contents part would back the
And now that the vector commitment can be summarized into 32 bytes, and the contents commitment part is compatible with regular SSZ trees, we can have really interesting type-less but safe tree interactions:
Another special property is that moving a value by just a little (a slight key alteration) may only change the keys commitment part.
Compactness vs Stable depth
The remaining problem here is that compactness on a sparse structure cannot be achieved without breaking the stable depth. A generalized index to an actual sparse tree leaf cannot be derived without knowing about the contents of the tree. On the bright side, the separation of the vector commitment makes this easy to learn and communicate.
The compactness gets us
size: If the only use case is to prove both the key and value at the same time, and the tree changes a lot, it would be ~2x more data (separate key and value proofs) compared to regular compact trees that harden the leaf nodes with a position. The total amount of nodes may be comparable still, but the 2x is especially noticeable when proving
Out of all the options I think I like leaf-level mixins the most (so to store
From this list of pairs, you can translate the generalized index into a "modified generalized index" (basically, the bit positions corresponding to levels that were elided in the tree get snipped out of the generalized index), and generate a list of mixin checks ("check that the value at generalized index x is y").
```python
def transform_generalized_index(gindex: Bitlist, positions: Sequence[Tuple[int, int]]) -> Bitlist:
    o = []
    my_pos = 0
    for pos, length in positions:
        assert pos >= my_pos
        # my_pos:pos is everything in between the dicts, pos:pos+length is the branch within the dict itself
        o.extend(gindex[my_pos:pos + length])
        # Keys have length 32, plus one bit for the mixin
        my_pos = pos + 33
        # Mixin is the left child of the leaf node, value is the right child
        o.append(1)
    return o
```
The function could also be augmented to return a list of generalized indices at the leaf of each dict along the path, along with the key; this would be fed into a Merkle branch or multiproof checker so it can verify that the key provided in the mixin equals the expected key.
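A sketch of what those mixin checks could look like once a branch/multiproof checker has resolved node values (the types and names here are hypothetical, just to make the idea concrete):

```python
from typing import Dict, NamedTuple, Sequence

class MixinCheck(NamedTuple):
    gindex: int      # generalized index of the key mixin node
    expected: bytes  # the 32-byte key we expect to find there

def verify_mixin_checks(nodes: Dict[int, bytes], checks: Sequence[MixinCheck]) -> bool:
    # `nodes` maps generalized indices to the 32-byte node values
    # recovered by the merkle branch / multiproof checker.
    return all(nodes.get(c.gindex) == c.expected for c in checks)
```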
Instead of transforming the generalized indices, and then still having to pass the information to check the keys, we could also mark the "anchor points" with generalized indices:
@Tolsi Here is my view, primarily focused on how compact trees compare. I probably missed a few things, but will do a proper full read when I have more time.
If single-node branches are already reduced to just the node itself, and keys are random and uniform, I do not think rebalancing is helpful. It is more like another way to compress a set of leaves into a binary-looking tree, just with more edge cases for an optimized merkleization implementation to deal with. A compact tree of
There do not have to be internal keys indeed; if you already key the leaves, the internal nodes just have to reach them and make it possible to verify that the leaf corresponds to some key. And IMHO it is much better if it is intuitive and simple: walk the path of the key until you find the key, rather than determine your path based on the entire contents of the tree. All of this may end up in an every-op-counts smart contract. And although the verifier does not need to know where the leaf was found, being able to quickly locate and read/write a leaf in a proof is important.
Another thing with SSZ is to keep optimized tree descriptions separate from merkleization: new use-cases and algorithms will keep emerging. Merkleization should just be fast, minimal and straightforward. The lookup tricks and tree description can be provided alongside the proof. A "backing" is a representation that implements the merkle-proof tree-traversal needs, and optimizes reads/writes/etc. as it likes. Tree rotation is too much (SSZ trees are static and predictable, and we hope SMTs can be nearly statically described), but skip lists etc. can be part of the "backing". E.g. you can optimize lookups without moving nodes perfectly fine. See tree-offsets described here: https://github.com/protolambda/eth-merkle-trees
Yes, deterministic structures and values are better. A SSZ SMT should hash to the same root regardless of tree constructor or insertion order.
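As a toy sketch of that determinism property (this is an illustrative compact tree with leaf-level key hardening and zero-hash stoppers, not the spec'd construction): the root is a pure function of the key/value set, so any insertion order yields the same commitment.

```python
import hashlib
from typing import List, Tuple

def h(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def key_bits(k: bytes) -> List[int]:
    # MSB-first bit expansion of a key
    return [(k[i // 8] >> (7 - i % 8)) & 1 for i in range(len(k) * 8)]

def compact_root(pairs: List[Tuple[bytes, bytes]], depth: int = 0) -> bytes:
    # Empty subtrees collapse to the zero stopper; a single pair is
    # committed as a leaf hardened with its key (leaf-level mixin), so
    # branches stop as soon as a key prefix is unique.
    if not pairs:
        return b'\x00' * 32
    if len(pairs) == 1:
        k, v = pairs[0]
        return h(k, v)
    left = [(k, v) for k, v in pairs if key_bits(k)[depth] == 0]
    right = [(k, v) for k, v in pairs if key_bits(k)[depth] == 1]
    return h(compact_root(left, depth + 1), compact_root(right, depth + 1))
```

Since the root is derived from the set as a whole, two constructors inserting in different orders necessarily agree.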
Regarding determinism in compact trees: I do think choosing keys is not so much of a problem: keys are already hashes of other data (with the exception of basic keys, see below). And because of the compacting property (a branch stops at a single node, no extra depth), the only real attack surface is to insert a large number of keys, or lots of keys with the same prefix (which is exponentially hard the longer the prefix is, so only effectively adds a few hashes).
We should run some numbers on compact trees, but I don't expect this worst case to occur very often with uniform keys.
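A rough way to run those numbers (my sketch, assuming uniform hash-derived keys): in a compact tree the deepest leaf sits one level below the longest key prefix shared by any pair, which after sorting is always between lexicographic neighbours.

```python
import hashlib
from typing import List

def max_compact_depth(keys: List[bytes]) -> int:
    bits = [''.join(f'{b:08b}' for b in k) for k in keys]
    bits.sort()
    longest = 0
    for a, b in zip(bits, bits[1:]):
        # longest shared prefix between sorted neighbours
        common = 0
        for x, y in zip(a, b):
            if x != y:
                break
            common += 1
        longest = max(longest, common)
    return longest + 1

# Quick check with n uniform (hash-derived) keys:
n = 1024
keys = [hashlib.sha256(i.to_bytes(8, 'little')).digest() for i in range(n)]
depth = max_compact_depth(keys)
# With uniform keys the deepest branch lands around 2*log2(n) levels,
# far below the full 256 of a non-compact sparse tree.
```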
As with any multi-merkle-proof? Providing just the leaf nodes, and witness data that can't be constructed from lower witness-data and leaves is relatively easy. There is some (naive but readable) pseudo code here: multiproofs. Compression of the actual description of the tree is then a responsibility of the "backing", replaceable later.
Impressed by the amount of analysis in the paper, but I prefer simplicity and closer affinity with the general binary merkle spec. The
Ah, I was thinking of the case where the SMT is fully embedded into the SSZ suite, so you could have the SMT be an element of other structures, other structures be an element of the SMT, etc, with possibly multiple SMTs in one path. Yes, if you have a gindex for just an SMT leaf from its own root then that is sufficient.
Right, for each SMT anchor (start of the tree, i.e. the SMT root) in an SSZ structure you would need to declare that it's an SMT to get the special traversal behavior. But nothing extra depending on SMT contents. And it's just the anchor that has to be described if we standardize on 32-byte SMT keys (take the hash-tree-root of the key).
You would also need to specify the length of the branches. Otherwise if you just provide a branch it's not clear what part is supposed to go down to the leaf and what part is doing something further after that leaf. One possible case to think about would be a direct 2-level SMT (ie. an SMT of SMTs); you need to declare where the first one ends and the second begins.
Well, I don't think you need an end and a start, just the anchors of each SMT: if you have the start (anchor) info of both SMTs, and you traverse the first one, you can skip 256 bits of gindex after this first anchor. Navigate the SMT till you hit the bottom (zero right hand, or all 256 gindex bits consumed), check the key (against the skipped 256 bits), and then continue reading the remaining part of the gindex to navigate deeper. Which could be into the next, deeper SMT, anchored at
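A sketch of that skip-and-check split of the gindex bits (names hypothetical, assuming standardized 32-byte keys; the actual in-SMT navigation is driven by the compact tree shape, so only the bookkeeping is shown here):

```python
from typing import List, Tuple

KEY_BITS = 256  # assuming standardized 32-byte SMT keys

def split_smt_path(path: List[int], anchor_positions: List[int]) -> Tuple[List[int], List[List[int]]]:
    # anchor_positions: bit offsets in the gindex path where an SMT
    # anchor sits; the 256 bits after each anchor are the key, which
    # are skipped for navigation and checked against the leaf's key
    # mixin instead.
    nav, keys, i = [], [], 0
    for a in sorted(anchor_positions):
        nav.extend(path[i:a])              # ordinary SSZ navigation bits
        keys.append(path[a:a + KEY_BITS])  # the skipped key bits
        i = a + KEY_BITS
    nav.extend(path[i:])                   # anything after the last anchor
    return nav, keys
```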
I was thinking (and described it in