MMR Storage Optimization #2873
There is a relatively minor change that we can make to our internal MMR implementation that would reduce the storage requirements significantly.
Currently we maintain a data file and a hash file.
In the trivial case of 2 leaves and a single parent we would store elements 1 and 2 in the data file and the hashes for positions 1, 2 and 3 in the hash file.
We could skip the leaves in the hash file, trading off reduced storage requirements for the added cost of needing to regenerate the hash of any leaf position.
The interesting thing is that, given an MMR of height n, removing the leaves produces an MMR of height n-1.
So in the example above, the MMR has height 2. We would not store the hashes for the leaves at positions 1 and 2 and just store the hash for the parent (also a peak) at position 3.
We would need to go back to the data file and hash an element if we required the hash for either leaf at position 1 or 2. But this is just a single additional hash operation.
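To make the leaf-skipping idea concrete, here is a minimal sketch of a store that keeps the data file plus a hash file containing only parent positions. It is not the existing implementation: `LeaflessMMR`, `hash_leaf`, `hash_parent` and the helper functions are hypothetical names, and hashing the position together with the payload using Blake2b is an assumption made for illustration. Positions are 1-based as in the example above, and `height` here counts from 0 at the leaves.

```python
import hashlib


def height(pos):
    """Height of a 1-based MMR position; leaves have height 0."""
    while pos & (pos + 1):  # loop until pos has the form 2^k - 1
        pos -= (1 << (pos.bit_length() - 1)) - 1  # same spot in the left sibling tree
    return pos.bit_length() - 1


def leaves_up_to(pos):
    """Number of leaf positions among positions 1..pos."""
    count = 0
    while pos > 0:
        tree = (1 << pos.bit_length()) - 1  # candidate perfect-tree size
        if tree > pos:
            tree >>= 1
        count += (tree + 1) // 2  # a perfect tree of 2^h - 1 nodes has 2^(h-1) leaves
        pos -= tree
    return count


def hash_leaf(pos, elem):
    # Assumption: the position is hashed together with the element bytes.
    return hashlib.blake2b(pos.to_bytes(8, "big") + elem, digest_size=32).digest()


def hash_parent(pos, left, right):
    return hashlib.blake2b(pos.to_bytes(8, "big") + left + right, digest_size=32).digest()


class LeaflessMMR:
    """Hypothetical store: leaf hashes are never written, only parent hashes."""

    def __init__(self):
        self.data = []           # stands in for the data file (one entry per leaf)
        self.parent_hashes = []  # stands in for the hash file (parents only)
        self.size = 0            # total number of MMR positions

    def get_hash(self, pos):
        if not 1 <= pos <= self.size:
            raise IndexError(pos)
        n = leaves_up_to(pos)
        if height(pos) == 0:
            # Leaf: regenerate the hash from the data file (one extra hash operation).
            return hash_leaf(pos, self.data[n - 1])
        # Parent: its index in the reduced hash file skips all earlier leaves.
        return self.parent_hashes[pos - n - 1]

    def push(self, elem):
        self.data.append(elem)
        self.size += 1
        current = hash_leaf(self.size, elem)
        h = 0
        # Keep adding parents while the next position sits above the current one.
        while height(self.size + 1) > h:
            self.size += 1
            h += 1
            left = self.get_hash(self.size - (1 << h))  # left child of the new parent
            current = hash_parent(self.size, left, current)
            self.parent_hashes.append(current)
```

With two elements pushed, `size` is 3 and `parent_hashes` holds a single entry (position 3), matching the trivial case described above.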
We gain significant space savings by doing this (at least 50% of positions are leaf positions).
Today the kernel MMR hash file contains approx 1,200,000 hashes.
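A quick way to sanity-check the "at least 50%" figure: an MMR with n leaves has 2n - popcount(n) positions, because each set bit of n corresponds to one perfect subtree. The sketch below uses that identity; the function names are made up for illustration, and the 32-byte hash size and the leaf count of 600,000 (chosen to land near the approx 1,200,000 hashes mentioned above) are assumptions.

```python
def popcount(n):
    """Number of set bits in n."""
    return bin(n).count("1")


def mmr_size(n_leaves):
    """Total positions (leaves + parents) in an MMR with n_leaves leaves."""
    return 2 * n_leaves - popcount(n_leaves)


def parent_count(n_leaves):
    """Positions that would remain in the hash file if leaves are skipped."""
    return n_leaves - popcount(n_leaves)


def leaf_fraction(n_leaves):
    """Share of MMR positions that are leaves."""
    return n_leaves / mmr_size(n_leaves)


HASH_BYTES = 32  # assumption: 32-byte hashes

if __name__ == "__main__":
    n = 600_000  # assumed leaf count, giving roughly 1.2M positions
    print("positions:", mmr_size(n))
    print("hashes kept without leaves:", parent_count(n))
    print("bytes saved:", n * HASH_BYTES)
    print("leaf fraction:", leaf_fraction(n))
```

Since popcount(n) is at least 1 for any non-empty MMR, the leaf fraction is always strictly above one half.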
We can also take this further if we consider what we do with our MMRs.
There are basically 3 things we need to be able to do -
For the kernel MMR we do not need to provide the hash file to a peer (see #2743). The peer can rebuild the full hash file given only the underlying data file. This is because we do not prune the kernel MMR. For the output and rangeproof MMR we prune but we know which positions have been pruned, so we can work around this. Let us ignore (3) for now.
We can easily support (1) with only the hashes of the peaks.
To support (2) we need some subset of the rightmost hashes, beneath the rightmost peaks in the MMR. Removing rightmost leaf positions from the MMR "rewinds" it and results in previous positions becoming peaks of the MMR.
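As an illustration of which positions "become peaks" on rewind, the sketch below enumerates the peak positions of every MMR reachable by removing up to `max_rewind` rightmost leaves. The function names and the `max_rewind` parameter are hypothetical and only serve the example; positions are 1-based.

```python
def mmr_size(n_leaves):
    """Total positions in an MMR with n_leaves leaves (2n - popcount(n))."""
    return 2 * n_leaves - bin(n_leaves).count("1")


def peaks(size):
    """1-based peak positions for a valid MMR size, left to right."""
    result, offset = [], 0
    while size > 0:
        tree = (1 << size.bit_length()) - 1  # candidate perfect-tree size
        if tree > size:
            tree >>= 1
        offset += tree
        result.append(offset)
        size -= tree
    return result


def rewind_peak_positions(n_leaves, max_rewind):
    """Positions that are a peak in some MMR obtained by removing
    between 0 and max_rewind rightmost leaves."""
    needed = set()
    for m in range(max(0, n_leaves - max_rewind), n_leaves + 1):
        needed.update(peaks(mmr_size(m)))
    return sorted(needed)
```

For 4 leaves (7 positions, single peak at 7), allowing a rewind of one leaf gives `[3, 4, 7]`: positions 3 and 4 sit beneath the old peak and become peaks after the rewind. Position 4 is a leaf, so its hash could be regenerated from the data file rather than stored.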
For the kernel MMR (non-prunable) we could store the following -
We need the set of peaks to determine the root hash.
Hashing the root would be slightly more complex and slightly more expensive but we would save significant local storage as we would need to store a relatively small subset of the total MMR hashes.
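For reference, a rough sketch of computing a root from the peaks alone. The peak positions follow directly from the MMR size. The way the peaks are combined here (folded right to left, with the MMR size mixed into each hash, using Blake2b) is an assumption for illustration and may not match the actual root calculation; `peaks`, `hash_pair` and `bag_peaks` are hypothetical names.

```python
import hashlib


def peaks(size):
    """1-based peak positions of an MMR with `size` positions, left to right.

    Raises ValueError if `size` is not a valid MMR size.
    """
    result, offset, prev = [], 0, None
    while size > 0:
        tree = (1 << size.bit_length()) - 1  # candidate perfect-tree size
        if tree > size:
            tree >>= 1
        if tree == prev:
            # Two adjacent subtrees of equal size would already have a parent.
            raise ValueError("not a valid MMR size")
        offset += tree
        result.append(offset)
        size -= tree
        prev = tree
    return result


def hash_pair(size, left, right):
    # Assumption: the MMR size is hashed together with the two child hashes.
    return hashlib.blake2b(size.to_bytes(8, "big") + left + right, digest_size=32).digest()


def bag_peaks(peak_hashes, size):
    """Fold the peak hashes right to left into a single root hash."""
    root = None
    for h in reversed(peak_hashes):
        root = h if root is None else hash_pair(size, h, root)
    return root
```

An MMR with 11 leaves has 19 positions and peaks at 15, 18 and 19, so only three hashes are needed to produce the root regardless of how many positions sit beneath them.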
The output and rangeproof MMRs are more complex as we can prune/compact data from the MMR (both data and hash files).
At this stage I would prefer the faster calculation over this storage saving.