Indexed Merkle tree #1666

Merged
mitschabaude merged 58 commits into main from feature/indexed-merkle-map
Jun 4, 2024

Conversation

mitschabaude
Collaborator

@mitschabaude mitschabaude commented May 27, 2024

closes #1655

This PR introduces IndexedMerkleMap, an all-around better version of MerkleMap. See #1655 for a detailed description, including the API that this PR implements.

In short, there are two motivations to introduce a new Merkle storage primitive:

  • The old primitives don't map cleanly to provable types and are appallingly hard to use in circuits.
  • The old Merkle map is a sparse tree of height 256, which means a large number of in-circuit hashes.

IndexedMerkleMap uses about 4-8x fewer constraints than MerkleMap when used with height 31 (which supports 1 billion entries). Here are some constraint counts for different operations:

indexed merkle map (get) 461
indexed merkle map (get option) 975
sparse merkle map (get) 4208

indexed merkle map (insert) 1696
indexed merkle map (update) 905
indexed merkle map (set) 1878
sparse merkle map (set) 8160

indexed merkle map (assert included) 460
indexed merkle map (assert not included) 507
indexed merkle map (is included) 508
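
To put the table above in perspective, here is a small sketch that turns the listed constraint counts into ratios. The numbers are copied from the table; the helper names are made up for this example, and the leaf-count formula assumes the convention that a tree of height h has 2^(h-1) leaves.

```typescript
// Constraint counts copied from the height-31 table above.
const counts = {
  indexedGet: 461,
  indexedGetOption: 975,
  indexedSet: 1878,
  sparseGet: 4208,
  sparseSet: 8160,
};

// Ratio of sparse to indexed constraints, rounded to one decimal place.
function ratio(sparse: number, indexed: number): number {
  return Math.round((sparse / indexed) * 10) / 10;
}

// Assumption: a tree of height h has 2^(h-1) leaves, so height 31
// gives 2^30 slots, i.e. roughly one billion entries.
function leafCount(height: number): number {
  return 2 ** (height - 1);
}

const getRatio = ratio(counts.sparseGet, counts.indexedGet);
const getOptionRatio = ratio(counts.sparseGet, counts.indexedGetOption);
const setRatio = ratio(counts.sparseSet, counts.indexedSet);

console.log({ getRatio, getOptionRatio, setRatio, slots: leafCount(31) });
```

The savings depend on the operation: the inclusion-only get() benefits most, while set() and getOption() land at the lower end of the range.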

EDIT: Based on feedback from @dfstio, the API was expanded to be useful for cases where only inclusion of a key (but not the value) is important; see the discussion below.

@mitschabaude
Collaborator Author

Thanks for your feedback @dfstio. Based on it, I added a new version of get() which only proves inclusion and made it the default version. The old version, which also handled non-inclusion, is called getOption() now.

Here are the numbers for height 11; they match what you wrote:

indexed merkle map (get) 176
indexed merkle map (get option) 405
sparse merkle map (get) 4208

indexed merkle map (insert) 542
indexed merkle map (update) 350
indexed merkle map (set) 758
sparse merkle map (set) 8160
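
The difference between get() and getOption() described in this comment can be modeled in plain TypeScript. This is only a sketch of the semantics, not o1js code: ToyMap and the Option shape are hypothetical, and nothing here is provable.

```typescript
// Plain-TypeScript model of the two lookup flavors. NOT o1js code:
// ToyMap and Option are hypothetical names used only for illustration.
type Option<T> = { isSome: boolean; value: T };

class ToyMap {
  private entries: { key: number; value: number }[] = [];

  insert(key: number, value: number): void {
    this.entries.push({ key, value });
  }

  // Handles both cases: returns an option instead of failing on a missing key.
  getOption(key: number): Option<number> {
    for (const entry of this.entries) {
      if (entry.key === key) return { isSome: true, value: entry.value };
    }
    return { isSome: false, value: 0 };
  }

  // Asserts inclusion: a missing key is an error. In a circuit this flavor is
  // cheaper because the non-inclusion case never has to be handled.
  get(key: number): number {
    const result = this.getOption(key);
    if (!result.isSome) throw new Error(`key ${key} is not in the map`);
    return result.value;
  }
}

const toyMap = new ToyMap();
toyMap.insert(1, 42);
console.log(toyMap.get(1), toyMap.getOption(2).isSome);
```

Callers that already know a key exists would use get(); callers that need to branch on presence would use getOption() and pay for the extra checks.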

@dfstio

dfstio commented May 29, 2024

Nonetheless, your comments gave me a good idea: Add a cheaper version of get() which asserts that the key-value pair is included (the current version gracefully handles the case when it's not included, and returns an option)

It would be a great addition. Proving inclusion and exclusion (for nullifiers) are very important operations that should probably be reflected in the IndexedMerkleMap methods.

And given that the data is public, I can implement toJSON() and fromJSON() to/from base64 myself.

@dfstio

dfstio commented May 29, 2024

The old version, which also handled non-inclusion, is called getOption() now.

Can a version that only proves exclusion (non-inclusion) take fewer than 405 constraints, similar to get(), which takes about half as many?

@mitschabaude
Collaborator Author

The old version, which also handled non-inclusion, is called getOption() now.

Can a version that only proves exclusion (non-inclusion) take fewer than 405 constraints, similar to get(), which takes about half as many?

Great point, done!

indexed merkle map (assert included) 175
indexed merkle map (assert not included) 222
indexed merkle map (is included) 223
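
For readers unfamiliar with how an indexed Merkle tree proves non-inclusion, the following is a conceptual sketch. It assumes the usual construction in which each leaf stores its key and the next-larger key, so the leaves form a sorted linked list, and a key is absent exactly when its "low leaf" brackets it. Hashing and Merkle paths are omitted, and all names (including the key-0 starting leaf and MAX_KEY) are assumptions of this model, not the PR's code.

```typescript
// Each leaf links to the next-larger key, forming a sorted linked list.
type Leaf = { key: number; value: number; nextKey: number };

// Hypothetical sentinel standing in for "larger than any key".
const MAX_KEY = Number.MAX_SAFE_INTEGER;

// The model starts with one leaf at key 0, so every key > 0 has a predecessor.
function emptyLeaves(): Leaf[] {
  return [{ key: 0, value: 0, nextKey: MAX_KEY }];
}

// The low leaf of `key` is the leaf with the largest key <= `key`.
function findLowLeaf(leaves: Leaf[], key: number): Leaf {
  let low = leaves[0];
  for (const leaf of leaves) {
    if (leaf.key <= key && leaf.key > low.key) low = leaf;
  }
  return low;
}

function isIncluded(leaves: Leaf[], key: number): boolean {
  return findLowLeaf(leaves, key).key === key;
}

// Non-inclusion: the low leaf brackets the key strictly on both sides.
function assertNotIncluded(leaves: Leaf[], key: number): void {
  const low = findLowLeaf(leaves, key);
  if (!(low.key < key && key < low.nextKey)) {
    throw new Error(`key ${key} is included`);
  }
}

// Insertion appends a leaf and re-links the low leaf to point at it.
function insertLeaf(leaves: Leaf[], key: number, value: number): void {
  const low = findLowLeaf(leaves, key);
  if (low.key === key) throw new Error(`key ${key} already present`);
  leaves.push({ key, value, nextKey: low.nextKey });
  low.nextKey = key;
}

const demoLeaves = emptyLeaves();
insertLeaf(demoLeaves, 10, 100);
insertLeaf(demoLeaves, 30, 300);
assertNotIncluded(demoLeaves, 20); // low leaf is key 10 with nextKey 30
```

In a real circuit, both checks reduce to one Merkle path for a single leaf plus a comparison or two, which is consistent with the inclusion and non-inclusion counts above being so close.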

@dfstio

dfstio commented May 29, 2024

Can assertIncluded() return a value for the key?

@mitschabaude
Collaborator Author

Can assertIncluded() return a value for the key?

The version of assertIncluded() which returns a value already exists: it's get()!

@dfstio

dfstio commented May 29, 2024

How do we serialize the witness to calculate recursive proofs using many workers?

Suppose I want to create a proof confirming that I've correctly added ten key-value pairs to the IndexedMerkleMap, and I want to split the calculation among ten workers running in parallel, each proving one key-value pair, with the proofs merged later. I would then need to generate a serializable witness to pass to each worker. Otherwise, I would have to serialize the whole map, which would take much longer.

Effectively, for this use case, _computeRoot() should be split into map.getWitness(key) and computeRoot(witness), with the witness being easily serializable.

map.getWitness() should be called in the master worker for all ten key-value pairs, and computeRoot() should be called in the provable code for each of the ten workers. Each worker should not have the map, just a witness.

Example when it is needed: Serializing Proving
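
The proposed split can be sketched on a plain binary Merkle tree. This is an illustration of the idea only: toyHash is an insecure stand-in for a circuit-friendly hash, and getWitness/computeRoot here are free functions written for this example, not the o1js API.

```typescript
// Toy stand-in for a circuit-friendly hash such as Poseidon. NOT secure.
function toyHash(left: number, right: number): number {
  return (left * 31 + right * 17 + 7) % 1000003;
}

// A witness is the list of siblings from leaf to root, plus the side each
// sibling sits on. It is plain data, so it serializes with JSON.stringify.
type Witness = { sibling: number; siblingIsLeft: boolean }[];

// Build all levels of a complete binary tree (leaf count must be a power of 2).
function buildLevels(leaves: number[]): number[][] {
  const levels: number[][] = [leaves];
  let current = leaves;
  while (current.length > 1) {
    const next: number[] = [];
    for (let i = 0; i < current.length; i += 2) {
      next.push(toyHash(current[i], current[i + 1]));
    }
    levels.push(next);
    current = next;
  }
  return levels;
}

// Runs in the coordinating process, which holds the full tree.
function getWitness(levels: number[][], index: number): Witness {
  const witness: Witness = [];
  let i = index;
  for (let level = 0; level < levels.length - 1; level++) {
    const siblingIsLeft = i % 2 === 1; // we are a right child
    const sibling = levels[level][siblingIsLeft ? i - 1 : i + 1];
    witness.push({ sibling, siblingIsLeft });
    i = Math.floor(i / 2);
  }
  return witness;
}

// Runs in a worker, which only has the leaf and the witness.
function computeRoot(leaf: number, witness: Witness): number {
  let node = leaf;
  for (const { sibling, siblingIsLeft } of witness) {
    node = siblingIsLeft ? toyHash(sibling, node) : toyHash(node, sibling);
  }
  return node;
}

const treeLevels = buildLevels([1, 2, 3, 4, 5, 6, 7, 8]);
const treeRoot = treeLevels[treeLevels.length - 1][0];
// Round-trip through JSON, as if the witness were sent to another worker.
const sentWitness: Witness = JSON.parse(JSON.stringify(getWitness(treeLevels, 5)));
console.log(computeRoot(6, sentWitness) === treeRoot);
```

The witness grows with the tree height rather than with the number of entries, which is why shipping it to a worker is so much cheaper than shipping the whole map.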

@mitschabaude
Collaborator Author

How do we serialize the witness to calculate recursive proofs using many workers?

Interesting challenge!

@mitschabaude
Collaborator Author

mitschabaude commented May 30, 2024

map.getWitness() should be called in the master worker for all ten key-value pairs, and computeRoot() should be called in the provable code for each of the ten workers. Each worker should not have the map, just a witness.

I thought about this a bit, and think it can be done in a way that is compatible with the current design of the IndexedMerkleTree data structure.

The idea is that the current implementation should work if you don't have the full tree, but just the subset of it that is touched by your updates. This is quite similar to what you propose, since in the end a collection of Merkle witnesses is also just a subset of the tree.

There are two internal data structures: nodes and sortedLeaves. Both should currently allow pruning to the values you actually need. For nodes, you'd need to store arrays of the same length, but mostly filled with empty slots, not sure how much memory that saves. In the case of sortedLeaves, only having a subset should just work.

So for parallel proving, we could:

  • Run all tree updates serially, without proving
  • Take a pruned snapshot every k updates
  • Send each pruned snapshot to a worker, which will perform a proof of k updates
  • Merge those proofs in a tree to get a single proof

The nice thing is that circuits can be written exactly as in a normal, serial implementation.

Actually this is extremely close to what Mina does with transaction proofs, where snapshots of the ledger are updated :D
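
The four steps listed above can be sketched as a toy pipeline. Everything here is a simplification made for illustration: the "state" is a single number instead of a pruned tree, applying an update is an addition, and a "proof" only records the state before and after a batch. None of these names come from the PR.

```typescript
// A "proof" in this model just records the state before and after a batch.
type BatchProof = { from: number; to: number; updates: number };

// One unit of work for a worker: the pre-state snapshot plus up to k updates.
type Job = { snapshot: number; updates: number[] };

// Steps 1 and 2: run all updates serially (no proving) and record the state
// as it was BEFORE each batch of k updates.
function planJobs(initial: number, updates: number[], k: number): Job[] {
  const jobs: Job[] = [];
  let state = initial;
  for (let i = 0; i < updates.length; i += k) {
    const batch = updates.slice(i, i + k);
    jobs.push({ snapshot: state, updates: batch });
    for (const u of batch) state += u;
  }
  return jobs;
}

// Step 3: what each worker does, given only its own job.
function proveBatch(job: Job): BatchProof {
  let state = job.snapshot;
  for (const u of job.updates) state += u;
  return { from: job.snapshot, to: state, updates: job.updates.length };
}

// Step 4: merge adjacent proofs pairwise, layer by layer, until one remains.
function mergeAll(proofs: BatchProof[]): BatchProof {
  let layer = proofs;
  while (layer.length > 1) {
    const next: BatchProof[] = [];
    for (let i = 0; i < layer.length; i += 2) {
      const left = layer[i];
      if (i + 1 >= layer.length) {
        next.push(left); // odd one out moves up unchanged
        continue;
      }
      const right = layer[i + 1];
      if (left.to !== right.from) throw new Error("proofs do not connect");
      next.push({ from: left.from, to: right.to, updates: left.updates + right.updates });
    }
    layer = next;
  }
  return layer[0];
}

const jobs = planJobs(100, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3);
const batchProofs = jobs.map(proveBatch); // each call could run in its own worker
const finalProof = mergeAll(batchProofs);
console.log(finalProof);
```

Because every job carries its own pre-state, the proveBatch calls are independent of each other, and the merge step only has to check that consecutive proofs agree on the state in between.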

@mitschabaude
Collaborator Author

Note to self / reviewers: the implementation currently doesn't prove updates correctly. The problem is that it doesn't connect the Merkle path for the update with the path previously validated against the old commitment.
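
One common way to make that connection, shown here as a sketch rather than as the fix that was actually merged, is to check the old leaf against the old root and then recompute the new root with the very same siblings. The hash (mix) is an insecure toy, and all names are made up for this example.

```typescript
// Toy stand-in for a circuit-friendly hash. NOT secure.
function mix(left: number, right: number): number {
  return (left * 31 + right * 17 + 7) % 1000003;
}

type PathStep = { sibling: number; siblingIsLeft: boolean };

// Fold a leaf up to the root along the given path.
function rootFromPath(leaf: number, path: PathStep[]): number {
  let node = leaf;
  for (const step of path) {
    node = step.siblingIsLeft ? mix(step.sibling, node) : mix(node, step.sibling);
  }
  return node;
}

// Validate the old leaf against the old root, then compute the new root with
// the SAME path. Reusing the siblings is what ties the update to the position
// that was just validated; using a second, unrelated path would not.
function applyUpdate(
  oldRoot: number,
  oldLeaf: number,
  newLeaf: number,
  path: PathStep[]
): number {
  if (rootFromPath(oldLeaf, path) !== oldRoot) {
    throw new Error("old leaf is not under the old root");
  }
  return rootFromPath(newLeaf, path);
}

// Four-leaf example: leaves [5, 6, 7, 8], updating index 2 from 7 to 70.
const h01 = mix(5, 6);
const h23 = mix(7, 8);
const rootBefore = mix(h01, h23);
const pathToIndex2: PathStep[] = [
  { sibling: 8, siblingIsLeft: false },
  { sibling: h01, siblingIsLeft: true },
];
const rootAfter = applyUpdate(rootBefore, 7, 70, pathToIndex2);
console.log(rootAfter === mix(h01, mix(70, 8)));
```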

@dfstio

dfstio commented May 31, 2024

For nodes, you'd need to store arrays of the same length, but mostly filled with empty slots, not sure how much memory that saves

I've done some testing with MerkleTree to evaluate serialized map size and serialized MerkleTree witness size, and the results are as follows:

random indexes in the tree:
height: 11, elements: 1000, tree size: 75,508 chars, witness size: 495 chars (0.66%)
height: 20, elements: 10000, tree size: 3,243,225 chars, witness size: 860 chars (0.03%)
height: 30, elements: 10000, tree size: 8,159,377 chars, witness size: 1,312 chars (0.02%)
height: 50, elements: 10000, tree size: 18,475,240 chars, witness size: 2,211 chars (0.012%)
height: 100, elements: 100000, tree size: 456,081,607 chars, witness size: 4,507 chars (0.0010%)
height: 255, elements: 25000, tree size: 405,075,106 chars, witness size: 11,600 chars (0.0029%)

ordered indexes in the tree:
height: 11, elements: 1000, tree size: 93,319 chars, witness size: 499 chars (0.53%)
height: 20, elements: 10000, tree size: 941,828 chars, witness size: 910 chars (0.096%)
height: 30, elements: 10000, tree size: 942,558 chars, witness size: 1,367 chars (0.15%)
height: 50, elements: 10000, tree size: 943,572 chars, witness size: 2,283 chars (0.24%)
height: 100, elements: 100000, tree size: 9,526,927 chars, witness size: 4,567 chars (0.048%)
height: 255, elements: 100000, tree size: 9,535,772 chars, witness size: 11,662 chars (0.12%)

The IndexedMerkleMap should be closer to the ordered-index case, so by sending a witness or pruned snapshot instead of the whole tree, we can decrease the size of the serialized data by roughly 1000x. Btw, it also shows the serialized map size savings IndexedMerkleMap will bring: it should be about 170x (16k chars per element in MerkleMap vs. 95 chars per element with IndexedMerkleMap).
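
The per-element figures can be reproduced from the tables above. The sketch below assumes they come from the two height-255 rows (random indexes for MerkleMap, ordered indexes as a proxy for IndexedMerkleMap); the variable names are made up for this example.

```typescript
// Figures copied from the height-255 rows of the two tables above.
const randomRow = { treeChars: 405075106, elements: 25000, witnessChars: 11600 };
const orderedRow = { treeChars: 9535772, elements: 100000, witnessChars: 11662 };

// Serialized size per stored element in each layout.
const charsPerElementRandom = randomRow.treeChars / randomRow.elements;
const charsPerElementOrdered = orderedRow.treeChars / orderedRow.elements;

// How much smaller the ordered layout is per element.
const mapSavings = charsPerElementRandom / charsPerElementOrdered;

// How much smaller a single witness is than the whole ordered tree.
const witnessSavings = orderedRow.treeChars / orderedRow.witnessChars;

console.log(
  Math.round(charsPerElementRandom),
  Math.round(charsPerElementOrdered),
  Math.round(mapSavings),
  Math.round(witnessSavings)
);
```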

Take a pruned snapshot every k updates

We need to take a pruned snapshot several times BEFORE running the circuit without proofs for k updates and make sure that low leaves are also included.

Actually this is extremely close to what Mina does with transaction proofs, where snapshots of the ledger are updated :D

I believe that IndexedMerkleMap is extremely important for rollups on the Mina protocol and will save a lot of money in proving costs.

@mitschabaude mitschabaude merged commit 8758daa into main Jun 4, 2024
14 checks passed
@mitschabaude mitschabaude deleted the feature/indexed-merkle-map branch June 4, 2024 16:45
Development

Successfully merging this pull request may close these issues.

Indexed Merkle tree to improve offchain state efficiency
3 participants