Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merkle tree structure for UTXO set #10

Closed
ignopeverell opened this issue Oct 31, 2016 · 7 comments
Closed

Merkle tree structure for UTXO set #10

ignopeverell opened this issue Oct 31, 2016 · 7 comments

Comments

@ignopeverell
Copy link
Contributor

The current Merkle tree implementation is mostly a placeholder and is going to be inadequate to handle the UTXO set tree. The following are highly desirable property:

  • cheap lookups, additions and deletions
  • efficient store and update operations in db (likely rocksdb)
  • simple
  • an immutable data structure (pruned on a later pass) may be easier to reason about

The MMR [1] [2] algorithm has been proposed and Merklix trees [3] offer interesting possibilities as well (i.e. p2p querying, sharding).

[1] https://github.com/opentimestamps/opentimestamps-server/blob/master/python-opentimestamps/opentimestamps/core/timestamp.py#L324
[2] https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
[3] http://www.deadalnix.me/2016/09/29/using-merklix-tree-to-checkpoint-an-utxo-set/

@GarrickOllivander @merope07 and @apoelstra have expressed interest.

@merope07
Copy link
Contributor

merope07 commented Oct 31, 2016

Thanks for finding Merklix, I will study it. Something I see is the prefixes are variable-length, which means that an implementation is likely to be as complex as one of a Patricia tree. Also, I'm uncertain about serialization of Merklix trees, since the article does not go into this, although I think it's easy to do in a malleability-free way since the prefixes force the structure of the tree. Not a big deal, just something to think about.

It appears straightforward to create a sum-Merklix tree which is good.

As for the benefits, we never need deletions or proof-of-absense for standard MW operation. For the purpose of a UTXO set, we can simulate this with a "spent count" on each output which is always 1 or 0, then a proof-of-absense is simply a proof-of-presense where the spent-count reveals whether or not the output has been spent. This isn't exactly a proof-of-absense since it does not allow proving that a UTXO never existed, but it does allow to prove that a UTXO existed then was spent, which is all that's required for fraud proofs.

Both addition and in-place updates are O(ln(n)) for both MMR and Merklix.

~M

@ignopeverell
Copy link
Contributor Author

One advantage I saw for Merklix trees is the possibility of sharding. Maintaining the UTXO set is not really standard MW (in the Jedusor version at least), but it's really handy when it comes to block cut-through. It should allow us to do full cut-through initial sync without too much worry and I think will also let us discard rangeproofs in the full cut-through horizon (something like head minus 1000 blocks), which is a huge win.

With all that in mind, the ability to shard that UTXO set would be pretty cool :)

@apoelstra
Copy link

I think because of https://www.reddit.com/r/Bitcoin/comments/4vub3y/mimblewimble_noninteractive_coinjoin_and_better/d62cux6/ committing to the UTXO set is a necessary extension to the Jedusor scheme (like, there are consensus failures if you don't).

I like the sound of sharding .. though I think if the UTXOs that users store are randomly chosen we should be able to do this with MMRs. Unsure. Maybe @merope07 knows more about this.

Discarding rangeproofs means SPV security of noninflation. I definitely think this should be a supported mode of operation but I don't think we can really get rid of the data for full validators.

@merope07
Copy link
Contributor

With a MMR, since you never delete data, only update, and updating always requires only the rightmost branch of the tree, I think sharding is fine. Any data you retain, you need to know enough auxilary hashes to compute the root, and these hashes are exactly the ones you need to update the data (e.g. to increment a spent-count).

It's probably more efficient to store and transmit, e.g. "all the outputs whose hashes start with 0x0e" in a Merklix tree, but I don't have a clear idea of by how much.

@GarrickOllivander
Copy link
Contributor

I am probably missing the obvious, but how would MMR support lookup?

@merope07
Copy link
Contributor

merope07 commented Nov 2, 2016

You would need to maintain a mapping from UTXO to its index in the tree.

@gellert-grindelwald
Copy link

LMDB is a more suitable embedded-database for fast operations and incredible resiliency to corruption than RocksDB. Not sure if any of the database work has already underway, but it is definitely something worth considering.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants