Move leaf_set (aka utxo bitmap) to db #3437

Open
antiochp opened this issue Sep 8, 2020 · 1 comment
antiochp commented Sep 8, 2020

We currently maintain a "leaf_set" for both the output and rangeproof MMR data structures on disk.

This is stored in the pmmr_leaf.bin file below -

chain_data/txhashset/output:
total 61M
drwxr-xr-x 2 root root 4.0K Sep  8 12:02 .
drwxr-xr-x 5 root root 4.0K Feb 20  2020 ..
-rw-r--r-- 1 root root  13M Sep  8 12:02 pmmr_data.bin
-rw-r--r-- 1 root root  47M Sep  8 12:02 pmmr_hash.bin
-rw-r--r-- 1 root root 354K Sep  8 12:02 pmmr_leaf.bin
-rw-r--r-- 1 root root 772K Sep  7 21:25 pmmr_prun.bin

This file is a serialized roaring bitmap and the entire file is rewritten for every new block added to the chain.

This file fits poorly in the overall MMR data structure on disk. It does not materially affect the MMR itself.
It is simply an index of the leaf positions that make up the UTXO set (the leaves that represent unspent outputs).
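
To make the "index of leaf positions" idea concrete, here is a minimal sketch. It is not the node's actual implementation: `LeafSet` and its methods are hypothetical names, and a `BTreeSet<u64>` stands in for the roaring bitmap so the example needs only the standard library. The byte layout in `to_bytes` is likewise an assumption for illustration, not the real file format.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the leaf_set: the MMR positions of unspent outputs.
/// A BTreeSet replaces the roaring bitmap to keep the sketch std-only.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct LeafSet {
    unspent: BTreeSet<u64>,
}

impl LeafSet {
    /// Record that a new output was appended to the MMR at `pos`.
    pub fn add(&mut self, pos: u64) {
        self.unspent.insert(pos);
    }

    /// Record that the output at `pos` was spent.
    /// Returns false if `pos` was not in the set.
    pub fn remove(&mut self, pos: u64) -> bool {
        self.unspent.remove(&pos)
    }

    /// True if the leaf at `pos` is currently unspent.
    pub fn is_unspent(&self, pos: u64) -> bool {
        self.unspent.contains(&pos)
    }

    /// Serialize the whole set (assumed layout: 8 big-endian bytes per position).
    /// Every call emits the entire set, which mirrors the "rewrite the entire
    /// file for every block" behaviour described above.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.unspent.len() * 8);
        for pos in &self.unspent {
            out.extend_from_slice(&pos.to_be_bytes());
        }
        out
    }
}
```

Note that nothing in this structure touches the MMR hashes or data. It only records which leaf positions are still unspent.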

For efficiency we also maintain the "leaf_set" in memory.
This "cache" is initialized on node startup based on the leaf_set on disk.

Note we also maintain an output_pos index in the database.
This index maps an output commitment to its MMR position.
It lets us quickly look up the position of an output when a transaction attempts to spend it.

So the "leaf_set" is effectively maintained in 3 separate places -

  • on disk: pmmr_leaf.bin
  • in memory: leaf_set "cache"
  • in the db: output_pos index

If these get out of sync for any reason we can find ourselves in a "corrupted data" situation very easily.
For example, we may be in the process of writing the MMR files to disk and fail to write pmmr_leaf.bin successfully.
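
As an illustration of what "out of sync" means here, the sketch below compares two of these views and reports the positions they disagree on. It assumes, for the sake of the example, that every position in the output_pos index should also be in the leaf_set and vice versa. `Mismatch` and `compare` are hypothetical names, and commitments are modelled as plain strings.

```rust
use std::collections::{BTreeSet, HashMap};

/// Positions on which two views of the UTXO set disagree.
#[derive(Debug, Default, PartialEq)]
pub struct Mismatch {
    pub only_in_leaf_set: Vec<u64>,
    pub only_in_index: Vec<u64>,
}

/// Compare a leaf_set (set of unspent MMR positions) with an
/// output_pos index (commitment -> MMR position, hypothetical shape).
pub fn compare(leaf_set: &BTreeSet<u64>, output_pos: &HashMap<String, u64>) -> Mismatch {
    // Collect the positions the index knows about.
    let indexed: BTreeSet<u64> = output_pos.values().copied().collect();
    Mismatch {
        // Results are sorted because BTreeSet iterates in ascending order.
        only_in_leaf_set: leaf_set.difference(&indexed).copied().collect(),
        only_in_index: indexed.difference(leaf_set).copied().collect(),
    }
}
```

A non-empty `Mismatch` after a restart would correspond to the kind of partial write described above, where one store was updated and another was not.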

As the leaf_set is simply an index into the actual MMR structure it would make more sense for this to be maintained in the db. We could still cache it in memory for performance, but the "source of truth" would be the db index.

We would not actually need to store this data locally on disk.
Note: We do still need this file during IBD, for both receiving and providing txhashset.zip. But it can be generated "on demand".

If we store this data in the db we can take full advantage of the transactional semantics. If we accept a new block and update the head of the chain we can also update the "leaf_set" in the same db transaction. We would no longer risk being in a state where the leaf_set was not updated correctly and was not aligned with the current chain head.
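
The transactional idea can be sketched roughly as follows. This is an in-memory toy, not the node's db layer: `Db`, `Batch` and their methods are hypothetical names, and the chain head is reduced to a block height. The point it illustrates is that the head and the leaf_set change together on `commit`, or not at all.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for the node's db.
#[derive(Debug, Default)]
pub struct Db {
    head: u64,               // height of the current chain head
    leaf_set: BTreeSet<u64>, // MMR positions of unspent outputs
}

/// Pending changes for one block. Nothing touches `Db` until `commit`.
pub struct Batch<'a> {
    db: &'a mut Db,
    new_head: Option<u64>,
    added: Vec<u64>,
    spent: Vec<u64>,
}

impl Db {
    pub fn batch(&mut self) -> Batch<'_> {
        Batch { db: self, new_head: None, added: Vec::new(), spent: Vec::new() }
    }
    pub fn head(&self) -> u64 {
        self.head
    }
    pub fn is_unspent(&self, pos: u64) -> bool {
        self.leaf_set.contains(&pos)
    }
}

impl<'a> Batch<'a> {
    pub fn set_head(&mut self, height: u64) {
        self.new_head = Some(height);
    }
    pub fn add_leaf(&mut self, pos: u64) {
        self.added.push(pos);
    }
    pub fn spend_leaf(&mut self, pos: u64) {
        self.spent.push(pos);
    }

    /// Validate everything first, then apply everything.
    /// On error the db is left exactly as it was.
    /// (Simplification: spending an output created in the same batch is rejected.)
    pub fn commit(self) -> Result<(), String> {
        let Batch { db, new_head, added, spent } = self;
        for pos in &spent {
            if !db.leaf_set.contains(pos) {
                return Err(format!("position {} is not unspent", pos));
            }
        }
        for pos in added {
            db.leaf_set.insert(pos);
        }
        for pos in &spent {
            db.leaf_set.remove(pos);
        }
        if let Some(height) = new_head {
            db.head = height;
        }
        Ok(())
    }
}
```

With this shape, accepting a block means building one `Batch` with the new head plus the added and spent positions and calling `commit` once. A failed commit leaves both the head and the leaf_set at their previous values, so the two cannot drift apart.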

A PR experimenting with this approach is here: #3428

@antiochp antiochp self-assigned this Sep 8, 2020