PMMR segment creation and validation #3453

jaspervdm · 2020-09-27T22:45:02Z

This PR implements a (P)MMR "segment", defined by this WIP RFC. In short, a segment is a set of 2**b (with variable b) consecutive leaves along with the necessary data to reconstruct the subtree root and data to verify membership of the subtree in the original MMR (a Merkle proof). In case of a prunable MMR, the segment only contains the unpruned leaves in the segment range and also contains intermediary MMR hashes that are necessary to construct the segment root.

Concretely, a segment consists of the following elements:

- Segment identifier: b, the base-2 logarithm of the number of leaves in the segment (before pruning), and a zero-based segment index idx.
- List of intermediary hashes, sorted by MMR position in ascending order. Only present for prunable MMRs, where it contains the unpruned hashes in the segment range that are needed to reconstruct the segment root. As of this PR it contains all unpruned hashes above the leaves, which may include redundant data. This is something we can improve over time.
- List of unpruned leaves in the segment (leaf index range [i*2**b, (i+1)*2**b)), sorted by MMR position in ascending order.
- Segment Merkle proof, required to reproduce the MMR root starting from the segment root, thereby proving membership.
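As a rough illustration of the four elements above, here is a minimal sketch of how they could be laid out. The type and field names (`SegmentId`, `Segment`, `height`, `hashes`, `leaves`, `proof`) are assumptions made for this example and are not taken from the PR's code, which may store positions and data differently.

```rust
use std::ops::Range;

/// Hypothetical identifier: `height` plays the role of b, `idx` is the
/// zero-based segment index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SegmentId {
    pub height: u8,
    pub idx: u64,
}

impl SegmentId {
    /// Number of leaves covered by a full segment: 2**b.
    pub fn capacity(&self) -> u64 {
        1u64 << self.height
    }

    /// Half-open leaf index range [idx * 2**b, (idx + 1) * 2**b).
    pub fn leaf_range(&self) -> Range<u64> {
        let start = self.idx * self.capacity();
        start..start + self.capacity()
    }
}

/// Illustrative container; `H` is a hash type and `T` the leaf data type.
#[derive(Debug, Clone)]
pub struct Segment<H, T> {
    pub id: SegmentId,
    /// (MMR position, hash), ascending by position; empty for non-prunable MMRs.
    pub hashes: Vec<(u64, H)>,
    /// (MMR position, leaf data), ascending by position.
    pub leaves: Vec<(u64, T)>,
    /// Hashes needed to get from the segment root to the MMR root.
    pub proof: Vec<H>,
}
```

For example, `SegmentId { height: 3, idx: 2 }` covers leaf indices 16 up to (but not including) 24.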

Given that a segment contains a number of leaves that is a power of 2, a full segment forms a full subtree in the MMR and as such it has a single root. The final segment possibly has fewer than 2**b elements. In this case the peaks in the segment are also peaks in the full MMR and we define its root as these peaks bagged together.
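To make the full versus partial distinction concrete, the sketch below counts the leaves and peaks inside a segment. The function names are hypothetical. It relies on the fact that k consecutive leaves starting at an aligned offset form one perfect subtree per set bit of k.

```rust
/// Number of leaves that actually fall inside segment `idx` of height
/// `height` in an MMR with `n_leaves` leaves (hypothetical helper).
pub fn segment_leaf_count(n_leaves: u64, height: u8, idx: u64) -> u64 {
    let cap = 1u64 << height;
    // Leaves remaining from the segment start, clamped to the segment capacity.
    n_leaves.saturating_sub(idx * cap).min(cap)
}

/// Number of peaks inside the segment: 1 for a full segment, and one per
/// set bit of the leaf count for a partially filled final segment.
pub fn segment_peak_count(n_leaves: u64, height: u8, idx: u64) -> u32 {
    segment_leaf_count(n_leaves, height, idx).count_ones()
}
```

With 11 leaves and b = 3, segment 0 is full and has a single root, while segment 1 holds 3 leaves that form 2 peaks, which are then bagged together.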

Segment creation
A segment is created by looping over all the MMR positions in the segment range. If the position is an unpruned leaf or its sibling is an unpruned leaf, add it to the leaf data list. If not, check if its hash is unpruned and, if so, add it to the list of hashes. Next, generate the Merkle proof by filling a list of hashes with:

1. the siblings along the path from the subtree root (final MMR position of the segment) up to its corresponding peak in the MMR
2. peaks to the right of our subtree root, bagged together to a single hash
3. peaks to the left (from right to left) with a position smaller than the first MMR position of the segment

Note that this procedure will also behave as expected for the partially filled final segment, since in that case steps 1 and 2 will not produce any hashes and step 3 will give us all the other peaks in the MMR.
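The range of MMR positions to loop over can be sketched as follows. This assumes zero-based MMR positions in insertion (post-order) numbering. The helper names are made up for this example, and the code base may use a different position convention.

```rust
/// Zero-based MMR position of the leaf with zero-based index `leaf_idx`.
/// An MMR with n leaves has 2n - popcount(n) nodes, so that is also the
/// position the next leaf will take.
pub fn leaf_to_pos(leaf_idx: u64) -> u64 {
    2 * leaf_idx - leaf_idx.count_ones() as u64
}

/// First and last MMR position (inclusive) of a full segment. The last
/// position is the subtree root; a subtree over 2**b leaves has
/// 2**(b+1) - 1 nodes.
pub fn full_segment_pos_range(height: u8, idx: u64) -> (u64, u64) {
    let first = leaf_to_pos(idx << height);
    let last = first + (2u64 << height) - 2;
    (first, last)
}
```

For the partially filled final segment the loop would instead end at the last position of the MMR, since there is no single subtree root.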

Segment verification
Iterate over all the MMR positions in the segment range. If the position is a leaf: check (for prunable MMRs) whether the element is expected to be there (based on presence of the element or its sibling in the bitmap) and, if it is, get it from the list of leaves, hash it and store the hash in a list of temporary hashes.
For all other positions: if neither child is present in the list of temporary hashes (i.e. they are both pruned), do nothing. If one or both child hashes are present in the list, hash them together and store the new hash in the list of temporary hashes. If only one of the children was present, obtain the hash of the other child from the list of hashes in the segment instead.
After looping through all positions, we are either left with 1 entry (full segment) or multiple entries (partially filled, final segment) in the list of temporary hashes. If there are multiple, bag them together. We are left with the segment root.
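A simplified sketch of the root calculation for the case without any pruning is shown below. It uses a stack instead of the temporary hash list described above, and a toy 64-bit hash built on `DefaultHasher` as a stand-in for the real hash function. All names here, and the way positions are mixed into the hashes, are assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy leaf hash that binds the MMR position (stand-in, not the real hash).
pub fn hash_leaf(pos: u64, data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    (pos, data).hash(&mut s);
    s.finish()
}

/// Toy parent hash over the parent's position and both children.
pub fn hash_parent(pos: u64, left: u64, right: u64) -> u64 {
    let mut s = DefaultHasher::new();
    (pos, left, right).hash(&mut s);
    s.finish()
}

/// Toy bagging step: combine a peak with everything bagged to its right.
pub fn bag(left_peak: u64, rhs: u64) -> u64 {
    let mut s = DefaultHasher::new();
    ("bag", left_peak, rhs).hash(&mut s);
    s.finish()
}

/// Root of an unpruned segment whose first leaf sits at MMR position
/// `first_pos` (zero-based). Returns None if there are no leaves.
pub fn segment_root(first_pos: u64, leaves: &[&[u8]]) -> Option<u64> {
    let mut pos = first_pos;
    let mut stack: Vec<(u32, u64)> = Vec::new(); // (subtree height, hash)
    for data in leaves {
        stack.push((0, hash_leaf(pos, data)));
        pos += 1;
        // Merge while the two rightmost subtrees have equal height.
        while stack.len() >= 2 && stack[stack.len() - 1].0 == stack[stack.len() - 2].0 {
            let (h, right) = stack.pop().unwrap();
            let (_, left) = stack.pop().unwrap();
            stack.push((h + 1, hash_parent(pos, left, right)));
            pos += 1;
        }
    }
    // One entry left for a full segment; several peaks for a partial one,
    // which are bagged from right to left.
    let mut peaks = stack.into_iter().rev().map(|(_, h)| h);
    let rightmost = peaks.next()?;
    Some(peaks.fold(rightmost, |acc, peak| bag(peak, acc)))
}
```

A prunable MMR additionally needs the bitmap check for leaves and the lookup of missing child hashes in the segment's hash list, both of which this sketch leaves out.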

Next, verify the proof by attempting to reproduce the MMR root hash: first, hash the segment root together with the siblings along the path from the subtree root to its peak in the MMR, then hash the result together with the bagged peaks to the right and finally hash it together with the peaks on the left (going right to left). Verification passes if the calculated hash is equal to the MMR root hash.
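The three folding steps can be sketched like this. The `Proof` layout, the explicit `Side` marker and the toy `combine` hash are assumptions made for this example. In an actual implementation the sibling side would typically follow from the MMR positions, and the hash would also bind positions or sizes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy two-input hash (stand-in for the real hash function).
pub fn combine(left: u64, right: u64) -> u64 {
    let mut s = DefaultHasher::new();
    (left, right).hash(&mut s);
    s.finish()
}

/// Which side of the running hash a path sibling sits on.
#[derive(Debug, Clone, Copy)]
pub enum Side {
    Left,
    Right,
}

/// Hypothetical proof layout mirroring the three steps in the text.
pub struct Proof {
    /// Siblings from the segment root up to its peak.
    pub path: Vec<(Side, u64)>,
    /// Peaks to the right of our peak, already bagged into one hash.
    pub rhs_bagged: Option<u64>,
    /// Peaks to the left of our peak, ordered right to left.
    pub lhs_peaks: Vec<u64>,
}

pub fn verify(segment_root: u64, proof: &Proof, mmr_root: u64) -> bool {
    let mut acc = segment_root;
    // 1. Climb from the segment root to its peak.
    for (side, sibling) in &proof.path {
        acc = match side {
            Side::Left => combine(*sibling, acc),
            Side::Right => combine(acc, *sibling),
        };
    }
    // 2. Fold in the bagged peaks on the right, if any.
    if let Some(rhs) = proof.rhs_bagged {
        acc = combine(acc, rhs);
    }
    // 3. Fold in the peaks on the left, going right to left.
    for peak in &proof.lhs_peaks {
        acc = combine(*peak, acc);
    }
    acc == mmr_root
}
```

For the partially filled final segment, `path` and `rhs_bagged` would be empty and only the left peaks are folded in.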

Open points before merging

~~In case of a heavily pruned MMR, is this data always sufficient to reconstruct the full pruned MMR? Or would we miss any intermediary hashes?~~ Yes, now that we added support for fully pruned segments.
~~I think we need a function to extract the intermediary and proof hashes for purposes of storing them in the MMR we are building~~ TBD in a future PR
~~Would it be more natural to pass in a bitmap indicating the spent or the unspent positions?~~ Unspent
~~Is bitmap/leaf index 0-based (should be a quick check)~~ Yes, it is.

antiochp · 2020-10-13T08:30:38Z

Quick question that occurred to me reading over the PR description (nice description by the way!) -

> Segment identifier: b, the 2-log of the size of the number of leaves (before pruning) and a zero-based segment index idx

Does it potentially make more sense to provide a "starting leaf index" as segment identifier? The current "segment index" is heavily dependent on the "segment size". Clearly you can translate between these easily enough but it may make sense to identify these based on both number of leaves via b and leaf idx.
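For reference, the translation between the two identifiers is a shift in either direction. The helper names below are hypothetical, and the reverse direction is only defined when the starting leaf index is aligned to the segment size.

```rust
/// Starting leaf index of segment `idx` with 2**height leaves.
pub fn start_leaf(height: u8, idx: u64) -> u64 {
    idx << height
}

/// Segment index for a starting leaf index, or None if the leaf index is
/// not a multiple of 2**height.
pub fn segment_idx(height: u8, start_leaf: u64) -> Option<u64> {
    let mask = (1u64 << height) - 1;
    if start_leaf & mask == 0 {
        Some(start_leaf >> height)
    } else {
        None
    }
}
```

One difference between the two schemes is that a (b, idx) pair is always a valid identifier, whereas a (b, starting leaf index) pair needs the alignment check above.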

antiochp · 2020-10-13T08:32:49Z

> List of unpruned leaves in the segment (leaf index range [i*2**b, (i+1)*2**b)), sorted by MMR position in ascending order.

Is the intention to have these segments be self-contained? For pruned MMR (outputs and rangeproofs) do we also provide the bitmap or is the assumption we already have the corresponding bitmap segment?

core/src/core/pmmr/segment.rs

antiochp · 2020-10-19T14:33:27Z

(apologies for the barrage of questionable feedback here...) 😄

jaspervdm · 2020-10-20T10:53:16Z

Not at all! I think they highlight some of the subtleties in the PR so it is good that they are discussed explicitly.

antiochp · 2020-10-28T09:58:23Z

Just posting here for reference -

We discussed "empty" segments. Proposal is to return a single root (found by recursing up the MMR from the empty segment). This provides all necessary hashes for reconstruction, even across a heavily pruned MMR, with pruning going beyond segment size. The root position will exist outside the defined segment.
There is an optimization in the above, if the requester can specify the segment size/height. Given the output bitmap they can determine which segments are empty and request larger segments (output and rangeproof) that cover multiple empty segments (or combination of empty and adjacent non-empty segments) of the MMR.
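A requester-side sketch of that optimization: given the output bitmap as a slice of "unspent" flags (a stand-in for the real bitmap type), list the full segments of a given height that contain no unspent leaves. The function name is hypothetical, and the trailing partial segment is deliberately skipped here because its final leaf may need special handling.

```rust
/// Indices of full segments (2**height leaves each) in which every leaf
/// is spent. `unspent[i]` is true if leaf i is unspent.
pub fn empty_segments(unspent: &[bool], height: u8) -> Vec<u64> {
    let cap = 1usize << height;
    unspent
        .chunks_exact(cap) // ignores a trailing partial segment
        .enumerate()
        .filter(|(_, chunk)| chunk.iter().all(|bit| !*bit))
        .map(|(i, _)| i as u64)
        .collect()
}
```

Runs of adjacent indices in the result are candidates for being covered by a single request for a larger segment.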

jaspervdm · 2020-10-29T18:59:40Z

Fixed an edge case: if there is an uneven number of leaves, and the final leaf is spent, we still require it to be present in the final segment. Previously we only checked the bitmap for it and its (non-existent) sibling, which are both 0. This led us to assume they were pruned, but this is not the case.
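The corrected rule can be sketched as a small predicate. The bitmap is modelled as a slice of "unspent" flags and the function name is made up for this example.

```rust
/// Whether the leaf at `leaf_idx` is expected to be present in a segment.
/// `unspent[i]` is true if leaf i is unspent; `leaf_idx` must be in range.
pub fn leaf_expected(unspent: &[bool], leaf_idx: usize) -> bool {
    let sibling = leaf_idx ^ 1; // leaves (0,1), (2,3), ... are siblings
    if sibling >= unspent.len() {
        // Final leaf of an MMR with an odd number of leaves: it has no
        // sibling yet, so it cannot have been pruned even if it is spent.
        return true;
    }
    // Both leaves of a pair are kept as long as one of them is unspent.
    unspent[leaf_idx] || unspent[sibling]
}
```

With five leaves of which only the first is unspent, leaves 0 and 1 are expected (leaf 1 because its sibling is unspent), leaves 2 and 3 are pruned, and leaf 4 is expected despite being spent.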

This was actually caught by the pruning test in store/tests/segment.rs, except for the fact that there was a bug in the test itself related to the bitmap indices. Fixing the bug in the test made the test fail as it should have.

jaspervdm · 2020-11-01T13:34:31Z

@antiochp

> We discussed "empty" segments. Proposal is to return a single root (found by recursing up the MMR from the empty segment). This provides all necessary hashes for reconstruction, even across a heavily pruned MMR, with pruning going beyond segment size. The root position will exist outside the defined segment.

We now support pruned segments. The root() function now returns Option<Hash>, where a None indicates a full segment that is completely pruned. In this situation the root (or one of its parents) needs to be obtained from the list of hashes in the segment. This is done in first_unpruned_parent().

In order to find the first parent up the path to the peak that is unpruned, the new first_unpruned_parent() function can be used. If the segment is not fully pruned, it will return (hash, None) where the hash is the root of the segment. If the segment is fully pruned, it will return (hash, Some(pos)) where the hash is the hash of the first parent that isn't compacted away, and pos is the corresponding position.

I've also added a bunch of tests to make sure it behaves as expected for full segments and doesn't affect the partially filled final segment.

antiochp · 2020-11-03T11:50:24Z

👍 Sounds good - I'm planning to take a closer look at this today.

antiochp · 2020-11-09T19:25:26Z

This looks good. Want to move it out of Draft status?

Just one minor point/question -

> In order to find the first parent up the path to the peak that is unpruned, the new first_unpruned_parent() function can be used. If the segment is not fully pruned, it will return (hash, None) where the hash is the root of the segment. If the segment is fully pruned, it will return (hash, Some(pos)) where the hash is the hash of the first parent that isn't compacted away, and pos is the corresponding position.

Do we need to have None vs Some(pos) here for this?
I wonder if we could simply do (hash, pos) consistently for both scenarios? If it's the "real" root of the subtree then it's just the pos of the root. If it's a fully pruned subtree then pos is just a higher up parent pos. Is there an advantage to only including an optional pos?

jaspervdm · 2020-11-10T11:53:19Z

You are right, we don't really need to return a (hash, Some(pos)), will update it to (hash, pos). For conceptual clarity I'd like to keep the Option<Hash> on the root() function though.
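To illustrate the shape agreed on here, the toy model below keeps an optional root and returns a plain (hash, pos) pair from the parent lookup. `ToySegment` and its fields are invented for this sketch, hashes are plain `u64` values, and a fully pruned segment is assumed to carry exactly one hash. The actual functions in the PR may have different signatures and error handling.

```rust
/// Toy stand-in for a segment, for illustration only.
pub struct ToySegment {
    /// MMR position of the segment's own root.
    pub root_pos: u64,
    /// Root computed from the leaves, or None if everything is pruned.
    pub computed_root: Option<u64>,
    /// (MMR position, hash) pairs carried in the segment.
    pub hashes: Vec<(u64, u64)>,
}

impl ToySegment {
    /// None signals a full segment that is completely pruned.
    pub fn root(&self) -> Option<u64> {
        self.computed_root
    }

    /// Always a (hash, pos) pair: the segment root and its position, or
    /// the first unpruned parent and that parent's position.
    pub fn first_unpruned_parent(&self) -> Option<(u64, u64)> {
        match self.root() {
            Some(hash) => Some((hash, self.root_pos)),
            None => self.hashes.first().map(|&(pos, hash)| (hash, pos)),
        }
    }
}
```

The caller no longer has to branch on an optional position; comparing the returned position with the segment's own root position tells whether the hash belongs to the segment root or to a parent further up.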

antiochp

I think we're looking good here with this. 👍
What else is outstanding before we can merge?

jaspervdm · 2020-11-17T15:24:40Z

I'm working on the deser of bitmap segments in this PR, but it probably makes more sense to do that in a separate one. I think we can merge this.

antiochp · 2020-11-17T19:45:14Z

🎉

* Chunk generation and validation
* Rename chunk -> segment
* Missed a few
* Generate and validate merkle proof
* Fix bugs in generation and validation
* Add test for unprunable MMR of various sizes
* Add missing docs
* Remove unused functions
* Remove segment error variant on chain error type
* Simplify calculation by using a Vec instead of HashMap
* Use vectors in segment definition
* Compare subtree root during tests
* Add test of segments for a prunable mmr
* Remove assertion
* Only send intermediary hashes for prunable MMRs
* Get hash from file directly
* Require both leaves if one of them is not pruned
* More pruning tests
* Add segment (de)serialization
* Require sorted vectors in segment deser
* Store pos and data separately in segment
* Rename log_size -> height
* Fix bitmap index in root calculation
* Add validation function for output (bitmap) MMRs
* Remove left over debug statements
* Fix test
* Edge case: final segment with uneven number of leaves
* Use last_pos instead of segment_last_pos
* Simplify pruning in test
* Add leaf and hash iterators
* Support fully pruned segments
* Drop backend before deleting dir in pruned_segment test
* Simplify output of first_unpruned_parent

jaspervdm added 16 commits September 28, 2020 00:42

Chunk generation and validation

f25e020

Merge remote-tracking branch 'upstream/master' into pmmr_chunk

fdcf5f0

Rename chunk -> segment

26f7cfb

Missed a few

ba284ca

Generate and validate merkle proof

1d512a4

Fix bugs in generation and validation

d3d37b8

Add test for unprunable MMR of various sizes

163cad6

Add missing docs

4347fb1

Remove unused functions

3d6fa32

Remove segment error variant on chain error type

1cd73d9

Simplify calculation by using a Vec instead of HashMap

e6b3f7c

Use vectors in segment definition

ec4e4d2

Compare subtree root during tests

5aea29e

Add test of segments for a prunable mmr

50633cf

Remove assertion

42e430f

Only send intermediary hashes for prunable MMRs

7f276e2

jaspervdm mentioned this pull request Oct 6, 2020

Get the hash of a PMMR leaf ignoring the leafset #3463

Closed

Get hash from file directly

c0cd251

jaspervdm changed the title ~~[WIP] PMMR chunk generation and validation~~ PMMR segment generation and validation Oct 6, 2020

jaspervdm changed the title ~~PMMR segment generation and validation~~ PMMR segment creation and validation Oct 6, 2020

jaspervdm added 4 commits October 7, 2020 14:10

Merge remote-tracking branch 'upstream/master' into pmmr_chunk

2d61b3b

Require both leaves if one of them is not pruned

89c3520

More pruning tests

0482f56

Add segment (de)serialization

3d8d293

jaspervdm requested a review from antiochp October 7, 2020 15:14

Require sorted vectors in segment deser

751d0bb

jaspervdm mentioned this pull request Oct 8, 2020

Define segment p2p messages #3470

Closed

antiochp reviewed Oct 13, 2020

core/src/core/pmmr/segment.rs (outdated)

antiochp reviewed Oct 19, 2020

core/src/core/pmmr/segment.rs (outdated)

jaspervdm added 4 commits October 27, 2020 13:55

Fix bitmap index in root calculation

22f7a41

Add validation function for output (bitmap) MMRs

435ff77

Remove left over debug statements

f9813a8

Fix test

04c096b

antiochp mentioned this pull request Oct 28, 2020

add segmenter for generating segments from txhashset with consistent rewind #3482

Merged

jaspervdm added 2 commits October 29, 2020 19:50

Edge case: final segment with uneven number of leaves

8d94d32

Use last_pos instead of segment_last_pos

8f6983b

jaspervdm added 3 commits October 30, 2020 16:58

Simplify pruning in test

a7799af

Add leaf and hash iterators

38dc5bf

Support fully pruned segments

643ee8e

Drop backend before deleting dir in pruned_segment test

a7f2d13

Simplify output of first_unpruned_parent

de1d5e3

jaspervdm marked this pull request as ready for review November 10, 2020 12:10

antiochp approved these changes Nov 16, 2020

jaspervdm merged commit 8faba4e into mimblewimble:master Nov 17, 2020

jaspervdm deleted the pmmr_chunk branch November 26, 2020 17:58

antiochp mentioned this pull request Nov 26, 2020

v5.0.0 Release Notes #3506

Closed

yeastplume mentioned this pull request Feb 22, 2022

[DNM] PIBD Task / Issue Tracker #3695

Merged

26 tasks
