
Merkle Proving #16

Merged — 35 commits, merged Jan 22, 2024
Conversation

tchataigner (Member) commented Jan 15, 2024:

Description

This PR introduces a gadget to prove leaf inclusion in a Merkle Tree.

Current progress

Used the Jellyfish Merkle Tree paper to implement a proof-of-inclusion algorithm.

It takes a generic approach to which hashing algorithm is used, and currently natively handles the Keccak and Sha3 implementations in bellpepper-keccak.
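The generic-hasher idea can be sketched out of circuit roughly like this. Note that `NodeHasher`, `ToyHasher`, and `hash_combine` below are illustrative stand-ins, not the PR's actual `GadgetDigest` trait or the bellpepper-keccak circuits:

```rust
// Out-of-circuit sketch: the gadget is generic over a digest trait, with
// Keccak and SHA3 as instances. All names here are hypothetical.
trait NodeHasher {
    const DIGEST_LEN: usize;
    fn hash(data: &[u8]) -> Vec<u8>;
}

/// Toy 4-byte "hash" standing in for a real digest circuit.
struct ToyHasher;

impl NodeHasher for ToyHasher {
    const DIGEST_LEN: usize = 4;
    fn hash(data: &[u8]) -> Vec<u8> {
        let mut out = [0u8; 4];
        for (i, b) in data.iter().enumerate() {
            out[i % 4] = out[i % 4].wrapping_add(b.rotate_left((i % 7) as u32));
        }
        out.to_vec()
    }
}

/// Combine two child digests into their parent, generic over the hasher,
/// mirroring the "hash combine" step in the gadget.
fn hash_combine<H: NodeHasher>(left: &[u8], right: &[u8]) -> Vec<u8> {
    H::hash(&[left, right].concat())
}

fn main() {
    let left = ToyHasher::hash(b"left leaf");
    let right = ToyHasher::hash(b"right leaf");
    let parent = hash_combine::<ToyHasher>(&left, &right);
    assert_eq!(parent.len(), ToyHasher::DIGEST_LEN);
}
```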

Links to any relevant issues or pull requests

Related to linear#29

```rust
E: PrimeField,
CS: ConstraintSystem<E>,
{
    sha3(cs.namespace(|| "hash combine"), &[hash1, hash2].concat())
```
Collaborator:

I may be missing context for this pull request. Can the sha3 here be replaced with a generic hash function circuit? For example, someone may want to use Poseidon to construct a Merkle tree and prove inclusion.

Member Author:

For sure, I'm currently working on a commit that's dedicated to it. Should be pushed quite soon 👍

@tchataigner tchataigner marked this pull request as ready for review January 17, 2024 15:09
tchataigner (Member Author) commented Jan 17, 2024:

@huitseeker , I added a test over the number of constraints in 437f1c8 as you suggested. I think I got it right 🤔

Edit: generalized the formula for the number of constraints of the circuit.

The number of constraints is as follows:

```
constraints = constraints_inputs + constraints_computation
            = expected_root_digest_length + leaf_digest_length + tree_depth
              + digest_length * nbr_siblings + enforce_equal_over_digest
              + hasher_constraints * nbr_siblings
            = digest_length + digest_length + nbr_siblings
              + digest_length * nbr_siblings + digest_length
              + hasher_constraints * nbr_siblings
            = 3 * digest_length + (digest_length + hasher_constraints + 1) * nbr_siblings
```

So for a 256-bit digest length over a leaf at depth 3:

```
3 * 256 + (256 + 151424 + 1) * 3 = 455811
```
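The formula above can be sanity-checked with a small helper (illustrative only, not part of the PR):

```rust
// Closed-form constraint count from the comment above:
// 3 * digest_length + (digest_length + hasher_constraints + 1) * nbr_siblings
fn constraint_count(digest_length: u64, hasher_constraints: u64, nbr_siblings: u64) -> u64 {
    3 * digest_length + (digest_length + hasher_constraints + 1) * nbr_siblings
}

fn main() {
    // 256-bit digest, the SHA3 hasher cost quoted above, depth-3 leaf:
    assert_eq!(constraint_count(256, 151_424, 3), 455_811);
}
```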


```rust
// Example use of the macro with OutOfCircuitHasher specified
create_gadget_digest_impl!(Sha3, sha3, 256, Sha3_256);
create_gadget_digest_impl!(Keccak, keccak256, 256, Keccak256);
```
Member:

It would be great to check we're not hard-coding the length to be 256, and an easy way to do this is to test with the neighboring SHA512 crate (whose output is 512 bits).

Member:

This would also help with endianness: SHA512 uses big-endian, whereas the keccak crate is little-endian.
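The endianness difference flagged here can be made concrete with a small sketch: the same byte expanded to bits MSB-first (big-endian, SHA-512-style) versus LSB-first (little-endian, keccak-crate-style) yields different traversal orders. The helpers below are hypothetical, not the PR's code:

```rust
// Expand bytes to bits, most significant bit first (big-endian convention).
fn bits_be(bytes: &[u8]) -> Vec<bool> {
    bytes
        .iter()
        .flat_map(|b| (0..8).rev().map(move |i| (b >> i) & 1 == 1))
        .collect()
}

// Expand bytes to bits, least significant bit first (little-endian convention).
fn bits_le(bytes: &[u8]) -> Vec<bool> {
    bytes
        .iter()
        .flat_map(|b| (0..8).map(move |i| (b >> i) & 1 == 1))
        .collect()
}

fn main() {
    let byte = [0b1000_0000u8];
    assert_eq!(bits_be(&byte)[0], true); // high bit comes out first
    assert_eq!(bits_le(&byte)[0], false); // low bit comes out first...
    assert_eq!(bits_le(&byte)[7], true); // ...and the high bit last
}
```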

Member:

See in the recent PR how I deal with this using Bitvec: #20
This might help.

Member Author:

Thanks, this is pretty helpful!

Member Author:

Done in 04ae409.

````rust
/// // Process each leaf and intermediary node's hash and key...
/// }
/// ```
pub fn construct_merkle_tree<D: Digest>() -> SimpleMerkleTree {
````
Member:

It's not needed for this iteration, but we'll probably want to open an issue to use proptest to generate this sort of test instance.

Member Author:

@huitseeker I just added it to have a clean tree to use for testing. Would you rather have me remove this implementation and implement a more straightforward construction of the leaves / proofs?

Member:

No, it's ok. I'm just mentioning specifically that proptest_recurse can generate random tree-structured data:
https://docs.rs/proptest-recurse/latest/proptest_recurse/
This is probably something we'll want, but in the future.

```rust
/// Each leaf node contains a key-value pair, where the key is used to determine the position in the tree and the value
/// is the data stored.
pub struct Leaf {
    key: Key,
```
Member:

What's the expected endianness of this key? Shall we document it?

Member Author:

As of now I treat the key as little-endian at all times. I think it would actually make more sense for it to share the endianness of the hashing algorithm it is associated with. Will update the code to reflect that.

Member Author:

Did this in 775a5fa

Member Author:

Behavior needed correction, updated in a74aa26

Member:

Why does it make sense to have the endianness of the key be that of the hashing gadget?

I'm not against a simplifying assumption such as "key, siblings and gadget output should all have the same endianness" (which I'm not sure is required strictly speaking by the Aptos spec), but I'd expect that to be spectacularly well documented.

That is, if I pass a key, proof pair to the algorithm, and that proof is being iterated on in the direction of the endianness of a hashing gadget that I happen to have specified separately, I will be surprised to see a dependency come from that gadget's implementation detail unless it's well documented: the gadget's implementer and the provider of the proof may not be the same people.

To be explicit, there's good documentation about this on the conditional_reverse function, which is private. I'd expect this documentation to be on verify_proof and on the Leaf struct.

huitseeker (Member) left a comment:

This is getting close, I think we need to improve how we document the assumptions we'd have on the key-siblings joint iteration. Thanks!

```toml
version = "0.1.0"
edition = "2021"
authors = ["Lurk Lab Engineering <engineering@lurk-lab.com>"]
license = "MIT OR Apache-2.0"
```
Member:

Nit: It would be great to take this occasion to set up a workspace-wide license field here:
https://github.com/lurk-lab/bellpepper-gadgets/blob/main/Cargo.toml#L12-L14


```rust
let mut actual_root_hash = proof.leaf().hash().to_vec();

let key_iterator =
    conditional_reverse::<_, _, GD>(proof.leaf().key().iter().take(proof.siblings().len()));
```
Member:

If I have, for example:

  • proof.siblings().len() == 5,
  • proof.leaf().key().len() == 8, and
  • GD::is_little_endian() == true,

then the sequence of indexes of proof.leaf().key() that will be used to thread proof.siblings() is [4, 3, 2, 1, 0]. Is that what the user should expect, or should they expect [7, 6, 5, 4, 3] instead? Is the MSB of the key not at index 7 in LE order?

I think there are two ways to go about this:

  1. Explicitly state everywhere that key and gadget should have the same endianness, and in that case reverse first and .take(proof.siblings().len()) after. This is closer to the Aptos spec (note the mention of the MSB of the key there), but may be less simple for other Merkle algorithms.
  2. Not make any explicit relation between the gadget's endianness and the key's iteration order, and say that in a proof, key and siblings should be iterable in the same order, whatever it is: to the point that the algorithm should operate on the iterator formed by proof.leaf().key().iter().zip(proof.siblings().iter()) (after checking proof.leaf().key().len() > proof.siblings().len())

(I don't particularly care which of these two solutions we use, but I am mindful that we put the user in a situation to know precisely which approach we're enforcing.)
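The two orderings in question can be illustrated with plain indices standing in for key bits (a hypothetical sketch, not the PR's code):

```rust
// 8 key bit-positions, 5 siblings, as in the example above.
fn main() {
    let key: Vec<usize> = (0..8).collect();

    // `take` first, then reverse: indices [4, 3, 2, 1, 0].
    let take_then_rev: Vec<usize> = key.iter().take(5).rev().copied().collect();

    // Reverse first, then `take` (option 1): indices [7, 6, 5, 4, 3].
    let rev_then_take: Vec<usize> = key.iter().rev().take(5).copied().collect();

    assert_eq!(take_then_rev, vec![4, 3, 2, 1, 0]);
    assert_eq!(rev_then_take, vec![7, 6, 5, 4, 3]);
}
```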

Member Author:

I went with solution 2; I think it should suffice in our context.
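Solution 2 can be sketched out of circuit as follows. The names (`fold_root`) and the toy combine function are illustrative, not the PR's actual gadget code:

```rust
// Thread siblings with key bits by zipping the two iterators in whatever
// order the proof supplies them, after checking the lengths are compatible.
fn fold_root(leaf_hash: u64, key_bits: &[bool], siblings: &[u64]) -> Result<u64, &'static str> {
    if key_bits.len() < siblings.len() {
        return Err("key shorter than sibling path");
    }
    // Toy combine standing in for the hash gadget.
    let combine = |l: u64, r: u64| l.rotate_left(7) ^ r.wrapping_mul(31);
    Ok(key_bits
        .iter()
        .zip(siblings.iter())
        .fold(leaf_hash, |acc, (bit, sib)| {
            // Bit decides whether the accumulator is the left or right child.
            if *bit { combine(*sib, acc) } else { combine(acc, *sib) }
        }))
}

fn main() {
    assert!(fold_root(1, &[true, false, true], &[2, 3, 4]).is_ok());
    // Length check: a 1-bit key cannot thread 2 siblings.
    assert!(fold_root(1, &[true], &[2, 3]).is_err());
}
```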

```rust
    GD: GadgetDigest<E>,
>(
    iter: I,
) -> Box<dyn DoubleEndedIterator<Item = &'a Boolean> + 'a> {
```
Member:

It would be great if we could inline this and avoid the performance cost of Box<dyn MyType> for these few lines.
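One common way to drop the `Box<dyn …>` is a hand-rolled two-variant enum (the same idea as `either::Either`), which unifies the forward and reversed iterators without heap allocation or dynamic dispatch. A minimal sketch, using a plain slice where the real code iterates `&Boolean`s:

```rust
// Enum wrapper over the two iterator directions; static dispatch only.
enum EitherIter<A, B> {
    Fwd(A),
    Rev(B),
}

impl<A, B, T> Iterator for EitherIter<A, B>
where
    A: Iterator<Item = T>,
    B: Iterator<Item = T>,
{
    type Item = T;
    fn next(&mut self) -> Option<T> {
        match self {
            EitherIter::Fwd(a) => a.next(),
            EitherIter::Rev(b) => b.next(),
        }
    }
}

// Hypothetical stand-in for the PR's `conditional_reverse`, without boxing.
fn conditional_reverse<'a, T>(items: &'a [T], reverse: bool) -> impl Iterator<Item = &'a T> + 'a {
    if reverse {
        EitherIter::Rev(items.iter().rev())
    } else {
        EitherIter::Fwd(items.iter())
    }
}

fn main() {
    let fwd: Vec<&i32> = conditional_reverse(&[1, 2, 3], false).collect();
    let rev: Vec<&i32> = conditional_reverse(&[1, 2, 3], true).collect();
    assert_eq!(fwd, vec![&1, &2, &3]);
    assert_eq!(rev, vec![&3, &2, &1]);
}
```

(A `DoubleEndedIterator` impl can be added the same way if callers need it.)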

Comment on lines 97 to 121
```rust
for (i, (sibling_hash, bit)) in proof.siblings().iter().zip(key_iterator).enumerate() {
    if let Some(b) = bit.get_value() {
        if b {
            actual_root_hash = GD::digest(
                cs.namespace(|| format!("sibling {}", i)),
                &[sibling_hash.to_owned(), actual_root_hash].concat(),
            )?
        } else {
            actual_root_hash = GD::digest(
                cs.namespace(|| format!("sibling {}", i)),
                &[actual_root_hash, sibling_hash.to_owned()].concat(),
            )?
        }
    } else {
        return Err(SynthesisError::Unsatisfiable);
    }
}

hash_equality(cs, expected_root, actual_root_hash)
```
Member:

Aka, I believe:

```rust
for (i, (sibling_hash, bit)) in proof.siblings().iter().zip(key_iterator).enumerate() {
    let b = bit.get_value().ok_or(SynthesisError::Unsatisfiable)?;

    // Determine the order of hashing based on the bit value
    let hash_order = if b {
        vec![sibling_hash.to_owned(), actual_root_hash]
    } else {
        vec![actual_root_hash, sibling_hash.to_owned()]
    };

    // Compute the new hash
    actual_root_hash = GD::digest(cs.namespace(|| format!("sibling {}", i)), &hash_order.concat())?;
}

hash_equality(cs, expected_root, actual_root_hash)
```

On LICENSE-APACHE:
Member:

I think we don't need the License files replicated at the root of the repo if we copy them in each crate, which we have done so far.

I'd love to address those copies, but I'd want to do it in another PR, to keep this one focused on Merkle trees => I think this should not add any license files other than the ones in its own crate.

Member Author:

Alright, I'll remove the license files at the root.

huitseeker (Member) left a comment:

LGTM!!

huitseeker merged commit 5e93eca into lurk-lab:main on Jan 22, 2024
10 checks passed