Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in logic of from_sub_trees_as_trees constructor #98

Merged
merged 29 commits into from Jun 21, 2022

Conversation

storojs72
Copy link

@storojs72 storojs72 commented Mar 28, 2022

This PR is rebased on top of #100 (that introduced framework for integration testing). It refactors from_sub_trees_as_trees constructor of MerkleTree (top-tree) instance in a more generic way, e.g. values of SubTreeArity and TopTreeArity can be either even or odd (previously only even numbers were supported and, with odd numbers, mentioned constructor panicked). This change has been evaluated using high-level seal-lifecycle tests for production sectors (32gib and 64gib). Logs of successful executing mentioned tests on proofs-buildbot are included.

This commit separates testing of MerkleTree's functions-constructors
into separate test suite. This seems reasonable, since different
constructors may use different datasets actually which in fact has impact
on tree's root computation, expressing one more dimension that is a subject
of additional testing.

Analysis of unit-tests majority from test_xor128.rs source file one
can see that a lot of tests have much in common in their internal logic,
though they use different constructors for tree's instantiation and
also different tree parameters (arity and storage type). In such
circumstances we can use passing function as parameter to particular
test. Here I tried to follow this approach and separate test logic
into simple (yet generic over tree structure) function which takes
instantiated tree (it doesn't matter how it was instantiated actually)
and checks if tree's functionality (computing length, number of leaves,
root and generating / verifying inclusion proofs) works as expected.

Ultimately, I think that such approach will allow us having less number
of unordered tests and consequently avoid duplication of test logic.
Constructor 'new' is basically a 'try_from_iter' with the only
difference in representation of incoming dataset. It should be
something that allows calling 'into_iter' on it. That's why we
can extend existing 'try_from_iter' test by just replacing
crafted 'new' constructor function.
As far as I understand, a 'from_byte_slice' constructor is an attempt
to simplify instantiation of tree from raw bytes without requiring input
dataset to be iterable Since Algorithm abstraction enforces particular
way of hashing (as specified in RFC6962: https://datatracker.ietf.org/doc/html/rfc6962)
to prevent some cryptographic attacks on Merkle Tree, random slice (even of correct
length) can't be used as a source for input leaves. So we can use test suite for
'from_data' constructor, but keep in mind that preparation of the dataset
is completely different (specified in 'instantiate_from_byte_slice' function).
This commit introduces testing of 'from_par_iter' constructor.
It is very similar to its single-threaded friend 'from_iter' in a
sense that it doesn't require any changes in expected values.
@storojs72 storojs72 added the testing Improvements in unit/integration tests label Mar 28, 2022
@storojs72 storojs72 self-assigned this Mar 28, 2022
Copy link

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor things.

src/merkle.rs Outdated
let sub_tree_arity = SubTreeArity::to_usize();
let base_tree_arity = BaseTreeArity::to_usize();

let base_tree_len = match get_merkle_tree_len(leaves, base_tree_arity) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be written as let base_tree_len = get_merkle_tree_len(leaves, base_tree_arity)?;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! Thanks!

src/merkle.rs Outdated
@@ -1810,6 +1810,37 @@ impl Element for [u8; 32] {
}
}

pub fn get_merkle_tree_len_generic<
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some documentation would be good as it is quite interesting in which cases which values are returned.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added missing explanation

src/merkle.rs Outdated
Comment on lines 2077 to 2078
assert!(get_merkle_tree_len_generic::<U4, U0, U0>(16).is_ok());
assert!(get_merkle_tree_len_generic::<U1, U0, U0>(3).is_ok());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use assert_eq!() and test for the actual return value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Added mentioned tests. They will never be redundant

Comment on lines 3 to 10
use crate::hash::{Algorithm, Hashable};
use crate::merkle::{
get_merkle_tree_len_generic, Element, FromIndexedParallelIterator, MerkleTree,
};
use crate::store::{Store, VecStore};
use crate::test_common::{Item, Sha256Hasher, XOR128};
use rayon::iter::{IntoParallelIterator, ParallelIterator};
use typenum::{Unsigned, U0, U2, U3, U5};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not the case in this crate, but in most places we separate the imports (similar to what Python does, in case you're familiar with that). It's three blocks, separated by an empty line. First the std imports, than external dependencies, then crate dependencies.

In this case it would be

Suggested change
use crate::hash::{Algorithm, Hashable};
use crate::merkle::{
get_merkle_tree_len_generic, Element, FromIndexedParallelIterator, MerkleTree,
};
use crate::store::{Store, VecStore};
use crate::test_common::{Item, Sha256Hasher, XOR128};
use rayon::iter::{IntoParallelIterator, ParallelIterator};
use typenum::{Unsigned, U0, U2, U3, U5};
use rayon::iter::{IntoParallelIterator, ParallelIterator};
use typenum::{Unsigned, U0, U2, U3, U5};
use crate::hash::{Algorithm, Hashable};
use crate::merkle::{
get_merkle_tree_len_generic, Element, FromIndexedParallelIterator, MerkleTree,
};
use crate::store::{Store, VecStore};
use crate::test_common::{Item, Sha256Hasher, XOR128};

I mention it, as I think it makes sense to do it this way for new files. Though if you prefer consistency across this crate, feel free to keep it like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Introduced this change. Thanks for pointing. Will try to keep in mind

Comment on lines 14 to 21
let mut dataset = Vec::<E>::new();
for index in 0..leaves {
// we are ok with usize -> u8 conversion problems, since we need just predictable dataset
let vector: Vec<u8> = (0..E::byte_len()).map(|x| (index + x) as u8).collect();
let element = E::from_slice(vector.as_slice());
dataset.push(element);
}
dataset
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also write the outer loop with an functional-style iterator. This way you don't need mutability and it's easy to spot that you operate on all elements. I would write this function as:

    (0..leaves).map(|index| {
        // we are ok with usize -> u8 conversion problems, since we need just predictable dataset
        let vector: Vec<u8> = (0..E::byte_len()).map(|x| (index + x) as u8).collect();
        E::from_slice(vector.as_slice())
    })

Copy link
Author

@storojs72 storojs72 Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this refactoring. Thanks. That's a "negative" habit from procedural programming

dataset
}
fn generate_vector_of_usizes(leaves: usize) -> Vec<usize> {
let mut dataset = Vec::with_capacity(leaves);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I'd use map().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. Thanks

This commit integrates test of 'from_tree_slice' constructor of
MerkleTree into existing test suite. It wasn't tested previously
so, as it seems to me, there was a bug while getting value of root
from raw tree's data, when we tried to extract last element from
data using wrong indexing. Now this is fixed and tests are added.
In subsequent commits I'm going to add testing of 'from_slices'
constructor (for instantiating compound trees ) which internally uses
'from_tree_slice'.
This commits integrates 'from_data_store' constructor testing into
existing test-suite. Function that instantiates tree from data store
works as following: it creates new tree using already defined 'new'
instantiator, then serializes its data and recreates new storage over
existing one (note that we can't generically clone storage); after that
new tree is instantiated using newly created storage. Since we use same
dataset as for 'test_from_slice', we can easily combine these two constructors
into single group in order to reduce number of lines for expressing new test.
src/merkle.rs Outdated
@@ -447,7 +447,7 @@ impl<
);

let store = S::new_from_slice(tree_len, &data).context("failed to create data store")?;
let root = store.read_at(data.len() - 1)?;
let root = store.read_at(store.len() - 1)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure these are compatible? I thought one was in element counts and one was in data/byte counts(?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, for pointing on this. This is a place, where I'm not sure completely, TBH. But two following thoughts caused me to introduce this change:

  • I have double-checked that from_tree_slice constructor doesn't have tests neither in this repository, nor in rust_fil_proofs - so it is impossible to check compatibility actually;
  • data from this constructor is actually a byte slice that, as far as I understand this, represents serialised storage in some custom format. But store is an instance of storage, created after deserialising data via S::new_from_slice, and if we want to get last element from store (which is root of tree), we need to use indexing of store - not data.

I added tests for this constructor, where base binary tree with vector storage is instantiated with 4 leaves (each leaf is a 16-bytes array) and basic functionality of the tree is evaluated. This tree has 7 elements in total (4 leaves + 2 nodes + 1 root), which gives 7 * 16 = 112 bytes slice for serialised tree (our data). With previous version of code I was getting panic:

thread 'test_merkle_constructors::test_from_tree_slice_group' panicked at 'index out of bounds: the len is 7 but the index is 111', src/store/vec.rs:86:12
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test test_merkle_constructors::test_from_tree_slice_group ... FAILED

specifically at this place, when root was being read. And it is reasonable, since we use data which is 112 bytes slice, instead of store (of VecStore type) which is actually vector of elements (Element type, which in my case is Item - 16-bytes array), so according to the logic in mind, we need to take last Item from our store, which has index: [store.len() - 1].

@storojs72
Copy link
Author

@vmx, thanks for initial review. I will introduce your remarks, once finishing with other constructors

Previously compound trees were tested (in context of new test suite)
redundantly. E.g. for 'from_trees' constructor we performed tests varying
base tree constructors which is not needed actually to be ensure that
'from_trees' works as expected. It is sufficient to use some single
base tree constructor and run tests one per each dataset group, which
is actually introduced in this commit.
Copy link
Collaborator

@cryptonemo cryptonemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the map usage suggestions, but looks good overall

@storojs72
Copy link
Author

@cryptonemo , take a look please at 46fb497. Here I introduced changes into from_sub_tree_as_trees construction of compound-compound tree. Previously it didn't have tests, so I couldn't definitely understand its intention. My suspicion regarding this is that given an ordered set of base trees, this constructor groups base trees according to provided sub-tree arity, then instantiates at first set of compound trees and finally combines them into compound-compound tree. I don't know for sure whether split_off was (and is now) used properly, so now I changed logic a bit, according to my understanding and added proper test for it. Let me know please, if I missed something

@storojs72 storojs72 requested review from vmx and cryptonemo April 5, 2022 16:31
@storojs72
Copy link
Author

@vmx , @cryptonemo , I've finally come up with all constructors covered. And I added proposed changes. Take please a look once again. In a meanwhile, I will try to experiment with tests in rust-fil-proofs, in order to evaluate split_off-based changes in from_sub_tree_as_trees constructor, which is questionable. Thanks!

Copy link

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed all constructors closely. If I should, please let me know.

I saw a few for loops that could be map()s, but in some cases I can also imagine that the type annotations might become crazy. So I'm OK with just keeping the current state.

@storojs72
Copy link
Author

storojs72 commented Apr 6, 2022

I can also imagine that the type annotations might become crazy

That's true, you are right, @vmx ! I also tried to replace for loops with functional constructions where possible

@storojs72 storojs72 changed the title [WIP] Introduce testing of MerkleTree constructors Introduce testing of MerkleTree constructors Apr 11, 2022
@storojs72
Copy link
Author

storojs72 commented Apr 11, 2022

Last commit is a result of investigation - why some tests of rust-fil-proofs (which is main consumer of merkletree dependency) was failing. The reason was in from_sub_trees_as_trees constructor, which is used here for building 8_8_2 compound-compound trees while running most of the tests with actual sectors. While introducing the change in 46fb497 I didn't consider that order of trees is critical, so after analysing some of rust-fil-proofs test execution (where mentioned constructor is involved), I've come up with workaround. I've also checked that tests successfully pass on CI of rust-fil-proofs if we pick this branch of merkletree dependency.

As all constructors are finally covered, and we investigated the reason of high-level fails - I think this PR is ready for merging, so I remove 'WIP' label in its name

@storojs72
Copy link
Author

storojs72 commented Apr 11, 2022

As we agreed with @cryptonemo , it is worth to perform testing of this branch with 32/64 GiG sectors used in production. So let's no rush with merging it

let mut grouped_trees = Vec::with_capacity(sub_tree_count);
for _ in (0..sub_tree_count).step_by(trees.len() / sub_tree_count) {
grouped_trees.push(trees.split_off(trees.len() / sub_tree_count));
let mut grouped_trees = Vec::with_capacity(top_tree_arity);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test (or otherwise point it out if it exists) that shows what cases the previous code here breaks in?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created reproducible example that demonstrates mentioned caveat of from_sub_trees_as_trees constructor in this branch. If you pull it and run:

cargo test reproducible_example --release -- --nocapture

you should see that for from_sub_trees_as_trees successfully instantiates 2-4-2 compound-compound tree, but fails on 2-5-3. It is also visible on our CI.

storojs72 added a commit that referenced this pull request Apr 23, 2022
This commit drops DiskStore-based trees from test_xor128.rs as they are
presented in new test suite (in tests/test_base_constructors.rs) and also
moves some missing tests from #98
@storojs72 storojs72 mentioned this pull request Apr 23, 2022
@storojs72 storojs72 changed the title Introduce testing of MerkleTree constructors Change in logic of from_sub_trees_as_trees constructor Apr 23, 2022
@storojs72
Copy link
Author

storojs72 commented Apr 23, 2022

@cryptonemo, I narrow down the scope of this PR to discussion of from_sub_trees_as_trees constructor, by introducing #100. It allows to unblock addition of integration tests.

This commit drops introduced constructors' in order to simplify merging
#100
@storojs72
Copy link
Author

storojs72 commented Apr 23, 2022

I upload full log of successful executing test_seal_lifecycle_32gib_porep_id_v1_1_top_8_8_0_api_v1_1 test from rust-fil-proofs: 32gib.txt. I'm still working on executing similar 64gib test in order to check compatibility of from_sub_trees_as_trees with sectors used in production

@storojs72
Copy link
Author

Uploading full log of successful executing test_seal_lifecycle_64gib_porep_id_v1_1_top_8_8_2_api_v1_1 test:
64gib.txt

Copy link

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code is way easier to follow. I think it could even be more clear using map instead of for loops.

@vmx
Copy link

vmx commented Jun 17, 2022

I forgot to add that I think it would be good to squash it into a single commit with a good commit message explaining that the old code was wrong etc.

@storojs72
Copy link
Author

I think it could even be more clear using map instead of for loops.

Yes. Will keep in mind in further PRs.

@storojs72 storojs72 merged commit 6f3cbd5 into master Jun 21, 2022
@storojs72 storojs72 deleted the constructors_testing branch June 21, 2022 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Improvements in unit/integration tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants