New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change in logic of from_sub_trees_as_trees
constructor
#98
Conversation
This commit separates testing of MerkleTree's functions-constructors into separate test suite. This seems reasonable, since different constructors may use different datasets actually which in fact has impact on tree's root computation, expressing one more dimension that is a subject of additional testing. Analysis of unit-tests majority from test_xor128.rs source file one can see that a lot of tests have much in common in their internal logic, though they use different constructors for tree's instantiation and also different tree parameters (arity and storage type). In such circumstances we can use passing function as parameter to particular test. Here I tried to follow this approach and separate test logic into simple (yet generic over tree structure) function which takes instantiated tree (it doesn't matter how it was instantiated actually) and checks if tree's functionality (computing length, number of leaves, root and generating / verifying inclusion proofs) works as expected. Ultimately, I think that such approach will allow us having less number of unordered tests and consequently avoid duplication of test logic.
Constructor 'new' is basically a 'try_from_iter' with the only difference in representation of incoming dataset. It should be something that allows calling 'into_iter' on it. That's why we can extend existing 'try_from_iter' test by just replacing crafted 'new' constructor function.
As far as I understand, a 'from_byte_slice' constructor is an attempt to simplify instantiation of tree from raw bytes without requiring input dataset to be iterable Since Algorithm abstraction enforces particular way of hashing (as specified in RFC6962: https://datatracker.ietf.org/doc/html/rfc6962) to prevent some cryptographic attacks on Merkle Tree, random slice (even of correct length) can't be used as a source for input leaves. So we can use test suite for 'from_data' constructor, but keep in mind that preparation of the dataset is completely different (specified in 'instantiate_from_byte_slice' function).
This commit introduces testing of 'from_par_iter' constructor. It is very similar to its single-threaded friend 'from_iter' in a sense that it doesn't require any changes in expected values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only minor things.
src/merkle.rs
Outdated
let sub_tree_arity = SubTreeArity::to_usize(); | ||
let base_tree_arity = BaseTreeArity::to_usize(); | ||
|
||
let base_tree_len = match get_merkle_tree_len(leaves, base_tree_arity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be written as let base_tree_len = get_merkle_tree_len(leaves, base_tree_arity)?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! Thanks!
src/merkle.rs
Outdated
@@ -1810,6 +1810,37 @@ impl Element for [u8; 32] { | |||
} | |||
} | |||
|
|||
pub fn get_merkle_tree_len_generic< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some documentation would be good as it is quite interesting in which cases which values are returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added missing explanation
src/merkle.rs
Outdated
assert!(get_merkle_tree_len_generic::<U4, U0, U0>(16).is_ok()); | ||
assert!(get_merkle_tree_len_generic::<U1, U0, U0>(3).is_ok()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also use assert_eq!()
and test for the actual return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Added mentioned tests. They will never be redundant
src/test_merkle_constructors.rs
Outdated
use crate::hash::{Algorithm, Hashable}; | ||
use crate::merkle::{ | ||
get_merkle_tree_len_generic, Element, FromIndexedParallelIterator, MerkleTree, | ||
}; | ||
use crate::store::{Store, VecStore}; | ||
use crate::test_common::{Item, Sha256Hasher, XOR128}; | ||
use rayon::iter::{IntoParallelIterator, ParallelIterator}; | ||
use typenum::{Unsigned, U0, U2, U3, U5}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's not the case in this crate, but in most places we separate the imports (similar to what Python does, in case you're familiar with that). It's three blocks, separated by an empty line. First the std
imports, than external dependencies, then crate dependencies.
In this case it would be
use crate::hash::{Algorithm, Hashable}; | |
use crate::merkle::{ | |
get_merkle_tree_len_generic, Element, FromIndexedParallelIterator, MerkleTree, | |
}; | |
use crate::store::{Store, VecStore}; | |
use crate::test_common::{Item, Sha256Hasher, XOR128}; | |
use rayon::iter::{IntoParallelIterator, ParallelIterator}; | |
use typenum::{Unsigned, U0, U2, U3, U5}; | |
use rayon::iter::{IntoParallelIterator, ParallelIterator}; | |
use typenum::{Unsigned, U0, U2, U3, U5}; | |
use crate::hash::{Algorithm, Hashable}; | |
use crate::merkle::{ | |
get_merkle_tree_len_generic, Element, FromIndexedParallelIterator, MerkleTree, | |
}; | |
use crate::store::{Store, VecStore}; | |
use crate::test_common::{Item, Sha256Hasher, XOR128}; |
I mention it, as I think it makes sense to do it this way for new files. Though if you prefer consistency across this crate, feel free to keep it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Introduced this change. Thanks for pointing. Will try to keep in mind
src/test_merkle_constructors.rs
Outdated
let mut dataset = Vec::<E>::new(); | ||
for index in 0..leaves { | ||
// we are ok with usize -> u8 conversion problems, since we need just predictable dataset | ||
let vector: Vec<u8> = (0..E::byte_len()).map(|x| (index + x) as u8).collect(); | ||
let element = E::from_slice(vector.as_slice()); | ||
dataset.push(element); | ||
} | ||
dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also write the outer loop with an functional-style iterator. This way you don't need mutability and it's easy to spot that you operate on all elements. I would write this function as:
(0..leaves).map(|index| {
// we are ok with usize -> u8 conversion problems, since we need just predictable dataset
let vector: Vec<u8> = (0..E::byte_len()).map(|x| (index + x) as u8).collect();
E::from_slice(vector.as_slice())
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this refactoring. Thanks. That's a "negative" habit from procedural programming
src/test_merkle_constructors.rs
Outdated
dataset | ||
} | ||
fn generate_vector_of_usizes(leaves: usize) -> Vec<usize> { | ||
let mut dataset = Vec::with_capacity(leaves); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I'd use map()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto. Thanks
This commit integrates test of 'from_tree_slice' constructor of MerkleTree into existing test suite. It wasn't tested previously so, as it seems to me, there was a bug while getting value of root from raw tree's data, when we tried to extract last element from data using wrong indexing. Now this is fixed and tests are added. In subsequent commits I'm going to add testing of 'from_slices' constructor (for instantiating compound trees ) which internally uses 'from_tree_slice'.
This commits integrates 'from_data_store' constructor testing into existing test-suite. Function that instantiates tree from data store works as following: it creates new tree using already defined 'new' instantiator, then serializes its data and recreates new storage over existing one (note that we can't generically clone storage); after that new tree is instantiated using newly created storage. Since we use same dataset as for 'test_from_slice', we can easily combine these two constructors into single group in order to reduce number of lines for expressing new test.
src/merkle.rs
Outdated
@@ -447,7 +447,7 @@ impl< | |||
); | |||
|
|||
let store = S::new_from_slice(tree_len, &data).context("failed to create data store")?; | |||
let root = store.read_at(data.len() - 1)?; | |||
let root = store.read_at(store.len() - 1)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these are compatible? I thought one was in element counts and one was in data/byte counts(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, for pointing on this. This is a place, where I'm not sure completely, TBH. But two following thoughts caused me to introduce this change:
- I have double-checked that
from_tree_slice
constructor doesn't have tests neither in this repository, nor inrust_fil_proofs
- so it is impossible to check compatibility actually; data
from this constructor is actually a byte slice that, as far as I understand this, represents serialised storage in some custom format. Butstore
is an instance of storage, created after deserialisingdata
viaS::new_from_slice
, and if we want to get last element from store (which is root of tree), we need to use indexing of store - not data.
I added tests for this constructor, where base binary tree with vector storage is instantiated with 4 leaves (each leaf is a 16-bytes array) and basic functionality of the tree is evaluated. This tree has 7 elements in total (4 leaves + 2 nodes + 1 root), which gives 7 * 16 = 112 bytes slice for serialised tree (our data
). With previous version of code I was getting panic:
thread 'test_merkle_constructors::test_from_tree_slice_group' panicked at 'index out of bounds: the len is 7 but the index is 111', src/store/vec.rs:86:12
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test test_merkle_constructors::test_from_tree_slice_group ... FAILED
specifically at this place, when root was being read. And it is reasonable, since we use data
which is 112 bytes slice, instead of store
(of VecStore
type) which is actually vector of elements (Element
type, which in my case is Item
- 16-bytes array), so according to the logic in mind, we need to take last Item
from our store
, which has index: [store.len() - 1]
.
d89b2f3
to
4b296d0
Compare
@vmx, thanks for initial review. I will introduce your remarks, once finishing with other constructors |
Previously compound trees were tested (in context of new test suite) redundantly. E.g. for 'from_trees' constructor we performed tests varying base tree constructors which is not needed actually to be ensure that 'from_trees' works as expected. It is sufficient to use some single base tree constructor and run tests one per each dataset group, which is actually introduced in this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the map usage suggestions, but looks good overall
@cryptonemo , take a look please at 46fb497. Here I introduced changes into |
@vmx , @cryptonemo , I've finally come up with all constructors covered. And I added proposed changes. Take please a look once again. In a meanwhile, I will try to experiment with tests in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed all constructors closely. If I should, please let me know.
I saw a few for
loops that could be map()
s, but in some cases I can also imagine that the type annotations might become crazy. So I'm OK with just keeping the current state.
That's true, you are right, @vmx ! I also tried to replace for loops with functional constructions where possible |
88a316a
to
2be824d
Compare
Last commit is a result of investigation - why some tests of As all constructors are finally covered, and we investigated the reason of high-level fails - I think this PR is ready for merging, so I remove 'WIP' label in its name |
As we agreed with @cryptonemo , it is worth to perform testing of this branch with 32/64 GiG sectors used in production. So let's no rush with merging it |
…constructors_testing
let mut grouped_trees = Vec::with_capacity(sub_tree_count); | ||
for _ in (0..sub_tree_count).step_by(trees.len() / sub_tree_count) { | ||
grouped_trees.push(trees.split_off(trees.len() / sub_tree_count)); | ||
let mut grouped_trees = Vec::with_capacity(top_tree_arity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test (or otherwise point it out if it exists) that shows what cases the previous code here breaks in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created reproducible example that demonstrates mentioned caveat of from_sub_trees_as_trees
constructor in this branch. If you pull it and run:
cargo test reproducible_example --release -- --nocapture
you should see that for from_sub_trees_as_trees
successfully instantiates 2-4-2 compound-compound tree, but fails on 2-5-3. It is also visible on our CI.
This commit drops DiskStore-based trees from test_xor128.rs as they are presented in new test suite (in tests/test_base_constructors.rs) and also moves some missing tests from #98
from_sub_trees_as_trees
constructor
@cryptonemo, I narrow down the scope of this PR to discussion of |
This commit drops introduced constructors' in order to simplify merging #100
I upload full log of successful executing test_seal_lifecycle_32gib_porep_id_v1_1_top_8_8_0_api_v1_1 test from |
Uploading full log of successful executing test_seal_lifecycle_64gib_porep_id_v1_1_top_8_8_2_api_v1_1 test: |
…constructors_testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code is way easier to follow. I think it could even be more clear using map
instead of for loops.
I forgot to add that I think it would be good to squash it into a single commit with a good commit message explaining that the old code was wrong etc. |
Yes. Will keep in mind in further PRs. |
This PR is rebased on top of #100 (that introduced framework for integration testing). It refactors
from_sub_trees_as_trees
constructor of MerkleTree (top-tree) instance in a more generic way, e.g. values ofSubTreeArity
andTopTreeArity
can be either even or odd (previously only even numbers were supported and, with odd numbers, mentioned constructor panicked). This change has been evaluated using high-level seal-lifecycle tests for production sectors (32gib and 64gib). Logs of successful executing mentioned tests on proofs-buildbot are included.