Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to verify the digest of the whole file when digests of file parts are known #18

Open
haimgel opened this issue May 17, 2023 · 9 comments

Comments

@haimgel
Copy link

haimgel commented May 17, 2023

From my reading of Blake3 paper, and my understanding of Merkle trees in general, I think this should be doable in theory: if I have Blake3 digests of parts of a file, I should be able to calculate the digest of the whole (by building the tree of digests and then getting the root). But I don't think this module's API exposes enough to make it work, or am I missing something?

@pcfreak30
Copy link

I think this is connected to how abao can verify slices of data. https://github.com/n0-computer/abao

But #17 (comment)

@lukechampine
Copy link
Owner

if I have Blake3 digests of parts of a file, I should be able to calculate the digest of the whole

kinda. A BLAKE3 digest is the root of a Merkle tree, but the root hash and the interior node hashes use different flags. (This was an intentional design decision.) So if you have two files A and B, there's no way to get BLAKE3(A || B) from BLAKE3(A) and BLAKE3(B).

As pcfreak said, you probably want to use Bao for this. The Bao encoding contains the interior node hashes, which lets you reconstruct the Merkle root.

@haimgel
Copy link
Author

haimgel commented May 17, 2023

@lukechampine, thank you for your response!

Unfortunately, I cannot use Bao since Bao requires scanning the whole file in sequence, and I need to avoid it (the parts could be processed out of sequence, on different machines, etc.) I can, however, get some other aspect of an internal state of Blake3 for each part separately. I wonder if the complete blake3 digest can be constructed from just the chaining values of each part?

@lukechampine
Copy link
Owner

lukechampine commented May 18, 2023

Ah. Well, as long as each part is aligned to 1024-byte boundaries of the original file, it should be possible to compute the chaining value(s) of each part and combine them into the total root. This package doesn't provide a direct way to do that, but you could achieve this by calling BaoEncode on each part and manually extracting the relevant chaining values. (This will be somewhat wasteful though, as the Bao file will contain many more chaining values than the ones you actually need.)

If I were you, I think I would fork this package and add a Stack method that returns the internal stack field. Then write a function that takes a set of stacks and computes their combined root. That way, you can calculate the chaining values for each part independently, and merge their stacks together at the end.
To make your life a lot easier, ensure that the number of chunks in each part is a power of 2. That way, each part will only have a single chaining value, and you can merge them using the algorithm in mergeSubtreesGeneric.

@haimgel
Copy link
Author

haimgel commented May 18, 2023

@lukechampine, thank you, this is very helpful! I'll try these suggestions.

@haimgel
Copy link
Author

haimgel commented May 24, 2023

@lukechampine, I managed to get it to work, I also had to make the "counter" work right for parts other than the first one.

Would you be interested in a PR that implements a wrapper around Hasher that minimally exposes some internal state and functionality of the Hasher, similar to what the Rust crate has in their "guts" module? They do it for a very similar purpose.

@lukechampine
Copy link
Owner

no promises, but if you open a PR I'll definitely take a look!

@lukechampine
Copy link
Owner

@haimgel fyi, I've added a guts package now :)

@haimgel
Copy link
Author

haimgel commented Jun 8, 2024

Thank you so much for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants