MCC-697 Digestible rework to use merlin #389
Conversation
```diff
@@ -30,7 +29,7 @@ lazy_static! {
 pub fn hash_leaf(tx_out: &TxOut) -> [u8; 32] {
     let mut hasher = Blake2b256::new();
     hasher.update(&TXOUT_MERKLE_LEAF_DOMAIN_TAG);
-    tx_out.digest(&mut hasher);
+    hasher.update(&tx_out.hash());
```
I didn't want to mess around with this for now. We could try to make this use merlin instead of blake2b, but I don't really see the point.

I think it's fine either way.
Won't this break the use of ed25519 as the leaf key in the fog authority signing chain?
```diff
-    pair.verify_digest(hasher, &sig)
+    let mut hasher = PseudoMerlin(Sha512::default());
+    data.append_to_transcript(b"test", &mut hasher);
+    pair.verify_digest(hasher.inner, &sig)
```
@jcape here's how you do an ed25519ph signature, with no merlin, after this revision.
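To illustrate the `PseudoMerlin` idea above, here's a minimal std-only sketch (not the crate's actual implementation, and the real merlin `Transcript` API differs): a plain hasher is wrapped behind a transcript-style `append_message` method that frames every message with its label and length, so two distinct message sequences can't collide just because their concatenated bytes agree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical std-only sketch of a PseudoMerlin-style adapter:
// present a transcript API over any plain Hasher, length-framing
// each (label, data) message before feeding it to the hash.
struct PseudoMerlin<H: Hasher> {
    inner: H,
}

impl<H: Hasher> PseudoMerlin<H> {
    fn new(inner: H) -> Self {
        PseudoMerlin { inner }
    }

    fn append_message(&mut self, label: &[u8], data: &[u8]) {
        self.inner.write(&(label.len() as u64).to_le_bytes());
        self.inner.write(label);
        self.inner.write(&(data.len() as u64).to_le_bytes());
        self.inner.write(data);
    }
}

fn main() {
    // "ab" + "c" and "a" + "bc" concatenate to the same bytes,
    // but the length framing keeps the two transcripts distinct.
    let mut t1 = PseudoMerlin::new(DefaultHasher::new());
    t1.append_message(b"test", b"ab");
    t1.append_message(b"test", b"c");

    let mut t2 = PseudoMerlin::new(DefaultHasher::new());
    t2.append_message(b"test", b"a");
    t2.append_message(b"test", b"bc");

    assert_ne!(t1.inner.finish(), t2.inner.finish());
    println!("transcripts differ as expected");
}
```

In the diff above the same shape appears with `Sha512` as the inner hasher, which is what makes the ed25519ph (prehashed) flow possible without merlin.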
crypto/digestible/README.md (Outdated)

> Note that fields may not be re-ordered or renamed.
>
> However, in the future, we may support a proc-macro attribute in digestible-derive that allows
Cool, that would be useful.
Tangentially, the Jackson JSON processor in Java might be a good source of inspiration. That supports a variety of ways to annotate a class in order to customize how it gets serialized, e.g. https://www.baeldung.com/jackson-annotations
This is now rebased on master. Thanks Matt for all the detailed comments! I will work to address them all after 5pm.
Thank you, Chris!
The macro processing seems reasonable, and it looks like my main concerns from the last PR are addressed (e.g. it's not possible to have `vec![u16; 2]` and `vec![u32; 1]` with identical bytes map to identical representations), but there are a couple nitpicks (e.g. I think we can use const generics now, since `LengthAtMost32` was removed from rust).
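The `vec![u16; 2]` vs `vec![u32; 1]` concern can be demonstrated with a toy std-only digest (a hypothetical sketch, not the crate's actual encoding): frame each sequence with a type tag and an element count before its raw bytes, and two sequences whose raw little-endian bytes coincide still hash differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy digest: prepend a type tag and element count so sequences with
// identical raw bytes but different element types cannot collide.
fn digest_u16_seq(v: &[u16]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(b"seq-u16"); // type tag
    h.write(&(v.len() as u64).to_le_bytes()); // element count
    for x in v {
        h.write(&x.to_le_bytes());
    }
    h.finish()
}

fn digest_u32_seq(v: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(b"seq-u32");
    h.write(&(v.len() as u64).to_le_bytes());
    for x in v {
        h.write(&x.to_le_bytes());
    }
    h.finish()
}

fn main() {
    // [1u16, 0u16] and [1u32] are both the raw bytes [1, 0, 0, 0],
    // but the framing keeps their digests distinct.
    assert_ne!(digest_u16_seq(&[1, 0]), digest_u32_seq(&[1]));
    println!("identical bytes, distinct digests");
}
```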
```rust
/// The data is the canonical bytes representing the primitive.
/// If the primitive does not have a canonical representation as bytes then
/// it isn't appropriate to treat it as a primitive in this hashing scheme.
#[inline]
```
These should either be `#[inline(always)]`, or left to LTO.
I don't think this is right:

(1) My experience has been that benchmarks got significantly better in ORAM when I used `#[inline]`, instead of doing what you are suggesting, for functions in `aligned`, `aligned-cmov` and other crates.

(2) I think the convention in many essential rust libraries is to use `#[inline]` when we want to permit the optimizer to inline this across crate boundaries, which is usually the case for tiny, low-level functions like this. For example, here's a related discussion in the `hashbrown` crate which is very informative: rust-lang/hashbrown#119, which demonstrated that using `#[inline]` can lead to better performance.

I think it's better to permit the compiler to inline things in cases like this, if llvm decides it's a good idea. Thin-LTO is a late-stage optimization pass, with less information and less time to make optimization decisions, and it shouldn't be relied on or used as a replacement for things that would normally be done at -O2 or -O3 optimization levels.

If we omit `#[inline]` then the optimizer is not permitted to inline this function across crate boundaries, even at `-O2` and `-O3`.

The main reason to remove inline annotations is to sacrifice runtime performance for better compile times. We shouldn't do that unless the compile times for this are actually slow. I see no evidence that that is the case.
I guess we could have an `inline-more` feature like hashbrown does: https://github.com/rust-lang/hashbrown/blob/master/Cargo.toml#L53

It just seems kind of silly.
Were you building with lto?
I'm happy with this pending CI
…gested Thanks again to Matt for thoughtful feedback on tests
Motivation
Currently, hashes of transactions and the blockchain are derived using the `mc-crypto-digestible` crate. This is a trait, plus a proc-macro crate, which allows bringing an arbitrary cryptographic hash (a "digest" implementor) to a digestible object and hashing it. This essentially combines serialization and hashing into one step. The goals of this were (1) to provide stability guarantees that aren't normally present in serialization, and (2) to provide security guarantees that the hash is not malleable.

When this crate was merged we did not all agree that this was the best way to do this -- there was an argument that we should use merlin instead of basing it on a hash function, because merlin is a framework that offers a higher-level API and is well integrated with the rest of the dalek ecosystem, particularly the schnorrkel digital signature crate.
In this PR we pivot to using merlin. One benefit of this switch is that I think it is possible to prove a slightly stronger security property after this revision than before -- we can say that no two distinct ASTs will have the same hash, not just no two instances of the same structure.
But the most interesting thing in this PR is likely that we are building in a form of support for schema evolution that was not present in the initial code.

(3) Schema evolution here means that, just as in protobuf, we can add new optional fields to structures without breaking compatibility with the old structures. In this case, compatibility means ledger-compatibility: we can add optional fields to things without changing the hashes of old blocks.
With this PR we will be able to add new optional members to, e.g., the TxOut type or BlockContents type without having to figure out how to migrate the ledger, because the hashes of the old blocks won't change as a result of the change. So, in some cases we can add features without having to figure out how to migrate the ledger, just as we normally would when using protobuf.
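The schema-evolution property can be sketched with a toy std-only digest (a hypothetical illustration, not the crate's actual encoding; `TxOutV1`/`TxOutV2` and `append_field` are made-up names): each present field is appended as a framed (name, value) pair, and an absent optional field contributes nothing, so adding `memo: None` to a struct leaves its old hash unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical field-wise digest: frame each present field as
// (name, value); skip absent optional fields entirely.
fn append_field(h: &mut DefaultHasher, name: &[u8], value: &[u8]) {
    h.write(&(name.len() as u64).to_le_bytes());
    h.write(name);
    h.write(&(value.len() as u64).to_le_bytes());
    h.write(value);
}

struct TxOutV1 { amount: u64 }
struct TxOutV2 { amount: u64, memo: Option<Vec<u8>> }

fn digest_v1(t: &TxOutV1) -> u64 {
    let mut h = DefaultHasher::new();
    append_field(&mut h, b"amount", &t.amount.to_le_bytes());
    h.finish()
}

fn digest_v2(t: &TxOutV2) -> u64 {
    let mut h = DefaultHasher::new();
    append_field(&mut h, b"amount", &t.amount.to_le_bytes());
    // A None field adds nothing, preserving the old hash.
    if let Some(m) = &t.memo {
        append_field(&mut h, b"memo", m);
    }
    h.finish()
}

fn main() {
    let old = TxOutV1 { amount: 10 };
    let new_none = TxOutV2 { amount: 10, memo: None };
    let new_some = TxOutV2 { amount: 10, memo: Some(b"hi".to_vec()) };
    // Old structs and new structs with the field unset hash identically;
    // setting the new field changes the hash.
    assert_eq!(digest_v1(&old), digest_v2(&new_none));
    assert_ne!(digest_v1(&old), digest_v2(&new_some));
}
```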
There will be minor ancillary benefits: fewer low-level things will depend on the rust `digest` crate, which has seen recent breaking changes (although we seem to be past the breaking changes now, courtesy of Eran :) ).

In this PR
In the tests for this PR, the `inspect_ast` test functionality can be used to show exactly what structure digestible produced for the user-provided type. The AST can be mapped to json and pretty-printed.

The AST is not actually computed in production when we need to digest something -- it is only computed explicitly in debugging tools. However, the AST completely captures the resulting hash -- if we capture the AST for a structure and then compute the hash from the AST, we always get the same hash. (There is quite a lot of test coverage backing up that claim.)
The AST can be displayed as json: the json faithfully represents the entire AST. So, looking at the json can give the reviewer a good sense of what is being captured by the hash.
Here's the AST for a random TxOut:
Here's the AST for a random origin block: