MCC-697 Digestible rework to use merlin #389
Conversation
```diff
@@ -30,7 +29,7 @@ lazy_static! {
 pub fn hash_leaf(tx_out: &TxOut) -> [u8; 32] {
     let mut hasher = Blake2b256::new();
     hasher.update(&TXOUT_MERKLE_LEAF_DOMAIN_TAG);
-    tx_out.digest(&mut hasher);
+    hasher.update(&tx_out.hash());
```
I didn't want to mess around with this for now. We could try to make this use merlin instead of blake2b, but I don't really see the point.

I think it's fine either way.
Won't this break the use of ed25519 as the leaf key in the fog authority signing chain?
```diff
-    pair.verify_digest(hasher, &sig)
+    let mut hasher = PseudoMerlin(Sha512::default());
+    data.append_to_transcript(b"test", &mut hasher);
+    pair.verify_digest(hasher.inner, &sig)
```
@jcape here's how you do an ed25519ph signature, with no merlin, after this revision.
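To illustrate the `PseudoMerlin` idea above, here's a minimal std-only sketch (not the crate's actual implementation, and the real merlin `Transcript` API differs): a plain hasher is wrapped behind a transcript-style `append_message` method that frames every message with its label and length, so two distinct message sequences can't collide just because their concatenated bytes agree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical std-only sketch of a PseudoMerlin-style adapter:
// present a transcript API over any plain Hasher, length-framing
// each (label, data) message before feeding it to the hash.
struct PseudoMerlin<H: Hasher> {
    inner: H,
}

impl<H: Hasher> PseudoMerlin<H> {
    fn new(inner: H) -> Self {
        PseudoMerlin { inner }
    }

    fn append_message(&mut self, label: &[u8], data: &[u8]) {
        self.inner.write(&(label.len() as u64).to_le_bytes());
        self.inner.write(label);
        self.inner.write(&(data.len() as u64).to_le_bytes());
        self.inner.write(data);
    }
}

fn main() {
    // "ab" + "c" and "a" + "bc" concatenate to the same bytes,
    // but the length framing keeps the two transcripts distinct.
    let mut t1 = PseudoMerlin::new(DefaultHasher::new());
    t1.append_message(b"test", b"ab");
    t1.append_message(b"test", b"c");

    let mut t2 = PseudoMerlin::new(DefaultHasher::new());
    t2.append_message(b"test", b"a");
    t2.append_message(b"test", b"bc");

    assert_ne!(t1.inner.finish(), t2.inner.finish());
    println!("transcripts differ as expected");
}
```

In the diff above the same shape appears with `Sha512` as the inner hasher, which is what makes the ed25519ph (prehashed) flow possible without merlin.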
crypto/digestible/README.md (Outdated)

> Note that fields may not be re-ordered or renamed.
>
> However, in the future, we may support a proc-macro attribute in digestible-derive that allows
Cool, that would be useful.
Tangentially, the Jackson JSON processor in Java might be a good source of inspiration. That supports a variety of ways to annotate a class in order to customize how it gets serialized, e.g. https://www.baeldung.com/jackson-annotations
This is now rebased on master. Thanks Matt for all the detailed comments! I will work to address them all after 5pm.
Thank you, Chris!
The macro processing seems reasonable, and it looks like my main concerns from the last PR are addressed (e.g. it's not possible to have `vec![u16; 2]` and `vec![u32; 1]` with identical bytes map to identical representations), but there are a couple nitpicks (e.g. I think we can use const generics now, since `LengthAtMost32` was removed from rust).
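The `vec![u16; 2]` vs `vec![u32; 1]` concern can be demonstrated with a toy std-only digest (a hypothetical sketch, not the crate's actual encoding): frame each sequence with a type tag and an element count before its raw bytes, and two sequences whose raw little-endian bytes coincide still hash differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy digest: prepend a type tag and element count so sequences with
// identical raw bytes but different element types cannot collide.
fn digest_u16_seq(v: &[u16]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(b"seq-u16"); // type tag
    h.write(&(v.len() as u64).to_le_bytes()); // element count
    for x in v {
        h.write(&x.to_le_bytes());
    }
    h.finish()
}

fn digest_u32_seq(v: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(b"seq-u32");
    h.write(&(v.len() as u64).to_le_bytes());
    for x in v {
        h.write(&x.to_le_bytes());
    }
    h.finish()
}

fn main() {
    // [1u16, 0u16] and [1u32] are both the raw bytes [1, 0, 0, 0],
    // but the framing keeps their digests distinct.
    assert_ne!(digest_u16_seq(&[1, 0]), digest_u32_seq(&[1]));
    println!("identical bytes, distinct digests");
}
```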
```rust
/// The data is the canonical bytes representing the primitive.
/// If the primitive does not have a canonical representation as bytes then
/// it isn't appropriate to treat it as a primitive in this hashing scheme.
#[inline]
```
These should either be `#[inline(always)]`, or left to LTO.
I don't think this is right:

(1) My experience has been that benchmarks got significantly better in ORAM when I used `#[inline]`, instead of doing what you are suggesting, for functions in `aligned`, `aligned-cmov` and other crates.

(2) I think the convention in many essential rust libraries is to use `#[inline]` when we want to permit the optimizer to inline this across crate boundaries, which is usually the case for tiny, low-level functions like this. For example, here's a related discussion in the `hashbrown` crate which is very informative: rust-lang/hashbrown#119, which demonstrated that using `#[inline]` can lead to better performance.

I think it's better to permit the compiler to inline things in cases like this, if llvm decides it's a good idea. Thin-LTO is a late-stage optimization pass, with less information and less time to make optimization decisions, and it shouldn't be relied on or used as a replacement for things that would normally be done at -O2 or -O3 optimization levels.

If we omit `#[inline]` then the optimizer is not permitted to inline this function across crate boundaries, even at `-O2` and `-O3`.

The main reason to remove inline annotations is to sacrifice runtime performance for better compile times. We shouldn't do that unless the compile times for this are actually slow. I see no evidence that that is the case.
I guess we could have an `inline-more` feature like hashbrown does: https://github.com/rust-lang/hashbrown/blob/master/Cargo.toml#L53

It just seems kind of silly.
Were you building with lto?
I'm happy with this pending CI
…gested Thanks again to Matt for thoughtful feedback on tests
Motivation
Currently, hashes of transactions and the blockchain are derived using the `mc-crypto-digestible` crate. This is a trait, plus a proc-macro crate, which allows bringing an arbitrary cryptographic hash (a "digest" implementor) to a digestible object and hashing it. This essentially combines serialization and hashing into one step. The goals of this were (1) to provide stability guarantees that aren't normally present in serialization, and (2) to provide security guarantees that the hash is not malleable.

When this crate was merged we did not all agree that this was the best way to do this -- there was an argument that we should use merlin instead of basing it on a hash function, because merlin is a framework that offers a higher-level API and is well integrated with the rest of the dalek ecosystem, particularly the schnorrkel digital signature crate.
In this PR we pivot to using merlin. One benefit of this switch is that I think it is possible to prove a slightly stronger security property after this revision than before -- we can say that no two distinct ASTs will have the same hash, not just no two instances of the same structure.
But the most interesting thing in this PR is likely that we are building in a form of support for schema evolution that was not present in the initial code.

(3) Schema evolution here means that, just as in protobuf, we can add new optional fields to structures without breaking compatibility with the old structures. In this case, compatibility means ledger-compatibility: we can add optional fields to things without changing the hashes of old blocks.
With this PR we will be able to add new optional members to, e.g., the TxOut type or BlockContents type without having to figure out how to migrate the ledger, because the hashes of the old blocks won't change as a result of the change. So, in some cases we can add features without having to figure out how to migrate the ledger, just as we normally would when using protobuf.
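The schema-evolution property can be sketched with a toy std-only digest (a hypothetical illustration, not the crate's actual encoding; `TxOutV1`/`TxOutV2` and `append_field` are made-up names): each present field is appended as a framed (name, value) pair, and an absent optional field contributes nothing, so adding `memo: None` to a struct leaves its old hash unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical field-wise digest: frame each present field as
// (name, value); skip absent optional fields entirely.
fn append_field(h: &mut DefaultHasher, name: &[u8], value: &[u8]) {
    h.write(&(name.len() as u64).to_le_bytes());
    h.write(name);
    h.write(&(value.len() as u64).to_le_bytes());
    h.write(value);
}

struct TxOutV1 { amount: u64 }
struct TxOutV2 { amount: u64, memo: Option<Vec<u8>> }

fn digest_v1(t: &TxOutV1) -> u64 {
    let mut h = DefaultHasher::new();
    append_field(&mut h, b"amount", &t.amount.to_le_bytes());
    h.finish()
}

fn digest_v2(t: &TxOutV2) -> u64 {
    let mut h = DefaultHasher::new();
    append_field(&mut h, b"amount", &t.amount.to_le_bytes());
    // A None field adds nothing, preserving the old hash.
    if let Some(m) = &t.memo {
        append_field(&mut h, b"memo", m);
    }
    h.finish()
}

fn main() {
    let old = TxOutV1 { amount: 10 };
    let new_none = TxOutV2 { amount: 10, memo: None };
    let new_some = TxOutV2 { amount: 10, memo: Some(b"hi".to_vec()) };
    // Old structs and new structs with the field unset hash identically;
    // setting the new field changes the hash.
    assert_eq!(digest_v1(&old), digest_v2(&new_none));
    assert_ne!(digest_v1(&old), digest_v2(&new_some));
}
```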
There will be minor ancillary benefits: fewer low-level things will depend on the rust `digest` crate, which has seen recent breaking changes (although we seem to be past the breaking changes now, courtesy of Eran :) ).

In this PR
In the tests for this PR, the `inspect_ast` test functionality can be used to show exactly what structure digestible produced for the user-provided type. The AST can be mapped to json and pretty-printed.

The AST is not actually computed in production when we need to digest something -- it is only computed explicitly in debugging tools. However, the AST completely captures the resulting hash -- if we capture the AST for a structure and then compute the hash from the AST, we always get the same hash. (There is quite a lot of test coverage backing up that claim.)
The AST can be displayed as json: the json faithfully represents the entire AST. So, looking at the json can give the reviewer a good sense of what is being captured by the hash.
Here's the AST for a random TxOut:
Here's the AST for a random origin block: