
MCC-697 Digestible rework to use merlin #389

Merged: 6 commits, Sep 3, 2020

Conversation

@cbeck88 (Contributor) commented Aug 23, 2020

Motivation

Currently, hashes of transactions and the blockchain are derived using the mc-crypto-digestible crate. This is a trait, plus a proc-macro crate, which together make it possible to bring an arbitrary cryptographic hash (any "digest" implementor) to a digestible object and hash it. This essentially combines serialization and hashing into one step. The goals of this were to (1) provide stability guarantees that aren't normally present in serialization, and (2) provide security guarantees that the hash is not malleable.
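To make the "serialization and hashing in one step" idea concrete, here is a minimal sketch of the trait shape. This is an illustration only, not the crate's actual API: the trait, the impls, and the use of std's `DefaultHasher` in place of a real digest/transcript are all assumptions for this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy version of the digestible idea: each type knows how to feed
// labeled, canonical bytes of itself into a hasher.
trait Digestible {
    fn append_to_transcript(&self, label: &'static [u8], hasher: &mut DefaultHasher);
}

impl Digestible for u64 {
    fn append_to_transcript(&self, label: &'static [u8], hasher: &mut DefaultHasher) {
        hasher.write(label);
        hasher.write(&self.to_le_bytes()); // canonical bytes of the primitive
    }
}

// Hypothetical aggregate type for illustration.
struct Amount {
    masked_value: u64,
}

impl Digestible for Amount {
    fn append_to_transcript(&self, label: &'static [u8], hasher: &mut DefaultHasher) {
        hasher.write(label);
        hasher.write(b"Amount"); // the aggregate's type name is part of the hash
        self.masked_value.append_to_transcript(b"masked_value", hasher);
    }
}

fn digest<T: Digestible>(val: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    val.append_to_transcript(b"test", &mut hasher);
    hasher.finish()
}

fn main() {
    // Deterministic: the hash depends only on structure and contents.
    assert_eq!(digest(&Amount { masked_value: 5 }), digest(&Amount { masked_value: 5 }));
    assert_ne!(digest(&Amount { masked_value: 5 }), digest(&Amount { masked_value: 6 }));
    println!("ok");
}
```

Because field names, type names, and lengths are appended along with the data, the encoding is not malleable by reinterpreting the same byte stream as a different structure.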

When this crate was merged, we did not all agree that this was the best way to do it -- there was an argument that we should use merlin instead of basing it on a hash function, because merlin is a framework that offers a higher-level API and is well integrated with the rest of the dalek ecosystem, particularly the schnorrkel digital signature crate.

In this PR we pivot to using merlin. One benefit of this switch is that I think it is possible to prove a slightly stronger security property after this revision than before -- we can say that no two distinct ASTs will have the same hash, not just that no two instances of the same structure will.

But the most interesting thing in this PR is likely that we build in a form of support for schema evolution which was not present in the initial code.

(3) Schema evolution here means that, just as in protobuf, we can add new optional fields to structures without breaking compatibility with the old structures. In this case, compatibility means ledger compatibility: we can add optional fields to things without changing the hashes of old blocks.

With this PR we will be able to add new Option members to e.g. the TxOut or BlockContents types without having to figure out how to migrate the ledger, because the hashes of old blocks won't change as a result of the change. So, in some cases we can add features without a ledger migration, just as we normally can when using protobuf.
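A minimal sketch of why that compatibility can hold, assuming the rule that a None optional field appends nothing to the transcript. This is a toy model (std's `DefaultHasher` standing in for merlin, invented field names), not the crate's real encoding.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Append a labeled, length-framed byte string.
fn append(h: &mut DefaultHasher, label: &[u8], data: &[u8]) {
    h.write(&(label.len() as u64).to_le_bytes());
    h.write(label);
    h.write(&(data.len() as u64).to_le_bytes());
    h.write(data);
}

// "Old" schema.
struct TxOutV1 {
    masked_value: u64,
}

// "New" schema: adds an optional field.
struct TxOutV2 {
    masked_value: u64,
    e_memo: Option<[u8; 4]>,
}

fn digest_v1(x: &TxOutV1) -> u64 {
    let mut h = DefaultHasher::new();
    append(&mut h, b"masked_value", &x.masked_value.to_le_bytes());
    h.finish()
}

fn digest_v2(x: &TxOutV2) -> u64 {
    let mut h = DefaultHasher::new();
    append(&mut h, b"masked_value", &x.masked_value.to_le_bytes());
    // The crucial rule: None appends nothing, so old hashes are unchanged.
    if let Some(memo) = &x.e_memo {
        append(&mut h, b"e_memo", memo);
    }
    h.finish()
}

fn main() {
    let old = TxOutV1 { masked_value: 7 };
    let migrated = TxOutV2 { masked_value: 7, e_memo: None };
    assert_eq!(digest_v1(&old), digest_v2(&migrated)); // ledger-compatible
    let with_memo = TxOutV2 { masked_value: 7, e_memo: Some([1, 2, 3, 4]) };
    assert_ne!(digest_v1(&old), digest_v2(&with_memo));
    println!("ok");
}
```

Old data hashed under the new schema produces the same transcript byte-for-byte, so the old block hashes survive the schema change.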

There will also be minor ancillary benefits: fewer low-level things will depend on the Rust digest crate, which has seen recent breaking changes (although we seem to be past the breaking changes now, courtesy of Eran :) ).

In this PR

  • Diff is +3000 lines, but the vast majority of that is tests (which expand to many lines after rustfmt) and the README
  • Add substantial test coverage for the portions not related to digestible-derive, the portions related to digestible-derive, and the schema-evolution properties
  • Add test vectors for stability of hashes of origin blocks, tx-outs, and a test-utils-generated blockchain

In the tests for this PR, the inspect_ast test functionality can be used to show exactly what structure digestible produced for a user-provided type. The AST can be mapped to JSON and pretty-printed.

The AST is not actually computed in production when we need to digest something -- it is only computed explicitly in debugging tools. However, the AST completely determines the resulting hash: if we capture the AST for a structure and then compute the hash from the AST, we always get the same hash. (There is quite a lot of test coverage backing up that claim.)

The AST can be displayed as JSON, and the JSON faithfully represents the entire AST. So, looking at the JSON can give the reviewer a good sense of what is being captured by the hash.

Here's the AST for a random TxOut:

{
  "elems": [
    {
      "amount": "aggregate",
      "elems": [
        {
          "commitment": "primitive",
          "data": "[242, 152, 40, 146, 146, 42, 240, 192, 35, 99, 79, 54, 220, 236, 108, 181, 213, 143, 214, 94, 153, 71, 189, 181, 22, 77, 172, 211, 204, 73, 63, 3]",
          "type_name": "ristretto"
        },
        {
          "data": "[187, 146, 125, 76, 38, 34, 179, 187]",
          "masked_value": "primitive",
          "type_name": "uint"
        }
      ],
      "name": "Amount"
    },
    {
      "data": "[126, 132, 117, 253, 124, 40, 56, 37, 230, 94, 15, 206, 138, 168, 124, 53, 160, 123, 163, 167, 130, 15, 16, 157, 171, 117, 12, 8, 214, 240, 105, 113]",
      "target_key": "primitive",
      "type_name": "ristretto"
    },
    {
      "data": "[238, 35, 140, 56, 133, 221, 101, 105, 2, 101, 249, 201, 51, 21, 57, 222, 41, 51, 60, 205, 41, 220, 180, 61, 236, 162, 72, 97, 136, 209, 89, 88]",
      "public_key": "primitive",
      "type_name": "ristretto"
    },
    {
      "data": "[165, 234, 17, 98, 158, 70, 82, 80, 174, 54, 135, 135, 107, 141, 215, 81, 56, 69, 162, 101, 225, 237, 184, 140, 10, 219, 123, 17, 207, 187, 210, 165, 66, 245, 105, 183, 136, 31, 101, 233, 86, 191, 35, 6, 42, 81, 44, 59, 246, 77, 18, 141, 90, 67, 169, 243, 2, 56, 101, 10, 2, 55, 212, 240, 173, 69, 76, 136, 133, 105, 181, 252, 11, 22, 207, 187, 106, 149, 108, 150, 246, 214, 15, 106, 92, 189, 148, 227, 237, 135, 44, 121, 190, 52, 189, 0, 67, 68, 68, 248, 208, 231, 145, 194, 4, 77, 47, 49, 238, 63, 73, 195, 126, 92, 230, 175, 136, 76, 47, 237, 65, 50, 107, 139, 225, 12, 0, 0]",
      "e_account_hint": "primitive",
      "type_name": "bytes"
    }
  ],
  "name": "TxOut",
  "test": "aggregate"
}

Here's the AST for a random origin block:

{
  "elems": [
    {
      "data": "[138, 161, 54, 74, 202, 243, 220, 127, 18, 223, 79, 47, 175, 248, 56, 118, 93, 203, 45, 96, 32, 111, 2, 177, 99, 85, 125, 216, 81, 32, 223, 120]",
      "id": "primitive",
      "type_name": "bytes"
    },
    {
      "data": "[0, 0, 0, 0]",
      "type_name": "uint",
      "version": "primitive"
    },
    {
      "data": "[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]",
      "parent_id": "primitive",
      "type_name": "bytes"
    },
    {
      "data": "[0, 0, 0, 0, 0, 0, 0, 0]",
      "index": "primitive",
      "type_name": "uint"
    },
    {
      "cumulative_txo_count": "primitive",
      "data": "[5, 0, 0, 0, 0, 0, 0, 0]",
      "type_name": "uint"
    },
    {
      "elems": [
        {
          "elems": [
            {
              "data": "[0, 0, 0, 0, 0, 0, 0, 0]",
              "from": "primitive",
              "type_name": "uint"
            },
            {
              "data": "[0, 0, 0, 0, 0, 0, 0, 0]",
              "to": "primitive",
              "type_name": "uint"
            }
          ],
          "name": "Range",
          "range": "aggregate"
        },
        {
          "data": "[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]",
          "hash": "primitive",
          "type_name": "bytes"
        }
      ],
      "name": "TxOutMembershipElement",
      "root_element": "aggregate"
    },
    {
      "contents_hash": "primitive",
      "data": "[142, 234, 222, 1, 66, 175, 228, 166, 18, 94, 58, 74, 199, 168, 43, 52, 175, 1, 126, 90, 222, 153, 120, 59, 217, 187, 215, 47, 43, 213, 93, 67]",
      "type_name": "bytes"
    }
  ],
  "name": "Block",
  "test": "aggregate"
}

@cbeck88 cbeck88 force-pushed the merlin-digestible branch 4 times, most recently from 4e1068c to 591ea19 Compare August 26, 2020 17:40
@cbeck88 cbeck88 marked this pull request as ready for review August 26, 2020 17:41
@cbeck88 cbeck88 force-pushed the merlin-digestible branch 2 times, most recently from c522edf to b8a60a6 Compare August 26, 2020 18:42
@@ -30,7 +29,7 @@ lazy_static! {
 pub fn hash_leaf(tx_out: &TxOut) -> [u8; 32] {
     let mut hasher = Blake2b256::new();
     hasher.update(&TXOUT_MERKLE_LEAF_DOMAIN_TAG);
-    tx_out.digest(&mut hasher);
+    hasher.update(&tx_out.hash());
@cbeck88 (Contributor, Author) Aug 26, 2020

I didn't want to mess around with this for now. We could try to make this use merlin instead of blake2b, but I don't really see the point. I think it's fine either way.

@cbeck88 cbeck88 changed the title WIP merlin digestible MC-697 Digestible rework to use merlin Aug 26, 2020
@jcape (Contributor) commented Aug 26, 2020

Won't this break the use of ed25519 as the leaf key in the fog authority signing chain?

-    pair.verify_digest(hasher, &sig)
+    let mut hasher = PseudoMerlin(Sha512::default());
+    data.append_to_transcript(b"test", &mut hasher);
+    pair.verify_digest(hasher.inner, &sig)
@cbeck88 (Contributor, Author) Aug 26, 2020

@jcape here's how you do an ed25519ph signature, with no merlin, after this revision

@kylefleming kylefleming changed the title MC-697 Digestible rework to use merlin MCC-697 Digestible rework to use merlin Aug 31, 2020

Note that fields may not be re-ordered or renamed.

However, in the future, we may support a proc-macro attribute in digestible-derive that allows
A reviewer (Contributor) commented:

Cool, that would be useful.

Tangentially, the Jackson JSON processor in Java might be a good source of inspiration. It supports a variety of ways to annotate a class in order to customize how it gets serialized, e.g. https://www.baeldung.com/jackson-annotations

crypto/digestible/src/lib.rs (outdated; resolved)
@cbeck88 (Contributor, Author) commented Sep 1, 2020

This is now rebased on master.

Thanks Matt for all the detailed comments! I will work to address them all after 5pm.

@mfaulk (Contributor) left a comment:

Thank you, Chris!

@jcape (Contributor) left a comment:

The macro processing seems reasonable, and it looks like my main concerns from the last PR are addressed (e.g. it's not possible to have vec![u16; 2] and vec![u32; 1] with identical bytes map to identical representations), but there are a couple of nitpicks (e.g. I think we can use const generics now, since LengthAtMost32 was removed from Rust).
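For context, a toy demonstration of that collision concern: vec![1u16, 1u16] and vec![65537u32] serialize to the same raw bytes, but any scheme that frames elements with a count and per-element length keeps their hashes distinct. The helper and labels below are invented for illustration; this is not the crate's real encoding.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hash a sequence of elements with framing: type name, element count,
// and a per-element length prefix, so concatenation tricks cannot collide.
fn framed_hash(type_name: &[u8], elems: &[Vec<u8>]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&(type_name.len() as u64).to_le_bytes());
    h.write(type_name);
    h.write(&(elems.len() as u64).to_le_bytes()); // element count
    for e in elems {
        h.write(&(e.len() as u64).to_le_bytes()); // per-element length
        h.write(e);
    }
    h.finish()
}

fn main() {
    // Both serialize to the raw bytes [1, 0, 1, 0] (little-endian).
    let as_u16: Vec<Vec<u8>> = vec![1u16.to_le_bytes().to_vec(), 1u16.to_le_bytes().to_vec()];
    let as_u32: Vec<Vec<u8>> = vec![65537u32.to_le_bytes().to_vec()];
    assert_eq!(as_u16.concat(), as_u32.concat()); // identical bytes...
    assert_ne!(
        framed_hash(b"uint", &as_u16),
        framed_hash(b"uint", &as_u32) // ...but distinct framed hashes
    );
    println!("ok");
}
```

Without the count and length prefixes, the two hashers would consume the same byte stream and the two values would be indistinguishable.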

crypto/digestible/README.md (outdated; resolved)
crypto/digestible/derive/src/lib.rs (outdated; resolved)
/// The data is the canonical bytes representing the primitive.
/// If the primitive does not have a canonical representation as bytes then
/// it isn't appropriate to treat it as a primitive in this hashing scheme.
#[inline]
A reviewer (Contributor) commented:

These should either be inline(always), or left to LTO.

@cbeck88 (Contributor, Author) Sep 2, 2020:

I don't think this is right:
(1) My experience has been that benchmarks got significantly better in ORAM when I used #[inline], instead of doing what you are suggesting, for functions in aligned, aligned-cmov, and other crates.
(2) I think the convention in many essential Rust libraries is to use #[inline] when we want to permit the optimizer to inline across crate boundaries, which is usually the case for tiny, low-level functions like this. For example, here's a related, very informative discussion in the hashbrown crate: rust-lang/hashbrown#119, which demonstrated that using #[inline] can lead to better performance.

I think it's better to permit the compiler to inline things in cases like this, if LLVM decides it's a good idea. Thin-LTO is a late-stage optimization pass, with less information and less time to make optimization decisions, and it shouldn't be relied on or used as a replacement for things that would normally be done at -O2 or -O3. If we omit #[inline], then the optimizer is not permitted to inline this function across crate boundaries, even at -O2 and -O3.

The main reason to remove inline annotations is to sacrifice runtime performance for better compile times. We shouldn't do that unless the compile times for this are actually slow, and I see no evidence that that is the case.

@cbeck88 (Contributor, Author):

I guess we could have an inline-more feature like hashbrown does: https://github.com/rust-lang/hashbrown/blob/master/Cargo.toml#L53

It just seems kind of silly.

A reviewer (Contributor) replied:

Were you building with LTO?

crypto/digestible/src/lib.rs (resolved)
@tsegaran (Contributor) left a comment:

I'm happy with this pending CI

@cbeck88 cbeck88 merged commit 0ff6bf0 into mobilecoinfoundation:master Sep 3, 2020
@cbeck88 cbeck88 deleted the merlin-digestible branch September 3, 2020 02:15