Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move wire_format to a separate tagged-union structure MLSMessage #523

Merged
merged 6 commits into from Feb 11, 2022

Conversation

TWal
Copy link
Contributor

@TWal TWal commented Dec 7, 2021

The current use of wire_format is a bit weird to me: it is the first value of both MLSPlaintext and MLSCiphertext which allows the implementations to differentiate between them. Then it looks like it should be implemented as a regular tagged union, which is what this pull request does.

This does not imply any changes to existing implementations.

One thing that this pull-request changes is that MLSPlaintext is forced to have wire_format = mls_plaintext, but I don't see why it would have any other values in this case?

@raphaelrobert
Copy link
Member

@raphaelrobert raphaelrobert commented Dec 10, 2021

I think this problematic, because it implies that the wire format of an MLSPlaintext is always mls_plaintext (which is true for the final format on the wire, but not when handling things internally). Since the wire format is part of MLSPlaintextTBS, this would be confusing for a decrypted MLSCiphertext for example. The way it is now is also really nice from an implementation standpoint, because the wire format field essentially tracks whether an MLSPlaintext was previously an MLSCiphertext.

@TWal
Copy link
Contributor Author

@TWal TWal commented Dec 10, 2021

Thank you, now I understand better why MLSPlaintext doesn't have wire_format = mls_plaintext like in MLSCiphertext.

I didn't think that MLSCiphertext were decoded to a MLSPlaintext. In fact, the document never mentions this, however it mentions the converse at the end of section 9.0 (but doesn't talk about wire_format though).
In the rest of the document, messages are named as MLSPlaintext (e.g. "The MLSPlaintext MUST be signed" and not "The message MUST be signed"), so implicitly MLSCiphertext have been converted to MLSPlaintext (which is a bit weird I think, especially regarding the membership_tag that exists in MLSPlaintext and not in MLSCiphertext).

I think it would be valuable to refactor the part of the document related to message framing to remove such ambiguities.

The easiest way would be to be explicit about the conversion between MLSPlaintext and MLSCiphertext.

I spent a lot of time thinking about the nicest way to handle these structures when I implemented these things in the F* implementation of MLS, and I think it could lead to a refactorization of this document.

We can define the content of a message (so really, meta-data + the real content) like this:

struct {
    opaque group_id<0..255>;
    uint64 epoch;
    Sender sender;
    opaque authenticated_data<0..2^32-1>;

    ContentType content_type;
    select (MLSPlaintext.content_type) {
        case application:
          opaque application_data<0..2^32-1>;

        case proposal:
          Proposal proposal;

        case commit:
          Commit commit;
    }
} MLSMessageContent;

And the authentication of a message like this:

struct {
    opaque signature<0..2^16-1>;
    optional<MAC> confirmation_tag;
} MLSMessageAuth;

If you have a MLSMessageContent, you can compute MLSMessageAuth (using a signing key, confirmation_key, interim_transcript_hash and GroupContext).
This would be done by filling correctly the structures MLSMessageContentTBS and MLSMessageContentTBM (previously MLSPlaintextTB.)

You can then do the following conversions:

  • MLSPlaintext to (MLSMessageContent, MLSMessageAuth)
  • (MLSMessageContent, MLSMessageAuth) to MLSPlaintext (using membership_key and GroupContext to compute membership_tag)
  • MLSCiphertext to (MLSMessageContent, MLSMessageAuth) (using sender_data_secret, and the current state of the Secret Tree)
  • (MLSMessageContent, MLSMessageAuth) to MLSCiphertext (using sender_data_secret, and the current state of the Secret Tree)

Then the rest of the document can then talk about MLSMessageContent instead of MLSPlaintext.

I would understand if you want to do the minimal possible changes, however I think doing this kind of big refactor would make this document a lot easier to understand for a first-time reader.

@bifurcation
Copy link
Collaborator

@bifurcation bifurcation commented Dec 14, 2021

@TWal - That proposal actually seems like it could make the programming easier, in addition to making the text clearer. I would be game for a PR if you'd like to write one up.

Going back to the original point about wire_format: Note that the signature in the MLSMessageAuth is specific to a wire format, so the conversions you note might not be quite as decoupled as you think. But you still end up with something nicer than what we have today.

# Signing + Encryption

MLSMessageContent 
  --(GroupContext, wire_format, sig_priv, confirmation_key)--> 
  (wire_format, MLSMessageContent, MLSMessageAuth)

  * Compute message signature
  * Update confirmed transcript hash
  * Compute confirmation tag
  * Update interim transcript hash


(mls_plaintext, MLSMessageContent, MLSMessageAuth) 
  --(GroupContext, membership_key)--> 
  MLSPlaintext

  * Compute membership_tag
  * Assemble MLSPlaintext


(mls_ciphertext, MLSMessageContent, MLSMessageAuth) 
  --(sender_data_secret, Secret Tree)-->
  MLSCiphertext

  * Encode and encrypt content
  * Encode and encrypt sender data
  * Assemble MLSCiphertext


# Decryption + Verification

MLSPlaintext 
  --(GroupContext, membership_key)-->
  (mls_plaintext, MLSMessageContent, MLSMessageAuth)

  * Verify membership_tag
  * Parse MLSPlaintext into MLSMessageContent, MLSMessageAuth


MLSCiphertext
  --(sender_data_secret, Secret Tree)-->
  (mls_ciphertext, MLSMessageContent, MLSMessageAuth)

  * Decrypt and decode sender data
  * Decrypt and decode content
  * Parse MLSCiphertext and decrypted values into MLSMessageContent, MLSMessageAuth


(wire_format, MLSMessageContent, MLSMessageAuth)
  --(GroupContext, wire_format, sig_pub, confirmation_key)-->
  MLSMessageContent

  * Verify message signature
  * Update confirmed transcript hash
  * Verify confirmation tag
  * Update interim transcript hash

@TWal
Copy link
Contributor Author

@TWal TWal commented Dec 16, 2021

Okay, I'll start working on a PR that uses this decomposition!

@raphaelrobert
Copy link
Member

@raphaelrobert raphaelrobert commented Dec 16, 2021

I'm not against attempting to refactor this. It has grown organically over time and could certainly benefit from a holistic overhaul. (@TWal not that you omitted the membership tag in your proposal above)

@TWal
Copy link
Contributor Author

@TWal TWal commented Dec 17, 2021

I made a first draft on how the Message Framing section would look like after the refactor. If you agree about these changes I'll change the mentions to MLSPlaintext everywhere in the document. What do you think about it? I'm not used to write RFC-like documents so I'm not 100% sure about the formulation I used.

@raphaelrobert the membership tag is present only in the plaintext and not is the ciphertext, so is it is not part of MLSMessageAuth, but only part of MLSPlaintext.

@rohan-wire
Copy link
Contributor

@rohan-wire rohan-wire commented Jan 10, 2022

I independently was working on a rewrite of the definition of MLSPlaintext as follows. Note that this makes the presence of the confirmation_tag and membership_tag more explicit.

I am also not a fan of the name authenticated_data because all the data in MLSPlaintextTBS is authenticated. I am open to other suggestions.

struct {
    WireFormat wire_format;
    opaque group_id<0..255>;
    uint64 epoch;
    Sender sender;
    opaque extra_authenticated_data<0..2^32-1>;
    ContentType content_type;
    select (MLSPlaintext.content_type) {
        case application:
          opaque application_data<0..2^32-1>;
        case proposal:
          Proposal proposal;
        case commit:
          Commit commit;
    }
} MLSPlaintextTBS;

struct {
    WireFormat wire_format = mls_plaintext;
    opaque group_id<0..255>;
    uint64 epoch;
    Sender sender;
    opaque extra_authenticated_data<0..2^32-1>;
    ContentType content_type;
    select (MLSPlaintext.content_type) {
        case proposal:
          Proposal proposal;
        case commit:
          Commit commit;
    }
    opaque mls_plaintext_signature<Signature_length>;
    select (MLSPlaintext.content_type) {
        case commit:
            opaque confirmation_tag<MAC_length>;
    }
    select (MLSPlaintext.sender.sender_type) {
        case member:
            opaque membership_tag<MAC_length>;
    }
} MLSPlaintext;

@TWal
Copy link
Contributor Author

@TWal TWal commented Jan 12, 2022

About authenticated_data: I agree with you, extra_authenticated_data is indeed a better name!

About putting things into selects:

  • From a specification point of view, I agree it is appealing to constrain the message format instead of saying that "some optional MUST contain a value when [...]" in the prose below.
  • From an implementation point of view, it makes the message parsing/serializing a bit more difficult to do. I'm guessing in most TLS parsing/serializing libraries, it's easy to do tagged-union-like parsing but a bit annoying to do more complex things.

About using MAC_length instead of variable-length arrays: I think this is a bad idea, because it means the parser has to know which ciphersuite is in use. We typically don't know this when parsing the MLSPlaintext or MLSCiphertext since we don't know which group it is for before parsing it. We could technically parse the beginning of the message to extract the group_id and then parse the rest of the message, but I think it's a bit ugly to do it like this instead of having a variable-length field.

@kkohbrok
Copy link
Contributor

@kkohbrok kkohbrok commented Jan 12, 2022

If we're changing the name of authenticated_data, I'd suggest associated_authenticated_data (aad), matching the terminology from AEAD (authenticated encryption with associated data), which is where it ends up being used.

@rohan-wire
Copy link
Contributor

@rohan-wire rohan-wire commented Jan 27, 2022

@TWal have you made any more progress on this refactor?

I am definitely still partial to using selects. Your comment about MAC_size makes sense to me.

@bifurcation
Copy link
Collaborator

@bifurcation bifurcation commented Jan 27, 2022

Discussion on working call:

  • General agreement that this approach is good
  • Holding this PR until after section rearrangement
  • @rohan-wire suggests extending the wire_format idea to other types
    • e.g., Welcome, PublicGroupState
    • @kkohbrok notes that this could help with unambiguous signatures as well
    • Let's handle as a follow-on once this merges

@TWal
Copy link
Contributor Author

@TWal TWal commented Feb 2, 2022

I just made the merge post-section rearranging (it was an ugly one, it was easier to re-do it from scratch).

@rohan-wire I opened the issue #574 to talk about it, since I feel it belong to a separate discussion.

About adding Welcome and PublicGroupState in wire_format, that means we should add them in the select for MLSMessage?

@bifurcation
Copy link
Collaborator

@bifurcation bifurcation commented Feb 3, 2022

Discussion on virtual interim:

  • Re: @TWal's question - Yes, let's cover all the types
    • @rohan-wire notes that we should also cover KeyPackage

Copy link
Collaborator

@bifurcation bifurcation left a comment

Looks good to me, once the additional MLSMessageType values are in.

@rohan-wire
Copy link
Contributor

@rohan-wire rohan-wire commented Feb 3, 2022

@rohan-wire I opened the issue #574 to talk about it, since I feel it belong to a separate discussion.
Thanks. Good call.

About adding Welcome and PublicGroupState in wire_format, that means we should add them in the select for MLSMessage?
Yes. And also KeyPackage.

@bifurcation
Copy link
Collaborator

@bifurcation bifurcation commented Feb 7, 2022

@TWal - Also happy to merge this and do the extra content types in a follow-up. Up to you.

@TWal
Copy link
Contributor Author

@TWal TWal commented Feb 7, 2022

We still need to change the references to MLSPlaintext everywhere. I'll handle this tomorrow, and handle the extra content types at the same time

@TWal
Copy link
Contributor Author

@TWal TWal commented Feb 8, 2022

I just updated it, however I noticed something a bit weird when removing the mentions to MLSPlaintext:

ProposalRef, the `value` input is the MLSPlaintext carrying the proposal, and
the KDF is determined by the group's ciphersuite.

Why do we make a hashref of the MLSPlaintext and not the Proposal?
In any case it looks like we need to do a non-editorial change here.

@raphaelrobert
Copy link
Member

@raphaelrobert raphaelrobert commented Feb 8, 2022

I think this goes in the right direction, but I'm wondering about the authentication of the wire_format field in MLSMessage:

  • It is authenticated as part of MLSMessageContentTBS for MLSPlaintext/MLSCiphertext`
  • It is not authenticated for all other message types

I'd like to propose the following changes:

  • the wire_format filed should also be authenticated for other messages, it can simply be part of KeyPackageTBS etc.
  • we have a protocol_version field in all messages except MLSPlaintext/MLSCiphertext: we should extract it and add it right after the wire_format field for all messages. This would increase the size by two bytes in some cases, but I think that's tolerable. Obviously we can also authenticate that field.
  • with the above, there is no need for the optional version number proposed in #581. I think having it in the message would be much more robust and future-proof.

@bifurcation
Copy link
Collaborator

@bifurcation bifurcation commented Feb 10, 2022

Discussion on virtual interim:

  • Why is MLSPlaintext hashed?
    • To capture the sender in the transcript, important for Update
    • Probably want some struct that captures (wire format, MLSMessageContent, MLSMessageAuth)
  • Changing option to select
    • Already have the cross-message problem in MLSCiphertextContent
    • Could do a template-like thing, à la C++
    • Having the cross-message dependency seems OK
    • For transcript hash input of MLSMessageAuth, just don't have optional/select
  • Sign wire format? Replace labeled signature?
    • Leave this for later, including leaving inconsistency between wire format and label as distinguishers
    • See #589
  • Protocol version in MLSMessage?
    • Labeled signing already adds MLS version
    • Might need an explicit version, e.g., to aid in parsing, could replace the version in the labeled signing
    • Filed as #590
  • TODO(@TWal) Update to use select with cross-message dependency, then clear to merge

@TWal
Copy link
Contributor Author

@TWal TWal commented Feb 11, 2022

I think this is now mergable, I introduced a new structure MLSMessageContentAuth for ProposalRef.

@bifurcation bifurcation merged commit 5c43b59 into mlswg:main Feb 11, 2022
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants