Add mime_type object before application_data #605

Closed
wants to merge 1 commit into from

Conversation

rohan-wire
Contributor

This allows applications to express the MIME type of the application_data if desired.

@rohan-wire
Contributor Author

In case you are wondering how you would negotiate mime types used by an MLS client, here is a WIP very rough preview:
https://github.com/rohan-wire/ietf-drafts/blob/main/mahy-dispatch-immi-mls-mime/draft-mahy-dispatch-immi-mls-mime.md

@bifurcation
Collaborator

@rohan-wire why should this not just go inside of application_data? HTTPS doesn't specify Content-Type at the TLS layer. I realize there's probably a closer match between MLS messages and objects than with TLS frames, but still, it seems simpler to keep this layer as a transport for bytes, and do content negotiation within the byte stream.

@rohan-wire
Contributor Author

> @rohan-wire why should this not just go inside of application_data? HTTPS doesn't specify Content-Type at the TLS layer.

TLS usually runs over TCP on IPv4 or IPv6, and is reached on a specific port or started inside an application protocol with STARTTLS. The format of the thing inside TLS is known from the port number or the outer protocol. Most of the protocols running inside TLS (SMTP, HTTP, IMAP, SIP) have their own way of advertising content, most of them using MIME headers. All of them are 1:1 and most are client-to-server. None of the protocols running inside TLS are 1:N protocols.

Now take the case of MLS. MLS does not know how to run on top of IP. You could write a profile to do that, but it would be a lot of work to make a standardized protocol that covers all the corner cases. It would be useful for some applications but almost certainly not for the basic messaging applications which motivate it. It needs to run on some other protocol. See my map below. So, what goes inside an MLS message?

A: content goes here
P: MLS <- You are here
S: HTTP
T: TLS <- not here
N: IPv4 or v6
D: 802.3
P: 1000BASE-T

In a proprietary application you can just have all the MLS clients agree, but what if you want to support an interoperable IM system, or an interoperable Pub/Sub messaging bus for IoT devices? What are you going to use to communicate the content of the message? HTTP won't work for 1:N messages. Do you really want to require an ASCII Content-Type header at the top of each message just so you know what is inside it? That's a pretty fundamental thing to know.

For me this is a no-brainer. If you don't need it, you lose 1 octet to carry a zero-length MIME type. If you need it, you have it built in. You don't need messy content-encodings, line folding, optional whitespace, header capitalization arguments, etc., which have dogged IETF text-based protocols for decades. If the MIME type of your MLS application message is application/vnd.foo-pubsub, application/cbor, text/html, application/geo+json, message/cpim, or message/vnd.bar-protocol, you don't have to guess. You can easily migrate from one format to another. You can even use multipart/alternative to send a vendor-proprietary version usable to some clients in a group and a standardized interoperable version to others.
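To make the "1 octet for a zero-length MIME type" cost concrete, here is a minimal sketch of a length-prefixed mime_type placed ahead of the application bytes. The 1-octet length prefix, the function names, and treating the rest of the buffer as application_data are assumptions for illustration only; the actual wire layout proposed in this PR is not reproduced here.

```python
from typing import Tuple


def encode_app_message(mime_type: str, application_data: bytes) -> bytes:
    """Prefix application_data with a length-prefixed MIME type.

    Hypothetical layout: 1 length octet, then the ASCII MIME type,
    then the raw application bytes.
    """
    mt = mime_type.encode("ascii")
    if len(mt) > 255:
        raise ValueError("MIME type too long for a 1-octet length prefix")
    return bytes([len(mt)]) + mt + application_data


def decode_app_message(buf: bytes) -> Tuple[str, bytes]:
    """Split a buffer produced by encode_app_message back into its parts."""
    if not buf:
        raise ValueError("missing MIME type length octet")
    n = buf[0]
    if len(buf) < 1 + n:
        raise ValueError("truncated MIME type")
    return buf[1:1 + n].decode("ascii"), buf[1 + n:]
```

An application that does not care about the type passes an empty string and pays exactly one octet of overhead; one that does care reads the type before touching the payload.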

@bifurcation
Collaborator

I think I've got a different view of the protocol stack:

1. Actual user content
2. Message framing
3. MLS as used in context
4. Transport

In order to get to an interoperable system, there's going to have to be work on (2), (3 "as used"), and (4). (And I expect the work-split among those layers to be a point of significant debate!) My point is that any of those could handle the content negotiation:

  • Transport says "this is the content type inside the MLS-protected thing"
  • "As used" specifies a format for authenticated_data that specifies the content type
  • Inner message framing specifies concrete type

All of that is possible without this PR. So in other words, I think merging this PR would be getting ahead of ourselves in the full-stack interoperability discussion.

@rohan-wire
Contributor Author

Renumbering your stack:

7 1/2. Actual user content
7. Message framing ??
6. MLS as used in context
4 + 5. Transport + Session

In order to get to an interoperable system, there's going to have to be work on (7), (6 "as used"), and (4+5). (And I expect the work-split among those layers to be a point of significant debate!) My point is that any of those could handle the content negotiation:

I don't think we need to define 4 + 5 and probably should not because we want to allow multiple protocols with gateways. In my example TLS and HTTP work just fine without modification, but you could use any number of protocols here. What these protocols should say about their contents is that the contents are message/mls. Saying anything more is likely to leak private data.

We shouldn't need to define a lot of new things for MLS either. In fact I don't see any reason you couldn't implement a basic instant messaging system with MLS using the base spec.

As for what goes inside MLS, whatever it is either has or should have a MIME type. IETF protocols provide a way to specify what is in the next layer. The next layer cannot be responsible for describing itself. Unless there is only a single message framing layer that an implementer is allowed to run inside MLS, MLS needs to describe what runs inside.

As for a definition of 7 as a separate layer, I think you are artificially mandating a specific number of layers on top of MLS. I already made a concrete proposal for how to do common instant messaging features on top of MLS in my draft:
https://www.ietf.org/archive/id/draft-mahy-dispatch-immi-content-00.html

@martinthomson
Contributor

My sense is that what Richard is describing is probably better than this.

Let's say that you use CPIM. There, you define that the payload of the message is MIME, which includes Content-Type. The Content-Type header is then found in the payload of the message along with other MIME header fields.

This design is better in the sense that you can switch hit between outer content types without any risk of ambiguity/confusion attack, but that is a feature that is rarely needed, and building it within the payload is probably sensible. Many applications will have a single message format with other means of unambiguously indicating its type. For those that do want different interpretations, MIME media types aren't the only way in which they might want to negotiate the type of content. @bemasc suggested at the DISPATCH meeting the possibility that the label might be ALPN instead (though that assumes a very different context, so I'm not sure if that makes sense).

At a minimum, I would suggest that this would need to be a negotiated extension to MLS.
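For comparison, the in-payload alternative described above (a Content-Type header carried inside application_data) can be sketched with Python's standard email parser. This is an illustration of a generic MIME-style header block, not a CPIM parser; the function name and the sample payload are assumptions.

```python
from email import message_from_bytes
from typing import Tuple


def split_inner_message(application_data: bytes) -> Tuple[str, bytes]:
    """Read the media type from MIME headers at the start of the payload.

    Returns the lowercased media type and the body bytes that follow the
    blank line ending the header block.
    """
    msg = message_from_bytes(application_data)
    body = msg.get_payload(decode=True) or b""
    return msg.get_content_type(), body


# Hypothetical decrypted MLS application payload.
payload = b"Content-Type: text/plain; charset=utf-8\r\n\r\nhello"
media_type, body = split_inner_message(payload)
```

Here the MLS layer stays a transport for bytes, and the cost is a text header on every message plus the usual header-parsing rules.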

@bemasc

bemasc commented Mar 21, 2022

What I meant by "ALPN" was just that this "type signal" describes something interactive (more like a protocol than a format) and needs negotiation. Having more than two parties makes it different from ALPN.

I'd like to see some more thought given to agility and forward-compatibility. How can we gracefully upgrade the group across both small (e.g. new text formatting option) and large (e.g. HTTP/1.1 -> H2) transitions? This seems like a major challenge.

@rohan-wire
Contributor Author

I just submitted a draft that proposes to do content negotiation when its GroupContext extension is present.

> My sense is that what Richard is describing is probably better than this.
>
> Let's say that you use CPIM. There, you define that the payload of the message is MIME, which includes Content-Type. The Content-Type header is then found in the payload of the message along with other MIME header fields.
>
> This design is better in the sense that you can switch hit between outer content types without any risk of ambiguity/confusion attack, but that is a feature that is rarely needed, and building it within the payload is probably sensible. Many applications will have a single message format with other means of unambiguously indicating its type. For those that do want different interpretations, MIME media types aren't the only way in which they might want to negotiate the type of content. @bemasc suggested at the DISPATCH meeting the possibility that the label might be ALPN instead (though that assumes a very different context, so I'm not sure if that makes sense).
>
> At a minimum, I would suggest that this would need to be a negotiated extension to MLS.

Still, I think it is worth replying to two specific things:

> Many applications will have a single message format with other means of unambiguously indicating its type.

Yes, in which case this PR would have allowed the mime_type to be an empty string. This would have resulted in one extra byte for the length (0) for folks who don't need it.

> but that is a feature that is rarely needed

Members in a group who want to upgrade their communications format (which happens all the time) would definitely be able to take advantage of this feature. I have rarely seen a system that doesn't end up wanting to make some incremental non-backwards compatible format change. It's really nice when you can do that without having to create a new group or have a flag-day upgrade.
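One way to picture a format upgrade without a flag day is a sender that only uses a media type every current member has advertised. The sketch below is a rough illustration under that assumption; the function names, the per-member sets, and the vendor media type are hypothetical and are not taken from the content negotiation draft.

```python
from typing import Dict, Iterable, Optional, Set


def common_media_types(supported: Dict[str, Set[str]]) -> Set[str]:
    """Return the media types that every member of the group supports."""
    member_sets = list(supported.values())
    if not member_sets:
        return set()
    return set.intersection(*member_sets)


def pick_media_type(preferences: Iterable[str],
                    supported: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the sender's most preferred type that the whole group supports."""
    common = common_media_types(supported)
    for media_type in preferences:
        if media_type in common:
            return media_type
    return None


# Hypothetical group: bob has not yet upgraded to the newer format.
group = {
    "alice": {"text/plain", "application/vnd.foo-v2"},
    "bob": {"text/plain"},
}
before = pick_media_type(["application/vnd.foo-v2", "text/plain"], group)

# Once bob's client advertises the newer format, the same call selects it.
group["bob"].add("application/vnd.foo-v2")
after = pick_media_type(["application/vnd.foo-v2", "text/plain"], group)
```

The group keeps working on the older format until the last member upgrades, at which point senders switch without creating a new group.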

@rohan-wire
Contributor Author

Just submitted the following:
https://www.ietf.org/archive/id/draft-mahy-mls-content-neg-00.html

@rohan-wire
Contributor Author

Also, there is a PR for the architecture draft which covers the general advice here:
mlswg/mls-architecture#94

@rohan-wire
Contributor Author

Closing as the changes have all moved elsewhere.

@rohan-wire closed this May 2, 2022

4 participants