Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3079: Low Bandwidth CS API #3079

Open
wants to merge 19 commits into
base: old_master
Choose a base branch
from
Open

MSC3079: Low Bandwidth CS API #3079

wants to merge 19 commits into from

Conversation

kegsay
Copy link
Member

@kegsay kegsay commented Mar 30, 2021

@kegsay kegsay changed the title Low Bandwidth CS API MSC3079: Low Bandwidth CS API Mar 30, 2021
@turt2live turt2live added client-server Client-Server API proposal A matrix spec change proposal proposal-in-review kind:feature MSC for not-core and not-maintenance stuff labels Mar 30, 2021
which low bandwidth features are supported on this server. This object looks like:
```
"m.low_bandwidth": {
"dtls": 8008, // advertise that this server runs a UDP listener for DTLS here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"dtls": 8008, // advertise that this server runs a UDP listener for DTLS here.
"dtls_port": 8008, // advertise that this server runs a UDP listener for DTLS here.

Also, couldn't it be possible here to specify a host+port for the endpoint seperately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm debating if that information should live in .well-known as well... Not sure.

"cbor_enum_version": 1, // which table is used for integer keys. This proposal is version 1. Omission indicates no support.
"coap_enum_version": 1, // which table is used for coap path enums. This proposal is version 1. Omission indicates no support.
"observe": 1, // Set to 1 if /sync supports OBSERVE. Omission indicates no support for OBSERVE.
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add enum shorthands for these fields as well?

Copy link
Member Author

@kegsay kegsay Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's tricky, because if the client doesn't support the enums then they can't read the fields to determine the versions of things they may support e.g CoAP path enums. So probably not, though that would be a good reason to make these keys shorter!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the enum is backwards compatible and these values are defined in the first version, it should work I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client could cache the response of this for longer time, if low-bandwidth is enabled and it isn't getting errors back from the server; so trying to save every single byte on the version response doesn't make much sense to soru

@chrysn
Copy link

chrysn commented Sep 12, 2021

Have alternatives to the dedicated CoAP option for the bearer token been considered? As far as I understand the common CoAP practices, bearer tokens (if used at all; they're often discouraged but probably unavoidable here) would rather be associated with the security context (not sure how in a DTLS context; in OSCORE they'd be exchanged in the EDHOC phase) and then not transported in every message any more at all. (Clients that really do use two accounts would then use distinct secure connections for them).

@kegsay
Copy link
Member Author

kegsay commented Sep 13, 2021

There's not many options available to us as you've pointed out. I'm effectively associating the bearer token with the connection as it's sticky:

In addition, this Option MAY be omitted on subsequent requests, in which case the server MUST use the last used value for this Option. This reduces bandwidth usage by only sending the access token once per connection.

https://github.com/matrix-org/matrix-doc/blob/kegan/low-bandwidth/proposals/3079-low-bandwidth-csapi.md#access-tokens

Which I feel meets the security properties you're concerned about.

@chrysn
Copy link

chrysn commented Sep 13, 2021

The stickiness does alleviate the message overhead, but the message would need to have a different number then because it's not safe-to-forward any more; its least significant bits should probably be 0b11 (see RFC7252 Section 5.4; also setting the Critical bit as a message with this option number couldn't be successfully processed by a server that doesn't know the option). The reviewers might ask you to pick a number a bit whole lot higher than 256, as anything below probably 560 is a bit more precious because applications with known used options can make a bit more efficient use of these options. (Exemplary rationale).

As for alternatives (which I'd not rule out yet, and also because this is a topic that comes up often enough that it shoud find a place in the CoAP FAQ), I've kicked off a thread at the CoRE list.

Just to be sure I'm not missing an obvious option, would it be an option to get a CWT / JWT rather than a bearer token? (That could possibly be fed right into the DTLS or EDHOC setup; it'd break the gateway deployment of CoMatrix but that could be fixed elsewhere, and that'd even be advantageous because it'd curb MITM'ing even when the attacker can get a valid certificate as home server).


## Unstable prefix

The `/versions` response should use `org.matrix.msc3079.low_bandwidth` instead of `m.low_bandwidth` whilst the proposal is in review.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about also moving that dict into unstable_features?


## Security considerations

DTLS connections should have the same security guarantees as TLS connections. Weak ciphers should not be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is extremely vague, some examples or links to articles which give a better opinion on this would be appreciated.

Copy link
Contributor

@ShadowJonathan ShadowJonathan Dec 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently came across the official IANA TLS Parameters register, which provides a section of cipher suites, which are then annotated with "DTLS Readiness" and "Recommendation", maybe referencing that, and/or providing one or two "must support" ciphers, could work.

Personally, for those "must support" ciphers, i'll recommend;

TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

But i'll also mention TLS_ECDHE_ECDSA_WITH_AES_128_CCM for it's ease of implementation and performance on embedded devices. Note however that it comes with a decreased security guarantee due to the use of CCM, which has been regarded as weak.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended cipher suites change over time which is why I'm not naming any specific ones. I don't mind indirecting to say "if it's on this list then you should probably support it" but I'm wary about mandating any one cipher suite in this document as that is really not the place of this MSC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not mandating, but giving implementation suggestions to start off with for interoperability, such as those specific cipher suites.

Though, maybe this is going too much into detail, I think there's not a lot of DTLS implementations out there to really have this matter all that much, if one could potentially know all of them at once at this point.

proposals/3079-low-bandwidth-csapi.md Show resolved Hide resolved

### HTTP -> CoAP

CoAP is defined in [RFC 7252: Constrained Application Protocol](https://tools.ietf.org/html/rfc7252)
Copy link
Contributor

@ShadowJonathan ShadowJonathan Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoAP defines an NSTART, a max-amount-of-packets-in-flight parameter, as 1.

This would effectively limit lb to a packet rate of 1 / RTT, which may make lb not very useful for environments which have a wide long link (think geostationary satellite links, or even moon-to-earth)

Would there be any consideration for a different NSTART? or even a dynamic one? Should this spec allow clients and servers to experiment with this?

As an example, taking a block-wise transfer, with a SZX (size) of 1024 bytes, and an RTT of 300ms, would get me a theoretical transmission rate of 3.5KBps.

Related issue: ruma/lb#6

Copy link
Contributor

@ShadowJonathan ShadowJonathan Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My proposal would be that the client dictates this variable dynamically in the session, with an option at 258 with a uint value, which, when received, changes the NSTART at the server level. This option is Elective (as it is an even number), and thus ignorable by the server.

This would allow the client to detect and adjust the rate at its own pace, knowing its own dynamic network conditions (which are likely to be more variable towards the client than the server, realistically).

A value of 0 could indicate "just send any packet you have in the buffer", this would be useful for already-ordered-and-reliable connections such as websocket, TCP, or others, where the underlying transport already has this covered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some extra context on this; A IETF Draft from 2013 provides some context and additional information around this problem: draft-bormann-core-congestion-control-02 section 5.2, which mentions the then-draft for block-wise transfer, but also RFC 5348, this document contains some very sophisticated methods of keeping congestion under control, so it may not be practical to spec.

Though, i'll experiment on my own, and try to see if there's a simpler formula able to be applied here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So congestion control for CoAP is a problem we're investigating as there are a few problems with CoAP for Matrix as-is:

  • NSTART as you mention (which would actually prevent Matrix clients working correctly if they were long-polling /sync) as then you would have 2 in-flight conns when you POST/PUT something. This isn't the case for OBSERVE though.
  • Blockwise transfer for large request/response bodies. The fact that you don't have any equivalent concept of a TCP receive window means it takes a long time to send lots of data as you have to ACK each packet before the next packet is sent.

These problems aren't entirely unique to low bandwidth Matrix, so there are papers which try to address this such as:

  • CoAP Simple Congestion Control/Advanced (CoCoA)
  • pCoCoA: A precise congestion control algorithm for CoAP
  • Many others which don't have catchy names.

A lot of this research is ongoing and recent, so at present this MSC is silent on which congestion control algorithm to use. When a good algorithm (for Matrix use cases) is proven, and there is sufficient library support, then we can probably mandate one.

Comment on lines +127 to +129
the option ID 256 as `Authorization` in accordance with the [CoAP Option Registry](https://tools.ietf.org/html/rfc7252#section-12.2)
which mandates that options IDs 0-255 are reserved for IETF review. This Option should have the same Critical,
UnSafe, NoCacheKey, Format and Length values as the `Uri-Query` Option. The `Authorization` value SHOULD omit
Copy link
Contributor

@ShadowJonathan ShadowJonathan Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect and not compliant with the RFC, which define that only odd numbers can be Critical, as Uri-Query is;

An Option is identified by an option number, which also provides some additional semantics information, e.g., odd numbers indicate a critical option, while even numbers indicate an elective option.
Note that this is not just a convention, it is a feature of the protocol: Whether an option is elective or critical is entirely determined by whether its option number is even or odd.

https://datatracker.ietf.org/doc/html/rfc7252#section-5.4.6

Thus, this should probably be 257 instead.

Comment on lines +215 to +216
value for clients to detect a dead connection. This can be shortened by modifying DTLS to send `CloseNotify` alerts when
the UDP socket receives unrecognised data.
Copy link
Contributor

@ShadowJonathan ShadowJonathan Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a few security implications, problems, and nuances, as I note over at ShadowJonathan/DusTLS#7.

There are two issues here, both linked to UDP address spoofing (which is very easy to do);

First, an agent could send garbage data, modifying their own source address, to effectively get an amplification attack out of the server, if it continually sends CloseNotify's to the address it believes is responsible for the garbage data. This can be somewhat mitigated by having a timeout on retransmission of the CloseNotify, but this could make matrix servers (even if so slightly) a vector for DDoS amplification.

Second, this would make MITM resetting connections extremely easy, as a server needs to have a "security context" (active encryption on the connection) to be able to send alerts (of which CloseNotify is one of) securely, this would effectively allow an unsecured CloseNotify packet to control a connection, allowing DoS, combined with the ease of UDP address spoofing, this would be a major vector. This could maybe be mitigated by smart application-level handling of closed connections, and a timeout on how quickly plaintext CloseNotifys can control a connection. The connection itself could be transiently not interrupted by immediately sending a TLS Resume to the server, though then that would become a requirement for any DTLS implementation.

While both could be mitigated, I want to urge that these effectively turn the underlying DTLS layer into an attack vector, so these should really be also noted in the Security Considerations section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make some good points. I don't really see an alternative here though. When the server gets restarted, the client doesn't know this. The client sends encrypted packets and then gets no response. Is that because of network issues or is that because the server has no idea what you're talking about anymore? When does the client decide to send a TLS Resume? Waiting for the timeout takes a very long time, which degrades the user experience significantly.

In addition, OBSERVE subscriptions are lost when the connection is interrupted in this way (as the subscriptions are all in-memory) so it becomes difficult to know if a resubscription is required or not. I don't have any good solutions here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is okay to opt-in to a "have DTLS implementations close connections upon receiving a CloseNotify from the server", but the tradeoffs should be weighed, so i'm basically only asking for this security angle to be repeated in the security considerations sector, with some warnings to implementations to not allow someone to spam CloseNotifys without any realistic interval in between, then i believe this has a proper tradeoff with the problem.

@kegsay kegsay removed the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Dec 17, 2021
@witchent
Copy link

Is this abandoned? I remember it being nearly ready a few years back, yet everything related seems to be stale.

@Saiv46
Copy link

Saiv46 commented Feb 29, 2024

@witchent According to blog post:

In practice, further work on low-bandwidth Matrix is dependent on finding a sponsor who's willing to fund the team to focus on this, as otherwise it's hard to justify spending time here in addition to all the less exotic business-as-usual Matrix work that we need to keep the core of Matrix evolving (finishing 1.0, finishing E2E encryption, speeding up Synapse, finishing Dendrite, rewriting Riot/Android etc).

Since then Synapse is not much faster, and Dendrite is still in development, so there's still a lot of more important work. I think there are some obstacles for implementing and deploying this spec. For example: E2E messages are encrypted JSON that are encoded as Base64, which sounds not very efficient.

@kegsay
Copy link
Member Author

kegsay commented Mar 14, 2024

Realistically I suspect for fields which represent "bytes" (e.g any base64 unpadded fields) they would be represented literally in CBOR.

@Saiv46
Copy link

Saiv46 commented Mar 22, 2024

Realistically I suspect for fields which represent "bytes" (e.g any base64 unpadded fields) they would be represented literally in CBOR.

@kegsay That's still full-size JSON being encrypted. Anyway I think splitting this MCS (like matrix-org/matrix-spec#1736) would be beneficial as it gives time to evaluate better options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API kind:feature MSC for not-core and not-maintenance stuff proposal A matrix spec change proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet