
MSC4016: Streaming E2EE file transfers with random access #4016

Open · wants to merge 18 commits into base: main
Conversation

@ara4n ara4n commented May 14, 2023

Rendered

  • Needs security review, especially on:
    • IV reuse risk (the current proposal concatenates a 32-bit seqnum with a 96-bit IV to get a 128-bit IV per block)
    • Risk of removing hash linkage from Matrix events to content repository items
    • Whether it's adequate to use block IDs as AAD on GCM to avoid the content repository maliciously mislabelling blocks, or does it need something stronger?
    • Is 32-bits of seqnum space sufficient?

Implementations:

Solves matrix-org/matrix-spec#432.
Provides an alternative to MSC #3888.

@ara4n ara4n marked this pull request as draft May 14, 2023 14:32
@ara4n ara4n added proposal A matrix spec change proposal kind:feature MSC for not-core and not-maintenance stuff labels May 14, 2023
@ara4n (Member Author) commented May 14, 2023

(if this happens, it probably obsoletes #3469, which I think only makes sense if you support the off-standard POST Content-Range headers)

@turt2live turt2live added client-server Client-Server API needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels May 14, 2023
* (The only reason Matrix currently uses AES-CTR is that native AES-GCM primitives weren’t widespread enough on Android back in 2016)
* To protect against reordering attacks, each AES-GCM block has to include an encrypted block header which includes a sequence number, so we can be sure that when we request block N, we're actually getting block N back.
* XXX: is there still a vulnerability here? Other approaches use Merkle trees to hash the AEADs rather than simple sequence numbers, but why?
* We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek while downloading
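As a rough sketch of how Range-based seeking could work against this layout: if blocks are fixed-size, the byte range for block N is directly computable. The header/tag sizes below follow the framing described later in this MSC (4-byte magic, 16-byte block header), but the 16-byte GCM tag placement and the assumption of fixed-size blocks are mine; variable-size blocks would instead need the registration-code resync described in the format section.

```python
# Sketch: compute the HTTP Range header value needed to fetch encrypted
# block N, assuming fixed-size blocks. Sizes are from the MSC's framing
# description, except GCM_TAG_LEN and the fixed-block assumption, which
# are illustrative.

FILE_MAGIC_LEN = 4            # "MXC" 0x03 magic number
BLOCK_HEADER_LEN = 16         # 4 x 32-bit header fields per block
GCM_TAG_LEN = 16              # AES-GCM auth tag assumed appended per block
PLAINTEXT_BLOCK = 32 * 1024   # 32KB default plaintext block size

def range_for_block(n: int) -> str:
    """Return a Range header value covering encrypted block n."""
    block_len = BLOCK_HEADER_LEN + PLAINTEXT_BLOCK + GCM_TAG_LEN
    start = FILE_MAGIC_LEN + n * block_len
    end = start + block_len - 1  # HTTP byte ranges are inclusive
    return f"bytes={start}-{end}"
```

A client seeking to a given playback offset would round down to the containing block, fetch from there, and decrypt forward.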
Contributor

Re @ara4n (from here):

(if this happens, it probably obsoletes #3469, which i think only makes sense if you support the off-standard POST Content-Range headers)

It does, iff this line implies that all Matrix media endpoints must support HTTP Range headers, which isn't clear from this wording.

(Opinion: the Range headers have uses beyond streamed uploading. For example, Element Web currently waits to download an entire video clip before it begins to play, in the worst case buffering the entire video in memory/temp files first. The backing <video> element and Electron/Chromium foundation would switch to chunked downloading and streaming automatically, but simply cannot, because Synapse and other media providers don't support HTTP Range requests.)

ara4n and others added 2 commits June 1, 2023 14:04
@@ -0,0 +1,99 @@
# WIP: MSC4016: Streaming E2EE file transfer with random access and zero latency
Member Author

Random thoughts:

  • Ideally, need to mark a transfer as complete or cancelled (via a relation?).
  • If cancelled, should delete the partial upload (but the partial contents will have already leaked to the other side, of course)

@ara4n ara4n changed the title [WIP] MSC4016: Streaming E2EE file transfers with random access MSC4016: Streaming E2EE file transfers with random access Dec 30, 2023
* As a result, relative to a dedicated file-copying system (e.g. scp) they feel sluggish. For instance, you can’t
incrementally view a progressive JPEG or voice or video file as it’s being uploaded for “zero latency” file
transfers.
* You can’t skip within them without downloading the whole thing (if they’re streamable content, such as an .opus file)
Member

(they can if your server supports Range requests - MMR supports this, all other homeservers don't)

@ara4n (Member Author) commented Dec 31, 2023

MMR doesn't support Range headers on downloads which are still being uploaded, though, does it?

Member

No, but this point in the MSC implies that Range headers aren't supported anywhere.

Comment on lines +172 to +175
The actual file upload can then be streamed in the request body in the PUT (requires HTTP/2 in browsers). Similarly, the
download can be streamed in the response body. The download should stream as rapidly as possible from the media
server, letting the receiver view it incrementally as the upload happens, providing "zero-latency" - while also storing
the stream to disk.
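The streamed PUT body described above amounts to feeding blocks to the HTTP client as they become available. A minimal sketch of the chunking side (encryption elided; the generator-body approach assumes an HTTP client that accepts one, which most do for chunked transfer):

```python
# Sketch: yield plaintext blocks from a source stream as they become
# available, so an HTTP client that accepts a generator as a request
# body can stream the PUT incrementally. Block size matches the MSC's
# suggested 32KB default; the encrypt step is intentionally omitted.
from typing import BinaryIO, Iterator

def iter_blocks(src: BinaryIO, block_size: int = 32 * 1024) -> Iterator[bytes]:
    """Yield successive blocks of up to block_size bytes from src."""
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        yield chunk
```

Each yielded block would be encrypted and framed before hitting the wire; the receiver symmetrically decrypts block-by-block while also spooling to disk.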
Member

There's a financial limitation here, at least for media servers using a CDN. The CDN is primarily intended to reduce bandwidth costs, but if the server is being asked to download a piece of media that hasn't finished uploading yet, then the CDN could cache a partial file. The media server is then forced to serve the partial file itself from storage, which may incur additional bandwidth fees. Especially so if the storage is network-operated as well.

For small files (<100mb), the async upload endpoint is probably fine enough. There's no real need for zero latency file transfers because the files are already transferred pretty quickly. For larger files, it's more likely that the file doesn't need to be sent instantly between two parties as there is likely a delay in when the receiver even notices the file being uploaded. This affords the sender some time to finish the entire upload process.

Or in short: the cost of bandwidth outweighs the cost of a "slow" download, imo.

@ara4n (Member Author) commented Jan 2, 2024

I agree that streaming file transfers don't play that nicely with CDNs (but might be okay - need to actually see how they interact). I don't think i follow the logic here, though: typical use cases I have in mind here are:

  • Send an 8MB photo to someone as a progressive JPEG. They instantly get a file event with a blurhash, and a thumbnail which then streams in as rapidly as it possibly can. If they're near each other, they get a real "wow, this thing is magically fast" warm fuzzy feeling.
  • Send a voice message in broadcast mode (say 1MB, which is 4 minutes of 32kbps opus). Folks can hop in and listen as it broadcasts (and randomly seek around it via Range headers)
  • Similarly for video broadcasts (say 150MB, which is 4 minutes of 5Mbps 720p H.264). Obviously this isn't going to work as well as a WebRTC-based conferencing platform, but i see it as being complementary (and could even interface with it in future, if the SFU published recordings of its streams as streaming files like this)
  • Send someone a big file (e.g. a 650MB ISO) - they can download it as it uploads.

Now, once any upload has completed, then normal CDN semantics can kick in. So, yes: it's possible CDNs won't be able to cache downloads which are still uploading. But I think the financial cost of supporting zero-latency transfers could be worth it as a value-added feature for the edge-case where people download concurrently with the upload. And if the server admin doesn't want to risk that cost, they can simply turn it off on their media repo.

Comment

I'm not 100% sure that I understand the use case for this. I mean, it's cool, but ...

  • For images, m.image already has the thumbnail. Even at small sizes like 800x600 (and 80% JPEG quality!) I struggle to notice much difference on a mobile device. (Admittedly, for younger humans with better eyesight and newer devices the threshold where a thumbnail is "good enough" is probably different, but it still exists.) And while waiting for the thumbnail itself to load, there is BlurHash or ThumbHash.

  • For video, I want something that I can hand off to the platform's native video player and say "Play this", without me as the client having to be intimately involved in wrangling every block of content. I'm not as familiar with audio players, but I assume it's the same? This is why I still want to write an MSC for "HLS over Matrix media". Store the (encrypted) files in the content repository, and in the event you put everything you need to reconstruct an HLS playlist. Hand that playlist to the player, and bam, you're done. The downside here is that HLS encryption kinda sucks -- IIRC it's CBC with no authentication, or something equally outdated -- but it's a security vs usability tradeoff.

  • For sending a big file, this makes sense. But do you often need both low latency and keeping the file forever on the server? For a low-latency point-to-point transfer, can't WebRTC do that?

@ara4n (Member Author) commented Jan 18, 2024

Yup, these are all fair points. The use case is more: "file transfers appear instantly to the recipient on the sender hitting send, making the app feel magically fast (a bit like Instagram's hack of proactively uploading files to the server in the background while the user's still typing the caption)". So not only would the blurhash pop up instantly, but the 800x600 thumbnail would then replace it as immediately as possible, even on crap networks, quite aside from the full-res transfer. Now, totally agreed this is completely finetuning perf and UX, but (amazingly) we're pretty much at the point where this level of UX snappiness and polish is where the battle's at.

For audio and video pseudo-streaming, we could absolutely send a series of M3U-esque playlist updates over instead, which is pretty much what MSC3888 does. However, this does feel a bit fiddly, and I'm not sure that the benefits of being able to format it as real M3U and pass it straight into an HLS player (complete with crappy unauthed CBC encryption) are really worth it. (I guess you could get support for variable bandwidth by including different stream resolutions, though - and you get the potential benefit of commitment hashes, as you mention below). In practice, simply being able to decrypt a single stream of data as per this MSC and pass it into an <audio/> or <video/> tag surprisingly works well for 'casual' streaming (having now done it pre-Matrix, and also in a private MSC4016 test jig)

For file transfer: sure, you could do WebRTC, but having the server relay means that a) you can do one-to-many transfer efficiently, b) you can do resumable uploads to a single place, c) you can do resumable downloads from a single place, d) you don't need the sender & recipient online at the same time, e) you hopefully get CDN for free, f) you don't need to worry about TURN, g) your client doesn't need a WebRTC stack. If plain old HTTP file transfers give you this for free already, why not use it?

I'll plonk this all into the alternatives, though.

TODO: We need a way to mark a transfer as complete or cancelled (via a relation?). If cancelled, the sender should
delete the partial upload (but the partial contents will have already leaked to the other side, of course).

TODO: While we're at it, let's actually let users DELETE their file transfers, at last.
Member

(this seems best as a dedicated MSC - feels too detached from streaming to include here)

Comment on lines +24 to +26
Relatedly, v2 MXC attachments can't be stream-transferred, even if combined with [MSC2246]
(https://github.com/matrix-org/matrix-spec-proposals/pull/2246), given you won't be able to send the hash in the event
contents until you've uploaded the media.
Member

What is a v2 MXC attachment?

@ara4n ara4n marked this pull request as ready for review January 5, 2024 15:56
Comment on lines +129 to +154
* File header with a magic number of: 0x4D, 0x58, 0x43, 0x03 ("MXC" 0x03) - just so `file` can recognise it.
* 1..N blocks, each with a header of:
  * a 32-bit field: 0xFFFFFFFF (a registration code to let a parser handle random access within the file)
* a 32-bit field: block sequence number (starting at zero, used to calculate the IV of the block, and to aid random
access)
* a 32-bit field: the length in bytes of the encrypted data in this block.
* a 32-bit field: a CRC32 checksum of the block, including headers. This is used when randomly seeking as a
consistency check to confirm that the registration code really did indicate the beginning of a valid frame of
data. It is not used for cryptographic integrity.
* the actual AES-GCM bitstream for that block.
* the plaintext block size can be variable; 32KB is a good default for most purposes.
* Audio streams may want to use a smaller block size (e.g. 1KB blocks for a CBR 32kbps Opus stream will give
250ms of streaming latency). Audio streams should be CBR to avoid leaking audio waveform metadata via block
size.
* The block is encrypted using an IV formed by concatenating the block sequence number of the `file` block with
the IV from the `file` block (forming a 128-bit IV, which will be hashed down to 96-bit again within
AES-GCM). This avoids IV reuse (at least until it wraps after 2^32-1 blocks, which at 32KB per block is
137TB (18 hours of 8k raw video), or at 1KB per block is 4TB (34 years of 32kbps audio)).
* Implementations MUST terminate a stream if the seqnum is exhausted, to prevent IV reuse.
* Receivers MUST terminate a stream if the seqnum does not sequentially increase (to prevent the server from
shuffling the blocks)
        * XXX: Alternatively, we could use a 64-bit seqnum, but spending 8 bytes of header on seqnums feels like a
          waste of bandwidth just to support massive transfers. And we'd have to manually hash it with the 96-bit IV
          rather than use the GCM implementation.
* The block is encrypted including the 32-bit block sequence number as Additional Authenticated Data, thus
stopping encrypted blocks from impersonating each other.
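The framing above can be sketched with nothing but stdlib `struct` and `zlib`. Two caveats: the MSC excerpt doesn't state field endianness (big-endian/network order is assumed here), and "a CRC32 checksum of the block, including headers" can't literally include the CRC field itself, so this sketch computes it with that field zeroed.

```python
# Sketch of the per-block framing and the concatenated-IV construction
# described above. Endianness and the zeroed-CRC convention are my
# assumptions; field widths match the draft.
import struct
import zlib

REGISTRATION_CODE = 0xFFFFFFFF

def pack_block(seqnum: int, ciphertext: bytes) -> bytes:
    """Frame one encrypted block: registration code, seqnum, length,
    CRC32 (computed over header + ciphertext with the CRC field zeroed),
    then the AES-GCM ciphertext itself."""
    header = struct.pack(">III", REGISTRATION_CODE, seqnum, len(ciphertext))
    crc = zlib.crc32(header + b"\x00\x00\x00\x00" + ciphertext) & 0xFFFFFFFF
    return header + struct.pack(">I", crc) + ciphertext

def block_iv(file_iv_96bit: bytes, seqnum: int) -> bytes:
    """32-bit seqnum || 96-bit per-file IV -> 128-bit GCM IV, which the
    AES-GCM implementation then hashes down to 96 bits internally. The
    seqnum is also passed separately as AAD so blocks can't impersonate
    each other (not shown here)."""
    return struct.pack(">I", seqnum) + file_iv_96bit
```

When randomly seeking, a parser scans for the 0xFFFFFFFF registration code and uses the CRC as a consistency check that it really found a frame boundary, as the draft says; the CRC carries no cryptographic weight.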
Comment

So it's basically AES-GCM with ESSIV, right? That looks pretty reasonable to me.

Member Author

Yes... although I have to admit I hadn't come across ESSIV before :S. It looks to be the same idea though.

Comment

After a bit more thought on this, I think maybe you could do better.

Don't worry about fitting your identifiers directly into the 128 bits that GCM takes as input.

If you look at what ESSIV is actually doing, it's mixing in the key to provide a stronger source of randomness. It's using (a hash of) the key because it was made to operate in a mode where it didn't have a separate IV available -- that's why it needed a synthetic one. Then ESSIV computes a function of the secret and the block number, and uses that as the IV for the cipher.

You can do something similar here.

  • Let N be your nonce (the "iv" in the JSON) from the room event
  • Let IV be the "iv" for the GCM
  • Let b be the block sequence number

Let your nonce N be big enough to serve as a modern cryptographic key, like 256 or 512 bits. Then you can use it as a key to a pseudorandom function on the block sequence number, and use that output as your IV.

Let IV = F(N, b)

Here your PRF F could be HMAC-SHA256 or HMAC-SHA512, truncated to 128 bits to fit into GCM.

--

Is that materially better than what you have already? I have no idea. But it frees you to use an arbitrarily large block sequence number without any significant loss in security.
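The IV = F(N, b) construction above is a few lines of stdlib code. This is a sketch of the commenter's proposal, not anything specified by the MSC yet; the 256-bit nonce size and HMAC-SHA256 as the PRF F follow the suggestion in the comment.

```python
# Sketch: derive a per-block 128-bit GCM IV as IV = F(N, b), where F is
# HMAC-SHA256 truncated to 128 bits, N is a 256-bit per-file nonce from
# the room event, and b is the block sequence number (64-bit here, since
# this construction frees us from the 32-bit seqnum limit).
import hashlib
import hmac
import struct

def derive_block_iv(nonce: bytes, seqnum: int) -> bytes:
    """Return a 128-bit IV unique per (file, block)."""
    mac = hmac.new(nonce, struct.pack(">Q", seqnum), hashlib.sha256)
    return mac.digest()[:16]
```

Because the nonce is unique per file and the PRF output is a function of (nonce, seqnum), IVs never collide across blocks or files, without needing to budget IV bits against seqnum bits.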

Comment

Or I suppose you could dispense with the nonce entirely, and use a hash of the key exactly like in ESSIV, in place of my N in the previous comment.

Member Author

Right, thanks for laying out the counterproposal :) I agree it feels better and avoids the tradeoff around block size, although it does complicate the implementation slightly (whereas simply concatenating the 96-bit nonce + 32-bit block ID to create the 128-bit IV meant that your AES-GCM lib can do the hashing for you). But I guess it's not a big additional complexity and is probably worth it.

That said, I'm a bit more worried about whether I'm missing an attack in terms of no longer linking the attachment hashes to the Matrix events - or in terms of the various AES-GCM chunks being independent (other than via their block ID).

Comment

That said, I'm a bit more worried about whether i'm missing an attack in terms of no longer linking the attachment hashes to the matrix events

I think you're ok here? The secret random key is the link, right? You don't need a hash because you have something better - the GCM authentication tag thingy that can only be computed from the key.

If the adversary can trick you into accepting bogus data, then either (1) he has your key, (2) you re-used a key - shame on you, (3) you encrypted way too much data and overflowed your block ID - shame on you, or (4) he broke AES-GCM.

(This is the kind of thing where you really want to get someone else to check me on this.)

or in terms of the various aes-gcm chunks being independent (other than via their block ID).

Since the block ID is part of the authenticated data, I think you're OK here too. The adversary can't move a block from one location to another within a file, because the block ID is protected by GCM. The adversary can't move blocks around between files because each file has its own unique key.

Member Author

If the adversary can trick you into accepting bogus data, then either (1) he has your key,

I think the attack that's worrying me is that the sender can now switch attachment on you in retrospect on an event - or serve different attachments to different users/servers. (I guess this might be seen as a desirable property in terms of deniability). So for instance, Oscar the opponent sends an m.image into a room, and serves an abusive image to users on matrix.org, but an innocent image to (say) element.io where the moderators live. They then switch back to an innocent image to everyone and claim they're being framed.

They could also try to do this today by sending different dangling events in the room DAG to the user server and the moderator server, but at least this leaves an audit trail (and well-behaved servers will propagate the dangling events to the full mesh when they next send a message).

With unlinked attachments, well-behaved servers could provide an audit trail by tracking a hash of the content they serve their users. However, it doesn't help with a malicious local server which conspires with Oscar to deliberately serve innocent content to mods and malicious content to users. But arguably such a malicious local server could equally spoof CS API traffic on behalf of Oscar, irrespective of DAG integrity, complete with false hashes. So perhaps it's okay?

Either way, I think we need to reason through the threat model more carefully...

@ara4n (Member Author) commented Jan 6, 2024

or in terms of the various aes-gcm chunks being independent (other than via their block ID).

The attacks on this one I've been thinking about are:

  1. a malicious server conspiring with Oscar can fabricate different chunks to different users at different times (similar to the previous attack)
  2. a malicious server can serve any block it likes to the user if the user randomly seeks. So, if the user seeks with a Range header to offset 10MB, the server could return some arbitrary high block index, which depending on the file format could be quite misleading - e.g. pretending that a hunk of voice message never happened by refusing to ever serve its blocks. Or even editing out chunks of conversation to give it a completely different meaning, by censoring blocks. Obviously the client could spot these discontinuities, though, so perhaps it's not that problematic.

Comment

I think the attack that's worrying me is that the sender can now switch attachment on you in retrospect on an event - or serve different attachments to different users/servers.

Ah, right. Yikes. I was thinking more of the basic "Alice and Bob" model, where the two ends of the conversation are trusted.

So previously the hash in the m.file structure was not just for basic integrity checking. It was also serving as a sort of basic commitment protocol to prevent these Oscar attacks.

For the live-streaming media case, it's hard to see how we could make Oscar commit to some data, when that data maybe doesn't even exist yet.


For comparison, my idea for HLS was pretty janky but I think it avoids this issue. Define some new msgtype like m.video.hls that contains a JSON version of the M3U playlist, with the mxc:// URLs for each media segment. The sender sends the first event containing URLs to the initial media. Then as the stream progresses, the sender uploads the new media segments and sends new m.video.hls events containing the full new playlist, with relation of m.replace pointing back to the original event.

Like I said, janky. But each room event commits to the full set of media content at the time.

Maybe you could do something similar here? Replace the room event every so often with some sort of commitment to the media content?
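To make the janky-but-committing HLS idea above concrete, here's a purely illustrative sketch of what such an event might look like. The `m.video.hls` msgtype and every field name under `playlist` are hypothetical (nothing here is specified anywhere); only `m.relates_to`/`m.replace` are real Matrix relation semantics.

```python
# Hypothetical "m.video.hls" event sketch, per the comment above: each
# m.replace edit re-sends the full playlist-so-far, so every room event
# commits to the complete set of media segments at that point in time.
# All playlist field names are invented for illustration.
event = {
    "type": "m.room.message",
    "content": {
        "msgtype": "m.video.hls",          # hypothetical msgtype
        "body": "live stream",
        "playlist": {                      # JSON rendering of an M3U playlist
            "target_duration": 4,
            "segments": [
                {"url": "mxc://example.org/abc123", "duration": 4.0},
                {"url": "mxc://example.org/def456", "duration": 4.0},
            ],
        },
        "m.relates_to": {                  # real relation semantics
            "rel_type": "m.replace",
            "event_id": "$original_event_id",
        },
    },
}
```

Each replacement event grows the `segments` list, so a well-behaved server's event history forms the audit trail of what media was committed when.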

Comment on lines +150 to +152
* XXX: Alternatively, we could use a 64-bit seqnum, spending 8 bytes of header on seqnums feels like a waste
of bandwidth just to support massive transfers. And we'd have to manually hash it with the 96-bit IV
rather than use the GCM implementation.
Comment

Do you really need a 96-bit "IV" for the file? I am really asking here -- this is a bit beyond the level of my expertise.

(Really it's more of a nonce anyway, but the "iv" key is already there in the existing JSON structure so whatever...)

Member Author

96-bit is the default for GCM - anything bigger gets hashed down internally to 96-bits. The proposed actual IV used is H(96-bit nonce || 32-bit block ID). I guess we could use H(64-bit nonce || 64-bit block ID) to further reduce chance of IV reuse by block ID wrapping around, but means that the entropy in the IV is significantly reduced to 64-bit. I'm not sure what the best tradeoff is. It always felt questionable that v2 attachments only used 64-bits of entropy in the IV, and this seems like an opportunity to do better.

(I'm calling it an IV even though it's a nonce, 'cos that's what WebCrypto calls it too :)

Comment

The problem is that the block IDs repeat for every file, right? So you don't ever want the main "IV" to be repeated for two different files.

With 96 bits you should be fine. The birthday bound says you can do up to about 2^48 files with that.

The only question is whether you want to push your luck by allocating a few more bits to the block ID and a few less to the main IV/nonce. Like 80 / 48 instead of 96 / 32. Then you could do up to 2^40 files.

Again, probably not worth bothering about it. 4 TB should be enough for anybody.
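The birthday arithmetic in the two comments above checks out with a one-liner: with an n-bit random per-file nonce, the collision probability across F files is roughly F² / 2^(n+1).

```python
# Back-of-envelope birthday bound for per-file nonce collisions:
# probability ~ F^2 / 2^(n+1) for F files and an n-bit random nonce.
def collision_probability(files: float, nonce_bits: int) -> float:
    return files**2 / 2 ** (nonce_bits + 1)
```

With 96 nonce bits, 2^48 files sits right at the ~50% bound, while 2^40 files leaves a comfortable 2^-17; the suggested 80/48 split moves the 50% point down to 2^40 files, matching the figures in the comment.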

Comment on lines +248 to +249
* Variable-size blocks could leak metadata for VBR audio. Mitigation is to use CBR if you care about leaking voice
traffic patterns (constant-size blocks aren't necessarily enough, as you'd still leak the traffic patterns)
Comment

Two comments here:

  1. For audio, it's hard to say for sure, but at the window sizes we're talking about here -- like 250 ms -- it's not going to be nearly as much of an issue as when the attacker can see the raw stream of packets of an SRTP stream. In the original attack paper we were looking at like 20ms of data in each Speex VBR frame. But yeah, it's always safer not to chance it.

  2. This is also a concern for VBR video. There was a paper 10-15 years ago where they could identify major movies in encrypted streams just by watching the bit rate. Things like explosions take a lot of data to encode, so they cause a burst in the data rate.

Member Author

For audio I've been using a window size of 20ms, to keep it nice and low latency :) Good point on VBR video.


The file is uploaded asynchronously using [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246).

The proposed v3 `EncryptedFile` block looks like:
Comment

Dumb question, but doesn't this mean I'm allowed to stream /dev/null to my friend to fill up their hard drive?

Member Author

I think you'll have more joy streaming /dev/zero or /dev/random than /dev/null - but yes ;) Just as I could send you an HTTP link right now to a CGI script which cats /dev/zero at you to fill up your HDD :) I guess the point is that sending & receiving servers (and receiving clients) might want to enforce a limit on file transfer size to stop this sort of silliness - will add to the caveats; thanks.

* tus looks to be under consideration by the IETF HTTP working group, so we're hopefully picking the right protocol for
resumable uploads.
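For reference, the resumable-upload flow tus defines is simple at the header level: a creation POST declares the total length, PATCHes carry chunks at a declared offset, and a HEAD request recovers the offset after an interruption. The header names below come from the tus 1.0.0 specification; the helpers themselves are just an illustrative sketch.

```python
# Sketch of the tus v1.0.0 headers used for resumable uploads. The
# protocol flow: POST (create, server returns a Location), then PATCH
# chunks at increasing offsets; on resume, HEAD the upload URL and read
# its Upload-Offset response header to find where to continue.
def creation_headers(total_bytes: int) -> dict:
    """Headers for the creation POST."""
    return {
        "Tus-Resumable": "1.0.0",
        "Upload-Length": str(total_bytes),
    }

def patch_headers(offset: int) -> dict:
    """Headers for a PATCH carrying a chunk starting at `offset`."""
    return {
        "Tus-Resumable": "1.0.0",
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }
```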

## Limitations
Member Author

Another limitation here is that the custom file format means that interop with other E2EE systems (e.g. Android Messages) becomes significantly harder and less likely to work out of the box. Using a preexisting format like AES-CTR at least makes interop easier, rather than having to persuade WhatsApp or Signal or whoever to adopt an entirely novel format like MSC4016.
