MSC4083: Delta-compressed E2EE file transfers #4083

Open
wants to merge 2 commits into base: main

Member

@ara4n ara4n commented Dec 16, 2023

A rough proposal for delta-compressing file transfers, originally written for Third Room, but apparently I never committed it at the time. I'm submitting it as an MSC for posterity, in the hope that it saves some time the next time someone wants to do incremental binary updates against a file in Matrix. (@hughns: should we ever get back to the Matrix Files SDK, this might be of interest.)

Rendered

@ara4n ara4n added the proposal, client-server, kind:feature, and needs-implementation labels on Dec 16, 2023

`GET /_matrix/media/v3/download/matrix.org/n3wv3rs10n?delta_base=mxc://matrix.org/b4s3v3rs10n`

This would return an ordered multipart download of the deltas (once decrypted, if needed) to apply to the base-version
Member

For encrypted files, how do clients discover the encryption keys for each delta and the base file?

Member Author

I just realised the same thing :) I guess this pushes it back towards putting the delta links on the m.file events rather than the content repository, and perhaps using aggregations as a way to grab all the events needed to download a given file.
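
To make the event-based option a little more concrete, here is a minimal Python sketch of how a client might work out the download order once it has the relevant events in hand. The `FileVersion` type, the `resolve_chain` helper, and the idea that each event exposes its own mxc URL plus an optional `delta_base` are assumptions for illustration only; nothing here is defined by the MSC.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical view of one file event: its media URL and, for deltas, its base."""
    mxc: str
    delta_base: Optional[str] = None  # None means this version is a full snapshot


def resolve_chain(target: str, versions: Dict[str, FileVersion]) -> List[FileVersion]:
    """Return [snapshot, delta_1, ..., target] in the order the deltas must be applied."""
    chain: List[FileVersion] = []
    seen = set()
    current: Optional[str] = target
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in delta chain at {current}")
        if current not in versions:
            raise KeyError(f"missing event for {current}")
        seen.add(current)
        version = versions[current]
        chain.append(version)
        # Walk backwards towards the snapshot this delta was computed against.
        current = version.delta_base
    chain.reverse()
    return chain
```

Each entry in the returned list would carry its own decryption keys via its event, which is the point of moving the delta links onto the events.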

Member Author

@ara4n ara4n Dec 18, 2023

Alternatively, we could be evil and specify the same IV and key for every event which is a diff on a given file, but calculate the actual IV used to encrypt/decrypt the diff as IV' = H(IV, $content_id). This would mean that diffs have to be created as async uploads, so that the client knows their content_id before encrypting them, and the multipart download would have to include content IDs.

I'm not convinced this is better than using an aggregation API to say "give me all the events for the diffs needed to construct this $event_id", and then firing off a tonne of parallel reqs to the media repo to grab the required media files (which is arguably only 2 roundtrips too). But it avoids having to fiddle around with events at all.
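
For reference, the IV' = H(IV, $content_id) idea could look something like the Python sketch below. The comment above does not say what H is, so the use of SHA-256, the concatenation order, the UTF-8 encoding of the content ID, and the truncation to 16 bytes are all assumptions, and `derive_delta_iv` is a hypothetical helper name. This is an illustration of the shape of the derivation, not a reviewed cryptographic construction.

```python
import hashlib


def derive_delta_iv(shared_iv: bytes, content_id: str) -> bytes:
    """Sketch of IV' = H(IV, $content_id); SHA-256 and the truncation are assumptions."""
    if len(shared_iv) != 16:
        raise ValueError("expected a 16-byte shared IV")
    hasher = hashlib.sha256()
    # The shared IV has a fixed length, so appending the content ID is unambiguous.
    hasher.update(shared_iv)
    hasher.update(content_id.encode("utf-8"))
    # Truncate the 32-byte digest to the 16 bytes an AES IV needs.
    return hasher.digest()[:16]
```

Every diff on the same file then gets a distinct IV as long as its content ID is distinct, which is why the content ID has to be known before encryption.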


* `delta_base` is the mxc URL of the content the delta applies to
* `delta_format` is the file format of the binary diff
* This MSC defines `m.vcdiff.v1.gzip` to describe gzipped RFC3284 compatible binary VCDIFF payloads, picked for
Member

Suggested change
* This MSC defines `m.vcdiff.v1.gzip` to describe gzipped RFC3284 compatible binary VCDIFF payloads, picked for
* This MSC defines `m.vcdiff.v1.gzip` to describe gzipped [RFC3284](https://datatracker.ietf.org/doc/html/rfc3284) compatible binary VCDIFF payloads, picked for

computation efficiency rather than patch size (whereas bsdiff + bzip might provide better patch size at worse
computation complexity; other MSCs are welcome to propose different diff formats).
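
As a rough illustration of the `m.vcdiff.v1.gzip` format, the Python sketch below wraps an already-encoded VCDIFF payload in gzip and builds the two fields listed above. Producing the VCDIFF bytes themselves is out of scope here (the standard library has no encoder), and `pack_delta`, `unpack_delta`, and `delta_fields` are hypothetical helper names. Where the two fields live in an event or request is also an assumption.

```python
import gzip

DELTA_FORMAT = "m.vcdiff.v1.gzip"
# RFC 3284 delta files begin with the bytes for 'V', 'C', 'D' with the high bit set.
VCDIFF_MAGIC = b"\xd6\xc3\xc4"


def pack_delta(vcdiff_payload: bytes) -> bytes:
    """Gzip an existing VCDIFF payload so it matches m.vcdiff.v1.gzip."""
    if not vcdiff_payload.startswith(VCDIFF_MAGIC):
        raise ValueError("payload does not start with the VCDIFF magic bytes")
    return gzip.compress(vcdiff_payload)


def unpack_delta(blob: bytes) -> bytes:
    """Reverse pack_delta and sanity-check the result before handing it to a decoder."""
    payload = gzip.decompress(blob)
    if not payload.startswith(VCDIFF_MAGIC):
        raise ValueError("decompressed data is not a VCDIFF payload")
    return payload


def delta_fields(base_mxc: str) -> dict:
    """Build the delta_base / delta_format pair described in the list above."""
    return {"delta_base": base_mxc, "delta_format": DELTA_FORMAT}
```

For encrypted files, the gzipped blob would presumably be what gets encrypted and uploaded, so compression happens before encryption.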

Clients should upload a new snapshot of a piece of content if the sum of the deltas relative to the last snapshot
Member

Clients must also upload a new snapshot when needed to preserve secrecy in encrypted rooms. For example, if a new user joins, a new snapshot must be uploaded; otherwise the new user would need to be able to decrypt the file state from before they joined the room.
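
Putting the size rule and the membership rule together, a client-side check might look roughly like the Python sketch below. The exact size condition is cut off in the quoted line, so the threshold is left as a caller-supplied `size_budget` parameter; that parameter, the function name, and the `member_joined_since_snapshot` flag are all hypothetical.

```python
from typing import Iterable


def should_upload_snapshot(
    delta_sizes: Iterable[int],
    size_budget: int,
    member_joined_since_snapshot: bool,
) -> bool:
    """Decide whether to upload a full snapshot instead of another delta."""
    # Secrecy rule from the comment above: a new member must not need older file state.
    if member_joined_since_snapshot:
        return True
    # Size rule: size_budget stands in for the threshold, which is an assumption here.
    return sum(delta_sizes) > size_budget
```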

file, and then want to express a small change to it (e.g. using the editor to transform part of the scene graph). Or
you might want to store a change to a markdown or HTML file.

Currently, your only option is to save a whole new copy of the file - or invent your own delta-compression scheme at
Contributor

Couldn't you also use an existing delta format, like the one used by OTA updates on Android, and encrypt that separately here? Or is the concern that, due to E2EE shenanigans, intermediate deltas are lost here? (I am not saying that this is a good approach, just an alternative that also came to mind.)

Member Author

But this proposal does propose using an existing delta format (VCDIFF, RFC3284) and encrypting the diffs?

Contributor

I must have been asleep while writing this comment, I guess 😱 Sorry.

3 participants