MSC3910: Content tokens for media #3910

richvdh · 2022-10-14T21:13:49Z

richvdh · 2022-10-14T21:27:25Z

proposals/3910-content-tokens-for-media.md

+* There is currently no way to delete
+  media. [matrix-spec#226](https://github.com/matrix-org/matrix-spec/issues/226)
+* If a user requests GDPR erasure, their media remains visible to all.
+* When all users leave a room, their media is not deleted from the server.


@turt2live says: fwiw, while the spec allows servers to GC rooms, Synapse currently doesn't (no server does, I believe)

@richvdh says: yes, this is matrix-org/synapse#4720.

richvdh · 2022-10-14T21:28:27Z

proposals/3910-content-tokens-for-media.md

+       }
+     }
+     ```
+   * Encrypted media messages must include the `content_token` in the cleartext:


@turt2live says:

For clarity, would we do the same as relations and say that if the payload also has a content_token that it is ignored? ie: Client sends content_token in payload but not cleartext, or sends it in both.

@richvdh says:

Same as relations, surely.

richvdh · 2022-10-14T21:30:04Z

proposals/3910-content-tokens-for-media.md

+   | [`GET /_matrix/media/v3/download/{serverName}/{mediaId}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3downloadservernamemediaid)            | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}`            | `GET /_matrix/federation/v1/media/download/{serverName}/{mediaId}`  |
+   | [`GET /_matrix/media/v3/download/{serverName}/{mediaId}/{fileName}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3downloadservernamemediaid) | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}/{fileName}` | N/A                                                                 |
+   | [`GET /_matrix/media/v3/thumbnail/{serverName}/{mediaId}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3thumbnailservernamemediaid)          | `GET /_matrix/client/v1/media/thumbnail/{serverName}/{mediaId}`           | `GET /_matrix/federation/v1/media/thumbnail/{serverName}/{mediaId}` |


@turt2live says:

(origin,mediaId) tuple - media ID is not globally unique on its own.

... but I can't figure out what he's referring to.

somewhere in the doc it said "look up the media in the cache by media ID" - I think HackMD was trying to be helpful and put the comment in a weird place.

richvdh · 2022-10-14T21:31:19Z

proposals/3910-content-tokens-for-media.md

+   [Question: should we move `/config`, `/upload` and `/preview_url` while
+   we're at it, per [MSC1902](https://github.com/matrix-org/matrix-spec-proposals/pull/1902)?]


@turt2live says:

yes please, though we should probably leave splitting endpoints out to its own MSC anyways - this to-be MSC could depend on MSC1902.

Context: The title of the MSC imposes "problems" on what can be included in that MSC. Though relevant to the MSC itself, it's not obvious from the title that we'd also be changing where the endpoints live (and making the title a paragraph is really just crap). Suggestion is always to just split out functionality when one would add "and" to the title.

@richvdh says:

The problem with having separate MSCs is that, if we land MSC1902 without the auth changes, then we need another way to distinguish clients that support auth from those that don't.

In this proposal, we can tell when it's safe to turn off the backwards-compatibility "you can get media without auth" behaviour just by looking at the access logs and seeing who is hitting the old endpoint. If we extend existing endpoints, that gets way harder.

richvdh · 2022-10-14T21:31:53Z

proposals/3910-content-tokens-for-media.md

+
+   **Two** authorization headers must be given when calling these endpoints:
+
+   * `Authorization` must be set the same as for any other CSAPI or federation


@turt2live says:

nit: clients can still auth with ?access_token, but obviously discouraged. We should probably just kill that feature in the spec (independent of this MSC)

richvdh · 2022-10-14T21:32:51Z

proposals/3910-content-tokens-for-media.md

+   `X-Matrix-Content-Token` header, and returns either the media or a 403
+   error.
+
+   If the media is uncached, the user's server makes a request to `GET


@uhoreg says:

Would it make sense to first check that you have an event that references that content_token before downloading remote media?

@richvdh says:

🤔 I guess it might

richvdh · 2022-10-14T21:33:44Z

proposals/3910-content-tokens-for-media.md

+The bridge might also choose to embed information such as the room that
+referenced the media, and the time that the link was generated, in the URL.
+(Such information would require an HMAC to prevent tampering.) This could be
+used to help control access to the media.


@turt2live says:

Sending a meta event into the room would be important for the server to flag a "usage" of the content, otherwise it'd be subject to expiration/GC.

richvdh · 2022-10-14T21:34:35Z

proposals/3910-content-tokens-for-media.md

+(This was previously discussed in
+[MSC2858](https://github.com/matrix-org/matrix-spec-proposals/pull/2858#discussion_r543513811).)
+
+## Potential issues


@turt2live says:

With the proposal above, the origin server for a piece of media could count a piece of media (it's original copy) as "unreferenced" if all visible copies are dereferenced. Specifically, if a user on alice.com sends something to a room and a user from bob.com forwards that to a room visible to charlie.com, alice.com could eventually consider the media unreferenced and charlie.com users wouldn't be able to download it.

richvdh · 2022-10-14T21:35:28Z

proposals/3910-content-tokens-for-media.md

+(This was previously discussed in
+[MSC2858](https://github.com/matrix-org/matrix-spec-proposals/pull/2858#discussion_r543513811).)
+
+## Potential issues


@turt2live says:

Issue 2: Custom emoji and such rely on <img /> tags in the message HTML - clients (and the spec) generally require that the img src be an MXC URI - presumably we just add a mx_content_token attribute or something to fix this case.

richvdh · 2022-10-14T21:36:16Z

proposals/3910-content-tokens-for-media.md

+   b. Backwards compatibility with older clients and federating servers:
+   servers may for a short time choose to allow unauthenticated access via the
+   deprecated endpoints, even for media with an associated `content_token`.


@turt2live says:

I'm not sure this would be needed? The origin server (over federation) should still support the request if it also produced a content_token.

Consider a situation where @alice:orgin.com is using a new server, and has uploaded some media. The origin.com server now needs to deal with two sets of legacy implementations:

@bob:origin.com, who is using an old client which doesn't know how to use content tokens.

@charlie:remote.com, where remote.com is an old server implementation. Whether or not charlie's client understands content tokens, remote.com isn't going to forward them.

In either case, origin.com will receive a request to the deprecated endpoint, and may choose to honour it (for now).

Somewhere it was clarified that this was talking about old clients talking to old servers in rooms where "new" media is posted - this wasn't immediately clear from the text, so I think it just needs a relatively minor clarification?

ghost · 2022-10-16T17:12:53Z

proposals/3910-content-tokens-for-media.md

+use the server) *and* the `content_token` (to prove they have the right to see
+the media in question).
+
+Servers can count the number of references to a given `content_token`, and


Suggested change

Servers can count the number of references to a given `content_token`, and

Servers should count the number of references to a given `content_token`, and

This should hopefully make things better for every server?

ghost

Could we patch the metadata leaks later with another MSC for example?
Because with P2P we will eventually want to minimize the metadata leaked by the protocol

Is more metadata leaked than before this MSC?

richvdh · 2022-10-16T21:15:59Z

proposals/3910-content-tokens-for-media.md

+   * And `m.room.member` must include an `avatar_content_token`:
+     ```json
+     {
+       "type": "m.room.member",
+       "state_key": "@alice:example.org",
+       "content": {
+         "avatar_url": "mxc://example.org/AQwafuaFswefuhsfAFAgsw",
+         "avatar_content_token": "hunter2",
+         "displayname": "Alice Margatroid",
+         "membership": "join"
+       }
+     }
+     ```


also m.presence

richvdh · 2022-10-16T22:31:49Z

proposals/3910-content-tokens-for-media.md

+         "body": "filename.jpg",
+         "msgtype": "m.image",
+         "url": "mxc://example.com/AQwafuaFswefuhsfAFAgsw",
+         "content_token": "hunter2"


Spotted by @uhoreg: a server doesn't, in general, know which event types reference media, and where in their structure the media url would be.

There might also be more than one media item. So instead, we have to define content_token to be a map at the top level of the event, with keys being the hash of an mxc url (to uniquely identify the media, without leaking the mxc uri outside encrypted events).

this doesn't neccesarily solve the concern about media embedded into the content of an event, like custom emoji: #3910 (comment)

(there's no key on the json for such media references)

I don't think that's a problem. It doesn't need a key in the json for the media references; we can still include a content_token map with inline images. e.g.

{ "body": "some fallback", "formatted_body": "<img src=\"mxc://foo/bar\">", "content_token": { "hash_of_mxc://foo/bar": "token" } }

richvdh · 2022-10-16T22:52:54Z

proposals/3910-content-tokens-for-media.md

+* Rather than trust users not to pass the content_token around, we could tie the media more
+  strongly to an event, and then check that the user has access to the event. For example:
+
+  * Uploading does not itself make the content visible by *anyone*; it does return a nonce
+  * Sending events takes a `media_nonce` param. The media gets associated with
+    the event and becomes visible to anyone who can see the event. The nonce
+    gets invalidated.
+  * For profile pics, the media gets duplicated for each m.room.member event it
+    ends up in.
+  * When the media is served over federation, we check if anyone on the server
+    has a right to see it (as we would with its associated event), and if so,
+    include the associated event in the response. The requesting server then
+    filters further.
+
+  (credit: based on ideas at
+  https://cryptpad.fr/code/#/2/code/view/oWjZciD9N1aWTr1IL6GRZ0k1i+dm7wJQ7juLf4tJRoo/)
+
+  Problems with this approach:
+   * Forwarded media would require either a re-upload, or a server-side duplication.
+     (Note that matrix-media-repo has a [LocalCopy](https://github.com/turt2live/matrix-media-repo/blob/master/api/unstable/local_copy.go#L19) implementation for this.)


I've now written up something much like this at MSC3911.

proposals/3910-content-tokens-for-media.md

dkasak · 2022-11-03T16:47:58Z

proposals/3910-content-tokens-for-media.md

+* Put the `content_token` within the encrypted content, for encrypted data.
+  This reduces the metadata leak, but means that none of the automated media
+  removal functionality works.


I don't think putting the content_token in the cleartext actually solves the problem for automated media removal. And I don't think it can, fundamentally, because we can't force clients to cooperate in the metadata leak.

A non-cooperating client can always:

Implement support for content_token within the encrypted payload.

Upload the media.

Reference it from one or more junk catch-all rooms in the cleartext, to ensure the server considers it referenced.

Put content_token in the encrypted payload for all real references.

Implement support for content_token within the encrypted payload.

Well, that won't help the non-cooperating client if it wants to talk to cooperating clients.

True... Which seems like a mess? Game theoretically speaking, the cooperating clients would lose out on some messages compared to non-cooperating clients worried about metadata leaks, so the equilibrium would tend towards all clients supporting both methods.

Guess so.

Reference it from one or more junk catch-all rooms in the cleartext, to ensure the server considers it referenced.

this won't help unless both rooms have similar servers in, though?

... Anyway, I think MSC3911 was a better idea than this msc. It's probably not worth spending too much time on.

richvdh · 2022-11-04T15:04:31Z

I'm going to close this as I think MSC3911/MSC3916 is a better solution.

Content tokens for media

6d0c962

prior art

e43ce00

richvdh commented Oct 14, 2022

View reviewed changes

fix prior art

221f56a

ghost reviewed Oct 16, 2022

View reviewed changes

richvdh commented Oct 16, 2022

View reviewed changes

turt2live added the hacktoberfest-accepted label Oct 16, 2022

richvdh commented Oct 16, 2022

View reviewed changes

deepbluev7 reviewed Oct 20, 2022

View reviewed changes

proposals/3910-content-tokens-for-media.md Show resolved Hide resolved

dkasak reviewed Nov 3, 2022

View reviewed changes

richvdh closed this Nov 4, 2022

turt2live added the obsolete A proposal which has been overtaken by other proposals label Nov 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MSC3910: Content tokens for media #3910

MSC3910: Content tokens for media #3910

richvdh commented Oct 14, 2022 •

edited

Loading

richvdh Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

turt2live Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

richvdh Oct 14, 2022

turt2live Oct 14, 2022

ghost Oct 16, 2022

ghost left a comment •

edited by ghost

Loading

richvdh Oct 16, 2022

richvdh Oct 16, 2022

turt2live Oct 16, 2022

uhoreg Oct 18, 2022

richvdh Oct 16, 2022

dkasak Nov 3, 2022

richvdh Nov 4, 2022

dkasak Nov 4, 2022

richvdh Nov 4, 2022

richvdh commented Nov 4, 2022 •

edited

Loading

		[Question: should we move `/config`, `/upload` and `/preview_url` while
		we're at it, per [MSC1902](https://github.com/matrix-org/matrix-spec-proposals/pull/1902)?]


		Two authorization headers must be given when calling these endpoints:

		* `Authorization` must be set the same as for any other CSAPI or federation

	Servers can count the number of references to a given `content_token`, and
	Servers should count the number of references to a given `content_token`, and

MSC3910: Content tokens for media #3910

MSC3910: Content tokens for media #3910

Conversation

richvdh commented Oct 14, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost left a comment • edited by ghost Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richvdh commented Nov 4, 2022 • edited Loading

richvdh commented Oct 14, 2022 •

edited

Loading

ghost left a comment •

edited by ghost

Loading

richvdh commented Nov 4, 2022 •

edited

Loading