Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3910: Content tokens for media #3910

Closed
wants to merge 3 commits into from

Conversation

richvdh
Copy link
Member

@richvdh richvdh commented Oct 14, 2022

@turt2live turt2live added proposal A matrix spec change proposal s2s Server-to-Server API (federation) client-server Client-Server API security kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels Oct 14, 2022
* There is currently no way to delete
media. [matrix-spec#226](https://github.com/matrix-org/matrix-spec/issues/226)
* If a user requests GDPR erasure, their media remains visible to all.
* When all users leave a room, their media is not deleted from the server.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says: fwiw, while the spec allows servers to GC rooms, Synapse currently doesn't (no server does, I believe)

@richvdh says: yes, this is matrix-org/synapse#4720.

}
}
```
* Encrypted media messages must include the `content_token` in the cleartext:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

For clarity, would we do the same as relations and say that if the payload also has a content_token that it is ignored? ie: Client sends content_token in payload but not cleartext, or sends it in both.

@richvdh says:

Same as relations, surely.

Comment on lines +106 to +108
| [`GET /_matrix/media/v3/download/{serverName}/{mediaId}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3downloadservernamemediaid) | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}` | `GET /_matrix/federation/v1/media/download/{serverName}/{mediaId}` |
| [`GET /_matrix/media/v3/download/{serverName}/{mediaId}/{fileName}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3downloadservernamemediaid) | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}/{fileName}` | N/A |
| [`GET /_matrix/media/v3/thumbnail/{serverName}/{mediaId}`](https://spec.matrix.org/v1.4/client-server-api/#get_matrixmediav3thumbnailservernamemediaid) | `GET /_matrix/client/v1/media/thumbnail/{serverName}/{mediaId}` | `GET /_matrix/federation/v1/media/thumbnail/{serverName}/{mediaId}` |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

(origin,mediaId) tuple - media ID is not globally unique on its own.

... but I can't figure out what he's referring to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhere in the doc it said "look up the media in the cache by media ID" - I think HackMD was trying to be helpful and put the comment in a weird place.

Comment on lines +110 to +111
[Question: should we move `/config`, `/upload` and `/preview_url` while
we're at it, per [MSC1902](https://github.com/matrix-org/matrix-spec-proposals/pull/1902)?]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

yes please, though we should probably leave splitting endpoints out to its own MSC anyways - this to-be MSC could depend on MSC1902.

Context: The title of the MSC imposes "problems" on what can be included in that MSC. Though relevant to the MSC itself, it's not obvious from the title that we'd also be changing where the endpoints live (and making the title a paragraph is really just crap). Suggestion is always to just split out functionality when one would add "and" to the title.

@richvdh says:

The problem with having separate MSCs is that, if we land MSC1902 without the auth changes, then we need another way to distinguish clients that support auth from those that don't.

In this proposal, we can tell when it's safe to turn off the backwards-compatibility "you can get media without auth" behaviour just by looking at the access logs and seeing who is hitting the old endpoint. If we extend existing endpoints, that gets way harder.


**Two** authorization headers must be given when calling these endpoints:

* `Authorization` must be set the same as for any other CSAPI or federation
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

nit: clients can still auth with ?access_token, but obviously discouraged. We should probably just kill that feature in the spec (independent of this MSC)

`X-Matrix-Content-Token` header, and returns either the media or a 403
error.

If the media is uncached, the user's server makes a request to `GET
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uhoreg says:

Would it make sense to first check that you have an event that references that content_token before downloading remote media?

@richvdh says:

🤔 I guess it might

Comment on lines +231 to +234
The bridge might also choose to embed information such as the room that
referenced the media, and the time that the link was generated, in the URL.
(Such information would require an HMAC to prevent tampering.) This could be
used to help control access to the media.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

Sending a meta event into the room would be important for the server to flag a "usage" of the content, otherwise it'd be subject to expiration/GC.

(This was previously discussed in
[MSC2858](https://github.com/matrix-org/matrix-spec-proposals/pull/2858#discussion_r543513811).)

## Potential issues
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

With the proposal above, the origin server for a piece of media could count a piece of media (it's original copy) as "unreferenced" if all visible copies are dereferenced. Specifically, if a user on alice.com sends something to a room and a user from bob.com forwards that to a room visible to charlie.com, alice.com could eventually consider the media unreferenced and charlie.com users wouldn't be able to download it.

(This was previously discussed in
[MSC2858](https://github.com/matrix-org/matrix-spec-proposals/pull/2858#discussion_r543513811).)

## Potential issues
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

Issue 2: Custom emoji and such rely on <img /> tags in the message HTML - clients (and the spec) generally require that the img src be an MXC URI - presumably we just add a mx_content_token attribute or something to fix this case.

Comment on lines +178 to +180
b. Backwards compatibility with older clients and federating servers:
servers may for a short time choose to allow unauthenticated access via the
deprecated endpoints, even for media with an associated `content_token`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turt2live says:

I'm not sure this would be needed? The origin server (over federation) should still support the request if it also produced a content_token.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a situation where @alice:orgin.com is using a new server, and has uploaded some media. The origin.com server now needs to deal with two sets of legacy implementations:

  • @bob:origin.com, who is using an old client which doesn't know how to use content tokens.
  • @charlie:remote.com, where remote.com is an old server implementation. Whether or not charlie's client understands content tokens, remote.com isn't going to forward them.

In either case, origin.com will receive a request to the deprecated endpoint, and may choose to honour it (for now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere it was clarified that this was talking about old clients talking to old servers in rooms where "new" media is posted - this wasn't immediately clear from the text, so I think it just needs a relatively minor clarification?

use the server) *and* the `content_token` (to prove they have the right to see
the media in question).

Servers can count the number of references to a given `content_token`, and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Servers can count the number of references to a given `content_token`, and
Servers should count the number of references to a given `content_token`, and

This should hopefully make things better for every server?

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we patch the metadata leaks later with another MSC for example?
Because with P2P we will eventually want to minimize the metadata leaked by the protocol

Is more metadata leaked than before this MSC?

Comment on lines +83 to +95
* And `m.room.member` must include an `avatar_content_token`:
```json
{
"type": "m.room.member",
"state_key": "@alice:example.org",
"content": {
"avatar_url": "mxc://example.org/AQwafuaFswefuhsfAFAgsw",
"avatar_content_token": "hunter2",
"displayname": "Alice Margatroid",
"membership": "join"
}
}
```
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also m.presence

"body": "filename.jpg",
"msgtype": "m.image",
"url": "mxc://example.com/AQwafuaFswefuhsfAFAgsw",
"content_token": "hunter2"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted by @uhoreg: a server doesn't, in general, know which event types reference media, and where in their structure the media url would be.

There might also be more than one media item. So instead, we have to define content_token to be a map at the top level of the event, with keys being the hash of an mxc url (to uniquely identify the media, without leaking the mxc uri outside encrypted events).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't neccesarily solve the concern about media embedded into the content of an event, like custom emoji: #3910 (comment)

(there's no key on the json for such media references)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's a problem. It doesn't need a key in the json for the media references; we can still include a content_token map with inline images. e.g.

{
  "body": "some fallback",
  "formatted_body": "<img src=\"mxc://foo/bar\">",
  "content_token": {
    "hash_of_mxc://foo/bar": "token"
  }
}

Comment on lines +293 to +312
* Rather than trust users not to pass the content_token around, we could tie the media more
strongly to an event, and then check that the user has access to the event. For example:

* Uploading does not itself make the content visible by *anyone*; it does return a nonce
* Sending events takes a `media_nonce` param. The media gets associated with
the event and becomes visible to anyone who can see the event. The nonce
gets invalidated.
* For profile pics, the media gets duplicated for each m.room.member event it
ends up in.
* When the media is served over federation, we check if anyone on the server
has a right to see it (as we would with its associated event), and if so,
include the associated event in the response. The requesting server then
filters further.

(credit: based on ideas at
https://cryptpad.fr/code/#/2/code/view/oWjZciD9N1aWTr1IL6GRZ0k1i+dm7wJQ7juLf4tJRoo/)

Problems with this approach:
* Forwarded media would require either a re-upload, or a server-side duplication.
(Note that matrix-media-repo has a [LocalCopy](https://github.com/turt2live/matrix-media-repo/blob/master/api/unstable/local_copy.go#L19) implementation for this.)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now written up something much like this at MSC3911.

Comment on lines +332 to +334
* Put the `content_token` within the encrypted content, for encrypted data.
This reduces the metadata leak, but means that none of the automated media
removal functionality works.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think putting the content_token in the cleartext actually solves the problem for automated media removal. And I don't think it can, fundamentally, because we can't force clients to cooperate in the metadata leak.

A non-cooperating client can always:

  1. Implement support for content_token within the encrypted payload.
  2. Upload the media.
  3. Reference it from one or more junk catch-all rooms in the cleartext, to ensure the server considers it referenced.
  4. Put content_token in the encrypted payload for all real references.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implement support for content_token within the encrypted payload.

Well, that won't help the non-cooperating client if it wants to talk to cooperating clients.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True... Which seems like a mess? Game theoretically speaking, the cooperating clients would lose out on some messages compared to non-cooperating clients worried about metadata leaks, so the equilibrium would tend towards all clients supporting both methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess so.

Reference it from one or more junk catch-all rooms in the cleartext, to ensure the server considers it referenced.

this won't help unless both rooms have similar servers in, though?

... Anyway, I think MSC3911 was a better idea than this msc. It's probably not worth spending too much time on.

@richvdh
Copy link
Member Author

richvdh commented Nov 4, 2022

I'm going to close this as I think MSC3911/MSC3916 is a better solution.

@richvdh richvdh closed this Nov 4, 2022
@turt2live turt2live added the obsolete A proposal which has been overtaken by other proposals label Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API hacktoberfest-accepted kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. obsolete A proposal which has been overtaken by other proposals proposal A matrix spec change proposal s2s Server-to-Server API (federation) security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants