MSC4095: Bundled URL previews #4095

Open
wants to merge 10 commits into
base: main
from
261 changes: 261 additions & 0 deletions proposals/4095-bundled-url-previews.md
@@ -0,0 +1,261 @@
# Bundled URL previews
Currently, URL previews in Matrix are generated on the server when requested by
a client using the [`/_matrix/media/v3/preview_url`](https://spec.matrix.org/v1.9/client-server-api/#get_matrixmediav3preview_url)
endpoint. This is a relatively good approach, but a major downside is that the
user's homeserver gets all links the user's client wants to show a preview for,
which means using it in encrypted rooms will effectively leak parts of messages.

## Proposal
The proposed solution is allowing clients to bundle URL preview metadata inside
events.

A new field called `m.url_previews` is added. The field is an array of objects,
where each object contains OpenGraph data representing a single URL to preview,
similar to what the `/preview_url` endpoint currently returns:

* `matrix:matched_url` - The URL that is present in `body` and triggered this preview
to be generated. This is optional and should be omitted if the link isn't
present in the body.
* `matrix:image:encryption` - An [EncryptedFile](https://spec.matrix.org/v1.9/client-server-api/#extensions-to-mroommessage-msgtypes)
object for encrypted thumbnail images. Similar to encrypted image messages,
the URL is inside this object, and not in `og:image`.
* `matrix:image:size` - The byte size of the image, like in `/preview_url`.
* `og:image` - An `mxc://` URI for unencrypted images, like in `/preview_url`.
* `og:url` - Standard OpenGraph tag for the canonical URL of the previewed page.
* Any other standard OpenGraph tags.

At least one of `matrix:matched_url` and `og:url` MUST be present. All other
fields are optional.
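
The presence rule above could be checked on the receiving side roughly as follows. This is a minimal sketch, not part of the proposal: the helper names `is_valid_preview` and `valid_previews` are hypothetical, and treating an empty string as "not present" is an assumption the text does not state.

```python
from typing import Any


def is_valid_preview(preview: Any) -> bool:
    """Check the minimal rule: matrix:matched_url or og:url must be present."""
    if not isinstance(preview, dict):
        return False
    matched = preview.get("matrix:matched_url")
    canonical = preview.get("og:url")
    # Assumption: an empty or non-string value counts as "not present".
    return any(isinstance(v, str) and v for v in (matched, canonical))


def valid_previews(content: dict) -> list:
    """Return the well-formed entries of m.url_previews (hypothetical helper)."""
    previews = content.get("m.url_previews")
    if not isinstance(previews, list):
        return []
    return [p for p in previews if is_valid_preview(p)]
```

Dropping malformed entries rather than rejecting the whole event is one possible design choice; the proposal does not say how clients should treat invalid objects.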

### Extensible events
The definition of `matrix:matched_url` changes from "present in `body`" to
"present in `m.text`", but otherwise the proposal is directly compatible with
extensible events.

### Client behavior
#### Sending preview data
When sending previews to encrypted rooms, clients should encrypt preview images
and put them in the `matrix:image:encryption` field. The other `og:image:*`
fields and the `matrix:image:size` field can still be used for image metadata,
but the `og:image` field should be omitted for encrypted thumbnails.

If clients use the `/preview_url` endpoint as a helper for generating preview
data, they should reupload the thumbnail image (if there is one) to create a
persistent `mxc://` URI, as well as encrypt it if applicable. A future MSC
could also extend `/preview_url` with a parameter to request a persistent URI.
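
As an illustration of the sending rules above, the sketch below assembles one preview object once the thumbnail has already been reuploaded (and encrypted, where applicable). The function `build_preview` and its parameters are hypothetical; uploading and encryption are assumed to happen elsewhere in the client.

```python
from typing import Optional

# Image keys that this helper sets itself, so they are never copied from `og`.
_IMAGE_KEYS = {"og:image", "matrix:image:encryption", "matrix:image:size"}


def build_preview(
    matched_url: str,
    og: dict,
    *,
    mxc_uri: Optional[str] = None,
    encrypted_file: Optional[dict] = None,
    image_size: Optional[int] = None,
) -> dict:
    """Assemble a single m.url_previews entry (illustrative only).

    og             -- OpenGraph tags, e.g. taken from a /preview_url response
    mxc_uri        -- persistent mxc:// URI of an unencrypted thumbnail
    encrypted_file -- EncryptedFile object of an encrypted thumbnail
    """
    preview = {"matrix:matched_url": matched_url}
    # Skip any og:image from the input: it may point at a non-persistent URI.
    preview.update({k: v for k, v in og.items() if k not in _IMAGE_KEYS})
    if encrypted_file is not None:
        # Encrypted thumbnail: the URL lives inside the object, og:image is omitted.
        preview["matrix:image:encryption"] = encrypted_file
    elif mxc_uri is not None:
        preview["og:image"] = mxc_uri
    if image_size is not None and (encrypted_file is not None or mxc_uri is not None):
        preview["matrix:image:size"] = image_size
    return preview
```

When both `mxc_uri` and `encrypted_file` are passed, this sketch prefers the encrypted form, which matches the guidance for encrypted rooms.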

#### Receiving messages with `m.url_previews`
If an object in the list contains only `matrix:matched_url` and no other fields,
receiving clients should fall back to the old behavior of requesting a preview
using `/preview_url`. Clients may also choose to ignore bundled data and ask
the homeserver for a preview even if bundled data is present.

Clients should not search the `body` field for URLs if the `m.url_previews`
field is present, even if they fall back to the old behavior of requesting
preview data from the homeserver. Conversely, if the field is not present,
clients should fall back to the searching behavior.

The two points above effectively make this an alternative to
[MSC2385](https://github.com/matrix-org/matrix-spec-proposals/pull/2385).
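
The receiving rules in this section can be sketched as a small decision function. This is only one reading of the text: `plan_previews` and its return shape are hypothetical, the URL regex is deliberately naive, and whether a "fetch" is actually performed (for example in encrypted rooms) remains a client policy decision.

```python
import re
from typing import List, Optional, Tuple

# Deliberately naive URL matcher, for illustration only.
URL_RE = re.compile(r"https?://\S+")

Plan = List[Tuple[str, str, Optional[dict]]]


def plan_previews(content: dict) -> Plan:
    """Return (action, url, bundled_data) tuples for one event's content.

    action is "fetch" (ask the homeserver via /preview_url) or
    "bundled" (render the data shipped inside the event).
    """
    previews = content.get("m.url_previews")
    if not isinstance(previews, list):
        # Field absent: fall back to searching the body for URLs.
        body = content.get("body")
        if not isinstance(body, str):
            return []
        return [("fetch", url, None) for url in URL_RE.findall(body)]

    # Field present (even if empty): never search the body.
    plan: Plan = []
    for preview in previews:
        if not isinstance(preview, dict):
            continue
        matched = preview.get("matrix:matched_url")
        if matched and set(preview) == {"matrix:matched_url"}:
            # Only the matched URL is given: request a preview the old way.
            plan.append(("fetch", matched, None))
        else:
            url = preview.get("og:url") or matched
            if url:
                plan.append(("bundled", url, preview))
    return plan
```

An empty `m.url_previews` list yields an empty plan, which corresponds to a message that should not show any previews.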

### Examples
<details>
<summary>Normal preview</summary>

```json
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [
      {
        "matrix:matched_url": "https://matrix.org",
        "matrix:image:size": 16588,
        "og:description": "Matrix, the open protocol for secure decentralised communications",
        "og:image": "mxc://maunium.net/zeHhTqqUtUSUTUDxQisPdwZO",
        "og:image:height": 400,
        "og:image:type": "image/jpeg",
        "og:image:width": 800,
        "og:title": "Matrix.org",
        "og:url": "https://matrix.org/"
      }
    ],
    "m.mentions": {}
  }
}
```

</details>
<details>
<summary>Preview with encrypted thumbnail image</summary>

```json
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [
      {
        "matrix:matched_url": "https://matrix.org",
        "og:url": "https://matrix.org/",
        "og:title": "Matrix.org",
        "og:description": "Matrix, the open protocol for secure decentralised communications",
        "matrix:image:size": 16588,
        "og:image:width": 800,
        "og:image:height": 400,
        "og:image:type": "image/jpeg",
        "matrix:image:encryption": {
          "key": {
            "k": "GRAgOUnbbkcd-UWoX5kTiIXJII81qwpSCnxLd5X6pxU",
            "alg": "A256CTR",
            "ext": true,
            "kty": "oct",
            "key_ops": [
              "encrypt",
              "decrypt"
            ]
          },
          "iv": "kZeoJfx4ehoAAAAAAAAAAA",
          "hashes": {
            "sha256": "WDOJYFegjAHNlaJmOhEPpE/3reYeD1pRvPVcta4Tgbg"
          },
          "v": "v2",
          "url": "mxc://beeper.com/53207ac52ce3e2c722bb638987064bfdc0cc257b"
        }
      }
    ],
    "m.mentions": {}
  }
}
```

</details>
<details>
<summary>Message indicating it should not have any previews</summary>

```json
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [],
    "m.mentions": {}
  }
}
```

</details>
<details>
<summary>Message indicating a preview should be fetched from the homeserver</summary>

```json
{
  "type": "m.room.message",
  "content": {
    "msgtype": "m.text",
    "body": "https://matrix.org",
    "m.url_previews": [
      {
        "matrix:matched_url": "https://matrix.org"
      }
    ],
    "m.mentions": {}
  }
}
```

</details>
<details>
<summary>Preview in extensible event</summary>

```json
{
  "type": "m.message",
  "content": {
    "m.text": [
      {"body": "matrix.org/support"}
    ],
    "m.url_previews": [
      {
        "matrix:matched_url": "matrix.org/support",
        "matrix:image:size": 16588,
        "og:description": "Matrix, the open protocol for secure decentralised communications",
        "og:image": "mxc://maunium.net/zeHhTqqUtUSUTUDxQisPdwZO",
        "og:image:height": 400,
        "og:image:type": "image/jpeg",
        "og:image:width": 800,
        "og:title": "Support Matrix",
        "og:url": "https://matrix.org/support/"
      }
    ],
    "m.mentions": {}
  }
}
```

</details>

## Potential issues
### Fake preview data
The message sender can fake previews quite trivially. This is considered an
acceptable compromise to achieve non-leaking URL previews in encrypted rooms.

Clients may choose to ignore embedded preview data in unencrypted rooms and
always use the `/preview_url` endpoint.

### More image uploads
Currently previews are generated by the server, which lets the server apply
caching and delete thumbnail images quickly. If the data was embedded in events
instead, the server would not be able to clean up images the same way.

### Web clients
Web clients likely can't generate previews themselves due to CORS and other
such protections.

Clients could use the existing URL preview endpoint to generate a preview and
bundle that data in events, which has the benefit of only leaking the link to
one homeserver (the sender's) instead of all servers. When doing this, clients
would have to download the preview image and reupload it to get a persistent
`mxc://` URI, and possibly encrypt it before uploading.

Alternatively, clients could simply not include preview data at all and have
receiving clients fall back to the old behavior (meaning no previews in
encrypted rooms unless the receiver opts in).

## Security considerations
Senders can fake preview data, as covered under "Potential issues" above.

Review comment by @ara4n (Member), Feb 20, 2024:

I think it's worth calling out some more explicit security concerns here:

If the sender doesn't use its server's /preview_url endpoint as a helper:

  • This will leak the sending client's IP to the URL they are previewing.
  • The client will need to be careful not to let itself get pwned by malicious content at that URL (e.g. XML parsing exploits in the HTTP library; billion laughs attacks...)
  • The client should be very careful not to preview URLs provided by other users - e.g. when replying to a message or quoting it, to stop an attacker sending a malicious URL to a user in order to discover their IP or otherwise pwn them.
    • Concretely, we don't want a world where you receive spam saying "Click reply to this message to win $20M!!! https://evil.com", where the act of replying generates a preview to https://evil.com which then harvests your IP and serves you a malformed image in its URL preview thumbnail which then pwns your app
    • Another concrete attack could be sending a user a malicious URL (hidden in a hyperlink, perhaps? or hidden by mangled UTF sequences) which hits an RFC1918 address on their network to attack them - https://192.168.0.1/ or whatever, and encouraging the user to reply to or quote the msg

One might also want to require an allowlist of IPs the sender's spider is allowed to hit anyway, to try to avoid disasters where users are social-engineered into sending malicious URLs in general, which they never click on, but still get 'clicked on' by the URL previewer, causing chaos.

I'm sure there are a bunch more attack vectors here too...
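
One of the concerns in the comment above, a URL that points at an RFC 1918 or otherwise non-public address, can be partially screened before a client-side previewer fetches anything. The sketch below is only an illustration and covers IP literals alone: `is_literal_private_target` is a hypothetical name, and a real previewer would also need to re-check every address a hostname resolves to (including after redirects), which is not shown here.

```python
import ipaddress
from urllib.parse import urlsplit


def is_literal_private_target(url: str) -> bool:
    """True if the URL is unparseable or its host is a non-public IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host at all: treat as unsafe
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not an IP literal; it must be checked after DNS resolution.
        return False
    # is_global is False for private, loopback, link-local and similar ranges.
    return not addr.is_global
```

For example, `is_literal_private_target("https://192.168.0.1/")` is true, so a previewer using this check would refuse to fetch that URL.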


## Alternatives
### Different generation methods
Previews could be generated by the receiving client, which would both avoid
leaking links to the user's homeserver and prevent fake previews. However, this
would leak the user's IP address to every link they receive, so it is not an
acceptable solution.

The original design notes for URL previews from 2016 also have a list of options
that were considered at the time: <https://github.com/matrix-org/matrix-spec/blob/main/attic/drafts/url_previews.md>.
Option 2 is what was implemented then, and this proposal adds option 4.
That document also mentions the combination of options 2 and 4 as probably the
best solution.

The document also mentions the possibility of an AS or HS scanning messages and
injecting preview data, but that naturally won't function with encryption at all,
and is therefore not an alternative.

The fifth option mentioned in the document, a centralized previewing service
which is configured per-room, could technically work, but would likely be worse
than HS-generated previews in practice: users wouldn't know to configure a
different previewing service, so clients would probably have to automatically
pick one.

## Unstable prefix
Until this MSC is accepted, implementations should apply the following renames:

* `com.beeper.linkpreviews` instead of `m.url_previews`
* `beeper:image:encryption` instead of `matrix:image:encryption`
* `matched_url` instead of `matrix:matched_url`
  * note: this was implemented without a prefix before the MSC was made, which
    is why the "unstable prefix" is no prefix in this case.
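
The renames listed above can be applied mechanically when building event content. A minimal sketch, assuming the content dict uses the stable names; `UNSTABLE_NAMES` and `to_unstable` are hypothetical helpers, not part of any SDK:

```python
# Stable name -> unstable name, taken from the list above.
UNSTABLE_NAMES = {
    "m.url_previews": "com.beeper.linkpreviews",
    "matrix:image:encryption": "beeper:image:encryption",
    "matrix:matched_url": "matched_url",
}


def to_unstable(content: dict) -> dict:
    """Return a copy of the event content with the unstable names applied."""
    out = dict(content)
    if "m.url_previews" in out:
        previews = out.pop("m.url_previews")
        # Rename keys inside each preview object; other keys are left untouched.
        out[UNSTABLE_NAMES["m.url_previews"]] = [
            {UNSTABLE_NAMES.get(key, key): value for key, value in preview.items()}
            for preview in previews
        ]
    return out
```

The input dict is not modified, so the same content can still be sent under the stable names once the MSC is accepted.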