MSC2278: Deleting attachments for expired and redacted messages #2278

Open
wants to merge 12 commits into
base: old_master
120 changes: 120 additions & 0 deletions proposals/2278-deleting-content.md
@@ -0,0 +1,120 @@
# Proposal for deleting content for expired and redacted messages

## Overview

[MSC1763](https://github.com/matrix-org/matrix-doc/pull/1763) proposes
the `m.room.retention` state event for defining how aggressively servers
should purge old messages for a given room.

It originally also specified how media for purged events should be purged from
disk, however this was split out into a new MSC [by
request](https://github.com/matrix-org/matrix-doc/pull/1763#discussion_r320289119)
during review.

## Proposal

We handle encrypted & unencrypted rooms differently. Both require an API to
delete content from the local media repo (bug
[#790](https://github.com/matrix-org/matrix-doc/issues/790)), for which we
propose:

```
DELETE /_matrix/media/r0/download/{serverName}/{mediaId}
```
with a JSON dict as a request body.

The API would return:
* `200 OK {}` on success.
* `403` with error `M_FORBIDDEN` if the access_token is invalid or the user is not authorised to delete the content.
* `404` with error `M_NOT_FOUND` if the content described in the URL does not exist on the local server.

To delete the content, the user must be authenticated (via access_token or
Authorization header) as the original uploader, or be otherwise authorised
however the server sees fit.
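
As a rough illustration of the response rules above, the following Python sketch models the decision a media repository might make for a `DELETE` request. The `MediaRecord` type, the `handle_delete` function and the `is_admin` flag are hypothetical names invented for this example and are not part of the proposal; "however the server sees fit" is modelled here as a simple admin override, which is only one possible policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MediaRecord:
    """Hypothetical record of a locally stored upload."""
    uploader: str  # user ID of the original uploader


def handle_delete(
    store: Dict[Tuple[str, str], MediaRecord],
    server_name: str,
    media_id: str,
    requester: Optional[str],
    is_admin: bool = False,
) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) for a DELETE request.

    `requester` is the user ID resolved from the access token, or None
    if the token was missing or invalid.
    """
    if requester is None:
        return 403, {"errcode": "M_FORBIDDEN", "error": "Invalid access token"}
    record = store.get((server_name, media_id))
    if record is None:
        return 404, {"errcode": "M_NOT_FOUND", "error": "Unknown media"}
    # Assumed policy: only the uploader or a server admin may delete.
    if requester != record.uploader and not is_admin:
        return 403, {"errcode": "M_FORBIDDEN", "error": "Not authorised to delete"}
    del store[(server_name, media_id)]
    return 200, {}
```

Checking authentication before existence means an unauthenticated caller cannot probe which media IDs exist; that ordering is a design choice of this sketch rather than something the proposal requires.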

Servers may wish to quarantine the deleted content for some timeframe before
actually purging it from storage, in order to mitigate abuse.

Review comment (Member): we should probably spec the quarantine API (in a different MSC, in the future, eventually)

XXX: We might want to provide an undelete API too to let users rescue
their content that they accidentally deleted, as you would get on a
typical desktop OS file manager. Perhaps `DELETE` with `{ undo: true }`?

XXX: We might also want to let admins quarantine rather than delete attachments
without a time limit by passing `{ quarantine: true }` or similar.

Server admins may choose to mark some content as undeletable in their
implementation (e.g. for sticker packs and other content which should never be
deleted or quarantined.)
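
One way a server could combine a quarantine period with admin-marked undeletable content is sketched below. Everything here is an assumption made for illustration: the `StoredMedia` type, the `request_delete` and `purge_expired` functions, and the seven-day `QUARANTINE_SECONDS` value are hypothetical and are not specified by this proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed grace period; the proposal only says "some timeframe".
QUARANTINE_SECONDS = 7 * 24 * 3600


@dataclass
class StoredMedia:
    undeletable: bool = False               # set by a server admin, e.g. for sticker packs
    quarantined_at: Optional[float] = None  # when deletion was requested, if ever


def request_delete(media: Dict[str, StoredMedia], media_id: str, now: float) -> bool:
    """Quarantine the item instead of purging it immediately.

    Returns False if the item is unknown or marked undeletable.
    """
    item = media.get(media_id)
    if item is None or item.undeletable:
        return False
    if item.quarantined_at is None:
        item.quarantined_at = now
    return True


def purge_expired(media: Dict[str, StoredMedia], now: float) -> List[str]:
    """Purge items whose quarantine period has elapsed; return their IDs."""
    expired = [
        media_id
        for media_id, item in media.items()
        if item.quarantined_at is not None
        and now - item.quarantined_at >= QUARANTINE_SECONDS
    ]
    for media_id in expired:
        del media[media_id]
    return expired
```

Under this model an undelete would amount to resetting `quarantined_at` to `None` before `purge_expired` next runs.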

### Encrypted rooms

There is no way for the server to know which events refer to which MXC URL, so
we leave it up to the client to DELETE any MXC URLs referred to by an event
after it expires or redacts its local copy of an event.

Review comment (Member): I'm a bit confused by this: assuming that there is more than one client in a given room, which has responsibility for making the DELETE request? I guess it has to be a client belonging to the original uploader, but what if they go away/stop watching the room/etc?

Review comment (Contributor): It would have to be the original uploader, but indeed, that doesn't help if someone else redacts their event for them and they don't come back and finish it off by deleting the media.

We rely on the fact that MXC URLs should not be reused between encrypted
events, as we expect each event to have different message keys to avoid
correlation. As a result, it should be safe to assume each attachment has
only one referring event, and so when a client deems that the event should
be deleted, it is safe to also delete the attachment without breaking any
other events.
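
The client-side step described above can be sketched as follows, assuming the decrypted event content follows the usual encrypted-attachment layout, with the media under `file.url` and an optional thumbnail under `info.thumbnail_file.url`. The function names `encrypted_mxc_urls` and `delete_path` are hypothetical; the request path is the one proposed earlier in this document.

```python
from typing import List
from urllib.parse import quote


def encrypted_mxc_urls(content: dict) -> List[str]:
    """Collect MXC URLs referenced by a decrypted message's content."""
    urls = []
    file_info = content.get("file")
    if isinstance(file_info, dict) and isinstance(file_info.get("url"), str):
        urls.append(file_info["url"])
    info = content.get("info")
    if isinstance(info, dict):
        thumb = info.get("thumbnail_file")
        if isinstance(thumb, dict) and isinstance(thumb.get("url"), str):
            urls.append(thumb["url"])
    # Ignore anything that is not an MXC URL.
    return [u for u in urls if u.startswith("mxc://")]


def delete_path(mxc_url: str) -> str:
    """Map mxc://{serverName}/{mediaId} onto the proposed DELETE path."""
    server_name, _, media_id = mxc_url[len("mxc://"):].partition("/")
    if not server_name or not media_id:
        raise ValueError(f"malformed MXC URL: {mxc_url!r}")
    return (
        "/_matrix/media/r0/download/"
        f"{quote(server_name, safe=':')}/{quote(media_id, safe='')}"
    )
```

A client would call `encrypted_mxc_urls` on the content it is about to expire or redact locally and issue one `DELETE` per returned URL; whether it is authorised to do so depends on the server's rules for that request.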

It seems reasonable to consider the special case of forwarding encrypted
attachments between rooms as a 'copy by reference' - if the original
event gets deleted, the others should too. If this isn't desired, then
the attachment should be re-encrypted.

### Unencrypted rooms

It's common for MXC URLs to be shared between unencrypted events - e.g. reusing
sticker media, or when forwarding messages between rooms, etc. In this instance,
the homeserver (not the media server) should count the events which refer to a
given MXC URL.

If all events which refer to it have been purged or redacted, the HS should delete
the attachment - either by internally deleting the media, or if using an
external media repository, by calling the DELETE API upon it.
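
The reference counting described above might look roughly like the following. This is a minimal in-memory sketch under stated assumptions: the `MediaRefCounter` class and its method names are hypothetical, and `delete_media` stands in for either an internal deletion or a call to the DELETE API on an external media repository.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, Set


class MediaRefCounter:
    """Track which events refer to each MXC URL (hypothetical helper)."""

    def __init__(self, delete_media: Callable[[str], None]) -> None:
        self._refs: DefaultDict[str, Set[str]] = defaultdict(set)  # MXC URL -> event IDs
        self._by_event: Dict[str, Set[str]] = {}                   # event ID -> MXC URLs
        self._delete_media = delete_media

    def event_added(self, event_id: str, mxc_urls: Iterable[str]) -> None:
        """Record that an event refers to the given MXC URLs."""
        urls = set(mxc_urls)
        self._by_event[event_id] = urls
        for url in urls:
            self._refs[url].add(event_id)

    def event_removed(self, event_id: str) -> None:
        """Call when an event is purged or redacted."""
        for url in self._by_event.pop(event_id, set()):
            referrers = self._refs[url]
            referrers.discard(event_id)
            if not referrers:
                # Last referring event is gone: delete the attachment.
                del self._refs[url]
                self._delete_media(url)
```

Tracking event IDs rather than a bare integer makes `event_removed` idempotent, so purging and then redacting the same event cannot drive the count below its true value.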

If a new event is received over federation which refers to a deleted
attachment, then the server should operate as if it has never heard of that
attachment; pulling it in over federation from whatever the source server is.
This will break if a remote server sends an event referring to a local
MXC URL which may have been deleted, so don't do that - clients on servers
should send MXC URLs which refer to their local server, not remote ones.

This means that if the local server chooses to expire the source event sooner
than a remote server does, the remote server might end up not being able to
sync the media from the local server and so display a broken attachment.
This feels fairly reasonable; if you don't want people to end up with 404s
on attachments, you shouldn't go deleting things.

In the scenario of (say) a redacted membership event, it's possible that the
refcount of an unwanted avatar might be greater than zero (due to the avatar
being referenced in multiple rooms), but the room admin may want to still
purge the content from their server. This can be achieved by DELETEing the
content independently from redacting the membership events.

## Tradeoffs

Assuming that encrypted events don't reuse attachments is controversial but
hopefully acceptable. It does mean that stickers in encrypted rooms will end
up getting re-encrypted/decrypted every time, but this is hopefully acceptable
given the resulting improvement in privacy.

An alternative approach to solving the problem of attachment reuse could be to
expect clients to somehow 'touch' uploaded local attachments whenever they
send an event which refers to them - effectively renewing their retention
lifetime. However, in E2EE rooms this ends up leaking which events refer to
which attachments (or at least claim to), and also gives a vector for abuse
where a malicious client could bypass the retention schedule by repeatedly
retouching a file to keep it alive.

## Security considerations

Media repo implementations might want to use `srm` or a similar secure
deletion tool to scrub deleted data off disk.

If the same attachment is sent multiple times across encrypted events (even if
encrypted separately per event), it's worth noting that the size of the
encrypted attachment and associated traffic patterns will be an easy way to
identify attachment reuse (e.g. who is forwarding a sensitive file to
whom).