MSC1763: Proposal for specifying configurable message retention periods #1763

Status: Open. Wants to merge 37 commits into base branch `old_master`.

Commits (37):

* 687b650: first cut of MSC1763 for configurable event retention (ara4n, Dec 30, 2018)
* f770440: ephemeral msging ended up in scope (ara4n, Dec 30, 2018)
* b25367e: fix english (ara4n, Dec 30, 2018)
* 2aafa02: clarify this only applies to non-state events; fix retention JSON str… (ara4n, Dec 30, 2018)
* 64695ed: make conflict alg explicit for user retention settings (ara4n, Dec 30, 2018)
* c493dbd: change max >= min invariant (ara4n, Dec 30, 2018)
* 0afc3af: spell out that self-destructing msgs need explicit RRs (ara4n, Dec 30, 2018)
* 7597e03: more validation on fields (ara4n, Dec 30, 2018)
* 7a8d204: spell out how the example server admin overrides would work (ara4n, Dec 30, 2018)
* 4646fcd: improve wording; spell out purge/redact dichotomy; add explicit alg (ara4n, Dec 30, 2018)
* c55158d: clarify redaction semantic and default PL (ara4n, Dec 30, 2018)
* 6e33c2f: track max's idea of advertising retention per-server (ara4n, Dec 30, 2018)
* 28ea4e1: fix normatives (ara4n, Dec 30, 2018)
* cca99dd: clarify client behaviour (ara4n, Jan 4, 2019)
* a4974b6: make self_destruct set a timer in seconds rather than be binary. (ara4n, Jan 4, 2019)
* c27394c: clarify warning about conflicts (ara4n, Jan 5, 2019)
* f0553c0: Merge branch 'master' into matthew/msc1763 (ara4n, Aug 10, 2019)
* bdce6f1: remove per-message retention and self-destruct messages entirely to t… (ara4n, Aug 10, 2019)
* a30a853: spell out that events will disappear from event streams when purged (ara4n, Aug 10, 2019)
* c281420: add the 'why not nego?' tradeoff (ara4n, Aug 10, 2019)
* ef215dd: clarify the intention to not default to finite message retention (ara4n, Aug 10, 2019)
* 0b6a209: spell out not to default to a max_lifetime (ara4n, Aug 10, 2019)
* 5c29779: incorporate review (ara4n, Aug 11, 2019)
* 032e63b: Apply suggestions from code review (ara4n, Aug 11, 2019)
* 1a4101e: link #2228 (ara4n, Aug 11, 2019)
* 90b17d6: units (ara4n, Aug 11, 2019)
* 32f21ac: lifetimes in milliseconds (ara4n, Aug 16, 2019)
* a1b8726: fix json number ranges (ara4n, Aug 17, 2019)
* ee0a7ee: Update 1763-configurable-retention-periods.md (richvdh, Aug 19, 2019)
* cabef48: Apply suggestions from code review (ara4n, Aug 26, 2019)
* f5c3729: incorporate review (ara4n, Aug 26, 2019)
* f8ceb97: spell out an example UI for warning about retention (ara4n, Aug 26, 2019)
* 8b1a0c3: clarify care & feeding of DAG (ara4n, Aug 28, 2019)
* 9357ec6: incorporate more @richvdh review (ara4n, Aug 28, 2019)
* ac2f87e: Apply suggestions from code review (ara4n, Sep 3, 2019)
* 116c5b9: split out media attachment clean-up to #2278 (ara4n, Sep 3, 2019)
* f809087: Massively rewrite the proposal (babolivier, Oct 11, 2022)
388 changes: 388 additions & 0 deletions proposals/1763-configurable-retention-periods.md
# Proposal for specifying configurable per-room message retention periods.
A contributor left the following review comment on this section:

> I'm sensing an innate conflict within this MSC's interests: it both wants to
> reduce server history in rooms, yet simultaneously expects to be able to fetch
> that history from thin air at any convenient time. I have a feeling it's
> written with the underlying idea that large servers will carry all the events
> in the federation, with some servers being able to fetch from those at any
> time.
>
> However, this is mentioned nowhere in the MSC; it skirts around these problems
> by leaving the assumptions between the lines, without thinking critically
> about what this means for the larger federation: more dependency on large
> servers.
>
> With this, it does not bring a lucid solution to the problem of dealing with
> history retention, one where any server eventually has to face that it cannot
> fetch events it knows exist(ed), yet is expected to respond with them to a
> client's query.
>
> The semantic equivalent of HTTP error 410 ("gone") has to exist somewhere
> here, to be able to tell clients the server is unable to fetch a historical
> event due to history retention, and all sad and happy paths that spring from
> that. The current stance against this is "you're SOL, have a 404 with no
> context".
>
> I don't see this MSC dealing with the reality that it is deleting events, and
> I don't see a coherent solution for allowing some servers to "archive" history
> and making that explicit (also in the rooms, for privacy reasons, for people
> who want to know which servers are ignoring retention rules and archiving
> anyway).
>
> Servers ignoring retention rules does have a basis, namely actually archiving
> historic conversations, in a similar philosophy to The Internet Archive. If
> this MSC were to go through as-is, we'd end up in a situation similar to the
> general internet, where all history is lost to time due to individual
> retention strategies.
>
> While reliance on large servers isn't what a federation would want, an
> explicit way of making people aware of which servers are backing up history,
> and which ones aren't, would help this MSC greatly in the long run.


A major shortcoming of Matrix has been the inability to specify how long events
should be stored by the servers and clients which participate in a given room.

This proposal aims to specify a simple yet flexible set of rules which allow
users, room admins and server admins to determine how long data should be stored
for a room, from the perspective of respecting the privacy requirements of that
room (which may range from a "burn after reading" ephemeral conversation,
through to FOIA-style public record keeping requirements).

As well as enforcing privacy requirements, these rules provide a way for server
administrators to better manage disk space (e.g. to enforce rules such as "don't
store remote events for public rooms for more than a month").

This proposal originally tried to also define semantics for per-message
retention as well as per-room; this has been split out into
[MSC2228](https://github.com/matrix-org/matrix-doc/pull/2228) in order to get
the easier per-room semantics landed.


## Problem

Matrix is inherently a protocol for storing and synchronising conversation
history, and various parties may wish to control how long that history is stored
for.

Room administrators, for instance, may wish to control how long a message can be
stored (e.g. to comply with corporate/legal requirements to store message
history for at least a specific amount of time), or how early a message can be
deleted (e.g. to address privacy concerns of the room's members, to avoid
messages staying in the public record forever, or to comply with corporate/legal
requirements to only store specific kinds of information for a limited amount of
time).

Additionally, server administrators may also wish to control how long message
history is kept in order to better manage their server's disk space, or to
enforce corporate/legal requirements for the organisation managing the server.

We would like to provide this behaviour whilst also ensuring that users
generally see a consistent view of message history, without lots of gaps and
one-sided conversations where messages have been automatically removed.

We would also like to set the expectation that rooms typically have a long
message retention - allowing those who wish to use Matrix to act as an archive
of their conversations to do so. If everyone starts defaulting their rooms to
finite retention periods, then the value of Matrix as a knowledge repository is
broken.

This proposal does not try to solve the problems of:
* GDPR erasure (as this involves retrospectively changing the lifetime of
messages)
* Bulk redaction (e.g. to remove all messages from an abusive user in a room,
as again this is retrospectively changing message lifetime)
* Specifying history retention based on the number of messages (as opposed to
their age) in a room. This is descoped because it is effectively a disk space
management problem for a given server or client, rather than a policy
problem of the room. It can be solved in an implementation-specific manner, or
a new MSC can be proposed to standardise letting clients specify disk quotas
per room.
* Per-message retention (as having a mix of message lifetime within a room
complicates implementation considerably - for instance, you cannot just
purge arbitrary events from the DB without fracturing the DAG of the room,
and so a different approach is required)


## Proposal

### Per-room retention

We introduce a `m.room.retention` state event, which room admins or moderators
can set to mandate the history retention behaviour for a given room. It follows
the default power level (PL) semantics for a state event (requiring a PL of 50
by default to set it). Its state key is an empty string (`""`).

The following fields are defined in the `m.room.retention` contents:

* `max_lifetime`: the maximum duration in milliseconds for which a server must
store events in this room. Must be null or an integer in range [0,
2<sup>53</sup>-1]. If absent or null, should be interpreted as not setting an
upper bound to the room's retention policy.

* `min_lifetime`: the minimum duration in milliseconds for which a server should
store events in this room. Must be null or an integer in range [0,
2<sup>53</sup>-1]. If absent or null, should be interpreted as not setting a
lower bound to the room's retention policy.

If both `max_lifetime` and `min_lifetime` are provided, `max_lifetime` must
always be greater than or equal to `min_lifetime`.


For instance:

```json
{
  "max_lifetime": 86400000
}
```

The above example means that servers receiving messages in this room should
store the event for only 86400000 milliseconds (1 day), as measured from that
event's `origin_server_ts`, after which they MUST purge all references to that
event (e.g. from their db and any in-memory queues).

We consciously do not redact the event, as we are trying to eliminate metadata
and save disk space at the cost of deliberately discarding older messages from
the DAG.

```json
{
  "min_lifetime": 2419200000
}
```

The above example means that servers receiving this message SHOULD store the
event forever, but can choose to purge their copy after 28 days (or longer) in
order to reclaim disk space.

```json
{
  "min_lifetime": 2419200000,
  "max_lifetime": 15778800000
}
```

The above example means that servers SHOULD store their copy of the event for at least 28
days after it has been sent, and MUST delete it at the latest after 6 months.
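
To make the field constraints concrete, here is a minimal validation sketch in
Python. The function name and the choice to raise `ValueError` are illustrative
assumptions; only the field names, the value ranges and the
`max_lifetime >= min_lifetime` invariant come from this proposal:

```python
# Illustrative sketch only: validates the content of an m.room.retention event.
MAX_JSON_INT = 2**53 - 1  # upper bound on lifetimes given in this proposal


def validate_retention_content(content: dict) -> None:
    """Raise ValueError if the retention policy content is malformed."""
    for field in ("min_lifetime", "max_lifetime"):
        value = content.get(field)
        if value is None:
            continue  # absent or null: no bound on this side
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{field} must be null or an integer")
        if not 0 <= value <= MAX_JSON_INT:
            raise ValueError(f"{field} must be in the range [0, 2**53 - 1]")

    min_lifetime = content.get("min_lifetime")
    max_lifetime = content.get("max_lifetime")
    if (
        min_lifetime is not None
        and max_lifetime is not None
        and max_lifetime < min_lifetime
    ):
        raise ValueError("max_lifetime must be greater than or equal to min_lifetime")


# The example policy above (28 days / 6 months) passes validation.
validate_retention_content({"min_lifetime": 2419200000, "max_lifetime": 15778800000})
```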


### Server-defined retention

Server administrators can benefit from a few capabilities to control how long
history is stored:

* the ability to set a default retention policy for rooms that don't have a
retention policy defined in their state
* the ability to override the retention policy for a room
* the ability to cap the effective `max_lifetime` and `min_lifetime` of the rooms the
server is in

The implementation of these capabilities in the server is left as an
implementation detail.

We introduce the following authenticated endpoint to allow clients to enquire
about how the server implements this policy:


```
GET /_matrix/client/v3/retention/configuration
```

200 response properties:

* `policies` (required): An object mapping room IDs to a retention policy. If
the room ID is `*`, the associated policy is the default policy. Each policy
follows the format for the content of an `m.room.retention` state event.
* `limits` (required): An object defining the limits to apply to policies
defined by `m.room.retention` state events. This object has two optional
properties, `min_lifetime` and `max_lifetime`, which each define a limit to
the equivalent property of the state events' content. Each limit defines an
optional `min` (the minimum value, in milliseconds) and an optional `max` (the
maximum value, in milliseconds).

If both `policies` and `limits` are included in the response, the policies
specified in `policies` __must__ comply with the limits defined in `limits`.

Example response:

```json
{
  "policies": {
    "*": {
      "max_lifetime": 15778800000
    },
    "!someroom:test": {
      "min_lifetime": 2419200000,
      "max_lifetime": 15778800000
    }
  },
  "limits": {
    "min_lifetime": {
      "min": 86400000,
      "max": 172800000
    },
    "max_lifetime": {
      "min": 7889400000,
      "max": 15778800000
    }
  }
}
```

In this example, the server is configured with:

* a default policy with a `max_lifetime` of 6 months and no `min_lifetime` (i.e. messages
can only be kept up to 6 months after they have been sent)
* an override for the retention policy in room `!someroom:test`
* limits on `min_lifetime` (which must be between 1 day and 2 days) and on
`max_lifetime` (which must be between 3 months and 6 months)

Example response with no policy or limit set:

```json
{
  "policies": {},
  "limits": {}
}
```

Example response with only a default policy and an upper limit on `max_lifetime`:

```json
{
  "policies": {
    "*": {
      "min_lifetime": 86400000,
      "max_lifetime": 15778800000
    }
  },
  "limits": {
    "max_lifetime": {
      "max": 15778800000
    }
  }
}
```
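
As a non-normative illustration of how a client might use this endpoint, the
sketch below fetches the configuration and looks up the default policy. The
homeserver URL, access token and helper names are placeholders; only the path
and the `policies`/`limits` response shape come from this proposal:

```python
# Illustrative sketch of a client querying the proposed configuration endpoint.
import requests


def fetch_retention_configuration(homeserver: str, access_token: str) -> dict:
    response = requests.get(
        f"{homeserver}/_matrix/client/v3/retention/configuration",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    response.raise_for_status()
    return response.json()


def default_policy(config: dict) -> dict:
    # The "*" key, if present, maps to the server's default retention policy.
    return config.get("policies", {}).get("*", {})


config = fetch_retention_configuration("https://example.org", "<access token>")
print(default_policy(config), config.get("limits", {}))
```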

### Defining the effective retention policy of a room

In this section, as well as in the rest of this document, we define the
"effective retention policy" of a room as the retention policy that is used to
determine whether an event should be deleted or not. This may be the policy
determined by the `m.room.retention` event in the state of the room, but it
might not be, depending on limits set by the homeserver.

The algorithm that implementations must follow to determine the effective
retention policy of a room is:


* if the homeserver defines a specific retention policy for this room, then use
  this policy as the effective retention policy of the room.
* otherwise, if the state of the room does not include a `m.room.retention`
  event with an empty state key:
  * if the homeserver defines a default retention policy, then use this policy
    as the effective retention policy of the room.
  * if the homeserver does not define a default retention policy, then don't
    apply a retention policy in this room.
* otherwise, if the state of the room includes a `m.room.retention` event with
  an empty state key:
  * if no limit is set by the homeserver, use the policy in the state of the
    room as the effective retention policy of the room.
  * otherwise, for `min_lifetime` and `max_lifetime`:
    * if there is no limit for the property, use the value specified in the
      room's state for the effective retention policy of the room (if any).
    * if there is a limit for the property:
      * if the value specified in the room's state complies with the limit,
        use this value for the effective retention policy of the room.
      * if the value specified in the room's state is lower than the limit's
        `min` value, use the `min` value for the effective retention policy of
        the room.
      * if the value specified in the room's state is greater than the limit's
        `max` value, use the `max` value for the effective retention policy of
        the room.
      * if there is no value specified in the room's state, use the limit's
        `min` value for the effective retention policy of the room (which can
        be null or absent).
* otherwise, don't apply a retention policy in this room.

So, for example, if a homeserver defines a lower limit on `max_lifetime` of
`86400000` (a day) and no limit on `min_lifetime`, and a room's retention policy
is the following:

```json
{
  "max_lifetime": 43200000,
  "min_lifetime": 21600000
}
```

Then the effective retention policy of the room is:

```json
{
  "max_lifetime": 86400000,
  "min_lifetime": 21600000
}
```
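
The following Python sketch is one way to express this algorithm. The function
names and the dict-based representation of policies and limits are assumptions
of the sketch, and the room ID in the final call is a placeholder; that call
reproduces the worked example above:

```python
# Non-normative sketch of the effective retention policy algorithm.

def clamp(value, limit):
    """Apply a server-side {"min": ..., "max": ...} limit to one property."""
    if not limit:
        return value  # no limit for this property: keep the room's value (if any)
    if value is None:
        return limit.get("min")  # no value in the room's state: use the limit's min
    if limit.get("min") is not None and value < limit["min"]:
        return limit["min"]
    if limit.get("max") is not None and value > limit["max"]:
        return limit["max"]
    return value  # the room's value already complies with the limit


def effective_policy(room_id, room_policy, server_policies, limits):
    if room_id in server_policies:
        return server_policies[room_id]  # server-defined override for this room
    if room_policy is None:
        # No m.room.retention event in the room's state: default policy, or none.
        return server_policies.get("*")
    if not limits:
        return room_policy
    return {
        field: clamp(room_policy.get(field), limits.get(field))
        for field in ("min_lifetime", "max_lifetime")
    }


print(effective_policy(
    "!room:example.org",
    {"max_lifetime": 43200000, "min_lifetime": 21600000},
    {},
    {"max_lifetime": {"min": 86400000}},
))  # {'min_lifetime': 21600000, 'max_lifetime': 86400000}
```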


### Enforcing a retention policy

Retention is only considered for non-state events. Retention is also not
considered for the most recent event in a room, in order to allow a new event
sent to that room to reference it in its `prev_events`.

When purging events in a room, only the latest retention policy state event in
that room is considered. This means that in a room where the history looks like
the following (oldest event first):

1. Retention policy A
2. Event 1
3. Event 2
4. Retention policy B

Then the retention policy B is used to determine the effective retention that
defines whether events 1 and 2 should be purged, even though they were sent when
the retention policy A was in effect. This is to avoid creating holes in the
room's DAG caused by events in the middle of the timeline being subject to a
lower `max_lifetime` than other events being sent before and after them. Such
holes would make it more difficult for homeservers to calculate room timelines
when showing them to clients. They would also force clients to display
potentially incomplete or one-sided conversations without being able to easily
tell which parts of the conversation are missing.

Servers decide whether an event should or should not be purged by calculating
how much time has passed since the event's `origin_server_ts` property, and
comparing this duration with the room's effective retention policy.

Note that, for performance reasons, a server might decide to not purge an event
the second it hits the end of its lifetime (e.g. so it can batch several events
together). In this case, the server must make sure to omit the expired events
from responses to client requests. Similarly, if the server is sent an expired
event over federation, it must omit it from responses to client requests (and
ensure it is eventually purged).
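
As an illustration of this behaviour, here is a minimal Python sketch for
deciding whether an event has expired and for filtering expired events out of a
response. The helper names and dict shapes are assumptions, and the carve-outs
for state events and for the most recent event in a room (described above) are
deliberately ignored for brevity:

```python
# Non-normative sketch: hiding expired events from responses even before they
# are physically purged. Ignores the carve-outs for state events and for the
# most recent event in a room.
import time


def is_expired(event: dict, policy: dict, now_ms: int | None = None) -> bool:
    max_lifetime = (policy or {}).get("max_lifetime")
    if max_lifetime is None:
        return False  # no upper bound: the event never expires
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - event["origin_server_ts"] > max_lifetime


def filter_expired(events: list, policy: dict) -> list:
    # Expired events must be omitted from responses to clients and federating
    # servers, even if the server has not purged them from storage yet.
    return [event for event in events if not is_expired(event, policy)]
```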

## Tradeoffs

This proposal specifies that the lifetime of an event is defined by the latest
retention policy in the room, rather than the one in effect when the event was
sent. This might be controversial as, in Matrix, the state that an event is
subject to is usually the state of the room at the time it was sent. However,
there are a few issues with using the retention that was in effect at the time
the event was sent:

* it would create holes in the DAG of a room, which would complicate the
server-side handling of the room's history
* malicious servers could potentially make an event evade retention policies by
selecting their event's `prev_events` and `auth_events` so that the event is
on a portion of the DAG where the policy does not exist
* it would be difficult to translate the configuration of retention policies
into a clear and easy to use UX (especially considering server-side
configuration applies to the whole history of the room)
* it would not allow room administrators to retroactively update the lifetime of
events that have already been sent (e.g. in the context of a room administered
by an organisation whose requirements for data retention change over time)

This proposal does not cover per-message retention (i.e. the ability to set
different lifetimes to different messages). This has been split out into
[MSC2228](https://github.com/matrix-org/matrix-spec-proposals/pull/2228) to
simplify this proposal.

This proposal also does not cover the case where a room's administrator wishes
to only restrict the lifetime of a specific section of the room's history. This
is left to be covered by a separate MSC, possibly built on top of MSC2228.

## Security considerations

In a context of open federation, it is worth keeping in mind the possibility
that not all servers in a room will enforce its retention policy. Similarly,
different servers will likely enforce different server-side configurations, and
as a result calculate different lifetimes for a given event. This proposal aims
to strike a compromise between finding an absolute consensus on an event's
lifetime and working within the constraints of each server's operator in terms
of data retention.

Somewhat in contradiction with the previous paragraph, a server may keep an
expired event in its database for some time after its expiration, while not
sharing it with clients and federating servers. This is to prevent abusers from
using low lifetime values in a room's retention policy to erase any proof of
such abuse and avoid being investigated.

Basing the expiration time of an event on its `origin_server_ts` is not ideal as
this field can be falsified by the sending server. However, there currently
isn't a more reliable way to certify the send time of an event.

As mentioned previously in this proposal, servers might store expired events for
longer than their lifetime allows, either for performance reasons or to mitigate
abuse. This is considered acceptable as long as:

* an expired event is not kept permanently
* an expired event is not shared with clients and federated servers

## Unstable prefixes

While this proposal is under review, the `m.room.retention` event type should be
replaced by the `org.matrix.msc1763.retention` type.

Similarly, the `/_matrix/client/v3/retention/configuration` path should be replaced with `/_matrix/client/unstable/org.matrix.msc1763/retention/configuration`.
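
For clarity, the mapping between the stable identifiers proposed here and their
unstable counterparts can be summarised as follows (a non-normative Python
sketch; the constant names are illustrative):

```python
# Stable identifiers proposed by this MSC vs. their unstable counterparts.
STABLE = {
    "event_type": "m.room.retention",
    "config_path": "/_matrix/client/v3/retention/configuration",
}
UNSTABLE = {
    "event_type": "org.matrix.msc1763.retention",
    "config_path": "/_matrix/client/unstable/org.matrix.msc1763/retention/configuration",
}
```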