MSC3713: Alleviating ACL exhaustion with ACL Slots #3713

Open
wants to merge 9 commits into
base: main
from
129 changes: 129 additions & 0 deletions proposals/3713-alleviating-acl-exhaustion-with-acl-slots.md
@@ -0,0 +1,129 @@
# MSC3713: Alleviating ACL exhaustion with ACL Slots

## Introduction

This MSC exists for a simple reason: exhausting the capacity of a single ACL event is a realistic
danger, or at least appears to be one given the events that started around 2022-02-03. The details
of those events do not need repeating here, but the result is worth mentioning. In a single day, the
feline.support view of certain matrix.org rooms showed hundreds of ACL revisions, along with reports
that ACL capacity was becoming a future concern.

What does this MSC do to address the concern of ACL exhaustion happening in a room? Simply put, it
creates the ACL Slots system. This system is intended to be acceptable even if it ends up being a
long-term fix. This MSC does not address the flaws of the ACL system; it only aims to alleviate this
single problem. A future MSC is perfectly welcome to fix any flaws it identifies in the ACL system.

This system works by sending `m.room.server_acl` events with a state key of `m.room.server_acl.slot.x`,
where `x` is the slot number used by a given ACL event. The number of slots is tied to the room
version in use, and this MSC suggests that a future room version augment the auth rules to make it
illegal to send an ACL with a state key outside the allowed slot range for that room version.
The reason for tying it to room versions is simple: every server should know the maximum number of
ACL events it needs to keep track of for a given room, and that number should not change. For
existing rooms the auth rules cannot help, so homeservers will instead ignore any event whose slot
is outside the accepted range.


## Proposal

As stated in the introduction, this system works by sending a modified variant of the `m.room.server_acl`
event. The modified version has a single change: the state key is set to `m.room.server_acl.slot.x`,
where `x` is the decimal slot number of the ACL in question, within the range of 0 to the room
version's maximum value.

For room versions that exist at the time of writing this MSC (room versions 1-9), a maximum of 512 ACL
events is set. Future room versions are allowed to change this value and are encouraged to do so if a
need or desire exists. The reason for 512 is simple: it is an arbitrary number that Cat thought
sounded good. The number is big enough to be very hard to exhaust, which is the important part, yet
small enough not to be completely unreasonable in size.
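
The state key format and range check described above could be sketched as follows. This is only an
illustration, not part of the proposal: the function name `parse_slot` is hypothetical, and the sketch
assumes that 512 slots means slot numbers 0 through 511 and that leading zeros are rejected, neither of
which the text above states explicitly.

```python
import re
from typing import Optional

# Assumption: a maximum of 512 ACL events means slots 0..511.
MAX_SLOTS_V1_TO_V9 = 512

# Decimal slot number without leading zeros (assumption), capped at 9 digits
# so that absurdly long state keys are rejected before integer conversion.
_SLOT_KEY = re.compile(r"m\.room\.server_acl\.slot\.(0|[1-9][0-9]{0,8})")


def parse_slot(state_key: str, max_slots: int = MAX_SLOTS_V1_TO_V9) -> Optional[int]:
    """Return the slot number for an in-range slot state key, else None."""
    match = _SLOT_KEY.fullmatch(state_key)
    if match is None:
        return None
    slot = int(match.group(1))
    # Out-of-range slots are ignored, as the MSC asks of homeservers.
    return slot if slot < max_slots else None
```

A homeserver following this sketch would drop any slot event for which `parse_slot` returns `None`.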

The process for applying ACLs would be the same as today, except that the contents of all ACL slots
are combined field by field: the contents of every `"allow"` field are combined into one list, and
the same goes for `"deny"`. The `"allow_ip_literals"` attribute is only defined inside the
`m.room.server_acl` event with the state key `m.room.server_acl.slot.0`.
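
A minimal sketch of that combination step, assuming the slot events have already been collected into a
mapping from slot number to event content. The function name `combine_acl_slots` is hypothetical, and
dropping duplicate entries is a choice made for this illustration rather than something the MSC
requires.

```python
from typing import Any, Dict, List


def combine_acl_slots(slots: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge the `allow` and `deny` lists of all slots into one ACL content."""
    allow: List[str] = []
    deny: List[str] = []
    for slot in sorted(slots):
        content = slots[slot]
        for field, target in (("allow", allow), ("deny", deny)):
            entries = content.get(field)
            if not isinstance(entries, list):
                continue  # missing or malformed field contributes nothing
            for entry in entries:
                # Duplicates are skipped (assumption: harmless for matching).
                if isinstance(entry, str) and entry not in target:
                    target.append(entry)

    combined: Dict[str, Any] = {"allow": allow, "deny": deny}
    # `allow_ip_literals` is only taken from slot 0; when absent it is left
    # unset so the caller can apply whatever default it uses today.
    slot0 = slots.get(0, {})
    if "allow_ip_literals" in slot0:
        combined["allow_ip_literals"] = slot0["allow_ip_literals"]
    return combined
```

Any `allow_ip_literals` value found in a slot other than 0 is ignored by this sketch.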

For backward compatibility, the `m.room.server_acl` state event with a blank state key remains usable
as a fallback under this MSC. Homeservers that implement this MSC should, upon detecting any
`m.room.server_acl` event with a slot state key, not apply the contents of the blank-key
`m.room.server_acl` event if an `m.room.server_acl` event with the state key
`m.room.server_acl.slot.0` exists.
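
One possible reading of this fallback rule is sketched below. The function name `acl_events_to_apply`
and the input shape (a mapping from state key to event content for all `m.room.server_acl` events in
the room) are assumptions made for illustration. The text does not say what happens when slot events
exist but slot 0 does not; this sketch assumes the blank-key event is still applied in that case.

```python
from typing import Any, Dict

SLOT_PREFIX = "m.room.server_acl.slot."


def acl_events_to_apply(
    acl_state: Dict[str, Dict[str, Any]], max_slots: int = 512
) -> Dict[str, Dict[str, Any]]:
    """Pick which `m.room.server_acl` events (by state key) should be applied."""
    selected: Dict[str, Dict[str, Any]] = {}
    for state_key, content in acl_state.items():
        if not state_key.startswith(SLOT_PREFIX):
            continue
        suffix = state_key[len(SLOT_PREFIX):]
        # Accept only plain decimal slot numbers inside the allowed range.
        is_decimal = suffix.isascii() and suffix.isdigit() and len(suffix) <= 9
        if is_decimal and str(int(suffix)) == suffix and int(suffix) < max_slots:
            selected[state_key] = content

    # The blank-key event is skipped only when slot 0 exists (assumed reading).
    if SLOT_PREFIX + "0" not in selected and "" in acl_state:
        selected[""] = acl_state[""]
    return selected
```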

The ACL event with the state key `m.room.server_acl.slot.0` is special: it is recommended that it
always be a clone of the non-slot `m.room.server_acl` event, so as to maintain an ACL list that is
backwards compatible, even though that list is incomplete when slots are in use.

Example event for slot 0:
```
{
  "content": {
    "allow": [
      "*"
    ],
    "allow_ip_literals": false,
    "deny": [
      "*.evil.com",
      "evil.com"
    ]
  },
  "event_id": "$example0:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!example_room:example.org",
  "sender": "@example:example.org",
  "state_key": "m.room.server_acl.slot.0",
  "type": "m.room.server_acl",
  "unsigned": {
    "age": 1234
  }
}
```
Example event for slot 1:
```
{
  "content": {
    "allow": [
      "*"
    ],
    "deny": [
      "*.evil.org",
      "evil.org"
    ]
  },
  "event_id": "$example1:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!example_room:example.org",
  "sender": "@example:example.org",
  "state_key": "m.room.server_acl.slot.1",
  "type": "m.room.server_acl",
  "unsigned": {
    "age": 1234
  }
}
```
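
To illustrate the effect of the two example events, the sketch below checks server names against the
combined `allow` and `deny` lists, using the existing rule that `deny` is checked before `allow` and
that a server matching neither is denied. It is deliberately simplified: `allow_ip_literals` and case
handling are left out, and the helper names `glob_to_regex` and `server_allowed` are hypothetical.

```python
import re
from typing import Dict, List


def glob_to_regex(glob: str) -> "re.Pattern[str]":
    """Translate an ACL glob (`*` and `?` wildcards) into a compiled regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def server_allowed(server_name: str, combined: Dict[str, List[str]]) -> bool:
    """Apply the deny-then-allow check to an already combined ACL."""

    def matches(globs: List[str]) -> bool:
        return any(glob_to_regex(g).fullmatch(server_name) for g in globs)

    if matches(combined.get("deny", [])):
        return False
    # A server that matches no `allow` entry is denied.
    return matches(combined.get("allow", []))


# Combined contents of the slot 0 and slot 1 examples above.
combined_example = {
    "allow": ["*"],
    "deny": ["*.evil.com", "evil.com", "*.evil.org", "evil.org"],
}
```

With `combined_example`, both `evil.com` (from slot 0) and `evil.org` (from slot 1) are denied, while
`example.org` is allowed.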

## Potential issues

An obvious issue is that this needs implementation support, but the author is not aware of many
issues with this MSC that are not already known issues with the ACL system itself. After all, this
MSC only aims to alleviate exhaustion as a potential concern, not to fix any problems of the ACL
system.

## Alternatives

The author is aware of other ideas floating around but considers them completely out of the question,
such as the idea of increasing the event size just to accommodate ACLs. This MSC exists to serve as
an alternative to that idea.

## Security considerations

By limiting the slot count, the attack of simply consuming a completely obscene amount of RAM is
somewhat mitigated, but it remains a threat that ACLs can eat a copious amount of RAM once loaded
into memory.

Review thread on the paragraph above:

> @Gnuxie (Contributor, Feb 6, 2022): Not just considerable amounts of RAM. There may be a cost in
> some implementations to compile each regex (caches up to 512), and there will be a cost to run
> 512x500~ regexes against federating servers. So someone could probably put considerable load on
> the server just by abusing this.

> Contributor Author: Yes, a lower limit might be considered as a means of mitigating the abuse
> potential. Though, if I might add, it is already possible to use ACLs to create server load. But
> yes, I do agree this MSC makes it ludicrously easier, since it expands the amount of server load a
> single room can create by a factor of at least 512 if we choose to make 512 slots the standard.
>
> I am completely open to defining the v1-9 max at a lower number in the 32-128 range to limit the
> abuse potential and still keep the benefits of this MSC. The max is tied to room versions after
> all, so we can change it in the future.

> Contributor Author: This concern should be alleviated since the slot count is now at a max of 32.
> It can be made even lower if we think that is desired, since a slot count as low as 4-8 still has
> a massive impact on how much ACL capacity we have.

The author is not aware of any additional new problems this MSC introduces that do not already exist
with today's ACL system.
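
The load concern raised in review can be put into rough numbers. The sketch below multiplies the slot
count by the reviewer's rough estimate of about 500 entries per ACL event; that per-event figure is an
estimate taken from the review comment, not a measured value, and the function name is hypothetical.

```python
def worst_case_patterns(slot_count: int, entries_per_slot: int = 500) -> int:
    """Upper bound on glob patterns one room can ask a server to evaluate."""
    if slot_count < 0 or entries_per_slot < 0:
        raise ValueError("counts must be non-negative")
    return slot_count * entries_per_slot


# Slot counts discussed in this MSC and its review thread.
for slots in (512, 128, 32, 8):
    print(f"{slots} slots: up to {worst_case_patterns(slots)} patterns")
```

Under that estimate, 512 slots allow up to 256,000 patterns per room, while 32 slots allow up to 16,000.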

## Unstable prefix

Unstable implementations should use the event type `support.feline.msc3713.rev1.room.server_acl` and
state keys of the form `support.feline.msc3713.rev1.room.server_acl.slot.X`.
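
As a small illustration of how an implementation might switch between the unstable and stable
identifiers, the hypothetical helper below returns the event type and state key for a given slot. The
function name and its `unstable` flag are assumptions made for this sketch.

```python
from typing import Tuple

STABLE_TYPE = "m.room.server_acl"
UNSTABLE_TYPE = "support.feline.msc3713.rev1.room.server_acl"


def slot_identifiers(slot: int, unstable: bool = True) -> Tuple[str, str]:
    """Return (event type, state key) for a slot under either prefix."""
    if slot < 0:
        raise ValueError("slot numbers start at 0")
    event_type = UNSTABLE_TYPE if unstable else STABLE_TYPE
    # The state key is the event type followed by `.slot.<number>`.
    return event_type, f"{event_type}.slot.{slot}"
```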

## Dependencies

The author is not aware of this MSC depending on any MSCs that are still pending merging into the
spec.