This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Document easy room purge benefit of using (room_id, event_id) #13771

Closed
1 change: 1 addition & 0 deletions changelog.d/13771.doc
@@ -0,0 +1 @@
Document easy room purge benefit of using `(room_id, event_id)` in our database schemas.
8 changes: 5 additions & 3 deletions docs/development/database_schema.md
@@ -208,10 +208,12 @@ But hash collisions are still possible, and by treating event IDs as room
scoped, we can reduce the possibility of a hash collision. When scoping
`event_id` in the database schema, it should be also accompanied by `room_id`
(`PRIMARY KEY (room_id, event_id)`) and lookups should be done through the pair
`(room_id, event_id)`.
`(room_id, event_id)`. Another benefit of scoping `event_ids` to the room is
Contributor

I'm not sure I follow this reasoning. Is the point that we can do this with a single `DELETE FROM ... WHERE ...` rather than having to use a subselect or similar which joins to the `events` table?

Contributor Author

@MadLittleMods, Sep 12, 2022

@DMRobertson Seems to be the case so we don't have to do this:

# Now we delete tables which lack an index on room_id but have one on event_id
for table in (
    "event_auth",
    "event_edges",
    "event_json",
    "event_push_actions_staging",
    "event_relations",
    "event_to_state_groups",
    "event_auth_chains",
    "event_auth_chain_to_calculate",
    "redactions",
    "rejections",
    "state_events",
):
    logger.info("[purge] removing %s from %s", room_id, table)
    txn.execute(
        """
        DELETE FROM %s WHERE event_id IN (
            SELECT event_id FROM events WHERE room_id=?
        )
        """
        % (table,),
        (room_id,),
    )

From the chapter sync, it was one of the useful benefits people liked from pairing up `(room_id, event_id)`.

Added a note on this part of why it's easier ⏩

that it makes it very easy to find and clean up everything in a room when it
needs to be purged (no need to use sub-`select` query or join from the `events`
table).
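
To make the purge benefit concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `unscoped_example` and `scoped_example` and the `purge_room` helper are hypothetical names invented for illustration, not part of the Synapse schema or codebase. The sketch contrasts purging a table keyed on `event_id` alone, which needs a sub-select against `events`, with purging a table keyed on `(room_id, event_id)`, which only needs a plain `WHERE room_id = ?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        room_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        PRIMARY KEY (room_id, event_id)
    );
    -- Hypothetical table keyed on event_id alone (no room_id column).
    CREATE TABLE unscoped_example (
        event_id TEXT PRIMARY KEY,
        payload TEXT
    );
    -- Hypothetical table keyed on the (room_id, event_id) pair.
    CREATE TABLE scoped_example (
        room_id TEXT NOT NULL,
        event_id TEXT NOT NULL,
        payload TEXT,
        PRIMARY KEY (room_id, event_id)
    );
    """
)

# Two events in room !a and one event in room !b.
rows = [
    ("!a:example.org", "$1"),
    ("!a:example.org", "$2"),
    ("!b:example.org", "$3"),
]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
conn.executemany(
    "INSERT INTO unscoped_example VALUES (?, 'x')", [(e,) for _, e in rows]
)
conn.executemany("INSERT INTO scoped_example VALUES (?, ?, 'x')", rows)


def purge_room(conn: sqlite3.Connection, room_id: str) -> None:
    """Remove every row belonging to room_id from the example tables."""
    # Without room_id on the table, we must go through the events table.
    conn.execute(
        """
        DELETE FROM unscoped_example WHERE event_id IN (
            SELECT event_id FROM events WHERE room_id = ?
        )
        """,
        (room_id,),
    )
    # With (room_id, event_id), a single direct DELETE is enough.
    conn.execute("DELETE FROM scoped_example WHERE room_id = ?", (room_id,))
    # Delete from events last, since the sub-select above depends on it.
    conn.execute("DELETE FROM events WHERE room_id = ?", (room_id,))


purge_room(conn, "!a:example.org")

remaining_unscoped = [
    r[0] for r in conn.execute("SELECT event_id FROM unscoped_example")
]
remaining_scoped = conn.execute(
    "SELECT room_id, event_id FROM scoped_example"
).fetchall()
remaining_events = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Note the ordering constraint in the unscoped case: `events` has to be purged last, because the sub-select needs it to discover which `event_id`s belong to the room. The scoped table has no such dependency.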

There has been a lot of debate on this in places like
`event_id` global uniqueness has had a lot of debate in places like
https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and
[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
has no resolution yet (as of 2022-09-01).