This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Document easy room purge benefit of using (room_id, event_id) #13771

Closed
1 change: 1 addition & 0 deletions changelog.d/13771.doc
@@ -0,0 +1 @@
Document easy room purge benefit of using `(room_id, event_id)` in our database schemas.
7 changes: 4 additions & 3 deletions docs/development/database_schema.md
@@ -208,10 +208,11 @@ But hash collisions are still possible, and by treating event IDs as room
scoped, we can reduce the possibility of a hash collision. When scoping
`event_id` in the database schema, it should be also accompanied by `room_id`
(`PRIMARY KEY (room_id, event_id)`) and lookups should be done through the pair
`(room_id, event_id)`.
`(room_id, event_id)`. Another benefit of scoping `event_ids` to the room is
Contributor

I'm not sure I follow this reasoning. Is the point that we can do this with a single DELETE FROM ... WHERE ... rather than having to use a subselect or similar which joins to the events table?

Contributor Author

@MadLittleMods MadLittleMods Sep 12, 2022

@DMRobertson Seems to be the case so we don't have to do this:

    # Now we delete tables which lack an index on room_id but have one on event_id
    for table in (
        "event_auth",
        "event_edges",
        "event_json",
        "event_push_actions_staging",
        "event_relations",
        "event_to_state_groups",
        "event_auth_chains",
        "event_auth_chain_to_calculate",
        "redactions",
        "rejections",
        "state_events",
    ):
        logger.info("[purge] removing %s from %s", room_id, table)
        txn.execute(
            """
            DELETE FROM %s WHERE event_id IN (
                SELECT event_id FROM events WHERE room_id=?
            )
            """
            % (table,),
            (room_id,),
        )

From the chapter sync, it was one of the useful benefits people liked from pairing up (room_id, event_id).

Added a note on this part of why it's easier ⏩

that it makes it very easy to find and clean up everything in a room when it
needs to be purged.
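
For contrast with the subselect loop quoted in the review thread, here is a minimal sketch of what a purge could look like when every table carries `room_id` next to `event_id`. It uses Python's standard `sqlite3` module rather than Synapse's transaction wrapper, and the table names (`example_event_flags`, `example_event_notes`) and helpers (`create_schema`, `purge_room`) are hypothetical, invented only for this illustration; they are not part of Synapse's schema or code.

```python
import sqlite3

# Hypothetical table names for illustration only; not Synapse's real schema.
ROOM_SCOPED_TABLES = ("example_event_flags", "example_event_notes")


def create_schema(conn):
    """Create tables keyed on the (room_id, event_id) pair."""
    for table in ROOM_SCOPED_TABLES:
        conn.execute(
            f"CREATE TABLE {table} ("
            " room_id TEXT NOT NULL,"
            " event_id TEXT NOT NULL,"
            " PRIMARY KEY (room_id, event_id))"
        )


def purge_room(conn, room_id):
    """Delete every row belonging to room_id and return how many were removed.

    Because each table has its own room_id column, a plain
    DELETE ... WHERE room_id = ? is enough; no subselect against an
    events table is needed.
    """
    removed = 0
    for table in ROOM_SCOPED_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        cur = conn.execute(f"DELETE FROM {table} WHERE room_id = ?", (room_id,))
        removed += cur.rowcount
    return removed
```

Since `room_id` is the leading column of the primary key, the delete can use the primary key index to locate a room's rows, and the same `event_id` value may appear in two different rooms without conflict.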

There has been a lot of debate on this in places like
`event_id` global uniqueness has had a lot debate in places like
https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and
[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
has no resolution yet (as of 2022-09-01).