Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Change display names/avatar URLs to None if they contain null bytes before storing in DB #11230

Merged
merged 7 commits into from
Nov 12, 2021
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions changelog.d/11230.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Fix a long-standing bug wherein display names or avatar URLs containing null bytes cause an internal server error
when stored in the DB.
10 changes: 6 additions & 4 deletions synapse/storage/databases/main/events.py
Original file line number Diff line number Diff line change
Expand Up @@ -1640,8 +1640,8 @@ def _store_event_reference_hashes_txn(self, txn, events):
def _store_room_members_txn(self, txn, events, backfilled):
"""Store a room member in the database."""

def str_or_none(val: Any) -> Optional[str]:
return val if isinstance(val, str) else None
def non_null_str_or_none(val: Any) -> Optional[str]:
return val if isinstance(val, str) and "\u0000" not in val else None

self.db_pool.simple_insert_many_txn(
txn,
Expand All @@ -1653,8 +1653,10 @@ def str_or_none(val: Any) -> Optional[str]:
"sender": event.user_id,
"room_id": event.room_id,
"membership": event.membership,
"display_name": str_or_none(event.content.get("displayname")),
"avatar_url": str_or_none(event.content.get("avatar_url")),
"display_name": non_null_str_or_none(
event.content.get("displayname")
),
"avatar_url": non_null_str_or_none(event.content.get("avatar_url")),
}
for event in events
],
Expand Down
26 changes: 26 additions & 0 deletions tests/storage/test_roommember.py
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,32 @@ def test_get_joined_users_from_context(self):
)
self.assertEqual(users.keys(), {self.u_alice, self.u_bob})

def test__null_byte_in_display_name_properly_handled(self):
room = self.helper.create_room_as(self.u_alice, tok=self.t_alice)

# Create a profile with the offending null byte in the display name
new_profile = {"displayname": "ali\u0000ce"}

# Ensure that the change goes smoothly and does not fail due to the null byte
self.helper.change_membership(
room,
self.u_alice,
self.u_alice,
"join",
extra_data=new_profile,
tok=self.t_alice,
)

res = self.get_success(
self.store.db_pool.simple_select_onecol(
"room_memberships", {"user_id": "@alice:test"}, "display_name"
)
)
# verify that the display name was alice before change in membership
self.assertEqual(res[0], "alice")
# verify that it is now None
self.assertEqual(res[1], None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah, slight problem here is that this is probably fine for tests, but it's relying on the database returning the rows in insertion order.

SQLite secretly maintains a 'rowid' (row ID) per row and stores rows in a B* tree keyed by those, so I can see it returning them in insertion order.
I assume Postgres does something similar — basically, returning them in order of insertion is 'easiest' for this query (it'd be more work to jumble them up) so I reckon this test will work!
(If the database was using an index to retrieve the rows, then it would probably return them in order of that index)

However, nothing guarantees that these databases will do that. I think a notable time that databases will violate this insertion order is if some rows get deleted and the database shuffles things around (e.g. Postgres' VACUUM; SQLite can recycles rowids afaik and SQLite will generate rowids somewhat randomly if you reach the limit).

I was curious about SQLite's VACUUM and in my case it kept the order when vacuuming, seemingly just rewriting rowids, oh well (I was hoping for some rows to be dramatically reordered! :P).

sqlite> create table tab1 (val TEXT);
sqlite> insert into tab1 values ("a"), ("b"), ("c");
sqlite> select rowid, val from tab1;
1|a
2|b
3|c
sqlite> delete from tab1 where val = "b";
sqlite> insert into tab1 values ("d"), ("e"), ("f");
sqlite> select rowid, val from tab1;
1|a
3|c
4|d
5|e
6|f
sqlite> vacuum;
sqlite> select rowid, val from tab1;
1|a
2|c
3|d
4|e
5|f

Anyway, excuse the tangent, but this is the kind of thing that has the tendency to bite you in the bum :(.
I think what I'd do here is:

  • select from room_memberships before changing Alice's name
    • check your preconditions and make a note of the event_id
  • update the name
  • select from room_memberships again, filtering out the row with the event_id you had before and check your postcondition

Check the expected number of rows at each step to make sure that if we get more rows for some reason, they don't cause confusion in the future when the test breaks and some poor soul has to figure out why :p.

I'd probably use a list comprehension to do the filtering in Python (mostly since doing it in SQL here is a pain and it's not as though it's performance sensitive since it's a test), something like that:

rows = [
	row for row in rows if row["event_id"] != first_event_id
]



class CurrentStateMembershipUpdateTestCase(unittest.HomeserverTestCase):
def prepare(self, reactor, clock, homeserver):
Expand Down