This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix 500 error on /messages when we accumulate more than 5 backward extremities #11027

1 change: 1 addition & 0 deletions changelog.d/11027.bugfix
@@ -0,0 +1 @@
Fix 500 error on `/messages` when the server accumulates more than 5 backwards extremities at a given depth for a room.
24 changes: 13 additions & 11 deletions synapse/handlers/federation.py
@@ -238,28 +238,30 @@ async def _maybe_backfill_inner(
)
return False

logger.debug(
"room_id: %s, backfill: current_depth: %s, max_depth: %s, extrems: %s",
room_id,
current_depth,
max_depth,
sorted_extremeties_tuple,
)

# We ignore extremities that have a greater depth than our current depth
# as:
# 1. we don't really care about getting events that have happened
-# before our current position; and
+# after our current position; and
@MadLittleMods (Contributor Author) commented on Oct 8, 2021:

I've read this over many times. I am pretty sure either this is a typo or the logic below is wrong: `t for t in sorted_extremeties_tuple if int(t[1]) <= current_depth`.

Let's assume the code is correct and treat it as the source of truth. The code also matches the general summary above: "We ignore extremities that have a greater depth than our current depth".

Depth is incremented for new messages.

Say our `current_depth` is 1 and the extremities are at depth 10: the extremities occurred after our current position, and the logic below will ignore any extremities with depth greater than 1.

Therefore, "we don't really care about getting events that have happened after our current position".
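The filtering the comment above reasons about can be sketched in isolation. This is a hypothetical standalone version with names mirroring the Synapse code, not the actual implementation:

```python
def filter_extremities(sorted_extremeties_tuple, current_depth):
    """Drop extremities deeper (newer) than our current position.

    Each entry is an (event_id, depth) tuple; depth increases as new
    messages arrive, so depth > current_depth means "after us".
    """
    return [t for t in sorted_extremeties_tuple if int(t[1]) <= current_depth]


# current_depth=1, the depth-10 extremity happened *after* us -> dropped
print(filter_extremities([("$a", 10), ("$b", 1)], current_depth=1))
```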

# 2. we have likely previously tried and failed to backfill from that
# extremity, so to avoid getting "stuck" requesting the same
# backfill repeatedly we drop those extremities.
filtered_sorted_extremeties_tuple = [
t for t in sorted_extremeties_tuple if int(t[1]) <= current_depth
]

logger.debug(
"room_id: %s, backfill: current_depth: %s, limit: %s, max_depth: %s, extrems: %s filtered_sorted_extremeties_tuple: %s",
room_id,
current_depth,
limit,
max_depth,
sorted_extremeties_tuple,
filtered_sorted_extremeties_tuple,
@MadLittleMods (Contributor Author):

Moved this log line below the filtering because it's useful to see what `filtered_sorted_extremeties_tuple` is.

)

# However, we need to check that the filtered extremities are non-empty.
# If they are empty then either we can a) bail or b) still attempt to
-# backill. We opt to try backfilling anyway just in case we do get
+# backfill. We opt to try backfilling anyway just in case we do get
# relevant events.
if filtered_sorted_extremeties_tuple:
sorted_extremeties_tuple = filtered_sorted_extremeties_tuple
Expand Down Expand Up @@ -389,7 +391,7 @@ async def try_backfill(domains: List[str]) -> bool:
for key, state_dict in states.items()
}

-for e_id, _ in sorted_extremeties_tuple:
+for e_id in event_ids:
@MadLittleMods (Contributor Author) commented on Oct 8, 2021:
The big bug fix is here! I am iterating over `event_ids`, which matches what we use to create `states` and then look up in the loop.

Previously, if we had say 8 backward extremities (`sorted_extremeties_tuple`), we would slice off the first 5 into `extremities` so we don't try to backfill too many in the request to the remote federated server:

# We don't want to specify too many extremities as it causes the backfill
# request URI to be too long.
extremities = dict(sorted_extremeties_tuple[:5])

Then we fetch the state for those 5 extremities and create the `states` dictionary.

But later we looped over `sorted_extremeties_tuple`, which is the full 8-item backward extremity list, and we never grabbed the state for those extra 3 extremities. So we get a `KeyError` trying to look up those events we didn't expect.
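The mismatch described above can be reproduced in a hypothetical, self-contained sketch (not the actual Synapse code): 8 backward extremities, state fetched for only the first 5, then a loop over the wrong collection:

```python
# (event_id, depth) tuples, sorted by depth descending as in Synapse
sorted_extremeties_tuple = [(f"$event_{i}", 8 - i) for i in range(8)]

# We don't want too many extremities (the backfill request URI gets too
# long), so slice off the first 5.
extremities = dict(sorted_extremeties_tuple[:5])

# State is fetched only for the sliced-down set of extremities
states = {e_id: {"m.room.create": "..."} for e_id in extremities}

# Buggy loop: walks the full 8-item list, so the lookup fails for the
# 3 extremities we never fetched state for
caught = None
try:
    for e_id, _ in sorted_extremeties_tuple:
        states[e_id]
except KeyError as err:
    caught = err  # raised for the first un-fetched extremity

# Fixed loop: iterate over the same event ids used to build `states`
for e_id in extremities:
    states[e_id]  # always present
```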

likely_extremeties_domains = get_domains_from_state(states[e_id])
@MadLittleMods (Contributor Author) commented on Oct 8, 2021:
A general question on the MSC2716 front. I ran into this error because each historical batch creates a backward extremity that will never resolve, because it points to a fake `prev_event`.

Is it okay to have this build-up?

My thinking is this would be problematic as we will have thousands of batches per room to backfill a big room for Gitter and will have to deal with thousands of backward extremities here.

Also, because we use `get_oldest_event_ids_with_depth_in_room`, the oldest backward extremities will always be historical events. Assuming we're paginating from the bottom (where `current_depth` > historical extremity depth), we will always pick the 5 unresolvable events in the room to backfill from. This would stop us from backfilling any other events in the room.

I feel like I may be missing something though.

A repository member replied:

Oh bleurgh, that's true. Ideally we'd know that those are fake prev events and not store them? Or something like that?

@MadLittleMods (Contributor Author) replied:

Tracking this separately via #11091


success = await try_backfill(
2 changes: 1 addition & 1 deletion synapse/handlers/federation_event.py
@@ -392,7 +392,7 @@ async def check_join_restrictions(

@log_function
async def backfill(
-self, dest: str, room_id: str, limit: int, extremities: List[str]
+self, dest: str, room_id: str, limit: int, extremities: Iterable[str]
) -> None:
"""Trigger a backfill request to `dest` for the given `room_id`

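The signature change above widens the annotation from `List[str]` to `Iterable[str]`. A hypothetical sketch (not the Synapse implementation) of why this matters: the caller can now pass a dict keys view, which is iterable but not a `List`:

```python
from typing import Iterable

requested = []


def backfill(dest: str, room_id: str, limit: int, extremities: Iterable[str]) -> None:
    # The function only iterates over the extremities, so any
    # Iterable[str] is acceptable
    for event_id in extremities:
        requested.append(event_id)


extremities = {"$ev1:example.org": 3, "$ev2:example.org": 2}
# A dict_keys view satisfies Iterable[str] but would fail a List[str]
# annotation under a type checker
backfill("remote.example.org", "!room:example.org", 100, extremities.keys())
```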
64 changes: 64 additions & 0 deletions tests/handlers/test_federation.py
@@ -23,13 +23,18 @@
from synapse.logging.context import LoggingContext, run_in_background
from synapse.rest import admin
from synapse.rest.client import login, room
from synapse.types import create_requester
from synapse.util.stringutils import random_string

from tests import unittest

logger = logging.getLogger(__name__)


def generate_fake_event_id() -> str:
return "$fake_" + random_string(43)


class FederationTestCase(unittest.HomeserverTestCase):
servlets = [
admin.register_servlets,
@@ -198,6 +203,65 @@ def test_rejected_state_event_state(self):

self.assertEqual(sg, sg2)

def test_backfill_with_many_backward_extremities(self):
"""
Check that we can backfill with many backward extremities.
The goal is to make sure that when we only use a portion
of the backward extremities (the magic number is more than 5),
no errors are thrown.

Regression test, see #11027
"""
# create the room
user_id = self.register_user("kermit", "test")
tok = self.login("kermit", "test")
requester = create_requester(user_id)

room_id = self.helper.create_room_as(room_creator=user_id, tok=tok)

ev1 = self.helper.send(room_id, "first message", tok=tok)

# Create "many" backward extremities. The magic number we're trying to
# create more than is 5 which corresponds to the number of backward
# extremities we slice off in `_maybe_backfill_inner`
for _ in range(0, 8):
event_handler = self.hs.get_event_creation_handler()
event, context = self.get_success(
event_handler.create_event(
requester,
{
"type": "m.room.message",
"content": {
"msgtype": "m.text",
"body": "message connected to fake event",
},
"room_id": room_id,
"sender": user_id,
},
prev_event_ids=[
ev1["event_id"],
# We're creating a backward extremity each time thanks
# to this fake event
generate_fake_event_id(),
],
)
)
self.get_success(
event_handler.handle_new_client_event(requester, event, context)
)

current_depth = 1
limit = 100
with LoggingContext("receive_pdu"):
# Make sure backfill still works
d = run_in_background(
self.hs.get_federation_handler().maybe_backfill,
room_id,
current_depth,
limit,
)
self.get_success(d)

def test_backfill_floating_outlier_membership_auth(self):
"""
As the local homeserver, check that we can properly process a federated