This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Backfill in the background if we're doing it "just because" #15710

Merged
1 change: 1 addition & 0 deletions changelog.d/15710.feature
@@ -0,0 +1 @@
Speed up `/messages` by backfilling in the background when there are no backward extremities where we are directly paginating.
8 changes: 7 additions & 1 deletion synapse/handlers/federation.py
@@ -327,7 +327,9 @@ async def _maybe_backfill_inner(
logger.debug(
"_maybe_backfill_inner: all backfill points are *after* current depth. Trying again with later backfill points."
)
return await self._maybe_backfill_inner(
run_as_background_process(
Contributor Author:
This sounds like we're taking risks? But the code change includes a comment that seems to point out that we don't use the return value of the function anyway? (Should we remove the return value in that case?)

-- @reivilibre, #15710 (review)

There are two things being conflated here:

The first thing, around "risk", is really just calling out the fact that there is a behavior change here. Before this PR, if there weren't any backward extremities where we were paginating, we would backfill anyway with the most-recent backward extremities. The exact reason why we did this wasn't specified beyond "in case we do get relevant events", but I assume it was for eventual consistency, just in case we catch a new branch of history that extends back into the place where we are paginating. With this PR, we acknowledge that some useful history could still come back, but we don't need to block the client on it: it's a best-effort attempt that could just as easily return nothing relevant, since it's unlikely that a random backward extremity will lead to anything near the place we are paginating, and impossible when the most-recent backward extremities are far ahead of it. And there really isn't any consequence, since the next time the user scrolls back, the history will be there: we still did the work in the background.

The second thing here is the return value of `maybe_backfill()`, which is completely separate. It just signals whether we actually backfilled anything, and we've historically not used it for anything since it was introduced. Coincidentally, I am using it in a new PR to see whether there is anything new that we should go back to the database and ask about. Now that I understand the nuance better, with an actual use case, I've updated the comment.
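To illustrate the behavior change being discussed, here is a minimal sketch (not Synapse's actual implementation; `fetch_missing_history`, `handle_pagination_request`, and the room ID are hypothetical names) of the fire-and-forget pattern: scheduling the best-effort backfill as a background task instead of awaiting it, and returning `False` to the caller because nothing was backfilled synchronously:

```python
import asyncio

async def fetch_missing_history(room_id: str) -> bool:
    """Stand-in for the best-effort backfill; pretends to do network work."""
    await asyncio.sleep(0)  # placeholder for federation round-trips
    return True

async def handle_pagination_request(room_id: str) -> bool:
    # Before the change: `return await fetch_missing_history(room_id)`
    # blocked the client's /messages request on the speculative backfill.
    # After: schedule it in the background (roughly what Synapse's
    # `run_as_background_process` helper does, plus metrics/log contexts)
    # and report that we did not backfill synchronously.
    asyncio.create_task(fetch_missing_history(room_id))
    return False

async def main() -> bool:
    res = await handle_pagination_request("!room:example.org")
    await asyncio.sleep(0.01)  # let the background task finish before exit
    return res

result = asyncio.run(main())
print(result)  # False: the client was never blocked on the backfill
```

The client gets its response immediately; if the background fetch does find history, it lands in the database and shows up the next time the client paginates.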

Contributor:

I think I'm starting to understand what you are saying; thanks for bearing with me. I don't ever touch backward extremities, so it's an unfamiliar topic for me.

So the root of the issue is that backwards extremities could point to events anywhere in the DAG (we don't know whereabouts they are because we don't have those events locally, which is why they are backwards extremities in the first place).
So when scrolling backwards in the event stream, we technically would have to try and fetch events from backwards extremities 'in case' they might appear in the scrollback window?
But this is not a very common occurrence, so most of the time it's a waste to wait for this to happen before letting the client go on its merry way?

And then the change introduced here means we don't block on that request. In the rare case that it should have produced events in the client's request window, it now won't.

Is that roughly the right idea?

The part I'm missing now is, which of these holds?:

  1. when the client scrolls back again, those missed events will be given to it
  2. those missed events will only be given to the client if the client repeats the request at some point in the future, after the background request has completed

Contributor Author:

And then the change introduced here means we don't block on that request. In the rare case that it should have produced events in the client's request window, it now won't.

Is that roughly the right idea?

This is the main gist to take away 👍


So the root of the issue is that backwards extremities could point to events anywhere in the DAG (we don't know whereabouts they are because we don't have those events locally, which is why they are backwards extremities in the first place).

We know approximately where any backward extremity is because we have the successor events persisted (see the backward extremity explanation in the docs). And we first pick from backward extremities that are approximately in range of where we are paginating already. This is all smart and good so far. But if there are no backward extremities at or further back in history than our current pagination position, then we just backfill with the most recent backward extremities (just because).

The backfill with the most recent backward extremities "just because" is the part we're deciding to put in the background now.

We could check whether these most recent backward extremities are "close" as well, but that's just extra complexity and lookup time, and we could just as easily be missing a single event vs. an unknown chain. Maybe we even want to do something different, where we choose the closest extremities to our window. But the current behavior of using the most recent extremities does fill in the history that other people are most likely to hit first when they scroll back, and it is simple to reason about (if a bit arbitrary).

There is an argument that we might want to remove this logic completely, but I think it serves a good enough purpose: filling in history in rooms that people visit often, and unsticking things so they become eventually consistent.
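The selection logic described above can be sketched as follows (a hypothetical illustration, not Synapse's actual code; `choose_backfill_points`, the depth lists, and the fallback ordering are all assumptions made for the example): prefer backward extremities at or before the depth we are paginating at, and only fall back to the most recent extremities — the "just because" backfill this PR moves into the background — when none are in range:

```python
def choose_backfill_points(extremity_depths: list[int], current_depth: int) -> list[int]:
    """Pick backward extremities to backfill from, by stream depth."""
    # Extremities at or before our pagination position plausibly lead to
    # history inside the client's scrollback window.
    in_range = [d for d in extremity_depths if d <= current_depth]
    if in_range:
        return sorted(in_range, reverse=True)
    # No extremities at or before where we're paginating: backfill from the
    # most recent ones anyway, purely as a best-effort, eventual-consistency
    # measure (the part now done in the background).
    return sorted(extremity_depths, reverse=True)

print(choose_backfill_points([5, 12, 40], current_depth=20))  # [12, 5]
print(choose_backfill_points([30, 40], current_depth=20))     # [40, 30]
```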


those missed events will only be given to the client if the client repeats the request at some point in the future, after the background request has completed

This is the right answer. It does mean that clients could miss out on history, but that's already the case, and it's one of the reasons I created MSC3871: so we could indicate to clients that there is a gap they could try refetching.

Contributor:

Understandable compromises. That gap suggestion by MSC3871 could well be useful for clients, thanks.

"_maybe_backfill_inner_anyway_with_max_depth",
self._maybe_backfill_inner,
room_id=room_id,
# We use `MAX_DEPTH` so that we find all backfill points next
# time (all events are below the `MAX_DEPTH`)
@@ -338,6 +340,10 @@ async def _maybe_backfill_inner(
# overall otherwise the smaller one will throw off the results.
processing_start_time=None,
)
# We return `False` because we're backfilling in the background
# and it is just a best-effort attempt for eventual consistency's sake. The
# return value of this function isn't used for anything yet anyway.
return False

# Even after recursing with `MAX_DEPTH`, we didn't find any
# backward extremities to backfill from.