Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Resume state re-syncing after a Synapse restart #12771

Closed
wants to merge 4 commits into from

Conversation

squahtx
Copy link
Contributor

@squahtx squahtx commented May 17, 2022

Resolves #12642.

These changes haven't been tested at all. How would we go about testing?

Sean Quah added 4 commits May 17, 2022 18:21
… returns an invalid response

Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
@squahtx squahtx requested a review from a team as a code owner May 17, 2022 17:30
room_id: room to be resynced
"""
assert not self.config.worker.worker_app
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is implied by the identical assert in do_invite_join, which starts the background task.

Comment on lines +1475 to +1476
destination: Optional[str],
destinations: Collection[str],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not see a simple way to determine the destination the original sync was using after a restart, so I took the entire list of servers in the room to try.

Comment on lines +1497 to +1498
# Make an infinite iterator of destinations to try. Once we find a working
# destination, we'll stick with it until it flakes.
Copy link
Contributor Author

@squahtx squahtx May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a restart, it might take a little time before we stumble upon a working remote server. Can we do better?

Comment on lines +1559 to +1569
if attempt == num_destinations - 1:
# We have tried every remote server. Give up.
logger.error(
"Failed to get state for %s at %s from %s because %s. "
"Giving up!",
room_id,
event,
destination,
e,
)
raise
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about what to do if we've tried every destination and none of them worked.

Comment on lines +1465 to +1471
run_as_background_process(
desc="sync_partial_state_room",
func=self._sync_partial_state_room,
destination=None,
destinations=servers_in_room,
room_id=room_id,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to append the room ID to desc? Otherwise all the logging contexts will be labelled sync_partial_state_room-[0-9]+

@squahtx squahtx marked this pull request as draft May 19, 2022 18:14
@squahtx
Copy link
Contributor Author

squahtx commented May 19, 2022

I'm going to split this into two PRs, one for trying other homeservers and the other for resuming re-sync of state.

@squahtx squahtx closed this May 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster joins: resume state re-syncing after a Synapse restart
1 participant