Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Optimize how we calculate likely_domains during backfill #13575

Merged
merged 15 commits into from
Aug 30, 2022

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Aug 20, 2022

Optimize how we calculate likely_domains during backfill because I've seen this take 17s in production just to get_current_state which is used to get_domains_from_state (see case 2. Loading tons of events in the /messages investigation issue).

There are 3 ways we currently calculate hosts that are in the room:

  1. get_current_state -> get_domains_from_state
    • Used in backfill to calculate likely_domains and /timestamp_to_event because it was cargo-culted from backfill
    • This one is being eliminated in favor of get_current_hosts_in_room in this PR 🕳
  2. get_current_hosts_in_room
    • Used for other federation things like sending read receipts and typing indicators
  3. get_hosts_in_room_at_events
    • Used when pushing out events over federation to other servers in the _process_event_queue_loop

Fix #13626

Part of #13356

Mentioned in internal doc

Query performance

Before

The query from get_current_state sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s):

synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)

synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)

But what about get_current_hosts_in_room: When there is 8M rows in the current_state_events table, the previous query in get_current_hosts_in_room took 13s from complete freshness (when the events were first added). But takes 930ms after a Postgres restart or 390ms if running back to back to back.

$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
    type = 'm.room.member'
    AND membership = 'join'
    AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
  4130
(1 row)

Time: 13181.598 ms (00:13.182)

synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
 80814

synapse=# SELECT COUNT(*) from current_state_events;
  count
---------
 8162847

synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
 pg_size_pretty
----------------
 4702 MB

After

I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart.

After a Postgres restart, it takes 330ms and running back to back takes 260ms.

$ psql synapse
synapse=# \timing on
Timing is on.
synapse=# SELECT
    substring(c.state_key FROM '@[^:]*:(.*)$') as host
FROM current_state_events c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE
    c.type = 'm.room.member'
    AND c.membership = 'join'
    AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
GROUP BY host
ORDER BY min(e.depth) ASC;
Time: 333.800 ms

Going further

To improve things further we could add a limit parameter to get_current_hosts_in_room. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up. Tracked by #13659

Another thing we can do is optimize the query to use a index skip scan:

Dev notes

Created a utility script to dump the state events from a
GET https://matrix-client.matrix.org/_matrix/client/r0/rooms/!OGEhHVWSdvArJzumhm:matrix.org/state response for testing in your local dev database, https://github.com/MadLittleMods/matrix-synapse-state-dumper

It can also add in filler events in between in the database so it isn't such an easy sequential read for Postgres.

It's rough and just barely does what I needed to get some testing for this PR. Not very general.


brew services restart postgresql@13

$ psql synapse
> \timing on
> ...
> \q

Get count of rows when using group by:

SELECT COUNT(*) OVER () AS TotalRecords
FROM current_state_events c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE
    c.type = 'm.room.member'
    AND c.membership = 'join'
    AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
GROUP BY substring(c.state_key FROM '@[^:]*:(.*)$')
ORDER BY min(e.depth);

BackfillBenchmarkTestCase test_benchmark_likely_domains
class BackfillBenchmarkTestCase(unittest.HomeserverTestCase):
    def prepare(self, reactor, clock, hs):
        self._storage_controllers = hs.get_storage_controllers()

    def test_benchmark_likely_domains(self):
        import time
        import logging

        logger = logging.getLogger(__name__)

        from synapse.handlers.federation import get_domains_from_state

        async def calculate_likely_domains():
            # Find other homeservers from the given state in the room
            curr_state = await self._storage_controllers.state.get_current_state(
                "!OGEhHVWSdvArJzumhm:matrix.org"
            )
            curr_domains = get_domains_from_state(curr_state)
            likely_domains = [
                domain for domain, depth in curr_domains if domain != self.server_name
            ]
            return likely_domains

        def time_calculate_likely_domains(test_prefix: str):
            benchmark_start_time = time.time()
            likely_domains = self.get_success(calculate_likely_domains())
            benchmark_end_time = time.time()
            logger.info(
                "Benchmark time (%s): %s",
                "{test_prefix: <13}".format(test_prefix=test_prefix),
                (benchmark_end_time - benchmark_start_time),
            )
            self.assertGreater(len(likely_domains), 4000)

        time_calculate_likely_domains("1")
        time_calculate_likely_domains("2")
        time_calculate_likely_domains("3")

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

@MadLittleMods MadLittleMods added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Aug 20, 2022
@MadLittleMods MadLittleMods added the A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) label Aug 20, 2022
@@ -99,37 +99,6 @@
)


def get_domains_from_state(state: StateMap[EventBase]) -> List[Tuple[str, int]]:
Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eliminating this option to get domains for a room in favor of get_current_hosts_in_room.

In a room like #matrixhq, getting all 80k state events in the room, then whittling it down in the app is not faster than a quick targeted database query or maybe even luckily hitting the cache.

/* Group all state events from the same domain into their own buckets (groups) */
GROUP BY server_domain
/* Sorted by lowest depth first */
ORDER BY min(e.depth) ASC;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the query to order by depth to match what get_domains_from_state did before (drop-in replacement).

The heuristic of sorting by servers who have been there the longest is good because they're most likely to have anything we ask about.

@MadLittleMods MadLittleMods marked this pull request as ready for review August 23, 2022 14:32
@MadLittleMods MadLittleMods requested a review from a team as a code owner August 23, 2022 14:32
@reivilibre reivilibre requested review from reivilibre and removed request for a team August 23, 2022 14:37
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting optimisation!

Would be worth checking how much we're going to negatively impact callers that don't care about the depth ordering ­— introducing a join (not even with all the columns covered by an index) in a section of the code that says 'Avoiding a join is a Very Good Thing' makes me worry a bit :).

WHERE type = 'm.room.member' AND room_id = ? AND membership = ?
SELECT c.state_key FROM current_state_events as c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is adding this join here going to bite us in the bum for callers that don't actually care about the depth ordering?

As an aside, adding an index on events(event_id, depth) could be considered to speed up this if it turns out to be a problem, but on the other hand we're not keen on adding more indices to the events tables because it's frequently written.

Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time-wise, I think by the nature of the extra complexity, it will take more time but it's not something I optimized for here. If we don't like this trade-off, we could remove the cache-shortcut lookup in get_current_hosts_in_room around it (it's kinda doing the same thing anyway looking in current_state_events).

Is there another concern about the order itself?

Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, here are some first-run timings. Not sure how much the first one affects the second one 🤷. Doubling the time isn't very good though so I'm up for removing the cache-shortcut lookup (just poke me further). Or we just sort by oldest current_state_events rows (not necessarily depth order but might be pretty close in practice) and not worry about the slight difference it makes to backfill (this one maybe sounds more interesting to me).

$ psql synapse
synapse=# \timing on

synapse=# SELECT c.state_key FROM current_state_events as c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE c.type = 'm.room.member' AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org' AND membership = 'join'
/* Sorted by lowest depth first */
ORDER BY e.depth ASC;

Time: 4374.227 ms (00:04.374)


synapse=# SELECT c.state_key FROM current_state_events as c
WHERE c.type = 'm.room.member' AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org' AND membership = 'join';

Time: 1947.409 ms (00:01.947)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only doubling isn't as bad as I was expecting, so maybe it's fine. Possibly worth checking a graph for how much time is spent on get_users_in_rooms and thinking if it would burn us if that time were doubled.

Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only doubling isn't as bad as I was expecting, so maybe it's fine. Possibly worth checking a graph for how much time is spent on get_users_in_rooms and thinking if it would burn us if that time were doubled.

Do we have a graph for this? I don't see special metrics annotated in it but 👍 it would be good to look at if we had it. Depending on confidence, we can merge a PR to measure it.


I've looked at the usages and there are some areas where it doesn't feel great to increase the time (calculating if an appservice is interested in an event (this could probably be smarter though anyway to use the user namespace regex directly on the members of the room like we do to get the domains), calculating join_authorised_via_users_server in a /make_join (idk how important this is), newly joined rooms in /sync, presence). Overall seems kinda fine ⏩

I can create separate PRs to fix up the following areas first, then we can merge this after those places are changed.

There are a few areas where it feels like we shouldn't be using get_users_in_room and can just use a direct lookup for the single user:

There are a few places where we should use get_current_hosts_in_room instead:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is some follow-up. I think this PR was rolled out on matrix.org (or maybe just the RC was which wouldn't include this?)

The average looks about the same. It seems a bit more peakier on total txn time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another follow-up, this time I think the changes are actually deployed to matrix.org as indicated by that change in the UP graph at the top (and /versions shows a hash that includes it)

It looks like it does take significantly more time on average (~100ms vs ~10ms before) but we're doing the query a whole lot more less probably because we fixed up those mis-uses.

https://grafana.matrix.org/d/000000012/synapse?orgId=1&from=1662362659334&to=1662535459334&viewPanel=180

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably has a bigger penalty for other usage patterns, see #13942

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably fixed #13942 via #13958 (another get_users_in_room mis-use)

Fixed up another mis-use in #13960

And found another mis-use that's tracked by #13967. Since this involves more than just switching it out, I'm not going to tackle. Feel free to pick it up!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ended up as two functions get_current_hosts_in_room and get_current_hosts_in_room_ordered so that we don't have to take the performance hit of the extra join in get_users_in_room, see #13972

synapse/storage/databases/main/roommember.py Outdated Show resolved Hide resolved
synapse/storage/databases/main/roommember.py Outdated Show resolved Hide resolved
WHERE type = 'm.room.member' AND room_id = ? AND membership = ?
SELECT c.state_key FROM current_state_events as c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only doubling isn't as bad as I was expecting, so maybe it's fine. Possibly worth checking a graph for how much time is spent on get_users_in_rooms and thinking if it would burn us if that time were doubled.

synapse/storage/databases/main/roommember.py Outdated Show resolved Hide resolved
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — suppose it would be prudent to land this after fixing up the (mis-)usages that you identified

MadLittleMods added a commit that referenced this pull request Aug 24, 2022
…ulating `join_authorised_via_users_server` of a `/make_join` request (#13606)

Use dedicated `get_local_users_in_room` to find local users when calculating `join_authorised_via_users_server` ("the authorising user for joining a restricted room") of a `/make_join` request.

Found while working on #13575 (comment) but it's not related.
MadLittleMods added a commit that referenced this pull request Aug 24, 2022
MadLittleMods added a commit that referenced this pull request Aug 24, 2022
@MadLittleMods MadLittleMods added the A-Performance Performance, both client-facing and admin-facing label Aug 27, 2022
@MadLittleMods MadLittleMods merged commit 51d732d into develop Aug 30, 2022
@MadLittleMods MadLittleMods deleted the madlittlemods/optimize-backfill-likely-domains branch August 30, 2022 06:38
@MadLittleMods
Copy link
Contributor Author

Thanks for the review @reivilibre 🦒

Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Sep 15, 2022
Synapse 1.67.0 (2022-09-13)
===========================

This release removes using the deprecated direct TCP replication configuration
for workers. Server admins should use Redis instead. See the [upgrade
notes](https://matrix-org.github.io/synapse/v1.67/upgrade.html#upgrading-to-v1670).

The minimum version of `poetry` supported for managing source checkouts is now
1.2.0.

**Notice:** from the next major release (1.68.0) installing Synapse from a source
checkout will require a recent Rust compiler. Those using packages or
`pip install matrix-synapse` will not be affected. See the [upgrade
notes](https://matrix-org.github.io/synapse/v1.67/upgrade.html#upgrading-to-v1670).

**Notice:** from the next major release (1.68.0), running Synapse with a SQLite
database will require SQLite version 3.27.0 or higher. (The [current minimum
 version is SQLite 3.22.0](https://github.com/matrix-org/synapse/blob/release-v1.67/synapse/storage/engines/sqlite.py#L69-L78).)
See [matrix-org#12983](matrix-org#12983) and the [upgrade notes](https://matrix-org.github.io/synapse/v1.67/upgrade.html#upgrading-to-v1670) for more details.

No significant changes since 1.67.0rc1.

Synapse 1.67.0rc1 (2022-09-06)
==============================

Features
--------

- Support setting the registration shared secret in a file, via a new `registration_shared_secret_path` configuration option. ([\matrix-org#13614](matrix-org#13614))
- Change the default startup behaviour so that any missing "additional" configuration files (signing key, etc) are generated automatically. ([\matrix-org#13615](matrix-org#13615))
- Improve performance of sending messages in rooms with thousands of local users. ([\matrix-org#13634](matrix-org#13634))

Bugfixes
--------

- Fix a bug introduced in Synapse 1.13 where the [List Rooms admin API](https://matrix-org.github.io/synapse/develop/admin_api/rooms.html#list-room-api) would return integers instead of booleans for the `federatable` and `public` fields when using a Sqlite database. ([\matrix-org#13509](matrix-org#13509))
- Fix bug that user cannot `/forget` rooms after the last member has left the room. ([\matrix-org#13546](matrix-org#13546))
- Faster Room Joins: fix `/make_knock` blocking indefinitely when the room in question is a partial-stated room. ([\matrix-org#13583](matrix-org#13583))
- Fix loading the current stream position behind the actual position. ([\matrix-org#13585](matrix-org#13585))
- Fix a longstanding bug in `register_new_matrix_user` which meant it was always necessary to explicitly give a server URL. ([\matrix-org#13616](matrix-org#13616))
- Fix the running of [MSC1763](matrix-org/matrix-spec-proposals#1763) retention purge_jobs in deployments with background jobs running on a worker by forcing them back onto the main worker. Contributed by Brad @ Beeper. ([\matrix-org#13632](matrix-org#13632))
- Fix a long-standing bug that downloaded media for URL previews was not deleted while database background updates were running. ([\matrix-org#13657](matrix-org#13657))
- Fix [MSC3030](matrix-org/matrix-spec-proposals#3030) `/timestamp_to_event` endpoint to return the correct next event when the events have the same timestamp. ([\matrix-org#13658](matrix-org#13658))
- Fix bug where we wedge media plugins if clients disconnect early. Introduced in v1.22.0. ([\matrix-org#13660](matrix-org#13660))
- Fix a long-standing bug which meant that keys for unwhitelisted servers were not returned by `/_matrix/key/v2/query`. ([\matrix-org#13683](matrix-org#13683))
- Fix a bug introduced in Synapse v1.20.0 that would cause the unstable unread counts from [MSC2654](matrix-org/matrix-spec-proposals#2654) to be calculated even if the feature is disabled. ([\matrix-org#13694](matrix-org#13694))

Updates to the Docker image
---------------------------

- Update docker image to use a stable version of poetry. ([\matrix-org#13688](matrix-org#13688))

Improved Documentation
----------------------

- Improve the description of the ["chain cover index"](https://matrix-org.github.io/synapse/latest/auth_chain_difference_algorithm.html) used internally by Synapse. ([\matrix-org#13602](matrix-org#13602))
- Document how ["monthly active users"](https://matrix-org.github.io/synapse/latest/usage/administration/monthly_active_users.html) is calculated and used. ([\matrix-org#13617](matrix-org#13617))
- Improve documentation around user registration. ([\matrix-org#13640](matrix-org#13640))
- Remove documentation of legacy `frontend_proxy` worker app. ([\matrix-org#13645](matrix-org#13645))
- Clarify documentation that HTTP replication traffic can be protected with a shared secret. ([\matrix-org#13656](matrix-org#13656))
- Remove unintentional colons from [config manual](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html) headers. ([\matrix-org#13665](matrix-org#13665))
- Update docs to make enabling metrics more clear. ([\matrix-org#13678](matrix-org#13678))
- Clarify `(room_id, event_id)` global uniqueness and how we should scope our database schemas. ([\matrix-org#13701](matrix-org#13701))

Deprecations and Removals
-------------------------

- Drop support for calling `/_matrix/client/v3/rooms/{roomId}/invite` without an `id_access_token`, which was not permitted by the spec. Contributed by @Vetchu. ([\matrix-org#13241](matrix-org#13241))
- Remove redundant `_get_joined_users_from_context` cache. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13569](matrix-org#13569))
- Remove the ability to use direct TCP replication with workers. Direct TCP replication was deprecated in Synapse v1.18.0. Workers now require using Redis. ([\matrix-org#13647](matrix-org#13647))
- Remove support for unstable [private read receipts](matrix-org/matrix-spec-proposals#2285). ([\matrix-org#13653](matrix-org#13653), [\matrix-org#13692](matrix-org#13692))

Internal Changes
----------------

- Extend the release script to wait for GitHub Actions to finish and to be usable as a guide for the whole process. ([\matrix-org#13483](matrix-org#13483))
- Add experimental configuration option to allow disabling legacy Prometheus metric names. ([\matrix-org#13540](matrix-org#13540))
- Cache user IDs instead of profiles to reduce cache memory usage. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13573](matrix-org#13573), [\matrix-org#13600](matrix-org#13600))
- Optimize how Synapse calculates domains to fetch from during backfill. ([\matrix-org#13575](matrix-org#13575))
- Comment about a better future where we can get the state diff between two events. ([\matrix-org#13586](matrix-org#13586))
- Instrument `_check_sigs_and_hash_and_fetch` to trace time spent in child concurrent calls for understandable traces in Jaeger. ([\matrix-org#13588](matrix-org#13588))
- Improve performance of `@cachedList`. ([\matrix-org#13591](matrix-org#13591))
- Minor speed up of fetching large numbers of push rules. ([\matrix-org#13592](matrix-org#13592))
- Optimise push action fetching queries. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13597](matrix-org#13597))
- Rename `event_map` to `unpersisted_events` when computing the auth differences. ([\matrix-org#13603](matrix-org#13603))
- Refactor `get_users_in_room(room_id)` mis-use with dedicated `get_current_hosts_in_room(room_id)` function. ([\matrix-org#13605](matrix-org#13605))
- Use dedicated `get_local_users_in_room(room_id)` function to find local users when calculating `join_authorised_via_users_server` of a `/make_join` request. ([\matrix-org#13606](matrix-org#13606))
- Refactor `get_users_in_room(room_id)` mis-use to lookup single local user with dedicated `check_local_user_in_room(...)` function. ([\matrix-org#13608](matrix-org#13608))
- Drop unused column `application_services_state.last_txn`. ([\matrix-org#13627](matrix-org#13627))
- Improve readability of Complement CI logs by printing failure results last. ([\matrix-org#13639](matrix-org#13639))
- Generalise the `@cancellable` annotation so it can be used on functions other than just servlet methods. ([\matrix-org#13662](matrix-org#13662))
- Introduce a `CommonUsageMetrics` class to share some usage metrics between the Prometheus exporter and the phone home stats. ([\matrix-org#13671](matrix-org#13671))
- Add some logging to help track down matrix-org#13444. ([\matrix-org#13679](matrix-org#13679))
- Update poetry lock file for v1.2.0. ([\matrix-org#13689](matrix-org#13689))
- Add cache to `is_partial_state_room`. ([\matrix-org#13693](matrix-org#13693))
- Update the Grafana dashboard that is included with Synapse in the `contrib` directory. ([\matrix-org#13697](matrix-org#13697))
- Only run trial CI on all python versions on non-PRs. ([\matrix-org#13698](matrix-org#13698))
- Fix typechecking with latest types-jsonschema. ([\matrix-org#13712](matrix-org#13712))
- Reduce number of CI checks we run for PRs. ([\matrix-org#13713](matrix-org#13713))

# -----BEGIN PGP SIGNATURE-----
#
# iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmMgR2QQHGVyaWtAbWF0
# cml4Lm9yZwAKCRClQuTtGw+sCfG7B/94PwW1ChsaI8hkz/3e+93PEl/mNJ6YFaEB
# 5pP4Dh/0dipP/iKbpgNuj5xz/JFnIi8D49A8sKNnku3jk0/8AZHgqDiBgOkrN76z
# Y3awo5Q9ag4xww/105V3bhdnX1NrX8Avf6F2jchDv6/9q8wQHGBPg6DMgfZ/m/BL
# SB4dypbbNpgLykuwtWxx6YMUYH+trsXJOn/MoAqld3QcZsqkDR25wXCt9+Dr+6AT
# dPd/czi8kV8ruU59tf2K5HB7XKzBW9S3Qb3dJJmGOTTJ7ccUkN/XuTwqnII950Mo
# bSlMXjY2hqk8rKUNhGZpi9bqUkwNhMgOkZl9A0Y1XtsXx6yjy0T/
# =zSGi
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue Sep 13 10:03:32 2022 BST
# gpg:                using RSA key 053191DFF4670330465227F7A542E4ED1B0FAC09
# gpg:                issuer "erik@matrix.org"
# gpg: Can't check signature: No public key

# Conflicts:
#	synapse/config/experimental.py
#	synapse/push/bulk_push_rule_evaluator.py
#	synapse/storage/databases/main/event_push_actions.py
#	synapse/util/caches/descriptors.py
MadLittleMods added a commit that referenced this pull request Sep 29, 2022
Spawning from looking into `get_users_in_room` while investigating
#13942 (comment).

See #13575 (comment)
for the original exploration around finding `get_users_in_room` mis-uses.
erikjohnston added a commit that referenced this pull request Sep 30, 2022
Fixes #13942. Introduced in #13575.

Basically, let's only get the ordered set of hosts out of the DB if we need an ordered set of hosts. Since we split the function up the caching won't be as good, but I think it will still be fine as e.g. multiple backfill requests for the same room will hit the cache.
MadLittleMods added a commit that referenced this pull request Oct 1, 2022
…de` (#13960)

Spawning from looking into `get_users_in_room` while investigating #13942 (comment).

See #13575 (comment) for the original exploration around finding `get_users_in_room` mis-uses.

Related to the following PRs where we also cleaned up some `get_users_in_room` mis-uses:

 - #13605
 - #13608
 - #13606
 - #13958
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) A-Performance Performance, both client-facing and admin-facing T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize how we calculate likely_domains during backfill
3 participants