
Viewing room list in a Space / Searching for rooms in a Space is very slow #11694

Open
reivilibre opened this issue Jan 6, 2022 · 7 comments
Labels
A-Spaces (Hierarchical organization of rooms), O-Frequent (Affects or can be seen by most users regularly or impacts most users' first experience), S-Major (Major functionality / product severely impaired, no satisfactory workaround), T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues)

Comments

@reivilibre
Contributor

I still need to find out what's going on here, or even what the related requests are, but I can't see an existing issue about it, and this is surely something we ought to resolve if we want people to enjoy using Spaces.

Description

The 'Explore Space' screen in Element is very slow, even if typing in a search term.

Steps to reproduce

  • go to a (decently sized?) space: my example is Element's space
    • I suspect this issue may not be visible from the matrix.org homeserver, since I expect someone would have said something by now!
  • open the 'Explore rooms' view in Element
  • type in a keyword
  • watch the spinner for 2 minutes (in my case; it probably varies depending on how far down the tree the room is)
  • a result appears (the spinner is still going, so it's probably not a complete result set; however, this was the room I wanted in my situation, so I could finish here)

I would expect searching to be much faster than that; let's say 10 seconds or less, to be generous, given that it may be making a few round trips to remote homeservers.

Version information

  • Homeserver: librepush.net
  • Version: 1.49.0, but this issue has been here for as long as I can remember the feature existing
  • Install method: pip from PyPI
  • Platform: amd64 VPS
@reivilibre reivilibre added the X-Needs-Info This issue is blocked awaiting information from the reporter label Jan 6, 2022
@clokep
Contributor

clokep commented Jan 6, 2022

I would expect searching to be much faster than that; let's say 10 seconds or less, to be generous, given that it may be making a few round trips to remote homeservers.

Searching is done client-side, so your client is paginating through the entire space to find all results. I don't know whether results are shown as they get matched or only at the end, though.

See also #10523 for performance of this API.

@clokep clokep added A-Spaces Hierarchical organization of rooms T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jan 14, 2022
@reivilibre reivilibre self-assigned this Feb 23, 2022
@reivilibre reivilibre removed the X-Needs-Info This issue is blocked awaiting information from the reporter label Feb 23, 2022
@reivilibre
Contributor Author

I got some time to look into this... Here are some notes

Client-side view of what's going on

The slow request seems to be this endpoint (with some example parameters):

GET https://matrix.librepush.net/_matrix/client/unstable/org.matrix.msc2946/rooms/!OJBlkJuUrsKnqtNnTi%3Amatrix.org/hierarchy?suggested_only=false&from=GiSyKxltKfZBfrhSVEXQVjUb&limit=20

Waiting: 9.43 sec.

Each response only includes a few results, so scrolling through the list is quite slow and frustrating: you get a drip of a few new rooms every 10 seconds. It's not great, but at least it feels less dead.
Search is performed client-side as a filter, issuing back-to-back requests as needed, which really exacerbates the problem. This would probably be quite poor even if the requests weren't so slow: it's the equivalent of a table scan over the space, but with high latency and tiny (20-room) blocks...!
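To make the shape of the problem concrete, here is a minimal sketch of what the client is effectively doing (this is not Element's actual code; the homeserver URL, access token and space ID are placeholders based on the example above): it pages through the whole space via the paginated hierarchy endpoint and filters by room name locally, so every ~20-room page costs another full round trip.

```python
# Illustrative sketch only, not Element's implementation.
import requests

HOMESERVER = "https://matrix.librepush.net"   # placeholder
ACCESS_TOKEN = "..."                          # placeholder
SPACE_ID = "!OJBlkJuUrsKnqtNnTi:matrix.org"   # the space being explored

def search_space(keyword: str):
    """Walk the whole space hierarchy, one small page at a time, filtering locally."""
    url = (f"{HOMESERVER}/_matrix/client/unstable/org.matrix.msc2946"
           f"/rooms/{requests.utils.quote(SPACE_ID)}/hierarchy")
    params = {"suggested_only": "false", "limit": 20}
    while True:
        resp = requests.get(url, params=params,
                            headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
        resp.raise_for_status()
        body = resp.json()
        # The filter happens here, client-side, on each ~20-room page.
        for room in body.get("rooms", []):
            if keyword.lower() in (room.get("name") or "").lower():
                yield room
        next_batch = body.get("next_batch")
        if not next_batch:
            break                     # walked the entire space
        params["from"] = next_batch   # otherwise: another slow round trip
```

With ~10-second pages, finding a room deep in the tree can easily take minutes, which matches the behaviour described above.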

Synapse-side view of what's going on

Let's have a look in Jaeger (N.B. This is not the exact same request as the one mentioned in the first section, but it feels fairly reproducible so it probably serves well as a guide).

First off I notice that it says '40 Errors'.

It seems to start off with 2 fairly slow requests to t2l.io:

That adds up to 2.5 sec straight away.

Then there are many small database queries, followed by a 120 ms request to vector.modular.im:

Now I start noticing all the errors that the summary talked about...

Looks like it's trying loads of different hosts (from t=3.06s to t=10.61s) before finally getting to an answer (t2l.io, which takes 3.8s to respond)*:

It also takes a wee while in get_current_state_ids (250 ms) at the end.

Then we get our 20 rooms and start the cycle again :D with the next client-server request.

Logs from matrix.org for one of these requests:

```
2022-02-23 16:24:25,834 - synapse.handlers.room_summary - 926 - INFO - GET-1763521- - room !uUTpgiQEFiCNjzmgEe:matrix.org is unpeekable and requester librepush.net is not a member / not allowed to join, omitting from summary
2022-02-23 16:24:25,834 - synapse.http.server - 95 - INFO - GET-1763521 - <XForwardedForRequest at 0x7f3fcfa44658 method='GET' uri='/_matrix/federation/v1/hierarchy/%21uUTpgiQEFiCNjzmgEe%3Amatrix.org?suggested_only=false' clientproto='HTTP/1.1' site='16102'> SynapseError: 404 - Unknown room: !uUTpgiQEFiCNjzmgEe:matrix.org
2022-02-23 16:24:25,836 - synapse.access.http.16102 - 448 - INFO - GET-1763521 - 2a02:c205:2022:1137::1 - 16102 - {librepush.net} Processed request: 0.010sec/0.001sec (0.001sec, 0.001sec) (0.001sec/0.005sec/3) 78B 404 "GET /_matrix/federation/v1/hierarchy/%21uUTpgiQEFiCNjzmgEe%3Amatrix.org?suggested_only=false HTTP/1.1" "Synapse/1.53.0" [0 dbevts]
```

It might be intended for us to get a 404 in this case (I guess?), but it doesn't help that we seem to try the fallback unstable endpoint after being served that 404!

*I'm not convinced this is exactly right. I think it tries servers for each room it wants, then moves on to the next, but I haven't studied the code.
In any case, sequentially trying many servers and many rooms seems to be the main bottleneck here.

Some solution ideas:

  • stop requesting from the unstable endpoint, or be smarter about falling back to it (if we know the server supports the stable version, don't fall back?); see the sketch after this list
    • should we even be falling back on a 404? If we have no choice, it'd be good to learn from this mistake and make future proposals able to distinguish between an error and the feature not being supported.
  • race servers against each other (otherwise we're always vulnerable to picking a poor server), or adaptively change preferred servers
    • I don't know how unreasonable the response times were from the other servers in this situation.
  • can we combine these requests into one somehow? It feels a pity that we need to make so many round trips to the same servers.
    • alternatively, I wonder whether HTTP request pipelining is something we could be doing?
  • read-ahead: have the homeserver predictively 'read' future blocks, so that they're ready to be served to the client with much lower latency
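As a rough illustration of the first idea, here is a minimal sketch, assuming hypothetical helpers for the stable and unstable federation requests (this is not Synapse's actual code): only fall back to the unstable endpoint when the 404 explicitly signals an unimplemented endpoint (errcode M_UNRECOGNIZED), rather than on a semantic 404 such as 'Unknown room'.

```python
# Sketch only: fetch_stable / fetch_unstable are hypothetical stand-ins for the
# stable and unstable federation /hierarchy requests; HttpError is likewise a
# stand-in for whatever error the HTTP client raises.
from typing import Awaitable, Callable, Optional


class HttpError(Exception):
    def __init__(self, status: int, errcode: Optional[str] = None):
        super().__init__(f"{status} {errcode}")
        self.status = status
        self.errcode = errcode


async def get_remote_hierarchy(
    destination: str,
    room_id: str,
    fetch_stable: Callable[[str, str], Awaitable[dict]],
    fetch_unstable: Callable[[str, str], Awaitable[dict]],
) -> dict:
    try:
        return await fetch_stable(destination, room_id)
    except HttpError as e:
        # Only treat a 404 as "endpoint not implemented" when the server says so
        # explicitly (M_UNRECOGNIZED). A 404 like "Unknown room" is a real answer,
        # so retrying it on the unstable endpoint just wastes a round trip.
        if e.status == 404 and e.errcode == "M_UNRECOGNIZED":
            return await fetch_unstable(destination, room_id)
        raise
```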

@reivilibre reivilibre removed their assignment Feb 23, 2022
@clokep
Contributor

clokep commented Feb 23, 2022

This seems to be exacerbated by a few things:

  1. The fallback from /hierarchy -> unstable /hierarchy -> /spaces can be slow if you try all three (and still don't get an answer).
  2. The above, but made worse by matrix-org/matrix-doc#1492; maybe we're also treating unknown data as an unknown endpoint (so we try fallbacks which we don't need to).
  3. Search over /hierarchy is client-side (this is just a thing we haven't gotten to yet, FTR).

We could also be a bit more aggressive about caching results over federation; right now they're only cached for a few minutes.
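On that last point, here is a minimal illustrative TTL cache for federation hierarchy responses (plain Python, not Synapse's actual caching code; the 30-minute TTL is just an arbitrary example of "more aggressive" than the current few minutes):

```python
# Illustrative only: a generic per-(destination, room) TTL cache, not Synapse's
# real cache implementation. The TTL value is an arbitrary example.
import time
from typing import Any, Dict, Optional, Tuple


class HierarchyCache:
    def __init__(self, ttl_seconds: float = 30 * 60):
        self._ttl = ttl_seconds
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, destination: str, room_id: str) -> Optional[Any]:
        entry = self._entries.get((destination, room_id))
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self._ttl:
            del self._entries[(destination, room_id)]  # expired
            return None
        return value

    def set(self, destination: str, room_id: str, value: Any) -> None:
        self._entries[(destination, room_id)] = (time.monotonic(), value)
```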

@clokep
Contributor

clokep commented Feb 24, 2022

Note that #12073 should help with the first part (as there's no longer a /spaces endpoint to try).

I think the clokep/hierarchy-404s branch might help a bit with the 404s issue, but I'm not sure whether that's the problem.

@reivilibre
Contributor Author

@clokep kindly slipped me a sneaky branch to try (the one above), and empirically it seems to have brought most request times down from 10s to 3-4s, with some outliers. :)

@reivilibre
Contributor Author

I noticed when looking at the code that if we're in a space/room, we use our local copy and recurse into its children. If we're not in a space/room, we request that room from a remote homeserver, and the response includes its entire subtree of spaces.

A knock-on effect is that joining a space or subspace makes resolving the hierarchy slower, because we then send one HTTP request per child rather than just requesting the parent and having the children included in the response.

The code already seems well written and set up to cache remote results, so perhaps a simple(-seeming) way to make things faster would be to request the root space (or any parent space for which we need to look up two or three or more of its children remotely) from a remote homeserver. In the happy case, this pre-caches the children and prevents us from sending one request per child. If the remote doesn't answer properly, it's only a cache, and we can fall back to making individual requests as before. A rough sketch of this idea is below.
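Here is a minimal sketch of that pre-caching idea, assuming hypothetical helpers (count_remote_children, fetch_remote_hierarchy) and a simple cache object; it is not the actual Synapse room summary handler, and the response shape used is only a rough approximation of the federation hierarchy API.

```python
# Sketch only: the helpers and the cache are hypothetical stand-ins, and the
# threshold is an arbitrary example.
PREFETCH_THRESHOLD = 2  # prefetch the parent if >= 2 of its children are remote-only


async def maybe_prefetch_children(
    parent_room_id: str,
    destination: str,
    cache,                   # e.g. something with a set(room_id, summary) method
    count_remote_children,   # hypothetical: how many children we'd have to fetch remotely
    fetch_remote_hierarchy,  # hypothetical: federation /hierarchy request for one room
) -> None:
    """Fetch the parent space from the remote once, so its children land in the
    cache and we avoid sending one federation request per child."""
    if count_remote_children(parent_room_id) < PREFETCH_THRESHOLD:
        return
    try:
        response = await fetch_remote_hierarchy(destination, parent_room_id)
    except Exception:
        return  # it's only a cache warm-up; per-child requests still work as before
    for child in response.get("children", []):
        cache.set(child["room_id"], child)
```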

@MadLittleMods MadLittleMods added S-Major Major functionality / product severely impaired, no satisfactory workaround. O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience labels Nov 29, 2022
@dkasak
Member

dkasak commented Dec 16, 2022

Search over /hierarchy is client-side (this is just a thing we haven't gotten to yet, FTR).

I would just like to add that, all things being equal, we should prefer exploring solutions which don't rely on server-side search, so that we don't dig ourselves into a hole that prevents us from encrypting room state in the future.
