
members: Simplify disambiguation logic when loading member list #3184

Merged
13 commits merged into main from timo/disambiguation on Mar 12, 2024

Conversation

timokoesters (Contributor)

When all room members are loaded, we do not need an incremental member update. We know that parsing the /members response can only lead to more ambiguous names, not fewer. And because /members returns the complete list, we can use that list directly as the disambiguation map.

This improves the performance in my emulator from 56s to 9s, and on a less performant device from 11min to 11s (tested experimentally on Matrix HQ using log statements in Element Android). If I have time, I will write a proper benchmark tomorrow.

  • Public API changes documented in changelogs (optional)
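The core idea of the PR can be sketched as follows. This is a minimal, self-contained illustration (not the SDK's actual code): when the full member list is available, the disambiguation map can be rebuilt from scratch in one pass instead of diffing against the members already in the store. `build_ambiguity_map` and the `(user_id, display_name)` tuple representation are hypothetical stand-ins for the SDK's real types.

```rust
use std::collections::HashMap;

/// Build a display-name -> occurrence-count map from the *complete*
/// member list returned by `/members`. A name needs disambiguation
/// exactly when more than one member uses it.
fn build_ambiguity_map(members: &[(String, String)]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for (_user_id, display_name) in members {
        *counts.entry(display_name.clone()).or_insert(0) += 1;
    }
    counts
}

fn is_ambiguous(map: &HashMap<String, usize>, display_name: &str) -> bool {
    map.get(display_name).copied().unwrap_or(0) > 1
}

fn main() {
    let members = vec![
        ("@alice:example.org".to_owned(), "Alice".to_owned()),
        ("@alice2:example.org".to_owned(), "Alice".to_owned()),
        ("@bob:example.org".to_owned(), "Bob".to_owned()),
    ];
    let map = build_ambiguity_map(&members);
    assert!(is_ambiguous(&map, "Alice"));
    assert!(!is_ambiguous(&map, "Bob"));
    println!("Alice ambiguous: {}", is_ambiguous(&map, "Alice"));
}
```

Because the response is complete, no per-member store lookup or event deserialization is needed, which is where the speedup comes from.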

@timokoesters timokoesters requested a review from a team as a code owner March 4, 2024 16:09
@timokoesters timokoesters requested review from poljar and removed request for a team March 4, 2024 16:09
@bnjbvr (Member) left a comment

Is this actually correct? It seems that v3::get_members::Request can take filters, so clearing the full map when handling the response would be incorrect in that case.

@timokoesters (Contributor, Author)

Good point. It works with the current version because no filters are used. We should probably add a comment there to warn against changing that. Alternatively, we can complicate the logic a bit: load all previous members, combine the two lists, and use the combined one.

@bnjbvr (Member)

bnjbvr commented Mar 4, 2024

We should probably add a comment there to warn against changing it.

This sounds like a future footgun, so I'd rather not do that. At most, we could pass a parameter to receive_members indicating that the outgoing request didn't use any filter or "at" parameter (i.e. it's not using pagination).

Or alternatively we can complicate the logic a bit and load all previous members, combine those lists and use that one.

If we never get rid of previous members, this solution sounds nice!

@timokoesters (Contributor, Author)

Just to clarify: the filters just allow you to include/exclude memberships (e.g. to only see banned users).
And the "at" parameter does not paginate the /members response; instead, it lets you load the full list of members at a specific point in the past.
I think these filters are not necessary for displaying the member list.

@bnjbvr (Member)

bnjbvr commented Mar 5, 2024

You're right, but we can't have a receive_members function that works only for the specific use case of Element X :-) We need to cater for all uses of this endpoint, which implies supporting all the possible parameters. We can still optimize it for specific use cases, though, hence my previous suggestion :-)


codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 88.37209%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 83.84%. Comparing base (cb6b420) to head (0a19e6b).
Report is 9 commits behind head on main.

Files Patch % Lines
crates/matrix-sdk-base/src/client.rs 86.48% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3184      +/-   ##
==========================================
- Coverage   83.85%   83.84%   -0.01%     
==========================================
  Files         232      232              
  Lines       24004    24010       +6     
==========================================
+ Hits        20129    20132       +3     
- Misses       3875     3878       +3     


@timokoesters (Contributor, Author)

timokoesters commented Mar 5, 2024

cargo bench members gives this improvement over main:

Keys querying/memory store/100000 members

before:
                        time:   [488.39 ms 490.88 ms 493.58 ms]
                        thrpt:  [202.60 Kelem/s 203.72 Kelem/s 204.76 Kelem/s]

after:
                        time:   [386.51 ms 389.36 ms 392.45 ms]
                        thrpt:  [254.81 Kelem/s 256.83 Kelem/s 258.72 Kelem/s]
change:
                        time:   [-21.435% -20.680% -19.973%] (p = 0.00 < 0.05)
                        thrpt:  [+24.959% +26.072% +27.283%]
                        Performance has improved.

The performance difference is probably much more noticeable on less powerful hardware or under slightly different testing conditions. At least the actual app runs much better; contact @jmartinesp for more info.

I had to copy the synced_client method from a test because I could not import it in the benchmark code.
Also note that I've just added the benchmark to a random file; it should be moved to its own file.

@jmartinesp (Contributor)

@timokoesters thanks again for your work on this! Should I continue the work in this PR?

@bnjbvr just to confirm, would we want a version of the code where:

  • When we load all members, skip loading the room member events from the store and deserializing them, as an optimization.
  • Use the current diffing method in all other cases (membership filter, "at" parameter used, etc.).

I did some experiments by batch loading room membership events in a single query and while it was a bit faster than the current unoptimized version, the difference wasn't that big and it might be explained by me missing some special case. I believe what consumes so much time is the deserialization of these events.

Maybe we'll need yet another persisted cache for display name in room -> user ids in the long term.
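The cache mentioned above could take roughly the following shape. This is purely a hypothetical sketch of the idea (none of these types exist in the SDK): a map keyed by (room id, display name) whose values are the sets of user ids sharing that name, which would make ambiguity checks cheap without deserializing member events.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical persisted cache: (room id, display name) -> user ids.
#[derive(Default)]
struct DisplayNameCache {
    map: HashMap<(String, String), BTreeSet<String>>,
}

impl DisplayNameCache {
    fn insert(&mut self, room: &str, name: &str, user: &str) {
        self.map
            .entry((room.to_owned(), name.to_owned()))
            .or_default()
            .insert(user.to_owned());
    }

    /// A name needs disambiguation when more than one user shares it.
    fn is_ambiguous(&self, room: &str, name: &str) -> bool {
        self.map
            .get(&(room.to_owned(), name.to_owned()))
            .map_or(false, |users| users.len() > 1)
    }
}

fn main() {
    let mut cache = DisplayNameCache::default();
    cache.insert("!room:example.org", "Alice", "@a1:example.org");
    cache.insert("!room:example.org", "Alice", "@a2:example.org");
    assert!(cache.is_ambiguous("!room:example.org", "Alice"));
    println!("cache works");
}
```

The in-memory map is only an illustration; a persisted version would need to live in the state store and be updated on every membership or profile change.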

@timokoesters (Contributor, Author)

timokoesters commented Mar 6, 2024

I think the diffing algorithm is not necessary, because I renamed the function to be explicitly for loading all members; see bc6315c. (I discussed this with @bnjbvr.)

But yes, I'd be happy if you continue this PR.

@jmartinesp (Contributor)

I think the diffing algorithm is not necessary because I renamed the function to be explicitly for loading all members

Yes, but I think @bnjbvr would rather have the option to parse any response and apply the optimised/unoptimised disambiguation algorithm depending on the request parameters used, given his reply above.

@bnjbvr (Member) left a comment

Thanks for the change and for writing the benchmark! Many comments about the benchmark below; the new processing of the members response looks fine to me 👍

/// Mount a Mock on the given server to handle the `GET /sync` endpoint with
/// an optional `since` param that returns a 200 status code with the given
/// response body.
async fn mock_sync(
bnjbvr (Member):

We shouldn't need to mock an endpoint for the benchmark.

Comment on lines 116 to 118
let room = client.get_room(room_id).unwrap();
room.mark_members_missing();
room.sync_members().await.unwrap();
bnjbvr (Member):

This actually sends the request and does a few checks before measuring what we're really interested in, namely BaseClient::receive_members. How about we create a synthetic ruma::api::membership::get_member_events::v3::Response and feed it into BaseClient::receive_members directly here? We'd be much closer to what we actually want to measure.
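The benchmarking idea suggested above can be illustrated with a dependency-free sketch: generate the synthetic input once, then time only the processing step. `synthetic_members` and the counting loop are stand-ins for building a real ruma response and calling BaseClient::receive_members; the actual benchmark would use criterion rather than a hand-rolled timer.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Stand-in for the synthetic /members response: (user id, display name)
// pairs, with names deliberately repeated so disambiguation has work to do.
fn synthetic_members(n: usize) -> Vec<(String, String)> {
    (0..n)
        .map(|i| (format!("@user{i}:example.org"), format!("User {}", i % 100)))
        .collect()
}

fn main() {
    // Build the input *outside* the timed region, mirroring the advice
    // above: measure only the response processing, not request setup.
    let members = synthetic_members(100_000);

    let start = Instant::now();
    // Stand-in for the processing under test: one pass over the full list.
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for (_user_id, name) in &members {
        *counts.entry(name.as_str()).or_insert(0) += 1;
    }
    let elapsed = start.elapsed();

    assert_eq!(counts.len(), 100);
    println!("processed {} members in {elapsed:?}", members.len());
}
```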

@@ -301,6 +398,6 @@ fn criterion() -> Criterion {
criterion_group! {
name = benches;
config = criterion();
targets = keys_query, keys_claiming, room_key_sharing, devices_missing_sessions_collecting,
targets = load_members_benchmark, keys_query, keys_claiming, room_key_sharing, devices_missing_sessions_collecting,
bnjbvr (Member):

As noted, this benchmark should go into a different file. Could be its own file, maybe room.rs or something like that.

@@ -31,7 +31,7 @@ const NUM_JOINED_ROOMS: usize = 10000;
const NUM_STRIPPED_JOINED_ROOMS: usize = 10000;

pub fn restore_session(c: &mut Criterion) {
let runtime = Builder::new_multi_thread().build().expect("Can't create runtime");
let runtime = Builder::new_multi_thread().enable_all().build().expect("Can't create runtime");
bnjbvr (Member):

Not sure why this has been changed.

Comment on lines 60 to 62
/// An error caused by calling matrix-rust-sdk functions with invalid parameters
#[error("matrix-rust-sdk function was called with invalid parameters")]
ApiMisuse,
bnjbvr (Member):

We'd need to be much more precise than that, because there's absolutely no way to understand what went wrong when looking at a rageshake that contains this error. I'd be fine with renaming this error to InvalidReceiveMembersParameters.

@@ -7,7 +7,7 @@ use ruma::{api::MatrixVersion, device_id, user_id};
use url::Url;

use crate::{
config::RequestConfig,
config::{RequestConfig, SyncSettings},
bnjbvr (Member):

This is likely unused?

serde_json = { workspace = true }
tempfile = "3.3.0"
tokio = { version = "1.24.2", default-features = false, features = ["rt-multi-thread"] }
wiremock = "0.5.21"
bnjbvr (Member):

(This should be reverted once we stop mocking endpoints for the benchmark.)

@bnjbvr (Member)

bnjbvr commented Mar 7, 2024

I think the diffing algorithm is not necessary because I renamed the function to be explicitly for loading all members

Yes, but I think @bnjbvr would rather have the option to parse any response and apply the optimised/unoptimised disambiguation algorithm depending on the request parameters used, given his reply above.

Clarified in private chat: I'm fine with the current approach after all, because this receive_all_members method is only used to handle responses to members requests that had no filters and so on. And we've added checks in the code to make sure it is used only like that; if we changed our mind later, we'd run into those errors. To make it more general, we'd need to make it public and add new APIs that give access to processing this response, which is a bunch of work we don't need right now (but could do, or accept as PRs, in the future).

Comment on lines 1113 to 1123
if request.membership.is_some() || request.not_membership.is_some() || request.at.is_some()
{
return Err(Error::ApiMisuse);
}
bnjbvr (Member):

Let's also add a comment why we have this restriction, and maybe link to this PR so future us can look at the previous state, if we wanted to add the new public APIs later.
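A hypothetical version of the guard with the requested comment might look like this. `MembersRequest` is a stand-in struct (the real code checks ruma's v3 request); only the three field names are taken from the diff above, everything else is illustrative.

```rust
// Stand-in for ruma's get_member_events::v3::Request; field names match
// those checked in the diff above.
struct MembersRequest {
    membership: Option<String>,
    not_membership: Option<String>,
    at: Option<String>,
}

fn check_unfiltered(request: &MembersRequest) -> Result<(), &'static str> {
    // `receive_all_members` wipes and rebuilds the disambiguation map from
    // the response, which is only sound if the response is the complete,
    // current member list. A membership filter or an `at` parameter would
    // make the response partial or historical, so such requests are
    // rejected. See PR #3184 for the previous, more general diffing logic.
    if request.membership.is_some() || request.not_membership.is_some() || request.at.is_some() {
        return Err("receive_all_members called with a filtered /members request");
    }
    Ok(())
}

fn main() {
    let unfiltered = MembersRequest { membership: None, not_membership: None, at: None };
    assert!(check_unfiltered(&unfiltered).is_ok());

    let filtered = MembersRequest { membership: Some("ban".to_owned()), not_membership: None, at: None };
    assert!(check_unfiltered(&filtered).is_err());
    println!("guard works");
}
```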

@bnjbvr bnjbvr removed the request for review from poljar March 7, 2024 13:10
@jmartinesp (Contributor)

I think the latest changes should fix the review comments. I tried to make the benchmark do the bare minimum, but I'm not sure whether anything else can be improved.

@bnjbvr bnjbvr self-requested a review March 7, 2024 16:57
@bnjbvr (Member) left a comment

LGTM, I'd be curious to see a run of the new benchmark before/after, to see if the trend is similar to what we had before 😇

Also, question for @poljar: I just saw this comment in receive_all_members:

// However, this makes a new problem occur where setting the member events here
// potentially races with the sync.

I think it's still an issue here: the response could be processed while a sync is going on, so the processing of both could become entangled, leading to confusing (but temporary) discrepancies (e.g. an old profile name being shown for a while, until the next sync_members). Is that acceptable? Or do we take the sync lock during processing of that response (or another new lock)?
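The locking option raised above amounts to serializing the two writers behind one lock. A minimal std-only illustration (the SDK itself is async and would use an async lock; nothing here reflects its actual internals):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared room state, standing in for the member list / display names.
    let state = Arc::new(Mutex::new(Vec::<String>::new()));

    // Writer 1: stand-in for sync response processing.
    let sync_state = Arc::clone(&state);
    let sync = thread::spawn(move || {
        let mut guard = sync_state.lock().unwrap();
        guard.push("sync: applied incremental member update".to_owned());
    });

    // Writer 2: stand-in for processing the /members response.
    let members_state = Arc::clone(&state);
    let members = thread::spawn(move || {
        let mut guard = members_state.lock().unwrap();
        guard.push("receive_all_members: rebuilt member list".to_owned());
    });

    sync.join().unwrap();
    members.join().unwrap();

    // Holding the same lock in both writers means each update is applied
    // atomically with respect to the other; order may still vary, which is
    // the "temporary discrepancy" question raised above.
    assert_eq!(state.lock().unwrap().len(), 2);
    println!("both updates applied");
}
```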

@jmartinesp (Contributor)

LGTM, I'd be curious to see a run of the new benchmark before/after, to see if the trend is similar to what we had before

Actually, they're quite similar, because there aren't any profiles or display names in the DB, so their query and deserialization time is almost zero...

I can try to add those to the store too in the benchmark.

@bnjbvr (Member)

bnjbvr commented Mar 8, 2024

I can try to add those to the store too in the benchmark.

Thanks for proposing to do this; if you think it's meaningful enough and you have a bit of time, let's do it. Otherwise, I'm already happy we do have some benchmark :-)

@jmartinesp (Contributor)

I can try to add those to the store too in the benchmark.

Thanks for proposing to do this; if you think it's meaningful enough and you have a bit of time, let's do it. Otherwise, I'm already happy we do have some benchmark :-)

To be honest, I'm trying to, but I can't wrap my head around all the SyncStateEvent, Raw<SyncStateEvent>, etc. that we need to include in the StateChanges and insert into the store for this to work.

@jmartinesp (Contributor)

jmartinesp commented Mar 8, 2024

Before I forget about these changes, I'll post them here:

Original algorithm (compared to a previous run with less samples):

Benchmarking Test/receive_members/100000 members: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 527.9s, or reduce sample count to 10.
Test/receive_members/100000 members
                        time:   [11.743 s 12.093 s 12.507 s]
                        thrpt:  [7.9955 Kelem/s 8.2695 Kelem/s 8.5158 Kelem/s]
                 change:
                        time:   [-1.2228% +2.0233% +5.3398%] (p = 0.39 > 0.05)
                        thrpt:  [-5.0692% -1.9832% +1.2379%]
                        No change in performance detected.
Found 12 outliers among 50 measurements (24.00%)
  1 (2.00%) low mild
  4 (8.00%) high mild
  7 (14.00%) high severe

Optimised algorithm:

Benchmarking Test/receive_members/100000 members: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 83.2s, or reduce sample count to 10.
Test/receive_members/100000 members
                        time:   [1.6927 s 1.6955 s 1.6983 s]
                        thrpt:  [58.884 Kelem/s 58.980 Kelem/s 59.076 Kelem/s]
                 change:
                        time:   [-86.453% -85.979% -85.557%] (p = 0.00 < 0.05)
                        thrpt:  [+592.38% +613.21% +638.16%]
                        Performance has improved.

PS: thanks @bnjbvr for helping me get this working!

@bnjbvr (Member) left a comment

Yay, thanks for enhancing the benchmark and running it again! I'll address my final comment, and let's get it merged \o/

benchmarks/benches/room_bench.rs (review comment outdated, resolved)
@bnjbvr bnjbvr enabled auto-merge (squash) March 12, 2024 10:01
@bnjbvr bnjbvr merged commit 2f58cb1 into main Mar 12, 2024
34 checks passed
@bnjbvr bnjbvr deleted the timo/disambiguation branch March 12, 2024 10:15