
members: Simplify disambiguation logic when loading member list #3184

Merged
13 commits merged into main from timo/disambiguation on Mar 12, 2024

Conversation

timokoesters (Contributor)

When all room members are loaded, we do not need an incremental member update. We know that parsing the /members response can only lead to more ambiguous names, not fewer. And because /members returns the complete list, we can use that list directly as the disambiguation map.

This improves the performance in my emulator from 56s to 9s, and on a less performant device from 11min to 11s (tested experimentally on Matrix HQ using log statements in Element Android). If I have time, I will write a proper benchmark tomorrow.

  • Public API changes documented in changelogs (optional)
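The core idea of the PR can be sketched as follows. This is a minimal, self-contained illustration (not the SDK's actual code): when the full member list is available, the disambiguation map can be rebuilt from scratch in one pass instead of diffing against the members already in the store. `build_ambiguity_map` and the `(user_id, display_name)` tuple representation are hypothetical stand-ins for the SDK's real types.

```rust
use std::collections::HashMap;

/// Build a display-name -> occurrence-count map from the *complete*
/// member list returned by `/members`. A name needs disambiguation
/// exactly when more than one member uses it.
fn build_ambiguity_map(members: &[(String, String)]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for (_user_id, display_name) in members {
        *counts.entry(display_name.clone()).or_insert(0) += 1;
    }
    counts
}

fn is_ambiguous(map: &HashMap<String, usize>, display_name: &str) -> bool {
    map.get(display_name).copied().unwrap_or(0) > 1
}

fn main() {
    let members = vec![
        ("@alice:example.org".to_owned(), "Alice".to_owned()),
        ("@alice2:example.org".to_owned(), "Alice".to_owned()),
        ("@bob:example.org".to_owned(), "Bob".to_owned()),
    ];
    let map = build_ambiguity_map(&members);
    assert!(is_ambiguous(&map, "Alice"));
    assert!(!is_ambiguous(&map, "Bob"));
    println!("Alice ambiguous: {}", is_ambiguous(&map, "Alice"));
}
```

Because the response is complete, no per-member store lookup or event deserialization is needed, which is where the speedup comes from.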

@timokoesters timokoesters requested a review from a team as a code owner March 4, 2024 16:09
@timokoesters timokoesters requested review from poljar and removed request for a team March 4, 2024 16:09
@bnjbvr (Member) left a comment

Is this actually correct? It seems that v3::get_members::Request can take filters, so clearing the full map when handling the response would be incorrect in that case.

@timokoesters (Contributor, Author)

Good point. It works with the current version because no filters are used. We should probably add a comment there to warn against changing that. Alternatively, we can complicate the logic a bit: load all previous members, combine the two lists, and use the combined one.

@bnjbvr (Member)

bnjbvr commented Mar 4, 2024

We should probably add a comment there to warn against changing it.

This sounds like a future footgun, so I'd rather not do that. At most, we could pass a parameter to receive_members indicating that the outgoing request didn't use any filter or "at" parameter (i.e. it's not using pagination).

Or alternatively we can complicate the logic a bit and load all previous members, combine those lists and use that one.

If we never get rid of previous members, this solution sounds nice!

@timokoesters (Contributor, Author)

Just to clarify: the filters just allow you to include/exclude memberships (e.g. to only see banned users).
And the "at" parameter does not paginate the /members response; instead, it lets you load the full list of members at a specific point in the past.
I think these filters are not necessary for displaying the member list.

@bnjbvr (Member)

bnjbvr commented Mar 5, 2024

You're right, but we can't have a receive_members function that works only for the specific use case of Element X :-) We need to cater for all uses of this endpoint, which implies supporting all the possible parameters. We can still optimize it for specific use cases, though, hence my previous suggestion :-)


codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 88.37209%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 83.84%. Comparing base (cb6b420) to head (0a19e6b).
Report is 9 commits behind head on main.

Files Patch % Lines
crates/matrix-sdk-base/src/client.rs 86.48% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3184      +/-   ##
==========================================
- Coverage   83.85%   83.84%   -0.01%     
==========================================
  Files         232      232              
  Lines       24004    24010       +6     
==========================================
+ Hits        20129    20132       +3     
- Misses       3875     3878       +3     


@timokoesters (Contributor, Author)

timokoesters commented Mar 5, 2024

cargo bench members gives this improvement over main:

Keys querying/memory store/100000 members

before:
                        time:   [488.39 ms 490.88 ms 493.58 ms]
                        thrpt:  [202.60 Kelem/s 203.72 Kelem/s 204.76 Kelem/s]

after:
                        time:   [386.51 ms 389.36 ms 392.45 ms]
                        thrpt:  [254.81 Kelem/s 256.83 Kelem/s 258.72 Kelem/s]
change:
                        time:   [-21.435% -20.680% -19.973%] (p = 0.00 < 0.05)
                        thrpt:  [+24.959% +26.072% +27.283%]
                        Performance has improved.

The performance difference is probably much more noticeable on less powerful hardware or under slightly different testing conditions. At least the actual app runs much better; contact @jmartinesp for more info.

I had to copy the synced_client method from a test because I could not import it in the benchmark code.
Also note that I've just added the benchmark to a random file; it should be moved to its own file.

@jmartinesp (Contributor)

@timokoesters thanks again for your work on this! Should I continue the work in this PR?

@bnjbvr just to confirm, would we want a version of the code where:

  • When we load all members, skip loading the room member events from the store and deserializing them, as an optimization.
  • Use the current diffing method in all other cases (membership filter, "at" parameter used, etc.).

I did some experiments by batch loading room membership events in a single query and while it was a bit faster than the current unoptimized version, the difference wasn't that big and it might be explained by me missing some special case. I believe what consumes so much time is the deserialization of these events.

Maybe we'll need yet another persisted cache for display name in room -> user ids in the long term.
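The cache mentioned above could take roughly the following shape. This is purely a hypothetical sketch of the idea (none of these types exist in the SDK): a map keyed by (room id, display name) whose values are the sets of user ids sharing that name, which would make ambiguity checks cheap without deserializing member events.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical persisted cache: (room id, display name) -> user ids.
#[derive(Default)]
struct DisplayNameCache {
    map: HashMap<(String, String), BTreeSet<String>>,
}

impl DisplayNameCache {
    fn insert(&mut self, room: &str, name: &str, user: &str) {
        self.map
            .entry((room.to_owned(), name.to_owned()))
            .or_default()
            .insert(user.to_owned());
    }

    /// A name needs disambiguation when more than one user shares it.
    fn is_ambiguous(&self, room: &str, name: &str) -> bool {
        self.map
            .get(&(room.to_owned(), name.to_owned()))
            .map_or(false, |users| users.len() > 1)
    }
}

fn main() {
    let mut cache = DisplayNameCache::default();
    cache.insert("!room:example.org", "Alice", "@a1:example.org");
    cache.insert("!room:example.org", "Alice", "@a2:example.org");
    assert!(cache.is_ambiguous("!room:example.org", "Alice"));
    println!("cache works");
}
```

The in-memory map is only an illustration; a persisted version would need to live in the state store and be updated on every membership or profile change.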

@timokoesters (Contributor, Author)

timokoesters commented Mar 6, 2024

I think the diffing algorithm is not necessary, because I renamed the function to be explicitly for loading all members; see bc6315c. (I discussed this with @bnjbvr.)

But yes, I'd be happy if you continue this PR.

@jmartinesp (Contributor)

I think the diffing algorithm is not necessary because I renamed the function to be explicitly for loading all members

Yes, but I think @bnjbvr would rather have the option to parse any response and apply the optimised/unoptimised disambiguation algorithm depending on the request parameters used, given his reply above.

@bnjbvr (Member) left a comment

Thanks for the change and for writing the benchmark! Many comments about the benchmark below; the new processing of the members response looks fine to me 👍

/// Mount a Mock on the given server to handle the `GET /sync` endpoint with
/// an optional `since` param that returns a 200 status code with the given
/// response body.
async fn mock_sync(
bnjbvr (Member):

We shouldn't need to mock an endpoint for the benchmark.

Comment on lines 116 to 118
let room = client.get_room(room_id).unwrap();
room.mark_members_missing();
room.sync_members().await.unwrap();
bnjbvr (Member):

This actually sends the request and does a few checks before measuring what we're really interested in, namely BaseClient::receive_members. How about we create a synthetic ruma::api::membership::get_member_events::v3::Response and feed it into BaseClient::receive_members directly here? We'd be much closer to what we actually want to measure.
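The benchmarking idea suggested above can be illustrated with a dependency-free sketch: generate the synthetic input once, then time only the processing step. `synthetic_members` and the counting loop are stand-ins for building a real ruma response and calling BaseClient::receive_members; the actual benchmark would use criterion rather than a hand-rolled timer.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Stand-in for the synthetic /members response: (user id, display name)
// pairs, with names deliberately repeated so disambiguation has work to do.
fn synthetic_members(n: usize) -> Vec<(String, String)> {
    (0..n)
        .map(|i| (format!("@user{i}:example.org"), format!("User {}", i % 100)))
        .collect()
}

fn main() {
    // Build the input *outside* the timed region, mirroring the advice
    // above: measure only the response processing, not request setup.
    let members = synthetic_members(100_000);

    let start = Instant::now();
    // Stand-in for the processing under test: one pass over the full list.
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for (_user_id, name) in &members {
        *counts.entry(name.as_str()).or_insert(0) += 1;
    }
    let elapsed = start.elapsed();

    assert_eq!(counts.len(), 100);
    println!("processed {} members in {elapsed:?}", members.len());
}
```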

@@ -301,6 +398,6 @@ fn criterion() -> Criterion {
criterion_group! {
name = benches;
config = criterion();
targets = keys_query, keys_claiming, room_key_sharing, devices_missing_sessions_collecting,
targets = load_members_benchmark, keys_query, keys_claiming, room_key_sharing, devices_missing_sessions_collecting,
bnjbvr (Member):

As noted, this benchmark should go into a different file. Could be its own file, maybe room.rs or something like that.

@@ -31,7 +31,7 @@ const NUM_JOINED_ROOMS: usize = 10000;
const NUM_STRIPPED_JOINED_ROOMS: usize = 10000;

pub fn restore_session(c: &mut Criterion) {
let runtime = Builder::new_multi_thread().build().expect("Can't create runtime");
let runtime = Builder::new_multi_thread().enable_all().build().expect("Can't create runtime");
bnjbvr (Member):

Not sure why this has been changed.

Comment on lines 60 to 62
/// An error caused by calling matrix-rust-sdk functions with invalid parameters
#[error("matrix-rust-sdk function was called with invalid parameters")]
ApiMisuse,
bnjbvr (Member):

We'd need to be much more precise than that, because there's absolutely no way to understand what went wrong when looking at a rageshake that contains this error. I'd be fine with renaming this error to InvalidReceiveMembersParameters.

@@ -7,7 +7,7 @@ use ruma::{api::MatrixVersion, device_id, user_id};
use url::Url;

use crate::{
config::RequestConfig,
config::{RequestConfig, SyncSettings},
bnjbvr (Member):

This is likely unused?

serde_json = { workspace = true }
tempfile = "3.3.0"
tokio = { version = "1.24.2", default-features = false, features = ["rt-multi-thread"] }
wiremock = "0.5.21"
bnjbvr (Member):

(This should be reverted once we stop mocking endpoints for the benchmark.)

@bnjbvr (Member)

bnjbvr commented Mar 7, 2024

I think the diffing algorithm is not necessary because I renamed the function to be explicitly for loading all members

Yes, but I think @bnjbvr would rather have the option to parse any response and apply the optimised/unoptimised disambiguation algorithm depending on the request parameters used, given his reply above.

Clarified in private chat: I'm fine with the current approach after all, because this receive_all_members method is only used to handle responses to members requests that had no filters and so on. And we've added checks in the code to make sure it is used only like that; if we changed our mind later, we'd run into those errors. To make it more general, we'd need to make it public and add new APIs that give access to processing this response, which is a bunch of work we don't need right now (but could do, or accept as PRs, in the future).

Comment on lines 1113 to 1123
if request.membership.is_some() || request.not_membership.is_some() || request.at.is_some()
{
return Err(Error::ApiMisuse);
}
bnjbvr (Member):

Let's also add a comment why we have this restriction, and maybe link to this PR so future us can look at the previous state, if we wanted to add the new public APIs later.
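A hypothetical version of the guard with the requested comment might look like this. `MembersRequest` is a stand-in struct (the real code checks ruma's v3 request); only the three field names are taken from the diff above, everything else is illustrative.

```rust
// Stand-in for ruma's get_member_events::v3::Request; field names match
// those checked in the diff above.
struct MembersRequest {
    membership: Option<String>,
    not_membership: Option<String>,
    at: Option<String>,
}

fn check_unfiltered(request: &MembersRequest) -> Result<(), &'static str> {
    // `receive_all_members` wipes and rebuilds the disambiguation map from
    // the response, which is only sound if the response is the complete,
    // current member list. A membership filter or an `at` parameter would
    // make the response partial or historical, so such requests are
    // rejected. See PR #3184 for the previous, more general diffing logic.
    if request.membership.is_some() || request.not_membership.is_some() || request.at.is_some() {
        return Err("receive_all_members called with a filtered /members request");
    }
    Ok(())
}

fn main() {
    let unfiltered = MembersRequest { membership: None, not_membership: None, at: None };
    assert!(check_unfiltered(&unfiltered).is_ok());

    let filtered = MembersRequest { membership: Some("ban".to_owned()), not_membership: None, at: None };
    assert!(check_unfiltered(&filtered).is_err());
    println!("guard works");
}
```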

@bnjbvr bnjbvr removed the request for review from poljar March 7, 2024 13:10
@jmartinesp (Contributor)

I think the latest changes should fix the review comments. I tried to make the benchmark do the bare minimum, but I'm not sure whether anything else can be improved.

@bnjbvr bnjbvr self-requested a review March 7, 2024 16:57
@bnjbvr (Member) left a comment

LGTM, I'd be curious to see a run of the new benchmark before/after, to see if the trend is similar to what we had before 😇

Also, question for @poljar: I just saw this comment in receive_all_members:

// However, this makes a new problem occur where setting the member events here
// potentially races with the sync.

I think it's still an issue here: the response could be processed while a sync is going on, so the processing of both could become entangled, leading to confusing (but temporary) discrepancies (e.g. an old profile name being shown for a while, until the next sync_members). Is that acceptable? Or do we take the sync lock during processing of that response (or another new lock)?
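The locking option raised above amounts to serializing the two writers behind one lock. A minimal std-only illustration (the SDK itself is async and would use an async lock; nothing here reflects its actual internals):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared room state, standing in for the member list / display names.
    let state = Arc::new(Mutex::new(Vec::<String>::new()));

    // Writer 1: stand-in for sync response processing.
    let sync_state = Arc::clone(&state);
    let sync = thread::spawn(move || {
        let mut guard = sync_state.lock().unwrap();
        guard.push("sync: applied incremental member update".to_owned());
    });

    // Writer 2: stand-in for processing the /members response.
    let members_state = Arc::clone(&state);
    let members = thread::spawn(move || {
        let mut guard = members_state.lock().unwrap();
        guard.push("receive_all_members: rebuilt member list".to_owned());
    });

    sync.join().unwrap();
    members.join().unwrap();

    // Holding the same lock in both writers means each update is applied
    // atomically with respect to the other; order may still vary, which is
    // the "temporary discrepancy" question raised above.
    assert_eq!(state.lock().unwrap().len(), 2);
    println!("both updates applied");
}
```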

@jmartinesp (Contributor)

LGTM, I'd be curious to see a run of the new benchmark before/after, to see if the trend is similar to what we had before

Actually, they're quite similar, because there aren't any profiles or display names in the DB, so their query and deserialization time is almost zero...

I can try to add those to the store too in the benchmark.

@bnjbvr (Member)

bnjbvr commented Mar 8, 2024

I can try to add those to the store too in the benchmark.

Thanks for proposing to do this; if you think it's meaningful enough and you have a bit of time, let's do it. Otherwise, I'm already happy we do have some benchmark :-)

@jmartinesp (Contributor)

I can try to add those to the store too in the benchmark.

Thanks for proposing to do this; if you think it's meaningful enough and you have a bit of time, let's do it. Otherwise, I'm already happy we do have some benchmark :-)

To be honest, I'm trying to, but I can't wrap my head around all the SyncStateEvent, Raw<SyncStateEvent>, etc. that we need to include in the StateChanges and insert into the store for this to work.

@jmartinesp (Contributor)

jmartinesp commented Mar 8, 2024

Before I forget about these changes, I'll post them here:

Original algorithm (compared to a previous run with less samples):

Benchmarking Test/receive_members/100000 members: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 527.9s, or reduce sample count to 10.
Test/receive_members/100000 members
                        time:   [11.743 s 12.093 s 12.507 s]
                        thrpt:  [7.9955 Kelem/s 8.2695 Kelem/s 8.5158 Kelem/s]
                 change:
                        time:   [-1.2228% +2.0233% +5.3398%] (p = 0.39 > 0.05)
                        thrpt:  [-5.0692% -1.9832% +1.2379%]
                        No change in performance detected.
Found 12 outliers among 50 measurements (24.00%)
  1 (2.00%) low mild
  4 (8.00%) high mild
  7 (14.00%) high severe

Optimised algorithm:

Benchmarking Test/receive_members/100000 members: Warming up for 3.0000 s
Warning: Unable to complete 50 samples in 5.0s. You may wish to increase target time to 83.2s, or reduce sample count to 10.
Test/receive_members/100000 members
                        time:   [1.6927 s 1.6955 s 1.6983 s]
                        thrpt:  [58.884 Kelem/s 58.980 Kelem/s 59.076 Kelem/s]
                 change:
                        time:   [-86.453% -85.979% -85.557%] (p = 0.00 < 0.05)
                        thrpt:  [+592.38% +613.21% +638.16%]
                        Performance has improved.

PS: thanks @bnjbvr for helping me get this working!

@bnjbvr (Member) left a comment

Yay, thanks for enhancing the benchmark and running it again! I'll address my final comment, and let's get it merged \o/

benchmarks/benches/room_bench.rs (review comment outdated, resolved)
@bnjbvr bnjbvr enabled auto-merge (squash) March 12, 2024 10:01
@bnjbvr bnjbvr merged commit 2f58cb1 into main Mar 12, 2024
34 checks passed
@bnjbvr bnjbvr deleted the timo/disambiguation branch March 12, 2024 10:15