New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
members: Simplify disambiguation logic when loading member list #3184
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually correct? it seems that the v3::get_members::Request
can take filters in, so clearing the full map when handling the response would be incorrect in that case?
Good point, it works with the current version, because no filters are used. We should probably add a comment there to warn against changing it. Or alternatively we can complicate the logic a bit and load all previous members, combine those lists and use that one. |
This sounds like a bad future footgun, so I'd rather not do that. At the limit, we could pass a parameter to
If we never get rid of previous members, this solution sounds nice! |
Just to clarify, the filters just allow you to add/remove memberships (e.g. to only see banned users). |
You're right, and we can't only have a |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3184 +/- ##
==========================================
- Coverage 83.85% 83.84% -0.01%
==========================================
Files 232 232
Lines 24004 24010 +6
==========================================
+ Hits 20129 20132 +3
- Misses 3875 3878 +3 ☔ View full report in Codecov by Sentry. |
Keys querying/memory store/100000 members
The performance difference is probably much more noticeable on less powerful hardware or with slightly different testing conditions. At least the actual app runs much better, contact @jmartinesp for more info. I had to copy the synced_client method from a test because I could not import it in the benchmark code. |
@timokoesters thanks again for your work on this! Should I continue the work in this PR? @bnjbvr just to confirm, would we want a version of the code where:
I did some experiments by batch loading room membership events in a single query and while it was a bit faster than the current unoptimized version, the difference wasn't that big and it might be explained by me missing some special case. I believe what consumes so much time is the deserialization of these events. Maybe we'll need yet another persisted cache for display name in room -> user ids in the long term. |
Yes, but I think @bnjbvr would rather have the option to parse any response and apply the optimised/unoptimised disambiguation algorithm depending on the request parameters used, given his reply above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change + writing the benchmark! Many comments about the benchmark below, the new processing of the members response looks fine to me 👍
benchmarks/benches/crypto_bench.rs
Outdated
/// Mount a Mock on the given server to handle the `GET /sync` endpoint with | ||
/// an optional `since` param that returns a 200 status code with the given | ||
/// response body. | ||
async fn mock_sync( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need to mock an endpoint for the benchmark.
benchmarks/benches/crypto_bench.rs
Outdated
let room = client.get_room(room_id).unwrap(); | ||
room.mark_members_missing(); | ||
room.sync_members().await.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually sends the request and does a few check before measuring what we're really interested in measuring, that is, BaseClient::receive_members
. How about we create a synthetic ruma::api::membership::get_member_events::v3::Response
, and feed it into BaseClient::receive_members
directly here? We'd be much much closer to what we actually want to measure
benchmarks/benches/crypto_bench.rs
Outdated
@@ -301,6 +398,6 @@ fn criterion() -> Criterion { | |||
criterion_group! { | |||
name = benches; | |||
config = criterion(); | |||
targets = keys_query, keys_claiming, room_key_sharing, devices_missing_sessions_collecting, | |||
targets = load_members_benchmark, keys_query, keys_claiming, room_key_sharing, devices_missing_sessions_collecting, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted, this benchmark should go into a different file. Could be its own file, maybe room.rs
or something like that.
benchmarks/benches/store_bench.rs
Outdated
@@ -31,7 +31,7 @@ const NUM_JOINED_ROOMS: usize = 10000; | |||
const NUM_STRIPPED_JOINED_ROOMS: usize = 10000; | |||
|
|||
pub fn restore_session(c: &mut Criterion) { | |||
let runtime = Builder::new_multi_thread().build().expect("Can't create runtime"); | |||
let runtime = Builder::new_multi_thread().enable_all().build().expect("Can't create runtime"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this has been changed.
crates/matrix-sdk-base/src/error.rs
Outdated
/// An error caused by calling matrix-rust-sdk functions with invalid parameters | ||
#[error("matrix-rust-sdk function was called with invalid parameters")] | ||
ApiMisuse, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to be much more precise than that, because there's absolutely no way to understand what goes wrong, if looking at a rageshake that contains this error. I'd be fine with renaming this error InvalidReceiveMembersParameters
.
crates/matrix-sdk/src/test_utils.rs
Outdated
@@ -7,7 +7,7 @@ use ruma::{api::MatrixVersion, device_id, user_id}; | |||
use url::Url; | |||
|
|||
use crate::{ | |||
config::RequestConfig, | |||
config::{RequestConfig, SyncSettings}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely unused?
benchmarks/Cargo.toml
Outdated
serde_json = { workspace = true } | ||
tempfile = "3.3.0" | ||
tokio = { version = "1.24.2", default-features = false, features = ["rt-multi-thread"] } | ||
wiremock = "0.5.21" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This should be reverted, after we stopped mocking endpoints for the benchmark)
Clarified in private chat: I'm fine with the current approach after all, because this |
crates/matrix-sdk-base/src/client.rs
Outdated
if request.membership.is_some() || request.not_membership.is_some() || request.at.is_some() | ||
{ | ||
return Err(Error::ApiMisuse); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a comment why we have this restriction, and maybe link to this PR so future us can look at the previous state, if we wanted to add the new public APIs later.
I think the latest changes should fix the review comments. I tried to make the benchmark so it does the bare minimum, but I'm not sure if anything can be improved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'd be curious to see a run of the new benchmark before/after, to see if the trend is similar to what we had before 😇
Also, question for @poljar: I just saw this comment in receive_all_members
:
// However, this makes a new problem occur where setting the member events here
// potentially races with the sync.
I think it's still an issue here, the response could be processed while a sync is going on, so the processing of both could be entangled, leading to confusing (but temporary) discrepancies (e.g. an old profile name is shown for a bit of time, until the next sync_members). Is that acceptable? Or do we take the sync lock during processing of that response? (or another new lock)
Actually, they're quite similar because there aren't any I can try to add those to the store too in the benchmark. |
Thanks for proposing to do this; if you think it's meaningful enough and you have a bit of time, let's do it. Otherwise, I'm already happy we do have some benchmark :-) |
To be honest, I'm trying to, but I can't wrap my head around all the |
Before I forget about these changes, I'll post them here: Original algorithm (compared to a previous run with less samples):
Optimised algorithm:
PS: thanks @bnjbvr for helping me get this working! |
8af9f10
to
2ce479c
Compare
2ce479c
to
b985fce
Compare
bb2c3c5
to
ec854e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay, thanks for enhancing the benchmark and running it again! I'll address my final comment, and let's get it merged \o/
When all room members are loaded, we do not need an incremental member update. We know that parsing the /members response will only lead to more ambiguous names, not less. And because /members returns the complete list, we can directly use that list as the ambiguation map.
This improves the performance in my emulator from 56s to 9s and on a less performant device from 11mins to 11s (Tested experimentally on Matrix HQ using log statements in element android. If I have time, I will write a proper benchmark tomorrow.