LL members can get out of sync with server on limited sync response #7211

ara4n · 2018-08-22T09:49:36Z

I invited uhoreg into a private dev room; he clearly joined, but in practice LL clients show him to still be invited. We see RRs from him though.

bwindels · 2018-08-29T12:37:48Z

Can't repro in a room on localhost server with 4 joined + 1 invited members. I did notice another bug, where one of the joined members in the room forgot about the invited user. The invite event was in the room timeline, but not present in the room state nor the accumulated sync data. Seems like a Riot bug. Looking into it now, but these two issues couldn't be related you think?

bwindels · 2018-08-29T14:49:48Z

UPDATE: see comment below first

The problem described above is also hard to reproduce, but I noticed one thing in the sync response after accepting an invite (so when the room appear under "join" for the first time): the state events are not complete. I don't receive the state at the point where I join. Instead I only get what seem to be the 3 first state events (m.room.create, m.room.member for room creator, m.room.power_levels). I would have expected the invite member event for another member to be in there that was invite just before me.
More concretely, the timeline looks like this, and the syncing user is Bruno3:

Bruno1 joined the room.
Bruno1 made future room history visible to all room members.
Bruno1 changed the room name to the happy few.
Bruno1 invited Bruno2.
Bruno1 invited Bruno5.
Bruno1 invited Bruno3.
Bruno2 joined the room.
Bruno3 joined the room.

Is this how it is supposed to work @ara4n ?
I tried to reproduce on develop with LL turned off but there state.events was just empty?
I thought you'd get the full room state (apart from joined members with LL turned on) after accepting a room, much like when doing an initial sync for a joined room? Maybe I'm missing something.

bwindels · 2018-08-29T15:34:50Z

Creating a ticket for the issue described above, as it's probably not related to what happend with uhoreg where join event seems to have been fabricated. #7243

ara4n · 2018-09-08T16:13:59Z

Right, looks like you might have found another bug with this "sync on joining is wrong".

Separately, I've reproduced the original bug - it looks to be a problem with limited syncs. To reproduce:

Alice turns on LL
Alice creates room
Bob joins room
Alice pauses her Riot
Bob invites Charlie
Charlie joins, and can even say something
Bob says 30 messages
Alice unpauses her Riot, and so gets a catchup (limited) /sync
Alice observes that the /sync has an empty sync block entirely.

This is very sytest-able and just a screwup in the impl, i guess; I didn't explicitly test limited syncs.

ara4n · 2018-09-08T19:50:18Z

Actually, my issue isn't really synapse's fault. It's correct that the state is missing for the members who joined during the gap, given by definition if they're not in the timeline, and the current contract for LL is that you only get state for members who are in the timeline. Instead, it's an edge case that after a limited (aka gappy) sync, we would have to do a full bg resync of the members list to go and fill in the details of any changes which happened during the gap.

So, to fix this, either we:

Decide that gappy incremental syncs shouldn't lazy-load members (which is questionable; why are we loading members for things which aren't visible in the timeline?)
Re-trigger a bg members update after a gappy incremental sync. This has potential performance problems, as gappy syncs happen whenever a client is getting behind (either because of slow performance, slow network, or a hiatus in connectivity - e.g. laptop being slept). Background membership syncs are obviously quite heavy, so it seems like a bit of a pyrrhic victory to swap out an incremental state sync across the gap with a full resync because of this API decision.

I'm not sure what the right soln is, tbh. On one hand, just making synapse include the state during a gappy sync is kinda appealing as it will fix this without changes on the clients, and typically there won't be too much of a delta. On the other hand, most sophisticated clients (e.g. Riot on all platforms) only ever do initial sync at initial launch, and thereafter always do an incremental sync - so if they're offline for weeks and then relaunch, the relaunch could be delayed by needlessly pulling in tonnes of membership events.

Having written it out, i think we should optimise for the smallest possible syncs, given we might have better ways in future for pulling in the membership data selectively (rather than the very blunt hack of currently resyncing the whole membership list in the bg). In other words, responsivity is more important than bandwidth utilisation.

ara4n · 2018-09-08T20:07:29Z

i've split out the other issue into #7303

bwindels · 2018-09-10T08:52:15Z

@bmarty pointed me to the exact same issue on friday, suggesting we should refetch the members when receiving a gap in a timeline. I had it on my todo-list to check how riot-web handles this (e.g. if it always back-fills the gap, but that seems unlikely.)

ara4n · 2018-09-10T16:31:08Z

Conclusion from IRLing on this this morning is that we don't really have enough metrics on how fast/slow incremental sync acts in the wild in order to make an educated decision on how to address this.
So:

Matthew to add metrics on synapse to track how many sync responses are limited, and how many membership events they contain. I expect this to look like "incr sync for 40 rooms, 30 gaps, sent 243054 member events" in the synapse logs

meanwhile, we want to see just how slow incremental syncs are (in terms of the server processing & network bandwidth) time on the clients in general, for which we don't have metrics
i suggest we track on clients:

time taken for initial /sync and incr /sync to return
how many membership events there are, and how many rooms were limited?
how long it takes to process the /sync result
how big the /sync result is in bytes (compressed & uncompressed ideally)

in the short term, I'll disable LL on incremental syncs entirely, and so fix the bug like that.

and then once we've gathered metrics from both server & clients, we can decide what the correct solution actually based on the data, where are options are:

sync /members every time there's a gappy sync
give up on LL for incr sync (i.e. keep the status quo)
pass a since= param to /members (or use /sync) to keep the members list updated.

ara4n · 2018-09-10T22:25:23Z

(i guess another option could be to ditch doing the bg /members and just pull in the profile data when needed. however, this doesn't play well with nick autocomplete, offline support, rapidly filtering the membership list and other places where you might need the full membership list)

See element-hq/element-web#7211 for details

bwindels · 2018-09-12T09:10:52Z

Remaining action point for riot-web is to add some tracking as described below:

so:
Matthew to add metrics on synapse to track how many sync responses are limited, and how many membership events they contain. I expect this to look like "incr sync for 40 rooms, 30 gaps, sent 243054 member events" in the synapse logs

meanwhile, we want to see just how slow incremental syncs are (in terms of the server processing & network bandwidth) time on the clients in general, for which we don't have metrics
i suggest we track on clients:

time taken for initial /sync and incr /sync to return

how many membership events there are, and how many rooms were limited?

how long it takes to process the /sync result

in the short term, I'll disable LL on incremental syncs entirely, and so fix the bug like that
and then once we've gathered metrics from both server & clients, we can decide what the correct solution actually based on the data, where are options are:

sync /members every time there's a gappy sync

give up on LL for incre sync

pass a since= param to /members (or use /sync) to keep the members list updated.

does that match everyone's understanding from the call just then?

bwindels · 2018-09-12T15:35:10Z

Split of the above comment into #7327 as this is tracking the synapse work which is done.

ara4n · 2018-09-24T09:16:03Z

This happened

richvdh · 2024-02-23T09:52:41Z

For the record, this bug in Element was worked around by disabling LL for gappy syncs in Synapse (matrix-org/synapse#3840).

In other words: it's still a bug in Element, but it gets away with it because Synapse doesn't implement this particular bit of the spec. Other spec-compliant implementations could cause the bug to resurface. Indeed, there are reports of buggy behavior with Conduit which could be caused by such bugs.

richvdh · 2024-04-04T16:45:14Z

For the record, this bug in Element was worked around by disabling LL for gappy syncs in Synapse (matrix-org/synapse#3840).

In other words: it's still a bug in Element, but it gets away with it because Synapse doesn't implement this particular bit of the spec. Other spec-compliant implementations could cause the bug to resurface. Indeed, there are reports of buggy behavior with Conduit which could be caused by such bugs.

I have filed #27285 about this.

bwindels added T-Defect A-Lazy-Loading and removed T-Defect labels Aug 22, 2018

bwindels added this to In Progress in Web App Team Aug 29, 2018

bwindels mentioned this issue Aug 29, 2018

Lazy loading: room avatar based on available members, not room summary #7227

Closed

bwindels removed this from In Progress in Web App Team Aug 29, 2018

bwindels added the X-Cannot-Reproduce label Sep 3, 2018

ara4n added T-Defect P1 and removed X-Cannot-Reproduce labels Sep 8, 2018

ara4n mentioned this issue Sep 8, 2018

State is incomplete on joining a room with LL #7303

Closed

manuroe mentioned this issue Sep 10, 2018

Membership joins went missing from the timeline with LL on element-hq/element-ios#2024

Closed

bwindels changed the title ~~invited users may play badly with LL~~ LL members can get out of sync with server on limited sync response Sep 10, 2018

lampholder added this to Not Started in Web App Team Sep 10, 2018

lampholder moved this from Not Started to In Progress in Web App Team Sep 10, 2018

ara4n self-assigned this Sep 10, 2018

ara4n added a commit to matrix-org/sytest that referenced this issue Sep 10, 2018

disable gappy incr LL test due to element-hq/element-web#7211

aa65de6

ara4n added a commit to matrix-org/synapse that referenced this issue Sep 10, 2018

disable LL for incr syncs, and log incr sync stats

8e8dfb4

See element-hq/element-web#7211 for details

ara4n added a commit to matrix-org/synapse that referenced this issue Sep 11, 2018

disable LL for incr syncs, and log incr sync stats (#3840)

c4aa9d6

See element-hq/element-web#7211 for details

bwindels mentioned this issue Sep 11, 2018

Add e2e tests for lazy loading scenarios #7287

Closed

5 tasks

lampholder added the size:1 label Sep 11, 2018

lampholder added this to the RW013 milestone Sep 11, 2018

lampholder moved this from In Progress to In Test in Web App Team Sep 12, 2018

lampholder moved this from In Test to Done in Web App Team Sep 12, 2018

ara4n closed this as completed Sep 24, 2018

ara4n mentioned this issue Dec 24, 2018

Spec lazy_load_members and include_redundant_members matrix-org/matrix-spec-proposals#1758

Merged

turt2live mentioned this issue May 28, 2019

Spec lazy-loading room members matrix-org/matrix-spec-proposals#2035

Merged

squahtx mentioned this issue Aug 9, 2022

compute_state_delta pulls out the full state when doing gappy syncs matrix-org/synapse#13380

Open

richvdh mentioned this issue Dec 6, 2022

lazy-loading behaviour for /sync is incorrectly specified matrix-org/matrix-spec#942

Open

richvdh mentioned this issue Feb 23, 2024

Fix bug in calculating state for non-gappy syncs element-hq/synapse#16942

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LL members can get out of sync with server on limited sync response #7211

LL members can get out of sync with server on limited sync response #7211

ara4n commented Aug 22, 2018

bwindels commented Aug 29, 2018

bwindels commented Aug 29, 2018 •

edited

Loading

bwindels commented Aug 29, 2018

ara4n commented Sep 8, 2018

ara4n commented Sep 8, 2018

ara4n commented Sep 8, 2018

bwindels commented Sep 10, 2018 •

edited

Loading

ara4n commented Sep 10, 2018

ara4n commented Sep 10, 2018

bwindels commented Sep 12, 2018

bwindels commented Sep 12, 2018

ara4n commented Sep 24, 2018

richvdh commented Feb 23, 2024 •

edited

Loading

richvdh commented Apr 4, 2024

LL members can get out of sync with server on limited sync response #7211

LL members can get out of sync with server on limited sync response #7211

Comments

ara4n commented Aug 22, 2018

bwindels commented Aug 29, 2018

bwindels commented Aug 29, 2018 • edited Loading

bwindels commented Aug 29, 2018

ara4n commented Sep 8, 2018

ara4n commented Sep 8, 2018

ara4n commented Sep 8, 2018

bwindels commented Sep 10, 2018 • edited Loading

ara4n commented Sep 10, 2018

ara4n commented Sep 10, 2018

bwindels commented Sep 12, 2018

bwindels commented Sep 12, 2018

ara4n commented Sep 24, 2018

richvdh commented Feb 23, 2024 • edited Loading

richvdh commented Apr 4, 2024

bwindels commented Aug 29, 2018 •

edited

Loading

bwindels commented Sep 10, 2018 •

edited

Loading

richvdh commented Feb 23, 2024 •

edited

Loading