Restart sync on joinLayer for immediate room detection #13

Merged
ThomasHalwax merged 6 commits into main from fix/sync-restart-on-join
Mar 23, 2026

Conversation

@axel-krapotke
Contributor

Problem

After joinLayer(), the sync-gated content mechanism in Project.start() could not detect the room because the running long-poll's rooms filter didn't include it. The poll had to time out (30s) before the next one picked up the updated filter.

Solution

Reorder joinLayer() to follow: filter update → sync restart → join

  1. Add room to idMapping (temporary self-mapping) so it appears in the rooms filter
  2. Call restartSync() to abort the current long-poll
  3. Perform the actual POST /join
  4. The restarted sync sees the join event → stateEvents check fires → content() loads operations → received() delivers them
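
The ordering can be sketched roughly as follows. This is an illustration only, not the actual Project.joinLayer(): the `idMapping`, `timelineAPI` and `httpApi` shapes are assumed for the example, and awaiting restartSync() reflects the later commit in this PR that makes it return a Promise.

```javascript
// Hypothetical sketch: the dependency shapes are assumptions, only the
// order of the three steps is taken from the description above.
async function joinLayer ({ idMapping, timelineAPI, httpApi }, roomId) {
  // 1. Temporary self-mapping so the room appears in the sync rooms filter.
  idMapping.set(roomId, roomId)
  // 2. Abort the running long-poll so the next one uses the updated filter.
  await timelineAPI.restartSync()
  // 3. Only now perform the actual join.
  return httpApi.join(roomId)
}
```

With this order, the join event can only arrive while a poll that already filters for the room is running or about to start.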

Implementation

  • httpApi.sync(): Accept optional AbortSignal, pass to ky
  • TimelineAPI.restartSync(): Abort the current iteration's AbortController
  • TimelineAPI.stream(): Per-iteration AbortController, catch abort as restart (not error)
  • Project.joinLayer(): Reordered as described above
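
A minimal sketch of the per-iteration AbortController pattern, assuming a `syncOnce(filter, signal)` function that stands in for httpApi.sync() and a `getFilter()` callback that returns the current rooms filter. Both names and the class itself are hypothetical; the real TimelineAPI will differ.

```javascript
// Hypothetical sketch of a restartable long-poll loop.
class RestartableSync {
  constructor (syncOnce, getFilter) {
    this.syncOnce = syncOnce   // assumed: (filter, signal) => Promise<response>
    this.getFilter = getFilter // assumed: () => current rooms filter
    this.controller = null
  }

  // Abort the in-flight request; the loop treats this as a restart.
  restartSync () {
    if (this.controller) this.controller.abort()
  }

  async * stream (stopSignal) {
    while (!stopSignal.aborted) {
      const controller = new AbortController() // one controller per iteration
      this.controller = controller
      try {
        // The filter is re-read on every iteration, so a restart picks up changes.
        yield await this.syncOnce(this.getFilter(), controller.signal)
      } catch (error) {
        // Only an abort triggered via restartSync() is swallowed.
        if (!controller.signal.aborted) throw error
      }
    }
  }
}
```

Checking `controller.signal.aborted` rather than the error type keeps the sketch independent of how the HTTP client reports an aborted request.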

Tests

  • 66 unit tests passing
  • 2 E2E tests: immediate content() + sync-gated received() (1.5s)

joinLayer() now follows the sequence: add room to filter (idMapping),
restart the sync long-poll, then perform the actual join. This ensures
the restarted poll includes the room and sees the join event when it
arrives, enabling the sync-gated content fetch to work reliably.

Changes:
- http-api: pass AbortSignal through to ky for sync requests
- timeline-api: add restartSync() to abort current long-poll; stream
  loop catches the abort and re-enters with updated filter
- project: reorder joinLayer() to update filter before join
- E2E test: verify sync-gated received() delivers content after join

restartSync() now returns a Promise that resolves once the stream loop
has applied the updated filter. joinLayer() awaits this before calling
join(), ensuring the sync request with the new room is already in
flight when the join event arrives on the server.
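
One way to picture this handshake, as a sketch under assumptions: `RestartHandshake` and `beginIteration()` are hypothetical names, and the real stream loop may signal "filter applied" at a different point.

```javascript
// Hypothetical sketch: restartSync() resolves when the loop starts its
// next iteration, i.e. after it has picked up the updated filter.
class RestartHandshake {
  constructor () {
    this.controller = new AbortController()
    this.waiters = []
  }

  // Caller side (e.g. joinLayer): abort the poll, wait for the loop to re-enter.
  restartSync () {
    return new Promise(resolve => {
      this.waiters.push(resolve)
      this.controller.abort()
    })
  }

  // Loop side: call at the top of each iteration, right before the request.
  beginIteration () {
    this.controller = new AbortController()
    const waiters = this.waiters
    this.waiters = []
    waiters.forEach(resolve => resolve())
    return this.controller.signal // pass this signal to the sync request
  }
}
```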

When a room appears in sync with limited:true but /messages returns
empty (remote server hasn't backfilled yet), keep the room in
pendingContent and retry on subsequent sync cycles instead of giving
up immediately. Gives up after 10 retries.

Changed pendingContent from Set to Map to track retry state.
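
A rough sketch of the retry bookkeeping, assuming pendingContent maps a room ID to a `{ retries }` entry and that `fetchContent(roomId)` resolves to an array of events. Both shapes and the function name `processPending` are assumptions for illustration; only the limit of 10 is taken from the commit message.

```javascript
// Hypothetical sketch: pendingContent as Map<roomId, { retries }>.
const MAX_RETRIES = 10 // from the commit message

// Returns the fetched events, or null if the room stays pending or was dropped.
async function processPending (pendingContent, roomId, fetchContent) {
  const entry = pendingContent.get(roomId)
  if (!entry) return null
  const events = await fetchContent(roomId)
  if (events.length > 0) {
    pendingContent.delete(roomId) // content arrived, nothing left to retry
    return events
  }
  entry.retries += 1 // empty result: keep the room for the next sync cycle
  if (entry.retries >= MAX_RETRIES) pendingContent.delete(roomId) // give up
  return null
}
```

A Set can only say "this room is pending"; the Map carries the per-room counter needed to stop eventually.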

content() now accepts a 'from' pagination token. When provided, it
paginates backward (dir=b) instead of forward from the start. This
fixes federation scenarios where forward pagination returns empty
because the remote server hasn't backfilled yet, but the prev_batch
token from sync provides a valid backward pagination starting point.

The pending content processing now captures prevBatches from sync
responses and passes them through to content().
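
As an illustration of the two pieces, the sketch below collects prev_batch tokens from a sync response and builds a /messages query that paginates backward when a token is given. The helper names `prevBatches` and `messagesQuery`, the `/v3` path prefix and the default limit are assumptions; `rooms.join[roomId].timeline.prev_batch` and the `from`/`dir` parameters are part of the Matrix Client-Server API.

```javascript
// Hypothetical helper: map roomId -> prev_batch for joined rooms in a sync response.
function prevBatches (syncResponse) {
  const joined = syncResponse.rooms?.join ?? {}
  return Object.fromEntries(
    Object.entries(joined)
      .filter(([, room]) => room.timeline?.prev_batch)
      .map(([roomId, room]) => [roomId, room.timeline.prev_batch])
  )
}

// Hypothetical helper: backward pagination (dir=b) from the token if present,
// otherwise forward pagination (dir=f) from the start of the room.
function messagesQuery (roomId, { from, limit = 100 } = {}) {
  const params = new URLSearchParams({ dir: from ? 'b' : 'f', limit: String(limit) })
  if (from) params.set('from', from)
  return `/_matrix/client/v3/rooms/${encodeURIComponent(roomId)}/messages?${params}`
}
```

Starting from prev_batch means the request is anchored at a point the homeserver already knows about, which is why it can succeed even before the remote backfill has completed.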

After the first sync appearance, keep retrying content fetch on
subsequent sync cycles even if the room doesn't appear again.
This handles the case where encrypted events are fetched but
Megolm keys haven't arrived yet via Historical Key Sharing.

Retries are throttled to every 5 seconds (configurable via
contentRetryIntervalMs constructor option) and give up after
10 attempts (~50 seconds).
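
The throttle can be sketched as a small decision function. `shouldRetry` and the `{ retries, lastAttempt }` entry shape are hypothetical; the defaults mirror the values in this commit message, and `intervalMs` plays the role of contentRetryIntervalMs.

```javascript
// Hypothetical sketch: decide on each sync cycle whether to fetch again.
function shouldRetry (entry, now, { intervalMs = 5000, maxRetries = 10 } = {}) {
  if (entry.retries >= maxRetries) return 'give-up' // retry budget exhausted
  if (now - entry.lastAttempt < intervalMs) return 'wait' // throttled
  return 'retry'
}
```

With these defaults the retry window is roughly 10 x 5 s = 50 s, matching the figure above.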

matrix.org can take 2+ minutes for federation backfill. The previous
10x5s window (~50s) was too tight. Now 20x10s (~200s, about 3.3 minutes).

@ThomasHalwax merged commit 0c6acb9 into main on Mar 23, 2026

2 participants