Skip to content

Conversation

@rkistner
Copy link
Contributor

@rkistner rkistner commented Sep 3, 2024

A race conditions with websocket handling could cause the route handler to "hang" indefinitely, after the underlying connection is closed. This surfaces in two ways:

  1. The "concurrent connections" metric reports these as connections, giving a slow increase in connection count over time.
  2. The syncSemaphore could permanently hold on to a lock for these, eventually leading to no locks being available, and no users able to sync.

While the race condition is rare, it does build up over time.

In tests I could trigger the race condition in two places:

  1. Closing a connection before observer.registerListener() is called, by setting lifetime: 0 on the client. I'm not sure whether this ever happened in practice.
  2. Close the connection after next batch of data is retrieved, but before the if (requestedN <= 0) check. This caused the handler to indefinitely wait for the next request() or cancel() to be triggered.

The fix here is to:

  1. Use an AbortController instead of the observer to handle stream cancellation.
  2. Check if (signal.aborted) to see whether the stream is already cancelled, wherever we listen for cancellation.
  3. Throw an error when unable to acquire the syncSemaphore within 30 seconds, instead of just waiting silently.

@changeset-bot
Copy link

changeset-bot bot commented Sep 3, 2024

🦋 Changeset detected

Latest commit: 306b6d8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@powersync/service-rsocket-router Patch
@powersync/service-core Patch
@powersync/service-image Patch
test-client Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

stevensJourney
stevensJourney previously approved these changes Sep 3, 2024
Copy link
Collaborator

@stevensJourney stevensJourney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 🙏 Thanks for the fix.

@rkistner rkistner marked this pull request as ready for review September 3, 2024 13:29
@rkistner rkistner merged commit ce768cd into main Sep 3, 2024
@rkistner rkistner deleted the fix-mutex branch September 3, 2024 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants