-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(client): Better throttle snapshot stop strategy #1251
Conversation
Thanks for spotting this issue and submitting a fix! Although unsetting the throttled snapshot resolves the issue I am a bit reluctant about using that as a solution cause ultimately if it's getting called after termination we have some more underlying timing issue which should probably be solved instead. The way I understand it, the throttled snapshot is called in four places:
I think one main issue right now is that in To solve 2) and 3), I think moving the lock acquisition for the sake of waiting on any current snapshots after clearing the interval and notifier subscriptions should work: electric/clients/typescript/src/satellite/process.ts Lines 339 to 342 in d2df5fe
To solve 4) I think there needs to be an additional, separate call to unsubscribe from client listeners (since The 1) case is only an issue if Let me know if this sounds reasonable to you, the solution you've proposed should work but we'd rather solve the underlying issues first as they could cause problems elsewhere! P.S. indeed a test would be great for catching this, doing a timing mock for the 2), a notifier event mock for 3), or a client event mock for 4) I believe should capture this error somehow |
@msfstef Thank you for reviewing! While doing the test I noticed that I wasn't able to properly trigger it. I then noticed that the problem is that the clean function in the context doesn't actually close the underlying db connection, so even if the db files are removed, the connection still works, preventing seeing the issue of snapshoting when the db is closed. I used the same mechanism for postgres and attach the stop function to the sqlite tests context. Making this change already manifested problem number 1 in some of the tests, so I included my proposed solution for that here as well. |
It looks like some pglite tests can now fail because the db connection is actually closed when tearing down each test. For instance this simple test, because the connection promise is not awaited, when doing test('can use sub in JWT', async (t) => {
const { satellite, authState } = t.context
await t.notThrowsAsync(async () => {
await startSatellite(
satellite,
authState,
insecureAuthToken({ sub: 'test-userB' })
)
})
}) |
@davidmartos96 thank you for updating the tests and everything! You're right, there are some tests that are not waiting for the connection promise to fulfil and that causes issues. From my testing it seems the following tests in await t.notThrowsAsync(async () => {
const { connectionPromise } = await startSatellite(
satellite,
authState,
insecureAuthToken({ user_id: 'test-userA' })
)
await connectionPromise
})
There seems to be one additional test for postgres failing, might require the following fix in let stopPromise: Promise<void>
// We use the database directory as the name
// because it uniquely identifies the DB
return {
db,
stop: () => {
if (stopPromise) return stopPromise
stopPromise = pg.stop()
return stopPromise
},
} With those fixes in place it seems tests are good! I will separately review the updated changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left some comments about ordering and some nits, if you could add a changeset for this as well that would be great! (pnpm changeset
in the root directory, should be a patch change)
@msfstef Tests seem to be ok now, thank you for the comments! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you for your contribution! This was a good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch and solution, awesome! 💯
Left some minor comments.
@kevin-dp Done! |
We've encountered a race condition that can occurr if, after cancelling the throttle when stopping the process, some other piece of code calls throttlesnapshot. That's because cancelling the throttle doesn't prevent from scheduling new ones. The proposed solution is to assign undefined to the function, so places that would normally trigger a snapshot, like the poll interval, would just do nothing until properly closed. On top of this, we found that `setClientListeners` is only called when instantiating the process, but it's cleaned up when calling `stop` with the `disconnect` function. This makes it so that stopping and then starting the process won't have some of the client listeners properly set up. The proposed solution is to just move the client listeners intialization to the `start` function. I'm not sure how to test this. It would probably need some way to mock client events after a stop and a start.
We've encountered a race condition that can occurr if, after cancelling the throttle when stopping the process, some other piece of code calls throttlesnapshot. That's because cancelling the throttle doesn't prevent from scheduling new ones.
The proposed solution is to assign undefined to the function, so places that would normally trigger a snapshot, like the poll interval, would just do nothing until properly closed.
On top of this, we found that
setClientListeners
is only called when instantiating the process, but it's cleaned up when callingstop
with thedisconnect
function. This makes it so that stopping and then starting the process won't have some of the client listeners properly set up. The proposed solution is to just move the client listeners intialization to thestart
function.I'm not sure how to test this. It would probably need some way to mock client events after a stop and a start.