Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resuming a Session while it's waiting to auto-resume from a non-fatal error can crash #6961

Closed
sync-by-unito bot opened this issue Sep 8, 2023 · 4 comments · Fixed by #7207
Closed
Assignees

Comments

@sync-by-unito
Copy link

sync-by-unito bot commented Sep 8, 2023

If you resume a session that has received a non-fatal error (i.e. InitialSyncNotCompleted) before the timer to auto-resume the session has fired and then you receive another non-fatal error you can crash when the sync client tries to create a new auto-resume timer.

Copy link
Author

sync-by-unito bot commented Nov 16, 2023

➤ Jonathan Reams commented:

What are the cases where this could happen? When you call sync_manager->reconnect() it will eventually call sync::Session::cancel_reconnect_delay(), but only if the SyncSession is in the active state, which if you've received a fatal error the SyncSession should be in an Inactive state - so I don't see how calling reconnect would re-use an abandoned SessionWrapper.

Copy link
Author

sync-by-unito bot commented Nov 18, 2023

➤ danieltabacaru commented:

Can't there be a race between the SyncSession becoming inactive (as a result of SessionImpl receiving a fatal error) and reconnect being called? There may be other cases when this can happen though.

@sync-by-unito sync-by-unito bot assigned jbreams and unassigned michael-wb Nov 20, 2023
Copy link
Author

sync-by-unito bot commented Nov 20, 2023

➤ Jonathan Reams commented:

Got it, I've reproduced this in a test. I think I know what to do now.

@sync-by-unito sync-by-unito bot changed the title Session can reconnect after receiving a fatal error Resuming a Session while it's waiting to auto-resume from a non-fatal error can crash Dec 13, 2023
Copy link
Author

sync-by-unito bot commented Dec 13, 2023

➤ Jonathan Reams commented:

I was able to reproduce the original description of this ticket but couldn't actually get it to crash. So I went back to the logs and was able to reproduce the exact crash in RNET-1051 by calling SyncSession::handle_reconnect() after two successive retryable errors without having the try again timer complete between the two of them. I've added a test that reliably reproduces this and the fix is to just cancel the try again timer if we're explicitly resuming the session.

@jbreams jbreams linked a pull request Dec 13, 2023 that will close this issue
4 tasks
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants