
Implement 'cooperative transactions' to help avoid SQLITE_BUSY #902

Closed. Wanted to merge 3 commits.

Conversation

@mhammond
Member

commented Apr 3, 2019

This adds the concept of "cooperative transactions", which I hope I've explained fairly well in the comments in the new coop_transaction.rs.

Note that as it stands, logins is broken - this is because I'm hoping to prevent places code from accidentally using the "wrong" unchecked_transaction - there are some comments in conn_ext.rs about this, and I welcome all ideas about how to resolve that tension. Also, naming is hard, so please bikeshed.

There's also a new example, which I'm not 100% sure should be committed, but it's here for your pleasure - it should be quite easy to make that example work without the rest of the patch applied, in which case you will probably be able to reproduce the thread starvation which currently happens.

@mhammond mhammond self-assigned this Apr 3, 2019

@mhammond mhammond requested review from thomcc and linacambridge Apr 3, 2019

@thomcc
Contributor

left a comment

I'm not sure that this is good, TBH. Another approach would be for us to put sync() on the writer connection in the public API, which would move the locking to somewhere it's easier to reason about. In particular, the locking strategy is complex enough here that I worry we could be introducing a deadlock.

components/places/src/db/coop_transaction.rs (outdated, resolved)
components/support/sql/src/conn_ext.rs (outdated, resolved)

mhammond added some commits Apr 8, 2019

@thomcc

Contributor

commented on components/support/sql/src/conn_ext.rs in a50bd03 Apr 8, 2019

Should we document that this is discouraged now (and forbidden in places)?

@mhammond

Member Author

commented Apr 8, 2019

I've put a new version up based on our discussions in slack. I think we all agree that this "2 writers" approach kinda sucks, but we don't really have a good alternative, and this patch is a consequence of that.

As per the discussion with @lina, this patch now does a single retry after failing to get the DB lock with a BUSY error.
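To illustrate the "single retry" behavior described here, a minimal stdlib-only sketch of the idea follows. This is not the actual patch code: `BeginError` and `begin_with_one_retry` are hypothetical names, and the real implementation would call rusqlite's `BEGIN IMMEDIATE` rather than a closure.

```rust
// Hypothetical sketch of "retry exactly once on BUSY". The closure stands
// in for the real attempt to begin an immediate transaction; BeginError
// stands in for rusqlite's error type. Neither name is from the patch.
#[derive(Debug, PartialEq)]
enum BeginError {
    Busy,
    Other(String),
}

/// Attempt to begin a transaction, retrying exactly once if the first
/// attempt fails with SQLITE_BUSY (e.g. because a checkpoint holds the
/// lock). Any other error, or a second BUSY, is returned to the caller
/// rather than looping forever.
fn begin_with_one_retry<F>(mut try_begin: F) -> Result<(), BeginError>
where
    F: FnMut() -> Result<(), BeginError>,
{
    match try_begin() {
        Err(BeginError::Busy) => {
            // The busy_timeout wait has already elapsed once; try exactly
            // one more time (so at most ~2x the busy_timeout in total).
            try_begin()
        }
        other => other,
    }
}
```

The design choice here is that a second BUSY is treated as a real failure: with a 5-second busy_timeout on each attempt, the caller has already waited roughly 10 seconds.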

@mhammond mhammond requested a review from thomcc Apr 8, 2019

@mhammond mhammond marked this pull request as ready for review Apr 8, 2019

@linacambridge
Member

left a comment

I'll look more tomorrow. 😄

///
/// One additional wrinkle here is that even if there was exactly one writer,
/// there's still a possibility of SQLITE_BUSY if the database is being
/// checkpointed. So we handle that case and perform exactly 1 retry.

@linacambridge

linacambridge Apr 8, 2019

Member

Hmm, I wasn't sure that 1 retry would be sufficient... are you thinking we just give up in that case, since it's not worth looping?

@mhammond

mhammond Apr 9, 2019

Author Member

I should have mentioned this - I'm assuming that the 5 second "wait for lock" still applies to both calls even in the checkpoint case. I can't find any docs on that, but if it's true, then we have a 10 second wait already. The logging was supposed to help make that clear (although actually seeing the logs remains an issue).

Sadly I couldn't work out how to simulate that in the example.

self.store.update_local_items(descendants, deletions)?;
let mut tx = self.store.db.time_chunked_transaction()?;
self.store
.update_local_items(descendants, deletions, &mut tx)?;
self.store.stage_local_items_to_upload()?;

@linacambridge

linacambridge Apr 8, 2019

Member

(The reason we can't maybe_commit between update_local_items and stage_local_items_to_upload is because stage_local_items_to_upload stages outgoing bookmarks using the change counter. At the end of a merge, everything that needs to be uploaded has changeCounter = 1. If the main connection sneaks in a write between the time we call update_local_items, and stage_local_items_to_upload, it might bump the change counter for items that we haven't merged yet, and upload inconsistent state to the server).
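The `time_chunked_transaction` in the diff above suggests a transaction that commits at safe points once enough wall-clock time has passed, which is why placing a commit point between those two calls matters. A rough stdlib-only sketch of that pattern (the struct and its fields are illustrative, not the actual implementation; real commit/begin SQL is modeled as a counter):

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch of a "time-chunked transaction": callers invoke
// maybe_commit() only at points where the database state is consistent,
// and the transaction commits-and-reopens only once the current chunk
// has run longer than the configured duration.
struct TimeChunkedTransaction {
    started: Instant,
    chunk: Duration,
    commits: u32, // stand-in for the real COMMIT; BEGIN round-trip
}

impl TimeChunkedTransaction {
    fn new(chunk: Duration) -> Self {
        Self {
            started: Instant::now(),
            chunk,
            commits: 0,
        }
    }

    /// Commit and start a new chunk if the current one has outlived the
    /// configured duration; otherwise do nothing. Because this is a no-op
    /// between consistent points, skipping a call (as discussed above for
    /// update_local_items / stage_local_items_to_upload) just means the
    /// chunk runs longer.
    fn maybe_commit(&mut self) {
        if self.started.elapsed() >= self.chunk {
            self.commits += 1; // would be: COMMIT; BEGIN IMMEDIATE
            self.started = Instant::now();
        }
    }
}
```

The point of the pattern is to bound how long the sync connection holds the write lock, so the main connection's writes are not starved for the whole merge.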

@thomcc

Contributor

commented Apr 10, 2019

Obsoleted by #949

@thomcc thomcc closed this Apr 10, 2019

3 participants