Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

begin is not cancel-safe #2054

Closed
madadam opened this issue Aug 16, 2022 · 7 comments · Fixed by #2057
Closed

begin is not cancel-safe #2054

madadam opened this issue Aug 16, 2022 · 7 comments · Fixed by #2057
Labels

Comments

@madadam
Copy link
Contributor

madadam commented Aug 16, 2022

Bug Description

At least in sqlite, when the future returned from Connection::begin is cancelled, the BEGIN statement might still get executed but the Transaction struct is not created and so there is no way to commit / rollback it.

Minimal Reproduction

let mut conn = pool.acquire().await?;

tokio::select! {
    tx = conn.begin() => {
        // use the transaction ...
    }
    _ = some_other_async_function() => {
       // the transaction might have already been started at this point 
       // but we don't have any `Transaction` to commit / rollback it.
    }
}

Notes

I'm not sure whether sqlx even aims to be cancel-safe but I think it should be. Cancel-unsafety can be a source of bugs that are very hard to track down. As a minimum, it should be clearly documented that a function is not cancel safe.

Also, it's possible this issue exists for other db backends as well but I haven't tested it (I'm only using sqlite).

Idea for a fix

Modify Worker::begin somehow like this:

  1. Send the Begin command
  2. Create a RAII guard which sends a Rollback on drop
  3. await the response from the command
  4. Disable the RAII guard
  5. Return the response

However, this assumes that flume::Sender::send_async is cancel-safe, that is, it's guaranteed that if it's cancelled, the message is not sent. The flume docs currently don't talk about this. I started a discussion about it here zesterer/flume#104.

Info

  • SQLx version: 0.6.1
  • SQLx features enabled: default-features = false, features = ["runtime-tokio-rustls", "sqlite"]
  • Database server and version: SQLite
  • Operating system: Ubuntu 20.04.4 LTS
  • rustc --version: rustc 1.61.0 (fe5b13d68 2022-05-18)
@abonander
Copy link
Collaborator

In general, cancellation safety has been a thorn in our side and it's been very difficult to cover it from all possible angles.

However, in this case I'm rather surprised as I would expect there to only be two outcomes to a .send_async() call:

  • It waits until a slot in the channel is available (if necessary) and then it sends the message immediately and returns, or:
  • The call is cancelled and the message is not sent.

If Flume is allowing for a third possibility where the message is sent but the method suspends and gives an opportunity for the call to be cancelled, I think that's a case of broken expectations on Flume's part.

@abonander
Copy link
Collaborator

Except I completely forgot about the second .await for the response from the call, which is likely where the cancellation is happening. Duh.

@abonander
Copy link
Collaborator

But yeah, it's highly likely that begin() is not cancellation safe for any of the database drivers so I would avoid select!() with it.

@madadam
Copy link
Contributor Author

madadam commented Aug 17, 2022

cancellation safety has been a thorn in our side

Same here :)

If Flume is allowing for a third possibility where the message is sent but the method suspends and gives an opportunity for the call to be cancelled

That seems to be the case: zesterer/flume#104 (comment)

But yeah, it's highly likely that begin() is not cancellation safe for any of the database drivers so I would avoid select!() with it.

So you don't think it's worth trying to fix it? I think there are ways to make it work, even for other drivers than sqlite.

@madadam
Copy link
Contributor Author

madadam commented Aug 17, 2022

I realized that there is a similar problem with cancellation of commit and rollback. I created a PR that fixes all three cases: #2057. It's sqlite specific though.

@DXist
Copy link
Contributor

DXist commented Sep 8, 2022

The proposed concurrency safe API could be cancellation safe.

The whole transaction pipeline is executed by "commit finalizer" method that could create RAII guard that includes rollback logic. In Postgres case when transaction pipeline could squash into single implicit transaction it is executed in a single batch and there is no need to cancel it - it's either applied / rolled backed by server on query error or rolled backed by server after TCP stack closes servers half of connection (when cancellation happens concurrently with client half TCP connection failure and implicit pipeline is not sent fully).

@NOBLES5E
Copy link

Should this issue be reopened since for other databases, begin is still not cancel-safe?

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants