Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for transactions on datasets #109

Merged
merged 1 commit into from
Nov 1, 2020

Conversation

ttencate
Copy link
Contributor

@ttencate ttencate commented Oct 27, 2020

A must have for performance with some GDAL drivers. Writing 22655 points to a GeoPackage:

  • without transactions (= one transaction per feature): 1.9 seconds
  • with transactions (= one transaction for the entire dataset): 0.058 seconds

I've just copied the C++ API here, but I'm sure there's a more rustilicious way of doing it. Maybe layer.with_transaction(|| { ... }), where the lambda returns a Result, and we commit if it's Ok and rollback if it's Err?

If the code can't easily be put into a lambda, I think the simple wrapper functions could still be useful.

@jdroenner
Copy link
Member

i think this is a good addition to the crate. could you add some documentation to the methods?

@ttencate ttencate changed the title Add support for transactions on vector layers Add support for transactions on datasets Oct 28, 2020
@ttencate
Copy link
Contributor Author

Looking more closely at the GDAL docs, transactions on individual layers are discouraged because they are rarely supported by drivers. So I changed this PR to implement transactions at the dataset level instead, added documentation, and some tests.

@ttencate
Copy link
Contributor Author

ttencate commented Oct 28, 2020

We should probably discuss a better (safer) API. I can think of two approaches:

dataset.with_transaction(|| -> Result<()> {
  ...
});

If the lambda returns Ok, the transaction is committed; if it returns Err, it is rolled back.

An alternative is to have a Transaction struct:

let transaction = dataset.start_transaction();
...
transaction.commit();

Transaction would implement Drop and default to rolling back, unless it's already been committed at that point. Transaction::commit and Transaction::rollback would take self by value, implicitly destroying the Transaction after either of them has been called.

I like this approach better because it's more flexible and needs less nesting. It is also somewhat similar to how postgres does it. However, we need to be careful about lifetimes; the Transaction should never outlive the Dataset it was created from. It also cannot hold any reference (mutable or immutable) to the Dataset, otherwise we couldn't do any operations inside the transaction! So I guess it would go something like this:

// 'dataset is the lifetime of the original Dataset.
struct Transaction<'dataset> {
    c_dataset: GDALDatasetH,
    committed: bool,
}

impl Dataset {
    // Are explicit lifetimes needed?
    fn start_transaction(&'dataset mut self) -> Result<Transaction<'dataset>> {
        Transaction { c_dataset: self.c_dataset, committed: false }
    }
}

I'm a bit of a newbie when it comes to good API design in Rust, so any feedback is very welcome.

@ChristianBeilschmidt
Copy link
Contributor

First of all, I really like this PR and the idea behind it, unlocking the feature for this crate. Here are my comments:

  1. You should remove all the .gdb files you (I guess) accidentally comitted.

  2. I vote in favor of the transaction object (second idea). The lifetime to a PhantomData object rather than a reference should be fine. And Drop should rollback the transaction if it is uncommitted.

If you change the tests accordingly, we should see if this works out as intended.

We should also address what happens if you start a second transaction on a dataset.

With respect to the FileGDB driver, is it possible to check whether it is enabled and then run the test instead of ignoring it?

@ttencate
Copy link
Contributor Author

Thanks! I've updated the PR and also added an example/doctest to the docs. The lifetime trick with Transaction<'a> didn't exactly work, because as long as the transaction exists, it's assumed to have a mutable borrow of the Dataset object (even if it doesn't really). So then we wouldn't be able to do anything useful during the transaction anymore.

I got around this by simply adding a &'a mut Dataset as a field in the Transaction, and offering Transaction::dataset() and Transaction::dataset_mut() methods. It feels like this is the way Rust wants to flow. The postgres crate does something similar by exposing the query API on the Transaction object as well.

The .gdb directory and all the files in it are for use in the test, so they were certainly not committed by accident (copied from the GDAL test suite; sorry there's so many, but I guess that's just how the format works). I updated the test to run only if FileGDB is supported, but because my system doesn't have it, I wasn't able to test the test conclusively.

@ttencate ttencate force-pushed the feature/transactions branch 2 times, most recently from b555a60 to 803062a Compare October 28, 2020 17:19
src/dataset.rs Outdated
///
/// If an error occurs after a successful `start_transaction`, the whole transaction may or may
/// not be implicitly canceled, depending on the driver. For example, the PG driver will cancel
/// it, but the SQLite and GPKG drivers will not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is true for SQLite, which, in general supports transaction rollback. Can you provide any more detail?

Copy link
Contributor Author

@ttencate ttencate Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied the relevant portions from the GDAL API docs (with minor wordsmithing) so I have no clue either.

@jdroenner
Copy link
Member

can we change the tests to use GPKG instead of FileGDB? I guess GPKG should be available in most builds since it only requires libsqlite.

@jdroenner jdroenner mentioned this pull request Oct 29, 2020
4 tasks
@ttencate
Copy link
Contributor Author

There are already tests using GPKG, but it has actual transaction support (not emulated), so try_start_transaction works fine on GPKG while it should fail on FileGDB.

If you don't like to have a test that doesn't test anything on most systems, and requires two hundred stupid fixture files, I agree! I'd be happy to just remove the test and the files. And maybe even remove try_start_transaction entirely... if this one obscure driver is the only one where it makes a difference compared to start_transaction (I'd need to double check).

@jdroenner
Copy link
Member

Maybe there is a driver where we don't need 200 files... Could you check this? If there is no alternative we should keep it as well as try_start_transaction.

@ttencate
Copy link
Contributor Author

ttencate commented Oct 31, 2020

In the end, I decided to just get rid of try_start_transaction and the possibly broken tests altogether. It's just not worth the maintenance cost and user confusion.

Checking the GDAL source confirmed that FileGDB really is the only driver for which this matters. Most people don't have FileGDB because it's not built-in; you need to download a closed-source SDK and build GDAL from source to enable it.

Moreover, I'm not even sure that the test will work: the FileGDB driver only supports version 10 of the format (unlike OpenFileGDB, which supports version 9 as well), and I can't find a way to check which version the example file is without buying ArcGIS.

If someone does have FileGDB support and wants to use emulated transactions, they can just use start_transaction. And if they don't want transactions, they can just choose not to use them.

The only remaining use case for try_start_transaction is then for applications that don't know ahead of time which driver will be used. But even those users can simply check at runtime whether it's FileGDB and act accordingly. (Using GDALDataset::TestCapability would be better, but it's not supported in this crate yet.)

I also improved the tests a bit to check that changes are actually commited or rolled back as expected. This added a dev-only dependency on the tempfile crate; hope that's okay.

@jdroenner
Copy link
Member

i guess this is the best way to go. FileGDB is probably not relevant when there is OpenFileGDB as well. Thanks for your contribution.

@jdroenner jdroenner merged commit c7e62aa into georust:master Nov 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants