Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Owned layers from a dataset that are Send #238

Merged
merged 7 commits into from Feb 12, 2022

Conversation

ChristianBeilschmidt
Copy link
Contributor

@ChristianBeilschmidt ChristianBeilschmidt commented Jan 9, 2022

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

We ran into the problem of having non-Send Layers. Without Send it is barely usable in an asynchronous or threading context. You cannot store it somewhere and create tasks that do something with them since you would need to send a Layer to another thread.

From previous discussions about thread-safety in GDAL we found out that we must not access Datasets or Layers at the same time from different threads. But for Datasets, we found that it is safe to Send them to another thread since GDAL 2.4 (or something). For Layers, it is actually okay to assume that the are Send as long as we have a one-to-one relationship between Datasets or Layers, or, stated differently, we must ensure that there are no two Layers that point to one Dataset. If we would have to Layers pointing to the same Dataset and they are Send, then we could have two different threads accessing the same Dataset. We must not allow this.

However, from discussions with @jdroenner , we thought that if we would introduce a Layer that owns its Dataset, we could make it Send since we can ensure that the Dataset isn't accessed another time, for instance, from another thread.

So in this draft PR I've created an OwnedLayer that does exactly this, it encapsulates a Layer and its Dataset. In Dataset, we can then choose if we want to have a reference to Layer or we want to convert our Dataset into an OwnedLayer. I've added a test that this behavior works.

Since the operations on Layer and OwnedLayer are more or less the same, I've transferred them into a LayerAccess trait. This unfortunately would mean a breaking change. On the other hand, it means not much more code for preserving both types.

Having said all this, what are your opinions on OwnedLayers?
Do you like or criticize the concept?
What are your thoughts on naming the two types?

@ChristianBeilschmidt ChristianBeilschmidt changed the title owned layers from a dataset Owned layers from a dataset that are Send Jan 9, 2022
@rmanoka
Copy link
Contributor

rmanoka commented Jan 11, 2022

I support this; I've wanted something like this for the RasterBand of a Dataset. For comparison, the current method is to instead send (Dataset, isize) and expect the band / layer to be extracted when needed. This is typically okay, because we usually need to pass it only once across to a thread (in multi-threaded processing setups).

Your approach above makes it a bit more ergonomic, and may be more efficient if the object is being passed around many threads. It also makes it much more ergonomic to use in async contexts (most spawners require Send), which is quite useful.

@milosimr
Copy link

Can someone tell me what's required to finish here in order to get this PR approved?

@jdroenner
Copy link
Member

i had no time to look at this yet. i will try to do it this week

@jdroenner
Copy link
Member

I am not sure if it is enough to add the LayerAccess. I guess we need to do the same for Datasets. Some datasets and layers in GDAL are the same object with a pointer into the data.

So we need a DatasetAccess trait that wraps the dataset pointer. We must ensure that the dataset is not opened in shared mode i guess. Then we can have a Dataset<A: DatasetAccess> . An owned Dataset is Send since we impl Send for OwnedDatasetAccess but not for SharedDatasetAccess.

Then we need a method in Dataset to create an owned Layer. This method must consume the dataset and take the pointer. It might have a method to get the Dataset back when the OwnedLayer is closed.
This is important since you could open the same layer many times from the same dataset.

@ChristianBeilschmidt
Copy link
Contributor Author

ChristianBeilschmidt commented Jan 29, 2022

I am not sure if it is enough to add the LayerAccess. I guess we need to do the same for Datasets. Some datasets and layers in GDAL are the same object with a pointer into the data.

So we need a DatasetAccess trait that wraps the dataset pointer. We must ensure that the dataset is not opened in shared mode i guess. Then we can have a Dataset<A: DatasetAccess> . An owned Dataset is Send since we impl Send for OwnedDatasetAccess but not for SharedDatasetAccess.

Then we need a method in Dataset to create an owned Layer. This method must consume the dataset and take the pointer. It might have a method to get the Dataset back when the OwnedLayer is closed. This is important since you could open the same layer many times from the same dataset.

Okay, but how is this different from my implementation?

How would we open something in shared mode in the current implementation? Datasets are already Send.

@jdroenner
Copy link
Member

I think your Implementation is fine but there is a hole. GDALOpenEx allows to specify a GDAL_OF_SHARED option. This does the same as GDALOpenShared.
A shared Dataset should not be Send. That is not an issue in your Implementation but more a Bug in the crate.

It works like this:

"Starting with GDAL 1.6.0, if GDALOpenShared() is called on the same pszFilename from two different threads, a different GDALDataset object will be returned as it is not safe to use the same dataset from different threads, unless the user does explicitly use mutexes in its code."

So you can open a dataset (pointer) twice and send it to two different threads. Then you can create two OwnedLayers with the same dataset (pointer)

@ChristianBeilschmidt
Copy link
Contributor Author

Okay, but this means the current implementation has the problem that Dataset must not be Send if we allow passing the GDAL_OF_SHARED flag.

We could just prevent opening a Dataset with this flag. I'm not sure how often this is used in practice, but we could create a second SharedDataset when we need it.

I could create both versions now, so, idk.

@rmanoka
Copy link
Contributor

rmanoka commented Jan 30, 2022

@jdroenner I believe we discussed this while supporting flags via open_ex here. The summary is (IIUC) that we don't really support GDAL_OF_SHARED except via unsafe code.

@ChristianBeilschmidt
Copy link
Contributor Author

This would mean, I don't implement a SharedDataset that is not Send but just throw an error in case of passing GDAL_OF_SHARED?

@rmanoka
Copy link
Contributor

rmanoka commented Jan 31, 2022

This would mean, I don't implement a SharedDataset that is not Send but just throw an error in case of passing GDAL_OF_SHARED?

Yeah, if there is a safe way to pass that option, then that needs to throw an error.

@ChristianBeilschmidt
Copy link
Contributor Author

Luckily, this is already implemented to ensure that a Dataset is Send.

    /// Note that the `GDAL_OF_SHARED` option is removed
    /// from the set of allowed option because it subverts
    /// the [`Send`] implementation that allow passing the
    /// dataset the another thread. See
    /// https://github.com/georust/gdal/issues/154.

So I have nothing to do here.

In addition to that, I've added an OwnedFeatureIterator to have the same behavior for the features() method.
Please check that the transmutation of the Defn lifetime is also correct in your perspective.

@ChristianBeilschmidt ChristianBeilschmidt marked this pull request as ready for review January 31, 2022 12:01
@rmanoka
Copy link
Contributor

rmanoka commented Feb 1, 2022

@ChristianBeilschmidt could you add a test for the OwnedFeatures iterator?

@ChristianBeilschmidt
Copy link
Contributor Author

@ChristianBeilschmidt could you add a test for the OwnedFeatures iterator?

You can find one in b1afd7d. It is called test_iterate_owned_features.

@rmanoka
Copy link
Contributor

rmanoka commented Feb 3, 2022

Thanks @ChristianBeilschmidt . PR looks good to me. We could give it a couple of days to see if anyone else wants to take a look.

@rmanoka
Copy link
Contributor

rmanoka commented Feb 12, 2022

Shall we merge this? Could someone with bors rights accept this?

@lnicola
Copy link
Member

lnicola commented Feb 12, 2022

@rmanoka can you try to r+ it?

@rmanoka
Copy link
Contributor

rmanoka commented Feb 12, 2022

bors r+

@bors
Copy link
Contributor

bors bot commented Feb 12, 2022

🔒 Permission denied

Existing reviewers: click here to make rmanoka a reviewer

@lnicola
Copy link
Member

lnicola commented Feb 12, 2022

@rmanoka try again now

@rmanoka
Copy link
Contributor

rmanoka commented Feb 12, 2022

bors r+ please

bors bot added a commit that referenced this pull request Feb 12, 2022
238: Owned layers from a dataset that are `Send` r=rmanoka a=ChristianBeilschmidt

- [X] I agree to follow the project's [code of conduct](https://github.com/georust/gdal/blob/master/CODE_OF_CONDUCT.md).
- [ ] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.
---

We ran into the problem of having non-`Send` `Layer`s. Without `Send` it is barely usable in an asynchronous or threading context. You cannot store it somewhere and create tasks that do something with them since you would need to send a `Layer` to another thread.

From previous discussions about thread-safety in GDAL we found out that we must not access `Dataset`s or `Layer`s at the same time from different threads. But for `Dataset`s, we found that it is safe to `Send` them to another thread since GDAL 2.4 (or something). For `Layer`s, it is actually okay to assume that the are `Send` as long as we have a one-to-one relationship between `Dataset`s or `Layer`s, or, stated differently, we must ensure that there are no two `Layer`s that point to one `Dataset`. If we would have to `Layer`s pointing to the same `Dataset` and they are `Send`, then we could have two different threads accessing the same `Dataset`. We must not allow this.

However, from discussions with `@jdroenner` , we thought that if we would introduce a `Layer` that owns its `Dataset`, we could make it `Send` since we can ensure that the `Dataset` isn't accessed another time, for instance, from another thread.

So in this draft PR I've created an `OwnedLayer` that does exactly this, it encapsulates a `Layer` and its `Dataset`. In `Dataset`, we can then choose if we want to have a reference to `Layer` or we want to convert our `Dataset` into an `OwnedLayer`. I've added a test that this behavior works.

Since the operations on `Layer` and `OwnedLayer` are more or less the same, I've transferred them into a `LayerAccess` trait. This unfortunately would mean a breaking change. On the other hand, it means not much more code for preserving both types.

Having said all this, what are your opinions on `OwnedLayer`s?
Do you like or criticize the concept?
What are your thoughts on naming the two types?


Co-authored-by: Christian Beilschmidt <christian.beilschmidt@geoengine.de>
@rmanoka
Copy link
Contributor

rmanoka commented Feb 12, 2022

bors r-
@ChristianBeilschmidt Do you wish to add a changelog? This is probably a breaking change as users would now have to include the trait in their scope to access the necessary methods. Pls. document this, and we can merge this.

@bors
Copy link
Contributor

bors bot commented Feb 12, 2022

Canceled.

@lnicola
Copy link
Member

lnicola commented Feb 12, 2022

I understand the reasoning and don't have any constructive suggestions, but TBH I really don't like these changes 😃.

@ChristianBeilschmidt
Copy link
Contributor Author

I can include a changelog. If you don't like this PR @lnicola, we should think about how we can do this differently. In my opinion, we cannot not tackle this problem.

@lnicola
Copy link
Member

lnicola commented Feb 12, 2022

No, we should merge this. If someone comes up with a better approach in the future, we can break the API again.

@jdroenner
Copy link
Member

bors merge

@bors
Copy link
Contributor

bors bot commented Feb 12, 2022

Build succeeded:

@bors bors bot merged commit 6cb085a into georust:master Feb 12, 2022
@ChristianBeilschmidt ChristianBeilschmidt deleted the owned-layer branch February 13, 2022 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants