Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Send trait to vector structs #419

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

callmeahab
Copy link

@callmeahab callmeahab commented Jul 5, 2023

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

I needed vector structs to be send for my use case, due to use in an async functions. These changes for good for my use case, since I'm wrapping everything in an Arc<Mutex<, but let me know if this might cause potential issues, or if something else is needed to make these changes safer.

@lnicola
Copy link
Member

lnicola commented Jul 29, 2023

r? @ttencate

@lnicola
Copy link
Member

lnicola commented Jul 29, 2023

Rebasing over master should fix the CI error.

@ttencate
Copy link
Contributor

If merged, this will close #417.

Copy link
Contributor

@ttencate ttencate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I'm not convinced that these are actually safe. For example, Feature has a lifetime parameter that refers to its underlying Defn. The Defn is internally refcounted, so by sending a Feature to another thread, I would be able to obtain two Defns on different threads that point to the same GDAL object.

A problem exists even with structs that don't have a lifetime parameter, like SpatialRef. It is also internally refcounted, so I can Clone it and Send the clone to another thread, then call (mutating) functions on both copies concurrently, causing UB.

Quoting the Nomicon:

Something can safely be Send unless it shares mutable state with something else without enforcing exclusive access to it.

And the GDAL docs:

[Y]ou should not call simultaneously GDAL functions from multiple threads on the same data instance, or even instances that are closely related through ownership relationships. For example, for a multi-band raster dataset, it is not safe to call concurrently GDAL functions on different GDALRasterBand instances owned by the same GDALDataset instance (each thread should instead manipulate a distinct GDALDataset). Similarly for a GDALDataset owning several OGRLayer.

(I don't remember this paragraph; maybe it was added after I filed #417.)

What can be done?

One option is to keep Send but add a stern warning to the docs. I think there are several more places in the crate already where safe Rust can actually trigger UB.

A more principled option is to add a utility wrapper that makes the unsafety explicit, e.g.:

pub unsafe trait UnsafeSend {}

pub struct Sender<T>(T);

impl<T> Sender<T> {
    /// # Safety
    /// ... instructions for safe use go here ...
    pub unsafe fn new(inner: T) -> Self {
        Self(inner)
    }
    
    pub fn into_inner(self) -> T {
        self.0
    }
}

impl<T> From<T> for Sender<T>
where
    T: UnsafeSend
{
    fn from(inner: T) -> Self {
        Sender(inner)
    }
}

unsafe impl UnsafeSend for SpatialRef {}

unsafe impl<T> Send for Sender<T>
where
    T: UnsafeSend
{}

fn main() {
    let srs = SpatialRef;
    let wrapped = unsafe { Sender::new(srs) };
    std::thread::spawn(|| {
        let srs = wrapped.into_inner();
    });
}

Not sure if this idea is any good, just throwing it out there.

If everything needs to be made entirely safe, the crate could use the typestate pattern. Every refcounted type would get a generic parameter which indicates whether it's unique, or might be shared. Most GDAL functions would return shared objects, but constructor-like functions return unique ones. Only unique objects are Send. It's possible to convert a shared object to a unique object, which checks the reference count and safely gives an Err if the refcount is greater than 1. As you might guess, this is a bigger refactor, and I'm not sure the additional complexity is worth it...

@@ -635,7 +637,7 @@ mod tests {
#[cfg(not(major_ge_3))]
assert_eq!(spatial_ref.to_wkt().unwrap(), "PROJCS[\"unnamed\",GEOGCS[\"GRS 1980(IUGG, 1980)\",DATUM[\"unknown\",SPHEROID[\"GRS80\",6378137,298.257222101]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433]],PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],PARAMETER[\"latitude_of_center\",52],PARAMETER[\"longitude_of_center\",10],PARAMETER[\"false_easting\",4321000],PARAMETER[\"false_northing\",3210000],UNIT[\"Meter\",1]]");
#[cfg(major_ge_3)]
assert_eq!(spatial_ref.to_wkt().unwrap(), "PROJCS[\"unknown\",GEOGCS[\"unknown\",DATUM[\"Unknown based on GRS80 ellipsoid\",SPHEROID[\"GRS 1980\",6378137,298.257222101,AUTHORITY[\"EPSG\",\"7019\"]]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]]],PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],PARAMETER[\"latitude_of_center\",52],PARAMETER[\"longitude_of_center\",10],PARAMETER[\"false_easting\",4321000],PARAMETER[\"false_northing\",3210000],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH]]");
assert_eq!(spatial_ref.to_wkt().unwrap(), "PROJCS[\"unknown\",GEOGCS[\"unknown\",DATUM[\"Unknown based on GRS 1980 ellipsoid\",SPHEROID[\"GRS 1980\",6378137,298.257222101,AUTHORITY[\"EPSG\",\"7019\"]]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]]],PROJECTION[\"Lambert_Azimuthal_Equal_Area\"],PARAMETER[\"latitude_of_center\",52],PARAMETER[\"longitude_of_center\",10],PARAMETER[\"false_easting\",4321000],PARAMETER[\"false_northing\",3210000],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH]]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CI is happy with this change, then so am I. But if not, it probably needs a version check; see e.g. a0d1d30

@@ -282,6 +282,8 @@ impl<'a> Default for LayerOptions<'a> {
}
}

unsafe impl Send for LayerOptions<'_> {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is safe, because the struct contains references to types that are not Sync. But LayerOptions generally doesn't need to be Send anyway because it's just a pack of parameters.

@lnicola
Copy link
Member

lnicola commented Aug 1, 2023

I don't remember this paragraph; maybe it was added after I filed #417.

That's pretty old, see for example the note at the top of https://gdal.org/drivers/raster/vrt.html#multi-threading-issues.

One option is to keep Send but add a stern warning to the docs. I think there are several more places in the crate already where safe Rust can actually trigger UB.

I don't want this. There are indeed some unsound APIs (ahem read_block), but we should fix them. Exposing this would be pretty horrible.

If everything needs to be made entirely safe, the crate could use the typestate pattern. Every refcounted type would get a generic parameter which indicates whether it's unique, or might be shared. Most GDAL functions would return shared objects, but constructor-like functions return unique ones.

This might work, but I'm not 100% sure yet. It does bring a fair amount of complexity, though.

In the meanwhile, you can usually work around the types being !Send by using channels. That should work even in async code.

@callmeahab
Copy link
Author

After about a month of using this, I haven't encountered any issues. I am however not using any structs in multiple threads, only async functions.

@ttencate Is the idea behind a typestate to enforce so that certain structs have to be moved therefore preventing accidental use in multi-thread? I'd we willing to take a stab at it, but would need some help with that idea.

@lnicola
Copy link
Member

lnicola commented Aug 2, 2023

After about a month of using this, I haven't encountered any issues. I am however not using any structs in multiple threads, only async functions.

Sure, it's not going to cause problems in practice, but the whole idea of Rust is that you're not supposed to do bad things like this in safe code, no matter how hard you try. There was a bit of a mess a while ago, with a web framework that pulled this kind of trick (implementing Send for !Send types).

Is the idea behind a typestate to enforce so that certain structs have to be moved therefore preventing accidental use in multi-thread?

If I got it right, it's to make sure you go through some hoops when you send them. For example, you have a normal (owned) Dataset that's !Send, you turn it into a SendableDataset which is (and check the reference count at this point), then the receiving thread turns it back into a Dataset. You can't use SendableDataset for anything else, consider this case:

  • you own a Dataset
  • you convert it to its Send flavour, asserting that the reference count is 1
  • you get a RasterBand from it
  • your dataset now has a reference count of 2

There's also the option to use a generic argument to keep track of "sendability", but I think it boils down to the same thing.

This makes some code look nicer (you can pass datasets to other threads instead of paths to open), but doesn't help much in async contexts, where you switch threads all the time.

@ttencate
Copy link
Contributor

ttencate commented Aug 2, 2023

What I had in mind with the typestate pattern was something along the lines of Dataset<Unique> vs. Dataset<Shared>, where only Dataset<Unique> is Send. However, I like @lnicola's idea much better, because it doesn't require changing literally all the types in the crate to be generic.

How about this then:

  • We add a trait RefCounted, which has (at least) a function ref_count(&self) -> usize;.
  • Reference-counted objects like Dataset implement this trait.
  • We add a wrapper struct Sender<T> which wraps a T and (unsafely, manually) implements Send. A Sender<T> can be constructed from any T: RefCounted, but the construction fails if the reference count is greater than 1:
pub struct Sender<T>(T);

unsafe impl<T> Send for Sender<T> {}

impl<T: RefCounted> Sender<T> {
    // Could also be a TryFrom impl.
    pub fn try_wrap(inner: T) -> Result<T> {
        if inner.ref_count() <= 1 {
            Ok(Self(inner))
        } else {
            Err(Error::...)
        }
    }

    // Could also be into, but we can't blanket-impl From<Sender<T>> for T due to orphan rules.
    pub fn into_inner(self) -> T {
        self.0
    }
}

However, this is still not fully sound. It can break if T contains references to other ref-counted objects inside it, if those inner objects don't hold a reference back to the T. For example, in the scenario I sketched earlier: OGRFeature has a reference to its OGRFeatureDefn but presumably not the other way round. So if we send a freshly created, uniquely referenced Feature to another thread, both threads have a reference to the underlying Defn.

(In this particular case, the Defn is probably only read from, and concurrent reads are probably okay, but neither is guaranteed.)

@lnicola
Copy link
Member

lnicola commented Aug 2, 2023

How about this then

Yup, that looks like a nice API.

In this particular case, the Defn is probably only read from, and concurrent reads are probably okay, but neither is guaranteed.

Maybe I'm missing something, but won't destroying the Feature touch the reference count of the Defn?

@ttencate
Copy link
Contributor

ttencate commented Aug 2, 2023

Yes, I suppose that's a data race.

@lnicola
Copy link
Member

lnicola commented Aug 2, 2023

Well, given all this, I don't think making the GDAL objects Send is a good idea.

@callmeahab
Copy link
Author

Ok, is there a work around for async functions in that case? Just so I don't have to sync this branch to master indefinitely :)

@lnicola
Copy link
Member

lnicola commented Aug 2, 2023

I don't know your app, but you can do something like:

  • make an IO thread or thread pool, each with an associated async MPSC channel
  • in your request handler, make a new one-shot channel for the response and send it to an IO thread, along with the request data
  • await on the one-shot channel for the response
  • in the IO threads, block while waiting for a request, process it when it arrives, then send the response back

#425 is somewhat related, too.

@ttencate
Copy link
Contributor

ttencate commented Aug 2, 2023

Well, given all this, I don't think making the GDAL objects Send is a good idea.

Not safely. But an unsafe API, where the user is responsible for upholding GDAL's thread safety rules, would still be better than no API at all.

@lnicola
Copy link
Member

lnicola commented Aug 2, 2023

I could get behind an API like your Sender above, if its docs include a copious amount of warnings.

@callmeahab
Copy link
Author

I don't know your app, but you can do something like:

  • make an IO thread or thread pool, each with an associated async MPSC channel
  • in your request handler, make a new one-shot channel for the response and send it to an IO thread, along with the request data
  • await on the one-shot channel for the response
  • in the IO threads, block while waiting for a request, process it when it arrives, then send the response back

#425 is somewhat related, too.

Thanks. The app is basically a rabbitmq consumer(deadpool-lapin ) running in a loop waiting for messages, once a message is received it then sends a message to an actor(coerce-rs), which is where gdal functions are invoked, after gdal finishes that actor further delegates work to other actors (db, s3 etc).

Coerce is working basically how you're describing this to be implemented, only thing is they don't provide synchronous handlers, which is why I had to make this workaround. I'm also thinking about implementing a synchronous handler that can return a Box<Pin<Future<...>>> and get resolved later, that way I could call gdal functions in the synchronous part of the handler and just pass the data to async part, at which point gdal structs can safely be dropped.

That being said I also like the idea of implementing a Sender struct to handle unsafe code.

@BlakeOrth
Copy link

I just ran into this myself for using SpatialRef and I'm not convinced the following statement is true.

A problem exists even with structs that don't have a lifetime parameter, like SpatialRef. It is also internally refcounted, so I can Clone it and Send the clone to another thread, then call (mutating) functions on both copies concurrently, causing UB.

While the underlying OGRSpatialReference from GDAL is internally reference counted, the implementation here does not expose or use any explicit reference counting methods. The implementation for Clone correctly calls OSRClone(...), rather than simply using a reference count:

impl Clone for SpatialRef {
fn clone(&self) -> SpatialRef {
let n_obj = unsafe { gdal_sys::OSRClone(self.0) };
SpatialRef(n_obj)
}
}

Examining the code on GDAL's side we can see when OSRClone(...) is called they make a proper deep copy as well, and don't just increase the reference count:
https://github.com/OSGeo/gdal/blob/e4026b077a092f614dc8a25b0025f01080eee47d/ogr/ogrspatialreference.cpp#L1376-L1392

Given the above, I believe at least SpatialRef can be safely marked as Send in its current implementation. I do think there could be some possibility for unsafe behavior between threads if a user chooses to use to_c_hsrs(&self) -> OGRSpatialReferenceH. I could see marking the method as unsafe in the event SpatialRef implements Send. I haven't reviewed any of the other types in detail at this point, so I cannot speak to them directly, but it's possible some other reference counted types could also be Send with close inspection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants