Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAPI for coverage operations #866

Closed
wants to merge 3 commits into from
Closed

Conversation

pramsey
Copy link
Member

@pramsey pramsey commented Apr 11, 2023

No description provided.

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

What motivates the creation of a new GeometryList type rather than using GeometryCollection ?

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023

I want to give the users the freedom to assemble the input and disassemble the output without having to make copies. I might have been over-worried about that, but the semantics of GeometryCollection, particularly in the CAPI I think might force more copying than ideal.

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

Creating a non-owning Geometry collection might be a trickier implementation but I think it is a cleaner API. There are lots of operations we might want to do on a list of geometries (union, buffer, cluster, extent, distance, etc.) that might similarly benefit from avoiding the copy.

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023

Ironically (?) for PostGIS I think the owning GeometryCollection would be fine, since we're building copies anyways on both the way in and the way out of GEOS.

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

What motivates the creation of a new GeometryList type rather than using GeometryCollection ?

Another semantic that some coverage operations use is to support null elements in the list. This is used in the CoverageValidator to indicate that an input coverage element is valid.

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

I can see the appeal of a non-owning GeometryCollection. But it seems more likely to lead to errors by confused clients that forget to free the elements.

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

This is used in the CoverageValidator to indicate that an input coverage element is valid.

This doesn't strike me as an ideal API or a compelling reason to add a new collection type.

But it seems more likely to lead to errors by confused clients that forget to free the elements.

I'm not sure why this seems likely (or less clear than a GeometryList whose elements have undefined ownership?). Either way, I don't think we should design the API around errors we imagine novice C developers might make.

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023

I'm entirely open to other approaches. We do need some way to signal null members of the list, for the output from CoverageValidator. We did talk about a much more at the metal approach, passing out and in **Geometry and a list size separately.

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

I don't know if this is ideal. I think it is more explicit than "return NULL if no error" and more quickly answers the question "is the entire coverage valid?"

GEOSCoverageValidation GEOSCoverageValidate(const GEOSGeometry* cov);

int GEOSCoverageIsValid(GEOSCoverageValidation* c); // is the entire coverage valid
int GEOSCoverageHasError(GEOSCoverageValidation*c, unsigned int i); // is there an error associated with input geometry i?
GEOSGeometry* GEOSCoverageGetError(GEOSCoverageValidation* c, unsigned int i); // get the error associated with input geometry i, if any

GEOSCoverageValidation_destroy(GEOSCoverageValidation* c);

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

But it seems more likely to lead to errors by confused clients that forget to free the elements.

I'm not sure why this seems likely (or less clear than a GeometryList whose elements have undefined ownership?).

GeometryList elements are always owned by the client. Whereas if two kinds of GeometryCollection is provided, the client has to check which kind it is dealing with. Granted it may know from context, but it's less apparent to someone reading the code IMO.

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

For the proposed structure GEOSCoverageValidation, what is the ownership of the error geometries?

What would be the API for other one-to-one coverage operations? (Currently: CoverageSimplifier; in the near future CoverageCleaner, perhaps CoveragePrecisionReducer and maybe others).

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

GeometryList elements are always owned by the client.

In this implementation the ownership is flexible and is made explicit only at destruction time (did we use _destroy or _release ?).

Whereas if two kinds of GeometryCollection is provided, the client has to check which kind it is dealing with. Granted it may know from context, but it's less apparent to someone reading the code IMO.

I am proposing a GEOSGeom_createCollectionView returning an object that behave the same any other geometry collection. I don't understand the scenario where someone would need to check which type it is -- the point is that they behave identically. Adding this type is not a requirement for adding coverage operations, though. It would remove a copy that is probably insignificant compared to whatever operation is being performed on the coverage.

For the proposed structure GEOSCoverageValidation, what is the ownership of the error geometries?

In the signature above ownership is given to whoever calls GEOSCoverageGetError (otherwise we would be returning a const GEOSGeometry*). Maybe there is an implementation-based reason for GEOSCoverageValidation to retain ownership. I haven't looked at the implementation.

What would be the API for other one-to-one coverage operations? (Currently: CoverageSimplifier; in the near future CoverageCleaner, perhaps CoveragePrecisionReducer and maybe others).

These strike me as simpler and being able to use a GEOSGeometry* GEOSCoverageSimplify(const GEOSGeometry* cov) pattern, with some elements collapsing to POLYGON EMPTY if collapses are allowed. The coverage validation is different because it is not exactly 1:1 -- not every input element has an associated error.

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

For the proposed structure GEOSCoverageValidation, what is the ownership of the error geometries?
In the signature above ownership is given to whoever calls GEOSCoverageGetError (otherwise we would be returning a const GEOSGeometry*). Maybe there is an implementation-based reason for GEOSCoverageValidation to retain ownership. I haven't looked at the implementation.

There is no need for GEOSCoverageValidation to retain ownership (this is explicit in tbe proposed API, because it has a single return value). So to be clear, for this implementation, the client would need to access and free every invalid value. Of course, generally they would want to check every value. This just seems like a pattern that hasn't been used before in GEOS?

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

So to be clear, for this implementation, the client would need to access and free every invalid value.

I would expect GEOSCoverageValidation_destroy to do any necessary cleanup.

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

I am proposing a GEOSGeom_createCollectionView returning an object that behave the same any other geometry collection. I don't understand the scenario where someone would need to check which type it is -- the point is that they behave identically.

The point is that a CollectionView is not quite identical to other kinds of Geometry, because its elements are not owned and have to be freed. Now, it seems likely that most clients would do this in the same context that the geometry was created and used. But to support more complex usage patterns there should be a way for a client to tell if a geometry requires element-freeing or not?

Adding this type is not a requirement for adding coverage operations, though. It would remove a copy that is probably insignificant compared to whatever operation is being performed on the coverage.

Agreed, and it sounds like in the PostGIS case at least (and perhaps most other bindings) the input data is copied anyway. A CollectionView could be added down the road as an optimization if required (it has been suggested before).

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

So to be clear, for this implementation, the client would need to access and free every invalid value.

I would expect GEOSCoverageValidation_destroy to do any necessary cleanup.

So it would record if GEOSCoverageGetError has been called on a result element, and if not then free them?

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

The coverage validation is different because it is not exactly 1:1 -- not every input element has an associated error.

In fact CoverageValidation could follow the same pattern, by returning a collection containing LINESTRING EMPTY for each valid input element. It seems likely that often many or most input elements will be coverage-valid, so this does create a lot of "useless" little LineString geometries. But perhaps this is an insignificant cost as well.

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

So it would record if GEOSCoverageGetError has been called on a result element, and if not then free them?

I was envisioning that GEOSCoverageGetError would create the error geometry if it was requested. But looking at the implementation, I can see that the error geometries are created whether or not they are needed. Assuming the implementation remains that way, a better signature is probably

const GEOSGeometry* GEOSCoverageGetError(const GEOSCoverageValidation* c, unsigned int i); // get the error associated with input geometry i, if any

unless there is a performance reason to have GEOSCoverageGetError release the error geometry which, as you say, would be a new pattern to GEOS.

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023

I'm not super sold on a new GEOSCoverageValidation object. If the goal is simplicity, I'd tend to go for consuming/emitting GeometryCollection in all cases, and just fill in the valid slots of a validation with EMPTY, and add in a signature for a global true/false validation function.

And in terms of avoiding new structs in the API, there is also the possibility of passing/receiving a **Geometry and a size_t on the input and output of each coverage operation. This would have the "advantage" of being super-explicit about memory management, since we'd just be laying all the sharp knives out on the table and saying "have at 'er!".

@dbaston
Copy link
Member

dbaston commented Apr 12, 2023

in terms of avoiding new structs in the API

I don't mean to advocate for avoiding new structs in the API. I think a special-purpose struct can bring a lot of clarity. My hesitation with GeometryList is that it seems to largely duplicate GeometryCollection while being incompatible with it. I have the same hesitation about introducing a pattern of GEOSGeometry**, size_t when we already have have functions that use GeometryCollection to return coverages (GEOSDelaunayTriangulation, GEOSVoronoiDiagram) or operate on them (GEOSCoverageUnion, GEOSEnvelope, many others).

@brendan-ward
Copy link
Contributor

If the goal is simplicity, I'd tend to go for consuming/emitting GeometryCollection in all cases, and just fill in the valid slots of a validation with EMPTY, and add in a signature for a global true/false validation function.

As a downstream user this does indeed seem like the simplest approach, even though I'm not excited about the proliferation of EMPTY for this specific use case. I think it also may be a common pattern that if validation returns false that we may stop altogether rather than visiting and then freeing each validation error.

Provided I'm following along correctly, my instinct is that GeometryCollection seems like the better container than the new GeometryList.

Could the owning / non-owning GeometryCollection be deferred until later, or does it need to be worked out to move forward on this particular PR? Since that has a wider reach in the API, it seems like that may need separate discussion. For the coverage functions here, the inputs could be owning or non-owning GeometryCollections; it is up to the caller to free them correctly.

Since the output of GEOSCoverage_valid and GEOSCoverage_simplify both return new objects, I'd expect those to be owning GeometryCollections, and I'd expect to free them at the collection rather than individual level.

to support more complex usage patterns there should be a way for a client to tell if a geometry requires element-freeing or not

It really seems like the caller should take a pretty strict approach to this, because the complexity probably should be on their side. Either have an entirely non-owning collection and free things individually, or an owning collection and free just that; mixing those two approaches seems like asking for trouble. So long as outputs of C API functions are entirely new and not some mix of passed in non-owned geometries and newly constructed geometries, then it is entirely up to the user to keep track of ownership.

GEOSGeom_createCollectionView

This seems reasonable to me, and the implication is that if I call this, I'm on my own for managing the ownership of any geometries within it.

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023

What do we think of #867?

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023 via email

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

My hesitation with GeometryList is that it seems to largely duplicate GeometryCollection while being incompatible with it. I have the same hesitation about introducing a pattern of GEOSGeometry**, size_t when we already have have functions that use GeometryCollection to return coverages (GEOSDelaunayTriangulation, GEOSVoronoiDiagram) or operate on them (GEOSCoverageUnion, GEOSEnvelope, many others).

The key difference to existing uses of GeometryCollection is that for coverage operations the result list of geometries is exactly in 1-1 correspondence with the input list. This is to support use cases such as window functions in PostGIS, where it's essential to be able to match each output value to its original input value.

Some coverage operations (such as validation) may produce an empty result for some inputs, so the result structure needs to be able to express that. My original thinking was that the simplest and lightest way to do this was to return null elements. The other options proposed here (empty geometries, or a multi-operation structure) are fine too (but all have somewhat more complexity/cost).

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

Another thing that makes me uneasy about a non-owning GeometryCollection is that result values with non-owning semantics will not be explicit in the API. The only way to indicate this will be via documentation (which is unreliable and tends to be overlooked).

I suppose one way to mitigate this is to provide some kind of check in the GEOSGeom_destroy function, which will error out if the contained elements have not been "released" (perhaps by setting them to null?). In practice I suspect this means another function will be needed to "fully destroy" a non-owning collection.

I guess it could also be a policy to never return a non-owning collection. This still might cause 3rd-party code to be susceptible to mistakes, however.

Having a distinct non-owning type (such as GeometryList) would make the non-owning semantics explicit in the API.

As for the desire to run geometry functions on the contents of a GeometryList, it's easy to create a GeometryCollection first, and then build a GeometryList on its elements (and there can be an API function to do that). The (client-owned) GeometryCollection can be used to execute geometry functions. This works the other way too, with an API which builds a (owning) GeometryCollection out of the elements of a GeometryList.

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023

We have the capability to "releaseGeometries" in the C++ API, and this might be a nice bridge to allow end users to decompose returned GeometryCollections if they need to if we added it to the CAPI. The only overhead then would be the creation and disposal of a containing GeometryCollection.

@dr-jts
Copy link
Contributor

dr-jts commented Apr 12, 2023

We have the capability to "releaseGeometries" in the C++ API, and this might be a nice bridge to allow end users to decompose returned GeometryCollections if they need to if we added it to the CAPI. The only overhead then would be the creation and disposal of a containing GeometryCollection.

Sounds good to me.

@pramsey
Copy link
Member Author

pramsey commented Apr 12, 2023

Active work is on #867.

@pramsey pramsey closed this Apr 12, 2023
@pramsey pramsey deleted the main-coverage-capi branch April 24, 2023 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants