Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traits: How to represent Polygon EMPTY? #1039

Open
kylebarron opened this issue Jul 16, 2023 · 2 comments
Open

Traits: How to represent Polygon EMPTY? #1039

kylebarron opened this issue Jul 16, 2023 · 2 comments

Comments

@kylebarron
Copy link
Collaborator

(I'm thinking that the best way to start a focused discussion on a specific part of traits is to open a new issue, rather than add to the existing discussion thread/PR)

In geoarrow-rs I started a custom implementation of parsing a vector of WKB objects (a custom two-pass approach should be really efficient and I want geoarrow<-->WKB to be performant). For the implementation details, I created structs like WKBPolygon that reference a WKB object in a &[u8], and then implemented PolygonTrait for that struct. Then for the high-level conversion from a WKBArray to a PolygonArray, I have two functions called first_pass and second_pass that operate on iterators of objects implementing PolygonTrait. This works surprisingly well! And it means that I can use those exact same functions to convert arrays of anything implementing PolygonTrait, such as arrays of geo::Polygon or an array inside a bump-allocated-vec.

But I ran into a snag adding tests from the geoarrow/geoarrow-data repo. There this Parquet file contains an object that is stored in WKB as Polygon EMPTY. You can inspect this easily in e.g. geopandas:
image

But when I try to load this file, I get an error calling PolygonTrait::exterior and I'm realizing that the PolygonTrait (and also geo-types) in effect doesn't support Polygon EMPTY because it requires at least an exterior ring to exist.

In Arrow specifically, every array has a validity bitmap, so I should be able to work around this in this case (if I enforce that a missing/invalid geometry is the same as an empty geometry, which I'm not sure is true). But I wanted to raise this in relation to traits and see what people thought.

@michaelkirk
Copy link
Member

michaelkirk commented Jul 17, 2023

Reconciling different geometry representations is hard, and IMO it's probably not possible to match all users expectations 100% of the time. It definitely feels like a "pick your tradeoffs" kind of thing.

Some possible approximations that come to mind for different circumstances might be: Option<Polygon>::None or maybe Polygon(vec![]).

Watch out also for POINT EMPTY, which precludes something like the second option.

https://github.com/georust/geozero deals with translating between formats, and might have some insight.

if I enforce that a missing/invalid geometry is the same as an empty geometry

I'm not entirely sure what a "missing" geometry is, but an "invalid" geometry feels like a distinct thing from an empty geometry.

e.g. the output of unioning two disjoint polygons should be empty, but not invalid.

Sorry I don't have more answers.

@kylebarron
Copy link
Collaborator Author

Reconciling different geometry representations is hard, and IMO it's probably not possible to match all users expectations 100% of the time. It definitely feels like a "pick your tradeoffs" kind of thing.

I agree, but also feel like the simple features standard is a good target to strive for, and thus having some way to represent empty geometries is useful.

Maybe it makes most sense to add an is_empty() method onto the traits and leave the storage mechanism to the underlying format.

In terms of the current polygon trait, however, the user would have to remember to check if it's empty before accessing exterior(). Maybe that means exterior() should return an Option, being None when the polygon is empty.

If you're curious, GeoArrow defines all geometries except for point as empty if they have an empty internal list, and a point as empty if the coordinates are NaN, NaN. GeoArrow only defines float coordinates, so they don't have an issue of empty integer points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants