Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define polygon orientation rules #46

Closed
mentin opened this issue Mar 23, 2022 · 35 comments
Closed

Define polygon orientation rules #46

mentin opened this issue Mar 23, 2022 · 35 comments
Milestone

Comments

@mentin
Copy link

mentin commented Mar 23, 2022

I think the standard should define polygon orientation.

1. Spherical edges case

With spherical edges on sphere, there is an ambiguity in polygon definition, if the system allows polygons larger than hemisphere.

A sequence of vertices that define a polygon boundary can define either polygon to the left of that line, or to the right of the line. E.g. global coastal line can define either continents or oceans. Systems that support polygons larger than hemisphere usually use orientation rule to solve this ambiguity. E.g. MS SQL, Google BigQuery interpret the side to the left of the line as the content of the ring.

2. Planar edges case

Planar case does not have such ambiguity, but it is still good idea to have specific rule.

E.g. GeoJson RFC defines a rule consistent with the rule above:

   o  Polygon rings MUST follow the right-hand rule for orientation
      (counterclockwise external rings, clockwise internal rings).
@cholmes cholmes added this to the 0.2 milestone Mar 23, 2022
@cholmes
Copy link
Member

cholmes commented Mar 23, 2022

Very good call, thanks @mentin! If you'd like to contribute directly to the spec feel free to open a pull request. If not I'm sure someone will be able to do it - hoping we can get it in before 0.2. It'll be good to have this clear.

@rouault
Copy link
Contributor

rouault commented Mar 24, 2022

My preference would be that this should only be a requirement for the spherical edges case. For planar, it imposes a constraint on all writers to potentially fix winding order, that isn't required by the underlying standard (OGC Simple Feature Access).

@alasarr
Copy link
Collaborator

alasarr commented Mar 27, 2022

Nice catch @mentin! I can help to open a PR to include this, just let me know.

According to #2. As @rouault mentions, the current format we use for encoding geometries (WKB) doesn't impose that constraint, so my preference would be to not include it unless we have a specific reason like #1. Not a strong opinion here, just want to keep as simpler as possible at the beginning to increase adoption. Does it make sense for you @mentin?

@mentin
Copy link
Author

mentin commented Mar 28, 2022

Yes, it makes sense to only require it for spherical case. Our (BigQuery) experience with GeoJson was that despite the RFC requiring specific orientation, we frequently see opposite orientation too, so we ignore it when reading GeoJson files.

@felixpalmer
Copy link
Collaborator

I think it is instructive to observe what happened with GeoJSON here. The winding order was omitted from the original spec and then later added. As a result, spec-compliant GeoJSON does not require a winding check and potential re-winding, but in practice clients need to deal with old GeoJSON out there so all perform the winding check.

My perspective from a frontend visualization point of view is that this is unfortunate, and means clients cannot simply copy across the vertex data to the GPU, but must instead perform a pass through the data.

My preference would be to enforce a winding order in all cases. I recognize that this moves the burden from the readers to the writers, but I think it makes for a better user experience and in fact makes for a simpler spec as we don't need distinguish the spherical/planar cases.

I'm still open to discussion on this, but to get the ball rolling I've created a PR.

@jorisvandenbossche
Copy link
Collaborator

jorisvandenbossche commented Apr 15, 2022

I certainly understand the importance of a proper orientation (even for planar data) for certain use cases such as visualization (eg spatialpandas also has an orient keyword in input functions (similarly as bigquery)).
But to highlight a practical aspect: GEOS doesn't directly provide a method to ensure your data uses correct winding order, which also means that Shapely/GeoPandas currently don't have an efficient implementation for this (it could be added though, manually implementing this using the existing GEOS APIs, I opened shapely/shapely#1366 for this). While the (current) unavailability of an efficient algorithm in one of the GeoParquet implementations is certainly not a sufficient reason to not require this in the spec, I just wanted to highlight that aspect. And even when it is implemented, it will always incur some overhead to check/fix winding order when writing. And for some uses cases (eg basically all geospatial operations in GEOS) this doesn't matter, and thus can give unnecessary overhead.

Another option could also be to add a new field to the metadata that indicates if the data is "known to have correct winding order" (eg oriented=True|False). Based on this field, readers could then read the data as is or check/fix on the fly (if they require correct winding order).
For example for bigquery, it would basically provide the value that should be passed to the oriented keyword (although that keyword is for reading WKT, not WKB in their case).

Of course, having it as a field that can be set to false, creates the risk that many people generate datasets that don't have a correct winding order and that become less efficient to read (in case a specific winding order is required for your application).


@rouault you mention that the OGC Simple Feature Access standard doesn't require a specific winding order, but from https://www.ogc.org/standards/sfa (https://portal.ogc.org/files/?artifact_id=25355), I see:

The exterior boundary LinearRing defines the “top” of the surface which is the side of the surface from which the
exterior boundary appears to traverse the boundary in a counter clockwise direction. The interior LinearRings will
have the opposite orientation, and appear as clockwise when viewed from the “top”,

@rouault
Copy link
Contributor

rouault commented Apr 15, 2022

you mention that the OGC Simple Feature Access standard doesn't require a specific winding order, but from https://www.ogc.org/standards/sfa (https://portal.ogc.org/files/?artifact_id=25355), I see:

Interesting... So this mentions exists in SFA 1.2.0 and 1.2.1, but the earlier SFA 1.1.0 (https://portal.ogc.org/files/?artifact_id=13227) from 2005 doesn't mention any winding order.
But my understanding of this sentence is not that it requires the exterior ring to be counter clockwise oriented. It says that you can use the winding order to decide which side of the surface is the top or the bottom, when considering the 3D space. This mostly matters for PolyhedralSurface / volumic objects to distinguish the interior and exterior of the object (see page 29 of SFA 1.2.1). It is correlated with the fact that there was no PolyhedralSurface in SFA 1.1.0, and it was only added in SFA 1.2.0

@jorisvandenbossche
Copy link
Collaborator

jorisvandenbossche commented Apr 15, 2022

Hmm, then the text is not very clear in how it should be interpreted .. (eg in this SO answer it is being interpreted as counter-clockwise for exterior rings: https://gis.stackexchange.com/a/119156)

@alasarr
Copy link
Collaborator

alasarr commented Apr 16, 2022

But to highlight a practical aspect: GEOS doesn't directly provide a method

Very interesting point @jorisvandenbossche. It definitely makes things harder for writers.

Of course, having it as a field that can be set to false, creates the risk that many people generate datasets that don't have a correct winding order and that become less efficient to read (in case a specific winding order is required for your application).

For the use case of GeoParquet, I still feel it's better to have a solid format that removes ambiguity. My feeling is that it will be harder at the beginning, but easier in the long term.

I know I'm prioritizing readers over writers, but just because I think it might be better for interoperability.

@alasarr
Copy link
Collaborator

alasarr commented Apr 19, 2022

Closing this as we've merged @felixpalmer suggestion. Feel free to re-open if you've further suggestions

@alasarr alasarr closed this as completed Apr 19, 2022
@jorisvandenbossche
Copy link
Collaborator

jorisvandenbossche commented Apr 19, 2022

I know the PR has been merged, but to repeat my proposal from above: how would people feel about an explicit field for this that can be set (eg oriented=True|False) instead of having this a non-optional requirement? (for planar coordinates)

Speaking for myself as geopandas developer, I assume we would offer people the option to not correct geometries before writing a parquet file (because it is a costly operation), which with the current wording in the merged PR, that would mean that we would enable users to write invalid files.

(now, I would have to do some experimentation myself to assess how costly this is (and to have an estimate how important this option would be, it's also only for writing while read performance is probably more sensitive). But generally in geopandas, because being based on GEOS and any GEOS operation returning "wrong" coordinate orientation, we would almost always have to correct coordinate orientation before writing)

@Jesus89
Copy link
Collaborator

Jesus89 commented Apr 19, 2022

Maybe we can revisit your proposal @jorisvandenbossche for version 0.3 after your experimentation, and we can move forward with the 0.2.

@alasarr alasarr reopened this Apr 19, 2022
@alasarr alasarr modified the milestones: 0.2, 0.3 Apr 19, 2022
@alasarr
Copy link
Collaborator

alasarr commented Apr 19, 2022

I feel that @jorisvandenbossche and @rouault are both worried about this issue and you have a very valuable voice here as you work on two very famous libraries which will become one of the most used in the future. Thus, how about discussing it for 0.3? I'd like to propose a synchronous working session to discuss this and other topics (spatial index) for version 0.3. We did for 0.1 and was very productive.

@brendan-ward
Copy link
Contributor

(early contributor on the initial development of geoparquet / geofeather support in GeoPandas here)

I'm with @jorisvandenbossche on this one, and I think an optional metadata field to state the orientation (defaulting to unoriented / nonstandardized) is the way to go. Without getting on a rant that is not helpful to advancing the goals of this effort (fast I/O, broader compatibility), I'll summarize with saying that this non-optional requirement made it into 0.2.0 spec in a way that does not sufficiently support those goals when major components of the ecosystem (e.g., GEOS) have a different representation. It will be hard to undo that requirement without it being seen as a step backward; in the future I'd urge a lot of caution around adding non-optional requirements to the spec. Much easier to add those and eventually promote those to required as appropriate. Apologies for not engaging sooner (haven't followed this closely since it moved from initial work on geo-arrow-spec repo), and I know everyone means well so this is NOT intended to come across as harsh or personally critical.

I appreciate the perceived benefits of not having to re-validate / re-orient winding order on read when inputs are known and trusted, but it incurs non-zero cost to achieve it uniformly. I think this is not a pay-it-once and all benefit proposition; this is a pay-it-once and possibly benefit proposition. Instead, the benefit is in stating the degree to which you can trust the inputs (metadata field value says oriented according to a standard); the same is true for geometric validity, NOT in trying to force nonzero-cost standardization (how high the cost depends on your native geometry representation).

Since the underyling geometry storage (WKB) does not rigidly require winding order consistency (and in the wild cannot be trusted to be consistent), the metadata (spec) should not be the mechanism that tries to force it. Instead, it is for the underlying geometry storage representation to take this up (e.g., maybe GeoArrow geometry encoding is responsible for this?), and the role of the metadata is to state the degree to which geometries rigidly conform to defined standards. E.g., if GeoArrow encoding enforces consistent winding order during construction, then it is very easy to add a metadata value to indicate geometries are wound in expected order (or maybe it is simply redundant once you know the storage type); the cost is incurred by the geometry representation not the spec. Not saying that GeoArrow has to enforce winding order, just trying to make the point that if your representation takes a stance on it, it is achieved by that representation rather than the spec.

Otherwise, I think you are adding impedence on writes that is counterproductive. Geoparquet / Geofeather grew out of early efforts to have much faster I/O with geo data using existing, widely used representations (WKB) than was available from native GIS formats within GeoPandas (was much easier than our larger efforts to add better vectorized I/O there). It would be a shame for the spec to undo those gains by requiring a standardization that does not yield uniform benefits purely for the sake of making the data easier to reason about.

One of my most common use cases of feather files (same metadata spec as GeoParquet for now) is as intermediates within a consistent toolchain: read from GIS format, do stuff (using GEOS under the hood), write feather files ... later read feather files, do more stuff (again using GEOS), etc. No benefit but nonzero cost to enforcing winding order during write for each of those intermediates.

I'd like to think that the benefits of having an as-fast-as-possible intermediate storage representation outweigh the benefits of strong consistency of representation (e.g., higher frequency of use between toolkits than within). Otherwise, use one of the other GIS formats already out there for portability, or take up standardization with up-and-coming representations (e.g., GeoArrow).

If I needed to post data to long-term storage and distribution for others in other toolkits, being able to opt-in to doing this and stating the degree to which others can trust rather than check winding order, is the right way to go. I.e., incur the cost at the boundary for data sharing, not for internal use. Making this an optional field in the spec directly supports that.

All that said, I really think this needs to be moved from required to optional in 0.3.0, and to strongly suggest a greater period of testing between promoting elements of the spec from optional to required.

@kylebarron
Copy link
Collaborator

kylebarron commented Apr 20, 2022

Thanks @brendan-ward, I think that perspective is really helpful. I think there are two different use cases here:

  • write-once, read-once: writing fast is very important, and correcting winding order is prohibitively expensive
  • write-once, read-often: writing fast is less important, and the emphasis is on reading as fast as possible.

What would people think about a metadata field that specifies whether or not the winding order is known? That would let the user choose which of these use cases fits their needs better. If the user has a read-often use-case, they can tell the writer to fix winding order before writing.

I suppose the worry here is that most writers don't implement the optional winding order fix?

a greater period of testing between promoting elements of the spec from optional to required.

👍 on this. I worry that moving too fast about the spec could fracture the ecosystem.

@brendan-ward
Copy link
Contributor

Thanks for reframing the use cases @kylebarron

I think instead of the frequency of read / write (because I have a lot of internal cases of your second bullet), I'd instead suggest these fall into two categories:

  • within toolkit boundary: fastest read / writes are very important; low tolerance for impedence on write (or read)
  • between toolkits: degree of standardization / trust is important to state; some tolerance for impedence on read or write depending on specific use cases

Standardizing winding order does not guarantee faster reads; that still depends on your toolkit. Instead, it's a benefit to specific users of specific toolkits: if the metadata tells you things are wound this way, you don't need to check and rewind them IF you would normally do so.

I suppose the worry here is that most writers don't implement the optional winding order fix?

And also that it is a non-zero cost depending on your toolkit, and that your toolkit may then need to reverse it on read. Once such capabilities are broadly accessible and shown to have very little overhead, so that it is a drop-in for most writers of GeoParquet, that would be the time to consider promoting it from optional to required.

@cholmes
Copy link
Member

cholmes commented Apr 20, 2022

I'm pretty convinced on making it optional, thanks for sounding in @brendan-ward and others.

I think we were probably trying to move a bit too fast.

But it sounds like the more conservative thing is actually to make it optional, and soon. I'd be for cutting a 0.3.0 release asap that just has one change - making it a field as @jorisvandenbossche proposes. If someone can make a PR I'm happy to review & merge (with other's reviews) and cut a release.

@brendan-ward
Copy link
Contributor

Thanks for being willing to be flexible here!

How about resolving that in an 0.2.1 release? That seems more in keeping with a "we weren't quite ready to flip that switch" than pushing an 0.3.0 to fix it. Though either way these are 0.x releases which hopefully have less long-lasting impacts than a major version release, and either way getting one out soon will close the window in which someone would implement this for 0.2.0 (though @rouault beat us to it!).

What about instead of a binary oriented=True|False, which then requires further reading to determine what that means, using orientation="ccw"|"cw"|null or more verbose orientation="counterclockwise"|"clockwise"|null, defaulting to unset (null`)? That way for toolkits that natively work in clockwise outer ring order, they know that it is consistently ordered in their representation across all geometries, and signals to toolkits that have the opposite representation that transformation is needed for all geometries (otherwise they'd have to check). Unset would mean that no claims are made about orientation or consistency of such across geometries.

@cholmes
Copy link
Member

cholmes commented Apr 20, 2022

How about resolving that in an 0.2.1 release? That seems more in keeping with a "we weren't quite ready to flip that switch" than pushing an 0.3.0 to fix it.

I have no strong feelings on it. We're out of SemVer land since it's pre-1.0.0, so I don't think it really matters. 0.2.1 feels to me more like a 'typo' than a change in a field. But yeah, I agree these 0.x should not have long lasting impacts, and I think the most important thing is to get it out soon.

I'm far from the expert on this stuff, but your proposal makes sense, and I like being more explicit / self-documenting, and probably more verbose, so +1, but curious for others to sound in.

@rouault
Copy link
Contributor

rouault commented Apr 20, 2022

orientation="counterclockwise"|"clockwise"

The explicitness of the winding order sounds good to me rather than a true/false approach. But should we really offer the two orientations ? Shouldn't that just be orientation="counterclockwise" ? It would be the first spec I can think of where, where the 2 orientations are possible. I don't have strong feeling though. I can understand that some writers/readers might really expect one or the other orientation to get maximum throughput.
Minor: an explicit orientation=null sounds a bit unnecessary to me. A writer that doesn't guarantee any orientation should just omit this property, and it could also be omitted for geometry types where that doesn't make sense. It is a tiny bit simpler for the reader to know that if the orientation member is there it must be a string with either "counterclockwise" or "clockwise", rather than also being possibly a null value.

@brendan-ward
Copy link
Contributor

an explicit orientation=null sounds a bit unnecessary to me

sorry, I was being unclear and conflating this being unset (no key present) with what it would equate internally. You're right, and explicit null value is not helpful.

orientation="counterclockwise" or unset is also 👍 by me. You're right that having orientation="clockwise" may go too far in terms of supporting opposing orientations; I was thinking of it more as an opportunity for the writer to state what they know about the geometry orientation.

So this could end up being more like

Field Name Type Description
orientation string OPTIONAL Winding order of outer ring of polygons; interior rings are wound in opposite order. If present must be "counterclockwise". If absent, no assertions are made regarding the winding order.

Might need to be expanded a bit to cover correct terms for ellipsoidal / spherical or 3D geometries; I'm unfamiliar with the correct terms there.

@cholmes
Copy link
Member

cholmes commented Apr 20, 2022

Looks good to me! Feel free to make a PR and continue discussion there, feels like it's getting close. (Or I'm happy to write this up in a PR, but I like the traceability to who came up with it, and seeing lots of different people as explicit contributors to lines of the spec :)

@jorisvandenbossche
Copy link
Collaborator

An explicit orientation="counterclockwise" sounds good to me! And that also leaves the option open to later add "clockwise" as a possible value, if someone comes along to argue for that.

A remaining question is how this interacts with edges="spherical". My understanding is that for spherical coordinates the winding order always needs to be fixed to be able to interpret the polygon? (e.g. when using S2)
In that case I suppose we have the option to either indicate that if you specify edges="spherical" the default for orientation is "counterclockwise" (instead of unknown), or require that if you specify edges="spherical" you also always need to specify orientation="counterclockwise".

@felixpalmer
Copy link
Collaborator

Thanks for the input everyone. I'm on board with adding orientation:"counterclockwise" as an optional parameter. As long as a reader can be sure about a orientation and optimize the rendering for that, it is fine. The pain with GeoJSON right now is that we can't assume anything and are forced to check orientation on every read.

My hope is that we can incentivize the usage of CCW orientation so that readers can optimize. We should try and avoid a situation where some organizations/users choose CW and some CCW, as such I would try to avoid allowing orientation:"clockwise" as a supported option.

Regarding the interaction with edges="spherical" I would say it is best to be explicit here, rather than having the default depend on another property and stipulate that orientation must be "counterclockwise" for edges="spherical". @brendan-ward is this still OK for your "within toolkits" use case?

@felixpalmer
Copy link
Collaborator

I've put all these thoughts together in a PR, using the proposed wording by @brendan-ward. CC @rouault @kylebarron @brendan-ward (I couldn't add you as reviewers)

@brendan-ward
Copy link
Contributor

stipulate that orientation must be "counterclockwise" for edges="spherical"

I'll have to plead ignorance here, having not yet worked much with spherical coordinates and certainly not from the perspective of I/O. If it is true that the only way to represent spherical edges is using counterclockwise ordering, then this stipulation makes sense. If instead, this is based on design decisions of particular toolkits (e.g., S2) and not consistent in practice across all toolkits, then I don't think we can make this stipulation. The fallback would be the same for spherical coordinates as for planar coordinates, right? In that if you don't know for certain the winding order, you have to check and reorder as needed for your toolkit? (I don't know here)

Given that, my instinct is to keep those decoupled from a specification standpoint, and instead let them be coupled in practice (e.g., if you output coords from S2 and know their orientation, it is easy to set this as well), otherwise this gets into the same potential issues that we're trying to avoid in enforcing planar coordinate winding order.

Dumb question: are we able to use WKB as a storage for spherical coordinates as well?

@rouault
Copy link
Contributor

rouault commented Apr 21, 2022

The fallback would be the same for spherical coordinates as for planar coordinates, right? In that if you don't know for certain the winding order, you have to check and reorder as needed for your toolkit? (I don't know here)

not really (but I'm only superficially aware of issues with spherical coordinates). My understanding is that with spherical coordinates is that you can't guess which "side" of the Earth is in the polygon. imagine a polygon whose coordinates are [[-180,60],[-90,60],[0,60],[90,60],[180 60],[-180,60]] : you don't know if it is meant to be be everything above 60 deg north (actually it is not strictly everything above 60 as the geodesics are not a consistent latitude, but whatever...), or everything below 60 deg north, if there's no winding order convention

@brendan-ward
Copy link
Contributor

So the issue is that you simply cannot interpret the spherical coordinates without knowing the intent of the representation (i.e., based on winding order convention)? Thus - unlike planar coordinates where you can reorder based on the needs of a specific toolkit without losing the meaning of what is the exterior vs interior ring (even if ordered incorrectly) - spherical coordinates cannot be simply reordered to meet the needs of a toolkit because of these implications? Sorry, out of my domain here, likely to suggest bad ideas...

I guess then the choice becomes whether or not orientation="counterclockwise" becomes conditionally REQUIRED by the spec or required in practice by the reader (e.g., refuse to read spherical coordinates if orientation is not defined).

@mentin
Copy link
Author

mentin commented Apr 21, 2022

For edges="spherical" there are still two plausible approaches:

  1. don't assume any orientation, choose the smaller polygon instead. This way one can only represent polygons smaller than hemisphere, but it is still practical for the typical data you use day-to-day. This option was chosen e.g. by initial release of Microsoft SqlServer Geography type
  2. assume edges are oriented. Now you can represent any polygon. This is used by BigQuery, later Microsoft SqlServer releases, etc.

So I would say we don't have to make it REQUIRED, but we do need to describe that lacking orientation flag, the polygons must be interpreted using smaller-of-the-two-possible-interpretations rule.

@rouault
Copy link
Contributor

rouault commented Apr 21, 2022

the polygons must be interpreted using smaller-of-the-two-possible-interpretations rule.

playing devil's advocate: what about a polygon that divides the ellipsoid in 2 strictly equal parts (north hemisphere vs south hemisphere typically) ?

@mentin
Copy link
Author

mentin commented Apr 21, 2022

playing devil's advocate: what about a polygon that divides the ellipsoid in 2 strictly equal parts (north hemisphere vs south hemisphere typically) ?

That polygon becomes unrepresentable without orientation rule too, and should not be written out without oriented flag. As for the readers - garbage in, garbage out. If the creator of the file wrote garbage, there is not much reader can do, so I don't think the standard should care.

@felixpalmer
Copy link
Collaborator

Personally, I don't really like the "choose the smaller polygon" approach. How does one compare the two polygons, with what error margin? It feels error-prone and as mentioned means we cannot have polygons larger than the hemisphere. How about for edges: spherical the reader just assumes orientation: counterclockwise?

@alasarr
Copy link
Collaborator

alasarr commented Apr 22, 2022

Great discussion here. Just want to give a step back in my previous comment after reading all the arguments. I'm ok adding the optional orientation field.

@alasarr
Copy link
Collaborator

alasarr commented Apr 22, 2022

#74 has been merged, the idea is to cut the release on Monday end of the day (pacific) or Tuesday. If you've any comments around this would be great to share before that time.

@jorisvandenbossche
Copy link
Collaborator

Given that #74 and #80 are merged, let's close this issue.
There is some follow-up discussion in #82 about the exact interpretation of the orientation in case of edges="spherical".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants