Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shapely 2.0 naming inconsistencies to resolve #1276

Open
jorisvandenbossche opened this issue Dec 28, 2021 · 6 comments
Open

Shapely 2.0 naming inconsistencies to resolve #1276

jorisvandenbossche opened this issue Dec 28, 2021 · 6 comments
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Dec 28, 2021

What follows is an overview of some (naming) inconsistencies we currently have between the vectorized functions (moved from pygeos) and existing shapely geometry methods or functions.
We can use this as a general issue, but can also open separate issue to discuss some of them in more detail if needed.

(the naming inconsistency is listed as "original shapely name" vs "vectorized function name (moved from pygeos)")

Unary union:

  • unary_union vs union_all

Unary union is the typical way this is called (also in PostGIS), and this is an often used attribute. But, we also have intersection_all and symmetric_difference_all, for which the _all naming scheme probably makes more sense than adding unary_. For this reason, we could keep both unary_union and union_all as aliases?

This is both an attribute on the geometry objects, as a function in the ops.py submodule.

PR -> #1536

Linear referencing related methods

  • interpolate vs line_interpolate_point
  • project vs line_locate_point

PostGIS also uses the ST_LineInterpolatePoint and ST_LineLocatePoint terminology, while the interpolate and project naming follows GEOS.

Those are methods on the geometry objects

Other methods on the geometry class:

  • parallel_offset vs offset_curve (method only on LineString)
  • representative_point vs point_on_surface

PostGIS has ST_OffsetCurve, and also GEOS uses this naming scheme (GEOSOffsetCurve_r).
Similar for ST_PointOnSurface / GEOSPointOnSurface_r

Buffer resolution keyword

  • resolution vs quadsegs

In Shapely, the quadsegs keyword was deprecated in favor of resolution at some point (5689abf), but the vectorized function (moved from pygeos) is using quadsegs (which maps to the name used in GEOS and more or less in PostGIS). @sgillies do you remember the reason for changing it to resolution? (to have a more readable name?)

Simplify preserve_topology default

The simplify method on the geometry object has a default of preserve_topology=True. The vectorized function (moved from pygeos) on the other hand uses preserve_topology=False.

PostGIS has two separate function for this (ST_Simplify and ST_SimplifyPreserveTopology) so cannot be used as prior art.

PR -> #1392

Polygonization

  • triangulate vs delaunay_triangles
      • keyword naming: edges vs only_edges
      • return type: list of geometries vs GeometryCollection
  • voronoi_diagram vs voronoi_polygons
      • keyword naming: envelope vs extend_to, edges vs only_edges
      • return type: list of geometries vs GeometryCollection
  • polygonize: also different return type: list of geometries vs GeometryCollection

PostGIS has ST_DelaunayTriangles and ST_VoronoiPolygons (and ST_VoronoiLines). GEOS uses GEOSDelaunayTriangulation_r and GEOSVoronoiDiagram_r.

This are functions in the shapely.ops submodule.

Validity

  • explain_validity vs is_valid_reason (in validation.py)
  • validate vs is_valid_reason (in ops.py)

PostGIS has ST_IsValidReason, GEOS also uses GEOSisValidReason_r.

transform vs apply

Shapely has transform(func, geom) in ops.py, which takes a function that accepts/returns a single coordinate pair (so basically func(x, y, z=None) -> x, y, (z)).
The vectorized function (moved from pygeos) is called apply(geometry, transformation) (so the function is the second keyword here), and here the function is assumed to accept/return a 2D array of coordinates (all coordinates at once).
This apply function is already used under the hood in affine_transform.

PR -> #1393

@jorisvandenbossche
Copy link
Member Author

Specifically for those cases that are about functions in a submodule (and not methods on the geometry objects), i.e. for the polygonization and validity functions, we don't necessarily have to "solve" this directly, if we are happy with the new naming. In that case, the preferred top-level (vectorized) functions will have the new naming, and the old functions in the submodule can for now be left as-is soft-deprecated (until we decide if we actually want to deprecate/eventually remove them or not, depending on the discussion in #1225).

It's mostly for the geometry class methods that it would be nice to already provide a consistent interface with the vectorized functions (i.e. interpolate, project, representative_point, parallel_offset, buffer resolution keyword).

@jorisvandenbossche jorisvandenbossche added this to the 2.0 milestone Jan 3, 2022
@sgillies
Copy link
Contributor

I felt like "resolution" was more understandable to shapely users than "quadsegs". The latter being sort of jargon-y.

@jorisvandenbossche I would prefer to maintain the attribute and method names from shapely 1.x unless they are misleading. While 2.0 can have breaking changes, the fewer we make, the better. Users won't celebrate 2.0 as much if they have to make a lot of changes.

Consistency in naming between class attributes/methods and the vectorized functions would be a plus, certainly. What would you think about changing the vectorized function names to be consistent with the class attributes and methods? Or adding in aliases in that direction? When merging pygeos into shapely, which API needs to give is a challenging question. As pygeos is newer and not yet at 1.0, maybe it could make more changes than we've considered up to now?

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Feb 16, 2022

I would prefer to maintain the attribute and method names from shapely 1.x unless they are misleading. While 2.0 can have breaking changes, the fewer we make, the better.

Yes, I certainly wouldn't propose to remove any of the current geometry class attributes/methods. And if we wanted to do that, we should have deprecated them in Shapely 1.8.

Focusing on the attributes/methods for a moment (so this is about interpolate vs line_interpolate_point, project vs line_locate_point, representative_point vs point_on_surface, and parallel_offset vs offset_curve), I think there are basically two options:

  1. Only keep the original Shapely names, and rename the top-level functions to match those names
  2. Add the new names as aliases on the geometry class (like we have had .unary_union and .cascaded_union as aliases for a long time), so we have at least the aliases matching with the top-level function names.
    • And for this option, we can still decide later in the future if we want to start actively deprecate the "old" names or keep them both versions as aliases for a longer time

I am personally not super attached to the new names from pygeos (I think @caspervdw is a bit more :)), but I think one clear advantage of having them (so at least keeping those as aliases, option 2) is that they match the names from eg PostGIS (and R sf I think as well), which can improve discoverability for users.

@jorisvandenbossche
Copy link
Member Author

For two of the other items mentioned above, I opened a PR to align the moved pygeos functions with Shapely:

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Aug 22, 2022

The issues from the list in the top post that are not yet finalized are:

  • parallel_offset vs offset_curve (method only on LineString)
  • Buffer resolution keyword: resolution vs quadsegs

Plus for inconsistent naming for functions in submodules:

  • Polygonization functions
  • Validity

@jorisvandenbossche
Copy link
Member Author

One more naming inconsistency that was noted by @mwtoews in #1522 (comment), is the keyword naming for the distance / radius / width in buffer and offset_curve (parallel_offset):

A few observations: the second parameter is either radius or width, while shapely 1.8, GEOS and JTS use distance. I should probably change this in this PR to distance for consistency.

  • The top-level shapely.buffer(..) (from pygeos) uses radius (although documents it as width), the Geometry class buffer() method uses distance
  • The top-level shapely.offset_curve(..) (from pygeos) uses distance, the Geometry class offset_curve()/parallel_offset method also uses distance
  • The GEOS C API uses width, JTS uses distance, PostGIS (ST_Buffer) uses radius_of_buffer (but that's only in documentation, you don't actually use that in queries I think)

So I think the proposal of @mwtoews to use distance everywhere for consistency seems best (basically only changing that in shapely.buffer, as far as I can see)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants