Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Geometry classes to subclass the C extension type #983

Merged

Conversation

jorisvandenbossche
Copy link
Member

.. and use the pygeos functions for all the Geometry methods/properties.

This is a first step for integrating PyGEOS into Shapely (#962):

  • For now I still use PyGEOS as an external dependency (it will be merged into Shapely in a later step, I first wanted to ensure this essential part is working out)
  • This makes all the Geometry classes (Point, LineString, Polygon, etc) subclasses of the pygeos.Geometry extension type.
  • This also registers those subclasses to pygeos. This means that all pygeos functions will start to return the correct shapely subclass, and not the generic pygeos.Geometry
  • For most of the methods and properties of the Geometry classes, I switched to using pygeos functions (eg geom_factory(self.impl['difference'](self, other)) to pygeos.difference(self, other)). For other functionality in most other modules (eg shapely.ops etc), this is still left as is and keeps using the ctypes interface (this can be cleaned up later in a separate PR)
  • A large chunk of the diff is actually the re-implementation of the Point(), Linestring(..), Polygon(..) etc constructors, as those don't directly map to the pygeos functions to create geometries from coordinates (the shapely ones are more flexible / allow more variation in input types)
  • All tests are passing locally (with pygeos master), but I xfailed a bunch of tests where there are small differences or where we still need to implement something in the pygeos code to support it. But so it can be seen explicitly in the diff which ones are not yet passing.

cc @caspervdw

self._ndim = self.__p__._ndim
self._cseq = lgeos.GEOSGeom_getCoordSeq(self.__p__._geom)
def __init__(self, coords):
self._coords = coords
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly to do with this CoordinateSequence class is something still to be discussed, and I opened #984 for that

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been reading up on #984 and I tend to go with this implementation for this PR (making self._coords a ndarray directly)

However I think that sticking a full duplicate of coords to the object by default is a bit too much. Also _coords won't be initialized properly in all construction paths.

Within this approach (I actually like the lazy CoordinateSequency object approach better) I think it is best to use a pattern like this:

@property
def coords(self):
    try:
        coords = getattr(self, "_coords")
    except AttributeError:
        coords = pygeos.get_coordinates(self)
        coords.flags.writeable = False
        self._coords = coords
    return coords

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However I think that sticking a full duplicate of coords to the object by default is a bit too much.

Note that this CoordinateSequence is not constructed by default when creating a geometry object. It will only be created once a user accesses the geom.coords attribute.

Now, with the current code, each time the user does geom.coords, it will create a new CoordinateSequence object containing a new array, so making this "cached" is certainly a good idea (as well as making the array not writable)

else:
self._geom, self._ndim = geos_geometrycollection_from_py(geoms)
# TODO better empty constructor
return pygeos.from_wkt("GEOMETRYCOLLECTION EMPTY")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all those constructors, I now use WKT to create an empty geometry, but see pygeos/pygeos#149 to discuss if we need to have a better way for this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pygeos.geometrycollections([]) gives you the empty geometry collection also

@@ -38,16 +40,18 @@
def test_create_from_geojson(geojson):
with pytest.raises(ValueError) as exc:
wkt = shape(geojson).wkt
assert exc.match("Inconsistent coordinate dimensionality")
assert exc.match("Inconsistent coordinate dimensionality|Input operand 0 does not have enough dimensions")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get different error message in several places (we should see to what extent we want to preserve them, sometimes they are more informative though)


@shapely20_todo # logging is not yet implemented
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we need to check in pygeos how this can be possible, if it's a feature we want to preserve

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say all error messages from GEOS (like the one below) should be converted to exception messages, similar to #991 (comment) . I'd assume it would be difficult to capture an error message in pygeos, but also emit the same error message to the logger handler below. I wouldn't expect many other folks to be relying on these log capture methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #998 for a discussion on logging.

@@ -16,6 +18,7 @@ def test_pickle(self):
q = pickle.loads(data)
self.assertTrue(q.equals(p))

@shapely20_todo # big_endian is not yet implemented
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pygeos right now we didn't implement this keyword, but have a byte_order keyword instead (matches more closely the terminology of GEOS). The default is to use the machine's endianness, and one can use the keyword to override this.

@@ -19,6 +19,9 @@ def test_pickle_round_trip(cls, coords):
data = dumps(geom1, HIGHEST_PROTOCOL)
geom2 = loads(data)
assert geom2.has_z == geom1.has_z
# TODO(shapely-2.0) LinearRing type not preserved in roundtrip
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were just discussing this in PyGEOS as well -> pygeos/pygeos#190

There we thought that it might be OK enough to get a LineString back from pickling a LinearRing. However, having the pickling on the python side (still using WKB), we could more easily track the original class as well and ensure it is correctly preserved.
However, having it in python compared to in the base C extension type has a performance penalty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tracked by #988

@@ -43,7 +46,7 @@ def test_binary_predicate_exceptions(self):
p2 = [(339, 207), (280, 311), (460, 138), (399, 242), (459, 277),
(459, 415), (399, 381), (519, 311), (520, 242), (519, 173),
(399, 450), (339, 207)]
self.assertRaises(TopologicalError, Polygon(p1).within, Polygon(p2))
self.assertRaises(pygeos.GEOSException, Polygon(p1).within, Polygon(p2))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, shapely has a bunch of custom error classes in https://github.com/Toblerity/Shapely/blob/master/shapely/errors.py. But in PYGEOS, for now we just have a single GEOSException for errors that are directly raised from one of the GEOS functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising pygeos exceptions is just an intermediate state of the API then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the fact that it is a "pygeos" exception is intermediate state. But basically everywhere you see pygeos you need to mentally replace it with shapely (as in one of the next steps we will move pygeos into shapely and replace the pygeos imports with internal shapely imports).

So the "real" question here is if we are, long term, fine with having only a generic shapely.GEOSException, or if we want to have more fine grained error classes (TopologicalError, WKBReadingError, WKTReadingError).
Note that this is only about the class of the exception, also the generic GEOSException error class will still have the custom, informative error messages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #991 to have a dedicated place to discuss this topic.

@jorisvandenbossche
Copy link
Member Author

Gentle ping to get some feedback here.

@mwtoews
Copy link
Member

mwtoews commented Sep 24, 2020

Thanks @jorisvandenbossche this looks great so far. I've built this PR locally, and the majority of the tests indeed pass. I'll drop a few review comments when I get spare moments...

@sgillies
Copy link
Contributor

I won't be able to try this out until the weekend. Will report back then.

Copy link
Collaborator

@caspervdw caspervdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Thanks for this enormous effort of combining pygeos and shapely. I left some remarks.

shapely/geometry/base.py Outdated Show resolved Hide resolved
shapely/geometry/base.py Outdated Show resolved Hide resolved
shapely/geometry/base.py Outdated Show resolved Hide resolved
@@ -838,32 +765,19 @@ class GeometrySequence(object):
_ndim = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that these class-level attributes are not used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__p__ is used below, but I think this would be a good time to rename this private variable to _parent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically for this GeometrySequence class, we also still need to see what to do with it (keep it or not?)

g = geom_factory(sub, parent=self)
g._other_owned = True
return g
return pygeos.get_geometry(self.__p__, i)


class EmptyGeometry(BaseGeometry):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term EmptyGeometry doesn't really cover the fact that GEOS has 'typed' empty geometries.

As `GeometryCollection([]) gives this exact same result, I think this class should best be deprecated.

shapely/geometry/linestring.py Outdated Show resolved Hide resolved
shapely/geometry/polygon.py Outdated Show resolved Hide resolved
shapely/geometry/polygon.py Outdated Show resolved Hide resolved
shapely/geometry/base.py Show resolved Hide resolved
ob._ndim = 2
ob._is_empty = False
return ob
ob = _DummyGeometry(g)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems backwards to me.
In my opinion we should never pass these pointers around as python int objects. Do you think we can remove the geom_factory completely from shapely 2.0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is certainly only temporary. I don't refactor the full of Shapely in a single PR, and thus in other parts of shapely, geom_factory is still used together with ctypes. So those changes here with the DummyGeometry hack is just to ensure geom_factory keeps working. But the end goal of Shapely 2.0 for sure means that all usage of geom_factory is eliminated.

@mwtoews
Copy link
Member

mwtoews commented Nov 4, 2020

Would it be fine to remove that? The comment indicates this was done for being able to pass the tests for all GEOS versions, but it seems only for GEOS versions > 3.3, and now we don't test anymore with older versions.

This was an old hack to run the tests with the older library and compare similar WKT, so yes we can update the WKT and remove it. I can volunteer to do this. Aha, I see that you just updated it just now.

@jorisvandenbossche
Copy link
Member Author

Skipped the problematic tests on Appveyor, and updated the WKT/WKB related tests (to no longer be skipped).

I made this call while implementing WKTWriter for shapely, as the meaning of this boolean property big_endian=True/False is more clear than byte_order=0/1. However, I don't have any objections if this is made an alias to align closer to GEOS' terminology.

For now I added big_endian=True/False back as alias for byte_order, we can later discuss if we want to keep both as aliases or choose one long term.

@jorisvandenbossche
Copy link
Member Author

Indeed, already removed it ;)

@jorisvandenbossche
Copy link
Member Author

OK, going to merge this, so we can further work in separate follow-up PRs. Additional comments on the changes here are of course still welcome as well, and I will open issues to track the currently remaining discussion items. Thanks all for the review!

@jorisvandenbossche
Copy link
Member Author

I created some more follow-up issues based on the review discussions in this PR:

And other outstanding issues for which I already created issues earlier:

jorisvandenbossche added a commit to jorisvandenbossche/shapely that referenced this pull request Nov 27, 2020
jorisvandenbossche added a commit to jorisvandenbossche/shapely that referenced this pull request Apr 5, 2021
jorisvandenbossche added a commit to jorisvandenbossche/shapely that referenced this pull request Apr 5, 2021
jorisvandenbossche added a commit to jorisvandenbossche/shapely that referenced this pull request May 27, 2021
jorisvandenbossche added a commit to jorisvandenbossche/shapely that referenced this pull request May 27, 2021
jorisvandenbossche added a commit to jorisvandenbossche/shapely that referenced this pull request Aug 20, 2021
jorisvandenbossche added a commit to jorisvandenbossche/shapely that referenced this pull request Sep 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants