New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Geometry classes to subclass the C extension type #983
Refactor Geometry classes to subclass the C extension type #983
Conversation
self._ndim = self.__p__._ndim | ||
self._cseq = lgeos.GEOSGeom_getCoordSeq(self.__p__._geom) | ||
def __init__(self, coords): | ||
self._coords = coords |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly to do with this CoordinateSequence
class is something still to be discussed, and I opened #984 for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been reading up on #984 and I tend to go with this implementation for this PR (making self._coords
a ndarray directly)
However I think that sticking a full duplicate of coords
to the object by default is a bit too much. Also _coords
won't be initialized properly in all construction paths.
Within this approach (I actually like the lazy CoordinateSequency object approach better) I think it is best to use a pattern like this:
@property
def coords(self):
try:
coords = getattr(self, "_coords")
except AttributeError:
coords = pygeos.get_coordinates(self)
coords.flags.writeable = False
self._coords = coords
return coords
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However I think that sticking a full duplicate of coords to the object by default is a bit too much.
Note that this CoordinateSequence is not constructed by default when creating a geometry object. It will only be created once a user accesses the geom.coords
attribute.
Now, with the current code, each time the user does geom.coords
, it will create a new CoordinateSequence object containing a new array, so making this "cached" is certainly a good idea (as well as making the array not writable)
else: | ||
self._geom, self._ndim = geos_geometrycollection_from_py(geoms) | ||
# TODO better empty constructor | ||
return pygeos.from_wkt("GEOMETRYCOLLECTION EMPTY") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all those constructors, I now use WKT to create an empty geometry, but see pygeos/pygeos#149 to discuss if we need to have a better way for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pygeos.geometrycollections([])
gives you the empty geometry collection also
@@ -38,16 +40,18 @@ | |||
def test_create_from_geojson(geojson): | |||
with pytest.raises(ValueError) as exc: | |||
wkt = shape(geojson).wkt | |||
assert exc.match("Inconsistent coordinate dimensionality") | |||
assert exc.match("Inconsistent coordinate dimensionality|Input operand 0 does not have enough dimensions") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We get different error message in several places (we should see to what extent we want to preserve them, sometimes they are more informative though)
|
||
@shapely20_todo # logging is not yet implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something we need to check in pygeos how this can be possible, if it's a feature we want to preserve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say all error messages from GEOS (like the one below) should be converted to exception messages, similar to #991 (comment) . I'd assume it would be difficult to capture an error message in pygeos, but also emit the same error message to the logger handler below. I wouldn't expect many other folks to be relying on these log capture methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #998 for a discussion on logging.
tests/test_persist.py
Outdated
@@ -16,6 +18,7 @@ def test_pickle(self): | |||
q = pickle.loads(data) | |||
self.assertTrue(q.equals(p)) | |||
|
|||
@shapely20_todo # big_endian is not yet implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pygeos right now we didn't implement this keyword, but have a byte_order
keyword instead (matches more closely the terminology of GEOS). The default is to use the machine's endianness, and one can use the keyword to override this.
@@ -19,6 +19,9 @@ def test_pickle_round_trip(cls, coords): | |||
data = dumps(geom1, HIGHEST_PROTOCOL) | |||
geom2 = loads(data) | |||
assert geom2.has_z == geom1.has_z | |||
# TODO(shapely-2.0) LinearRing type not preserved in roundtrip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were just discussing this in PyGEOS as well -> pygeos/pygeos#190
There we thought that it might be OK enough to get a LineString back from pickling a LinearRing. However, having the pickling on the python side (still using WKB), we could more easily track the original class as well and ensure it is correctly preserved.
However, having it in python compared to in the base C extension type has a performance penalty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tracked by #988
@@ -43,7 +46,7 @@ def test_binary_predicate_exceptions(self): | |||
p2 = [(339, 207), (280, 311), (460, 138), (399, 242), (459, 277), | |||
(459, 415), (399, 381), (519, 311), (520, 242), (519, 173), | |||
(399, 450), (339, 207)] | |||
self.assertRaises(TopologicalError, Polygon(p1).within, Polygon(p2)) | |||
self.assertRaises(pygeos.GEOSException, Polygon(p1).within, Polygon(p2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, shapely has a bunch of custom error classes in https://github.com/Toblerity/Shapely/blob/master/shapely/errors.py. But in PYGEOS, for now we just have a single GEOSException for errors that are directly raised from one of the GEOS functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising pygeos exceptions is just an intermediate state of the API then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the fact that it is a "pygeos" exception is intermediate state. But basically everywhere you see pygeos
you need to mentally replace it with shapely
(as in one of the next steps we will move pygeos into shapely and replace the pygeos imports with internal shapely imports).
So the "real" question here is if we are, long term, fine with having only a generic shapely.GEOSException
, or if we want to have more fine grained error classes (TopologicalError, WKBReadingError, WKTReadingError).
Note that this is only about the class of the exception, also the generic GEOSException error class will still have the custom, informative error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #991 to have a dedicated place to discuss this topic.
Gentle ping to get some feedback here. |
Thanks @jorisvandenbossche this looks great so far. I've built this PR locally, and the majority of the tests indeed pass. I'll drop a few review comments when I get spare moments... |
I won't be able to try this out until the weekend. Will report back then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche Thanks for this enormous effort of combining pygeos and shapely. I left some remarks.
@@ -838,32 +765,19 @@ class GeometrySequence(object): | |||
_ndim = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that these class-level attributes are not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__p__
is used below, but I think this would be a good time to rename this private variable to _parent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically for this GeometrySequence
class, we also still need to see what to do with it (keep it or not?)
g = geom_factory(sub, parent=self) | ||
g._other_owned = True | ||
return g | ||
return pygeos.get_geometry(self.__p__, i) | ||
|
||
|
||
class EmptyGeometry(BaseGeometry): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term EmptyGeometry
doesn't really cover the fact that GEOS has 'typed' empty geometries.
As `GeometryCollection([]) gives this exact same result, I think this class should best be deprecated.
ob._ndim = 2 | ||
ob._is_empty = False | ||
return ob | ||
ob = _DummyGeometry(g) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems backwards to me.
In my opinion we should never pass these pointers around as python int
objects. Do you think we can remove the geom_factory
completely from shapely 2.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is certainly only temporary. I don't refactor the full of Shapely in a single PR, and thus in other parts of shapely, geom_factory
is still used together with ctypes. So those changes here with the DummyGeometry hack is just to ensure geom_factory
keeps working. But the end goal of Shapely 2.0 for sure means that all usage of geom_factory
is eliminated.
This was an old hack to run the tests with the older library and compare similar WKT, so yes we can update the WKT and remove it. |
Skipped the problematic tests on Appveyor, and updated the WKT/WKB related tests (to no longer be skipped).
For now I added |
Indeed, already removed it ;) |
OK, going to merge this, so we can further work in separate follow-up PRs. Additional comments on the changes here are of course still welcome as well, and I will open issues to track the currently remaining discussion items. Thanks all for the review! |
I created some more follow-up issues based on the review discussions in this PR:
And other outstanding issues for which I already created issues earlier:
|
.. and use the pygeos functions for all the Geometry methods/properties.
This is a first step for integrating PyGEOS into Shapely (#962):
geom_factory(self.impl['difference'](self, other))
topygeos.difference(self, other)
). For other functionality in most other modules (eg shapely.ops etc), this is still left as is and keeps using the ctypes interface (this can be cleaned up later in a separate PR)cc @caspervdw