Reading/Creating a shapely.Point is 10s-100s times slower vs other Point-like objects
#1838
A bit of déjà vu from #983 (comment) and following comments. I'm unsure if a "single-point constructor" was ever written that optimizes for these cases. Users are encouraged to create an array of (e.g.) 1_000_000 point geometries, rather than repeating single point geometry operations.
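To illustrate the array-based approach recommended above, here is a minimal sketch that builds the same points once with the scalar constructor and once with a single vectorized call. It assumes shapely 2.x and NumPy are installed; the variable names and the array size are illustrative only.

```python
import numpy as np
import shapely
from shapely import Point

# Illustrative input: 1000 random (x, y) pairs.
coords = np.random.default_rng(0).random((1000, 2))

# Scalar construction: one Python-level constructor call per point.
scalar_points = [Point(x, y) for x, y in coords]

# Vectorized construction: one call returning an ndarray of Point objects.
array_points = shapely.points(coords)
```

Both variants produce equivalent geometries; the vectorized call avoids the per-object Python overhead that this issue is about.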
Thanks for the reference. Also if we provide [ |
Related to this slowdown in scalar operations compared to 1.8 (because of the refactor using array ufuncs for everything), and specifically for the Point constructor example discussed here: I put some effort into speeding up the scalar Point(..) constructor (#1547), and currently this specific operation is actually slightly faster compared to 1.8:
Now, 1) that's only one specific case, and 2) the fact that it does OK compared to 1.8 doesn't mean it can't be made faster still.
I am not sure that is doable in general, because if I understand that PR correctly, it is essentially creating a special case implementation for the scalar case. But in shapely we have more than 100 ufuncs binding some GEOS method that all could have a separate scalar version. Specifically for the Point(..) creation, this is certainly doable: we actually already have a scalar C version (and I know you actually propose to stop at just providing coordinate access for Points, but what if then someone else passes by requesting this for one specific other operation?)
In my mind, it is encouraged to use those, if you are working with scalar objects. And if you have a numpy array of points, then the attribute also doesn't work out of the box (you have to write a for loop), and in that case the ufunc is both easier to use and faster. I think in many applications the performance of those attributes won't be a problem (and if it is, it might be worth looking into the vectorized functions anyway, even if we can make those attributes a bit faster). And to be clear, that's not to say that we shouldn't make the Geometry object attributes faster, if that is easy to do we for sure should do it. But it is a tradeoff with the added code complexity and maintenance to do so.
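As a rough sketch of the tradeoff described here, the snippet below reads coordinates from a NumPy array of points, first with a for loop over the `.x` attribute and then with the vectorized functions. It assumes shapely 2.x; the sample coordinates are made up for illustration.

```python
import numpy as np
import shapely

# Illustrative array of three 2D points.
points = shapely.points(np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))

# Attribute access does not broadcast over an array, so it needs a loop.
xs_loop = np.array([p.x for p in points])

# The vectorized functions operate on the whole array in one call.
xs = shapely.get_x(points)
ys = shapely.get_y(points)
xy = shapely.get_coordinates(points)  # 2D array with one (x, y) row per point
```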
In general, the fact that geometry objects don't have sequence-like behaviours is intentional: this was one of the design changes of shapely 2.0, and we actually provided this feature in 1.x (well, not exactly for Points, but we did for multi-geometries, and we did provide the array-interface for Points).
@jorisvandenbossche Thanks for your comments.
Expected behavior and actual behavior.
I've been doing some benchmarks (see below), which lead me to the conclusion that reading x/y/z from a shapely point is extremely slow, especially via the properties but also via get_coordinates. Creating a shapely point is also extremely slow (tens to hundreds of times slower than the other containers).
Some rough speed factors compared to a plain tuple (these may not exactly match the output below, as I ran the benchmark multiple times):
reading a single coordinate: tuple(1), Numpy(2.4), NamedTuple(1.14), DataClass(0.9), Shapely.Point(138)
reading 2 coords from a 2d point: tuple(1), Numpy(2.76), NamedTuple(1.14), DataClass(0.9), Shapely.Point(173 | get_coordinates: 45)
creating a 2d point: tuple(1), Numpy(15.4), NamedTuple(9.84), DataClass(7), Shapely.Point(172, fastest method)
reading 3 coords from a 3d point: tuple(1), Numpy(3.43), NamedTuple(1.3), DataClass(0.9), Shapely.Point(364 | get_coordinates: 41)
creating a 3d point: tuple(1), Numpy(16.48), NamedTuple(10), DataClass(7), Shapely.Point(173, fastest method)
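For readers who want to see how factors like these can be derived, here is a minimal sketch using only the standard library. It is not the benchmark script from this issue: the class names `PointNT` and `PointDC`, the helper functions, and the repeat counts are all hypothetical, and it covers only 2d creation for the pure-Python containers.

```python
import timeit
from dataclasses import dataclass
from typing import NamedTuple


class PointNT(NamedTuple):  # hypothetical NamedTuple container
    x: float
    y: float


@dataclass
class PointDC:  # hypothetical dataclass container
    x: float
    y: float


X, Y = 1.0, 2.0  # module-level values so the tuple is built at run time


def time_creation(number=100_000):
    """Return the best-of-3 wall time for creating each container."""
    candidates = {
        "tuple": lambda: (X, Y),
        "NamedTuple": lambda: PointNT(X, Y),
        "DataClass": lambda: PointDC(X, Y),
    }
    return {
        name: min(timeit.repeat(fn, number=number, repeat=3))
        for name, fn in candidates.items()
    }


def relative_factors(timings):
    """Express each timing as a multiple of the plain-tuple timing."""
    base = timings["tuple"]
    return {name: t / base for name, t in timings.items()}


factors = relative_factors(time_creation())
print(factors)
```

The absolute numbers will vary by machine and Python version, which is why the factors are reported relative to the tuple baseline.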
Expected behavior and actual behavior.
Creating a Point and reading its coordinates should be reasonably fast.
Steps to reproduce the problem.
Run the following code:
My output was:
Operating system
Mac OS X 13.4
Shapely version and provenance
2.0.1 installed from PyPI using pip