ENH: prepared geometry as additional (cached) attribute on GeometryObject #92
src/ufuncs.c
/* TODO check if nout can be 0 for ufuncs (because in principle we don't need to return something here) */
PyObject **out = (PyObject **)op1;
Py_XDECREF(*out);
*out = geom_obj;
I copy-pasted some lines of code here to get it working, but in principle we don't need to return anything here (the geoms get modified in place). But I need to check if ufuncs can have zero return values.
(This is the part that can still give a segfault after a while, as I suppose the return value here is not properly set up for GC.)
@jorisvandenbossche thanks for working on this! Also, very sorry not to have had a chance to make more progress digging into #84.

What was the time for preparing the geometries? That might be an interesting data point in thinking through whether preparing on the fly makes sense, though we'd likely need to run that on a large collection of polygons to get a good heuristic of the time impact.

Where I would be concerned / surprised is if a function that prepares on the fly modifies the array I pass into it, as opposed to creating prepared geometries exclusively within the bounds of the predicate function. Were you thinking the former or the latter?

I've struggled with issues of how users should keep track of prepared geoms too, and I like your solution of keeping the prepared & original geometry together. I think in an ideal world, a user wouldn't need to worry about prepared geometries at all; they'd just be used under the hood in the predicate functions where applicable. Meaning: it is possible that we could convert all the predicate functions to prepare geometry on the fly. I liked the developer ergonomics of using prepared geometries on the fly in #87: one less concept to learn and keep track of, but much faster results.

I think the interesting cases to ponder here are the tradeoffs between:
It might be that both of those cases are rare enough that preparing on the fly within the predicate functions provides the best general API.
I pushed a version of the prepare ufunc with no return value, which made it possible to time this (previously the return value was not properly GCed, so running it in a timing loop segfaulted). And it seems that for this case, the preparing takes only a tiny bit of time, while being very beneficial for the intersects:
The current version of |
@jorisvandenbossche
If I then do a spatial join to find potentially overlapping features (14.5k potential intersections): Even one of my worst-case datasets at 2M lines can be prepared in ~ 1s. The performance gain of using prepared intersections greatly outweighs the time it takes to prepare geometries.

Based on that, I think we should simply prepare on the fly in every predicate function, and also in the I definitely don't want to negate any of your work here, but it seems like it would simplify the API and reduce the number of concepts that users need to understand (i.e., when do I prepare? how do I unprepare?), while making the normal predicate functions blazing fast (compared to their unprepared counterparts).

Then, if we later see there is a compelling case NOT to prepare on the fly (e.g., prepare takes longer than predicate), perhaps we could expose that as a parameter?
Cool, thanks for testing! Indeed, it seems the time it takes is small compared to the other typical operations (and the advantage big for those predicates).
No, no :) The explicit I am wondering if we still want some flag to enable this or not, if only for testing (as we are doing here). BTW, are those test data you used open data? It would be good to gather a set of data examples with varying characteristics to test things like this (also for things like sjoin when we are going to add that).
Another thought: how about points? I suppose for those it doesn't matter much to prepare them. But does it do any harm?
For points, it looks like very little overhead (if I'm reading the GEOS source correctly): https://github.com/libgeos/geos/blob/master/src/geom/prep/PreparedPoint.cpp I don't think there is any harm there. The data I have been using are mostly derived from open data sources:
(several steps of preprocessing) I think it would be OK to share those derivatives for testing purposes (given proper attribution, disclaimers, etc), though they are pretty big! Is there an existing library of datasets used by geopandas / etc that is not in the codebase? Happy to try and pull out useful extracts for testing... |
OK, I updated this PR and cleaned up the code, to have something minimal that could be merged (to unblock other PRs that want to use prepared geometries, or to further optimize this).
@pramsey Sorry for pinging you here, but I wanted to ask a quick question on which you might have insight. Based on some basic timings (using a polygons and a points layer, with an intersects operation), using the prepared version of the predicate function gives a very clear speed boost (as you advised us on Twitter! :-)) And the overhead of doing the preparation is almost nil. So the question we are asking ourselves: should we then just always use the prepared versions of the predicates (and keep the GEOSPreparedGeometry cached with the GEOSGeometry)? Or, based on your experience with this, are you aware of cases where you might not want to use prepared geometries for the predicates (in which case it could be useful to give the end user some control over whether prepared geometries are used)?
That's a hard call, because it's very pattern based... the overhead isn't really nil, it's around O(n), but once you've done the preparation the computation is around O(log(n)) so doing it the first time you get asked a predicate question is not a bad idea. The trouble is you're then assuming you'll be asked a second predicate question, regarding the same geometry.
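To make the trade-off above concrete, here is a rough cost model in Python. It is only a sketch: the O(n) preparation and O(log n) prepared query come from the comment above, while the assumption that an unprepared predicate costs O(n) per call, and the function names `total_cost` and `break_even_queries`, are mine for illustration.

```python
import math


def total_cost(n_vertices, n_queries, prepared):
    """Rough operation count for n_queries predicate calls on one geometry."""
    if prepared:
        # One-off O(n) preparation, then roughly O(log n) per query.
        return n_vertices + n_queries * math.log2(n_vertices)
    # Assumption: every unprepared query scans all vertices, i.e. O(n).
    return n_queries * n_vertices


def break_even_queries(n_vertices):
    """Smallest number of queries at which preparing is no more expensive."""
    k = 1
    while total_cost(n_vertices, k, True) > total_cost(n_vertices, k, False):
        k += 1
    return k


# With 1024 vertices, preparing costs slightly more for a single query
# but already wins on the second one under this model.
print(total_cost(1024, 1, prepared=True), total_cost(1024, 1, prepared=False))
print(break_even_queries(1024))
```

Under these assumptions the first query is marginally more expensive when prepared, which matches the concern that preparing only pays off if the same geometry is asked a second question.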
Thanks for the response! Maybe a less vague question: for example in PostGIS, if you do a
That's true. On the one hand, I assume that keeping the prepared geometry "cached" doesn't give much overhead, even if you didn't need it (there is some memory overhead though). But on the other hand, since the preparation itself is so fast, there is maybe also not much to be gained in keeping them cached after the first predicate question.
I did some timings with a variant where we always prepare the geom when needed (in the predicate functions) but without caching it.
@jorisvandenbossche thanks for running more timing tests on this! With the caching, are you doing that on the fly, on first instance of Are there other possible use cases for ufunc broadcasting beyond a spatial join type operation that would be useful to consider here? What is the memory overhead for prepared geometries? If it is more than a tiny amount, or variable based on the complexity of the geometry, that might be something for us to consider in terms of memory vs performance. What about copying geometries to new arrays? Do the cached prepared geometries go with them, or need to be recreated? |
Sorry for not being very clear. So the two things I was comparing:
With this PR (using the variables of the top-post):
With the alternative approach (second bullet point above), always recomputing the prepared geometry when needed:
and as comparison, with master without prepared geometries at all:
So for this "brute force" intersects of all combinations of geometries with such a broadcasting ufunc, caching the prepared geoms is clearly beneficial. The only thing I am not sure about is if that is not optimizing the wrong thing.
which is even a bit faster than the caching of prepared geoms (and moving this loop over the points into C for an actual sjoin algo will make it even faster).
That's an advantage of caching the prepared geometries, I think. When copying geometries to a new array (e.g. taking a subset of an array), those geometries are not actually copied; only new pointers to them are made. So the prepared geoms can be reused and don't need to be recreated, whereas an STRtree needs to be reconstructed when taking a subset of the array. I should maybe try to combine our PR on STRtree and this one, to see if caching would be beneficial in such a situation (I assume it will not give much benefit there).
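The pointer-sharing argument can be illustrated with a small pure-Python model. This is a sketch, not the actual extension type: `Geom`, its `prepared` slot, and `prepare` are hypothetical stand-ins for the GeometryObject and its cached GEOSPreparedGeometry, and a plain list stands in for an object array.

```python
class Geom:
    """Hypothetical stand-in for a GeometryObject with a cached prepared slot."""

    def __init__(self, wkt):
        self.wkt = wkt
        self.prepared = None  # plays the role of the cached prepared pointer


def prepare(geoms):
    """Prepare every geometry in place, skipping those already prepared."""
    for g in geoms:
        if g.prepared is None:
            g.prepared = ("prepared", g.wkt)


geoms = [Geom("POINT (0 0)"), Geom("POINT (1 1)"), Geom("POINT (2 2)")]
prepare(geoms)

# Taking a subset creates a new container, but it holds references to the
# same objects, so the cached prepared data travels with them.
subset = geoms[1:]
print(subset[0] is geoms[1])
print(subset[0].prepared is geoms[1].prepared)
```

Because the cache lives on the object rather than on the container, slicing or reordering does not invalidate it; a tree built over the container has no such property.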
@jorisvandenbossche sorry - I didn't respond to your question in #87. What you outlined in your comment there seems reasonable, and would allow us to get or create cached prepared geometries in that context. I'd be curious to know how combining these two PRs performs differently. In theory, I'd hope that a single tree query with a predicate (e.g., intersects) performs about the same with the existing approach as with cached prepared geometries (when accounting for the time to create them in the comparison). Where I'd expect the cached approach to have an edge is repeated queries against the tree with different subsets and predicates, i.e., first query those that intersect, then of those, query those that are completely contained and query those that cross (I do this in another project, but I think maybe that's an edge case).
New thought, likely half-baked. Instead of adding a parameter to predicate functions to enable / disable use of prepared geometries (so that users can disable it if they know prepared geometries are a bad solution for their specific dataset), what about adding an attribute that could be set on the geometry to disable preparing / caching? Not sure how users would set that, but it seems like something that needs to be handled at the geometry object level. Otherwise, the various operations that touch a geometry and request the prepared version all need to be called with the same parameter to disable use of prepared. That could also give better granularity: you could disable prepared ops on some but not all of the geometries in an array.
I'd like to revive this discussion. After not thinking about it for a long time, I have come to the conclusion that this PR already has the best strategy:
Note that these |
How do we want to approach this with STRtree with predicates? In the context of querying with predicates, we only really need the geometries whose bounding boxes have overlaps with tree geometries to be prepared in this case, not all input geometries to the query (esp
Meaning that the Also: |
We could let the
Indeed, we should skip preparing geometries if they exist already. It never makes sense to "reprepare" as geometries are immutable. Another feature of this 'in place' preparing is that we can selectively prepare geometries based on some crude estimate:
|
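The two points above (never re-prepare, and prepare selectively) can be sketched in pure Python. The list of estimates did not survive in this thread, so the coordinate-count threshold used here (`min_coords`) is purely a hypothetical example of a "crude estimate", and `Geom` / `prepare` are illustrative names rather than the PR's code.

```python
class Geom:
    """Hypothetical immutable geometry with a cached prepared slot."""

    def __init__(self, coords):
        self.coords = tuple(coords)
        self.prepared = None
        self.prepare_count = 0  # only here to show that work is not repeated


def prepare(geoms, min_coords=0):
    """Prepare geometries in place; nothing is returned."""
    for g in geoms:
        if g is None or g.prepared is not None:
            # Missing values are skipped, and since geometries are
            # immutable an existing prepared geometry is always valid.
            continue
        if len(g.coords) < min_coords:
            # Hypothetical heuristic: tiny geometries gain little.
            continue
        g.prepared = ("index over", len(g.coords), "coords")
        g.prepare_count += 1


point = Geom([(0.0, 0.0)])
polygon = Geom([(float(i), float(i % 7)) for i in range(50)])
geoms = [point, None, polygon]

prepare(geoms, min_coords=10)
prepare(geoms, min_coords=10)  # second call is a no-op for the polygon
print(point.prepared, polygon.prepare_count)
```

Calling `prepare` repeatedly is cheap because each geometry is inspected but prepared at most once.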
@jorisvandenbossche I resumed the work on this PR. We can iron out the strtree implementation later, but I gathered from the conversations above that the stuff in this PR is something we already agree on. This should be reviewed before I go through the full list of predicates.
Thanks for getting this rolling again @caspervdw
I think we were agreed on the overall behavior here, and evaluating this for strtree should be a different PR / discussion.
What do you think about adding a function `is_prepared`? This would return `True` / `False` based on whether or not the `_ptr_prepared` is `NULL`, and would avoid having to use that attribute during tests or elsewhere in the python API where we want to know if it is prepared (except in tests of `is_prepared`, where we need to test against that attribute to know it is working properly).
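A minimal Python model of the proposed check, assuming a `_ptr_prepared` slot that is `None` when nothing has been prepared (standing in for `NULL` in the C struct). The class and the `prepare` / `destroy_prepared` helpers here are illustrative only, not the extension code.

```python
class Geom:
    """Hypothetical geometry object; _ptr_prepared mimics the C struct field."""

    def __init__(self):
        self._ptr_prepared = None


def prepare(geom):
    if geom._ptr_prepared is None:
        geom._ptr_prepared = object()  # placeholder for a prepared geometry


def destroy_prepared(geom):
    geom._ptr_prepared = None


def is_prepared(geom):
    """True if a prepared geometry is cached; False otherwise (also for None)."""
    return geom is not None and geom._ptr_prepared is not None


g = Geom()
before = is_prepared(g)
prepare(g)
after = is_prepared(g)
destroy_prepared(g)
print(before, after, is_prepared(g), is_prepared(None))
```

With such a function, tests and Python-level callers can ask about the state without touching the private attribute.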
src/pygeom.c
@@ -407,6 +412,14 @@ char get_geom(GeometryObject *obj, GEOSGeometry **out) {
}
}

/* Get a GEOSPreparedGeometry pointer from a GeometryObject
This function does not check obj's type */
It would be good to add a note here that this will return `NULL` if the geometry has not yet been prepared.
Also - `get_geom` has extra processing to handle cases where `obj` is `None`. It seems like it is important to do the same here, right?
Instead, I implemented `get_geom_with_prepared`. This is because `get_geom_prepared` would always be called after `get_geom`, so it is better to combine them in one function (to be sure the checks are done).
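The idea of a single combined getter can be modelled in Python as follows. This is a sketch of the control flow only: the real function is C and its exact signature is not shown in this thread, so the `(ok, geom, prepared)` return shape and the `Geom` class are assumptions made for illustration.

```python
class Geom:
    """Hypothetical geometry object with an optional cached prepared geometry."""

    def __init__(self, name, prepared=None):
        self.name = name
        self.prepared = prepared


def get_geom_with_prepared(obj):
    """Return (ok, geom, prepared), doing the type and None checks once."""
    if obj is None:
        # Missing value: valid input, but there is no geometry to return.
        return True, None, None
    if not isinstance(obj, Geom):
        # Wrong type: signal an error to the caller.
        return False, None, None
    # prepared is None when the geometry has not been prepared yet.
    return True, obj, obj.prepared


plain = Geom("a")
cached = Geom("b", prepared="prepared-b")
print(get_geom_with_prepared(plain))
print(get_geom_with_prepared(cached)[2])
print(get_geom_with_prepared("not a geometry")[0])
```

Callers then branch on whether the prepared value is present, instead of calling two getters and repeating the checks.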
pygeos/creation.py
"""Destroy a previously prepared geometry, freeing up memory. | ||

Note that the prepared geometry will always be cleaned up if the geometry itself
is dereferenced. This function needs only be called in very specific circumstances.
Should there be more guidance here or in docs about what those specific circumstances are?
Thanks for the review @brendan-ward. I like the idea of an OK like this to start implementing the other prepared predicates?
@jorisvandenbossche @brendan-ward Could you take a look at this PR?
@caspervdw thanks for the updates here.
A few suggested changes mostly around docstrings and params.
Added #235 to leverage this for STRtree.query_bulk.
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Thanks @caspervdw - looking forward to having this available!
@caspervdw BTW, thanks a lot for reviving and finishing this up!
namespace (#1339) Deprecate calling the BaseGeometry constructor + EmptyGeometry class (#1303) CI: update black to fix click compat issue (#1355) TST: update tests to pass with numpy 1.22 and pytest 8 (#1356) Remove deprecation warning from GeometryTypeError (#1358) API: update STRtree interface (merge shapely/pygeos features) (#1251) TST: fix tests for GEOS main (#1357) Update repr for the Geometry classes (#1302) CI: only build Travis on the main branch (not PRs) (#1382) CI: fix Travis tests (#1384) DOC: Move pygeos doc pages into shapely docs (#1377) DEPR: deprecate the GeometryType attribute (#1375) CI: fix tests with latest setuptools (#1388) CLN: remove unused vendored packaging, consolidate gitignore, remove old pygeos pyproject.toml (#1389) DOC: replace pygeos -> shapely in the reference docs (#1394) DOC: update class (init/new) docstrings of Geometry subclasses (#1395) DOC: fix / consolidate the readthedocs configuration (#1397) Update LICENSE file and copyright holder (#1403) Add aliases for project/interpolate/representative_point methods (#1340) Change simplify() preserve_topology default to True (match the Geometry method) (#1392) API: change the bounds of an empty geometry from empty tuple to tuple of NaNs (#1416) API: all-empty GeometryCollection .geoms to return empty subgeoms (#1420) API: remove __geom__, keep _geom read-only access to GEOS pointer (#1417) Consolidate setup.py metadata, remove old setup.py files (#1376) DOC: restructure the combined docs + new theme (#1402) DOC: update the installation documentation (#1396) Rename shapely.apply to shapely.transform (#1393) CLN: removed unused shape_factory + HeterogeneousGeometrySequence (#1421) CLN: remove unused _geos.pxi GEOS cython declarations (#1419) Consolidate README files (#1378) DOC: update examples in manual.rst (#1391) DOC: fix repr in example (#1452) RLS: 2.0a1 ENH: expose fixed precision overlay (grid_size keyword) in the Geometry methods as well (#1468) CLN: actually remove the base 
class Geometry() constructor (#1476) DOC: add changelog for Shapely 2.0 (#1442) PERF: use GEOSCoordSeq_copyToBuffer for get_coordinates (GEOS >= 3.10) (#1511) DOC/CLN: clean-up conf.py and unused docs files (#1519) ENH: expose dwithin on the Geometry subclasses (#1496) DOC/RLS: update CHANGES.txt with last releases, move old releases to docs (#1520) PERF: use GEOSGeom_getExtent for GEOS>=3.11 in bounds ufunc (#1477) API: use quad_segs for buffer keyword instead of resolution (#1512) RLS/CI: update GEOS to 3.11.0 for wheels (#1527) COMPAT: keep old exception sublcasses as (deprecated) ShapelyError aliases (#1538) COMPAT: keep (deprecated) geom_factory for downstream packages (cartopy) (#1534) Add unary_union alias for top-level union_all (#1536) CLN: remove custom minimum_rotated_rectangle implementation + add oriented_envelope alias (#1532) DOC: update offset_curve (parallel_offset) documentation regarding direction of the resulting line (#1537) DOC: autogenerate separate reference pages per function with autosummary (#1529) ENH: shapely.plotting module with basic matplotlib-based functionality (+ updated docs to use this) (#1497) DOC/CI: fix doc build for latex + re-enable epub and htmlzip (#1549) PERF: improve performance of Point(x, y) constructor (#1547) Make Geometry objects weakref-able (#1535) CI: add PyPI token for travis wheel uploads (splitted secret) (#1554) API: restore unary_union behaviour on empty list to return empty GeometryCollection (#1553) DOC: various small fixes to sphinx building (#1558) PERF: reintroduce global context and use in GeometryObject dealloc (#1530) DOC: add migration guide for PyGEOS users (#1557) Refactor reducing set operations as gufuncs + return emtpy collection for empty/all-None input (#1562) CI: update cibuildwheel version in travis as well (#1574) RLS: 2.0b1 BUG: move weakref-able implementation to base C extension type (fix PyPy) (#1577) CI: fix Travis deploy step for non-wheel jobs + try avoid building sdist (#1576) 
API: rename 'radius' -> 'distance' keyword in shapely.buffer() for consistency (#1589) Allow passing len-1 array for x and y coordinates in Point(..) (compat with 1.8) (#1590) RLS: 2.0b2 DOC: automatically fill in correct version (#1595) CI: ensure doctests are running (#1596) Document Point(..) constructor change no longer allowing coordinate sequence of len > 1 (#1600) ENH: allow Geometry methods to return arrays if passed array as argument (#1599) ENH: Add node function (#1431) PERF: restore speed of LineString(..) from numpy array of coordinates (#1602) ENH: expose contains_properly on the Geometry subclasses (#1605) ENH: faster contains_xy/intersects_xy predicates special casing for point coordinates (#1548) CLN: clean c code - remove unused variables / functions (#1619) CI/TST: fix tests for voronoi_diagram for latest GEOS (#1625) CI/TST: fix tests for changed numpy error from creating ragged array (#1627) ENH: convert to / construct from ragged array (flat coordinates and offset arrays) (#1559) ENH: expose flavor keyword in to_wkb (#1628) CI/RLS: set up CircleCI to build Linux aarch64 wheels (instead of Travis CI) (#1624) RLS: 2.0rc1 CI/RLS: ensure to run CircleCI on tags (#1634) TST: fix test_to_wkb_flavor to use fixed byte_order to pass on big endian machine (#1635) CI: fix circle config syntax CLN: remove shapely/examples submodule (#1645) TST: fix tests for GEOS changes in M handling (#1647) RLS: 2.0rc2 DEV: update valgrind Dockerfile (#1649) DOC: add note about prepare + touches to contains_xy/intersects_xy docstrings (#1631) RLS: 2.0rc3 TST: skip intermittent remove_repeated_points failure for GEOS 3.11 (#1656) DOC/RLS: update release notes for final 2.0.0 (#1659) RLS: 2.0.0 Keith Jenkins (1): fix typo (#1465) Kian Meng Ang (1): Fix typos (#1212) Krishna Chaitanya (5): Implement constructive.build_area (GEOS 3.8.0+) (pygeos/pygeos#141) Implement wrappers for Fréchet distance under measurement module (3.7.0+) (pygeos/pygeos#144) Fix typo in haussdorf -> 
hausdorff (pygeos/pygeos#151) Refactor GEOS_SINCE_3X0 to GEOS_SINCE_3_X_0 (pygeos/pygeos#152) Implement GEOSCoverageUnion_r under set_operations (GEOS 3.8.0+) (pygeos/pygeos#142) Kyle Barron (1): DOC: Fix typo in `from_ragged_array` docstring (#1658) Martin Fleischmann (3): DOC: add missing modules (pygeos/pygeos#136) ENH: add minimum_bounding_circle and minimum_bounding_radius (pygeos/pygeos#315) ENH: add oriented_envelope (minimum rotated rectangle) (pygeos/pygeos#314) Martin Lackner (1): Fix typos in docstring of pygeos.creation.box (pygeos/pygeos#191) Mike Taves (38): MAINT,DOC: change RTD channel to just 'defaults', pin versions (pygeos/pygeos#106) Rename normalize -> normalized in linear referencing functions (pygeos/pygeos#209) TST: Use native CI build tools to allow flexible GEOS versions (pygeos/pygeos#219) FIX: handle non-integer GEOS_VERSION_PATCH like 0beta1 as 0 (pygeos/pygeos#262) CI: bump versions to latest GEOS-3.9.1 (pygeos/pygeos#300) TST: ubuntu-16.04 about to reach EOL; upgrade to ubuntu-latest (pygeos/pygeos#347) TST: rename head branch for GEOS from 'master' to 'main' (pygeos/pygeos#360) Increment testing for geos, python and numpy versions for 2021 (pygeos/pygeos#409) Update README for 'main' branch (#1209) CI: upgrade GEOS versions, fix "Run doctests" (pygeos/pygeos#422) DOC: Update URLs for GEOS, PostGIS and use HTTPS for Code of Conduct TST: linearring closing logic was changed with pygeos (#1232) BUG/TST: fix hashing for Polygon + update tests (#1250) Update MANIFEST.in to enable 'python -m build` to work (#1249) Fix GitHub Actions badge svg, remove appveyor placeholder (#1273) CI: upgrade GEOS patch versions for tests and release (#1408) CI: add testing for 2022, including GEOS-3.11 (#1437) PERF: use numpy matmul for affine transformation (#1418) Require Python 3.7+, NumPy 1.14+ (#1453) CI: macOS-10.15 is deprecated, upgrade to macOS-11 (#1458) Move static metadata from setup.py to pyproject.toml, add sdist check (#1426) MAINT: remove 
workarounds when numpy was optional (#1461) MAINT: use modern Python coding styles (from pyupgrade) (#1462) CLN: Rename `__p__`→`_parent`, clean-up `gtag`, `_factory` and `__rings__` (#1467) DEPR: deprecate the type attribute (#1492) DOC: update migration guide to use 'packaging' instead of 'distutils' (#1502) TST: change WKT tests with inconsistent dimensionality for GEOS 3.12 (#1542) Add CITATION.cff, remove CITATION.txt (#1455) TST: clean-up test_doctests (with binascii_hex), refactor test_hash (#1586) Raise ValueError for non-finite distance to buffer/offset_curve (#1522) ENH: add `__format__` specification for geometry types (#1556) MAINT: Add Python 3.11 classifier and upgrade to GEOS 3.11.1; other CI upgrades (#1607) MAINT: remove requirements-dev.txt; expand optional dependencies in pyproject.toml (#1606) DOC: clean-up docstrings and Sphinx markup for deprecated functions (#1611) DEP: remove 'preserve_topology' from 'set_precision()' (#1612) DEP: change `almost_equals()` to be removed from version 2.0 to 2.1 (#1604) CLN/DOC: credits, miscellaneous clean-up, file modes, doc touchup (#1629) BLD: pin to Cython~=0.29, ignore .c and .pyx files in wheels (#1632) Phil Chiu (1): PERF: vectorized implementation for signed area (#1323) Tom Clancy (1): Add additional benchmarks (pygeos/pygeos#145) dependabot[bot] (4): Bump actions/setup-python from 2 to 4 (#1640) Bump actions/checkout from 2 to 3 (#1644) Bump pre-commit/action from 2.0.0 to 3.0.0 (#1642) Bump pypa/cibuildwheel from 2.10.2 to 2.11.2 (#1643) enrico ferreguti (3): TST: rewrite tests using pytest assert (#1505) Make parallel_offset an alias for offset_curve (#1510) TST: assert rewriting in tests (#1514) gpapadok (2): ENH: Add segmentize method to BaseGeometry (#1434) ENH: Add reverse method to BaseGeometry (#1457) mattijn (1): Reintroduce shared_paths (pygeos/pygeos#77) odidev (1): Add linux aarch64 wheel build support (pygeos/pygeos#386)
@brendan-ward @caspervdw I experimented today with a possible alternative approach to supporting prepared geometries, compared with #84.
I added a second static attribute to the `GeometryObject` struct, which can optionally (for a prepared geom) be populated.
For me, the main reason I wanted to test this approach is that it keeps the geometry and its prepared version together. This means that the user doesn't need to keep track of a separate array with the prepared versions of their data (which could then get out of sync, etc.): you just have your single array of geometries, which can also be prepared.
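A rough sketch of the layout idea, modelled with `ctypes` rather than the actual C source. The field names `ptr` and `ptr_prepared` are assumptions for illustration, and the two header fields stand in for `PyObject_HEAD` on a standard CPython build. It shows that the optional second attribute costs one pointer per geometry object, whether or not the geometry is ever prepared.

```python
import ctypes


class GeometryObjectBefore(ctypes.Structure):
    # Stand-in for PyObject_HEAD plus a single pointer to the GEOS geometry.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ptr", ctypes.c_void_p),  # hypothetical name: the GEOS geometry
    ]


class GeometryObjectAfter(ctypes.Structure):
    # Same layout with an extra, optional pointer to the prepared geometry.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ptr", ctypes.c_void_p),
        ("ptr_prepared", ctypes.c_void_p),  # hypothetical name: NULL until prepared
    ]


# Per-object overhead of the extra attribute, in bytes.
extra_bytes = ctypes.sizeof(GeometryObjectAfter) - ctypes.sizeof(GeometryObjectBefore)
print(f"extra bytes per geometry object: {extra_bytes}")
```

The overhead is fixed per object, so it weighs more for small geometries such as points than for polygons with many coordinates.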
The ufuncs that I added here are just a quick hack to get something working. But from a quick test, it seems to be working:
So that shows a POC of having the prepared geometries in the same array, and also shows the nice speed-up that you can get from that!
Some additional thoughts on this:
It adds some memory usage, even when not using prepared geometries (the extra static attribute). If your geometries have a certain size (e.g. polygons), this will probably not be very significant, but for points it might be more substantial.
I now made a separate `intersects_prepared` version, mainly to be able to compare with the existing `intersects`. In principle, this can be a single function that can handle both (it seems the extra check doesn't give much slowdown, at least on this test case).
I added a ufunc to prepare the geometries (based on WIP: support for prepared geometries #84). We maybe also want to "unprepare" in case you want to get rid of some memory usage.
But in principle, the predicates that support prepared geoms could also "prepare on the fly". That might make it even easier for the user, but gives a bit less control. Most of the time you probably just want pygeos to do it under the hood for you, but there might be cases where prepared geoms are less interesting (I am not too familiar with this; we should probably try to get some good test data of various kinds for such things).
It modifies the Geometry objects in place in some way. It doesn't change their meaning, though (it doesn't change coordinates), so I would say this doesn't violate the principle of an immutable geometry object.
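The points above can be sketched with a toy model in pure Python. Nothing here is the pygeos API: the `Geometry` class, the `cells` attribute, the `_prepared` slot and the `prepare_on_the_fly` flag are all hypothetical, and a `frozenset` lookup stands in for a GEOS prepared geometry. The sketch shows a single predicate that uses the cached prepared form when present and falls back otherwise, explicit prepare/unprepare helpers, and an optional on-the-fly mode.

```python
class Geometry:
    """Toy geometry: a collection of integer grid cells (hypothetical stand-in)."""

    __slots__ = ("cells", "_prepared")

    def __init__(self, cells):
        self.cells = tuple(cells)
        self._prepared = None  # optional cache, analogous to the extra struct attribute


def prepare(geoms):
    """Populate the cached lookup structure in place; the cells are untouched."""
    for geom in geoms:
        if geom._prepared is None:
            geom._prepared = frozenset(geom.cells)


def unprepare(geoms):
    """Drop the cached structure again to release its memory."""
    for geom in geoms:
        geom._prepared = None


def intersects(a, b, prepare_on_the_fly=False):
    """Single predicate handling both prepared and unprepared inputs."""
    if a._prepared is None and prepare_on_the_fly:
        prepare([a])
    if a._prepared is not None:
        # Fast path: constant-time membership tests against the cache.
        return any(cell in a._prepared for cell in b.cells)
    # Fallback: linear scan over the raw cells.
    return any(cell in a.cells for cell in b.cells)


a = Geometry([(0, 0), (1, 1)])
b = Geometry([(1, 1)])
c = Geometry([(5, 5)])

unprepared_results = (intersects(a, b), intersects(a, c))
prepare([a])
prepared_results = (intersects(a, b), intersects(a, c))
```

In this model, preparing changes only the cache slot: `a.cells` and the predicate results are the same before and after, which mirrors the argument that the object stays immutable in meaning.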