Improve Geometry::Union() performance using clustering #692

dbaston · 2022-09-27T03:33:14Z

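This PR implements a pre-clustering step for Geometry::Union() that divides the inputs into disjoint sets before performing the union operation. Performance testing shows a large benefit (roughly 75% to 87% less run time on the world and subdiv datasets) for inputs with many disjoint polygons, and little or no penalty on most datasets with no disjoint polygons. The exceptions are the voronoi and vt parcels datasets, which slow down by roughly 11% to 24%.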
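Here is a minimal sketch of the pre-clustering idea. It is in Python, with 1-D intervals standing in for polygons so that it runs without GEOS. It is not the GEOS implementation: the union-find structure, the quadratic pair test, and the names `disjoint_subset_union` and `union_intervals` are assumptions for illustration. Inputs that overlap, directly or through a chain of other inputs, land in one cluster. Each cluster is unioned on its own, and because clusters are mutually disjoint, their results are concatenated without any further overlay.

```python
from collections import defaultdict


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union_intervals(parts):
    """Union a list of (lo, hi) intervals into non-overlapping intervals."""
    merged = []
    for lo, hi in sorted(parts):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def disjoint_subset_union(intervals):
    """Cluster intersecting inputs, then union each cluster independently."""
    n = len(intervals)
    parent = list(range(n))
    # O(n^2) pair test for brevity; a spatial index would normally supply candidates.
    for i in range(n):
        for j in range(i + 1, n):
            (a_lo, a_hi), (b_lo, b_hi) = intervals[i], intervals[j]
            if a_lo <= b_hi and b_lo <= a_hi:
                parent[find(parent, i)] = find(parent, j)
    clusters = defaultdict(list)
    for i, iv in enumerate(intervals):
        clusters[find(parent, i)].append(iv)
    # Clusters never overlap each other, so their unions are simply concatenated.
    result = []
    for members in clusters.values():
        result.extend(union_intervals(members))
    return result


print(disjoint_subset_union([(0, 2), (10, 12), (1, 3), (11, 15), (20, 21)]))
# [(0, 3), (10, 15), (20, 21)]
```

The payoff in the real operation is that the expensive overlay step only ever sees inputs from a single cluster.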

I tried using envelope intersection, geometry intersection, and geometry distance as the basis for the clustering. The results favor geometry distance: its performance is essentially equivalent to geometry intersection, and it is safe in the case where snap-rounding causes disjoint inputs to produce connected outputs. Timings are below, measured with `geosop union`. The Iterations column gives the number of repeats (`-r`) for each dataset: 5 for most datasets, 1 for subdiv and vt parcels, and 50 for voronoi.
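The choice of predicate matters, and a small Python sketch makes this concrete. It counts clusters under an envelope-intersection predicate and under a distance predicate. Disks stand in for polygons because their exact distance has a closed form. The disk data, the tolerance values 0.0 and 0.6, and all function names are hypothetical.

In the data, diagonal neighbors have overlapping bounding boxes while the disks themselves are disjoint, so envelope clustering merges them into fewer, larger clusters. The Geometry Clusters and Env Clusters columns for world and subdiv show the same effect. The positive tolerance is only an assumed way to keep nearly touching inputs, which snap-rounding might connect, in the same cluster. Whether GEOS uses a nonzero tolerance is not stated here.

```python
import math


def envelope(disk):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a disk (x, y, r)."""
    x, y, r = disk
    return (x - r, y - r, x + r, y + r)


def envelopes_intersect(a, b):
    ax0, ay0, ax1, ay1 = envelope(a)
    bx0, by0, bx1, by1 = envelope(b)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def disk_distance(a, b):
    """Exact distance between two disks; 0.0 if they overlap or touch."""
    (ax, ay, ar), (bx, by, br) = a, b
    return max(0.0, math.hypot(ax - bx, ay - by) - ar - br)


def count_clusters(geoms, connected):
    """Number of connected components under the pairwise predicate `connected`."""
    parent = list(range(len(geoms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(geoms)):
        for j in range(i + 1, len(geoms)):
            if connected(geoms[i], geoms[j]):
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(geoms))})


# Unit disks on a diagonal: bounding boxes overlap, but the disks are about 0.55 apart.
disks = [(0, 0, 1), (1.8, 1.8, 1), (3.6, 3.6, 1), (10, 0, 1)]
print(count_clusters(disks, envelopes_intersect))                      # 2
print(count_clusters(disks, lambda a, b: disk_distance(a, b) <= 0.0))  # 4
print(count_clusters(disks, lambda a, b: disk_distance(a, b) <= 0.6))  # 2
```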

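| Dataset | Iterations | Geometry Clusters | `main` [s] | Geom isect [s] | Speedup | Geom dist [s] | Speedup | Env isect [s] | Speedup | Env Clusters |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| world | 5 | 8226 | 12.496 | 1.62 | 87.0% | 1.86 | 85.1% | 7.42 | 40.6% | 2549 |
| watersheds | 5 | 1 | 11.273 | 11.1 | 1.5% | 10.94 | 3.0% | 11.33 | -0.5% | 1 |
| watersheds_buf | 5 | 1 | 15.414 | 15.87 | -3.0% | 15.83 | -2.7% | 15.18 | 1.5% | 1 |
| counties | 5 | 98 | 107.776 | 105.7 | 1.9% | 107.4 | 0.3% | 102.065 | 5.3% | 66 |
| subdiv | 1 | 175694 | 217 | 44.6 | 79.4% | 53 | 75.6% | 196 | 9.7% | 7278 |
| voronoi | 50 | 1 | 2.5 | | | 2.8 | -10.8% | | | |
| vt parcels | 1 | 1 | 90 | | | 110 | -23.6% | 94 | -5.6% | 1 |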
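The Speedup columns appear to be the fractional reduction in run time relative to `main`. This reading is inferred from the numbers, not stated in the PR:

```latex
\text{Speedup} = \frac{t_{\text{main}} - t_{\text{new}}}{t_{\text{main}}},
\qquad
\text{e.g. world, geom isect: } \frac{12.496 - 1.62}{12.496} \approx 0.870 = 87.0\%
```

A negative value means the clustered variant was slower. The voronoi and vt parcels rows do not reproduce exactly from the displayed times. For example, (2.5 − 2.8)/2.5 = −12%, versus −10.8% in the table, presumably because those times are rounded.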

The datasets are available at https://drive.google.com/drive/folders/1YNDsce_YiewgOiafPeJJOF4_AXlenX1C?usp=sharing

pramsey · 2022-09-27T15:12:28Z

The thing I don't understand here is: isn't CascadedUnion already effectively doing this? The tree puts geometries that are near each other early in the union order. Is this on top of, or in place of, CascadedUnion?

dbaston · 2022-09-27T16:22:48Z

This is on top of cascaded union. This prevents the pieces that don't intersect from going into the overlay engine together.

pramsey · 2022-09-27T16:25:08Z

Counter-intuitive. I'd have thought that, working bottom-up, the disjoint parts would mostly come together near the end and get no-opped quickly by an envelope intersection test.

dbaston · 2022-09-27T16:29:26Z

The "Geometry Clusters" and "Env Clusters" columns give some idea of this. Canadian subdivisions (the subdiv row) form 175k geometry clusters but only 7k envelope clusters.

dbaston · 2022-09-28T20:50:25Z

Maybe it's possible to do a reasonable algorithm selection based on the average number of vertices per polygon. Above a certain number of vertices per polygon, the union operation can be expected to be expensive enough that the pre-clustering won't slow things down much, even if it accomplishes nothing. Here are the vertex count distributions for the current test datasets. The datasets that see a meaningful slowdown (voronoi, vt parcels) have roughly two orders of magnitude fewer vertices than the others.

[Figure: vertex count distributions for the test datasets]
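The selection rule floated above could look something like this Python sketch. The threshold value, the strategy labels, and the function name are hypothetical. The PR does not propose a cut-off, and picking one would need benchmark data like the timings table.

```python
from statistics import mean

# Hypothetical cut-off: the PR does not propose a value.
MIN_MEAN_VERTICES_FOR_CLUSTERING = 100.0


def choose_union_strategy(vertex_counts, threshold=MIN_MEAN_VERTICES_FOR_CLUSTERING):
    """Pick a union strategy from per-polygon vertex counts.

    Returns "disjoint_subset" when polygons are, on average, complex enough that
    the extra clustering pass is likely cheap relative to the overlay work, and
    "cascaded" otherwise (including for empty input).
    """
    if not vertex_counts:
        return "cascaded"
    if mean(vertex_counts) >= threshold:
        return "disjoint_subset"
    return "cascaded"


print(choose_union_strategy([4, 5, 6, 5]))  # cascaded
print(choose_union_strategy([800, 1200]))   # disjoint_subset
```

A mean is used here to match the comment's wording. With heavily skewed distributions, a median or percentile might be a more robust signal.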

dbaston · 2023-02-21T00:11:06Z

Updated to revert changes to the default union strategy. Instead, `GEOSDisjointSubsetUnion` is exposed in the C API so clients can use it if they like.


dbaston added the Performance label Sep 27, 2022

dbaston mentioned this pull request Sep 27, 2022: Optimize prepared polygon distance #693 (merged)

dbaston force-pushed the disjoint-union branch from 4360e90 to db7f0ce on September 27, 2022 19:55

libgeos deleted a comment from dr-jts Sep 27, 2022

dbaston mentioned this pull request Jan 28, 2023: DisjointOperation.h: no viable conversion for move #812 (closed)

dbaston force-pushed the disjoint-union branch from db7f0ce to 7a7763f on February 21, 2023 00:09

dbaston added 3 commits February 28, 2023 19:27

Relax equality test in GEOSUnaryUnionTest (13df463)

Add DisjointSubsetUnion (e07ee41)

AbstractClusterFinder: Ensure that greatest complexity geometries are prepared (09d4f2f)

dbaston force-pushed the disjoint-union branch from 7a7763f to 09d4f2f on March 1, 2023 00:29

dbaston closed this Mar 1, 2023
