Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve geos multipolygon creation performance #251

Merged
merged 3 commits into from
Apr 10, 2021

Conversation

Quiwin
Copy link
Contributor

@Quiwin Quiwin commented Mar 30, 2021

Replace a manual validity check on MultiPolygon, which was iterating on
each polygon combination, by a direct call of geos isValid.

Summary

Hello !

MultiPolygon creation was slower than expected on large MultiPolygon, it was due to an old check for validity which manually iterated on every polygon combination of the given multiPolygon.
Now directly call geos to check for validity.

Other Information

Tested with above geojson:

geojson = IO.read("france.geojson")
require 'micro_bench'
MicroBench.start;RGeo::GeoJSON.decode(geojson, geo_factory: RGeo::Cartesian.preferred_factory); puts "#{MicroBench.duration.round(2)} sec"

Before: 1.95 sec
After: 0.35 sec

I couldn't find the exact geos issue it tried to avoid
but I dont think there is much risk since the previous code written to circumvent it was more than 11 years ago c44767e

Thanks!

@Quiwin Quiwin force-pushed the improve-multipolygon-creation-performance branch from af133c0 to bbef1ec Compare March 30, 2021 13:32
@Quiwin Quiwin marked this pull request as ready for review March 30, 2021 13:45
@BuonOmo
Copy link
Member

BuonOmo commented Mar 31, 2021

Looking at history of the IsValidOp.h I don't see the point of that comment or code either.

@Quiwin knowing your sources, I guess your france.geojson polygon is this one.

I'm more in favor than not : it seems genuinely better.

@keithdoggett do you know the historical reason there, or do you think we should ping Daniel ? Also about the failing tests, we should work on that asap !

Comment on lines 92 to 95
if (collection &&
type == GEOS_MULTIPOLYGON
&& (factory_data->flags & 1) == 0
&& (GEOSisValid_r(geos_context, collection) == 0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small cosmetic change that feels clearer !

Suggested change
if (collection &&
type == GEOS_MULTIPOLYGON
&& (factory_data->flags & 1) == 0
&& (GEOSisValid_r(geos_context, collection) == 0)) {
if (collection
&& type == GEOS_MULTIPOLYGON
&& !(factory_data->flags & 1)
&& !GEOSisValid_r(geos_context, collection)) {

@BuonOmo BuonOmo removed their assignment Mar 31, 2021
@keithdoggett
Copy link
Member

Looks good! I'm not sure of the historical purpose of those extra checks. It seems that GeosIsValid supports all geometries, and this isn't used to identify a specific issue with validity since it doesn't do anything differently based on which check it fails.

@BuonOmo are all of the checks failing because of a floating-point comparison issue? I think I have a fix for that in the v3-dev branch we can probably cherry-pick.

@keithdoggett keithdoggett assigned BuonOmo and unassigned keithdoggett Apr 5, 2021
@BuonOmo
Copy link
Member

BuonOmo commented Apr 6, 2021

As discussed over the phone with @keithdoggett, we should merge this one into the v3-dev to avoid having to backport of all @keithdoggett's work on floating point issues. @Quiwin could you rebase on this branch and we'll merge it there.

That will be one more great reason for using rgeo 3.0 😄

And could you also add a note to the changelog ? (History.md file)

@BuonOmo BuonOmo assigned Quiwin and unassigned BuonOmo Apr 6, 2021
@Quiwin Quiwin force-pushed the improve-multipolygon-creation-performance branch from f1482c1 to d330b27 Compare April 7, 2021 15:45
@Quiwin Quiwin changed the base branch from master to v3-dev April 7, 2021 15:46
@Quiwin Quiwin force-pushed the improve-multipolygon-creation-performance branch from d330b27 to f1482c1 Compare April 7, 2021 16:05
@Quiwin Quiwin changed the base branch from v3-dev to master April 7, 2021 16:05
@Quiwin Quiwin force-pushed the improve-multipolygon-creation-performance branch from f1482c1 to 338331b Compare April 8, 2021 15:41
@Quiwin Quiwin changed the base branch from master to v3-dev April 8, 2021 15:42
@Quiwin
Copy link
Contributor Author

Quiwin commented Apr 8, 2021

As discussed over the phone with @keithdoggett, we should merge this one into the v3-dev to avoid having to backport of all @keithdoggett's work on floating point issues. @Quiwin could you rebase on this branch and we'll merge it there.

That will be one more great reason for using rgeo 3.0 😄

And could you also add a note to the changelog ? (History.md file)

@BuonOmo Done :)

@Quiwin Quiwin removed their assignment Apr 8, 2021
Replace a manual validity check on MultiPolygon, which was iterating on
each polygon combination, by a direct call of geos isValid.
@BuonOmo BuonOmo force-pushed the improve-multipolygon-creation-performance branch from 338331b to 77e543c Compare April 10, 2021 12:01
@BuonOmo BuonOmo force-pushed the improve-multipolygon-creation-performance branch from 77e543c to 61f0ec3 Compare April 10, 2021 12:04
@BuonOmo BuonOmo merged commit 322eceb into rgeo:v3-dev Apr 10, 2021
@BuonOmo BuonOmo deleted the improve-multipolygon-creation-performance branch April 10, 2021 12:09
BuonOmo pushed a commit that referenced this pull request Apr 16, 2021
Replace a manual validity check on MultiPolygon, which was iterating on
each polygon combination, by a direct call of GEOSisValid.
BuonOmo pushed a commit that referenced this pull request Jan 13, 2022
Replace a manual validity check on MultiPolygon, which was iterating on
each polygon combination, by a direct call of GEOSisValid.
BuonOmo pushed a commit that referenced this pull request Jan 13, 2022
Replace a manual validity check on MultiPolygon, which was iterating on
each polygon combination, by a direct call of GEOSisValid.
BuonOmo pushed a commit that referenced this pull request Mar 17, 2022
Replace a manual validity check on MultiPolygon, which was iterating on
each polygon combination, by a direct call of GEOSisValid.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants