Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ST_AsMVTGeom: Fix invalid geometries when clipping #382

Closed
wants to merge 17 commits into from

Conversation

@Algunenano
Copy link
Member

commented Mar 12, 2019

Fix for https://trac.osgeo.org/postgis/ticket/4348

This PR does several things, some of which I'm only doing because the alternative is leaving wholes in map and apparently people don't like that:

The main change is that now, when an invalid clipping is detected (inverted as described in https://trac.osgeo.org/geos/ticket/959), we autofix the geometry and try again before discarding it. This is done because GEOSClipByRect gives an invalid output when passing an invalid geometry (I didn't see a way to just fix what we had broken).

When working on this, I saw that the method I was using to detect this was pretty limited, as It checked the bounding box of the whole geometry so in the case of multipolygons it could fail to detect this issue. To workaround this, I decided to clip now per subgeometry (polygons) instead of the whole geometry.

Another important change is the introduction of a snap to grid function specific to MVT polygons, which not only snaps to integer values using a simpler function but also removes spikes from the input. This is done in order to minimize the issues when clipping and ease the calls to make_valid.

A final change is that now, instead of calling make_valid and after calling snap to grid again to make sure that the output coordinates are ints, we do:

  • [1] Snap to grid after clipping.
  • [2] Check if it's valid. If not valid, go to 3. If valid go to 4.
  • [3] Make valid and grid to int again. Go back to 2 to check if it's valid.
  • [4] If it input isn't valid (too many iterations, 3), discard the geometry.

Perf diff: mvt_geos_trunk_vs_mvt_geos_fix.pdf

Perf report:

  • Points and lines are pretty much the same.
  • Multipolygons are now faster (1.2x as fast). I'm not sure why, the same optimization I introduced here in postgis (check bounding box per subpolygon) is present in Geos. Could be the cost of the transformation to geos and back.
  • There is a big impact in the Canada tiles with low zoom levels (0, 3), where the new path is 0.5x as fast as the old one.
    -- Canada [0,0,0]: Checking the query we see that previous to the fix 5/13 output polygons were invalid. After the fix 13 / 13 are valid.
    -- Canada [3,1,1]: Before the fix 2/3 output geometries were invalid. After the fix 3/3 are valid.
    So taking this performance hit is necessary to have valid geometries with GEOS.

Keep in mind that this only affects the GEOS path, Wagyu was already working correctly (and faster).

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 12, 2019

See also comments on https://trac.osgeo.org/geos/ticket/959

GEOS MakeValid is going to be veeery slow due to all the overlay ops it runs. It's probably better to look for a simpler/lower-level way to clip to rectangle for MVT use.

@Algunenano

This comment has been minimized.

Copy link
Member Author

commented Mar 13, 2019

Ok, so any replacement/improvement to GEOS-based MVT clipping has to produce valid integer geoms.

The ideal situation, is it would accept invalid geometries, clip them and return a valid geometry using integer precision. And don't forget the extra unicorns 😺

GEOS MakeValid is going to be veeery slow due to all the overlay ops it runs. It's probably better to look for a simpler/lower-level way to clip to rectangle for MVT use.

It would have to be exposed by MakeValid?, however. And MakeValid? is doing a LOT of complex processing, so not sure that's even feasible. Also, for the kind of invalidity produced by brute-force clipping to a rectangle (as in quick_clip), MakeValid? is probably overkill.

I'm certainly not going to complain if I can get a better Geos based alternative for MVTs in 3.0; just keep in mind that I've already introduced wagyu for that, which does the quick (and invalid clipping) and then a robust validation with int coordinates (which also turns out to be faster than Postgis' MakeValid but I haven't tested performance for Geos implementation yet).

It might be possible to just node the clipped linework (using int precision), and then polygonize. I will try and experiment with this. If that works, it would provide a fairly simple GEOS-based rectangle clipper for MVT use.

I won't stop you, but to be honest I'm on this mess because I want to fix for Postgis 2.4 and 2.5, where we won't apply improvements to Geos.. If this was 3.0 alone I'd be telling people to use wagyu instead, but again having an better alternative based purely on GEOS would be nice.

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 13, 2019

It might be possible to just node the clipped linework (using int precision), and then polygonize. I will try and experiment with this. If that works, it would provide a fairly simple GEOS-based rectangle clipper for MVT use.

I won't stop you, but to be honest I'm on this mess because I want to fix for Postgis 2.4 and 2.5, where we won't apply improvements to Geos.. If this was 3.0 alone I'd be telling people to use wagyu instead, but again having an better alternative based purely on GEOS would be nice.

This probably won't require any changes to GEOS, just a different way of calling the API it provides.

And agreed that having a GEOS alternative would be nice, in case something goes sideways in wagyu-land.

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2019

I have committed a prototype JTS RectangleClipPolygon implementation.

This uses Sutherland-Hodgson polygon clipping (same as in wagyu). This is faster, more robust, and less sensitive to invalid input than using Geometry intersection. It also supports rounding to a precision model.

It uses the buffer(0) trick to create valid topology for the output polygon. This is a bit of a hack, but as long as the input isn't too invalid in the clipping area it should work well and be reasonably fast.

The implementation should be portable into C/C++ using some calls to GEOS. So this might provide an alternative to using wagyu. It would be interesting to see how the performance compares.

(Note there's a few TODOs in the JTS code, but the general approach works and should be easy to extend).

@pramsey

This comment has been minimized.

Copy link
Member

commented Mar 14, 2019

There's already a slot for it in the CAPI, so that can just be backfilled with the new implementation... https://github.com/libgeos/geos/blob/master/capi/geos_c.h.in#L586

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2019

There's already a slot for it in the CAPI, so that can just be backfilled with the new implementation... https://github.com/libgeos/geos/blob/master/capi/geos_c.h.in#L586

Just to be clear, the JTS class currently only handle polygons. And because of the way Sutherland-Hodgson works, probably should be limited to this. But it will be easy to handle line and point clipping, and then bundle everything up in top-level class to clip arbitrary geometry. Then this should be a full replacement for GEOSClipByRect (need to confirm performance and unit tests, of course).

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2019

Further to the idea of replacing GEOSClipByRect, that method has some functionality that might be hard to replace (or decide to abandon). Such as this test, which seems to return a polygon hole in the clip area as a polygon. These tests may be in PostGIS too?

Perhaps @strk should be involved to speak to the functionality.

An easier short-term plan would be to use the JTS approach in the MVT module only.

@Algunenano

This comment has been minimized.

Copy link
Member Author

commented Mar 18, 2019

Further to the idea of replacing GEOSClipByRect, that method has some functionality that might be hard to replace (or decide to abandon). Such as this test, which seems to return a polygon hole in the clip area as a polygon. These tests may be in PostGIS too?

I'm not sure why this functionality was introduced, and it's similar in Postgis' makeValid (a zero width polygon will become a line for example), but I had to actively workaround this kind behaviour. MVT doesn't support geometry collections and switching geometry types is a discouraged/unsupported in the renderers. A polygon should always stay a (multi-)polygon or disappear; and the same is applicable to inner rings / holes.

@@ -983,13 +1130,13 @@ LWGEOM *mvt_geom(LWGEOM *lwgeom, const GBOX *gbox, uint32_t extent, uint32_t buf

resx = width / extent;
resy = height / extent;
res = (resx < resy ? resx : resy)/2;
res = (resx < resy ? resx : resy) / 3;

This comment has been minimized.

Copy link
@Komzpa

Komzpa Mar 19, 2019

Member

why 3?

This comment has been minimized.

Copy link
@Algunenano

Algunenano Mar 19, 2019

Author Member

I was doing some testing around invalidity. It seems that I have to stop using GEOSClipByRect altogether because of how many things are broken with it and it's impossible to detect them all. I'll try switching to GEOSIntersection + handling invalid.

This comment has been minimized.

Copy link
@Komzpa

Komzpa Mar 19, 2019

Member

I am glad you see now why I initially replaced the ClipByBox2D with Intersection globally :)

This comment has been minimized.

Copy link
@Algunenano

Algunenano Mar 19, 2019

Author Member

Yup, I totally get it now.

This comment has been minimized.

Copy link
@dr-jts

dr-jts Mar 19, 2019

Contributor

Will using GEOSIntersection cause problems because it requires valid input? If you have to run MakeValid on the input that will slow things down.

This comment has been minimized.

Copy link
@Algunenano

Algunenano Mar 19, 2019

Author Member

Will using GEOSIntersection cause problems because it requires valid input?

Yes, what I'm doing now in this PR is to use GEOSIntersection and, if I get an error, run MakeValid in the input and retry. It's far from optimal (GEOSIntersection is already slower that GEOSClipByRect) but at least I get correct results for MVT.

This comment has been minimized.

Copy link
@Komzpa

Komzpa Mar 19, 2019

Member

@dr-jts it causes problems as it is more-than-linear complexity. @Algunenano performed benchmarks and it is much slower on box clipping on larger polygons. Would be perfect to just make it faster in newer GEOS.

This comment has been minimized.

Copy link
@Algunenano

Algunenano Mar 19, 2019

Author Member

The only nice part that has come up out of this, is that since now I'm clipping per polygon instead of full multipolygons some I'm seeing performance improvements in the tests where most geometries in the multipolygons were outside. OTOH, I get a 100% slower timings on other tests because of how slow it is validating + instersecting.

This comment has been minimized.

Copy link
@dr-jts

dr-jts Mar 19, 2019

Contributor

@Komzpa Yes, would be very good to improve GEOSClipByRect using the quick-clip (Sutherland-Hodgson) algorithm as in new JTS code. Will need to add a capability to clip lines and points, but that's easy compared to polygons. And will have to decide what (if any) of the original semantics to keep (since the current GEOS clipping code seems to have some functionality which may just be side-effects and not used - e.g. converting clipped holes into lines).

It would be possible to port the code now, but only by using the buffer(0) hack to fix topology. And it isn't 100% reliable. So probably the changeover should wait until there is a proper topology building pipeline - which will come soon I hope.

lwgeom = mvt_clip_and_validate(lwgeom, basic_type, extent, buffer, clip_geom);
if (lwgeom == NULL || lwgeom_is_empty(lwgeom))
clipped = mvt_clip_and_validate(lwgeom, basic_type, extent, buffer, clip_geom);
if (clipped == NULL || lwgeom_is_empty(clipped))

This comment has been minimized.

Copy link
@Komzpa

Komzpa Mar 19, 2019

Member

how about

Suggested change
if (clipped == NULL || lwgeom_is_empty(clipped))
if (!clipped || lwgeom_is_empty(clipped))
@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

For the record, my current hypothesis is that the fastest MVT algorithm will be:

  1. Clip polygon to rectangle (using port of JTS code).
    • The topology-fixing step is omitted since this will be done later
    • Does not require valid input
  2. Transform to tile space
  3. Snap-round to integer coordinates
    • this will fix invalidity introduced in step 1

This isn't all available yet - I will be working on step 3 in the coming months.

@Algunenano

This comment has been minimized.

Copy link
Member Author

commented Mar 19, 2019

For the record, my current hypothesis is that the fastest MVT algorithm will be:

Sounds fine to me. One thing to note is that I switched from clipping and then transforming to the other way around to remove hacks around the bounding box calculation; I had to add an extra 0.499... space around the box since those points would fall into the grid after transforming them.

@Komzpa

This comment has been minimized.

Copy link
Member

commented Mar 19, 2019

for lines and points, we can go with lwgeom_clip_to_ordinate_range twice (for x and y) - it's lwgeom-only and supposedly faster than geos due to no conversion back and forth.

@Algunenano

This comment has been minimized.

Copy link
Member Author

commented Mar 19, 2019

for lines and points, we can go with lwgeom_clip_to_ordinate_range twice (for x and y) - it's lwgeom-only and supposedly faster than geos due to no conversion back and forth.

Thanks for the idea. I'll note it down and test it once I merged this.

I have another improvement pending for 3.0 which is to not simplify/remove repeated points when most of the geometry is outside of the clipping area. I've seen 4x improvements by removing simplification but I need to find a good enough heuristic to decide when it's worth it.

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

@Algunenano Do you really need to simplify at all, if there was an efficient clipping algorithm? Or does the combination of clipping off some points and then rounding the rest (with repeated point removal) suffice? In other words, is there any need to simplify linework inside the rectangle?

@Algunenano

This comment has been minimized.

Copy link
Member Author

commented Mar 19, 2019

@Algunenano Do you really need to simplify at all, if there was an efficient clipping algorithm? Or does the combination of clipping off some points and then rounding the rest (with repeated point removal) suffice? In other words, is there any need to simplify linework inside the rectangle?

No, we simplify because it's faster to presimplify the geometries before the rest of the process, but it might be just because the simplification functions are more optimized (and scale linearly). I found out yesterday that there are some scenarios (big geometry with most points outside the clipping area) where simplificating the input was slower than not doing it, so I'll revisit that asumption soon with more tests.

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

we simplify because it's faster to presimplify the geometries before the rest of the process, but it might be just because the simplification functions are more optimized (and scale linearly).

That makes sense. Simplification is actually slighly slower-than-linear, however. So if clipping can be made faster then it's probably better to not simplify.

there are some scenarios (big geometry with most points outside the clipping area) where simplificating the input was slower than not doing it, so I'll revisit that asumption soon with more tests.

That's interesting. Which clipping algorithm is this using? I'd be pleasantly surprised if GEOSIntersection was faster than simpliflication, even in this case (since GEOSIntersection does some heavy processing on the entire geometry, even the parts which will be removed by clipping). That should be an advantage of the S-H clipping - it quickly removes vertices outside the clip area, without doing any further processing on them.

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

I'd be pleasantly surprised if GEOSIntersection was faster than simpliflication, even in this case (since GEOSIntersection does some heavy processing on the entire geometry, even the parts which will be removed by clipping). That should be an advantage of the S-H clipping - it quickly removes vertices outside the clip area, without doing any further processing on them.

That said, I'm find that in JTS the S-H clipping algorithm usually is not much faster than JTS intersection. This is puzzling.. have to investigate further.

@Algunenano

This comment has been minimized.

Copy link
Member Author

commented Mar 19, 2019

That's interesting. Which clipping algorithm is this using? I'd be pleasantly surprised if GEOSIntersection was faster than simpliflication, even in this case (since GEOSIntersection does some heavy processing on the entire geometry, even the parts which will be removed by clipping). That should be an advantage of the S-H clipping - it quickly removes vertices outside the clip area, without doing any further processing on them.

I tested this with both geos GEOSClipByRect and wagyu and, in that test case, both were faster when the simplification was off.

The test case was a big polygon (1.2m vertex) were the tile was much smaller. Since the simplification is iterating through the whole geometry without removing that many points (they are further away than the tolerance distance), the time spent in the simplification isn't worth it; that is, it's not that clipping is faster, but that the time spent presimplificating the geometry is bigger than the time saved in the clipping process by removing extra points.

Theoretically, if the other operations (snapping to grid and clipping) had a similar performance as simplifying, simplification wouldn't be a net performance improvement and could be removed*.

  • Some functionality would need to be kept for points and lines to remove duplicates, since we don't do a post-validation of those.

@strk strk closed this in b0d9d12 Mar 21, 2019

@dr-jts

This comment has been minimized.

Copy link
Contributor

commented Mar 23, 2019

I'm find that in JTS the S-H clipping algorithm usually is not much faster than JTS intersection. This is puzzling.. have to investigate further.

Further testing has shown that in JTS SH clipping is MUCH faster than using intersection() (on the order of 40-100 time faster in some cases). This makes sense - the clipping algorithm is simplere and doing much less work. So this is a direction worth pursuing (especially given the better robustness).

I did note that when using full JTS intersection it's substantially faster to test intersects() first, and only compute intersection() if the geometries do in fact intersect. One reason for this is that rectangle intersects is optimized using special code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.