Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove geometry autofix behavior, for now #268

Closed

Conversation

dbaston
Copy link
Member

@dbaston dbaston commented Jul 10, 2018

This PR reverts the geometry autofixing behavior that was added for 2.5 to avoid conflict with a comprehensive, documented approach in 3.0.

Copy link
Member

@Komzpa Komzpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fixing behavior was initially implemented as part of fixing the box clipping issue and was extended to cover some more.

I propose to add a manual entry on "wrap critical geometry processing in ST_IsValid pre-check" to provide users with a rock solid cross-version stable way of opt out instead, since behavior is undefined anyway.

NOTICE: Your geometry dataset is not valid per OGC Specification. Please fix it with manual review of entries that are not ST_IsValid(geom). Retrying GEOS operation with ST_MakeValid of your input.
NOTICE: Self-intersection
5|MULTIPOLYGON(((2 2,5 5,8 2,2 2)),((5 5,2 8,8 8,5 5)))
ERROR: lwgeom_intersection: GEOS Error: TopologyException: Input geom 0 is invalid: Self-intersection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks old guarantees by ST_ClipByBox2D:

Topologically invalid input geometries do not result in exceptions being thrown.

It can also possibly break MVT generation as it's using clip after non-topology simplify in place:

lwgeom = lwgeom_clip_by_rect(lwgeom, x0, y0, x1, y1);

@Algunenano thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall, those guarantees were broken by earlier and unrelated work on ST_Subdivide

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work on ST_Subdivide was a bugfix of box clipping that shot down our processing, and was the reason for the need to MakeValid things in lwgeom_intersection :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed the notification, I'll have a look probably on Monday.
TBH, the more I use MVTs the more I wish it would only do coordinate transformations (or all the operations were optional and configurable).

NOTICE: Self-intersection
NOTICE: Your geometry dataset is not valid per OGC Specification. Please fix it with manual review of entries that are not ST_IsValid(geom). Retrying GEOS operation with ST_MakeValid of your input.
NOTICE: Self-intersection
#4037.4|POLYGON((0 0,2 2,2 8,0 10,10 10,8 8,8 2,10 0,0 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disable of this behavior needs NEWS entry and ticket reopen at very least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated NEWS.

@@ -781,24 +745,6 @@ lwgeom_symdifference(const LWGEOM* geom1, const LWGEOM* geom2)

g3 = GEOSSymDifference(g1, g2);

if (!g3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is noding considered OK in offsetcurve or you forgot about it? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're referring to line ~1488? I don't think there is a possibility for the noding to cause incorrect results. Is there?

NOTICE: Your geometry dataset is not valid per OGC Specification. Please fix it with manual review of entries that are not ST_IsValid(geom). Retrying GEOS operation with ST_MakeValid of your input.
NOTICE: Self-intersection
#4103|t
ERROR: lwgeom_pointonsurface: GEOS Error: TopologyException: Input geom 1 is invalid: Self-intersection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another ticket reopen, NEWS update?

NOTICE: Your geometry dataset is not valid per OGC Specification. Please fix it with manual review of entries that are not ST_IsValid(geom). Retrying GEOS operation with ST_MakeValid of your input.
NOTICE: Self-intersection
5|MULTIPOLYGON(((2 2,5 5,8 2,2 2)),((5 5,2 8,8 8,5 5)))
ERROR: lwgeom_intersection: GEOS Error: TopologyException: Input geom 0 is invalid: Self-intersection
6|MULTIPOLYGON(((2.5 2,5 4,5 5,10 5,10 2,2.5 2)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a test clipping an invalid polygon and there was no change due to the revert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is necessary because ST_ClipByBox2D now delegates to ST_Intersection

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, input is invalid, and ST_Intersection does not barf on it (and produces a counter-intuitive geometry).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm having trouble understanding the changes you're requesting.

@pramsey
Copy link
Member

pramsey commented Jul 10, 2018

Without looking at the code (I know! I suck!) but just at the test cases exposed here, can we catch the exceptions and run the isvalidreason on them for a nicer output, ala the current approach, but without automatically fixing things?

@pramsey
Copy link
Member

pramsey commented Jul 10, 2018

Actually, from a "user practicality" point of view, one of the issues the old "toss exception" behaviour has is that the user cannot easily figure out precisely what features are at issue. I guess maybe the isvalid reason w/ coordinates would be a quick'n'dirty path to find the approximate areas of concern.

@dbaston
Copy link
Member Author

dbaston commented Jul 10, 2018

can we catch the exceptions and run the isvalidreason on them for a nicer output

I did this initially, but we end up with redundant error messages, unless we decide to swallow the original error message from GEOS. After a failure, GEOS is already checking the validity of its inputs. Maybe it should be modified to include the coordinate in the exception message?

@dbaston
Copy link
Member Author

dbaston commented Jul 10, 2018

Maybe it should be modified to include the coordinate in the exception message?

It already does this, duh. We just don't include it in the _expected files (as we shouldn't)


<para>Performed by the GEOS module.</para>
<note><para>Requires GEOS 3.5.0+</para></note>

<para>Availability: 2.2.0 - requires GEOS &gt;= 3.5.0.</para>
<para>Changed: 2.5.0 - wrapper around ST_Intersection to work around GEOS bugs. </para>
<para>Changed: 2.5.0 - wrapper around ST_Intersection to work around GEOS bugs.
No longer supports invalid input geometry.</para>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks MVT output in cases simplified but not yet clipped geometry is invalid.

Copy link
Member Author

@dbaston dbaston Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess MVT code could be modified to check validity on intersection failure, and then run makevalid if needed? Test case would be great if you have one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently I've been comparing the outputs of using St_AsMVT and Mapnik (node-mapnik + mapnik-vector-tile) with the same geometries as input. Here is an image with some of the tests I've run, or the WIP PR with the geometries tested.

As you can see there, Mapnik drops the geometry when in finds a self-intersection, so the equivalent here would be ST_AsMVTGeom returning NULL. Now, for reasons I don't want to discuss right now (but probably will in the future) it was decided that our implementation was going to autofix geometries and that's what it has been doing up until now.

Take this query as an example:

SELECT 'PG43 - ON ', ST_AsText(ST_AsMVTGeom(
	ST_GeomFromText('POLYGON((-1 -1, 101 101, -1 101, 101 -1, -1 -1))'),
	ST_MakeBox2D(ST_Point(0, 0), ST_Point(100, 100)),
	10, 0, true));

With the current trunk implementation you get some NOTICE messages and the autofixed geometry:

NOTICE:  lwgeom_intersection: GEOS Error: TopologyException: Input geom 0 is invalid: Self-intersection at or near point 50 50 at 50 50
NOTICE:  Self-intersection at or near point 50 50
NOTICE:  Your geometry dataset is not valid per OGC Specification. Please fix it with manual review of entries that are not ST_IsValid(geom). Retrying GEOS operation with ST_MakeValid of your input.
NOTICE:  Self-intersection at or near point 50 50
  ?column?  |                         st_astext                          
------------+------------------------------------------------------------
 PG43 - ON  | MULTIPOLYGON(((0 10,5 5,10 10,0 10)),((5 5,0 0,10 0,5 5)))

With the changes proposed in this PR you get an error and the query is aborted:

ERROR:  lwgeom_intersection: GEOS Error: TopologyException: Input geom 0 is invalid: Self-intersection at or near point 50 50 at 50 50
template_postgis=#

Mainly for performance optimization, I would like St_AsMVTGeom to be more customizable, which in this case it'd mean that you could either choose to autofix the geometry or drop it ASAP, but I won't push for that in 2.5. For 2.5, I can live with either ST_AsMVTGeom returning NULL (better performance and conformance with Mapnik's implementation) or with it autofixing the geometry (current implementation), but I certainly cannot live with it throwing an exception and killing the query.

I guess MVT code could be modified to check validity on intersection failure, and then run makevalid if needed? Test case would be great if you have one.

I'm not very familiar with the geos code, but since you are calling GEOS_FREE_AND_FAIL which ends up calling lwerror, the mvt function doesn't regain control of the program so you can't check for an error and run lwgeom_make_valid. The only option would be to run it before clipping but I'm not sure how big of a performance hit that would mean for valid geometries. I'll try to have a look and make some perf tests.

Some extra notes:

  • In the process of looking into this issue I've noticed that in some corner cases we could avoid the clipping if we recalculate the bbox of the geometry after the simplification steps, but that recalculation isn't worth it in the general case. It's true that forcing validation before clipping would regenerate it, so there is that.,

  • I've also noticed that WIP: MVT_geom: Clip using tile units #267 can be used too for buffer == 0. I'll add a bunch of tests to trunk but that doesn't affect this issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Algunenano , thank you for the analysis. I see two options here:

  • You're right that we can't call lwgeom_intersection in the MVT code, but we could call GEOSIntersection directly and handle the error ourselves (drop or autofix, I have no opinion).
  • Alternatively, we could make lwgeom_clip_by_rect again call GEOSClipByRect (so it would no longer fail on invalid geometry), and then modify ST_Subdivide to call lwgeom_intersection instead of lwgeom_clip_by_rect. @Komzpa if I remember correctly, your changes to lwgeom_clip_by_rect were motivated by a ST_Subdivide issue, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've gone ahead and test the changes forcing validation:

		if (!gbox_contains_2d(&bgbox, lwgeom_gbox))
		{
			if (lwgeom->type == POLYGONTYPE ||
				lwgeom->type == MULTIPOLYGONTYPE ||
				lwgeom->type == COLLECTIONTYPE)
			{
				// TODO: Handle memory and errors properly
				lwgeom = lwgeom_make_valid(lwgeom);
			}
			double x0 = bgbox.xmin;
			double y0 = bgbox.ymin;
			double x1 = bgbox.xmax;
			double y1 = bgbox.ymax;
			lwgeom = lwgeom_clip_by_rect(lwgeom, x0, y0, x1, y1);
			POSTGIS_DEBUG(3, "mvt_geom: no geometry after clip");
			if (lwgeom == NULL || lwgeom_is_empty(lwgeom))
				return NULL;
		}

I don't have a dataset with invalid geometries to test, but TBH it's not the test case I'd want to optimize for; so I have compare the performance using a dataset of valid multipolygons:
mvt_pg_trunk_vs_mvt_pg_geos.pdf

In the first tests there is too big of a variance to extract any useful information based on the number of runs done, but from there on the version without the autofix looks consistently 5-10% faster. Based on that, I'm fine for now with adding the lwgeom_make_valid call and removing the autofix behavior. It needs some work as there are now 2 calls to lwgeom_make_valid which creates a copy of the geometry and I prefer cleaning it instead of leaving it to Postgresql memory management, but it should be easy to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another option: move autofix to validate the input of lwgeom_clip_by_rect and makevalid it if needed, even without the message. This way we don't get a known endless loop into ST_AsMVT/ST_ClipByBox2D and still get new ST_Subdivide.

Copy link
Member Author

@dbaston dbaston Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Komzpa Don't we avoid an endless loop and still incorporate your ST_Subdivide algorithm changes with my proposal? We also avoid a 2.4 -> 2.5 change in the semantics of ST_ClipByBox2D, and a performance regression in MVT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Endless loop is in GEOSClipByRect, stacktrace is in #202 (comment) - endless loop is in principle reachable from ST_ClipByBox2D or ST_AsMVT, just needs some patience in building a failing case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's pause discussion until we have a test case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this PR (plus tweaks in MVT to avoid exceptions) improves performance, but I'm not sure why. Also, the performance of GEOSClipByRect vs lwgeom_intersection seems equivalent but GEOSClipByRect has a known bug, so I'd rather have make_valid (either in MVT or at the start of lwgeom_clip_by_rect if you want to keep the old guarantees and avoid multiple transformations).

@dbaston
Copy link
Member Author

dbaston commented Jul 25, 2018

Objections to merging this PR as-is?

@pramsey
Copy link
Member

pramsey commented Jul 25, 2018

None from me. I would like to leave behaviour the "old way" and address this problem globally in Boston.

@Algunenano
Copy link
Member

Objections to merging this PR as-is?

Ok for me, I agree this should be discussed so leaving the behaviour as it was is the way to go for now. As mentioned before, this will break MVTs (I've already added a test to check it, so expect broken builds) so, when you merge this, please open a issue and I'll have a look to either call make_valid before lwgeom_clip_by_rect or implement a handle to catch this error and retry with validation (whatever ends up being faster in normal path).

@Komzpa
Copy link
Member

Komzpa commented Jul 25, 2018

@Algunenano also, if ST_MakeValid is added inside lwgeom_clip_by_rect at the start, that will keep "works on invalid" semantics for it.

@Algunenano
Copy link
Member

@Algunenano also, if ST_MakeValid is added inside lwgeom_clip_by_rect at the start, that will keep "works on invalid" semantics for it.

Yes, but I was getting ahead of myself and thinking about avoiding the double conversion to geos (one in lwgeom_make_valid and another in lwgeom_intersection), but I'll do that in a different PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants