Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid union result from valid polygon inputs #107

Closed
strk opened this issue Jul 28, 2017 · 43 comments
Closed

Invalid union result from valid polygon inputs #107

strk opened this issue Jul 28, 2017 · 43 comments

Comments

@strk
Copy link

strk commented Jul 28, 2017

I found a case in which the union of two valid small polygons result in an invalid output.
NOTE I had to tweak the JTS testrunner slightly to get the HEXWKB printed in order to check for validity, I might send a PR to help with this in the future.

The XML test file can be found on https://trac.osgeo.org/geos/ticket/838 (WARNING: the test expects the invalid result, so you'll need more care to check the bug)

@mukoki
Copy link
Contributor

mukoki commented Aug 5, 2017

Hi strk,

In ticket https://trac.osgeo.org/geos/ticket/838, you mention that the test has failed since
8e6abed

Did you notice that your test worked before this commit ? It seems to me that the union of your polygons already produces an invalid polygon with JTS 1.14 (just tested from PostGIS/OpenJUMP).
More over, the commit should just be about z/m flag interpretation in jts

Your polygons have points located at just 1 ulp from each other, which is the main cause of failure in this kind of operation
x = 945169.6 y= 1937162.2
and
x = 945169.6000000001 y= 1937162.2

Michaël M

@strk
Copy link
Author

strk commented Aug 6, 2017 via email

@dr-jts
Copy link
Contributor

dr-jts commented Aug 6, 2017

It's definitely interesting that the NodingValidator did not catch the failure. I'll try and look into why that is.

This is the kind of problem that Snap Rounding will solve, if and when it gets implemented.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 7, 2017

I think what is happening is that the Overlay snapping heuristic is failing to snap correctly. The Noding Validator checks whether noding is correct, but not that the result topology is correct. I guess in this case the noding is correct, but the final topology is assembled incorrectly.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 7, 2017

BTW, you might want to determine the correct expected result and put that in the test case. See attached.
bug838.zip

That way the TestRunner can detect the failure and report appropriately. (And when the output is made input-format-sensitive, a full-precision WKB actual result will be reported.)

@strk
Copy link
Author

strk commented Aug 7, 2017 via email

@dr-jts
Copy link
Contributor

dr-jts commented Aug 7, 2017 via email

@dr-jts
Copy link
Contributor

dr-jts commented Aug 7, 2017

re how to determine correct output: for a true unit test usually the correct output is known a priori, or can be determined from the intended semantics of the algorithm.

This case however is demonstrating unexpected behaviour of the overlay algorithm. The current overlay algorithm is known to have edge cases with sub-optimal behaviour, so it's not totally clear what the "correct" output should be.

As I said, this may be a case where only a snap-rounding approach makes sense. In that case it's relatively easy to synthesize what would be the desirable output under a limited precision model (which is a precondition of snap-rounding).

@strk
Copy link
Author

strk commented Aug 8, 2017 via email

@strk
Copy link
Author

strk commented Aug 8, 2017 via email

@mukoki
Copy link
Contributor

mukoki commented Aug 9, 2017 via email

@dr-jts
Copy link
Contributor

dr-jts commented Aug 9, 2017

@mukoki Thanks for reminding me of the details!

The call to OverlayResultValidator is VERY slow, so when Noding Validation was introduced this was disabled to improve performance. Noding validation is performed by the call to EdgeNodingValidator:

https://github.com/locationtech/jts/blob/master/modules/core/src/main/java/org/locationtech/jts/operation/overlay/OverlayOp.java#L231

The noding validation has proven to be pretty reliable, but perhaps this is a case where it is failing to catch the failure.

@strk Agreed, the overlay should throw a TopologyException in this case. If the noding validation was working this is what should happen. And if an exception occurs, the current code does try and run snapping before giving up.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 9, 2017

It turns out that it's possible to complete the union if the components are combined into a GeometryCollection and then using UnaryUnion.

I created a simplified test case that still exhibits the issue, along with a correct expected result. XML file attached.
bug838-simple.zip

@strk
Copy link
Author

strk commented Sep 1, 2017

So what's the plan here ? The invalid result hit during UnaryUnion results in a show-stopper on next iteration (invalid output becomes next input).

@dr-jts
Copy link
Contributor

dr-jts commented Sep 1, 2017

Ultimately the solution is to implement Snap-Rounding for JTS overlay. Some good progress has been made on this, but it's not ready for production.

Shorter term - I guess some close inspection of the EdgeNodingValidator and the topology graph to see if there's something that can be fine-tuned to catch this problem. Failing that (and I am not optimistic), I guess a better topology check. But no idea at the moment what this might be.

@strk
Copy link
Author

strk commented Sep 16, 2017

For the record, enabling OverlayResultValidator in GEOS does not catch the invalidity either!:

EdgeNodingValidator accepted the noding
OverlayResultValidator considered result valid
Using an overlay tolerance of 4.996e-07
bug838.xml: case1: test1: union(A, B):  invalid geometry (result): Ring Self-intersection at or near point 945169.60000000009 1937162.2

@strk
Copy link
Author

strk commented Sep 16, 2017

Spotted this message when enabling debug:

OverlayResultValidator: testpoint 945270.6 1937063 is on the boundary, blindly returning a positive answer (is valid)

Could that be one of the reasons for the missing invalidity catching ?

@strk
Copy link
Author

strk commented Sep 16, 2017

There is also the exact vertex that is later found as being invalid by the custom result validity checker:

OverlayResultValidator: testpoint 945169.6 1937162.2 is on the boundary, blindly returning a positive answer (is valid)

So the blindness of handling that case seems to be the culprit here.
The last port of the class from JTS in GEOS is from "rev 1.4 (JTS-1.10)", not sure exactly which version of the class it references. Unfortunately there's no history kept in the GIT version of this repository so I'm a bit lost here. For sure a quick look at the JTS code shows different symbol names, so something must have changed on the JTS side that needs to be still ported.

@dr-jts
Copy link
Contributor

dr-jts commented Sep 16, 2017

The history back to 2010 is in the repo, in the _old/history branch. But after the package renaming you have to starting browsing at an older commit to see the old files. For instance:

https://github.com/locationtech/jts/tree/f67d35c1da06922c8165f66a919490ee94a04649

However, there's no commits of any interest against the OverlayResultValidator in this history.

@dr-jts
Copy link
Contributor

dr-jts commented Sep 16, 2017

I don't see that error message in the current code, and I have no recollection of having implemented it. Is this a GEOS-only thing?

Anyway, the OverlayResultValidator is a simple heuristic meant for rough confirmation of correctness for regression testing. As it says in the Javadoc, it may fail to catch invalid results! And I don't think the algorithm it uses is worth trying to improve to be more accurate.

@dr-jts
Copy link
Contributor

dr-jts commented Sep 16, 2017

I guess one check that could be made that would catch this particular case is to check for self-intersection at a vertex, by checking if vertices occur more than once in a given ring. Essentially this just performs a partial polygon topology validity check (which should be quite a bit faster than a full IsValid check).

But this still feels like playing wack-a-mole with robustness failures...

@strk
Copy link
Author

strk commented Jan 19, 2018

@dr-jts I'm available to test a patch shall you provide it

@dr-jts
Copy link
Contributor

dr-jts commented Jan 19, 2018

Thanks, @strk . Not sure when my time will permit developing a patch for this. Will post on this ticket when I have a branch with a patch.

@dr-jts
Copy link
Contributor

dr-jts commented Jan 19, 2018

@strk I made a patch at https://github.com/dr-jts/jts/tree/overlay-ring-self-touch. If you are able to test this out for effectiveness and performance that would be great.

I'm particularly concerned about:

  • Do No Harm - this should not change any existing correct overlay cases
  • Performance - there will be a small performance impact, but it is linear in the number of coordinates, and so should be relatively low impact

(Of course I did test the above patch on the test case in this ticket, and it works. If you have other test cases it would be good to see them)

@dr-jts
Copy link
Contributor

dr-jts commented Jan 19, 2018

A couple of observations from testing the patch on the world.wkt dataset using unaryUnion:

  • the initial test failed, since the dataset is actually invalid. When it was fixed to be valid, the unaryUnion completes correctly. So that confirms correctness of the patch. (Note that prior to the fix the unaryUnion completed correctly on the original dataset. This is nice, but not actually in spec, since JTS operations are only designed to work on valid inputs. This might actually be considered to be a feature of the patch - it will catch more invalid inputs).
  • Performance went from about 3000 ms to 3700 ms. So there is about a 25% performance impact (on a very large, complex dataset).

@dr-jts
Copy link
Contributor

dr-jts commented Jan 22, 2018

I worked up another fix, one that is a bit simpler and more peformant. It involves broadening the search for invalid noding situations in FastNodingValidator, by enhancing InteriorIntersectionFinder to also check for Endpoint-Interior intersections. This should be faster than the previous Ring-Self-Touch check. It does solve this test case, and does not cause failures in all the cases I've tried so far.

@strk Can you try this patch out first, to see if it is correct and performant?

https://github.com/dr-jts/jts/tree/overlay-fix-noding-validator

@strk
Copy link
Author

strk commented Jan 23, 2018

It'll take some time as it looks like GEOS does not even have an InteriorIntersectionFinder class, while I found a SingleInteriorIntersectionFinder one supposedly ported from JTS-1.8 (and not found in current JTS master branch)

@strk
Copy link
Author

strk commented Jan 23, 2018

Porting the conceptual change to our SingleInteriorIntersectionFinder class indeed fixes the issue, but the GEOS result is different from the expected result you encoded in the simplified XML file.
Here's what GEOS produces:

0103000000010000000A000000000000005BD72C4167666666098F3D413333333359D72C41000000806D8F3D416766666621D82C41CDCCCC4C6E8F3D41CDCCCCCCB1D92C419A999919708F3D41F7E80CD5D4D92C41129CC3FBA78E3D4133333333EDD82C4100000000A78E3D410100000025D82C419A999919A68E3D413333333323D82C41333333330A8F3D410000000025D82C419A999919A68E3D41000000005BD72C4167666666098F3D41

It is a slightly more "snapped" output. GEOS heuristic ends up using an overlay tolerance of 2.02e-07 upon receiving the TopologyException - not sure where JTS ends up going. Anyway the result seems acceptable to me. Did you try running the test again on the JTS side with your patch ? Chances are it is just UnaryUnion giving a slighly different result than binary union...

@strk
Copy link
Author

strk commented Jan 23, 2018

You can find updated XML test as part of this PR: https://git.osgeo.org/gitea/geos/geos/pulls/22 (or more specifically here)

@dr-jts
Copy link
Contributor

dr-jts commented Jan 23, 2018

This is good news. The output in JTS for the geos838-simple case is identical to the output you gave above. (The output in the test case file was determined manually, so not too surprising it's different.).

So this looks promising. Are you able to run more tests to confirm no regressions, and maybe more improvements?

@dr-jts
Copy link
Contributor

dr-jts commented Jan 23, 2018

If this looks like a solid improvement, then I'm tempted to roll it into the main JTS codebase. However, I might add a runtime flag to allow disabling it, just in case some regressions turn up.

The flag could be in the form of a class with global variables that can be set, and also perhaps ability to set them via JVM parameter.
@jnh5y @jodygarnett any thoughts on this approach for "feature flags"?

@strk
Copy link
Author

strk commented Jan 23, 2018 via email

@strk
Copy link
Author

strk commented Jan 23, 2018

@vpicavet do you want to try the GEOS branch with some bigger dataset ?

@dr-jts
Copy link
Contributor

dr-jts commented Jan 23, 2018

I moved the Noding Validator fix branch in this repo. Also added Javadoc, did some refactoring, added the GEOS test file, and added a unit test for FastNodingValidator.

https://github.com/locationtech/jts/tree/overlay-fix-noding-validator

@jnh5y
Copy link
Contributor

jnh5y commented Jan 24, 2018

@dr-jts Using Java System properties for that is somewhat common in GeoTools/GeoServer and happens some in GeoMesa.

What's the point of the flag? Is the 'old' way faster and less correct? (So that previously happy users would prefer to retain the 'speed' benefits?)

@dr-jts
Copy link
Contributor

dr-jts commented Jan 24, 2018

Yes, the old way is slightly faster, and somewhat less correct. The real issue is that it's not possible to test the new code on every single geometry out there. So there is a small but significant concern of a regression bug occurring. In that case, it would be nice to have a preventive fallback in place, so a new release is not required.

Although I guess clients that experience this can always fall back to the previous version of JTS (and might be motivated to fund a bug fix, if they want the new version badly enough. So perhaps this is good enough.

@jnh5y
Copy link
Contributor

jnh5y commented Jan 24, 2018

I'm +1 on having a System property. Maintaining those in a consistent and useful way can take some effort. Would this be the first one for JTS?:)

@dr-jts
Copy link
Contributor

dr-jts commented Jan 24, 2018

Yes, agreed, they take effort (in documentation, if nothing else).

JTS has one System property now, to enable debug mode.
https://github.com/locationtech/jts/blob/master/modules/core/src/main/java/org/locationtech/jts/util/Debug.java#L48

I think I have talked myself out of feeling like this needs a feature flag, though...

@dr-jts
Copy link
Contributor

dr-jts commented Jan 31, 2018

@strk any progress testing with other data? Am very curious to see if this handles any current failure cases.

@strk
Copy link
Author

strk commented Jan 31, 2018

I hadn't found the time to test more datasets. For sure we know it fixes at least one:

https://trac.osgeo.org/geos/ticket/838

The GEOS issue tracker has two more "invalid output from valid input" tickets, those would be worth testing:

https://trac.osgeo.org/geos/ticket/523
https://trac.osgeo.org/geos/ticket/789

They are both tagged as "fails on JTS too" so maybe you can give them a try on your side.
I was hoping @vpicavet would help testing

@dr-jts
Copy link
Contributor

dr-jts commented Jan 31, 2018

I'll try those out.

I did try this fix out on a fairly extensive set of nasty large polygon overlay cases I keep caged in the JTS dungeon. They all... still failed. So this is not a full fix for robustness problems. I'm pretty sure this is because the snapping heuristic is just not powerful enough to handle certain kinds of failures. (In fact, counter-intuitively this patch may actually make things worse, because it will catch more noding failures, and the snapping will only handle a portion of those - resulting in a net increase of topology failures!)

As a side observation, the crude snap-rounding overlay in the JTS Lab worked on 100% of those nasty datasets...

@dr-jts
Copy link
Contributor

dr-jts commented Jan 31, 2018

This patch fixes https://trac.osgeo.org/geos/ticket/523

The case in https://trac.osgeo.org/geos/ticket/789 is a different issue. It's not casued by numerical precision issues. It seems to be some kind of bug in the topology creation for difference() and symDifference(). That is definitely a surprise - haven't seen such a simple failure case before.

@dr-jts
Copy link
Contributor

dr-jts commented Aug 9, 2019

Closed by #257

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants