Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TopologyException in multipolygons (intersection function) #794

Closed
carlosmolinaguerra opened this issue Jul 13, 2018 · 35 comments
Closed

TopologyException in multipolygons (intersection function) #794

carlosmolinaguerra opened this issue Jul 13, 2018 · 35 comments

Comments

@carlosmolinaguerra
Copy link

carlosmolinaguerra commented Jul 13, 2018

Hi,
I have a large sfc (MULTIPOLYGON) objects which throw several TopologyException errors when I try to use the intersection function on them. I have a couple of questions so, If anyone has a moment, I would really appreciate it if you could take a look at this. I've attached the files in RDS format (all). Below is some example code.

all<-readr::read_rds("all.rds")

None of these lines seems to work:

all %>% st_intersection
all %>% st_set_precision(1000000) %>% lwgeom::st_make_valid %>% st_intersection
all %>% st_set_precision(1000) %>% lwgeom::st_make_valid %>% st_intersection

  1. sf::st_is_valid returns TRUE for all objects. I've tried using mapview, lwgeom::st_make_valid and modifying the precision of the calculations. Only when I set precision to 100 or lower, the "rounded coordinates" become valid geometries to execute the st_intersection function. This issue is related to discussions in #600 and #603. The problem for me is that rounding the coordinates (two decimal places) does not seem appropriate for the work I'm doing. I tried several strategies including rounding (and using st_make_valid) the coordinates just for next rows (sfg):
    all %>% st_set_precision(100) %>% lwgeom::st_make_valid %>% sf::st_is_valid
    However, the issue remains.

  2. A subset of the data that generates one of the TopologyException errors is:

subset<-all %>% subset(id %in% c(47,98,158))
st_intersection(subset)

I imagine I am off-track here but It seems tricky to me that the st_intersection of these three multipolygons do not work but pairwise-intersections of these rows do not generate any error. Could you please let me know what I am missing?

int12<-st_intersection(subset[c(1,2),])
int13<-st_intersection(subset[c(1,3),])
int23<-st_intersection(subset[c(2,3),])

Apologies if I've posted this in the wrong place. I hope I am not completely off-track. In case, feel free to delete this.

@dbaston
Copy link
Contributor

dbaston commented Aug 14, 2018

If you can output the offending geometries in WKB format, you could submit it as a bug report to GEOS.

@edzer
Copy link
Member

edzer commented Aug 20, 2018

Read the documentation section starting with "when called with a missing ‘y’,", and consider calling

out <- st_intersection(all, all)

to avoid the behavior described there (and get pairwise intersections, which you seem to want).

@carlosmolinaguerra
Copy link
Author

Thanks for the reply and your time.
@edzer: Apologies if was not clear. I do not want pairwise intersections. Each polygon in input "all" represents a spoken language. But, since several languages may be spoken in a location, the polygons overlap. My ideal output is all non-empty intersections of the geometries and thus, each location would only have one polygon that indicates which language(s) is(are) spoken, which as far as I understand it is made by "st_intersection" with a missing ‘y’ (and the field "origins" indicating the indexes of the languages).

I have been working in this a bit and below what I find. I'll assume just for a moment that the input has 3 features as here:

all<-readr::read_rds("all.rds")
x<-all %>% subset(id %in% c(47,98,158),select=geometry)

In this case, I create all non-empty intersections of these 3 geometries "manually" as:

i123<-st_intersection(x[1,],x[2,]) %>% st_intersection(.,x[3,])
i12<-st_intersection(x[1,],x[2,]) %>% st_difference(.,i123 %>% st_combine %>% st_union)
i13<-st_intersection(x[1,],x[3,]) %>% st_difference(.,i123 %>% st_combine %>% st_union)
i23<-st_intersection(x[2,],x[3,]) %>% st_difference(.,i123 %>% st_combine %>% st_union)
i1<-st_difference(x[1,],x[2:3,] %>% st_combine %>% st_union)
i2<-st_difference(x[2,],x[c(1,3),] %>% st_combine %>% st_union)
i3<-st_difference(x[3,],x[1:2,] %>% st_combine %>% st_union)

So that the final output is:
manual<-rbind(i123,i12,i13,i23,i1,i2,i3)

Which should be equivalent to:
st_intersection(x)

However, note that unlike the "manual" program, the line below throws a TopologyException. I wonder if "st_intersection" should be throwing that TopologyException. Any answer here would be super appreciated.

In additional, all non-empty intersections of the main database "all" were created using the first method ("manual").

@rsbivand
Copy link
Member

Please try in rgeos to see whether the precision model affects this, after converting all to sp classes. Further, try lwgeom::st_make_valid(all) to see whether there is something in all that is a topological issue.

@rsbivand
Copy link
Member

rsbivand commented Aug 21, 2018

With:

library(sp)
library(sf)
all_sp <- as(all, "Spatial")
x_sp <- all_sp[all_sp$id %in% c("47", "98", "158"),]
library(rgeos)
xt1 <- gIntersection(x_sp, x_sp, byid=TRUE, drop_lower_td=TRUE)

apart from generating geometry collections (point and line intersections needing dropping as having no area) with no apparent problems. Is id the variable identifying languages? However:

> i1 <- gIntersects(all_sp, byid=TRUE, returnDense=FALSE, checkValidity=TRUE)
> table(sapply(i1, length))

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
 299  968 1035 1113 1075  865  615  415  306  201  129   81   91   55   37   18 
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
  22   29   28    9   11    9   15    9    8    7    3    6    3    6    2    4 
  34   36   37   38   39   41   42   44   46   47   49   56   57   62   65   73 
   2    2    5    3    1    2    3    1    1    1    1    1    1    1    1    2 
  88   91   92   95  127  141  205  288 
   1    1    1    1    1    1    1    1 

so one of your polygons intersects with 280 others. How precise do you need your output to be?

@rsbivand
Copy link
Member

rsbivand commented Aug 21, 2018

Further, this completes:

t1 <- gIntersection(all_sp, all_sp, byid=TRUE, drop_lower_td=TRUE)

so the underlying issue may be the precision setting.

> sum(sapply(i1, length))
[1] 44004
> length(t1)
[1] 15930

suggest that care may be needed, but we are dropping line and point intersections. I needed about 15GB on a 16GB machine. There are also 7510 self-intersections, I think.

@carlosmolinaguerra
Copy link
Author

Thanks @rsbivand (and @dbaston for submitting the report)!

  • Yes, "id" is the variable identifying languages.
    -There is a polygon that intersects with 280 others (correct me if I am wrong but I think this should not be a limitation to do the intersection, plus this does not means that 280 polygons relies in the same location). I agree that precision may be import here. In fact, in some cases, two polygons barely intersects (which would create sliver polygons if an intersection is carried out).
    -lwgeom::st_make_valid(all) does not make changes to the features.
    -Thanks for your explanation using gIntersection (if I understand correctly, it also shows that pairwise intersection between features should not throw a TopologyException).

@rsbivand
Copy link
Member

The original file is on http://spatial.nhh.no/misc/i794/, together with a WKT version (zip-compressed, otherwise 13.7GB), and a GPKG (difficult0.gpkg). This was read into GRASS with snap=1e-09, and written out including unattached geometries -c as difficult_out.gpkg Then:

library(sf)
diff_all <- st_read("difficult_out.gpkg")
# drop geometries not assgned to GRASS cat
diff_all0 <- diff_all[!is.na(diff_all$cat),]
# create 4-character strings to sort properly
diff_all0$id4 <- formatC(as.integer(diff_all0$id), flag="0", width=4, format="d")
library(units)
lim <- 1
units(lim) <- as_units("km2")
units(lim) <- "m2"
diff_areas <- st_area(diff_all0)
# drop all geometries with  an area of less than 1 km2
diff_all0a <- diff_all0[diff_areas > lim,]
# aggregate GRASS output geometries by language code
diff_all1 <- aggregate(diff_all0a, list(id=diff_all0a$id4), head, n=1) 
# impose GEOS-validity on geometries (about 5 were invalid)
diff_all2 <- st_make_valid(diff_all1)
# carry out the unary intersection
diff_all_int <- st_intersection(diff_all2)
# create the language code look-up table
lut <- unique(sort(diff_all_int$id4))
# convert list column `origins` to a string column
origins_str <- sapply(diff_all_int$origins, function(x) paste(lut[x], collapse=":"))
diff_all_int$origins_str <- origins_str
# drop output areas less than 1 km2
out_areas <- st_area(diff_all_int)
diff_all_int1 <- diff_all_int[out_areas > lim,]
# extract polygons from GEOMETRYCOLLECTION output
gts <- st_geometry_type(diff_all_int1)
oo <- st_geometry(diff_all_int1)
oo_GC <- oo[gts == "GEOMETRYCOLLECTION"]
for (i in seq(along=oo_GC)) { oo_GC[[i]] <- st_sfc(st_union(st_collection_extract(oo_GC[[i]], "POLYGON")))[[1]] }
oo[gts == "GEOMETRYCOLLECTION"] <- oo_GC
# and restore to object
st_geometry(diff_all_int1) <- oo
st_write(diff_all_int1, "diff_all_int1.gpkg", delete_layer=TRUE)

and

diff_all_int1 <- st_read("diff_all_int1.gpkg")
library(mapview)
mapviewOptions(fgb = FALSE)
mapview(diff_all_int1)

It is not clear that the assignement of the language codes from diff_all_int$origins is correct, and one would have to have the actual language strings to know what they may mean.

The input polygons had multiple problems, not just that they overlapped. Comments welcome: @dr-jts @edzer @dbaston

@edzer
Copy link
Member

edzer commented Oct 20, 2020

I can run this, but cannot follow 100% of what you are doing. What is question?

@rsbivand
Copy link
Member

rsbivand commented Oct 20, 2020 via email

@dr-jts
Copy link

dr-jts commented Oct 20, 2020

The original file is on http://spatial.nhh.no/misc/i794/, together with a WKT version (zip-compressed, otherwise 13.7GB), and a GPKG (difficult0.gpkg).

Note that it may be the case that the WKT version of the data has been pointwise precision truncation. This means that it won't reproduce this issue correctly for two reasons:

  • several of the polygons are now invalid, due to precision truncation causing vertices "jumping" across nearby line segments
  • The change in precision tends to eliminate robustness occurs that might otherwise occur

Testing should be done with a full-precision version of the data.

@dr-jts
Copy link

dr-jts commented Oct 20, 2020

I can run this, but cannot follow 100% of what you are doing. What is question?

@edzer the goal is to determine if the original TopologyException errors are eliminated by the use of OverlayNG. This appears to be the case when using JTS and GEOS directly (for example, this GEOS unit test is an extract of two of the polygons from the data in this issue).

My understanding is that @rsbivand reran this case using GEOS OverlayNG and still encountered TopoEx errors. If this is correct I would like to find out why that is. To do that I will need a WKB dump of the two geometries causing the error in a GEOS overlay function. (And if there are multiple errors, ideally a dump of all the cases). This might need to be done by adding some debug tracing to the rsf C code?

@edzer
Copy link
Member

edzer commented Oct 21, 2020

I didn't see any strange errors running this on GEOS 3.9.0dev, compiled a few days ago from github for OverlayNG testing.

@rsbivand
Copy link
Member

rsbivand commented Oct 21, 2020

Do you mean that:

all <- readRDS("all.rds")
oo <- st_intersection(all)

succeeds for you with OverlayNG? For me it still fails for the snapshot of 9 October; rebuilding now with 20 October - report later:

Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (28.2168 45.4666, 28.2153 45.4669) and LINESTRING (28.2153 45.4669, 28.2168 45.4666) at 28.215309983502991 45.466850983501708.

This is an error occurring at some point beyond the pairwise intersections, where intersection products are submitted to further intersection.

@rsbivand
Copy link
Member

Same error for me with geos-20201020.tar.bz2, mkdir obj && cd obj && ../configure --enable-overlayng; ad-hoc test for OverlayNG indicates that it is active:

> sf:::is_overlayng()
[1] TRUE
Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (28.2168 45.4666, 28.2153 45.4669) and LINESTRING (28.2153 45.4669, 28.2168 45.4666) at 28.215309983502991 45.466850983501708.

Without OverlayNG:

Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (29.6806 45.2723, 29.6797 45.2827) and LINESTRING (29.6797 45.2827, 29.6806 45.2723) at 29.680570134978936 45.272265134997795.
> sf:::is_overlayng()
[1] FALSE
> sf:::is_overlayng()
[1] FALSE
> all_err <- st_read("all_err_20_58_71.gpkg")
Reading layer `all_err_20_58_71' from data source `/home/rsb/tmp/bigshape/all_err_20_58_71.gpkg' using driver `GPKG'
Simple feature collection with 3 features and 1 field
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 20.26439 ymin: 43.6273 xmax: 40.20739 ymax: 52.36936
geographic CRS: WGS 84
> st_intersection(all_err)
although coordinates are longitude/latitude, st_intersection assumes that they are planar
Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (29.6806 45.2723, 29.6797 45.2827) and LINESTRING (29.6797 45.2827, 29.6806 45.2723) at 29.680570134978936 45.272265134997795.

Intersection of these three objects generate the two linestrings which then have a non-noded intersection both with and without OverlayNG. I'll put the GPKG on my server, same directory. I haven't found out how to write out WKB.

@rsbivand
Copy link
Member

And:

> library(sf)
Linking to GEOS 3.9.0dev, GDAL 3.2.0dev-40bb56f3f8, PROJ 7.2.0
> all <- readRDS("all.rds")
> sf:::is_overlayng()
[1] TRUE
> oo <- st_intersection(all[-c(20, 58, 71),])
although coordinates are longitude/latitude, st_intersection assumes that they are planar
Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: side location conflict at 0.41851800000000594 14.97339999999997.

@edzer
Copy link
Member

edzer commented Oct 21, 2020

I no longer have access to all.rds.

@rsbivand
Copy link
Member

rsbivand commented Oct 21, 2020 via email

@edzer
Copy link
Member

edzer commented Oct 21, 2020

Thanks; running st_intersection(all), OverlayNG gives me the very long error starting with::

CBR: result (after common-bits addition) is INVALID: Self-intersection at or near point 83.8125 50.88305600000001 (83.8125 50.883056000000010499)
<A>
MULTIPOLYGON (((68.9958649999999807 55.4176659999999970, 69.0074149999999804...

GEOS 3.8.1 gives me:

Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (29.6806 45.2723, 29.6797 45.2827) and LINESTRING (29.6797 45.2827, 29.6806 45.2723) at 29.680570134978936 45.272265134997795.

@rsbivand
Copy link
Member

rsbivand commented Oct 21, 2020 via email

@edzer
Copy link
Member

edzer commented Oct 21, 2020

Ah, the end of the OverlayNG error message:

 -58.1694770468632072 51.4687769821861920)))
</A>
Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (28.2168 45.4666, 28.2153 45.4669) and LINESTRING (28.2153 45.4669, 28.2168 45.4666) at 28.215309983502991 45.466850983501708.

@rsbivand
Copy link
Member

rsbivand commented Oct 21, 2020

And with s2:

> table(st_is_valid(all))

TRUE 
7510 
> ooo <- st_as_s2(all)
Error in s2_geography_from_wkb(x, oriented = oriented, check = check) : 
  Loop 29 is not valid: Edge 1182 crosses edge 1188

@rsbivand
Copy link
Member

From GRASS:
grass_input.log

@edzer
Copy link
Member

edzer commented Oct 21, 2020

> which(!s2_is_valid(st_as_s2(all, check = FALSE)))
[1] 42

@dr-jts
Copy link

dr-jts commented Oct 23, 2020

Update: it turned out that the errors occurring when using OverlayNG were due to a small bug in the code. This is in the process of being fixed in JTS. Once it is ported to GEOS I will update this issue, and the dataset can be retested.

@strk
Copy link

strk commented Oct 26, 2020

As the JTS fix landed, I'll be working on the GEOS port of the fix: https://trac.osgeo.org/geos/ticket/1062

@strk
Copy link

strk commented Oct 26, 2020

GEOS port completed

@rsbivand
Copy link
Member

@strk : thanks! Unfortunately, I see the same error as before, after pulling the gitea master, removing, rebuilding and installing GEOS, rebuilding and installing sf. @edzer could you please try as well, in case I've missed something?

@dr-jts
Copy link

dr-jts commented Oct 26, 2020

@rsbivand That's unfortunate. As before, I wonder if perhaps OverlayNG is not actually being used? And if so, if there is any way to add debug code to the n-ary intersection to dump out the failing geometries.

@rsbivand
Copy link
Member

Even with the latest commits by Paul Ramsey to the GEOS git repo, still:

> all <- readRDS("all.rds")
> i794_out <- st_intersection(all)
although coordinates are longitude/latitude, st_intersection assumes that they are planar
Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (28.2168 45.4666, 28.2153 45.4669) and LINESTRING (28.2153 45.4669, 28.2168 45.4666) at 28.215309983502991 45.466850983501708.

For reference @dr-jts , my ../configure --enable-overlayng output finishes with OverlayNG: true, and in config.log I see:

AM_CFLAGS='  -Wsuggest-override -pedantic -Wall -Wno-long-long  -ffloat-store -DUSE_UNSTABLE_GEOS_CPP_API -DUSE_OVERLAYNG'
AM_CXXFLAGS=' -DGEOS_INLINE  -Wsuggest-override -pedantic -Wall -Wno-long-long  -ffloat-store -DUSE_UNSTABLE_GEOS_CPP_API -DUSE_OVERLAYNG'

sf also reports that OverlayNG is in use - the test compares sliver output order which differs for the same input between pre-OverlayNG and OverlayNG.

Of course, it may be that the sf nary intersection itself is broken in edge cases, or that the GEOS port doesn't pick up all of the JTS changes.

I also tried 1e10 precision, and that fails too:

> st_precision(all) <- 1e10
> i794_out <- st_intersection(all)
although coordinates are longitude/latitude, st_intersection assumes that they are planar
Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection between LINESTRING (4.00948 16.1236, 4.00668 16.1158) and LINESTRING (4.00671 16.1159, 4.00668 16.1158) at 4.0066794493918865 16.115791550838019.

@dr-jts
Copy link

dr-jts commented Oct 27, 2020

Seems pretty clear that sf is using OverlayNG. Puzzling why the fix hasn't worked.

My ideas for investigating now are:

  • Add tracing code to the sf code to dump out the exact geometries causing an overlay op to fail. This will need to be done by the sf team
  • Implement the n-ary intersection algorithm in C++ using GEOS directly, and see if it fails. Not sure when I'll have time to do that.

@rsbivand
Copy link
Member

rsbivand commented Nov 4, 2020

@dr-jts Sorry not to have responded properly. I agree that tracing code needs adding to the nary intersection function. I can't do that now, as I'd have to learn several things that I do not understand first (the Rcpp package using templates, I write ony C-style Cpp). I think @edzer is also busy.

I can on the other hand report success with OverlayNG on a messy 49000 multipolygon archaeological data set, but with binary interssection - it included apparently invalid polygons which stopped it before OverlayNG, but noe succeeded.

@dr-jts
Copy link

dr-jts commented Nov 4, 2020

I can on the other hand report success with OverlayNG on a messy 49000 multipolygon archaeological data set, but with binary interssection - it included apparently invalid polygons which stopped it before OverlayNG, but noe succeeded.

Good to hear. Any chance that dataset is available?

@dr-jts
Copy link

dr-jts commented Nov 5, 2020

A few (8) of the polygons in the rawhic dataset are invalid, many (or all) due to "inverted shells" - i.e. polygons which self-touch at a point, forming an enclosure which should be a hole under OGC structural rules. So that's probably why this fails using the old overlay. Happily OverlayNG is less sensitive to this issue.

edzer added a commit that referenced this issue Dec 7, 2020
this turns of the GEOS error+warning mechanisms, and drops
geometries that cause trouble.
* addresses #794
* motivated by #1549
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants