Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TopologyException in set intersection functions #603

Closed
jeffreyhanson opened this issue Dec 23, 2017 · 6 comments
Closed

TopologyException in set intersection functions #603

jeffreyhanson opened this issue Dec 23, 2017 · 6 comments

Comments

@jeffreyhanson
Copy link
Contributor

Hi,

I've encountered two sfg (MULTIPOLYGON) objects which throw a TopologyException error when I try to use set intersection functions on them (e.g. sf::st_difference). I've tried using lwgeom::st_make_valid to fix the data (though sf::st_is_valid returns TRUE for both objects), buffering them by a distance of zero, and modifying the precision of the calculations (using sf::st_set_precision after converting them to an sfc object).

If anyone has a moment, I would really appreciate it if you could take a look at this. I've attached the files in RDS format (polygon_data.zip). Below is some example code that imports the objects and generates the error.

library(sf)
p1 <- readRDS("p1.rds")
p2 <- readRDS("p2.rds")
p3 <- sf::st_difference(p1, p2)
# Error in CPL_geos_op2(op, st_geometry(x), st_geometry(y)) : 
# Evaluation error: TopologyException: found non-noded intersection between LINESTRING
# (1.58836e+06 4.27835e+06, 1.58836e+06 4.27835e+06) and LINESTRING (1.58836e+06 
# 4.27835e+06, 1.58836e+06 4.27835e+06) at 1588363.7352041774 4278353.7855906803.

Also, here is my session information (using the latest version of sf on GitHub).

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_0.5-6            testthat_1.0.2.9000 devtools_1.13.4    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14     clisymbols_1.2.0 class_7.3-14     digest_0.6.13   
 [5] withr_2.1.0.9000 grid_3.4.3       R6_2.2.2         DBI_0.7         
 [9] magrittr_1.5     units_0.4-6      e1071_1.6-8      rlang_0.1.4     
[13] tools_3.4.3      udunits2_0.13    compiler_3.4.3   classInt_0.1-24 
[17] memoise_1.1.0  

Apologies if I've posted this in the wrong place, and please let me know the correct forum to post this.

@edzer
Copy link
Member

edzer commented Dec 23, 2017

@tim-salabim @timelyportfolio this happens a lot, and is a point where a user would like to fire up mapview to check, and then mapedit to repair the broken geometry. Any ideas? I could load these geometries in mapview, but had a hard time finding the particular point the error message points to. Can this be done easily, or can we make it easy?

@lbusett
Copy link
Contributor

lbusett commented Dec 23, 2017

This seems somehow related to discussions in #600 regarding precision.

It appears that the p2 object contains valid geometries, but "barely". In fact:

readRDS("D:/p2.rds") %>% st_sfc() %>% st_is_valid()
[1] TRUE

, but :

> readRDS("D:/p2.rds") %>% st_sfc() %>% st_set_precision(10000000) %>% st_is_valid()
[1] FALSE
Warning message:
In evalq((function (..., call. = TRUE, immediate. = FALSE, noBreaks. = FALSE,  :
  Too few points in geometry component at or near point 1602556.2014496 4270010.0747427

Looking at p2 in mapview shows that you have some "slivers" in the dataset, which I guess become invalid as soon as precision is reduced, since very near coordinates are collapsed on the same values (maybe wrong on this, but it seems plausible). Note that 10000000 is already a quite high accuracy - subsubsubsub-centimetric if we are speaking of a metric projection).

Now, if I understand correctly the st_set_precision help page, routines in GEOS and lwgeom always receive "rounded coordinates", even if you do not specify manually a precision. I am guessing here therefore that the "automatic" rounding is sufficient to invalidate your geometries, thus leading to the error in st_difference.

A workaround on this appears to be to call st_make_valid AFTER you set the precision:

p1 <- readRDS("D:/p1.rds") %>% st_sfc()
p2 <- readRDS("D:/p2.rds") %>% st_sfc() %>% st_set_precision(1000000) %>% lwgeom::st_make_valid()
p3 <- sf::st_difference(p1, p2)
p3
Geometry set for 1 feature 
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 1586298 ymin: 4277065 xmax: 1588823 ymax: 4278701
epsg (SRID):    NA
proj4string:    NA
MULTIPOLYGON (((1588730.25642744 4277695.439676...

, which at least does not throw an error (although it is difficult for me to know if it does what you wish).

PS: @edzer : I hope I am not completely off-track, here. In case, feel free to delete this.

@tim-salabim
Copy link
Member

@edzer I am on holidays until early Jan so bare with me. I will have a look at this as soon as I'm back. Just as a thought, would it be possible to translate the error message to geometry ids somehow and provide them as part of the error message? Would make finding the culprit a little easier at least.

@edzer
Copy link
Member

edzer commented Dec 23, 2017

@lbusett great that you solved this! You are nearly on track; by default, precision is set to 0, which means do nothing to the coordinates (see here). So no: GEOS/GDAL/liblwgeom all receive the raw coordinates. I remember discussing with @rsbivand a year ago that it might be a good idea to do something else, i.e. set a non-zer default precision, but that didn't lead to action.

@tim-salabim sure no hurries; yeah, fishing the problematic coordinates out of the GEOS emitted error will be tricky, and is on my plate. I'll promise to not do that this year.

@jeffreyhanson
Copy link
Contributor Author

Brilliant - yeah that fixes the problem @lbusett. Thank you very much everyone.

@nakoafarrant
Copy link

I'm running into a similar error when I try to "erase" one shape file from another using the st_erase function that I can't seem to resolve.

The data is found online here http://planning.hawaii.gov/gis/download-gis-data/. Once you follow that link, click on the first drop down labeled "Agriculture and Farming" and download the two shape files: "Agricultural Land Use Baseline (2015)" and "Agricultural Land Use Maps (ALUM)"

library(sf)
library(lwgeom)

ag2015_sf <- st_read("data/2015AgBaseline.shp") %>% 
  st_transform(st_crs("+proj=utm +ellps=GRS80 +datum=WGS84")) %>% 
  st_set_precision(1000000) %>% 
  st_make_valid() # class sf and data.frame

ag1980_sf <- st_read("data/alum.shp") %>% 
  st_transform(st_crs("+proj=utm +ellps=GRS80 +datum=WGS84")) %>% 
  st_set_precision(1000000) %>% 
  st_make_valid()

# a helper function that erases all y from x
st_erase = function(x, y) 
  st_difference(st_geometry(x), st_union(st_combine(st_geometry(y))))

ag_erase <- st_erase(ag1980_sf, ag2015_sf)

#ag_erase <- st_erase(ag1980_sf, st_buffer(ag2015_sf, 0))

Running this code results in the error "Error in CPL_geos_op2(op, st_geometry(x), st_geometry(y)) : Evaluation error: TopologyException: Input geom 1 is invalid: Self-intersection at or near point 2947191.1423960002 2626263.8412870001 at 2947191.1423960002 2626263.8412870001."

I also tried running the above code with the final commented out line instead that uses an st_buffer of 0 based on suggestions I saw on other forums, but the same error was thrown

Here is the session info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lwgeom_0.2-1      rgeos_0.5-2       ggspatial_1.0.3   forcats_0.4.0     stringr_1.4.0    
 [6] dplyr_0.8.3       purrr_0.3.3       readr_1.3.1       tidyr_1.0.0       tibble_2.1.3     
[11] ggplot2_3.2.1     tidyverse_1.3.0   sf_0.8-1          PBSmapping_2.72.1 rgdal_1.4-8      
[16] gpclib_1.5-5      maptools_0.9-9    raster_3.0-12     sp_1.4-0         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         lubridate_1.7.4    lattice_0.20-38    class_7.3-15       assertthat_0.2.1  
 [6] zeallot_0.1.0      R6_2.4.1           cellranger_1.1.0   backports_1.1.5    reprex_0.3.0      
[11] e1071_1.7-3        httr_1.4.1         pillar_1.4.3       rlang_0.4.2        lazyeval_0.2.2    
[16] readxl_1.3.1       rstudioapi_0.10    foreign_0.8-71     munsell_0.5.0      broom_0.5.3       
[21] compiler_3.6.1     modelr_0.1.5       xfun_0.12          pkgconfig_2.0.3    tidyselect_0.2.5  
[26] codetools_0.2-16   fansi_0.4.1        crayon_1.3.4       dbplyr_1.4.2       withr_2.1.2       
[31] grid_3.6.1         nlme_3.1-140       jsonlite_1.6       gtable_0.3.0       lifecycle_0.1.0   
[36] DBI_1.1.0          magrittr_1.5       units_0.6-5        scales_1.1.0       KernSmooth_2.23-15
[41] cli_2.0.1          stringi_1.4.5      fs_1.3.1           xml2_1.2.2         vctrs_0.2.1       
[46] generics_0.0.2     tools_3.6.1        glue_1.3.1         hms_0.5.3          colorspace_1.4-1  
[51] classInt_0.4-2     rvest_0.3.5        knitr_1.27         haven_2.2.0

I'm not sure if I have to some how cast the points from one of the geometries into a multipolygon or something? What I noticed is that the geometries between the two data sets seem slightly different ag1980 uses list(c(points)) and ag2015 uses list(list(c(points)). Not sure if this will make the difference.

Alternatively, is this maybe an issue with the size of the files?
When I was running the following code

ag_difference <- st_difference(ag1980_sf, ag2015_sf)

I received a different error about bad_alloc which seems to be related to memory allocation for the processing or something?

Thanks for any advice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants