st_intersection behaves differently depending on the order of polygons #936

Closed
jebyrnes opened this issue Jan 3, 2019 · 7 comments

jebyrnes commented Jan 3, 2019

Hello! First off, thanks as always for an amazing package. And now, a funky error. I've been working with a citizen science dataset and having a real problem using st_intersection on it. For full background (and where @spacedman and I discovered the issue) see https://gis.stackexchange.com/questions/307423/st-intersection-failing-for-overlapping-multipolygons-in-sf/307456

There are two odd things. First, in general, st_intersection fails on this dataset no matter what I do

library(dplyr)
library(purrr)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
#> this is sf 0.7-2 from CRAN

#load the data
test_set <- devtools::source_gist("c95701bd444cda3e342838fd9a090fb3",
                                  filename = "test_set.R")$value

I've tried

overlap <- st_intersection(test_set)

which throws

TopologyException: Input geom 0 is invalid: Self-intersection at or
  near point 314406.13208707399 -5762352.8343122564 at 
  314406.13208707399 -5762352.8343122564.

which I tried to resolve with either

overlap_buff <- test_set %>%
  st_buffer(0) %>%
  st_intersection()

or

overlap_snap <- test_set %>%
  lwgeom::st_snap_to_grid(10) %>%
  lwgeom::st_make_valid() %>%
  st_intersection()

but both create errors along the lines of

Error in CPL_nary_intersection(x) : 
  Evaluation error: TopologyException: found non-noded intersection
 between LINESTRING (312816 -5.76168e+06, 312816 -5.76168e+06) and
  LINESTRING (312816 -5.76168e+06, 312816 -5.76168e+06) at 
  312815.57772721699 -5761684.4256249201.

I was worried it might be a data problem, but @spacedman found something curious - depending on the order of the polygons, you get different results.

They sampled down to a small subset and found a set of five polygons that fails, but if you reverse their order, st_intersection runs fine.

download.file("https://gist.githubusercontent.com/barryrowlingson/4665509c3c00c65ccdc0cc73b4ff10dd/raw/2132f4caf3a4629569f2ccc0727b6f5c4180f743/p.R",
              destfile = "p.R")

p = dget("./p.R")

#does not work - failed with a
#non-noded intersection error
st_intersection(p)

#but this works
st_intersection(p[5:1])

That st_intersection behaves differently depending on the order of the polygons seems like a bug, and it suggests that the underlying cause of the failures above might be something deeper.


edzer commented Jan 3, 2019

@rsbivand pointed me earlier to this thread: https://lists.osgeo.org/pipermail/geos-devel/2019-January/008794.html , which doesn't give much hope. The only thing sf does is pass the polygons on to GEOS and see what comes back. I'm not surprised that the order of geometries in an st_intersection call matters, as it goes through a double loop and follows the order in which the geometries come; see here.
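A rough R sketch of that idea (not sf's actual implementation, which is compiled code behind CPL_nary_intersection; the function name nary_sketch and its details are purely illustrative of why input order can matter):

library(sf)

# Accumulate pieces by intersecting each incoming geometry with everything
# collected so far, in input order. A fragile geometry can therefore trigger
# a GEOS error at a different intermediate step (or not at all) depending on
# where it sits in the sequence.
nary_sketch <- function(geoms) {
  pieces <- list(geoms[[1]])
  for (g in geoms[-1]) {
    pieces <- c(pieces, lapply(pieces, st_intersection, y = g), list(g))
  }
  pieces
}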


jebyrnes commented Jan 3, 2019

Ugh - a known problem, and a recent one that requires new algorithms. Hrm. Weird. I wonder if https://github.com/tudelft3d/pprepair (mentioned in the thread) can somehow be used for a solution? I admit, we're quickly going outside of my knowledge zone here...

(relevant paper seems to be https://3d.bk.tudelft.nl/hledoux/pdfs/14_cgeo_prepair.pdf - and is fascinating!)


jebyrnes commented Jan 3, 2019

The other slower and less satisfying solution is to rasterize (with fasterize), calculate sums, and then re-polygonize. But that's slooooow. Or was with sp.
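A minimal sketch of that raster route, assuming test_set is an sf object in a projected CRS; the 30 m cell size and the object names are illustrative choices:

library(raster)     # raster template and cell bookkeeping
library(fasterize)  # fast sf -> raster
library(spex)       # fast raster -> sf polygons

template  <- raster(test_set, res = 30)                  # empty raster covering the extent of test_set
counts    <- fasterize(test_set, template, fun = "sum")  # each cell counts how many polygons cover it
consensus <- spex::polygonize(counts)                    # back to polygons (one per cell) carrying the count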


edzer commented Jan 3, 2019

Yes, it is fascinating stuff, and so is your problem - very different from the pprepair stuff AFAIR.

However, here we try to solve issues with sf. If you consider this to no longer be an sf issue, then please close it here.


jebyrnes commented Jan 3, 2019

No prob - just wondered if it could be a solution within sf. Will close. FWIW, fasterize + spex::polygonize seems to work OK, as you can see here - https://gist.github.com/jebyrnes/77a0efd00f7f5a1f142859ac0153f271 - but it's not an ideal solution and quite fiddly.

jebyrnes closed this as completed Jan 3, 2019

rsbivand commented Jan 3, 2019

Just a comment: test_set is:

[figure: plot of the test_set polygons]

so maybe this is a hard task, and certainly rather outside the intentions of SF (the Simple Features standard), which treats polygons as objects tessellating space. What are the data? From the SO thread, which includes an explanation, we see that each respondent draws geometries (1 and 2 in test_set):

[figure: geometries drawn by respondents 1 and 2 in test_set]

and you are interested in the area-counted intersections (which polygons are marked by 1, 2, 3 ... respondents). So actually going to a raster representation is possibly a better choice anyway, and GEOS intersection (intended for finding the shared polygon but not necessarily how many overlap) isn't a good fit.
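As a hedged sketch of that counting idea, assuming a counts raster like the hypothetical fasterize() output above and a projected CRS (so cell sizes are in metres; the function name is illustrative):

library(raster)

# area (in squared CRS units) covered by at least k overlapping polygons
area_at_threshold <- function(counts, k) {
  n_cells <- sum(values(counts) >= k, na.rm = TRUE)
  n_cells * prod(res(counts))   # cell count times single-cell area
}

area_at_threshold(counts, 3)  # e.g. area marked by three or more respondents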


jebyrnes commented Jan 4, 2019

We are indeed ultimately interested in the area of overlapping areas at different thresholds. For more background, this comes from Floating Forests, a citizen science project using Landsat data to derive time series of giant kelp - https://www.zooniverse.org/projects/zooniverse/floating-forests and we also have a preliminary paper with some validation of the approach (that describes it more clearly) at https://arxiv.org/abs/1801.08522 .

The reason we ultimately want vector rather than raster representations of consensus classifications is twofold. First - ease of storage (file size!) for sharing data among our citizen scientists. Given that we have 30 years of bi-monthly photos of the world's coastlines, we're trying to be as efficient as possible with respect to storage, to better enable accessibility as well as re-use for public visualization products. Second - intersecting polygons is just generally faster than going from polygons to rasters and back (that polygonize step can be a doozy computationally, although I am thanking my lucky stars for spex; for our previous pipeline, that library wasn't around yet). So we're trying to make as efficient a data pipeline as possible for rapid turnaround of data, primarily for educational purposes (i.e., we are working on curricula for grades 7 through PhD-level marine ecology and GIS classes, where we'd love for using the platform and the resulting data to be part of the process).

Third (I know, I said two), the more efficient a pipeline we can develop, the better for future Zooniverse (our collaborating organization - the citizen science wing of Adler) projects that use this freehand-select tool. We're hoping that the methods we develop for this project can be used by other projects in the future, so no wheel-reinvention is necessary (their freehand selection tool is a rather new addition). Heck, if you think there's something we should suggest to the developers there on the interface side, I'm happy to forward that along.

OK, that was a lot of background, but I thought you might be interested and/or it might give you some insights.
