Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with Self-intersection (improper polygon definition) #197

Closed
retostauffer opened this issue Dec 7, 2020 · 6 comments
Closed

Problem with Self-intersection (improper polygon definition) #197

retostauffer opened this issue Dec 7, 2020 · 6 comments
Assignees
Labels

Comments

@retostauffer
Copy link
Contributor

Dear all,

Problem description

One of our students is using your lovely package to interpolate spatial data to NUTS areas. We encountered a problem when using the sf object returned by get_eurostat_geospatial(). In combination with e.g., st_interpolate_aw() (areal weighted interpolation). The error thrown:

Error in CPL_geos_op2(op, x, y) :                                                                        
  Evaluation error: TopologyException: Input geom 1 is invalid: Self-intersection at or near point .....

Cause of the problem: The polygons are not defined properly; holes and islands are not treated as such. This inherits from the definition of the boundaries from the original *.geojson data sets downloaded via https://ec.europa.eu/...; objects/boundaries defined as LINESTRING instead of MULTIPOLYGON.

Possible fix: As I assume changing the original data set is not an option one fix could be applied in get_eurostat_geospatial(). The problem can be fixed by applying a buffer with width 0 (sf::st_buffer(x, dist = 0)) to the polygons. This returns a MULTIPOLYGON (if needed).

Reproducible example

Reproduce error:

library("eurostat")
library("sf")
     
# Spatial (poligonized) data set minimal; One single square with a value of 1234.
poly <- st_polygon(list(matrix(c(0, 0, 60, 60, 0, 0, 60, 60, 0, 0), ncol = 2, byrow = FALSE)))
data <- st_sf(data.frame(geom = st_sfc(poly), data = 1234), crs = st_crs(4326))

# Loading eurostat data set, NUTS2
europe <- get_eurostat_geospatial(output_class = "sf", resolution = "60", nuts_level = 2, year = 2013, crs = 4326)

# Interpolation: Fails due to Self-intersection of some of the polygons in 'europe'
st_interpolate_aw(data, europe, extensive = FALSE)

Possible fix

# Apply st_buffer to each geometry, keep CRS
st_geometry(europe) <- st_sfc(lapply(st_geometry(europe), function(x) st_buffer(x, 0)), crs = st_crs(europe))
st_interpolate_aw(data, europe, extensive = FALSE)

Software used:

  • R version: 4.0.2
  • Package version sf: 0.9.6
  • Package eurostat: 3.6.85 (installed via github; today).

As I cannot attach an .html or .Rmd document, here is a link to a slightly more extensive description of the issue highlighting the problem by showing some of the polygons causing the problem.

Thanks a lot,
Reto

@antagomir
Copy link
Member

Thanks!

The solution seems feasible to me, and I think this could be added in the tutorial.

Additionally, we could consider adding an informative error message with a suggested solution (or even an optional automated fix through function arguments?) in st_interpolate_aw.

Would be curious to hear if @muuankarski has comments on this before we proceed.

@retostauffer
Copy link
Contributor Author

retostauffer commented Dec 7, 2020

Thank you very much for your fast response. "Having some fun in the morning" turned into spending a decent amount of time on this :).

Your answer motivated me to dig a bit further and I have found some issues on r-spatial/sf#347 discussing the same issue. I assume a fix in sf::st_interpolate_aw() is not necessary/the best option as the problem is the geometry of the polygons returned by get_eurostat_geospatial() and not the sf package/functions. Suggestions by one of the maintainers of sf (@edzer):

  • st_intersection(points, st_make_valid(poly)) (not available on older versions of the package)
  • st_intersection(points, st_buffer(poly, 0))

As I now better understand where the problem comes from, the error message is actually quite informative (Input geom 1 is invalid: Self-intersection; GEOS library).

Thoughts

  • Adding it to the tutorial might be a good interim solution.
  • As it relates the geometry returned by this package it might be better to include it in get_eurostat_geospatial(). However, I can understand thoughts regarding backwards compatibility.
  • Adding it to get_eurostat_geostpatial() would ensure that the geometry returned is proper (proper multipolygon definition). Any idea how many problems this might cause (backwards compatibility)?
  • Adding an additional flag make_valid = FALSE to get_eurostat_geospatial(); check for Self-intersections (see function find_dup() https://retostauffer.org/trash/eurostat-improper-polygons.html) and throw a warning/deprecation warning and switch to make_valid = TRUE in X months?

Not sure what's the most handy way. Appreciate your efforts!

All best,
R

@antagomir
Copy link
Member

This seems a good solution to me (including the last point). I would not be too worried with backward compatibility when implemented this way.

Any chance that you could make a PR..? Would be highly appreciated & acknowledged.

@retostauffer
Copy link
Contributor Author

@antagomir appreciate your response; will try to work on a PR before Christmas. No promise as the schedule looks somehow dense, but hope I can find some time for that. I'll come back to you a.s.a.p.

@antagomir
Copy link
Member

Great!

@muuankarski
Copy link
Contributor

Greatly appreciated @retostauffer. Keep us posted!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants