st_erase_overlaps #598

Merged
merged 2 commits into from Dec 20, 2017

Contributor

jeffreyhanson commented Dec 17, 2017

Hi,

First of all, thank you very much for developing the sf R package. I really appreciate all the work you've done, and I think this is a game-changer for processing spatial data in R.

I would like to see functionality for systematically and quickly removing overlapping areas from the geometries in an sf/sfc object. To help start a discussion about this, I've put together this pull request, which includes an attempted implementation, st_erase_overlaps, and corresponding documentation and unit tests. Below, I've included two plots from the examples that show the current implementation.

Briefly, the st_erase_overlaps R function is implemented in geom.R. This function uses the compiled function CPL_erase_overlaps in geos.cpp to do most of the processing. The CPL_erase_overlaps function works by iterating over each geometry, finding out if the i'th geometry overlaps with the previously indexed geometries, erasing the overlapping parts, and appending the (potentially) updated geometry to output. Currently, geometries are assumed to be valid, geometries completely contained in previous geometries are removed, empty geometries are discarded, and trees are used to run overlapping tests faster.
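The iteration described above can be sketched in plain R with existing sf functions. This is only an illustrative approximation, not the C++ code in this pull request: `erase_overlaps_sketch`, `sq` and the toy squares are hypothetical names introduced here, and `st_intersects` stands in for the tree-based overlap test.

```r
library(sf)

# Hypothetical helper (not part of sf): erase from each geometry the parts
# already covered by geometries that come earlier in the set.
erase_overlaps_sketch <- function(x) {
  stopifnot(inherits(x, "sfc"))
  kept <- list()
  for (i in seq_along(x)) {
    g <- x[i]  # length-one sfc holding the i'th geometry
    if (length(kept) > 0) {
      prev <- do.call(c, kept)
      # Indices of the earlier (already processed) geometries that g meets.
      hits <- st_intersects(g, prev)[[1]]
      if (length(hits) > 0) {
        g <- st_difference(g, st_union(prev[hits]))
      }
    }
    # Skip geometries that were erased completely.
    if (length(g) > 0 && !all(st_is_empty(g))) {
      kept[[length(kept) + 1]] <- g
    }
  }
  if (length(kept) == 0) st_sfc() else do.call(c, kept)
}

# Toy data: axis-aligned squares given by lower-left corner and side length.
sq <- function(x0, y0, s) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s),
                        c(x0, y0 + s), c(x0, y0))))
}
x <- st_sfc(sq(0, 0, 2), sq(1, 1, 2), sq(0.2, 0.2, 0.6))
res <- erase_overlaps_sketch(x)
length(res)   # the third square lies inside the first, so it is dropped
st_area(res)  # the second square loses the part it shares with the first
```

Because each geometry is only compared with the ones before it, the result depends on the order of the input.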

Also, just in case you're not aware, I noticed that running devtools::document() caused some changes to the NAMESPACE and the documentation for existing functions in the package. I have not committed changes that do not pertain to the st_erase_overlaps functionality to avoid confusion.

Finally, I've tried to follow this package's conventions, but please let me know if I've missed anything or if there is a better name for the function than st_erase_overlaps. I don't have any prior experience in programming with GEOS, so please let me know if you have any suggestions for improving the functionality or performance.

edzer Dec 17, 2017

Member

Thanks for expressing interest and getting your hands dirty on sf & GEOS programming! This is one of the first functions doing geom operations over sets of features (besides st_union), and we may want to put a bit more thought in this before going here. First of all, what is the motivating use case behind this? Since the outcome clearly depends on the order of features, what would be a good order?

Another instance of the set problem I've seen is e.g. getting all self-intersecting sub-polygons (see here), and related to that, counting the number of overlaps; these two operations do not depend on order. (There are a few more posts on SO when you search for counting overlapping polygons.)

Name: st_erase_overlaps does the same as st_difference for a pair of geometries; from the name it is not clear what the difference is. Also, a method for sfg does not seem to make sense.

I've managed to be quite conservative, so far, in accepting new function/method names; I would first like to have an overview of all the kinds of things we can do on sets of features / feature geometries before accepting implementations / namespace additions for special cases.
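To make the order dependence concrete, here is a minimal sketch using only the pairwise st_difference(x, y). The squares `a` and `b` and the helper `sq` are made-up illustration data, not taken from the pull request.

```r
library(sf)

# Hypothetical helper: axis-aligned square from lower-left corner and side length.
sq <- function(x0, y0, s) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s),
                        c(x0, y0 + s), c(x0, y0))))
}
a <- st_sfc(sq(0, 0, 2))
b <- st_sfc(sq(1, 1, 2))  # shares a 1 x 1 corner with a

a_first <- c(a, st_difference(b, a))  # a keeps the shared area
b_first <- c(st_difference(a, b), b)  # b keeps the shared area

st_area(a_first)
st_area(b_first)
```

The total covered area is the same in both cases, but which feature retains the shared part differs.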

jeffreyhanson Dec 18, 2017

Contributor

The motivating case for me is to write an R package for downloading and cleaning the World Database on Protected Areas (WDPA; https://www.protectedplanet.net/). This database describes the spatial distribution of protected areas globally. Each protected area is also associated with additional information including: year established, designation, administrative authority, IUCN category, and much more. For an example, check out the entry for the Diamantina protected area in Australia (note that the online version does not provide all of the fields associated with each protected area).

In my field of research, this database is commonly used for "gap analyses" (see Rodrigues et al. 2004, Butchart et al. 2015, Runge et al. 2015 for examples). These "gap analyses" aim to identify species or ecosystems that are not adequately conserved in protected areas. This is generally achieved by setting a target (desired) amount of habitat for each species, and determining if the total amount of habitat for a given species inside protected areas exceeds this target. However, before these analyses can be performed, the World Database on Protected Areas needs to be cleaned. Since the WDPA is rather large (approx. 1.1 GB), the methods for cleaning the WDPA also need to be efficient.

Among the various steps involved in cleaning the WDPA (also see the supporting information for Runge et al. 2015), one of the key steps involves removing overlapping geometries from protected areas. Otherwise, in cases where multiple protected areas overlap, the gap analysis will "double count" the same habitat as being protected, and the results may erroneously indicate that a species is adequately protected when it is not. Conventionally, overlapping areas are removed by dissolving the data after removing all invalid areas (e.g. protected areas that are currently "proposed" and not "implemented"). However, this approach removes all other information (e.g. year established, IUCN category, protected area name) that can be useful for interpreting the results of a gap analysis. Therefore, I would like a function for erasing overlapping areas from existing geometries, while still retaining information on the identity and attribute data for each protected area.

I think the order for removing overlapping areas depends on the data set of interest. Therefore, I would imagine users would need to sort their data prior to removing overlapping areas. For instance, when processing the WDPA, I would first sort the data by IUCN category and year established, so that overlapping geometries are retained for areas with better management and areas with historical precedence. The code for achieving this might look something like this:

wdpa2 <- wdpa %>% arrange(IUCN_CATEGORY, YEAR) %>% st_erase_overlaps()

Also, with regards to sfg objects, yeah, when I was working on this yesterday, I didn't think that st_erase_overlaps would make sense for sfg objects. But I was thinking more about this last night, and I think that POLYGONs in a MULTIPOLYGON could overlap with each other, so st_erase_overlaps could and should also work with sfg objects. However, the current implementation of st_erase_overlaps does not handle this---and neither does it handle this for sf/sfc objects---so I would need to fix this.

I'm only beginning to switch over to sf from sp/rgeos, and I'm not sure how st_difference could be used to achieve the same functionality as st_erase_overlaps. Could you please provide a short example? If the proposed functionality is already in sf, then I'll just close this pull request as I don't think it's worth having multiple functions that do the same thing.

Sorry for the long post, I would be really keen to hear what you think, especially if this functionality is already present in sf.

References

Butchart, S. H., Clarke, M., Smith, R. J., Sykes, R. E., Scharlemann, J. P., Harfoot, M., ... & Brooks, T. M. (2015). Shortfalls and solutions for meeting national and global conservation area targets. Conservation Letters, 8(5), 329-337.

Rodrigues, A. S., Akcakaya, H. R., Andelman, S. J., Bakarr, M. I., Boitani, L., Brooks, T. M., ... & Hoffmann, M. (2004). Global gap analysis: priority regions for expanding the global protected-area network. BioScience, 54(12), 1092-1100.

Runge, C. A., Watson, J. E., Butchart, S. H., Hanson, J. O., Possingham, H. P., & Fuller, R. A. (2015). Protected areas and global conservation of migratory birds. Science, 350(6265), 1255-1258.

edzer Dec 18, 2017

Member

Thanks for the explanation.

Single feature MULTIPOLYGON geometries with overlapping sub-polygons are invalid, and can be repaired by lwgeom::st_make_valid or st_buffer with radius zero. I said that st_difference can do the same as st_erase_overlaps only for a single pair of geometries. st_erase_overlaps essentially applies st_difference in a double loop over the whole set.
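A small sketch of the validity point, assuming a made-up MULTIPOLYGON whose two parts overlap (`sq` and `mp` are hypothetical names). Only the zero-radius st_buffer route is shown; st_make_valid is not called here because its availability depends on the installed packages.

```r
library(sf)

# Hypothetical helper: ring list for an axis-aligned square.
sq <- function(x0, y0, s) {
  list(rbind(c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s),
             c(x0, y0 + s), c(x0, y0)))
}

# One feature whose two sub-polygons overlap in a 1 x 1 corner.
mp <- st_sfc(st_multipolygon(list(sq(0, 0, 2), sq(1, 1, 2))))
st_is_valid(mp)     # overlapping sub-polygons make the geometry invalid

fixed <- st_buffer(mp, 0)  # zero-radius buffer as a repair step
st_is_valid(fixed)
```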

My real point is that (i) your use case is narrow, and adding a function or method for every narrow use case makes the package namespace explode, and (ii) the name st_erase_overlaps may need more thought, in particular if we want to support operations on sets of feature geometries in a more general way (count overlaps, find all unique polygons).

@edzer edzer referenced this pull request Dec 19, 2017

Closed

n-ary geometry functions #600

@edzer edzer merged commit a052e49 into r-spatial:master Dec 20, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
edzer Dec 20, 2017

Member

I moved st_erase_overlaps to st_difference(x), i.e. with a single argument. I also removed some memory leaks, and added similar behavior for st_intersection.
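A short usage sketch of the single-argument forms on toy data (the `id` column, `sq` and the squares are made up for illustration); the column names mentioned in the comments are those I understand sf to use and may vary between versions.

```r
library(sf)

# Hypothetical helper: axis-aligned square from lower-left corner and side length.
sq <- function(x0, y0, s) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s),
                        c(x0, y0 + s), c(x0, y0))))
}
x <- st_sf(id = c("a", "b"),
           geometry = st_sfc(sq(0, 0, 2), sq(1, 1, 2)))

# Single argument: later features lose the parts covered by earlier ones,
# and the attributes of each remaining feature are kept.
d <- st_difference(x)
d$id
st_area(d)

# Single argument: all distinct pieces, with the number of contributing
# features in n.overlaps (and their indices in origins).
i <- st_intersection(x)
i$n.overlaps
st_area(i)
```

Sorting the rows of x before calling st_difference(x) decides which feature keeps a shared area.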

jeffreyhanson Dec 21, 2017

Contributor

Awesome - thank you very much!
