First of all, thank you very much for developing the
I would like to see the functionality to systematically and quickly remove overlapping areas from geometries an
Also, just in case you're not aware, I noticed that running
Finally, I've tried following this package's conventions, but please let me know if I've missed anything or if there is a better name for the function than
Thanks for expressing interest and getting your hands dirty on sf & GEOS programming! This is one of the first functions doing geom operations over sets of features (besides
Another instance of the set problem I've seen is getting all self-intersecting sub-polygons (see here), and related to that, counting the number of overlaps; these two operations do not depend on order. (There are a few more posts on SO when you search for counting overlapping polygons.)
I've managed to be quite conservative, so far, in accepting new function/method names; I would first like to have an overview of all the kinds of things we can do on sets of features / feature geometries before accepting implementations / namespace additions for special cases.
The motivating case for me is to write an R package for downloading and cleaning the World Database on Protected Areas (WDPA; https://www.protectedplanet.net/). This database describes the spatial distribution of protected areas globally. Each protected area is also associated with additional information including: year established, designation, administrative authority, IUCN category, and much more. For an example, check out the entry for the Diamantina protected area in Australia (note that the online version does not provide all of the fields associated with each protected area).
In my field of research, this database is commonly used for "gap analyses" (see Rodrigues et al. 2004, Butchart et al. 2015, Runge et al. 2015 for examples). These "gap analyses" aim to identify species or ecosystems that are not adequately conserved in protected areas. This is generally achieved by setting a target (desired) amount of habitat for each species, and determining if the total amount of habitat for a given species inside protected areas exceeds this target. However, before these analyses can be performed, the World Database on Protected Areas needs to be cleaned. Since the WDPA is rather large (approx. 1.1 GB), the methods for cleaning the WDPA also need to be efficient.
Among the various steps involved in cleaning the WDPA (also see the supporting information for Runge et al. 2015), one of the key steps involves removing overlapping geometries from protected areas. Otherwise, in cases where multiple protected areas overlap, the gap analysis will "double count" the same habitat as being protected, and the results may erroneously indicate that a species is adequately protected when it is not. Conventionally, overlapping areas are removed by dissolving the data after removing all invalid areas (e.g. protected areas that are currently "proposed" and not "implemented"). However, this approach removes all other information (e.g. year established, IUCN category, protected area name) that can be useful for interpreting the results of a gap analysis. Therefore, I would like a function for erasing overlapping areas from existing geometries, while still retaining information on the identity and attribute data for each protected area.
I think the order for removing overlapping areas depends on the data set of interest. Therefore, I would imagine users would need to sort their data prior to removing overlapping areas. For instance, when processing the WDPA, I would first sort the data by IUCN category and year established, so that overlapping geometries are retained for areas with better management and areas with historical precedence. The code for achieving this might look something like this:
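A minimal sketch of the sorting step, using a toy data frame in place of the WDPA attribute table. The column names (`NAME`, `IUCN_CAT`, `STATUS_YR`), the example rows, and the ranking of categories from strictest to least strict are all assumptions for illustration, not something taken from this package or checked against the database.

```r
# Toy stand-in for the WDPA attribute table (names and values are assumed).
pa <- data.frame(
  NAME = c("A", "B", "C", "D"),
  IUCN_CAT = c("IV", "Ia", "Not Reported", "Ia"),
  STATUS_YR = c(1990, 2005, 1980, 1975),
  stringsAsFactors = FALSE
)

# Assumed ranking of management categories, strictest first.
iucn_levels <- c("Ia", "Ib", "II", "III", "IV", "V", "VI",
                 "Not Reported", "Not Applicable", "Not Assigned")
pa$IUCN_CAT <- factor(pa$IUCN_CAT, levels = iucn_levels, ordered = TRUE)

# Sort by category first, then by year established (oldest first), so that
# rows earlier in the table take priority when overlaps are erased in order.
pa_sorted <- pa[order(pa$IUCN_CAT, pa$STATUS_YR), ]
```

Row indexing with `order()` works the same way on an `sf` object, so each geometry stays attached to its attributes. The overlap-erasing step would then run on the sorted object.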
Also, with regards to
I'm only beginning to switch over to
Sorry for the long post. I would be really keen to hear what you think, especially if this functionality is already present in
Butchart, S. H., Clarke, M., Smith, R. J., Sykes, R. E., Scharlemann, J. P., Harfoot, M., ... & Brooks, T. M. (2015). Shortfalls and solutions for meeting national and global conservation area targets. Conservation Letters, 8(5), 329-337.
Rodrigues, A. S., Akcakaya, H. R., Andelman, S. J., Bakarr, M. I., Boitani, L., Brooks, T. M., ... & Hoffmann, M. (2004). Global gap analysis: priority regions for expanding the global protected-area network. BioScience, 54(12), 1092-1100.
Runge, C. A., Watson, J. E., Butchart, S. H., Hanson, J. O., Possingham, H. P., & Fuller, R. A. (2015). Protected areas and global conservation of migratory birds. Science, 350(6265), 1255-1258.
Thanks for the explanation.
My real point is that (i) your use case is narrow, and adding a function or method for every narrow use case makes the package namespace explode, and (ii) the name