Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommend sf over sp? #47

Closed
maelle opened this issue Apr 11, 2018 · 19 comments
Closed

Recommend sf over sp? #47

maelle opened this issue Apr 11, 2018 · 19 comments

Comments

@maelle
Copy link
Member

maelle commented Apr 11, 2018

@jhollist @ateucher @Robinlovelace @mpadge cc @sckott

Dear geospatial+rOpenSci R people I can think of right now, do you think it would make sense for rOpenSci to recommend sf over sp in the onboarding guidelines?

Here are our current package recommendations, e.g. xml2 over XML.

@Robinlovelace
Copy link
Member

My 10 cents: yes!

@jhollist
Copy link
Member

I agree with @Robinlovelace. That being said some flexibility should be allowed given the state of flux with spatial, especially since raster doesn't yet fully support sf objects out of the box and requires coercing to sp.

@mpadge
Copy link
Member

mpadge commented Apr 11, 2018

My 👍 to @jhollist

recommend_sf <- FALSE
if (!"raster" %in% pkg.deps)
  recommend_sf <- TRUE

if (!recommend_sf){
  # be careful
}

That condition could be modified through extending @noamross's fasterize?

@maelle
Copy link
Member Author

maelle commented Apr 11, 2018

thanks for the feedback everyone! I forgot to tag @mdsumner 🤦‍♀️ @mdsumner, what's your opinion?

@ateucher
Copy link
Member

I also agree it's best at this point to recommend sf, with the same caveats as @jhollist. There is a lot of the r world that is still firmly in the sp/rgal/rgeos ecosystem though. Lately I've been making sf the "first-class citizen" and include sp in Suggests, and support it by way of sf's pretty great methods for coercion to/from.

@mpadge
Copy link
Member

mpadge commented Apr 11, 2018

To complicate matters just a bit, there is the beastly nature of sf to consider. To borrow some code from @hrbrmstr:

library (fs)
library (tidyverse)
installed.packages() %>%
    as_data_frame() %>%
    mutate(pkg_dir = sprintf("%s/%s", LibPath, Package)) %>%
    select(Package, pkg_dir) %>%
    filter(Package %in% c ("sf", "sp")) %>%
    mutate(pkg_dir_size = map_dbl(pkg_dir, ~{
              fs::dir_info(.x, all=TRUE, recursive=TRUE) %>%
              summarise(tot_dir_size = sum(size)) %>% 
              pull(tot_dir_size)
    }) %>%
       format(big.mark = ",")) %>%
    select (Package, pkg_dir_size)
#  Package  pkg_dir_size
#  sf       18,902,203
#  sp       2,093,160

Recommending a package that is almost 10 times the size of its predecessor; that represents one of the largest installs in anybody's site-library; and that by my reckoning is close to 20GB, by @hrbrmstr's more like 113GB, would suggest something like:

if (recommend_sf){
  # be careful
}

@hrbrmstr
Copy link

ohai!

(MB not GB :-)

I was shocked when I saw the results of the lib dir sizes.

There's also this to consider:

sp dep tree
image

vs

sf dep tree
image

sf brings a ton of heavy hitters along for the ride.

@maelle
Copy link
Member Author

maelle commented Apr 14, 2018

Thanks all! A lot of food for thought! 🤔 💥

@Robinlovelace
Copy link
Member

Cannot compete with Bob's viz (awesome stuff - shows how interdependent we're all becoming which I recall you saying is a tendency to resist!) but just thought I'd flag that many of those pkgs will already be installed on many useRs computers, especially magrittr + Rcpp given their popularity. Not sure about e1071 but I'd argue that uptake of units is no bad thing (notwithstanding its potentially painful dependency on the C library udunits2):

image

Code: https://github.com/Robinlovelace/geocompr/blob/master/code/cranlogs.R but starting out with this:

pkgs = c("Rcpp", "MASS", "e1071", "units", "magrittr")
dd = cranlogs::cran_downloads(packages = pkgs,
                              from = "2013-01-01", to = Sys.Date())

@mdsumner
Copy link

I don't think it's possible to make a blanket recommendation for onboarding, sf is really not for package devs, it's just not that kind of tool.

@cboettig
Copy link
Member

Just wondering if it is worth re-framing the question somewhat: is there some approach that developers of packages working with spatial data (e.g., accessing a data repository or API that distributes spatial data) can do to make their packages more sf-ready?

Or more generally, would it make sense to have some recommendations about spatial data standards in rOpenSci not necessarily tied to any particular package ecosystem? e.g. for starters, "packages working with spatial data should declare the projection used". Would it make sense to suggest something like geojson as a preferred format for vector spatial data (unless the data needs one of those unusual simple features not part of the geojson standard?) Or is that unnecessarily prescriptive (i.e. users are better off always firing up the sf / libgdal swiss-army knife and enjoy agnosticism to filetype?)

@noamross
Copy link
Contributor

A few points:

  1. I was fascinated when @mdsumner submitted a PR to remove sf dependency from a package of mine that does nothing but transform sf objects. It made me realize that the sf package is more like a reference implementation of this object type, along with a number of tools for dealing with it, and other geospatial package developers are building and modifying sf objects with custom code.

  2. @jhollist's framed the issue well in his NLMR review, where he recommended authors have a plan to "future-proof" the package to deal with the changing geospatial ecosystem. I think we should include some guidance to authors and reviewers that they consider how their code will stay compatible with volatile parts of the R ecosystem. This would be a highly general top-level concept, (along with performance, documentation, etc.), but then we can provide some more specific guidance on areas such as geospatial, text, etc.

  3. @cboettig I do think that those recommendations are a little too prescriptive, but I do think that recommendations for future-proofing/compatibility consider file format choice and sufficient storage of metadata to deal with changes in the future.

@mpadge
Copy link
Member

mpadge commented Nov 18, 2021

For the same of completeness here, here is a link to an issue thread where @edzer states:

the maintainer of sp has made it clear that new software should build upon sf, not sp. There is no longer a reason to continue building new software using sp classes.

@edzer
Copy link

edzer commented Nov 18, 2021

Now that @rsbivand has retired, rgdal, maptools and rgeos will retire as well, by the end of 2023. To do that, sp will have to depend on sf: right now sp uses rgdal and rgeos for important tasks (validation of CRS, hole-in-polygon detection, although rgdal and rgeos are "just" in Suggests:. sp is not the way forward.

@rsbivand
Copy link

rsbivand commented Nov 19, 2021

Any role for a maintenance-mode only sp going forward is to keep packages using spatial vector representations and unwilling to migrate to sf directly from breaking. For similarly placed packages using S4 classes inheriting from sp gridded classes, the picture is more complex, as stars and terra are arguably both useful alternatives, and should both be supported, but with no recommended preference (use cases should decide for themselves). So for vector data, yes, only recommend sf; for raster data, probably no call should be made beyond pointing out that raster's use of rgdal to access GDAL has EOL to the end of 2023 latest.

@mpadge
Copy link
Member

mpadge commented Nov 19, 2021

Thanks @edzer and @rsbivand for useful clarification! We'll update our recommended scaffolding section of our Development Guide to restate what you've clarified here. Much appreciated, and all the best to you @rsbivand for your days and years ahead! The entire R community will long remain deeply indebted to your contributions!

@rsbivand
Copy link

Thanks @mpadge ! The proposed scaffolding text looks well-balanced.

@mpadge
Copy link
Member

mpadge commented Nov 19, 2021

Ping our Associate Editor for Spatial Software, @Paula-Moraga, so you're aware of these updates 😄 Our automated checking system will notify whenever these (potentially) obsolete packages are used, so you'll see any such cases straight away.

@maelle
Copy link
Member Author

maelle commented Jan 4, 2022

Resolved by @mpadge's PR to the dev guide ropensci/dev_guide#362

Thanks to all those who chimed in!

@maelle maelle closed this as completed Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests