A spatial tidyverse? #13

Closed
tim-salabim opened this issue Jun 26, 2017 · 43 comments
@tim-salabim
Member

tim-salabim commented Jun 26, 2017

This idea has come up several times, most notably in r-spatial/rspatial_spark#5 by @jhollist and in #11 (comment) by @Robinlovelace.

I think it's worth a separate issue, so here we go.

The idea is simple (and quite intriguing): to have an equivalent to the tidyverse meta-package for spatial analysis needs.
Assuming we would consider creating such a meta-package, I think the most obvious and important question is:

  • which packages do we think constitute the spatial analysis skeleton?

I made an earlier attempt at visualising this (heavily adapted from the tidyverse):

[Figure: rminimum_workflow]

If we can answer this (i.e. if we can somehow agree on a core set of packages for spatial workflow) then there are obviously other questions that arise:

  1. shall we try to have a common lexical structure (like the tidyverse), or is spatial enough as a common denominator?
  2. how can we ensure smooth un-interrupted development (with potentially many different developers involved)?
  3. how can we feasibly handle all the external libs (proj, GDAL, etc), with e.g. docker?
  4. ...

I leave this here, as I think many more questions will arise but it is only feasible to think about those if there is any solid plan to introduce a spverse, spidyverse, ...

@Robinlovelace
Contributor

Robinlovelace commented Jun 26, 2017

I've been calling it the sfverse - well trying not to call it that as I know the risks of pre-emptive labelling:

This era (which we refrain from labelling the sfverse with any seriousness, awaiting a better name!) clearly has the wind in its sails and is set to dominate future developments in R's spatial ecosystem for years to come.

But I think sfverse is better than your suggestions - too close to pverse, which some people think this idea is! sverse or geoverse are also good options I think.

geocompx/geocompr@0a8f9c4#diff-759a34bebfed70ddfb67a5477a359888R126

@Robinlovelace
Contributor

Reflecting on the options I've seen/considered I think geoverse would be the best at communicating the intentions of such a move if it were to happen.

@edzer
Member

edzer commented Jun 26, 2017

I don't see why the word should end in -verse, we're not subordinate to another verse are we?

Also, spatial (space) is wider than geo (Earth) - lots of spatial data is not Earth-bound.

@Robinlovelace
Contributor

Robinlovelace commented Jun 26, 2017

True, but it could be a useful HT (tip of the hat, but in a non-macho, pro-collaborative way) to the tidyverse, which clearly inspired the idea.

I suspect people wishing to map the surface of Pluto would not be put off by the word geo if they are not put-off by the word Geographical in QGIS, as seems to be the case for at least one person.

@Robinlovelace
Contributor

But I do see that there are spatial applications, e.g. cell biology which are not geographical. spatialverse would resolve that issue and be pretty unambiguous. BTW rapid prototyping @tim-salabim, nice!

@tim-salabim
Member Author

@Robinlovelace as I say above, this was done a few months ago.
In any case, I think it is more important to think about content first than to try to find a suitable name.

@Robinlovelace
Contributor

Robinlovelace commented Jun 26, 2017

Although I recently watched Donald Knuth's keynote speech at UseR, in which he spoke of the importance of names. Discussing names can help clarify what the aim is, although I can see that it risks bikeshedding.

Thanks to Edzer for introducing me to that concept - only just discovered it has an interesting etymology!

@tim-salabim
Member Author

Another dimension is the external lib dependencies (proj4, GDAL, GEOS, etc.). I've updated the first comment accordingly (bullet-point 3.)

@edzer
Member

edzer commented Jun 26, 2017

See also this thread with ideas about modularizing sf; maybe add lwgeom as another external dependency some might like to have in CRAN binary builds, and see these docker files.

@jhollist

Since you mentioned docker in bullet point 3, there is also rocker/geospatial. And thanks for starting this issue!

@Robinlovelace
Contributor

Heads-up: I've just registered https://github.com/geoverse before anyone (non R-y) does, in the event that it's useful. Happy to pass ownership of this over to someone currently in r-spatial. I think

install.packages("geoverse")
library(geoverse)

to load a selection of R packages that work together nicely (e.g. without dplyr::select boshing raster::select) would be really useful.
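As a rough illustration of the name-clash point, a metapackage could check which function names are exported by more than one of its member packages before attaching them. The sketch below is only that: `find_clashes()` is a hypothetical helper, and the export lists are short hand-written samples rather than the packages' full exports (for installed packages, `getNamespaceExports()` would supply the real lists).

```r
# Hypothetical helper: report function names exported by more than one package.
# `exports` is a named list mapping package name -> character vector of exports.
find_clashes <- function(exports) {
  all_names <- unlist(exports, use.names = FALSE)
  owners <- rep(names(exports), lengths(exports))
  # Group the owning packages by exported name, keep names with > 1 owner.
  by_name <- split(owners, all_names)
  by_name[lengths(by_name) > 1]
}

# Illustrative samples only; for installed packages one could instead use
# lapply(pkgs, getNamespaceExports).
exports <- list(
  raster = c("select", "crs", "extract"),
  dplyr  = c("select", "filter", "mutate"),
  tidyr  = c("extract", "gather")
)

clashes <- find_clashes(exports)
print(clashes)
```

Whatever a metapackage does with such a report (warn, refuse to attach, or pick a winner) is a design decision this thread has not settled; calling functions with an explicit prefix such as `dplyr::select()` sidesteps the clash regardless of attach order.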

@Robinlovelace
Contributor

@tim-salabim I believe you created r-spatial. One option would be to transfer stuff to geoverse - I have no issue with passing over the keys 🔑

@edzer
Member

edzer commented Sep 11, 2017

I suggest to first create these packages that work together nicely; see also tidyverse/tidyr#360 .

@tim-salabim
Member Author

Why do we need another organisation repo for a geoverse package?

@Robinlovelace
Contributor

We probably don't - you could just have r-spatial/geoverse I guess. Just a suggestion in the hope it would be useful. Agree with Edzer that the priority should be getting stuff that works. Hopefully it will be simpler and therefore less prone to issues than the tidyverse, but that will be quite a mission.

@Robinlovelace
Contributor

What's in a name? A lot actually, but seeing as things have gravitated here I'm now thinking that it could cause more confusion than clarity to create a new org, so I take back my proposal - orgs are easy to delete! On that note, what is the procedure by which pkgs go here? Could be useful to have an 'onboarding' procedure. The current pkgs certainly look like some of the building blocks of a geoverse.

@edzer
Member

edzer commented Sep 11, 2017

I will not start thinking about an "onboarding" procedure before we have received five non-trivial requests for packages wanting to join the r-spatial org.

@ateucher
Contributor

ateucher commented Sep 11, 2017

Note that rOpenSci has also just expanded their scope to explicitly include geospatial packages. Competing for packages shouldn't be a goal, so it would be worth thinking about how to harmonize with them if there are packages that rightly belong in rOpenSci and the geo/spatialverse (or whatever it ends up being called).

@edzer
Member

edzer commented Sep 11, 2017

@ateucher being active in both communities, what would you suggest as of when a package rightly belongs in one or the other?

@ateucher
Contributor

@edzer it's hard to say. rOpenSci's scope is definitely much wider - they aren't trying to create a 'verse in that their packages aren't explicitly designed to work together for a single purpose, whereas the tidyverse packages are. Is this the goal of the spatialverse as well? I.e., is it a suite of spatial packages that are designed to work really well together? Or is it a suite of packages that constitute the basic framework for spatial analysis regardless of style (E.g., would raster and sf coexist in the spatialverse, or would we be waiting for stars)?

I don't think that answers your question, but it's hard to say until we know what the spatialverse really represents...

@edzer
Member

edzer commented Sep 11, 2017

Thanks! You're right, we're looking into the stars...

@Robinlovelace
Contributor

Robinlovelace commented Sep 12, 2017

Here's a request for a pkg: a geoverse that would include, in the 1st instance, raster, and sf. I imagine stars would get added to that, as would plotting pkgs such as mapview, tmap and leaflet.

Sound like a plan? Should be a relatively lightweight pkg that could build on the experience of tidyverse, the core script of which seems to be this: https://github.com/tidyverse/tidyverse/blob/master/R/attach.R

Not sure if that is trivial or not but I'd be up for helping. @edzer would you recommend waiting to see how things pan out with stars and other things before proceeding with this and to everyone, do you think such a pkg would be best placed here or elsewhere?

Great to talk about things before doing them and think there is latent demand for this geoverse idea. If done well I think it could make methods for handling and plotting a variety of spatial data forms more accessible to more R users, an aim I'm sure we all share.
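To make the "relatively lightweight pkg" idea concrete, here is a minimal sketch of what the attaching step might look like. It is not a copy of the tidyverse's attach.R; `geoverse_core` and `attach_core()` are hypothetical names, and the core list simply reuses the two packages named in the request above.

```r
# Hypothetical core list; the thread has not agreed on membership.
geoverse_core <- c("sf", "raster")

# Attach each package that is installed and skip the rest with a message.
# Returns a named logical vector: TRUE where the package was attached.
attach_core <- function(pkgs = geoverse_core) {
  vapply(pkgs, function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      message("Skipping '", pkg, "': not installed")
      return(FALSE)
    }
    suppressPackageStartupMessages(
      library(pkg, character.only = TRUE)
    )
    TRUE
  }, logical(1))
}

# Safe to run even when sf or raster are missing: those are just skipped.
attached <- attach_core()
print(attached)
```

In a real metapackage this would typically be called from `.onAttach()`, and the order of `pkgs` matters because later packages mask same-named functions from earlier ones.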

@edzer
Member

edzer commented Sep 12, 2017

I would like to see it before I comment, so please go ahead, develop a package, and share with everyone why you think it is useful. Reasons why I would not do this now are (i) raster and sf don't work together, (ii) by calling something xxverse, what exactly would users reasonably expect, and (iii) is life really so complicated that users need such a package? I always forget what is in dplyr and what is in tidyr; there tidyverse helps, but is this the case for spatial?

Will users expect tidyverse-like behavior? For sf this can work to some extent, but for array data the underlying assumption that data consists of a simple sequence of records does not hold. @mdsumner 's tidync might have solutions. Raster's select and dplyr's select are incompatible.

@Robinlovelace
Contributor

Robinlovelace commented Sep 12, 2017

Thanks for the comments and agree it would probably be worth waiting until sf and raster work together before doing it. The fact that there is a name clash between the select function is precisely the kind of issue that I would hope such a metapackage could resolve.

In answer to the 3 points: (i) point taken - any news on latest thinking on this welcome, or where to keep an eye on developments in this area? (ii) users can expect the Earth (or the stars ; ) but I think reasonable expectations would include consistency between and clarity about functions (e.g. decide between raster::crs and sf::st_crs - I'd favour the latter), zero name clashes (or at least a way of dealing with and warning users about them consistently) and an effort to document how the different pkgs in the 'verse' can be used together; (iii) I don't think it's a matter of need so much as accessibility and user friendliness and trying to make life easy for people who are not experts in programming and namespace memorisation.

I find the pipe a really useful feature of the tidyverse and, as illustrated in these slides that @Nowosad and I put together, they can help improve readability: http://robinlovelace.net/presentations/spatial-tidyverse.html#1

My default plan is to hold fire and wait for further info/feedback; if people agree it would be useful, I could hack together such a package and 'submit' it here. Of course the proposal may get shredded at that stage, which is fine, but the hope would be that in the shredding process improvements or an alternative to the proposed pkg solution will be found.

Robinlovelace referenced this issue in geocompx/geocompr Sep 13, 2017
@jannes-m better to delete that comment to avoid confusion. Good to load at outset as it's reasonable to assume raster will be loaded after previous chapters.
@tim-salabim
Member Author

tim-salabim commented Sep 13, 2017

For reference, I just listened to the quarterly rOpenSci call where they talked about editor work and code reviews. One side-topic was related to their scope, so I asked how narrow/wide their geospatial scope is.
@sckott was kind enough to point out a few of their key foci, which he summed up here (3rd page, second last bullet).

There he also mentions a blog post with more detail on the geospatial efforts they undertook(-take).

Other points I gathered from the answer provided in the call:

  • No algorithms or geostats
  • Data manipulation and retrieval focused
  • Mainly focused on geojson

@jsta

jsta commented Sep 13, 2017

One question I had was about naming prospective geoverse packages. I am considering creating an sf extension package that calculates polygon shape metrics. Should the package have sf in the name similar to ggplot extensions? Should functions have the st_ prefix?

@edzer
Member

edzer commented Sep 13, 2017

I am not dogmatic when it comes to package names, but both make some sense, yes.

@Robinlovelace
Contributor

Robinlovelace commented Sep 14, 2017

Out of interest: @tim-salabim would you also be interested in creating such a pkg later down the line? Asking as you started this thread and seem to have a pretty clear vision of what it would be like. The diagram at the top of this thread is really useful for visioning the advantages of such a pkg and I'd be happy to contribute to such a vision/pkg as an alternative to hacking one together myself. On that note I've plenty of work to be getting on with rather than gazing at the stars, transitioning stplanr to support sf: 12 down 9 to go!

@tim-salabim
Member Author

At some stage, yes.
Though currently I feel that we're not ready for such a thing. Breaking it down into the two major data models, I think vector-focused things are coming along nicely, but I feel that there is still a void regarding raster-focused analysis tools (other than the raster package) - not talking about stars here, rather from my vis perspective (e.g. performant raster rendering is still too immature).
One vision I had when calling for r-spatial was to get people like @bhaskarvk and @timelyportfolio on board to pitch in their expertise to extend the dev possibilities of us geo-folk. I think this has worked out ok so far. Yet, focussed efforts towards proper and stable tools is hard with a bunch of part-time hackers.

Personally, I am very curious about what the stars project will evolve into, as the full workflow from input to output/vis is in the scope right from the beginning and it is (I guess) partly embedded in a larger research project with quite some funding. @edzer correct me if I'm wrong...

In a nutshell, I am hesitant to think about a meta-package before we have a solid block of modules to constitute such a thing. Tying together loose ends is fine, but they should be ends, not unfinished "somewhere-in-the-middles".

@tim-salabim
Member Author

In my opinion, things have progressed substantially since the last activity here (e.g. stars is way more mature, raster updates, terra on its way, all the recent developments by @SymbolixAU, ... the list goes on I'm sure).
Still, I cannot clearly envision a meta-package that would bundle all these developments neatly and take away from the user the burden of keeping a good overview of what is available.

I feel we should leave this open, as the discussion here is very informative and a solid base to build upon. Who knows, maybe there will be some spark that leads to something substantial someday...

@pat-s
Member

pat-s commented Jan 15, 2020

As a hint, one could think about releasing a geoverse package bundling certain core infrastructure packages such as

  • rgdal
  • sf
  • mapview
  • etc

via the "Depends" section of a DESCRIPTION file like we do in the {mlr3verse} package.

@edzer
Member

edzer commented Jan 15, 2020

Thanks for the hint! Technically realizing this is IMO the smallest problem, but this "verse" would only be meaningful if there were agreement on which packages go in, and which do not (your "etc."): raster? terra? RStoolbox, which builds on sp and rgdal? To answer this we need to identify

  • which commonality is required for these packages?
  • who is going to put in the resources to make sure these commonalities are maintained and enforced, with which mandate?

My suggestion is to start working on these questions before doing the package that suggests having the answer.

@Robinlovelace
Contributor

Robinlovelace commented Jan 15, 2020

Raster and vector are the two most common data types so I think packages that handle those, work well together and are future proof would be key. It's still not crystal clear to me which raster processing package is most future-proof and compatible with sf, which I would see as the cornerstone of such a metapackage. Couple of questions about specific packages:

  • sf::gdal_utils() has some raster processing capabilities and I think the GDAL components of sf overall supersede rgdal. What can rgdal do that sf cannot?
  • Which packages would people recommend for raster data? My understanding is that raster is still the most feature complete and stable so my starting point would tentatively be just sf and raster, see how it goes and add additional dependencies later down the line.
  • What about stars?

@pat-s
Member

pat-s commented Jan 15, 2020

My suggestion is to start working on these questions before doing the package that suggests having the answer.

Definitely.

My thoughts:

  • Only put the common ones everybody needs for spatial tasks into DEPENDS (because they are auto-attached)

    • rgdal
    • sf
    • raster
    • terra
    • stars
  • the extended core could list packages which should be installed when calling install.packages("geoverse") but are not automatically attached

    • mapview
    • ggspatial
    • RSAGA
    • RQGIS
    • rgrass7
    • etc

I am somewhat concerned about packages with a large dependency chain such as {RStoolbox} or {sentinel2r}. When having such in IMPORTS, installation will also install all recursive dependencies. We can discuss which pkg still qualifies and which does not, but I just want to raise awareness here.
Such pkgs could go into "SUGGESTS" since they are not automatically installed when calling install.packages() but only for install.packages(dependencies = TRUE).
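The three tiers above map onto the Depends, Imports and Suggests fields of a DESCRIPTION file. As a sketch only, the base-R snippet below assembles those fields and prints them in DCF form; the package name `geoverse` is hypothetical, the membership is copied from the lists above without resolving the "etc", and only {RStoolbox} is shown under Suggests.

```r
# Hypothetical tiers for a "geoverse" DESCRIPTION, taken from the lists above.
desc <- list(
  Package  = "geoverse",
  # Attached automatically by library(geoverse)
  Depends  = c("rgdal", "sf", "raster", "terra", "stars"),
  # Installed with the package, but not attached
  Imports  = c("mapview", "ggspatial", "RSAGA", "RQGIS", "rgrass7"),
  # Only installed with install.packages(dependencies = TRUE)
  Suggests = c("RStoolbox")
)

# Collapse each field to the comma-separated form DESCRIPTION expects.
fields <- vapply(desc, paste, character(1), collapse = ", ")

# Print the fields in DCF ("Field: value") format to the console.
write.dcf(as.data.frame(as.list(fields), stringsAsFactors = FALSE))
```

A real DESCRIPTION would also need Title, Version, Description, License and author fields, which are left out here because nothing in this thread specifies them.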


On a side note, I'd like to mention that CRAN accepted the {mlr3verse} package with the label "exception". Therefore idk if a {geoverse} or {spatialverse} will be accepted.
This might trigger more fields to create a wrapper package just for loading.
However, if there is one "exception", everybody should be allowed to do so. And there is also the {tidyverse} pkg.

@Robinlovelace
Contributor

I'd say reducing duplication of functionality between geoverse packages should be a priority.

@pat-s
Member

pat-s commented Jan 15, 2020

I'd say reducing duplication of functionality between geoverse packages should be a priority.

This is always welcome but a completely different point here.

I'd say if there is an overall interest to do this then

@edzer @tim-salabim @rsbivand

could maybe take the lead in narrowing down a list for the three sections of the DESCR file outlined above.
To not lose the overview, a gdoc or similar might be a better place for this than a GH issue.

@tim-salabim
Member Author

One point to consider here is that, at least in my understanding, tidyverse is to some degree aimed at novices coming to R. Though not officially stated, many design choices would suggest so. With this in mind I think the scope of such a meta package is much clearer than what we are envisioning here so far. I don't think we can serve everyone's interest with one-package-to-rule-them-all.

@pat-s
Member

pat-s commented Jan 15, 2020

With this in mind I think the scope of such a meta package is much clearer than what we are envisioning here so far. I don't think we can serve everyone's interest with one-package-to-rule-them-all.

That is why I am unsure the package will be accepted by CRAN. Nevertheless it could at least live on GH.
For {mlr3} we have a package suite that works together and even needs to be loaded in a combined way to actually use the functionality of single pkgs.
Pkgs in the {geoverse} are actually standalone and do not really follow an overarching design philosophy.

@edzer
Member

edzer commented Jan 15, 2020

sf and stars are not standalone and do follow an overarching design, and so do sp, rgdal and (to some extent) raster.

@pat-s
Member

pat-s commented Jan 15, 2020

@jl5000 This is completely off-topic and has been discussed at several places already. If you can't find any, feel free to open an issue in the {mlr3} repo. (Please mark these comments as off-topic)

@gisma
Member

gisma commented Jan 15, 2020

An interesting discussion, and one heard again and again outside the R world too. It is surely beyond my R capabilities, but nevertheless some thoughts...

Of course there are some strong arguments for a streamlined architecture (which maybe finally leads to the click button "solve me"...)

In my experience and opinion, a large part of the mentioned problems and confusion results from the professional and technical ignorance of the users (not only the R users!) with regard to spatio-temporal concepts, as well as poor knowledge of possible, reasonable and resilient approaches to solve them. A "verse" approach like tidyverse is good and simple for established workflows but will not solve these shortcomings.

In fact, I see no reason to create a geo/spatio/whatever verse because even a lot of the underlying libraries, GI software packages and spatio-temporal concepts are not homogenized or comparable. The multitude of competing and established algorithms are not even mentioned here. Partly the above argumentation is struggling around that point why and what to integrate.

From my point of view they have not been and will not be homogenized (and thus integrated into a "verse") because beyond vanities there are a lot of good reasons to have, know and use different and competing concepts.

And for this it needs experience and knowledge.

If we want to support users, then we should do so through comparative tutorials, workshops, and learning and teaching offers that take care of the technical implementation of conceptual and scientific solutions. I believe approaches like Geocomputation with R are more sustainable than a streamlined meta package.

@mdsumner
Member

My take is smaller packages means more flexibility and power. The monoliths do so much but at some point don't work and you are endlessly plumbing around their assumptions (though yes they work for most). Focus on tiny packages that do one thing, so others can choose for themselves.

@pat-s
Member

pat-s commented Aug 26, 2020

I think in summary there are too many subgroups, and it is too complicated to decide which packages would belong to the r-spatial "core" and who makes the decisions/maintains such a package in the long run.

Probably too much overhead for the gain in the end?
I'll close here for now, feel free to re-open if more discussion is wanted.

@pat-s pat-s closed this as completed Aug 26, 2020
edzer pushed a commit that referenced this issue Apr 9, 2021
Add Lorena Abad to the list