
spatula: a sane, user-centric (in the mental model sense) spatial operations package for R

Ben Marwick edited this page Mar 11, 2016 · 5 revisions

Background

R is great as a general-purpose GIS and excels at GIS-centric statistical operations.

Getting from "0" to "GIS User" in R is not so great an experience. Yes, it's possible to make a basic visualization with a simple call to maps::map() but going beyond that is anything but easy for the inexperienced user and especially for those not steeped in the wisdom and lore of GIS systems.

Given the reliance on and interdependence of core Spatial objects in R, the best way to describe this package is as the equivalent of what xml2 & rvest are to the XML package/ecosystem, or what curl & httr are to the RCurl ecosystem. That is, xml2, rvest, curl and httr approach the problem of acquiring and working with XML/HTML content from the mental model of the average R user (a large share of the R user base). The operations are focused, limited, thoughtful and work seamlessly in a way that users can immediately see and understand.

Similarly, it would also have the attributes of dplyr, tidyr and purrr where there are concise, clear and logically expressive ways to work with the objects.

spatula would initially take core operations found in sp, maptools, raster, rgeos, rgdal, mapproj, proj4 & deldir (not necessarily their code, and without necessarily depending on these packages) and build an intuitive, consistent interface for the import, manipulation, transformation and export of spatial data. Because so much existing community R code relies on existing classes & objects, whatever object/class system spatula uses would have to provide as_Spatial… and to_spatula-like functions to support the use and interchange of these new objects until authors of existing geospatial packages augment them to work with spatula directly (or more modern, logical, usable alternatives are developed to supersede them).
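The interchange idea above can be sketched with plain S3 dispatch. This is a minimal, hypothetical illustration in base R: `new_spatula`, `to_spatula` and `as_legacy` are names invented here, the "spatula" object is just a list, and a data frame stands in for the sp Spatial* object a real package would return.

```r
# Hypothetical sketch: none of these functions exist in a released package.
# A "spatula" object here is just a list with coordinates and a CRS string.
new_spatula <- function(coords, crs = NA_character_) {
  stopifnot(is.matrix(coords), is.numeric(coords), ncol(coords) == 2)
  structure(list(coords = coords, crs = crs), class = "spatula")
}

# Generic entry point: convert an existing object to the new class.
to_spatula <- function(x, ...) UseMethod("to_spatula")

# Method for a two-column numeric matrix of x/y coordinates.
to_spatula.matrix <- function(x, crs = NA_character_, ...) {
  new_spatula(unname(x), crs = crs)
}

# Method for a data frame with named coordinate columns.
to_spatula.data.frame <- function(x, x_col = "lon", y_col = "lat",
                                  crs = NA_character_, ...) {
  new_spatula(cbind(x[[x_col]], x[[y_col]]), crs = crs)
}

# Reverse direction: hand the data back in a form older code understands.
# A real package would return an sp Spatial* object here; a plain data
# frame stands in for it so the sketch has no dependencies.
as_legacy <- function(x, ...) UseMethod("as_legacy")
as_legacy.spatula <- function(x, ...) {
  data.frame(lon = x$coords[, 1], lat = x$coords[, 2])
}

pts <- data.frame(lon = c(-70.25, -71.06), lat = c(43.66, 42.36))
sp_obj <- to_spatula(pts, crs = "+proj=longlat +datum=WGS84")
round_trip <- as_legacy(sp_obj)
```

Supporting another legacy class would then mean adding one more `to_spatula` method, which is how existing code could keep working while packages migrate.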

An extended goal would be to provide direct support for developing visualization with leaflet and ggplot2 (which is probably out of scope for a 3-month project).

Ideally, spatula would be Rcpp-backed with heavy reliance on the libraries behind rgeos, rgdal & proj4, namely GEOS, GDAL & PROJ.4 (all currently supported under OSGeo). C/C++ code in existing packages could be leveraged to speed up development, but the idea is not to copy/paste years of layered-on code & functionality; it is to use that code as a quickstart base for a whole new intuitive spatial paradigm for R users.

Related work

Details of your coding project

The goals for this 3-month project are to develop and implement:

  • Core input operations (i.e. enable intuitive query/import of shapefiles, e.g. read_sp, sp_info, and avoid the need to specify layer)
  • Good summary & print functions for spatula objects
  • Core creation operations (i.e. be able to create spatula objects programmatically; pipe-able syntax a plus)
  • Core transformation operations (i.e. projections, joining & separation of spatial objects & components, e.g. add_projection, reproject)
  • Core output operations (i.e. writing to shapefiles and at least a start at converting to existing Spatial objects, e.g. write_sp, to_Spatial, to_sp)
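
To make the goals above concrete, here is a rough sketch of how creation, transformation, summary and printing might feel in a pipe. Everything in it is an assumption for illustration: `spatula_points` is an invented constructor, `add_projection` and `sp_info` borrow names from the list but have placeholder bodies (here `sp_info` inspects an in-memory object rather than a shapefile), and file I/O (`read_sp`, `write_sp`) and `reproject` are left out because they would need GDAL and PROJ bindings. It uses the base pipe, so it assumes R 4.1 or later.

```r
# Hypothetical API sketch in base R; the bodies are placeholders, not the
# proposed implementation.

# Creation: build an object from x/y coordinate vectors.
spatula_points <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  structure(list(coords = cbind(x = x, y = y), crs = NA_character_),
            class = "spatula")
}

# Transformation: attach a projection string without touching coordinates.
add_projection <- function(obj, crs) {
  stopifnot(inherits(obj, "spatula"), is.character(crs), length(crs) == 1)
  obj$crs <- crs
  obj
}

# Query: a compact summary (count, bounding box, CRS) as a list.
sp_info <- function(obj) {
  stopifnot(inherits(obj, "spatula"))
  list(n = nrow(obj$coords),
       bbox = apply(obj$coords, 2, range),  # row 1 = min, row 2 = max
       crs = obj$crs)
}

# Print method: a short, readable header instead of a raw list dump.
print.spatula <- function(x, ...) {
  cat("<spatula> ", nrow(x$coords), " point(s)\n", sep = "")
  cat("  crs: ", if (is.na(x$crs)) "(none)" else x$crs, "\n", sep = "")
  invisible(x)
}

# Every verb takes the object first and returns an object, so calls chain.
pts <- spatula_points(c(-70.25, -71.06), c(43.66, 42.36)) |>
  add_projection("+proj=longlat +datum=WGS84")
info <- sp_info(pts)
print(pts)
```

Keeping the object as the first argument and the return value of each verb is what makes the interface pipe-able, in the same way dplyr verbs are.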

This work would be focused more on foundational design and coding with no expectation of robust documentation or vignettes (basic core pkg docs would be nice).

Expected impact

Both experienced and long-time R users are regularly frustrated with the complexity of working with Spatial objects. Look at any code sample in repos or just on the internet (or vignettes, even) and you'll see that solutions are fragmented, some don't work anymore and some are just overly complex.

This project will make spatial operations highly accessible to the R community and provide a way to rapidly integrate basic spatial operations into projects.

It should also be much faster than many existing Spatial operations, since it would work like xml2: the binary object lives in memory, an R object holds a pointer to it, and the majority of operations run in C/C++ (using Rcpp).

Bonus points if we can find a way to speed up spatial object rendering in ggplot2.

Mentors

  • Bob Rudis (@hrbrmstr)
  • Bhaskar Karambelkar (@bhaskar_vk) [Totally not a domain expert but glad to help].
  • Scott Chamberlain (@sckott)

Tests

  • Can you scaffold an R package?
  • Can you work with third party libraries? (if so, can you do it cross-platform?)
  • Can you write C/C++ code?
  • Can you work with Rcpp in a package context?
  • Are you a current user of geospatial software?

Solutions of tests

Students, please post a link to your test results here.