geoplyr - dplyr style manipulation for geospatial data #22

Open
eamcvey opened this Issue Mar 7, 2016 · 15 comments

Contributor

eamcvey commented Mar 7, 2016

I'm somewhat new to geospatial analysis, and while the tools in R can do all kinds of things, I feel like they operate the "old" R way, not the "new" way I'm now accustomed to from using dplyr, tidyr, and friends. I think there's room for a package that could make working with geospatial data easier and more elegant -- in particular, handling the sp objects intuitively.

Member

karthik commented Mar 7, 2016

💯 I would love a dplyr for spatial data. There are some tools in the rOpenSci suite from @sckott that we should discuss.

hrbrmstr commented Mar 7, 2016

YES! @eamcvey want to help also with a GSoC 2016 R proposal? https://github.com/rstats-gsoc/gsoc2016/wiki/spatula:-a-sane,-user-centric-(in-the-mental-model-sense)-spatial-operations-package-for-R (geoplyr sounds way cooler than spatula)

hrbrmstr commented Mar 9, 2016

FYI: Just learned about this from Roger: https://github.com/edzer/sfr. Super cool (and well-written) idea!

edzer commented Mar 9, 2016

Rumour has it that the ISC proposal for sfr might get funded, though of course the ISC still has to announce this publicly first. @karthik and @eamcvey: as I mostly use R the "old" way, could you provide some use cases or mock-ups of how you would like (new) sp classes to behave more intuitively?

Contributor

eamcvey commented Mar 31, 2016

@hrbrmstr The GSoC proposal is a great description of what I was thinking - did this get submitted? Some other examples of doing things the "new" way would be:

  • the equivalent of bind_rows() for spatial objects (without having to manually make the IDs unique)
  • the ability to chain operations nicely
  • operations that work appropriately on the data and spatial components of a spatial object (where possible) - for example, filtering
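
A minimal base-R sketch of the first and third wishes, assuming geometries could sit in an ordinary list column of a data frame. The `square()` helper, the column names and the values are all hypothetical and only illustrate the idea; this is not how sp objects currently behave.

```r
# Hypothetical helper: a closed square ring as a two-column coordinate matrix.
square <- function(x0, y0, size = 1) {
  cbind(x = c(x0, x0 + size, x0 + size, x0, x0),
        y = c(y0, y0, y0 + size, y0 + size, y0))
}

# Attributes and geometry live in one data frame; geometry is a list column.
blocks <- data.frame(name = c("a", "b"), population = c(120, 80),
                     stringsAsFactors = FALSE)
blocks$geometry <- list(square(0, 0), square(1, 0))

more_blocks <- data.frame(name = "c", population = 200,
                          stringsAsFactors = FALSE)
more_blocks$geometry <- list(square(0, 1))

# Row-binding needs no ID bookkeeping: each row carries its own geometry.
all_blocks <- rbind(blocks, more_blocks)

# Filtering on an attribute keeps the matching geometries alongside.
big_blocks <- all_blocks[all_blocks$population > 100, ]
```

With this layout the data and spatial components cannot drift apart, because a row subset or a row bind always moves both together.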

Contributor

eamcvey commented Mar 31, 2016

@edzer It looks like the sfr proposal has gotten funded. I'm working on mocking up how I wish geospatial data manipulation worked in R.

Member

sckott commented Mar 31, 2016

@eamcvey I think we're still waiting to hear back on whether the project of @hrbrmstr's student gets picked as well

Contributor

eamcvey commented Mar 31, 2016

Start of mockup of how spatial analysis could be if it were possible to work with geometry columns in dataframes: https://github.com/ropenscilabs/geoplyr/blob/master/ideal_mockups.Rmd

Contributor

eamcvey commented Mar 31, 2016

@edzer, I don't understand the simple features stuff entirely, but in my optimistic imagination, it could provide the types of simple geometric objects (polygons, points, etc.) that would go into the geometry columns of dataframes as I imagine in this mockup above. If so, I think there could be huge benefit to fitting spatial objects into columns of dataframes and then having access to the existing spectacular "new R" tools in dplyr, tidyr, and purrr to manipulate them (with special functions for operating on the geometry columns). Hadley assures me that there's no fundamental reason this isn't possible : )
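
As a rough illustration of "special functions for operating on the geometry columns", here is a base-R sketch in which a geometry-aware function is mapped over a list column and returns an ordinary column that the usual tools can then filter on. The `ring_area()` helper and the `parcels` data are hypothetical, and plain coordinate matrices stand in for whatever simple-feature types sfr ends up providing.

```r
# Shoelace formula for the area of one closed ring (rows are x, y vertices).
ring_area <- function(ring) {
  n <- nrow(ring)
  x <- ring[, 1]
  y <- ring[, 2]
  abs(sum(x[-n] * y[-1] - x[-1] * y[-n])) / 2
}

parcels <- data.frame(id = 1:2)
parcels$geometry <- list(
  cbind(x = c(0, 2, 2, 0, 0), y = c(0, 0, 1, 1, 0)),  # 2 x 1 rectangle
  cbind(x = c(0, 1, 0, 0),    y = c(0, 0, 1, 0))      # right triangle
)

# Map the geometry function over the column; the result is a plain numeric column.
parcels$area <- vapply(parcels$geometry, ring_area, numeric(1))

# From here on, ordinary data-frame verbs apply.
large <- subset(parcels, area > 1)
```

The same pattern would carry over to dplyr verbs such as `mutate()` and `filter()`, assuming they accept the list column.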

mdsumner commented Apr 1, 2016

My efforts in this area are in two packages, gris and spbabel:

https://github.com/mdsumner/gris

https://github.com/mdsumner/spbabel

Gris is well-developed but I'm not happy with the overall design and user-view yet. It provides a db-like "normalized" structure for spatial objects in multiple linked tables. The point is that you can more easily work on the components (vertices, pieces*, objects) individually, generate other forms like edge-based or primitives-based meshes, and ultimately back-end it with a generic database.

Spbabel is simpler, and starts "in the middle" with something like the ggplot2::fortify (or raster::geom) table of vertices without enforcing uniqueness.

I'm trying to build it into a bigger story but these two blog posts are about as far as it goes:

http://mdsumner.github.io/2015/12/28/gis3d.html

http://mdsumner.github.io/2016/03/03/polygons-R.html

Very keen to explore this idea more. sp is fundamentally limiting in several ways (just like GIS is), but I'm not saying we should disown it; I feel we just need to be able to transform between different forms much more easily.

I'm still catching up with this discussion, just wanted to drop this in :)

I also have done some work on using dplyr with ODBC, which allows me to read in from Manifold GIS directly, amongst other things. I see this all fitting together really nicely with dplyr as the new centre.

https://github.com/mdsumner/dplyrodbc

https://github.com/mdsumner/manifoldr

Cheers, Mike.

mdsumner commented Apr 1, 2016

@eamcvey do you have the data from your Ideal doc in concrete form? I'd like to work through your document and use it to explain how I see things. If you have those actual data and can share that would be awesome. This is helping me focus somewhat. :)

Contributor

eamcvey commented Apr 1, 2016

@mdsumner You're calling my bluff -- I don't actually have that data ; ) But if it's helpful, I can get it, or something quite similar, fairly easily. If the document is helping provide focus, then it's doing its job!

edzer commented Apr 1, 2016

Thanks for the mockup, @eamcvey ! I agree with @mdsumner that we'd need some sample data with it in order to get more concrete.

For your information, sp::aggregate does aggregate polygon information, for the case of nested polygons (say, from districts to provinces) as well as non-nested polygons (assuming constant value throughout the polygons). Your last example would now look like

new_district_df <- aggregate(census_bg_df, list(new_district_df$assigned_district), sum)

which follows the stats::aggregate semantics. Pretty compact, and it dissolves polygons.

Anyway, it would be great if you could for instance provide a census_bg_df shapefile to start with.
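
For readers unfamiliar with the `stats::aggregate` semantics mentioned above, here is a small sketch on a plain data frame. The names `census_bg_df` and `assigned_district` come from the mockup; the `block_group` and `population` columns and all values are made up for illustration. Only the attribute side is shown here, since a plain data frame has no polygons to dissolve.

```r
# Made-up stand-in for the attribute table of the census block groups.
census_bg_df <- data.frame(
  block_group       = c("bg1", "bg2", "bg3", "bg4"),
  assigned_district = c("north", "north", "south", "south"),
  population        = c(100, 150, 200, 50),
  stringsAsFactors  = FALSE
)

# Sum the numeric attribute within each district (x, by, FUN).
new_district_df <- aggregate(
  census_bg_df["population"],
  by  = list(assigned_district = census_bg_df$assigned_district),
  FUN = sum
)
```

The result has one row per district with the summed population, which is the attribute half of what the spatial call would return.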

mdsumner referenced this issue in mdsumner/spbabel Apr 2, 2016

Closed

Address Ideal doc #3

mdsumner commented Apr 2, 2016

@eamcvey you must be motivated by real-world data here so of course it's helpful to have actual examples - I don't consider it bluffing here :)

Thinking about the "geometry column" thing - I think that's pretty easy to do, but what I don't like about it is that it doesn't naturally provide a topological data structure - there's no way to share vertices between objects, they all just get copied out in a recursive structure, just in text or in a binary blob - you might as well serialize a Polygons object for example, and store that in a column. It's not hard, it just doesn't really help from my perspective. Topology is what is missing from sp and from most GIS implementations. Also "Polygons" are really just lines with a fill-rule, so you can't pop them out into X-Y-Z - we really need proper surfaces that can be decomposed to triangles, and "polygons" defined by cycles in the mesh are a special case.

There's no way of avoiding the need for at least two tables: one for the vertices and the identifiers for object, part, holiness, and path-ordering, and one for the objects. I just like to take it further, so you can really "normalize" and have vertices (x, y, z, time, etc. with no limit) plus an ID. For that you need at least a vertex table, a branches (or "parts" or "pieces") table, and the objects. To normalize the vertices (store only unique rows) you need a vertex-link-branches table.

Gris does this, but it's not dplyr-able yet - I'm working on that. Gris should have the choice of "topology" model - it has branches (the poly-ring, line-string, point, multi-point stuff) and primitives (triangles and/or edges for lines), and it should also have edges (line segments for polys or lines) and the ability to switch between them. The constrained Delaunay triangulation in RTriangle is so fast that I think it's worth doing all of this upfront. Then the user can go further to decompose to smaller triangles, shorter line segments, triangles with nicer angles etc. - but the branches, edges and primitives should always be available. It might be a special case to not triangulate, but it's easy enough anyway.

Spbabel is dplyr-pipeable, and has examples to work with the basic verbs on objects, and on the vertices using the sptable(x)<- trick (suggested by @hadley). I think the "vertex-table" in the middle view is a better place to start than gris - it's essentially the ggplot2 fortify table, plus the linked objects. I can go from that to the more-tables, more-normalized gris view, though for dplyr-abling that's probably not necessary.
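
The table layout described above can be sketched in base R roughly as follows. This is an illustration only: the table and column names (`long`, `objects`, `branches`, `vertices`, `links`) are assumptions and not the actual gris or spbabel structures, and the two squares are made-up data.

```r
# Two unit squares sharing an edge, as a fortify-like "long" vertex table:
# one row per coordinate instance, tagged with object, branch and path order.
# Rings are left implicitly closed (the first vertex is not repeated).
long <- data.frame(
  object = rep(c("a", "b"), each = 4),
  branch = rep(c(1L, 2L), each = 4),
  order  = rep(1:4, times = 2),
  x = c(0, 1, 1, 0,   1, 2, 2, 1),
  y = c(0, 0, 1, 1,   0, 0, 1, 1)
)

# Objects table: one row per object, holding the attribute data.
objects <- data.frame(object = c("a", "b"), population = c(120, 80))

# Branches table: one row per ring/line/part, linked to its object.
branches <- unique(long[, c("branch", "object")])

# Vertex table: unique coordinates only, each with its own ID.
vertices <- unique(long[, c("x", "y")])
vertices$vertex <- seq_len(nrow(vertices))

# Link table: which vertex each branch visits, and in what order.
key <- function(d) paste(d$x, d$y)
links <- data.frame(
  branch = long$branch,
  order  = long$order,
  vertex = vertices$vertex[match(key(long), key(vertices))]
)

# Round trip: joining links back to vertices recovers the long table's paths.
rebuilt <- merge(links, vertices, by = "vertex")
rebuilt <- rebuilt[order(rebuilt$branch, rebuilt$order), ]
```

The eight coordinate instances collapse to six unique vertices because the two squares share an edge, and that sharing is exactly the topological information a per-row geometry blob cannot express.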

mdsumner commented Apr 2, 2016

@edzer thanks for the aggregate example, I actually forgot sp has some of this manipulation built-in. I can see we could usefully have options for group_by() %>% summarize() that union objects together using this. I wonder if we need extra arguments to differentiate the summarize function/s from the topological tasks, or if it's best done with new verbs? I need to try this out - I'll be able to in the next few weeks, and as ever I'm very keen to hear from anyone interested in doing this.
