[Request] dplyr::bind_rows() for sf #49

Closed
tiernanmartin opened this issue Nov 4, 2016 · 16 comments

@tiernanmartin tiernanmartin commented Nov 4, 2016

Building on the list of dplyr verbs requested in edzer/sfr#42, could you please consider adding dplyr::bind_rows()?

This function makes it easy to combine data-frame-like objects, and the same convenience would be useful for sf objects.

Unlike rbind() or spRbind(), bind_rows() can combine objects with non-matching columns, filling any unshared columns with NA. This saves me a lot of time, even if it occasionally produces an ugly data frame.

Some details would probably need to be worked out for the geometry list-columns, especially when the sf objects have different geometry classes. Perhaps the geometry column could be coerced to the geometry type GEOMETRY?
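For reference, this is the plain-data-frame behaviour being requested. It is a minimal sketch that assumes only dplyr; df1 and df2 are illustrative data frames, not taken from the thread:

```r
library(dplyr)

# Two illustrative data frames whose columns only partly overlap
df1 <- data.frame(a = 1, b = "x", stringsAsFactors = FALSE)
df2 <- data.frame(a = 2, c = TRUE)

# bind_rows() takes the union of the columns and fills the gaps with NA:
# row 1 gets c = NA, row 2 gets b = NA
out <- bind_rows(df1, df2)
print(out)
```

The open question in this issue is how the same union-and-fill step should treat an sf geometry list-column.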

Member

@edzer edzer commented Nov 4, 2016

I don't see how this can be done, since neither bind_rows nor bind_rows_ (which does the work) is a generic:

> methods(bind_rows_)
Error in methods(bind_rows_) : object 'bind_rows_' not found

@hadley maybe I'm overlooking something?

Contributor

@kendonB kendonB commented Nov 4, 2016

bind_cols likely has a use case too: for example, if I run some distributed operation column by column, I want to bring the pieces back together afterwards.

Contributor

@hadley hadley commented Nov 4, 2016

The problem is that I don't know how to make an efficient generic. I could possibly provide a generic for restoring attributes afterwards. I'm not sure what the best approach is.

Author

@tiernanmartin tiernanmartin commented Nov 5, 2016

I put together a test exploring the way sf objects interact with both rbind() and bind_rows(). The test shows two fairly common situations when working with vector data:

  1. Needing to combine datasets with different geometry types
  2. Needing to combine datasets with non-matching columns

It sounds like the larger question of efficient generics needs to be resolved before the bind_* verbs can be adapted for sf. In the meantime, perhaps a function could be added that makes it easier to combine two sfc objects with different geometry types into a single sfc of type GEOMETRY.
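In current sf releases, c() has a method for sfc objects that appears to cover this case. The sketch below assumes that method is available (it may postdate this comment); pts and lines are illustrative names:

```r
library(sf)

# One point and one linestring, each in its own sfc
pts   <- st_sfc(st_point(c(0, 1)))
lines <- st_sfc(st_linestring(matrix(1:4, 2)))

# c() on sfc objects concatenates the geometries; with mixed types
# the result is expected to be an sfc_GEOMETRY
mixed <- c(pts, lines)
print(class(mixed))
print(st_geometry_type(mixed))
```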

edzer added a commit that referenced this issue Nov 6, 2016
Signed-off-by: Edzer Pebesma <edzer.pebesma@uni-muenster.de>
Member

@edzer edzer commented Nov 6, 2016

Thanks for your test, @tiernanmartin! cbind and rbind now work as they do in base R, except that cbind on two sf objects warns that multiple geometries are not allowed and drops all but the first geometry list-column.

Interestingly, bind_cols(sf_pol2,sf_mpol2) works, although it retains the secondary geometry, which st_sf would not do. That geometry is dropped with a warning when we call st_sf(bind_cols(sf_pol2,sf_mpol2)).

Also, unlike base::cbind and dplyr::bind_cols, sf's cbind renames duplicate variables. Maybe I missed it, but I don't see how duplicated variable names fit into the tidyverse.

> bind_cols(data.frame(a=1:2), data.frame(a=4:5))    
  a a
1 1 4
2 2 5

As for bind_rows: I don't see anything I can do in sf for this. @hadley: is it on purpose that bind_cols retains all attributes (of the object as well as of its columns) but bind_rows does not?

Contributor

@hadley hadley commented Nov 6, 2016

The duplicate name issue is definitely a bug. I'm not sure the semantics on bind_rows are well defined, but it should probably preserve the attributes, at least of the first df. Maybe we can preserve the performance of bind_rows but make it generic by creating a method for preserving attributes.
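The idea of keeping a fast, non-generic bind and adding a generic step that restores attributes afterwards could look roughly like the sketch below. Everything here is hypothetical: fast_bind(), restore(), bind_rows_with_restore() and the tagged_df class are illustrative names, not dplyr or sf APIs, and base rbind() stands in for dplyr's fast internal path.

```r
# Hypothetical stand-in for a fast, non-generic binder:
# inputs are reduced to plain data frames, so subclasses are lost
fast_bind <- function(...) {
  plain <- lapply(list(...), function(d) {
    class(d) <- "data.frame"
    d
  })
  do.call(rbind, plain)
}

# Hypothetical generic hook, dispatched on the class of the first input
restore <- function(template, data) UseMethod("restore")

# Default method: nothing to restore
restore.default <- function(template, data) data

# A subclass (here the illustrative "tagged_df") re-attaches what it needs
restore.tagged_df <- function(template, data) {
  attr(data, "tag") <- attr(template, "tag")
  class(data) <- class(template)
  data
}

# Bind quickly, then call the generic once on the combined result
bind_rows_with_restore <- function(...) {
  restore(..1, fast_bind(...))
}

x <- structure(data.frame(a = 1:2), tag = "demo",
               class = c("tagged_df", "data.frame"))
y <- bind_rows_with_restore(x, x)
```

Dispatch happens once per call rather than once per row or per pair, so the expensive part stays non-generic. Under this scheme sf could, in principle, supply a restore method that also recomputes the sf_column, agr and geometry type.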

Member

@edzer edzer commented Nov 6, 2016

As we see in @tiernanmartin's test above and in rbind.sf, the geometry needs to be post-processed anyway when two different geometry types are rbind-ed, so bind_rows also needs a mechanism through which sf can provide a method for this and take care of geometry-type mixing.
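That post-processing can be seen from the sf side by rbind-ing two sf objects with different geometry types. This sketch assumes a current sf release, where the resulting geometry column is expected to be a mixed-type sfc_GEOMETRY; a and b are illustrative objects:

```r
library(sf)

# Same attribute columns, different geometry types
a <- st_sf(a = 1, geom = st_sfc(st_point(0:1)))
b <- st_sf(a = 2, geom = st_sfc(st_linestring(matrix(1:4, 2))))

# sf's rbind method binds the rows; the geometry column of the
# result is expected to be an sfc_GEOMETRY holding both types
ab <- rbind(a, b)
print(class(st_geometry(ab)))
```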

Contributor

@hadley hadley commented Nov 6, 2016

Maybe bind_rows() should fall back to a bind_rows() that works with a pair of data frames. You would lose a lot of the efficiency, but at least that's better than not working.
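A pairwise fallback of this kind can be sketched with base R's Reduce(), which folds a two-argument binder over a list. Here sf's rbind method plays the pairwise binder, so the sketch assumes all inputs share the same columns; pieces is an illustrative name:

```r
library(sf)

# Three single-feature sf objects with identical columns but mixed geometry types
pieces <- list(
  st_sf(a = 1, geom = st_sfc(st_point(0:1))),
  st_sf(a = 1, geom = st_sfc(st_linestring(matrix(1:4, 2)))),
  st_sf(a = 4, geom = st_sfc(st_multilinestring(list(matrix(1:4, 2)))))
)

# Reduce() calls the pairwise binder repeatedly: rbind(rbind(p1, p2), p3).
# Each step allocates a new intermediate object, which is where the
# efficiency is lost compared with binding everything in one pass.
out <- Reduce(function(x, y) rbind(x, y), pieces)
print(out)
```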

Member

@edzer edzer commented Apr 23, 2017

With dplyr 0.5.0.9004, this still doesn't work:

library(sf)
a  = st_sf(a=1, geom=st_sfc(st_point(0:1)))
library(dplyr)
b = bind_rows(a, a)
# Warning messages:
# 1: In bind_rows_(x, .id) :
#   Vectorizing 'sfc_POINT' elements may not preserve their attributes
# 2: In bind_rows_(x, .id) :
#   Vectorizing 'sfc_POINT' elements may not preserve their attributes
b
# Error in .subset2(x, i, exact = exact) : 
#   attempt to select less than one element in get1index
attributes(b)
# $names
# [1] "a"    "geom"
# 
# $row.names
# [1] 1 2
# 
# $class
# [1] "sf"         "data.frame"
### -> sf_column and agr are missing

I don't see anything we can do about this on the sf side, and propose to close this issue here.

Contributor

@hadley hadley commented Apr 23, 2017

Yeah, it needs some pretty deep changes on our side.

Contributor

@jsta jsta commented Oct 30, 2018

My workaround for this issue is to temporarily remove geometries, bind, and rejoin.

library(sf)
library(dplyr)

a                     <- st_sf(a=1, geom=st_sfc(st_point(0:1)))
a_nogeom              <- a
st_geometry(a_nogeom) <- NULL

b <- bind_rows(a_nogeom, a_nogeom)
b <- dplyr::left_join(b, a, by = "a") 
b
#  a        geom
# 1 1 POINT (0 1)
# 2 1 POINT (0 1)

Contributor

@Robinlovelace Robinlovelace commented Oct 30, 2018

What's wrong with rbind(), out of interest? See here for context: https://geocompr.github.io/geocompkg/articles/tidyverse-pitfalls.html

Contributor

@jsta jsta commented Oct 30, 2018

Binding more than 2 objects. Looks like we both arrived at roughly the same strategy: https://geocompr.github.io/geocompkg/articles/tidyverse-pitfalls.html#pitfall-binding-rows


@adrfantini adrfantini commented Oct 30, 2018

What's wrong with do.call('rbind')?
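do.call() passes the whole list to rbind in a single call, so any number of sf objects can be bound at once as long as their columns match. A minimal sketch with illustrative objects x1 to x4; the error in the mismatched case is assumed to come from the underlying data-frame rbind:

```r
library(sf)

x1 <- st_sf(a = 1, geom = st_sfc(st_point(0:1)))
x2 <- st_sf(a = 2, geom = st_sfc(st_point(1:2)))
x3 <- st_sf(a = 3, geom = st_sfc(st_point(2:3)))

# One call binds all three: equivalent to rbind(x1, x2, x3)
all3 <- do.call(rbind, list(x1, x2, x3))

# With an extra column in one input, rbind is expected to stop with an
# error instead of filling NA the way dplyr::bind_rows() does
x4 <- st_sf(a = 4, b = 5, geom = st_sfc(st_point(3:4)))
res <- tryCatch(do.call(rbind, list(x1, x4)), error = function(e) e)
print(inherits(res, "error"))
```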


@ramarty ramarty commented May 21, 2019

do.call('rbind') works great when all the columns are the same. When they're not, I use a solution similar to @jsta's:

library(sf)
library(dplyr)

# Bind a list of sf objects whose attribute columns may differ.
# Columns missing from some inputs are filled with NA by dplyr::bind_rows().
bind_rows_sf <- function(sf_list) {
  # Collect every feature's geometry (not just the first one) into one sfc;
  # c() on sfc objects gives a GEOMETRY sfc when the types are mixed
  geoms <- do.call(c, lapply(sf_list, st_geometry))
  # Drop the geometry column and bind the attribute tables
  df <- bind_rows(lapply(sf_list, function(x) st_set_geometry(x, NULL)))

  st_sf(df, geometry = geoms)
}

sf_1 <- st_sf(a = 1, geom = st_sfc(st_point(0:1)))
sf_2 <- st_sf(a = 2, b = 4, geom = st_sfc(st_point(1:2)))
sf_3 <- st_sf(a = 3, b = 5, c = 6, geom = st_sfc(st_point()))
sf_list <- list(sf_1, sf_2, sf_3)

sf_123 <- sf_list %>% bind_rows_sf()
sf_123


@pasipasi123 pasipasi123 commented Oct 28, 2020

I've not looked into why, but right now bind_rows() works.

library(sf)
#> Warning: package 'sf' was built under R version 4.0.3
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(dplyr)

crs = st_crs(3857)
a = st_sf(a=1, geom = st_sfc(st_point(0:1)), crs = crs)
b = st_sf(a=1, geom = st_sfc(st_linestring(matrix(1:4,2))), crs = crs)
c = st_sf(a=4, geom = st_sfc(st_multilinestring(list(matrix(1:4,2)))), crs = crs)

list(a, b, c) %>% 
  bind_rows()
#> Simple feature collection with 3 features and 1 field
#> geometry type:  GEOMETRY
#> dimension:      XY
#> bbox:           xmin: 0 ymin: 1 xmax: 2 ymax: 4
#> projected CRS:  WGS 84 / Pseudo-Mercator
#>   a                         geom
#> 1 1                  POINT (0 1)
#> 2 1        LINESTRING (1 3, 2 4)
#> 3 4 MULTILINESTRING ((1 3, 2 4))

9 participants