[Request] dplyr::bind_rows() for sf #49
Comments
I don't see how this can be done, since neither
@hadley maybe I'm overlooking something?
The problem is that I don't know how to make an efficient generic. I could possibly provide a generic for restoring attributes afterwards. I'm not sure what the best approach is.
I put together a test exploring the way
It sounds like the larger question of efficient generics needs to be resolved before the |
Signed-off-by: Edzer Pebesma <edzer.pebesma@uni-muenster.de>
Thanks for your test, @tiernanmartin! Interestingly, Also, unlike
As of |
The duplicate name issue is definitely a bug. I'm not sure the semantics of bind_rows are well defined, but it should probably preserve the attributes, at least those of the first data frame. Maybe we can keep the performance of bind_rows but make it generic by adding a method for preserving attributes.
As @tiernanmartin's test above and rbind.sf show, the geometry needs post-processing anyway when two different geometry types are rbind-ed, so for bind_rows we also need a mechanism where
Maybe bind_rows() should fall back to a version that binds one pair of data frames at a time. You'd lose a lot of the efficiency, but at least that's better than not working.
With dplyr 0.5.0.9004, this still doesn't work:
library(sf)
a = st_sf(a=1, geom=st_sfc(st_point(0:1)))
library(dplyr)
b = bind_rows(a, a)
# Warning messages:
# 1: In bind_rows_(x, .id) :
# Vectorizing 'sfc_POINT' elements may not preserve their attributes
# 2: In bind_rows_(x, .id) :
# Vectorizing 'sfc_POINT' elements may not preserve their attributes
b
# Error in .subset2(x, i, exact = exact) :
# attempt to select less than one element in get1index
attributes(b)
# $names
# [1] "a" "geom"
#
# $row.names
# [1] 1 2
#
# $class
# [1] "sf" "data.frame"
### -> sf_column and agr are missing
I don't see anything we can do about this on the
Yeah, it needs some pretty deep changes on our side.
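My workaround for this issue is to temporarily remove the geometries, bind, and rejoin.
library(sf)
library(dplyr)
# a one-row sf object with attribute column "a" and a point geometry
a <- st_sf(a=1, geom=st_sfc(st_point(0:1)))
# copy it and drop the geometry, leaving a plain data.frame
a_nogeom <- a
st_geometry(a_nogeom) <- NULL
# bind the attribute-only tables
b <- bind_rows(a_nogeom, a_nogeom)
# rejoin the geometry on the shared key column "a"
# (this relies on each key value mapping to a single geometry)
b <- dplyr::left_join(b, a, by = "a")
b
#   a        geom
# 1 1 POINT (0 1)
# 2 1 POINT (0 1)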
what's wrong with |
Binding more than 2 objects. Looks like we both arrived at roughly the same strategy: https://geocompr.github.io/geocompkg/articles/tidyverse-pitfalls.html#pitfall-binding-rows
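When all the objects share the same columns, base rbind() dispatches to sf's rbind.sf method, and do.call() lets it consume a whole list of sf objects in one call. Below is a minimal sketch that assumes every object has identical column names and the same CRS. Unlike bind_rows(), it does not fill non-matching columns with NA.

```r
library(sf)

# a list of three sf objects with identical columns and the same (missing) CRS
pts <- lapply(1:3, function(i) st_sf(a = i, geom = st_sfc(st_point(c(i, i)))))

# do.call() hands every list element to rbind(), which dispatches to rbind.sf
all_pts <- do.call(rbind, pts)
print(all_pts)
```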
What's wrong with |
|
I've not looked into why, but right now
Building on the list of dplyr verbs requested in edzer/sfr#42, could you please consider adding dplyr::bind_rows()?
This function makes it easy to combine dataframe-like objects, which would also be useful for sf objects. Unlike rbind() or spRbind(), the bind_rows() function allows the merger of objects with non-matching columns, filling any unshared columns with NA. I find this convenience feature saves me a lot of time, even if it does lead to the creation of the occasional ugly dataframe.
There are probably some details that would need to be worked out with the geom list-columns, especially in cases where sf objects with differing geometry classes are present. Perhaps you could coerce the geom column to geometry type: GEOMETRY?
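The behaviour requested above (NA-filling for unshared columns, plus a GEOMETRY fallback for mixed geometry classes) can be approximated with the same remove-bind-reattach strategy used in the workaround in this thread. This is a rough sketch, not an sf or dplyr API. bind_rows_sf() is a hypothetical helper that assumes every input has the same CRS and a recent sf that provides st_drop_geometry(). Combining sfc columns with c() yields an sfc_GEOMETRY column when the geometry types differ.

```r
library(sf)
library(dplyr)

# Hypothetical helper (not part of sf or dplyr): bind any number of sf objects,
# filling unshared attribute columns with NA the way bind_rows() does.
# Assumes all inputs share the same CRS.
bind_rows_sf <- function(...) {
  objs <- list(...)
  # combine the geometry columns in input order; c() on sfc objects of
  # different geometry types returns an sfc_GEOMETRY
  geoms <- do.call(c, lapply(objs, st_geometry))
  # bind the attribute tables (same input order) without their geometries
  attrs <- bind_rows(lapply(objs, st_drop_geometry))
  # reattach the combined geometry, turning the data.frame back into sf
  st_set_geometry(attrs, geoms)
}

a <- st_sf(a = 1, geom = st_sfc(st_point(0:1)))
b <- st_sf(b = "x", geom = st_sfc(st_linestring(rbind(c(0, 0), c(1, 1)))))

ab <- bind_rows_sf(a, b)
print(ab)
```

Here ab has two rows. Column a is NA in the second row, column b is NA in the first, and the geometry column (named "geometry" by st_set_geometry()) is an sfc_GEOMETRY.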