Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dplyr::bind_rows fails with XYZ geometry in dplyr 1.0 #1390

Closed
mem48 opened this issue May 9, 2020 · 18 comments · Fixed by r-lib/vctrs#1111
Closed

dplyr::bind_rows fails with XYZ geometry in dplyr 1.0 #1390

mem48 opened this issue May 9, 2020 · 18 comments · Fixed by r-lib/vctrs#1111

Comments

@mem48
Copy link

mem48 commented May 9, 2020

I'm not sure if this is an SF or dplyr problem, so posting here first.

I'm updating my package to support the major breaking changes in dplyr 1.0.

I use dplyr::bind_rows to bind a list of sf data frames into a single data frame.

Simple example:

xy1 <- st_linestring(matrix(runif(18), ncol = 2))
xy1 <- st_as_sf(data.frame(id = 1, geometry = st_as_sfc(list(xy1))))

xy2 <- st_linestring(matrix(runif(18), ncol = 2))
xy2 <- st_as_sf(data.frame(id = 2, geometry = st_as_sfc(list(xy2))))

res_xy <- dplyr::bind_rows(list(xy1, xy2))

This works with both the CRAN and GitHub version of dplyr.

But if I use an XYZ linestring:

xyz1 <- st_linestring(matrix(runif(18), ncol = 3))
xyz1 <- st_as_sf(data.frame(id = 1, geometry = st_as_sfc(list(xyz1))))

xyz2 <- st_linestring(matrix(runif(18), ncol = 3))
xyz2 <- st_as_sf(data.frame(id = 2, geometry = st_as_sfc(list(xyz2))))

res_xyz <- dplyr::bind_rows(list(xyz1, xyz2))

I get an error:

Error in st_sfc(x, crs = st_crs(to), precision = st_precision(to)) : 
  found multiple dimensions: XYZ XY

This is not true as all the geometries are LINESTRING Z, and this error does not occur when I use the CRAN version of dplyr.

Session Info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.8.99.9002       sf_0.9-3                opentripplanner_0.2.1.0 testthat_2.3.2         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6          compiler_3.6.3        pillar_1.4.4          prettyunits_1.1.1    
 [5] remotes_2.1.1         class_7.3-15          tools_3.6.3           googlePolylines_0.7.2
 [9] pkgbuild_1.0.8        jsonlite_1.6.1        checkmate_2.0.0       lifecycle_0.2.0      
[13] tibble_3.0.1          pkgconfig_2.0.3       rlang_0.4.6           cli_2.0.2            
[17] DBI_1.1.0             rstudioapi_0.11       parallel_3.6.3        curl_4.3             
[21] e1071_1.7-3           httr_1.4.1            withr_2.2.0           generics_0.0.2       
[25] vctrs_0.3.0           grid_3.6.3            classInt_0.4-3        rprojroot_1.3-2      
[29] tidyselect_1.0.0      glue_1.4.0            R6_2.4.1              processx_3.4.2       
[33] pbapply_1.4-2         fansi_0.4.1           callr_3.4.3           purrr_0.3.4          
[37] magrittr_1.5          units_0.6-6           backports_1.1.6       ps_1.3.2             
[41] ellipsis_0.3.0        assertthat_0.2.1      geodist_0.0.3         KernSmooth_2.23-16   
[45] crayon_1.3.4  

@mem48
Copy link
Author

mem48 commented May 9, 2020

@Robinlovelace does this affect any of your packages?

@edzer
Copy link
Member

edzer commented May 10, 2020

@lionel- any ideas?

@Robinlovelace
Copy link
Contributor

does this affect any of your packages?

Not at present, the slopes package is the only one that explicitly uses xyz coordinates and that has no dplyr dependencies. FYI I did notice lots various sf things crashing when I tested the dev version of dplyr for stplanr which is now protected from the incoming changes, but didn't report them because that was a while ago an the underlying vctrs work seemed to be in flux. Now is a great time to fix issues like this to ensure dplyr/sf compatibility. Let us know @lionel- and others if we can help with any more testing or use cases.

@lionel-
Copy link
Contributor

lionel- commented May 10, 2020

I was going to take a look. I'll do it now.

@lionel-
Copy link
Contributor

lionel- commented May 10, 2020

@Robinlovelace Any failing test cases that you think might have different causes of failures would be great!

@lionel-
Copy link
Contributor

lionel- commented May 14, 2020

@mem48 I'm working on improving base fallbacks in vctrs so we can just use c.sfc() instead of vctrs methods. Also FYI we're delaying the 1.0 release two more weeks.

@mem48
Copy link
Author

mem48 commented May 14, 2020

@lionel- thanks for the update. Sorry I got a bit panicked that my package would break and be pulled from CRAN. Glad to hear a solution is in the works.

@hadley
Copy link
Contributor

hadley commented May 21, 2020

@edzer I think you can close this since the fix will come from our side, and we're testing this specific case.

@edzer
Copy link
Member

edzer commented May 22, 2020

Thanks so much, @hadley & @lionel- !

@mem48
Copy link
Author

mem48 commented May 30, 2020

@lionel- I tested your commits to vctrs and this does not appeare to be fixed with lists longer than 2. Using my original example

# works
res_xyz2 <- dplyr::bind_rows(list(xyz1, xyz2))
# fails
res_xyz3 <- dplyr::bind_rows(list(xyz1, xyz2, xyz1))

Error in st_geometry.sf(x) : 
  attr(obj, "sf_column") does not point to a geometry column.
Did you rename it, without setting st_geometry(obj) <- "newname"?

A new error message, so perhaps a new problem?

@lionel-
Copy link
Contributor

lionel- commented May 31, 2020

@mem48 Confirmed thanks, I see what's going on. Sorry this might have to wait until after Wednesday, but I'll hold vctrs 0.3.1 so I can fix this.

@mem48
Copy link
Author

mem48 commented May 31, 2020

Thanks, at the moment I'm advising my users not to update dplyr but that will become harder over time.

@lionel-
Copy link
Contributor

lionel- commented May 31, 2020

As far as I know bind_rows() with sf data frames didn't work with dplyr 0.8.5 either, so that's not a regression. Do you know about issues for other functionality? Any reprexes would be helpful, becuase it allows us to grow our set of unit tests.

@mem48
Copy link
Author

mem48 commented May 31, 2020

So bind_rows() does not work perfectly in dplyr 0.8.5 with sf dataframes, but it can be made to work if you make a few assumptions.

The place my package is having problems is here

The relevant bit of code is below.

suppressWarnings(results_routes <- dplyr::bind_rows(results_routes))
results_routes <- as.data.frame(results_routes)
results_routes$geometry <- sf::st_sfc(results_routes$geometry)
results_routes <- sf::st_sf(results_routes)
sf::st_crs(results_routes) <- 4326

results_routes is a list of sf data frames, in this case, I know that for every data frame in the list the CRS, column names / types, geometry type etc is always the same. So this trick works. It creates warnings, hence the suppressWarnings but it works.

I've seen this used quite widely, it is useful when iterating over each row of an sf data frame, If you use a for loop or lapply you often end up with a list of 1 row data frames.

@mem48
Copy link
Author

mem48 commented Jun 1, 2020

@lionel- I've had an email from CRAN giving me until the 15th to fix the problem, or my package will be pulled from CRAN. Do you think vctrs 0.3.1 will be out before then? Otherwise, I'm going to have to find a way to remove dplyr from my package dependencies.

@edzer If this is not fixed soon, it would be worth reopening this issue, in case others come across the same problem.

@hadley
Copy link
Contributor

hadley commented Jun 1, 2020

@mem48 yes — we are literally working on this as hard as we possible can.

@hadley
Copy link
Contributor

hadley commented Jun 1, 2020

@mem48 just to be clear, this will be fixed by vctrs 0.3.1 which will be coming out later this week, but you have made a bad situation for yourself. It's a bad idea to silence warnings that tell you what you're doing is dangerous.

@mem48
Copy link
Author

mem48 commented Jun 1, 2020

@hadley as a general rule I would agree. But my options were limited. Not using bind_rows would have left me using do.call(rbind) which is so slow as to make the package unusable. While not suppressing the warning would leave my package users with lots of warnings that they wouldn't understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants