Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of st_transform #1367

Closed
Robinlovelace opened this issue Apr 14, 2020 · 26 comments
Closed

Performance of st_transform #1367

Robinlovelace opened this issue Apr 14, 2020 · 26 comments

Comments

@Robinlovelace
Copy link
Contributor

@Robinlovelace Robinlovelace commented Apr 14, 2020

More of a question than a bug, but it seems that transforming object geometries is slower in sf than its predecessor. I'm not sure if this is new but was surprised to find this as most benchmarks I've seen that compare sp code with sf code show substantial performance improvements of the more recent package.

I know performance isn't everything but I thought the factor of ~4 slowdown I'm seeing in what is I think equivalent code may be of interest.

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
nc = st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `/home/robin/R/x86_64-pc-linux-gnu-library/3.6/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> geographic CRS: NAD27
nc_sp = as(nc, "Spatial")
bench::mark(st_transform(nc, 4326), sp::spTransform(nc_sp, sp::CRS("+init=epsg:4326")), check = FALSE)
#> # A tibble: 2 x 6
#>   expression                                             min median `itr/sec`
#>   <bch:expr>                                         <bch:t> <bch:>     <dbl>
#> 1 st_transform(nc, 4326)                             100.5ms  102ms      9.80
#> 2 sp::spTransform(nc_sp, sp::CRS("+init=epsg:4326"))  23.2ms   24ms     41.4 
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>

Created on 2020-04-14 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Ubuntu 18.04.4 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB:en                    
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-04-14                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                             
#>  assertthat    0.2.1       2019-03-21 [2] CRAN (R 3.6.0)                     
#>  backports     1.1.6       2020-04-05 [1] CRAN (R 3.6.3)                     
#>  bench         1.1.1       2020-01-13 [2] CRAN (R 3.6.2)                     
#>  callr         3.4.3       2020-03-28 [1] CRAN (R 3.6.3)                     
#>  class         7.3-16      2020-03-25 [2] CRAN (R 3.6.3)                     
#>  classInt      0.4-3       2020-04-06 [1] Github (r-spatial/classInt@d024051)
#>  cli           2.0.2       2020-02-28 [1] CRAN (R 3.6.2)                     
#>  crayon        1.3.4       2017-09-16 [2] standard (@1.3.4)                  
#>  DBI           1.1.0       2019-12-15 [2] CRAN (R 3.6.2)                     
#>  desc          1.2.0       2018-05-01 [2] standard (@1.2.0)                  
#>  devtools      2.3.0       2020-04-10 [1] CRAN (R 3.6.3)                     
#>  digest        0.6.25      2020-02-23 [1] CRAN (R 3.6.2)                     
#>  e1071         1.7-3       2019-11-26 [2] CRAN (R 3.6.1)                     
#>  ellipsis      0.3.0       2019-09-20 [3] CRAN (R 3.6.1)                     
#>  evaluate      0.14        2019-05-28 [2] CRAN (R 3.6.0)                     
#>  fansi         0.4.1       2020-01-08 [1] CRAN (R 3.6.2)                     
#>  fs            1.4.1       2020-04-04 [2] CRAN (R 3.6.3)                     
#>  glue          1.4.0       2020-04-03 [2] CRAN (R 3.6.3)                     
#>  highr         0.8         2019-03-20 [3] CRAN (R 3.5.3)                     
#>  htmltools     0.4.0.9003  2020-04-09 [1] Github (rstudio/htmltools@1a7d0dc) 
#>  KernSmooth    2.23-16     2019-10-15 [4] CRAN (R 3.6.1)                     
#>  knitr         1.28        2020-02-06 [1] CRAN (R 3.6.2)                     
#>  lattice       0.20-41     2020-04-02 [2] CRAN (R 3.6.3)                     
#>  lifecycle     0.2.0.9000  2020-03-16 [1] Github (r-lib/lifecycle@355dcba)   
#>  magrittr      1.5         2014-11-22 [2] CRAN (R 3.5.2)                     
#>  memoise       1.1.0       2017-04-21 [3] CRAN (R 3.5.0)                     
#>  pillar        1.4.3       2019-12-20 [1] CRAN (R 3.6.2)                     
#>  pkgbuild      1.0.6       2019-10-09 [2] CRAN (R 3.6.1)                     
#>  pkgconfig     2.0.3       2019-09-22 [2] CRAN (R 3.6.1)                     
#>  pkgload       1.0.2       2018-10-29 [3] CRAN (R 3.5.1)                     
#>  prettyunits   1.1.1       2020-01-24 [1] CRAN (R 3.6.2)                     
#>  processx      3.4.2       2020-02-09 [1] CRAN (R 3.6.3)                     
#>  profmem       0.5.0       2018-01-30 [2] CRAN (R 3.5.2)                     
#>  ps            1.3.2       2020-02-13 [1] CRAN (R 3.6.3)                     
#>  R6            2.4.1       2019-11-12 [2] CRAN (R 3.6.1)                     
#>  Rcpp          1.0.4.6     2020-04-09 [1] CRAN (R 3.6.3)                     
#>  remotes       2.1.1       2020-02-15 [1] CRAN (R 3.6.2)                     
#>  rgdal         1.4-8       2019-11-27 [1] CRAN (R 3.6.3)                     
#>  rlang         0.4.5.9000  2020-03-25 [1] Github (r-lib/rlang@a90b04b)       
#>  rmarkdown     2.1.2       2020-04-09 [1] Github (rstudio/rmarkdown@65dd144) 
#>  rprojroot     1.3-2       2018-01-03 [2] CRAN (R 3.5.3)                     
#>  sessioninfo   1.1.1       2018-11-05 [3] CRAN (R 3.5.1)                     
#>  sf          * 0.9-2       2020-04-09 [1] Github (r-spatial/sf@0b09452)      
#>  sp            1.4-1       2020-02-28 [1] CRAN (R 3.6.2)                     
#>  stringi       1.4.6       2020-02-17 [1] CRAN (R 3.6.2)                     
#>  stringr       1.4.0       2019-02-10 [2] standard (@1.4.0)                  
#>  testthat      2.3.2       2020-03-02 [1] CRAN (R 3.6.3)                     
#>  tibble        3.0.0       2020-03-30 [1] CRAN (R 3.6.3)                     
#>  units         0.6-6       2020-03-16 [1] CRAN (R 3.6.3)                     
#>  usethis       1.6.0       2020-04-09 [1] CRAN (R 3.6.3)                     
#>  utf8          1.1.4       2018-05-24 [2] CRAN (R 3.5.3)                     
#>  vctrs         0.2.99.9011 2020-04-14 [1] Github (r-lib/vctrs@1fcc3fb)       
#>  withr         2.1.2       2018-03-15 [2] CRAN (R 3.5.3)                     
#>  xfun          0.13        2020-04-13 [1] CRAN (R 3.6.3)                     
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 3.6.2)                     
#> 
#> [1] /home/robin/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
@rsbivand
Copy link
Member

@rsbivand rsbivand commented Apr 14, 2020

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Apr 14, 2020

The profiles of both don't reveal much FYI...

library(proffer)
pprof(st_transform(nc, 4326))
pprof(sp::spTransform(nc_sp, sp::CRS("+init=epsg:4326")))

image

image

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Apr 14, 2020

Just installing now with

install.packages("rgdal", repos = "http://R-Forge.R-project.org")
@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Apr 14, 2020

Interesting result, sf is now faster:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
nc = st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `/home/robin/R/x86_64-pc-linux-gnu-library/3.6/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> geographic CRS: NAD27
nc_sp = as(nc, "Spatial")
bench::mark(st_transform(nc, 4326), sp::spTransform(nc_sp, sp::CRS("+init=epsg:4326")), check = FALSE)
#> # A tibble: 2 x 6
#>   expression                                           min median `itr/sec`
#>   <bch:expr>                                         <bch> <bch:>     <dbl>
#> 1 st_transform(nc, 4326)                             100ms  102ms      9.84
#> 2 sp::spTransform(nc_sp, sp::CRS("+init=epsg:4326")) 769ms  769ms      1.30
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>

Created on 2020-04-14 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Ubuntu 18.04.4 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB:en                    
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-04-14                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                             
#>  assertthat    0.2.1       2019-03-21 [2] CRAN (R 3.6.0)                     
#>  backports     1.1.6       2020-04-05 [1] CRAN (R 3.6.3)                     
#>  bench         1.1.1       2020-01-13 [2] CRAN (R 3.6.2)                     
#>  callr         3.4.3       2020-03-28 [1] CRAN (R 3.6.3)                     
#>  class         7.3-16      2020-03-25 [2] CRAN (R 3.6.3)                     
#>  classInt      0.4-3       2020-04-06 [1] Github (r-spatial/classInt@d024051)
#>  cli           2.0.2       2020-02-28 [1] CRAN (R 3.6.2)                     
#>  crayon        1.3.4       2017-09-16 [2] standard (@1.3.4)                  
#>  DBI           1.1.0       2019-12-15 [2] CRAN (R 3.6.2)                     
#>  desc          1.2.0       2018-05-01 [2] standard (@1.2.0)                  
#>  devtools      2.3.0       2020-04-10 [1] CRAN (R 3.6.3)                     
#>  digest        0.6.25      2020-02-23 [1] CRAN (R 3.6.2)                     
#>  e1071         1.7-3       2019-11-26 [2] CRAN (R 3.6.1)                     
#>  ellipsis      0.3.0       2019-09-20 [3] CRAN (R 3.6.1)                     
#>  evaluate      0.14        2019-05-28 [2] CRAN (R 3.6.0)                     
#>  fansi         0.4.1       2020-01-08 [1] CRAN (R 3.6.2)                     
#>  fs            1.4.1       2020-04-04 [2] CRAN (R 3.6.3)                     
#>  glue          1.4.0       2020-04-03 [2] CRAN (R 3.6.3)                     
#>  highr         0.8         2019-03-20 [3] CRAN (R 3.5.3)                     
#>  htmltools     0.4.0.9003  2020-04-09 [1] Github (rstudio/htmltools@1a7d0dc) 
#>  KernSmooth    2.23-16     2019-10-15 [4] CRAN (R 3.6.1)                     
#>  knitr         1.28        2020-02-06 [1] CRAN (R 3.6.2)                     
#>  lattice       0.20-41     2020-04-02 [2] CRAN (R 3.6.3)                     
#>  lifecycle     0.2.0.9000  2020-03-16 [1] Github (r-lib/lifecycle@355dcba)   
#>  magrittr      1.5         2014-11-22 [2] CRAN (R 3.5.2)                     
#>  memoise       1.1.0       2017-04-21 [3] CRAN (R 3.5.0)                     
#>  pillar        1.4.3       2019-12-20 [1] CRAN (R 3.6.2)                     
#>  pkgbuild      1.0.6       2019-10-09 [2] CRAN (R 3.6.1)                     
#>  pkgconfig     2.0.3       2019-09-22 [2] CRAN (R 3.6.1)                     
#>  pkgload       1.0.2       2018-10-29 [3] CRAN (R 3.5.1)                     
#>  prettyunits   1.1.1       2020-01-24 [1] CRAN (R 3.6.2)                     
#>  processx      3.4.2       2020-02-09 [1] CRAN (R 3.6.3)                     
#>  profmem       0.5.0       2018-01-30 [2] CRAN (R 3.5.2)                     
#>  ps            1.3.2       2020-02-13 [1] CRAN (R 3.6.3)                     
#>  R6            2.4.1       2019-11-12 [2] CRAN (R 3.6.1)                     
#>  Rcpp          1.0.4.6     2020-04-09 [1] CRAN (R 3.6.3)                     
#>  remotes       2.1.1       2020-02-15 [1] CRAN (R 3.6.2)                     
#>  rgdal         1.5-6       2020-04-07 [1] R-Forge (R 3.6.3)                  
#>  rlang         0.4.5.9000  2020-03-25 [1] Github (r-lib/rlang@a90b04b)       
#>  rmarkdown     2.1.2       2020-04-09 [1] Github (rstudio/rmarkdown@65dd144) 
#>  rprojroot     1.3-2       2018-01-03 [2] CRAN (R 3.5.3)                     
#>  sessioninfo   1.1.1       2018-11-05 [3] CRAN (R 3.5.1)                     
#>  sf          * 0.9-2       2020-04-09 [1] Github (r-spatial/sf@0b09452)      
#>  sp            1.4-1       2020-02-28 [1] CRAN (R 3.6.2)                     
#>  stringi       1.4.6       2020-02-17 [1] CRAN (R 3.6.2)                     
#>  stringr       1.4.0       2019-02-10 [2] standard (@1.4.0)                  
#>  testthat      2.3.2       2020-03-02 [1] CRAN (R 3.6.3)                     
#>  tibble        3.0.0       2020-03-30 [1] CRAN (R 3.6.3)                     
#>  units         0.6-6       2020-03-16 [1] CRAN (R 3.6.3)                     
#>  usethis       1.6.0       2020-04-09 [1] CRAN (R 3.6.3)                     
#>  utf8          1.1.4       2018-05-24 [2] CRAN (R 3.5.3)                     
#>  vctrs         0.2.99.9011 2020-04-14 [1] Github (r-lib/vctrs@1fcc3fb)       
#>  withr         2.1.2       2018-03-15 [2] CRAN (R 3.5.3)                     
#>  xfun          0.13        2020-04-13 [1] CRAN (R 3.6.3)                     
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 3.6.2)                     
#> 
#> [1] /home/robin/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
@rsbivand
Copy link
Member

@rsbivand rsbivand commented Apr 15, 2020

But:

 library(spData)
data(house)
sf_house <- st_as_sf(house)
bench::mark(st_transform(sf_house, 4326), sp::spTransform(house, sp::CRS("+init=epsg:4326")), check = FALSE)

Here SpatialPoints are a single matrix, so no looping over SF objects.

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Apr 15, 2020

Interesting, I can reproduce your finding:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(spData)
data(house)
sf_house <- st_as_sf(house)
bench::mark(st_transform(sf_house, 4326), sp::spTransform(house, sp::CRS("+init=epsg:4326")), check = FALSE)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression                                           min median `itr/sec`
#>   <bch:expr>                                         <bch> <bch:>     <dbl>
#> 1 st_transform(sf_house, 4326)                       342ms  343ms      2.91
#> 2 sp::spTransform(house, sp::CRS("+init=epsg:4326")) 113ms  117ms      8.62
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>

Created on 2020-04-15 by the reprex package (v0.3.0)

I think I should change the issue title in any case: seems that there has been a regression in speed, not sure the extent to which it's dependent on size/type of geometry. Also worth benchmarking on older sf versions for a fair comparison I think. Any idea when this was introduced or has it always been a feature of sf to call PROJ from GDAL? Happy to do a benchmarking exercise if that would be useful covering some more of the parameter space.

@edzer
Copy link
Member

@edzer edzer commented Apr 15, 2020

or has it always been a feature of sf to call PROJ from GDAL?

yes: PROJ does not have an understanding of simple features; use sf_project to transform a matrix of points without going through GDAL.

PROJ itself has also undergone quite a bit of change the last few years.

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Apr 15, 2020

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(spData)
data(house)
sf_house <- st_as_sf(house)
house_coords <- as.matrix(sf::st_coordinates(sf_house))
head(sf::sf_project(st_crs(sf_house), to = st_crs(4326), pts = house_coords))
#>           [,1]     [,2]
#> [1,] -83.87964 41.41690
#> [2,] -83.87716 41.41720
#> [3,] -83.87271 41.41773
#> [4,] -83.86670 41.42522
#> [5,] -83.83824 41.42928
#> [6,] -83.86964 41.42953
sf1 = function() st_transform(sf_house, 4326)
sf2 = function() sf_project(st_crs(sf_house), to = st_crs(4326), pts = house_coords)
sp = function()  sp::spTransform(house, sp::CRS("+init=epsg:4326"))
bench::mark(sf1(), sf2(), sp(), check = FALSE)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sf1()       358.5ms  370.4ms      2.70    5.55MB     8.10
#> 2 sf2()        69.1ms   70.2ms     14.2   433.63KB     0   
#> 3 sp()        112.7ms  121.1ms      7.20    14.9MB     1.80

Created on 2020-04-15 by the reprex package (v0.3.0)

@Robinlovelace Robinlovelace changed the title Performance of st_transform vs spTransform Performance of st_transform Apr 18, 2020
@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented May 12, 2020

Just realised this is going stale. @edzer I'm satisfied that sf_project() is faster, just wondering if there's anything that can be done to allow users to benefit from that fast performance more frequently. Happy to close this general issue and re-open a specific one e.g. 'Document how sf_project() can be used to reproject point geometries' or just wait for a solution to come from elsewhere.

Question: with s2 work ongoing do you think coordinate transformation will become less commonly needed?

Context: the geo_projected() and geo_buffer() functions in stplanr no longer work and I'm wondering whether to try to update them so they can create working azimuthal equal area CRSs that work with PROJ7+ or to dump the approach altogether and embrace the brave new s2 world...

Mixing issues I know but in my experience context helps.

@edzer
Copy link
Member

@edzer edzer commented Jul 8, 2020

For what do you need projections, for realistic buffers? s2 has a fast spherical buffer, but they follow S2Cell boundaries:

library(sf) # branch s2
library(s2) # dev
b = s2_buffer_cells(s2_data_countries("United Kingdom"), 20000)
plot(st_as_sfc(b))

x

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 8, 2020

For what do you need projections, for realistic buffers?

Common use case in my work is finding a 10, 20 or 30 m buffer around road geometries. I think if the cell size is less than 1m it could work well. That doable with the new s2 branch? Impressive functionality already!

@edzer
Copy link
Member

@edzer edzer commented Jul 8, 2020

Yes, you can specify the degree of approximation.

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 9, 2020

Hi @edzer here's a reproducible example showing what I tried:

remotes::install_github("r-spatial/sf", ref = "s2")
library(sf)
cycleway_osm = osmdata::osmdata_sf(
  osmdata::add_osm_feature(
    opq = osmdata::opq("leeds"),
    key = "name",
    value = "Cycle Superhighway 1"
  )
)
cycleway = cycleway_osm$osm_lines
plot(cycleway$geometry)
system.time({
  buffer1 = stplanr::geo_buffer(cycleway, dist = 100)
})
plot(buffer1$geometry)
system.time({
  buffer2 = s2_buffer_cells(cycleway, 100)
})

That generates the following error message:

Error in s2_buffer_cells(cycleway, 100) : 
  could not find function "s2_buffer_cells"

Am I installing the wrong branch?

@edzer
Copy link
Member

@edzer edzer commented Jul 9, 2020

Yes, you need branch s2 from package sf, and package s2 master branch; maybe wait a few days, stuff is moving quickly now and close to 1.0.1 release of s2.

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 9, 2020

OK I'll wait...

@edzer
Copy link
Member

@edzer edzer commented Jul 15, 2020

@Robinlovelace , s2 1.0.1 and sf 0.9-5 are now on CRAN, both can be installed from source (binaries lack a week or so behind). Have a look at the (rather incomplete) 7th vignette: https://cran.r-project.org/web/packages/sf/vignettes/sf7.html ; buffer examples still need to be added, but are interfaced through sf::st_buffer.

edzer added a commit that referenced this issue Jul 15, 2020
@edzer
Copy link
Member

@edzer edzer commented Jul 15, 2020

Sorry, that didn't work out of the box, but now we have:

library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
cycleway_osm = osmdata::osmdata_sf(
  osmdata::add_osm_feature(
    opq = osmdata::opq("leeds"),
    key = "name",
    value = "Cycle Superhighway 1"
  )
)
cycleway = cycleway_osm$osm_lines
system.time({
  buffer1 = stplanr::geo_buffer(cycleway, dist = 100)
})
#    user  system elapsed 
#   0.907   0.028   0.936 
system.time({
  buffer2 = st_buffer(cycleway, .01) # wrong: assumes longlat=x/y
})
# dist is assumed to be in decimal degrees (arc_degrees).
#    user  system elapsed 
#   0.019   0.000   0.018 
# Warning message:
# In st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle = endCapStyle,  :
#   st_buffer does not correctly buffer longitude/latitude data
sf_use_s2(TRUE)
system.time({
  buffer3 = st_buffer(cycleway, 100, max_cells = 200)
})
#    user  system elapsed 
#   0.361   0.000   0.361 
png("/tmp/x.png")
par(mfrow = c(2, 1), mar = rep(0,4))
plot(cycleway$geometry)
plot(buffer1$geometry, add = TRUE, border = 'red')
plot(cycleway$geometry)
plot(buffer3$geometry, add = TRUE, border = 'red')

x

Setting max_cells to 200 is a wild guess, but this is a tuning parameter to make the thing faster (and slower).

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 15, 2020

Great news, it works!

remotes::install_github("r-spatial/s2") # lots of c++ compiling...
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 's2' from a github remote, the SHA1 (a4d0fa49) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_github("r-spatial/sf")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'sf' from a github remote, the SHA1 (602a4688) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
cycleway_osm = osmdata::osmdata_sf(
  osmdata::add_osm_feature(
    opq = osmdata::opq("leeds"),
    key = "name",
    value = "Cycle Superhighway 1"
  )
)
cycleway = cycleway_osm$osm_lines
system.time({
  buffer1 = stplanr::geo_buffer(cycleway, dist = 100)
})
#>    user  system elapsed 
#>   0.898   0.052   0.964
system.time({
  buffer2 = st_buffer(cycleway, .01) # wrong: assumes longlat=x/y
})
#> Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
#> endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
#> dist is assumed to be in decimal degrees (arc_degrees).
#>    user  system elapsed 
#>   0.019   0.000   0.018
sf_use_s2(TRUE)
system.time({
  buffer3 = st_buffer(cycleway, 100, max_cells = 200)
})
#>    user  system elapsed 
#>   0.343   0.001   0.347
system.time({
  buffer4 = st_buffer(cycleway, 100, max_cells = 500)
})
#>    user  system elapsed 
#>   0.827   0.000   0.836
plot(cycleway$geometry)
plot(buffer1$geometry, add = TRUE, border = 'red', lwd = 3)
plot(buffer3$geometry, add = TRUE, border = 'blue', lwd = 2)
plot(buffer4$geometry, add = TRUE, border = 'grey', lwd = 1)

plot(buffer1$geometry)

plot(buffer3$geometry)

plot(buffer4$geometry)

mapview::mapview(buffer4) +
  mapview::mapview(buffer1, zcol = "highway")

Created on 2020-07-15 by the reprex package (v0.3.0)

The result of the final plot suggests that a max cell of 500 results in ~10 m resolution result. I wouldn't say this overcomes the performance issue with st_transform() but is a very impressive new piece of functionality, great to see this in sf!

image

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 15, 2020

The timings are heavily dependent on max_cells:

> system.time({
+   buffer5 = st_buffer(cycleway, 100, max_cells = 1000)
+ })
   user  system elapsed 
  1.745   0.015   1.772 
> system.time({
+   buffer6 = st_buffer(cycleway, 100, max_cells = 2000)
+ })
   user  system elapsed 
  3.373   0.019   3.402 

buffer6 looks like this:

image

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 15, 2020

Appropriate max_cells values clearly depends on the complexity of the polygon. More results below on the unioned object, the timings are way better like this so will close the issue for now:

remotes::install_github("r-spatial/s2") # lots of c++ compiling...
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 's2' from a github remote, the SHA1 (a4d0fa49) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_github("r-spatial/sf")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'sf' from a github remote, the SHA1 (602a4688) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
cycleway_osm = osmdata::osmdata_sf(
  osmdata::add_osm_feature(
    opq = osmdata::opq("leeds"),
    key = "name",
    value = "Cycle Superhighway 1"
  )
)
cycleway = cycleway_osm$osm_lines 
geometry = cycleway_osm$osm_lines %>% 
  sf::st_union()
cycleway = st_sf(data.frame(name = "cycleway"), geometry) 
                 
plot(cycleway)

system.time({
  buffer1 = stplanr::geo_buffer(cycleway, dist = 100)
})
#>    user  system elapsed 
#>   0.840   0.048   0.888
system.time({
  buffer2 = st_buffer(cycleway, .01) # wrong: assumes longlat=x/y
})
#> Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
#> endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
#> dist is assumed to be in decimal degrees (arc_degrees).
#>    user  system elapsed 
#>   0.148   0.000   0.148
sf_use_s2(TRUE)
system.time({
  buffer3 = st_buffer(cycleway, 100, max_cells = 200)
})
#>    user  system elapsed 
#>   0.019   0.000   0.019
system.time({
  buffer4 = st_buffer(cycleway, 100, max_cells = 500)
})
#>    user  system elapsed 
#>   0.030   0.000   0.031
system.time({
  buffer5 = st_buffer(cycleway, 100, max_cells = 1000)
})
#>    user  system elapsed 
#>   0.044   0.000   0.044
system.time({
  buffer6 = st_buffer(cycleway, 100, max_cells = 2000)
})
#>    user  system elapsed 
#>   0.094   0.000   0.095

plot(buffer1$geometry)

plot(buffer2$geometry)

plot(buffer3$geometry)

plot(buffer4$geometry)

plot(buffer5$geometry)

plot(buffer6$geometry)

Created on 2020-07-15 by the reprex package (v0.3.0)

@edzer
Copy link
Member

@edzer edzer commented Jul 15, 2020

Yes. Another thought, which you probably also tried: in many cases buffers are computed to intersect them in the next step with something else, like grid cell centers on a fine grid. In that case, you can get away without buffers, but only compute which grid cells are within a certain distance from a set of features. That computation is potentially much cheaper. In other words: I believe most buffers are computed without a need to (explicitly) compute them, but because this way we think we understand the problem better.

@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 16, 2020

Thanks for the follow-up @edzer. A couple of follow-on thoughts/questions:

  • Can max_cells take a numeric vector? This example shows very different resolutions for different sized polygons. That will affect the usefulness of buffer operations on objects more than one non-uniform features such as the cycleway above.
  • How do you compute the grid cells of a given resolution that are within a given distance of an object? That sounds like a good solution.
@edzer
Copy link
Member

@edzer edzer commented Jul 16, 2020

Here is a simple example of selecting points within a buffer without ever using the actual buffer (which is just shown for clarity):

library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
sf_use_s2(TRUE)

set.seed(131)
n = 1000
pts = st_as_sf(data.frame(x = runif(n), y = runif(n)), coords = c("x", "y"), crs = 4326)
# unit square in degrees: 111 x 111 km

size = 15000
w = st_is_within_distance(pts[1,], pts, size)[[1]]

plot(pts)
plot(pts[1,], add = TRUE, pch = 16, cex = 2, col = 'blue')
plot(st_buffer(pts[1,], size), add = TRUE)
plot(pts[w,], add = TRUE, col = 'red')

x

st_is_within_distance when using s2 is now fast (as indexed, although the lwgeom versions isn't bad either)

edzer added a commit that referenced this issue Jul 16, 2020
@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 17, 2020

Yes, that makes sense,

x[y, , op = st_within_distance, dist = distance_in_meters]

Does indeed solve the same problem most of the time. I look forward to benchmarks (on that note do you want any in the package, either in a vignette or as tests?).

Many thanks, great work!

@edzer
Copy link
Member

@edzer edzer commented Jul 29, 2020

What did you mean by "most of the time"?

Yes, some kind of benchmark would be good in the new sf vignette 7, for unprojected data comparing

  • GEOS route: st_transform, buffer with radius r, intersect within buffer, aggregate
  • S2 route: select with st_within_distance with distance r, aggregate
@Robinlovelace
Copy link
Contributor Author

@Robinlovelace Robinlovelace commented Jul 29, 2020

What did you mean by "most of the time"?

It solves the problem that buffers usually solve in use cases that I can think of. There are some use cases when it's useful to visualise and aggregate buffer polygons, to communicate work or group nearly touching linestings for example, when polygonal buffers are needed to solve specific use cases.

Yes, some kind of benchmark would be good in the new sf vignette 7, for unprojected data comparing

I'm up for giving that a go.

Robinlovelace added a commit to Robinlovelace/sf that referenced this issue Aug 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.