Performance issues vec_rbind()? #1475

Open
mgirlich opened this issue Oct 9, 2021 · 1 comment

@mgirlich
Contributor

mgirlich commented Oct 9, 2021

When binding many one-row tibbles, vec_c() is 20% to 40% faster than vec_rbind(). I would have expected vec_rbind() to be the faster of the two, since row-binding seems to be its main purpose.

library(vctrs)

row_list1 <- vec_rep(vec_chop(mtcars), 1e3)
row_list10 <- vec_rep(vec_chop(mtcars), 10e3)
ptype <- vec_ptype(row_list1[[1]])

bench::mark(
  vec_c1 = vec_c(!!!row_list1, .ptype = ptype),
  vec_rbind1 = vec_rbind(!!!row_list1, .ptype = ptype),
  check = TRUE,
  iterations = 3
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_c1        151ms    217ms      4.70    8.47MB     6.26
#> 2 vec_rbind1    161ms    208ms      4.74    7.49MB     7.90

bench::mark(
  vec_c10 = vec_c(!!!row_list10, .ptype = ptype),
  vec_rbind10 = vec_rbind(!!!row_list10, .ptype = ptype),
  check = TRUE,
  iterations = 3
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vec_c10        1.81s    2.04s     0.507    87.7MB     1.01
#> 2 vec_rbind10    2.65s    2.72s     0.364    71.8MB     1.34

Created on 2021-10-09 by the reprex package (v2.0.1)
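
As a rough check on the 20% to 40% figure, the ratios can be computed directly from the timings printed above. This is a minimal sketch in base R; the `timings` data frame and its column names are illustrative, and the numbers are simply copied from the two `bench::mark()` tables.

```r
# Timings copied from the bench::mark() output above, converted to seconds.
timings <- data.frame(
  case      = c("1e3 reps", "10e3 reps"),
  c_min     = c(0.151, 1.81),
  rbind_min = c(0.161, 2.65),
  c_med     = c(0.217, 2.04),
  rbind_med = c(0.208, 2.72)
)

# A ratio above 1 means vec_rbind() took longer than vec_c().
timings$ratio_min <- round(timings$rbind_min / timings$c_min, 2)
timings$ratio_med <- round(timings$rbind_med / timings$c_med, 2)

print(timings[, c("case", "ratio_min", "ratio_med")])
```

On these numbers the gap shows up mainly in the larger case, where vec_rbind() takes roughly a third longer at the median, while the smaller case is about even. With only three iterations and a GC in every iteration, the ratios are indicative rather than precise.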

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       macOS Big Sur 10.16         
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2021-10-09                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                            
#>  backports     1.2.1       2020-12-09 [1] CRAN (R 4.1.0)                    
#>  bench         1.1.1       2020-01-13 [1] CRAN (R 4.1.0)                    
#>  cli           3.0.1.9000  2021-10-07 [1] Github (r-lib/cli@2808311)        
#>  crayon        1.4.1       2021-02-08 [1] CRAN (R 4.1.0)                    
#>  digest        0.6.28      2021-09-23 [1] CRAN (R 4.1.0)                    
#>  ellipsis      0.3.2       2021-04-29 [1] CRAN (R 4.1.0)                    
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 4.1.0)                    
#>  fansi         0.5.0       2021-05-25 [1] CRAN (R 4.1.0)                    
#>  fastmap       1.1.0       2021-01-25 [1] CRAN (R 4.1.0)                    
#>  fs            1.5.0       2020-07-31 [1] CRAN (R 4.1.0)                    
#>  glue          1.4.2       2020-08-27 [1] CRAN (R 4.1.0)                    
#>  highr         0.9         2021-04-16 [1] CRAN (R 4.1.0)                    
#>  htmltools     0.5.2       2021-08-25 [1] CRAN (R 4.1.0)                    
#>  knitr         1.36        2021-09-29 [1] CRAN (R 4.1.0)                    
#>  lifecycle     1.0.1       2021-09-24 [1] CRAN (R 4.1.0)                    
#>  magrittr      2.0.1       2020-11-17 [1] CRAN (R 4.1.0)                    
#>  pillar        1.6.3       2021-09-26 [1] CRAN (R 4.1.0)                    
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 4.1.0)                    
#>  profmem       0.6.0       2020-12-13 [1] CRAN (R 4.1.0)                    
#>  purrr         0.3.4       2020-04-17 [1] CRAN (R 4.1.0)                    
#>  R.cache       0.15.0      2021-04-30 [1] CRAN (R 4.1.0)                    
#>  R.methodsS3   1.8.1       2020-08-26 [1] CRAN (R 4.1.0)                    
#>  R.oo          1.24.0      2020-08-26 [1] CRAN (R 4.1.0)                    
#>  R.utils       2.11.0      2021-09-26 [1] CRAN (R 4.1.0)                    
#>  reprex        2.0.1       2021-08-05 [1] CRAN (R 4.1.0)                    
#>  rlang         0.99.0.9000 2021-10-09 [1] Github (r-lib/rlang@d0dee64)      
#>  rmarkdown     2.11        2021-09-14 [1] CRAN (R 4.1.0)                    
#>  rstudioapi    0.13        2020-11-12 [1] CRAN (R 4.1.0)                    
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 4.1.0)                    
#>  stringi       1.7.5       2021-10-04 [1] CRAN (R 4.1.0)                    
#>  stringr       1.4.0.9000  2021-08-23 [1] Github (tidyverse/stringr@6670a37)
#>  styler        1.6.2       2021-09-23 [1] CRAN (R 4.1.0)                    
#>  tibble        3.1.5       2021-09-30 [1] CRAN (R 4.1.0)                    
#>  utf8          1.2.2       2021-07-24 [1] CRAN (R 4.1.0)                    
#>  vctrs       * 0.3.8.9001  2021-10-09 [1] Github (r-lib/vctrs@199da1a)      
#>  withr         2.4.2       2021-04-18 [1] CRAN (R 4.1.0)                    
#>  xfun          0.26        2021-09-14 [1] CRAN (R 4.1.0)                    
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 4.1.0)                    
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
@wlandau

wlandau commented Sep 20, 2023

Could this have to do with how names are handled?

For my own use case, I have many one-row tibbles, and I would like to call vec_rbind() internally in a package (cf. wlandau/crew#123). The package makes sure all the names are already consistent and correct, so I do not need any name checking or name repair. On my machine, name repair accounts for 50-60% of the execution time even with the fastest supported option. It would be great to be able to disable name processing completely and cut out the overhead.

packageVersion("data.table")
#> [1] ‘1.14.8’
packageVersion("vctrs")
#> [1] ‘0.6.3’
result <- crew:::monad_tibble(crew::crew_eval(12))
list <- replicate(1e6, result, simplify = FALSE)
system.time(data.table::rbindlist(list, use.names = FALSE))
#>    user  system elapsed 
#>   0.924   0.014   0.940
system.time(vctrs::vec_rbind(list, .name_repair = "universal_quiet"))
#>    user  system elapsed 
#>   1.338   0.061   1.400
proffer::pprof(vctrs::vec_rbind(list, .name_repair = "universal_quiet"))

(Screenshot taken 2023-09-20, presumably of the profiler output from the call above; image not included.)
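
When every input is already known to share the same columns in the same order, one way to gauge how much of the cost is name handling is to compare against a binder that does no name processing at all. The following is a minimal sketch in base R, not a vctrs API: `bind_rows_trusted()` is a hypothetical helper, and it assumes every column is a plain atomic vector without attributes (no factors, dates, or list columns), since `unlist()` would drop them.

```r
# Hypothetical helper (not part of vctrs): bind data frames that are already
# known to have identical column names, order, and types.
# Assumption: every column is a plain atomic vector with no attributes.
bind_rows_trusted <- function(rows) {
  cols <- names(rows[[1L]])
  out <- lapply(cols, function(col) {
    # Pull one column out of every input and concatenate; no name checks.
    unlist(lapply(rows, function(row) row[[col]]), use.names = FALSE)
  })
  names(out) <- cols
  n <- if (length(out)) length(out[[1L]]) else 0L
  # Build the data frame directly, skipping data.frame()'s own name handling.
  structure(out, class = "data.frame", row.names = .set_row_names(n))
}

# Usage: split mtcars into one-row data frames and bind them back together.
rows <- lapply(seq_len(nrow(mtcars)), function(i) mtcars[i, , drop = FALSE])
result <- bind_rows_trusted(rows)

# The round trip should reproduce mtcars apart from its row names.
expected <- mtcars
rownames(expected) <- NULL
stopifnot(isTRUE(all.equal(result, expected)))
```

Timing this against vec_rbind() on the same list would give a rough lower bound for a name-free path. It is not a drop-in replacement, because it skips the type and size checks that vctrs performs.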
