
slow performance of mutate.() with by grouping for many groups #82

Closed
marianschmidt opened this issue Jun 8, 2020 · 7 comments

@marianschmidt

marianschmidt commented Jun 8, 2020

As somebody who likes the tidyverse syntax and needs data.table's performance, but struggles with its modify-by-reference semantics, I was very happy to find tidytable. Thanks for this great package!
I work with large datasets (1-10M rows, 50-500 columns) that often require mutating grouped data.
In this scenario, however, I found tidytable::mutate.() to be much slower than the data.table equivalent, and still considerably slower than the dplyr alternative.

library(magrittr)
library(data.table)

rows <- 1000000
ids  <- 50000

#simple data set with many different IDs and 1M rows, 3 cols
df <- data.frame(id = as.character(sample(1:ids, size = rows, replace = TRUE)), #using character variable as ID
                 bike = sample(c("mountain", "allround", "road", "bmx"), size = rows, replace = TRUE),
                 year = sample(1980:2020, size = rows, replace = TRUE),
                 stringsAsFactors = FALSE)

results <- bench::mark(
  # first run with tidytable
  tidytable = df %>%
    # sort by case id, time and item
    tidytable::arrange.(id, year, bike) %>%
    # calculate new item number variable, grouped by case id
    tidytable::mutate.(bike_number = as.integer(tidytable::row_number.()), by = id),
  # second run with dplyr
  dplyr = df %>%
    # sort by case id, time and item
    dplyr::arrange(id, year, bike) %>%
    # calculate new item number variable, grouped by case id
    dplyr::group_by(id) %>%
    dplyr::mutate(bike_number = as.integer(dplyr::row_number())) %>%
    dplyr::ungroup(),
  # third run with data.table
  data.table = data.table::copy(df) %>%
    data.table::as.data.table(.) %>%
    # sort by case id, time and item
    .[base::order(nchar(.[, id]), .[, id], .[, year], .[, bike], method = "radix")] %>%
    # calculate new item number variable, grouped by case id
    .[, bike_number := as.integer(seq_len(.N)), by = .[, id]] %>%
    .[],
  iterations = 3, filter_gc = FALSE, check = FALSE
)

ggplot2::autoplot(results)
#> Loading required namespace: tidyr

Created on 2020-06-08 by the reprex package (v0.3.0)
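The data.table branch sorts with order(nchar(id), id, ...), while the tidytable and dplyr branches sort id as plain text. Because the ids here are integers stored as character, sorting by string length first and then lexically yields numeric order (this holds for non-negative integers without leading zeros). A small base R sketch on a few made-up ids:

```r
ids <- c("10", "2", "1", "33", "4")

# Plain lexical sort (radix = C-locale byte order): "10" sorts before "2"
sort(ids, method = "radix")
# "1" "10" "2" "33" "4"

# String length first, then lexical: matches numeric order for these ids
ids[order(nchar(ids), ids, method = "radix")]
# "1" "2" "4" "10" "33"
```

Since the branches therefore order rows differently, their outputs are not identical, which is presumably why the benchmark passes check = FALSE.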

Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2020-06-08                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source                                
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)                        
#>  backports     1.1.7   2020-05-13 [1] CRAN (R 3.6.3)                        
#>  beeswarm      0.2.3   2016-04-25 [1] CRAN (R 3.6.0)                        
#>  bench         1.1.1   2020-01-13 [1] CRAN (R 3.6.2)                        
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 3.6.3)                        
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.3)                        
#>  colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.1)                        
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)                        
#>  curl          4.3     2019-12-02 [1] CRAN (R 3.6.1)                        
#>  data.table  * 1.12.9  2020-03-04 [1] Github (Rdatatable/data.table@b1b1832)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.1)                        
#>  devtools      2.3.0   2020-04-10 [1] CRAN (R 3.6.3)                        
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.2)                        
#>  dplyr         1.0.0   2020-05-29 [1] CRAN (R 3.6.3)                        
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.3)                        
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)                        
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.2)                        
#>  farver        2.0.3   2020-01-16 [1] CRAN (R 3.6.2)                        
#>  fs            1.4.1   2020-04-04 [1] CRAN (R 3.6.3)                        
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 3.6.1)                        
#>  ggbeeswarm    0.6.0   2017-08-07 [1] CRAN (R 3.6.3)                        
#>  ggplot2       3.3.1   2020-05-28 [1] CRAN (R 3.6.3)                        
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 3.6.3)                        
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.1)                        
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.1)                        
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)                        
#>  httr          1.4.1   2019-08-05 [1] CRAN (R 3.6.1)                        
#>  knitr         1.28    2020-02-06 [1] CRAN (R 3.6.2)                        
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.3)                        
#>  magrittr    * 1.5     2014-11-22 [1] CRAN (R 3.6.1)                        
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)                        
#>  mime          0.9     2020-02-04 [1] CRAN (R 3.6.2)                        
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.1)                        
#>  pillar        1.4.4   2020-05-05 [1] CRAN (R 3.6.3)                        
#>  pkgbuild      1.0.8   2020-05-07 [1] CRAN (R 3.6.3)                        
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)                        
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 3.6.3)                        
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)                        
#>  processx      3.4.2   2020-02-09 [1] CRAN (R 3.6.2)                        
#>  profmem       0.5.0   2018-01-30 [1] CRAN (R 3.6.2)                        
#>  ps            1.3.3   2020-05-08 [1] CRAN (R 3.6.3)                        
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 3.6.3)                        
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)                        
#>  Rcpp          1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3)                        
#>  remotes       2.1.1   2020-02-15 [1] CRAN (R 3.6.2)                        
#>  rlang         0.4.6   2020-05-02 [1] CRAN (R 3.6.3)                        
#>  rmarkdown     2.2     2020-05-31 [1] CRAN (R 3.6.3)                        
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)                        
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 3.6.3)                        
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)                        
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 3.6.2)                        
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.1)                        
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 3.6.3)                        
#>  tibble        3.0.1   2020-04-20 [1] CRAN (R 3.6.3)                        
#>  tidyr         1.1.0   2020-05-20 [1] CRAN (R 3.6.3)                        
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 3.6.3)                        
#>  tidytable     0.5.1   2020-05-29 [1] CRAN (R 3.6.3)                        
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 3.6.3)                        
#>  vctrs         0.3.0   2020-05-11 [1] CRAN (R 3.6.3)                        
#>  vipor         0.4.5   2017-03-22 [1] CRAN (R 3.6.3)                        
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 3.6.3)                        
#>  xfun          0.14    2020-05-20 [1] CRAN (R 3.6.3)                        
#>  xml2          1.3.2   2020-04-23 [1] CRAN (R 3.6.3)                        
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)                        
#> 
#> [1] C:/Users/usr/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.3/library
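The operation benchmarked in the reprex above numbers the rows within each id after sorting. A toy sketch, assuming only that data.table is installed, shows that the grouped seq_len(.N) formulation from the reprex and data.table's rowid() helper produce the same counter on sorted data. rowid() is a swapped-in alternative that nobody uses in this thread, and whether it is faster on the 1M-row data above is untested here.

```r
library(data.table)

# Toy data: three rows for id "b", two for id "a"
dt <- data.table(
  id   = c("b", "a", "b", "a", "b"),
  year = c(2001L, 1999L, 1999L, 2005L, 2003L)
)

# Sort by id, then year (in place, by reference)
setorder(dt, id, year)

# Formulation used in the reprex: running count within each id group
dt[, n_grouped := seq_len(.N), by = id]

# Alternative: rowid() gives the running count of each id value
# in the current row order, without a `by =` grouping step
dt[, n_rowid := rowid(id)]

print(dt)
```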
@markfairbanks
Owner

markfairbanks commented Jun 8, 2020

Can you install the dev version and let me know if you still have these issues? I actually found an issue that had been slowing down pretty much every function in tidytable since v0.5.0. It somehow snuck past my normal speed tests. I just finished fixing it a few days ago.

devtools::install_github("markfairbanks/tidytable")

Here were the times I got when I ran your example.

One note - I made the dataset a data.table for the tidytable and data.table timings:

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)

rows <- 1000000
ids  <- 50000

#simple data set with many different IDs and 1M rows, 3 cols
df <- tibble(id = as.character(sample(1:ids, size = rows, replace = TRUE)), # using character variable as ID
             bike = sample(c("mountain", "allround", "road", "bmx"), size = rows, replace = TRUE),
             year = sample(1980:2020, size = rows, replace = TRUE))

dt <- as.data.table(df)

results <- bench::mark(
  # first run with tidytable
  tidytable = dt %>%
    # sort by case id, time and item
    tidytable::arrange.(id, year, bike) %>%
    # calculate new item number variable, grouped by case id
    tidytable::mutate.(bike_number = as.integer(tidytable::row_number.()), by = id),
  # second run with dplyr
  dplyr = df %>%
    # sort by case id, time and item
    dplyr::arrange(id, year, bike) %>%
    # calculate new item number variable, grouped by case id
    dplyr::group_by(id) %>%
    dplyr::mutate(bike_number = as.integer(dplyr::row_number())) %>%
    dplyr::ungroup(),
  # third run with data.table
  data.table = data.table::copy(dt) %>%
    # data.table::as.data.table(.) %>%
    # sort by case id, time and item
    .[base::order(nchar(.[, id]), .[, id], .[, year], .[, bike], method = "radix")] %>%
    # calculate new item number variable, grouped by case id
    .[, bike_number := as.integer(seq_len(.N)), by = .[, id]] %>%
    .[],
  iterations = 3, filter_gc = FALSE, check = FALSE
)

ggplot2::autoplot(results)
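The opening post mentions struggling with data.table's modify-by-reference semantics, and the data.table branch starts from data.table::copy(). A minimal sketch on toy data of what copy() guards against: := changes a table in place, so a second name bound with plain <- sees the change, while a copy() does not. In the benchmark pipeline itself, the row subset .[order(...)] already returns a new table before := runs, so copy() there is likely a precaution rather than strictly required; that reading is an inference, not something stated in the thread.

```r
library(data.table)

# Two independent toy tables with the same contents
dt  <- data.table(id = c("a", "b"), year = c(2000L, 2001L))
dt2 <- data.table(id = c("a", "b"), year = c(2000L, 2001L))

# Plain assignment does not copy: `alias` refers to the same table as `dt`
alias <- dt
alias[, flag := TRUE]    # := adds the column by reference...
"flag" %in% names(dt)    # ...so dt gains it too: TRUE

# copy() makes a deep copy, so modifying it leaves the original untouched
independent <- copy(dt2)
independent[, flag := TRUE]
"flag" %in% names(dt2)   # FALSE
```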

@marianschmidt
Author

With the current dev version I can reproduce the much-improved results. I added some more scenarios to the performance tests. Do you consider releasing this improvement soon?

#performance test with various scenarios

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
set.seed(2048)

results_new <- bench::press(
  rows = c(100000, 1000000, 1e7),
  ids = c(1000, 10000, 100000),
  {
    df <- tibble(id = as.character(sample(1:ids, size = rows, replace = TRUE)), # using character variable as ID
                 bike = sample(c("mountain", "allround", "road", "bmx"), size = rows, replace = TRUE),
                 year = sample(1980:2020, size = rows, replace = TRUE))
    dt <- as.data.table(df)
    bench::mark(
      # first run with tidytable
      tidytable = dt %>%
        # sort by case id, time and item
        tidytable::arrange.(id, year, bike) %>%
        # calculate new item number variable, grouped by case id
        tidytable::mutate.(bike_number = as.integer(tidytable::row_number.()), by = id),
      # second run with dplyr
      dplyr = df %>%
        # sort by case id, time and item
        dplyr::arrange(id, year, bike) %>%
        # calculate new item number variable, grouped by case id
        dplyr::group_by(id) %>%
        dplyr::mutate(bike_number = as.integer(dplyr::row_number())) %>%
        dplyr::ungroup(),
      # third run with data.table
      data.table = data.table::copy(dt) %>%
        # sort by case id, time and item
        .[base::order(nchar(.[, id]), .[, id], .[, year], .[, bike], method = "radix")] %>%
        # calculate new item number variable, grouped by case id
        .[, bike_number := as.integer(seq_len(.N)), by = .[, id]] %>%
        .[],
      iterations = 3, filter_gc = FALSE, check = FALSE
    )
  }
)
#> Running with:
#>       rows    ids
#> 1   100000   1000
#> 2  1000000   1000
#> 3 10000000   1000
#> 4   100000  10000
#> 5  1000000  10000
#> 6 10000000  10000
#> 7   100000 100000
#> 8  1000000 100000
#> 9 10000000 100000

ggplot2::autoplot(results_new)

Created on 2020-06-09 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2020-06-09                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source                                  
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)                          
#>  backports     1.1.7   2020-05-13 [1] CRAN (R 3.6.3)                          
#>  beeswarm      0.2.3   2016-04-25 [1] CRAN (R 3.6.0)                          
#>  bench         1.1.1   2020-01-13 [1] CRAN (R 3.6.2)                          
#>  blob          1.2.1   2020-01-20 [1] CRAN (R 3.6.3)                          
#>  broom         0.5.6   2020-04-20 [1] CRAN (R 3.6.3)                          
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 3.6.3)                          
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.6.1)                          
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.3)                          
#>  colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.1)                          
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)                          
#>  curl          4.3     2019-12-02 [1] CRAN (R 3.6.1)                          
#>  data.table  * 1.12.9  2020-03-04 [1] Github (Rdatatable/data.table@b1b1832)  
#>  DBI           1.1.0   2019-12-15 [1] CRAN (R 3.6.1)                          
#>  dbplyr        1.4.4   2020-05-27 [1] CRAN (R 3.6.3)                          
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.1)                          
#>  devtools      2.3.0   2020-04-10 [1] CRAN (R 3.6.3)                          
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.2)                          
#>  dplyr       * 1.0.0   2020-05-29 [1] CRAN (R 3.6.3)                          
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.3)                          
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)                          
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.2)                          
#>  farver        2.0.3   2020-01-16 [1] CRAN (R 3.6.2)                          
#>  forcats     * 0.5.0   2020-03-01 [1] CRAN (R 3.6.3)                          
#>  fs            1.4.1   2020-04-04 [1] CRAN (R 3.6.3)                          
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 3.6.1)                          
#>  ggbeeswarm    0.6.0   2017-08-07 [1] CRAN (R 3.6.3)                          
#>  ggplot2     * 3.3.1   2020-05-28 [1] CRAN (R 3.6.3)                          
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 3.6.3)                          
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.1)                          
#>  haven         2.3.1   2020-06-01 [1] CRAN (R 3.6.3)                          
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.1)                          
#>  hms           0.5.3   2020-01-08 [1] CRAN (R 3.6.2)                          
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)                          
#>  httr          1.4.1   2019-08-05 [1] CRAN (R 3.6.1)                          
#>  jsonlite      1.6.1   2020-02-02 [1] CRAN (R 3.6.2)                          
#>  knitr         1.28    2020-02-06 [1] CRAN (R 3.6.2)                          
#>  lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.3)                          
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.3)                          
#>  lubridate     1.7.8   2020-04-06 [1] CRAN (R 3.6.3)                          
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)                          
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)                          
#>  mime          0.9     2020-02-04 [1] CRAN (R 3.6.2)                          
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 3.6.3)                          
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.1)                          
#>  nlme          3.1-144 2020-02-06 [2] CRAN (R 3.6.3)                          
#>  pillar        1.4.4   2020-05-05 [1] CRAN (R 3.6.3)                          
#>  pkgbuild      1.0.8   2020-05-07 [1] CRAN (R 3.6.3)                          
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)                          
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 3.6.3)                          
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)                          
#>  processx      3.4.2   2020-02-09 [1] CRAN (R 3.6.2)                          
#>  profmem       0.5.0   2018-01-30 [1] CRAN (R 3.6.2)                          
#>  ps            1.3.3   2020-05-08 [1] CRAN (R 3.6.3)                          
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 3.6.3)                          
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)                          
#>  Rcpp          1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3)                          
#>  readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.6.1)                          
#>  readxl        1.3.1   2019-03-13 [1] CRAN (R 3.6.1)                          
#>  remotes       2.1.1   2020-02-15 [1] CRAN (R 3.6.2)                          
#>  reprex        0.3.0   2019-05-16 [1] CRAN (R 3.6.1)                          
#>  rlang         0.4.6   2020-05-02 [1] CRAN (R 3.6.3)                          
#>  rmarkdown     2.2     2020-05-31 [1] CRAN (R 3.6.3)                          
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)                          
#>  rvest         0.3.5   2019-11-08 [1] CRAN (R 3.6.1)                          
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 3.6.3)                          
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)                          
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 3.6.2)                          
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 3.6.1)                          
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 3.6.3)                          
#>  tibble      * 3.0.1   2020-04-20 [1] CRAN (R 3.6.3)                          
#>  tidyr       * 1.1.0   2020-05-20 [1] CRAN (R 3.6.3)                          
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 3.6.3)                          
#>  tidytable   * 0.5.1.9 2020-06-09 [1] Github (markfairbanks/tidytable@c133581)
#>  tidyverse   * 1.3.0   2019-11-21 [1] CRAN (R 3.6.1)                          
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 3.6.3)                          
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 3.6.1)                          
#>  vctrs         0.3.1   2020-06-05 [1] CRAN (R 3.6.3)                          
#>  vipor         0.4.5   2017-03-22 [1] CRAN (R 3.6.3)                          
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 3.6.3)                          
#>  xfun          0.14    2020-05-20 [1] CRAN (R 3.6.3)                          
#>  xml2          1.3.2   2020-04-23 [1] CRAN (R 3.6.3)                          
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)                          
#> 
#> [1] C:/Users/usr/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.3/library

@markfairbanks
Owner

Awesome, good to hear that the performance issue is fixed.

> Do you consider releasing this improvement soon?

Yep - my goal is to submit to CRAN this weekend. I'll keep you updated and let you know when it's accepted.

@markfairbanks
Owner

markfairbanks commented Jun 15, 2020

@marianschmidt FYI the CRAN submission of v0.5.2 has been put on hold because CRAN changed its documentation requirements sometime in the past week or two. My initial submission a couple of days ago was rejected because of this change.

Once r-lib/roxygen2#1108 is fixed I'll submit to CRAN again. Quite a few packages are having this same problem, but as far as I can tell it should be fixed in the next day or two.

@marianschmidt
Author

@markfairbanks Thanks a lot for your efforts. Feel free to close this issue whenever convenient for you; I consider it closed.
BTW, I did a lot more testing and reimplemented my tidyverse functions in tidytable. The performance is really incredible, and since I had trouble correctly translating my own functions to data.table, your package is a great help.
During the process, it was quite hard to find adequate replacements for rlang's .data pronoun, because simply replacing dplyr::mutate() with tidytable::mutate.() didn't do it. I now find myself using rlang::eval_tidy() quite regularly, and I noticed that tidytable requires columns to be passed strictly as symbols in order to retrieve them from the data frame. Now everything I used to do in the tidyverse also works in tidytable. Thanks.
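A minimal sketch of the two tidy-evaluation styles discussed here, using only rlang on a plain data frame (no tidytable involved; the variable col is a hypothetical column name held in a string): converting the string to a symbol with rlang::sym() and unquoting it with !!, versus looking the column up through the .data pronoun, which eval_tidy() installs when it is given data.

```r
library(rlang)

df <- data.frame(x = c(1, 2, 3))
col <- "x"  # hypothetical column name stored as a string

# Style 1: turn the string into a symbol and unquote it into the expression
by_symbol <- eval_tidy(quo((!!sym(col)) * 2), data = df)

# Style 2: look the column up through the .data pronoun
by_pronoun <- eval_tidy(quo(.data[[col]] * 2), data = df)

identical(by_symbol, by_pronoun)  # TRUE; both are c(2, 4, 6)
```

The symbol style is the one that matches the observation above about passing columns as symbols; the pronoun style is what dplyr code typically relies on.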

@markfairbanks
Owner

markfairbanks commented Jun 19, 2020

@marianschmidt Glad the package is working out well!

As for the .data pronoun, you can use data.table's counterpart, .SD.

And if you want to make clear that you are using a variable from the global environment, you can unquote it with !!.

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)

test_df <- data.table(x = c(1, 1, 1))

x <- 5

# Using tidytable
test_df %>%
  mutate.(data_x_plus_global_x = .SD$x + !!x)
#>    x data_x_plus_global_x
#> 1: 1                    6
#> 2: 1                    6
#> 3: 1                    6

# Using the tidyverse (version 1)
test_df %>%
  mutate(data_x_plus_global_x = .data$x + !!x)
#>    x data_x_plus_global_x
#> 1: 1                    6
#> 2: 1                    6
#> 3: 1                    6

# Using the tidyverse (version 2)
test_df %>%
  mutate(data_x_plus_global_x = .data$x + .env$x)
#>    x data_x_plus_global_x
#> 1: 1                    6
#> 2: 1                    6
#> 3: 1                    6

@markfairbanks
Owner

markfairbanks commented Jun 26, 2020

@marianschmidt tidytable v0.5.2 is now up on CRAN!

FYI there is a small API change - the by argument has been renamed to .by. Using by triggers a deprecation warning, but it will keep working for a couple of months or so.

library(tidytable, warn.conflicts = FALSE)

test_df <- data.table(x = 1:3, y = c("a", "a", "b"))

# Using `by` causes a warning
test_df %>%
  summarize.(avg_x = mean(x), by = y)
#> Warning: The `by` argument of `summarize.()` is deprecated as of tidytable 0.5.2.
#> Please use the `.by` argument instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>    y avg_x
#> 1: a   1.5
#> 2: b   3.0

# Using `.by` works normally
test_df %>%
  summarize.(avg_x = mean(x), .by = y)
#>    y avg_x
#> 1: a   1.5
#> 2: b   3.0
