slow performance of mutate.() with by grouping for many groups #82

marianschmidt · 2020-06-08T21:32:39Z

As somebody who likes the tidyverse syntax and needs data.table's performance, but struggles with its modify-by-reference semantics, I was very happy to find tidytable. Thanks for this great package!
I work with large datasets (1-10M rows, 50-500 columns) that often require mutating grouped data.
In this scenario, however, I found tidytable::mutate.() to be much slower than the data.table equivalent, and still considerably slower than the dplyr alternative.
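For reference, the operation under test (sort by id, year and bike, then number the rows within each id) can be sketched in base R on a toy input. This sketch is not from the original report; the data values are made up, and it is only meant as a small correctness check for what the three benchmarked variants should compute.

```r
# Toy input (made-up values); ids are character, as in the benchmark
df <- data.frame(
  id   = c("2", "10", "2", "10", "2"),
  year = c(2001, 1999, 1990, 2005, 1995),
  bike = c("road", "bmx", "road", "road", "bmx"),
  stringsAsFactors = FALSE
)

# Sort by id, year and bike; method = "radix" compares strings in the
# C locale, so "10" sorts before "2"
df <- df[order(df$id, df$year, df$bike, method = "radix"), ]

# ave() applies seq_along() within each id group and writes the results
# back to the original row positions: a within-group row number
df$bike_number <- ave(seq_len(nrow(df)), df$id, FUN = seq_along)
print(df)
```

Because the numbering depends on row order, the sort has to happen before the grouped step, exactly as in the pipelines of the benchmark.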

library(magrittr)
library(data.table)

rows <- 1000000
ids  <- 50000

#simple data set with many different IDs and 1M rows, 3 cols
df <- data.frame(id = as.character(sample(1:ids, size = rows, replace = TRUE)), #using character variable as ID
                 bike = sample(c("mountain", "allround", "road", "bmx"), size = rows, replace = TRUE),
                 year = sample(1980:2020, size = rows, replace = TRUE),
                 stringsAsFactors = FALSE)

results <- bench::mark(
  #first run with tidytable
  tidytable = df %>%
    #sort by case id, time and item
    tidytable::arrange.(id, year, bike)%>%
    #calculate new item number variable #group by case id
    tidytable::mutate.(bike_number = as.integer(tidytable::row_number.()), by = id),
  #second run with dplyr
  dplyr = df %>%
    #sort by case id, time and item
    dplyr::arrange(id, year, bike)%>%
    #calculate new item number variable #group by case id
    dplyr::group_by(id) %>%
    dplyr::mutate(bike_number = as.integer(dplyr::row_number())) %>%
    dplyr::ungroup(),
  #third run with data.table
  data.table = data.table::copy(df) %>%
    data.table::as.data.table(.) %>%
    #sort by case id, time and item
    .[base::order(nchar(.[, id]), .[, id], .[, year], .[, bike], method = "radix")] %>%
    #calculate new item number variable #group by case id
    .[, bike_number := as.integer(seq_len(.N)), by=.[, id]] %>%
    .[],
  iterations = 3, filter_gc = FALSE, check = FALSE
)

ggplot2::autoplot(results)
#> Loading required namespace: tidyr

Created on 2020-06-08 by the reprex package (v0.3.0)
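The data.table variant orders by nchar(id) before id. For character IDs that look like non-negative integers, this reproduces numeric order, whereas a plain sort on characters is lexical. The rows therefore come out in a different order than arrange.()/arrange() produce, which is one reason the three results cannot be compared and check = FALSE is set. A small base R illustration with made-up IDs:

```r
# Character ids that look like integers (made-up values)
ids <- c("10", "2", "1", "100")

# A plain radix sort on characters is lexical (C locale)
lexical <- ids[order(ids, method = "radix")]

# Sorting by string length first, then by the string, gives numeric order
# for non-negative integer ids without leading zeros
by_length <- ids[order(nchar(ids), ids, method = "radix")]

print(lexical)    # "1" "10" "100" "2"
print(by_length)  # "1" "2" "10" "100"
```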

Session info

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2020-06-08                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source                                
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)                        
#>  backports     1.1.7   2020-05-13 [1] CRAN (R 3.6.3)                        
#>  beeswarm      0.2.3   2016-04-25 [1] CRAN (R 3.6.0)                        
#>  bench         1.1.1   2020-01-13 [1] CRAN (R 3.6.2)                        
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 3.6.3)                        
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.3)                        
#>  colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.1)                        
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)                        
#>  curl          4.3     2019-12-02 [1] CRAN (R 3.6.1)                        
#>  data.table  * 1.12.9  2020-03-04 [1] Github (Rdatatable/data.table@b1b1832)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.1)                        
#>  devtools      2.3.0   2020-04-10 [1] CRAN (R 3.6.3)                        
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.2)                        
#>  dplyr         1.0.0   2020-05-29 [1] CRAN (R 3.6.3)                        
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.3)                        
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)                        
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.2)                        
#>  farver        2.0.3   2020-01-16 [1] CRAN (R 3.6.2)                        
#>  fs            1.4.1   2020-04-04 [1] CRAN (R 3.6.3)                        
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 3.6.1)                        
#>  ggbeeswarm    0.6.0   2017-08-07 [1] CRAN (R 3.6.3)                        
#>  ggplot2       3.3.1   2020-05-28 [1] CRAN (R 3.6.3)                        
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 3.6.3)                        
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.1)                        
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.1)                        
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)                        
#>  httr          1.4.1   2019-08-05 [1] CRAN (R 3.6.1)                        
#>  knitr         1.28    2020-02-06 [1] CRAN (R 3.6.2)                        
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.3)                        
#>  magrittr    * 1.5     2014-11-22 [1] CRAN (R 3.6.1)                        
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)                        
#>  mime          0.9     2020-02-04 [1] CRAN (R 3.6.2)                        
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.1)                        
#>  pillar        1.4.4   2020-05-05 [1] CRAN (R 3.6.3)                        
#>  pkgbuild      1.0.8   2020-05-07 [1] CRAN (R 3.6.3)                        
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)                        
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 3.6.3)                        
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)                        
#>  processx      3.4.2   2020-02-09 [1] CRAN (R 3.6.2)                        
#>  profmem       0.5.0   2018-01-30 [1] CRAN (R 3.6.2)                        
#>  ps            1.3.3   2020-05-08 [1] CRAN (R 3.6.3)                        
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 3.6.3)                        
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)                        
#>  Rcpp          1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3)                        
#>  remotes       2.1.1   2020-02-15 [1] CRAN (R 3.6.2)                        
#>  rlang         0.4.6   2020-05-02 [1] CRAN (R 3.6.3)                        
#>  rmarkdown     2.2     2020-05-31 [1] CRAN (R 3.6.3)                        
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)                        
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 3.6.3)                        
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)                        
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 3.6.2)                        
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.1)                        
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 3.6.3)                        
#>  tibble        3.0.1   2020-04-20 [1] CRAN (R 3.6.3)                        
#>  tidyr         1.1.0   2020-05-20 [1] CRAN (R 3.6.3)                        
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 3.6.3)                        
#>  tidytable     0.5.1   2020-05-29 [1] CRAN (R 3.6.3)                        
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 3.6.3)                        
#>  vctrs         0.3.0   2020-05-11 [1] CRAN (R 3.6.3)                        
#>  vipor         0.4.5   2017-03-22 [1] CRAN (R 3.6.3)                        
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 3.6.3)                        
#>  xfun          0.14    2020-05-20 [1] CRAN (R 3.6.3)                        
#>  xml2          1.3.2   2020-04-23 [1] CRAN (R 3.6.3)                        
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)                        
#> 
#> [1] C:/Users/usr/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.3/library


markfairbanks · 2020-06-08T21:56:25Z

Can you install the dev version and let me know if you still see these issues? I found a problem, present since v0.5.0, that was slowing down pretty much every function in tidytable. It somehow slipped past my normal speed tests; I finished fixing it a few days ago.

devtools::install_github("markfairbanks/tidytable")

Here are the timings I got when I ran your example.

One note: I converted the dataset to a data.table before timing the tidytable and data.table versions, so the conversion is not part of those timings:

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)

rows <- 1000000
ids  <- 50000

#simple data set with many different IDs and 1M rows, 3 cols
# tibble() never converts strings to factors; a stringsAsFactors argument would become an extra column
df <- tibble(id = as.character(sample(1:ids, size = rows, replace = TRUE)), #using character variable as ID
             bike = sample(c("mountain", "allround", "road", "bmx"), size = rows, replace = TRUE),
             year = sample(1980:2020, size = rows, replace = TRUE))

dt <- as.data.table(df)

results <- bench::mark(
  #first run with tidytable
  tidytable = dt %>%
    #sort by case id, time and item
    tidytable::arrange.(id, year, bike)%>%
    #calculate new item number variable #group by case id
    tidytable::mutate.(bike_number = as.integer(tidytable::row_number.()), by = id),
  #second run with dplyr
  dplyr = df %>%
    #sort by case id, time and item
    dplyr::arrange(id, year, bike)%>%
    #calculate new item number variable #group by case id
    dplyr::group_by(id) %>%
    dplyr::mutate(bike_number = as.integer(dplyr::row_number())) %>%
    dplyr::ungroup(),
  #third run with data.table
  data.table = data.table::copy(dt) %>%
    # no as.data.table() step needed: dt is already a data.table
    #sort by case id, time and item
    .[base::order(nchar(.[, id]), .[, id], .[, year], .[, bike], method = "radix")] %>%
    #calculate new item number variable #group by case id
    .[, bike_number := as.integer(seq_len(.N)), by=.[, id]] %>%
    .[],
  iterations = 3, filter_gc = FALSE, check = FALSE
)

ggplot2::autoplot(results)

marianschmidt · 2020-06-09T07:52:49Z

With the current dev version I can reproduce the much-improved results. I added some more scenarios to the performance test. Do you consider releasing this improvement soon?

#performance test with various scenarios

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
set.seed(2048)

results_new <- bench::press(
  rows = c(100000, 1000000, 1e7),
  ids = c(1000, 10000, 100000),
  {
    df <- tibble(id = as.character(sample(1:ids, size = rows, replace = TRUE)), #using character variable as ID
                 bike = sample(c("mountain", "allround", "road", "bmx"), size = rows, replace = TRUE),
                 year = sample(1980:2020, size = rows, replace = TRUE))
    dt <- as.data.table(df)
    bench::mark(
      #first run with tidytable
      tidytable = dt %>%
        #sort by case id, time and item
        tidytable::arrange.(id, year, bike) %>%
        #calculate new item number variable #group by case id
        tidytable::mutate.(bike_number = as.integer(tidytable::row_number.()), by = id),
      #second run with dplyr
      dplyr = df %>%
        #sort by case id, time and item
        dplyr::arrange(id, year, bike) %>%
        #calculate new item number variable #group by case id
        dplyr::group_by(id) %>%
        dplyr::mutate(bike_number = as.integer(dplyr::row_number())) %>%
        dplyr::ungroup(),
      #third run with data.table
      data.table = data.table::copy(dt) %>%
        #sort by case id, time and item
        .[base::order(nchar(.[, id]), .[, id], .[, year], .[, bike], method = "radix")] %>%
        #calculate new item number variable #group by case id
        .[, bike_number := as.integer(seq_len(.N)), by = .[, id]] %>%
        .[],
      iterations = 3, filter_gc = FALSE, check = FALSE
    )
  }
)
#> Running with:
#>       rows    ids
#> 1   100000   1000
#> 2  1000000   1000
#> 3 10000000   1000
#> 4   100000  10000
#> 5  1000000  10000
#> 6 10000000  10000
#> 7   100000 100000
#> 8  1000000 100000
#> 9 10000000 100000

ggplot2::autoplot(results_new)

Created on 2020-06-09 by the reprex package (v0.3.0)
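bench::press() runs the benchmark once for each combination of its named arguments. Base R's expand.grid() builds the same nine combinations, in the same order as the "Running with" listing above, and dividing rows by ids gives the average group size, which is what "many groups" in the issue title refers to. A sketch for reading the grid, not part of the original test:

```r
# Same parameter grid as the bench::press() call; the first column varies fastest
grid <- expand.grid(rows = c(1e5, 1e6, 1e7), ids = c(1e3, 1e4, 1e5))

# Average rows per id, i.e. the average group size
grid$rows_per_id <- grid$rows / grid$ids
print(grid)
```

The combinations with only 1 or 10 rows per id are the ones where per-group overhead, rather than the row count, is most likely to dominate.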

Session info

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  ctype    German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2020-06-09                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source                                  
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)                          
#>  backports     1.1.7   2020-05-13 [1] CRAN (R 3.6.3)                          
#>  beeswarm      0.2.3   2016-04-25 [1] CRAN (R 3.6.0)                          
#>  bench         1.1.1   2020-01-13 [1] CRAN (R 3.6.2)                          
#>  blob          1.2.1   2020-01-20 [1] CRAN (R 3.6.3)                          
#>  broom         0.5.6   2020-04-20 [1] CRAN (R 3.6.3)                          
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 3.6.3)                          
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.6.1)                          
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.3)                          
#>  colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.1)                          
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)                          
#>  curl          4.3     2019-12-02 [1] CRAN (R 3.6.1)                          
#>  data.table  * 1.12.9  2020-03-04 [1] Github (Rdatatable/data.table@b1b1832)  
#>  DBI           1.1.0   2019-12-15 [1] CRAN (R 3.6.1)                          
#>  dbplyr        1.4.4   2020-05-27 [1] CRAN (R 3.6.3)                          
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.1)                          
#>  devtools      2.3.0   2020-04-10 [1] CRAN (R 3.6.3)                          
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.2)                          
#>  dplyr       * 1.0.0   2020-05-29 [1] CRAN (R 3.6.3)                          
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 3.6.3)                          
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)                          
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.2)                          
#>  farver        2.0.3   2020-01-16 [1] CRAN (R 3.6.2)                          
#>  forcats     * 0.5.0   2020-03-01 [1] CRAN (R 3.6.3)                          
#>  fs            1.4.1   2020-04-04 [1] CRAN (R 3.6.3)                          
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 3.6.1)                          
#>  ggbeeswarm    0.6.0   2017-08-07 [1] CRAN (R 3.6.3)                          
#>  ggplot2     * 3.3.1   2020-05-28 [1] CRAN (R 3.6.3)                          
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 3.6.3)                          
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.1)                          
#>  haven         2.3.1   2020-06-01 [1] CRAN (R 3.6.3)                          
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.1)                          
#>  hms           0.5.3   2020-01-08 [1] CRAN (R 3.6.2)                          
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)                          
#>  httr          1.4.1   2019-08-05 [1] CRAN (R 3.6.1)                          
#>  jsonlite      1.6.1   2020-02-02 [1] CRAN (R 3.6.2)                          
#>  knitr         1.28    2020-02-06 [1] CRAN (R 3.6.2)                          
#>  lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.3)                          
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.3)                          
#>  lubridate     1.7.8   2020-04-06 [1] CRAN (R 3.6.3)                          
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)                          
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)                          
#>  mime          0.9     2020-02-04 [1] CRAN (R 3.6.2)                          
#>  modelr        0.1.8   2020-05-19 [1] CRAN (R 3.6.3)                          
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.1)                          
#>  nlme          3.1-144 2020-02-06 [2] CRAN (R 3.6.3)                          
#>  pillar        1.4.4   2020-05-05 [1] CRAN (R 3.6.3)                          
#>  pkgbuild      1.0.8   2020-05-07 [1] CRAN (R 3.6.3)                          
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)                          
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 3.6.3)                          
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.2)                          
#>  processx      3.4.2   2020-02-09 [1] CRAN (R 3.6.2)                          
#>  profmem       0.5.0   2018-01-30 [1] CRAN (R 3.6.2)                          
#>  ps            1.3.3   2020-05-08 [1] CRAN (R 3.6.3)                          
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 3.6.3)                          
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)                          
#>  Rcpp          1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3)                          
#>  readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.6.1)                          
#>  readxl        1.3.1   2019-03-13 [1] CRAN (R 3.6.1)                          
#>  remotes       2.1.1   2020-02-15 [1] CRAN (R 3.6.2)                          
#>  reprex        0.3.0   2019-05-16 [1] CRAN (R 3.6.1)                          
#>  rlang         0.4.6   2020-05-02 [1] CRAN (R 3.6.3)                          
#>  rmarkdown     2.2     2020-05-31 [1] CRAN (R 3.6.3)                          
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)                          
#>  rvest         0.3.5   2019-11-08 [1] CRAN (R 3.6.1)                          
#>  scales        1.1.1   2020-05-11 [1] CRAN (R 3.6.3)                          
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)                          
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 3.6.2)                          
#>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 3.6.1)                          
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 3.6.3)                          
#>  tibble      * 3.0.1   2020-04-20 [1] CRAN (R 3.6.3)                          
#>  tidyr       * 1.1.0   2020-05-20 [1] CRAN (R 3.6.3)                          
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 3.6.3)                          
#>  tidytable   * 0.5.1.9 2020-06-09 [1] Github (markfairbanks/tidytable@c133581)
#>  tidyverse   * 1.3.0   2019-11-21 [1] CRAN (R 3.6.1)                          
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 3.6.3)                          
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 3.6.1)                          
#>  vctrs         0.3.1   2020-06-05 [1] CRAN (R 3.6.3)                          
#>  vipor         0.4.5   2017-03-22 [1] CRAN (R 3.6.3)                          
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 3.6.3)                          
#>  xfun          0.14    2020-05-20 [1] CRAN (R 3.6.3)                          
#>  xml2          1.3.2   2020-04-23 [1] CRAN (R 3.6.3)                          
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)                          
#> 
#> [1] C:/Users/usr/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.3/library

markfairbanks · 2020-06-09T16:13:23Z

Awesome, good to hear that the performance issue is fixed.

Do you consider releasing this improvement soon?

Yep - my goal is to submit to CRAN this weekend. I'll keep you updated and let you know when it's accepted.

markfairbanks · 2020-06-15T03:51:53Z

@marianschmidt FYI, the CRAN submission of v0.5.2 has been put on hold because CRAN changed its documentation requirements sometime in the past week or two. My initial submission a couple of days ago was rejected because of this change.

Once r-lib/roxygen2#1108 is fixed, I'll submit to CRAN again. Quite a few packages are having the same problem, but as far as I can tell it should be fixed in the next day or two.

marianschmidt · 2020-06-19T14:27:05Z

@markfairbanks Thanks a lot for your efforts. Feel free to close this issue whenever it's convenient for you; I consider it resolved.
By the way, I did a lot more testing and reimplemented my tidyverse functions in tidytable. The performance is really incredible, and since I had trouble correctly translating my own functions to data.table, your package is a great help.
During the process, it was quite hard to find adequate replacements for rlang's .data pronoun, because simply replacing dplyr::mutate() with tidytable::mutate.() didn't work. I now find myself using rlang::eval_tidy() quite regularly, and I noticed that tidytable strictly requires columns to be passed as symbols in order to retrieve them from the data frame. Now everything I used to do in the tidyverse also works in tidytable. Thanks.
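rlang::eval_tidy() and the .data pronoun rest on data masking: names are looked up in the data before the surrounding environment. Base R's eval() with a data frame as its environment follows the same basic lookup rule, which shows why a pronoun is needed when a column and a variable share a name. This is a base R sketch with made-up values, not how rlang or tidytable implement it:

```r
df <- data.frame(x = c(1, 1, 1))
x <- 5
z <- 10

# eval() with a data frame as `envir` looks names up in the columns first and
# falls back to `enclos` (here the current environment) for everything else
masked <- eval(quote(x + z), envir = df, enclos = environment())
print(masked)  # 11 11 11: x is the column (1), z is the variable (10)

# The variable x = 5 cannot be reached by name here; the column shadows it
shadowed <- eval(quote(x), envir = df, enclos = environment())
print(shadowed)  # 1 1 1
```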

markfairbanks · 2020-06-19T16:21:00Z

@marianschmidt Glad the package is working out well!

As for the .data pronoun, you can use data.table's equivalent, .SD.

And if you want to make clear that a variable comes from the global environment, you can unquote it with !!.

library(tidytable, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)

test_df <- data.table(x = c(1, 1, 1))

x <- 5

# Using tidytable
test_df %>%
  mutate.(data_x_plus_global_x = .SD$x + !!x)
#>    x data_x_plus_global_x
#> 1: 1                    6
#> 2: 1                    6
#> 3: 1                    6

# Using the tidyverse (version 1)
test_df %>%
  mutate(data_x_plus_global_x = .data$x + !!x)
#>    x data_x_plus_global_x
#> 1: 1                    6
#> 2: 1                    6
#> 3: 1                    6

# Using the tidyverse (version 2)
test_df %>%
  mutate(data_x_plus_global_x = .data$x + .env$x)
#>    x data_x_plus_global_x
#> 1: 1                    6
#> 2: 1                    6
#> 3: 1                    6
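The !! operator inserts a value into the expression before the data frame is consulted, so the column and the global variable no longer compete for the same name. Base R's bquote() offers a comparable unquote step with .(); here is a sketch of the same "column x plus global x" computation without rlang (an analogy, not what tidytable does internally):

```r
df <- data.frame(x = c(1, 1, 1))
x <- 5

# .() inside bquote() inserts the current value of x into the expression,
# so only the remaining bare x is looked up in the data
expr <- bquote(x + .(x))
print(expr)  # x + 5

result <- eval(expr, envir = df, enclos = environment())
print(result)  # 6 6 6
```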

markfairbanks · 2020-06-26T23:51:57Z

@marianschmidt tidytable v0.5.2 is now up on CRAN!

FYI, there is a small API change: the by argument has been renamed to .by. Using by triggers a deprecation warning, but it will keep working for a couple of months or so.

library(tidytable, warn.conflicts = FALSE)

test_df <- data.table(x = 1:3, y = c("a", "a", "b"))

# Using `by` causes a warning
test_df %>%
  summarize.(avg_x = mean(x), by = y)
#> Warning: The `by` argument of `summarize.()` is deprecated as of tidytable 0.5.2.
#> Please use the `.by` argument instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.
#>    y avg_x
#> 1: a   1.5
#> 2: b   3.0

# Using `.by` works normally
test_df %>%
  summarize.(avg_x = mean(x), .by = y)
#>    y avg_x
#> 1: a   1.5
#> 2: b   3.0
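The thread does not say why the argument was renamed. A common motivation in tidyverse-style APIs is that arguments sitting next to `...` get a dot prefix so they cannot collide with names the user passes through `...`. The two helpers below are hypothetical (not tidytable functions) and only illustrate that collision in base R:

```r
# Hypothetical helpers: each returns the names of the arguments that reach `...`
new_cols_plain  <- function(.df, ..., by = NULL)  names(list(...))
new_cols_dotted <- function(.df, ..., .by = NULL) names(list(...))

# With a plain `by` argument, a new column the user wants to call "by"
# is captured by the argument instead of reaching `...`
plain  <- new_cols_plain(NULL, by = 1)
dotted <- new_cols_dotted(NULL, by = 1)

print(plain)   # NULL
print(dotted)  # "by"
```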

markfairbanks closed this as completed Jun 19, 2020

markfairbanks mentioned this issue Jul 6, 2020

calling str_replace_all inside mutate. throws 'argument "replacement" is missing' error in the latest build #91 (closed)
