Performance regression when rbinding lots of data frames that have df-cols. I'm sure this has to do with making extra copies, but I'm not sure where yet. I'll take a look.
This is with dev vctrs.
R 3.6
library(vctrs)

df_col <- new_data_frame(list(x = 1:2))
df <- new_data_frame(list(y = df_col))

x <- rep_len(list(df), 10000)
y <- rep_len(list(df_col), 10000)

lst_rbind <- function(x) {
  vec_rbind(!!!x)
}

bench::mark(lst_rbind(x))
#> # A tibble: 1 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 lst_rbind(x)   37.5ms   37.9ms      26.2     277KB     52.4

bench::mark(lst_rbind(y))
#> # A tibble: 1 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 lst_rbind(y)   16.5ms   18.2ms      55.1     274KB     16.5
R 4.0
library(vctrs)

df_col <- new_data_frame(list(x = 1:2))
df <- new_data_frame(list(y = df_col))

x <- rep_len(list(df), 10000)
y <- rep_len(list(df_col), 10000)

lst_rbind <- function(x) {
  vec_rbind(!!!x)
}

bench::mark(lst_rbind(x))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 lst_rbind(x)    316ms    352ms      2.84     764MB     48.2

bench::mark(lst_rbind(y))
#> # A tibble: 1 x 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 lst_rbind(y)   15.8ms   17.6ms      55.6     274KB     23.4
It seems that when the output df-col is restored with vec_restore() before being assigned into out, the refcnt on each individual column of the df-col is somehow incremented from 1 to 2, so the columns of the output df-col are needlessly copied at the next assignment iteration.
This is a problem, because this is a df-col that we are restoring, meaning that it has already been set inside a data frame and the df-col itself has a refcnt of 1 already. So it is referenced, and a shallow duplication does happen here. This triggers a refcnt bump of all of the columns of that df-col, which bumps them from 1 up to 2.
We may need to pass the ownership parameter down to vec_restore().
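The refcount effect described above can be sketched in base R. This is only an illustration under assumptions, not what vctrs does internally (that work happens in C): plain lists stand in for the df-col, and the attribute name "restored" is made up for the example. Setting an attribute on a list that has a second reference makes R shallow-duplicate the list, so the column ends up shared by two lists and has to be copied on the next write.

```r
# A plain list stands in for the df-col (assumption: illustration only).
df_col <- list(x = c(1L, 2L))

# Report any copy of the column, if this R build supports memory tracing.
if (capabilities("profmem")) invisible(tracemem(df_col$x))

# A second reference to the same list; nothing is copied yet.
restored <- df_col

# Setting an attribute on a shared list shallow-duplicates it:
# a new list is allocated, but it points at the same column vector.
attr(restored, "restored") <- TRUE  # hypothetical attribute name

# The column is now reachable from two lists, so R copies it before
# modifying it, and the original column is left untouched.
restored$x[1L] <- 10L

df_col$x
restored$x
```

In the rbind loop from the report, the same kind of extra column copy would occur on every assignment iteration, which is consistent with the jump in mem_alloc on R 4.0.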