Creating and readd-ing ggplot plots is very slow
#1258
This is an instance of #882. Unfortunately, objects like …
library(drake)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
library(webshot)
density_plot <- function(df, xvar, title, xlab, grouping, file) {
x_quo <- enquo(xvar)
group_quo <- enquo(grouping)
out <- ggplot(df, aes(x = !!x_quo, color = !!group_quo, fill = !!group_quo)) +
geom_density(alpha = 0.7) +
scale_colour_viridis_d() +
scale_fill_viridis_d() +
labs(
title = title,
x = xlab,
subtitle = paste("Calculated on:", Sys.time())
)
ggsave(file, out)
invisible()
}
my_plan <- drake_plan(
df = data.frame(
group = c(rep("good", 1e5), rep("bad", 1e5)),
values = rnorm(2e5),
replicate(200, sample(0:1, 1000, replace = TRUE))
),
my_plot = density_plot(
df, values,
"My plot",
"values",
group,
file_out("my_plot.png")
)
)
make(my_plan)
#> ▶ target df
#> ▶ target my_plot
#> Saving 7 x 5 in image
# Runs more quickly.
build_times()
#> # A tibble: 21 x 4
#> target elapsed user system
#> <chr> <Duration> <Duration> <Duration>
#> 1 coef_regression1_large 0.009s 0.005s 0.002s
#> 2 coef_regression1_small 0.012s 0.004s 0.002s
#> 3 coef_regression2_large 0.014s 0.003s 0.003s
#> 4 coef_regression2_small 0.02s 0.004s 0.001s
#> 5 df 7.334s 7.062s 0.223s
#> 6 large 0.012s 0.007s 0.002s
#> 7 my_plot 0.794s 0.62s 0.093s
#> 8 regression1_large 0.012s 0.006s 0.003s
#> 9 regression1_small 0.03s 0.006s 0.002s
#> 10 regression2_large 0.009s 0.004s 0.002s
#> # … with 11 more rows
# We do not return a value.
readd(my_plot)
#> NULL
# But we still get a plot.
webshot("my_plot.png")
Created on 2020-05-18 by the reprex package (v0.3.0)
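The workaround in the reprex writes the PNG inside the target and returns nothing, so the plot object itself never has to be stored. One plausible reason this helps, sketched here under assumptions: a ggplot object keeps its own copy of the data it was built from, so anything that serializes the plot (as a cache presumably must in some form to store and hash a target) pays for the whole data frame again. The data below are made up and only loosely mirror the reprex; `data_bytes` and `plot_bytes` are illustrative names, not part of drake or ggplot2.

```r
# Sketch with made-up data (not the reporter's): compare the serialized
# size of a data frame with that of a ggplot object built from it.
library(ggplot2)

set.seed(1)
df <- data.frame(
  group = rep(c("good", "bad"), each = 1e5),
  values = rnorm(2e5)
)

# Same aesthetics as density_plot() in the reprex, without the tidy eval.
p <- ggplot(df, aes(x = values, color = group, fill = group)) +
  geom_density(alpha = 0.7)

# serialize(x, NULL) returns a raw vector; its length is the byte count.
data_bytes <- length(serialize(df, NULL))
plot_bytes <- length(serialize(p, NULL))

# The plot embeds the data frame, so it serializes to more bytes than the data.
cat("data frame:", data_bytes, "bytes\n")
cat("ggplot object:", plot_bytes, "bytes\n")
```

With 200 extra columns, as in the original report, the embedded copy grows accordingly, which is consistent with saving the PNG and returning `invisible()` instead of the plot.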
Thanks! For future reference:
Prework
… drake's code of conduct.
… remotes::install_github("ropensci/drake")) and mention the SHA-1 hash of the Git commit you install.
Description
Saving and readd-ing ggplot plots is very slow (~7 min vs. 1 s when running directly from the console) when the underlying data.frame has 2e5 rows and 200 columns. I'm using r_make() to run the plan.
Reproducible example
The density_plot function below is exactly what I'm using in my code. Here is my actual plan: https://gist.github.com/jcpsantiago/e119a53199379a438c14e1c33f651b93
The reprex slows down too, but it still finishes in under a minute, so something is missing from it.
The last step, rendering the report, is especially slow. The whole plan in the gist above took ~2.5 h, while running the same code in a notebook takes around 15–20 min.
Created on 2020-05-18 by the reprex package (v0.3.0)
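To check whether storing the plot, rather than drawing it, is what takes minutes, one could time a save/read round trip of the ggplot object alone. This is a minimal sketch under assumptions: base R's `saveRDS()`/`readRDS()` stand in for drake's cache (which stores targets its own way), and the random data are much smaller than the 2e5 × 200 data frame described above. `n_rows`, `n_cols`, and `wide` are hypothetical names; raise the sizes to approach the reported case.

```r
# Sketch: time a serialization round trip of a ggplot object on its own.
# saveRDS()/readRDS() stand in for a cache; drake's own storage differs.
library(ggplot2)

set.seed(1)
n_rows <- 2e4  # illustrative; the report describes 2e5 rows
n_cols <- 20   # illustrative; the report describes 200 columns

# Wide 0/1 columns like the reprex's replicate() call, plus the plotted ones.
wide <- as.data.frame(
  matrix(sample(0:1, n_rows * n_cols, replace = TRUE), ncol = n_cols)
)
wide$values <- rnorm(n_rows)
wide$group <- rep(c("good", "bad"), length.out = n_rows)

p <- ggplot(wide, aes(x = values, color = group, fill = group)) +
  geom_density(alpha = 0.7)

# Elapsed wall-clock seconds for writing and reading the plot object.
path <- tempfile(fileext = ".rds")
save_time <- system.time(saveRDS(p, path))[["elapsed"]]
read_time <- system.time(p_back <- readRDS(path))[["elapsed"]]

cat("saveRDS:", save_time, "s; readRDS:", read_time, "s\n")
cat("file size:", file.size(path), "bytes\n")
```

If these times grow sharply with `n_cols`, storage of the embedded data is the likely bottleneck rather than `geom_density()` itself.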
Session info
Benchmarks
This is the result of pprof(the_plan) with only one plot out of date.