Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak #91

Closed
seabbs opened this issue May 13, 2020 · 3 comments · Fixed by #93
Closed

Memory leak #91

seabbs opened this issue May 13, 2020 · 3 comments · Fixed by #93
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@seabbs
Copy link
Contributor

seabbs commented May 13, 2020

When running over many regions there is a memory leak that causes an out of RAM error and ultimately leading to estimation failing.

There is a gist here that provides a reprex.

Via turning areas in the code off I believe I have localized the issue to here:

EpiNow/R/estimate_R0.R

Lines 190 to 239 in 0d69358

## Forecast Rts using the mean estimate
rt_forecasts <-
data.table::setDT(
safe_forecast(rts = est_r[sample == 1, .(date, rt = mean_R)],
model = forecast_model,
horizon = horizon,
samples = rt_samples)[[1]]
)
##Forecast the variance using the same model structure
sd_forecasts <-
data.table::setDT(
safe_forecast(rts = est_r[sample == 1, .(date, rt = sd_R)],
model = forecast_model,
horizon = horizon,
samples = rt_samples)[[1]]
)[, sd_rt := rt][,rt := NULL]
## Join mean and sd forecasts
rt_forecasts <- rt_forecasts[sd_forecasts, on = c("date", "sample", "horizon")]
## Add gamma noise and drop mean and sd forecasts
## Assume that the minumum allowed gamma noise is 1e-4
## Zero sd will result in rgamma draws that are also 0 and so must be avoided
rt_forecasts <- rt_forecasts[sd_rt <= 0, sd_rt := 1e-4][,
rt := purrr::map2_dbl(rt, sd_rt, ~ mean_rgamma(1, mean = .x, sd = .y))][,
.(sample, date, horizon, rt)]
## Forecast cases
case_forecasts <-
data.table::setDT(
EpiSoon::forecast_cases(
cases = summed_cases,
fit_samples = rt_forecasts,
rdist = rpois,
serial_interval = generation_times[, index]
)
)[,rt_type := "forecast"]
## Join Rt estimates and forecasts
est_r <- est_r[, `:=`(R = sample_R, rt_type = "nowcast")][,
`:=`(mean_R = NULL, sd_R = NULL, sample_R = NULL)][,
.(date, R, sample, crps, window, rt_type)]
est_r <- data.table::rbindlist(
list(est_r,
rt_forecasts[,.(date, R = rt, sample, rt_type = "forecast")]
), fill = TRUE)
return(list(rts = est_r, cases = case_forecasts))

This area of the code runs a forecast and relies on EpiSoon, data.table, purrr, fable and fabletools any of which may be the source of the problem.

This deduction may also be a false positive with another area of the code to blame. All code here that is not from other packages has recently been rewritten from the tidyverse to data.table.

@seabbs seabbs added bug Something isn't working help wanted Extra attention is needed labels May 13, 2020
@johnlees
Copy link

Hi,
I didn't have any luck with profvis, but using lineprof:

library(EpiNow)
library(lineprof)
prof <- lineprof(EpiNow::estimate_R0(cases = EpiSoon::example_obs_cases,
                          generation_times = as.matrix(EpiNow::covid_generation_times[,1]),
                           rt_prior = list(mean_prior = 2.6, std_prior = 2),
                          windows = c(1, 3, 7), rt_samples = 10, gt_samples = 2,
                          min_est_date =  as.Date("2020-02-18"),
                          forecast_model = function(...){EpiSoon::fable_model(model = fable::ETS(y ~ trend("A")), ...)},
                         horizon = 7), torture = TRUE

The only thing that came up was EpiEstim::estimate_R leaking about 1Mb (and not obviously in a single line, at least as far as I can tell). I guess that's not enough to explain your issue.

Attached rds if you want to take a look.
lineprof.rds.zip

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] shiny_1.4.0       lineprof_0.1.9001 EpiNow_0.2.0

loaded via a namespace (and not attached):
  [1] mcmc_0.9-7              matrixStats_0.56.0      xts_0.12-0
  [4] lubridate_1.7.8         bsts_0.9.5              rstan_2.19.3
  [7] tools_3.6.2             R6_2.4.1                coarseDataTools_0.6-5
 [10] colorspace_1.4-1        npsurv_0.4-0.1          tidyselect_1.0.0
 [13] gridExtra_2.3           prettyunits_1.1.1       EpiEstim_2.2-1
 [16] processx_3.4.2          compiler_3.6.2          cli_2.0.2
 [19] quantreg_5.55           HDInterval_0.2.0        SparseM_1.78
 [22] scales_1.1.0            callr_3.4.3             goftest_1.2-2
 [25] stringr_1.4.0           digest_0.6.25           StanHeaders_2.21.0-1
 [28] tsibble_0.8.6           R.utils_2.9.2           MCMCpack_1.4-7
 [31] base64enc_0.1-3         Boom_0.9.6              pkgconfig_2.0.3
 [34] htmltools_0.4.0         fastmap_1.0.1           scoringRules_1.0.0
 [37] rlang_0.4.6             generics_0.0.2          jsonlite_1.6.1
 [40] zoo_1.8-8               dplyr_0.8.5             R.oo_1.23.0
 [43] inline_0.3.15           magrittr_1.5            EpiSoon_0.3.0
 [46] R.devices_2.16.1        loo_2.2.0               patchwork_1.0.0
 [49] Matrix_1.2-18           Rcpp_1.0.4.6            munsell_0.5.0
 [52] fansi_0.4.1             lifecycle_0.2.0         R.methodsS3_1.8.0
 [55] furrr_0.1.0             stringi_1.4.6           MASS_7.3-51.4
 [58] pkgbuild_1.0.7          plyr_1.8.6              grid_3.6.2
 [61] parallel_3.6.2          BoomSpikeSlab_1.2.3     listenv_0.8.0
 [64] promises_1.1.0          crayon_1.3.4            lattice_0.20-38
 [67] cowplot_1.0.0           splines_3.6.2           knitr_1.28
 [70] anytime_0.3.7           ps_1.3.2                pillar_1.4.4
 [73] future.apply_1.4.0      reshape2_1.4.3          codetools_0.2-16
 [76] stats4_3.6.2            glue_1.4.0              lsei_1.2-0.1
 [79] fabletools_0.1.3        data.table_1.12.8       vctrs_0.2.4
 [82] httpuv_1.5.2            MatrixModels_0.4-1      gtable_0.3.0
 [85] purrr_0.3.4             tidyr_1.0.2             future_1.17.0
 [88] incidence_1.7.1         assertthat_0.2.1        ggplot2_3.3.0
 [91] xfun_0.12               mime_0.9                xtable_1.8-4
 [94] coda_0.19-3             later_1.0.0             scoringutils_0.0.0.9000
 [97] survival_3.1-8          tibble_3.0.1            globals_0.12.5
[100] fitdistrplus_1.0-14     ellipsis_0.3.0

@seabbs
Copy link
Contributor Author

seabbs commented May 14, 2020

Hi John,

Thanks for taking a look (and for the offer of help - see email).

I haven't tried lineprof (only profvis) so I will take a look. Even a small memory leak here could be at fault if it scales with samples (1000 per country so ~ 1Gb per country at least).

Sam

@seabbs
Copy link
Contributor Author

seabbs commented May 14, 2020

Further investigation indicates that the root cause of the memory leak is the fable/fabletools package (potentially linked to this issue: tidyverts/fable#230).

Dropping the fabletools model defined in the reprex and replacing it with the following:

function(...) {EpiSoon::bsts_model(model = 
                      function(ss, y){bsts::AddSemilocalLinearTrend(ss, y = y)}, ...)}

Appears to fix the RAM issue. This is not a drop-in replacement however as in testing the bsts models have generally performed worse than those from fable. Possible options that are also likely to perform well are the forecastHybrid package, the forecast package and the brms package. None of these are currently supported by EpiSoon with an appropriate wrapper which will first need to be done before this can tested further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants