memory size error with large random effects models in rstanarm #27

mhandreae · 2015-09-29T20:15:36Z

Executive summary:
While rstanarm fits a random-effects logistic regression model to my large dataset (~50,000 observations) and runs four chains without an issue, I get a memory size error when it tries to generate the stanfit object. There are about 2000 random effects in this model. The issue arises even with only 10 iterations. I have 8 GB of RAM and four cores on a virtual Windows server.

formulaR2.0 <- ond ~ pay + age_group + sex + (1 | cpt)

stanfit2.0 <- stan_glmer(formulaR2.0, data = myAQI, family = binomial, iter = 10, cores = 2)
Loading required namespace: rstudioapi

Error: cannot allocate vector of size 2.7 Gb
In addition: Warning messages:
1: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)
2: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)
3: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)
4: In structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr") :
Reached total allocation of 8191Mb: see help(memory.size)

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] rstanarm_2.8.0 rstan_2.8.0 ggplot2_1.0.1 Rcpp_0.12.1 lme4_1.1-9
[6] Matrix_1.2-2 knitr_1.11

loaded via a namespace (and not attached):
[1] nloptr_1.0.4 plyr_1.8.3 shinyjs_0.2.0 xts_0.9-7
[5] base64enc_0.1-3 tools_3.2.2 digest_0.6.8 nlme_3.1-121
[9] gtable_0.1.2 lattice_0.20-33 rstudioapi_0.3.1 shiny_0.12.2
[13] shinystan_2.0.1 proto_0.3-10 loo_0.1.3 gridExtra_2.0.0
[17] stringr_1.0.0 gtools_3.5.0 dygraphs_0.4.5 htmlwidgets_0.5
[21] DT_0.1 stats4_3.2.2 grid_3.2.2 inline_0.3.14
[25] R6_2.1.1 minqa_1.2.4 reshape2_1.4.1 magrittr_1.5
[29] codetools_0.2-14 shinythemes_1.0.1 threejs_0.2.1 scales_0.3.0
[33] matrixStats_0.14.2 htmltools_0.2.6 MASS_7.3-43 splines_3.2.2
[37] xtable_1.7-4 mime_0.4 colorspace_1.2-6 httpuv_1.3.3
[41] stringi_0.5-5 munsell_0.4.2 markdown_0.7.7 zoo_1.7-12

jgabry · 2015-09-29T22:33:03Z

Looking closely, those warning messages say that, in total, it's actually trying to
allocate more than the 8191 Mb (about 8 GB) limit, not just the 2.7 Gb vector.


mhandreae · 2015-09-30T03:18:51Z

Great observation. Now I see, thank you. I do not understand why the object
is so large with just 10 (ten) iterations...?

So it seems I need to increase the available RAM. Or can this be done with
virtual memory at the expense of speed?

Ben suggested implementing a function that drops the draws and keeps only
the mode of the random effects, to reduce the required memory...?

Cheers
Michael
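The question about 10 iterations can be checked with a back-of-the-envelope calculation. The R sketch below estimates the memory taken by the saved posterior draws; `draws_bytes` is a hypothetical helper, the parameter count of about 3000 is an assumption based on the number of random effects reported in this thread, and the warmup length assumes rstan's default of `floor(iter / 2)`.

```r
# Rough size (in bytes) of the posterior draws stored in a fitted object:
# one 8-byte double per saved iteration, per chain, per parameter.
# Assumes rstan's default warmup of floor(iter / 2) iterations.
draws_bytes <- function(iter, chains, n_params, keep_warmup = TRUE) {
  warmup <- floor(iter / 2)
  saved <- if (keep_warmup) iter else iter - warmup
  saved * chains * n_params * 8
}

# 10 iterations, 4 chains, ~3000 parameters (mostly random effects)
draws_bytes(10, 4, 3000) / 2^20                         # about 0.92 MiB

# A more typical run: 2000 iterations, with and without warmup draws
draws_bytes(2000, 4, 3000) / 2^20                       # about 183 MiB
draws_bytes(2000, 4, 3000, keep_warmup = FALSE) / 2^20  # about 92 MiB
```

At 10 iterations the draws need well under 1 MiB, which suggests the 2.7 Gb allocation comes from something other than the stored draws.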

jgabry · 2015-09-30T03:59:57Z

If you run traceback() right after the error, does it tell you which function
actually errored?

Or try setting options(error = recover) and then reproducing the error.
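traceback() and options(error = recover) are meant for interactive sessions. When a model is run from a script, a comparable record of the call chain can be captured with base R's withCallingHandlers() and sys.calls(). This is a swapped-in technique, not something rstanarm provides, and `inner_step`, `outer_step` and `capture_stack` are hypothetical names used only for illustration.

```r
# Hypothetical helpers standing in for a real call chain
# (e.g. stan_glmer -> stanreg -> qr in a traceback).
inner_step <- function(x) stop("cannot allocate vector")
outer_step <- function(x) inner_step(x)

# Evaluate `expr`; if it errors, return the call stack captured at the
# moment the error was signalled (similar to what traceback() prints).
capture_stack <- function(expr) {
  stack <- NULL
  tryCatch(
    withCallingHandlers(
      expr,
      # Calling handler runs before the stack unwinds, so sys.calls()
      # still contains every active call.
      error = function(e) stack <<- as.list(sys.calls())
    ),
    error = function(e) NULL  # swallow the error after recording the stack
  )
  stack
}

calls <- capture_stack(outer_step(1))

# Print one line per frame, deparsed like traceback() output
for (cl in calls) cat(deparse(cl)[1], "\n")
```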


jgabry · 2015-10-01T17:21:24Z

@mhandreae Are you able to share the data for this model?

mhandreae · 2015-10-06T12:09:28Z

stanfit2.0 <- stan_glmer(formulaR2.0, data = myAQI, family = binomial, iter = 10, cores = 2)
Error: cannot allocate vector of size 2.7 Gb
In addition: There were 18 warnings (use warnings() to see them)

traceback()
5: structure(.Internal(La_qr(x)), useLAPACK = TRUE, class = "qr")
4: qr.default(x, tol = .Machine$double.eps, LAPACK = TRUE)
3: qr(x, tol = .Machine$double.eps, LAPACK = TRUE) at stanreg.R#11
2: stanreg(fit) at stan_glmer.R#95
1: stan_glmer(formulaR2.0, data = myAQI, family = binomial, iter = 10,
cores = 2)

The data file is very large and I am not allowed to share the data, but we
could sit and look at it together on the virtual machine. Maybe it's time for
another tutorial on troubleshooting memory problems in rstanarm?

See you at 11am today as usual?

Michael
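The traceback above places the failing allocation inside qr(), and qr.default() converts its input to an ordinary dense matrix. Assuming the matrix passed to it has one indicator column per random effect, its size grows with observations × columns, not with the number of iterations. The R sketch below shows that arithmetic (using the 100,000 observations and ~3000 groups mentioned later in this thread) and compares a dense indicator matrix for a simulated grouping factor with a sparse one from the Matrix package, which lme4 also uses. The helper `dense_bytes` and the simulated factor `g` are hypothetical.

```r
library(Matrix)  # recommended package shipped with R; provides sparse matrices

# Bytes needed for a dense n x p matrix of doubles (8 bytes per entry)
dense_bytes <- function(n, p) n * p * 8

# 100,000 observations and ~3000 group indicator columns
dense_bytes(1e5, 3000) / 2^30   # about 2.24 GiB for a single copy

# Simulated grouping factor (hypothetical stand-in for `cpt`)
set.seed(1)
n <- 10000
g <- factor(sample(sprintf("grp%04d", 1:1000), n, replace = TRUE))

# One indicator column per group level, dense and sparse versions
Z_dense  <- model.matrix(~ 0 + g)
Z_sparse <- sparse.model.matrix(~ 0 + g)

# Compare memory use in bytes
c(dense  = as.numeric(object.size(Z_dense)),
  sparse = as.numeric(object.size(Z_sparse)))
```

Each row of a random-intercept indicator matrix has exactly one nonzero entry, so a sparse representation stays small while the dense version grows with the number of groups.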


mhandreae · 2015-10-06T18:55:46Z

I reduced the size of the dataset to 10% (from 100,000 observations to 10,000), which reduced the number of random effects (from 3000 to 1000).

rstanarm now runs the chains and saves the stanfit object; indeed, it converges and is a nice model.
BUT obviously I want to include all the data and all the random effects.
Michael
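The subsetting workaround can be sketched in base R as below. The data frame `dat` and its columns are simulated placeholders, since the real data cannot be shared. The sketch takes a simple random 10% of the rows, drops unused levels so the model gets one random effect per observed group, and, as an alternative, keeps every row from a random subset of groups, which controls the number of random effects directly.

```r
set.seed(42)

# Simulated placeholder for the real data: 100,000 rows, ~3000 groups
n_obs <- 100000
dat <- data.frame(
  cpt = factor(sample(sprintf("cpt%04d", 1:3000), n_obs, replace = TRUE)),
  ond = rbinom(n_obs, size = 1, prob = 0.1)
)

# Option 1: simple random sample of 10% of the rows
keep <- sample(nrow(dat), size = round(0.1 * nrow(dat)))
dat_10 <- dat[keep, ]
dat_10$cpt <- droplevels(dat_10$cpt)  # only groups that still have rows

# Option 2: keep every row from a random subset of 1000 groups
keep_groups <- sample(levels(dat$cpt), size = 1000)
dat_grp <- droplevels(dat[dat$cpt %in% keep_groups, ])

c(rows_all = nrow(dat), groups_all = nlevels(dat$cpt),
  rows_10 = nrow(dat_10), groups_10 = nlevels(dat_10$cpt),
  rows_grp = nrow(dat_grp), groups_grp = nlevels(dat_grp$cpt))
```

With evenly sized groups, a row sample keeps most groups; how many survive in real data depends on how unevenly the observations are spread across groups.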

mhandreae · 2015-10-13T15:25:10Z

After we deactivated the QR decomposition in rstanarm, the only remaining error is:

stanfit2.0 <- stan_glmer(formulaR2.0, data = myAQI, family = binomial, iter = 5, cores = 2)
Error: (converted from warning) There were 12 divergent transitions after warmup. Increasing adapt_delta may help.
11 doWithOneRestart(return(expr), restart)
10 withOneRestart(expr, restarts[[1L]])
9 withRestarts({
.Internal(.signalCondition(simpleWarning(msg, call), msg,
call))
.Internal(.dfltWarn(msg, call)) ...
8 .signalSimpleWarning("There were 12 divergent transitions after warmup. Increasing adapt_delta may help.",
quote(NULL))
7 warning("There were ", n_d, " divergent transitions after warmup.",
" Increasing adapt_delta may help.", call. = FALSE)
6 throw_sampler_warnings(nfits)
5 .local(object, ...)
4 rstan::sampling(stanfit, data = standata, pars = pars, control = stan_control,
show_messages = FALSE, ...)
3 rstan::sampling(stanfit, data = standata, pars = pars, control = stan_control,
show_messages = FALSE, ...) at stan_glm.fit.R#352
2 stan_glm.fit(x = X, y = y, weights = weights, offset = offset,
family = family, prior = prior, prior_intercept = prior_intercept,
prior_ops = prior_ops, prior_PD = prior_PD, algorithm = algorithm,
group = group, ...) at stan_glmer.R#78
1 stan_glmer(formulaR2.0, data = myAQI, family = binomial, iter = 5,
cores = 2)

jgabry · 2015-10-13T17:17:55Z

Actually I think that stuff is fine. It's just showing you the call path leading
to the warning about divergent transitions. It normally wouldn't show that
stuff, but we set options(warn = 2), which converts warnings into errors. I was
hoping that would show us where the other warning (something about factors) came from.
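Setting options(warn = 2) turns every warning into an error, which is why the divergent-transitions warning above appears as "(converted from warning)" with a full traceback. A minimal base-R sketch of applying that setting temporarily and restoring it afterwards; `fit_step` and `with_warnings_as_errors` are hypothetical names, with `fit_step` standing in for whatever code raises the unexplained warning.

```r
# Hypothetical function that emits a warning partway through,
# standing in for the unidentified warning "about factors".
fit_step <- function() {
  warning("some factor-related warning")
  "finished"
}

# Evaluate `expr` with warnings promoted to errors; the previous
# setting is restored when the function exits, even on error.
with_warnings_as_errors <- function(expr) {
  old <- options(warn = 2)
  on.exit(options(old), add = TRUE)
  tryCatch(expr, error = function(e) conditionMessage(e))
}

msg <- with_warnings_as_errors(fit_step())
print(msg)
```

In an interactive session, running traceback() after the converted error (without the tryCatch wrapper) shows where the warning was raised.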


bob-carpenter · 2015-10-13T17:18:22Z

I don't see what you think is an error. But in any case, it looks like a different issue than the memory issue, so this particular issue (#27) should be closed and a new issue opened for the new problem with a clear indication of what you think the problem is and how to reproduce it.

jgabry · 2015-10-13T17:26:09Z

Yeah, this isn't an error, just related to the warning about divergent
transitions. But let's keep the issue open for a bit, because the fix for
Michael's memory issue is only implemented on his machine. I want to look
into it a bit more before implementing it inside the package, and I can
close the issue when I do that.


bgoodri · 2016-01-06T17:09:21Z

I think this was not really an rstanarm issue but rather a general memory issue, or possibly an rstan memory issue, that happened to pop up when using rstanarm to estimate the model. It might be better now that rstanarm is not saving the warmup draws.

bgoodri closed this as completed Jan 6, 2016

jgabry added the bug label Jan 12, 2016
