memory allocation error with ggeffect, ggpredict #22

Closed
emjonaitis opened this issue Apr 5, 2018 · 25 comments
Labels
waiting for response 💌 Response from users awaited

Comments

@emjonaitis

I have a three-way interaction from an lmer model that I was able to summarize and plot easily with the effects package. However, when I try to summarize it using ggeffect I get the following error message: Error: cannot allocate vector of size 84619.5 Gb. With ggpredict I get a similar but less alarming message (the vector size is ~8.3 Gb). I have no sample data I can share, but the fact that this works in effects but not ggeffects makes me suspect a bug.

@richardneilbelcher

richardneilbelcher commented Apr 8, 2018

I just had a similar issue with the ggpredict function. Setting ci.lvl = NA made it work without the vector allocation error (although ggeffect worked for me). Other lme4 prediction tools can use computationally intensive methods for degrees-of-freedom calculations, so that may be the cause.

@strengejacke
Owner

strengejacke commented Apr 8, 2018

If one of the terms is continuous, try to specify some values, e.g. terms = "cont_var [10, 20, 30, 40, 50]", if cont_var ranges from 10 to 50. I use expand.grid() on all possible values, and computation of CI might be very memory consuming in such cases. The effects package, by default, does not compute effects and CI for all possible values, but for a selection only.
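To make the memory argument above concrete, here is a minimal base R sketch of how a full grid over all unique values compares with a grid over a few chosen values. The variable names, sizes, and value choices are made up for illustration, and this is not the code ggeffects itself runs.

```r
# Hypothetical predictors: two continuous variables with many unique
# values and one three-level factor (all invented for this sketch).
set.seed(1)
cont_var  <- round(runif(5000, 10, 50), 2)
other_var <- round(rnorm(5000), 2)
group     <- factor(c("a", "b", "c"))

# Number of rows a grid over ALL unique values would need.
# Computed arithmetically, so the big grid is never actually built.
n_full <- as.numeric(length(unique(cont_var))) *
  length(unique(other_var)) *
  nlevels(group)

# Grid over a handful of representative values instead.
small_grid <- expand.grid(
  cont_var  = c(10, 20, 30, 40, 50),
  other_var = c(-1, 0, 1),
  group     = levels(group)
)

n_full            # rows in the full grid
nrow(small_grid)  # 5 * 3 * 3 = 45 rows
```

The row count multiplies across terms, so each additional continuous term with many unique values inflates the grid, and any per-row CI computation grows with it.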

@strengejacke
Owner

Could you please check if this issue still exists in the current dev-version? I have added a pretty argument, which creates a sequence of "pretty" numbers for predictor terms with many unique values.

@strengejacke
Owner

Since pretty does not plot splines nicely, I've changed the default to pretty = FALSE. So, if possible, please check if ggpredict(..., pretty = TRUE) solves your issue.
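As a rough illustration of what a "pretty" sequence means, the sketch below uses base R's pretty() on a made-up continuous variable. Whether the pretty argument in ggeffects calls pretty() in exactly this way is an assumption; the point is only that a few round values replace thousands of unique ones.

```r
# Made-up continuous predictor with a few thousand unique values.
x <- seq(10, 50, by = 0.01)
length(unique(x))

# Base R pretty(): a short sequence of round values covering range(x).
p <- pretty(x, n = 10)
p
length(p)
```

The trade-off raised in the comment above follows directly: with so few evaluation points, a curved fit such as a spline is drawn as a coarse polyline.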

@vspinu

vspinu commented May 1, 2018

I am having the same issue with ggeffects::ggalleffects, which blows up to 15 GB for a data set of 64k rows x 9 columns. ggeffects::ggpredict doesn't work for me because I have a scale(forecast) term in my formula:

> labels(terms(M))
 [1] "error"               "scale(forecast)"     "error2"             
 [4] "waste1"              "waste2"              "sales"              
 [7] "family"              "scale(transactions)" "cluster"            
[10] "aweek"               "year"                "holid_nat"          
> ggeffects::ggpredict(M, labels(terms(M)), pretty = T)
`terms` must have not more than three values. Using first three values now.
Error in scale(forecast, center = 276.325019004872, scale = 375.963889410209) : 
  object 'forecast' not found

The above works just fine with the effects package.

@strengejacke
Owner

strengejacke commented May 1, 2018

Try standardizing the variable before you fit the model. Does this work?

@vspinu

vspinu commented May 1, 2018

They are standardized. Sorry, I hadn't provided the model formula:

waste ~ error + scale(forecast) + error2 + waste1 + waste2 + 
    sales + family + scale(transactions) + cluster + aweek + 
    year + holid_nat

I expect this error has something to do with the inline scale(forecast).

@strengejacke
Owner

Yes, please standardize before, and don't use "inline" calls to functions. And it's preferred to use sjmisc::std(), because scale() changes the input type.
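The remark that scale() changes the input type can be checked in base R. The sketch below uses invented values and a manual standardization as the plain-vector alternative; sjmisc::std() is the function recommended above, but it is not called here so that the example runs without extra packages.

```r
# Invented values standing in for the 'forecast' variable.
forecast <- c(120, 250, 310, 480, 900)

# scale() returns a one-column matrix carrying centering/scaling attributes,
# not a plain numeric vector.
z <- scale(forecast)
is.matrix(z)
names(attributes(z))

# Standardizing by hand before fitting keeps a plain numeric vector.
z_manual <- (forecast - mean(forecast)) / sd(forecast)
is.matrix(z_manual)

# Same numbers, different container.
all.equal(as.numeric(z), z_manual)
```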

@vspinu

vspinu commented May 2, 2018

> Yes, please standardize before, and don't use "inline" calls to functions.

Yeah, that's one restriction too many; I guess I'll stay away then. All other standard R software works with "inline" functions and doesn't require standardization (the effects package included). But well, every package is different ;)

As a side note, subsampling or pretty = TRUE should be the default. The splines issue is really not an excuse to blow up people's R sessions during basic plotting (especially with small data sets).

@strengejacke
Owner

strengejacke commented May 2, 2018

Actually, ggpredict() should work with inline use of functions; however, the terms argument must use the original variable names. So terms = "forecast" should work, whereas labels(terms(M)) returns scale(forecast).

Your argument is translated to predict(M, newdata = data.frame('scale(forecast)' = ...)), which causes the error.
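The same failure can be reproduced with plain lm() and predict(), which may help show why the original variable name is needed. This is a minimal sketch on simulated data (the data frame d and model m are invented), not the ggeffects internals.

```r
# Simulated data; 'forecast' and 'waste' only mirror the names in this thread.
set.seed(42)
d <- data.frame(forecast = runif(100, 0, 1000))
d$waste <- 5 + 0.01 * d$forecast + rnorm(100)

m <- lm(waste ~ scale(forecast), data = d)

labels(terms(m))           # term label includes the inline call
all.vars(formula(m))[-1]   # underlying variable name

# newdata keyed by the original variable name: predict() re-applies scale()
nd   <- data.frame(forecast = c(100, 500, 900))
p_ok <- predict(m, newdata = nd)

# newdata keyed by the term label: 'forecast' cannot be found, so it errors
bad <- data.frame(c(100, 500, 900))
names(bad) <- "scale(forecast)"
res <- try(predict(m, newdata = bad), silent = TRUE)
inherits(res, "try-error")
```

The sketch assumes no object called forecast exists in the calling environment; if one did, predict() would pick it up instead of raising the error.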

@strengejacke
Owner

@emjonaitis and @richardneilbelcher, do you still have the memory allocation issues if you set pretty = TRUE? I think I will indeed make this the default option, and print a message when values have been prettified, so users are not surprised by less smooth plots.

@emjonaitis
Author

emjonaitis commented May 4, 2018 via email

@emjonaitis
Author

emjonaitis commented May 4, 2018 via email

@strengejacke
Owner

strengejacke commented Jun 19, 2018

I revised the CI calculation for mixed models, which should now be more efficient. Does either of the following two options solve your issue?

ggpredict(model, terms = "myterm", pretty = TRUE)

or

ggpredict(model, terms = "myterm [range]") # "range" is meant literally here; it is not a placeholder

This requires the current GitHub version of ggeffects.

@strengejacke
Owner

@vspinu If you want to plot effects for all model terms, you can now simply leave the terms argument missing or NULL, so just calling ggeffects::ggpredict(M) should work, and is comparable (regarding the effort) to allEffects(M).

strengejacke added the question ⁉️ Further information is requested label Jun 26, 2018

@strengejacke
Owner

I'd be happy if someone who still has memory allocation errors could check the current GitHub version. It should automatically calculate a reasonable "pretty" range of values for the predictions, and it should be much more memory efficient when calculating SE/CI for predictions.
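For readers wondering how an SE calculation can demand so much memory, one common cause (an assumption here, not a confirmed description of what ggeffects did before or after the change) is forming the full n x n matrix X V X' only to take its diagonal. The base R sketch below, with an invented design matrix X and a stand-in covariance matrix V, shows the naive form next to a row-wise form that gives the same standard errors without the n x n intermediate.

```r
set.seed(123)
n <- 2000   # rows of the prediction grid (kept small so the naive form runs)
p <- 4      # number of fixed-effect coefficients

# Invented design matrix and a symmetric positive-definite stand-in for vcov(model).
X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))
V <- crossprod(matrix(rnorm(p * p), nrow = p)) / 10

# Naive: builds an n x n matrix, i.e. 8 * n^2 bytes of doubles.
se_naive <- sqrt(diag(X %*% V %*% t(X)))

# Row-wise: only n x p intermediates, same diagonal.
se_fast <- sqrt(rowSums((X %*% V) * X))

all.equal(se_naive, se_fast)

# Size of the naive intermediate in GiB for this n.
8 * n^2 / 2^30
```

Because the naive intermediate grows with the square of the grid size, reducing the number of grid rows and avoiding the n x n matrix both help.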

@emjonaitis
Author

emjonaitis commented Jun 29, 2018 via email

strengejacke added waiting for response 💌 Response from users awaited and removed question ⁉️ Further information is requested labels Jul 7, 2018

@strengejacke
Owner

bump

@emjonaitis
Author

emjonaitis commented Jul 18, 2018 via email

@strengejacke
Owner

No, you are using the correct version; either the master branch or the current GitHub version should be more memory efficient.
When you run ggpredict(), does it display a message about prettifying the values?

@emjonaitis
Author

emjonaitis commented Jul 19, 2018 via email

@strengejacke
Owner

Thanks for looking into this. ggeffect() calls effects::effect(), so it might be an issue of the effects package. I'll try to find out where the issue is located.

@strengejacke
Owner

I just realized that ggeffect() is not optimized; the optimization only applies to ggpredict(). I had neglected ggeffect() a bit, and I was always thinking of ggpredict() when talking about this issue. 🙄

@strengejacke
Owner

Ok, memory allocation problems with ggeffect() should also be solved now.

@emjonaitis
Author

emjonaitis commented Jul 31, 2018 via email
