
predict over millions of sites (non-spatially explicit); out of memory #145

Open

stephanJG opened this issue Jul 6, 2022 · 0 comments
Hi Hmsc team,
I am trying to predict over many new sites (~30 million sites) with a large model ("142 sampling units, 232 species, 8 covariates, 1 traits and 3 random levels"), but I am starting to wonder whether this is computationally feasible.
None of the random levels is spatially explicit. I am using the predict.Hmsc function after creating new data with the prepareGradient function:
gradient <- prepareGradient(model, XDataNew = XData.grid)
predY <- predict(object = model, Gradient = gradient, expected = TRUE)

Until now I have:

  • only tested using the data from the model and increased the number of copies (1 to 5 times the data the model was built with)
  • split the data so I can run the chunks in parallel on the HPC (see the sketch after this list)
  • saved predY in an array and rounded the predicted values to 4 digits to reduce the file size
  • (in another project I successfully predicted over 20 million sites with a single-species model using GPP)
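
To make the splitting concrete, here is a rough sketch of how I batch the new sites (the number of chunks, object names and the file pattern are placeholders I chose for illustration); it reuses my prepareGradient/predict calls from above and keeps only the rounded posterior mean per chunk:

n.chunks <- 100                                     # placeholder; chosen so one chunk fits in memory
chunk.id <- cut(seq_len(nrow(XData.grid)), breaks = n.chunks, labels = FALSE)
for (i in seq_len(n.chunks)) {
  XData.i  <- XData.grid[chunk.id == i, , drop = FALSE]
  gradient <- prepareGradient(model, XDataNew = XData.i)
  predY.i  <- predict(object = model, Gradient = gradient, expected = TRUE)
  # predY.i is a list with one sites x species matrix per posterior draw;
  # averaging over the draws before saving keeps only one small matrix per chunk
  EpredY.i <- round(Reduce("+", predY.i) / length(predY.i), 4)
  saveRDS(EpredY.i, file = sprintf("EpredY_chunk_%03d.rds", i))
}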

If I use 4 times the data (568 sites), the prediction takes 2.5 hours. Already at 5 times (710 sites), however, the HPC gives me an out-of-memory message, which may be understandable, as that is already 164,720,000 numbers (142 sites * 5 copies * 232 species * 1000 draws).
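
For reference, the raw size of predY alone (assuming 8-byte doubles and the 1000 retained posterior draws mentioned above) would be roughly:

n.values <- 142 * 5 * 232 * 1000    # 164,720,000 predicted values
n.values * 8 / 1024^3               # ~1.2 GB for the raw numbers alone

and I assume the intermediate objects that predict() builds per posterior draw come on top of that.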

How can I make predictions over all ~30 million sites without running out of memory?
Many thanks in advance
Best
Jörg
