Use of 'thin' and 'start' in computePredictedValues() #86
Hi! I have a model with many sampling units (10k+), fitted using many (probably too many) samples (1000 per chain x 6 chains).
The problem is that when I use computePredictedValues() the array gets too big (50+ GB) and crashes the HPC cluster I'm using. I thought I could reduce its size by decreasing the 'depth' of the predicted-values array from 6000 (n samples) to a smaller number using the 'thin' option. But in tests with smaller models, using thin doesn't change the dimensions of the predicted-values array.
The thin option does appear to affect the behavior of poolMcmcChains() within the computePredictedValues() function, but then predict() doesn't seem to notice.
Is this how the 'thin' and 'start' arguments are supposed to be used?
Of course I could refit the models with fewer samples, but I would rather use a shortcut if one is available.
Thanks!!! |
> library(Hmsc)
> TD$m # example data in the package
Hmsc object with 50 sampling units, 4 species, 3 covariates, 3 traits and 2 random levels
Posterior MCMC sampling with 2 chains each with 100 samples, thin 1 and transient 50
> preds <- computePredictedValues(TD$m)
> dim(preds)
[1] 50 4 200
> preds <- computePredictedValues(TD$m, thin=10)
> dim(preds)
[1] 50 4 20
The number of samples in the predicted object is defined by the thin argument.
There may be some deeper reason for your failure than the object size. The result is a 3-D array with dimensions sampling units (10k+) times species times samples (6000). You don't say how many species you had, but with 10000 sampling units and 6000 samples you may already exceed the integer maximum at 36 species:
> .Machine$integer.max/10000/6000
[1] 35.79139
I don't know about long-integer support in the underlying code, but indexing can become a problem, and probably storage space as well. I hope you don't crash your HPC cluster: it costs a lot to buy a new one. |
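A quick back-of-the-envelope check of that warning (plain base R, not Hmsc code; the species count is hypothetical, since the poster never gave one):

n_units   <- 10000   # sampling units (10k+ reported above)
n_species <- 36      # assumed species count, for illustration only
n_samples <- 6000    # 1000 samples per chain x 6 chains
n_elements <- n_units * n_species * n_samples
n_elements > .Machine$integer.max   # TRUE: past the 32-bit indexing limit
n_elements * 8 / 1024^3             # ~16 GB for one double-precision array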
Hi @jarioksa, thanks for all that info! That example works for me too. But I didn't mention that I was also using a partition (for cross-validation), and when I add that partition to the example, the dimensions remain the same (50 x 4 x 200) even with thinning; a minimal reproduction is sketched below. |
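A sketch of what such a reproduction might look like with the TD example data, assuming createPartition() was used to build the cross-validation folds (the fold count here is arbitrary):

library(Hmsc)
partition <- createPartition(TD$m, nfolds = 2)
preds <- computePredictedValues(TD$m, partition = partition, thin = 10)
dim(preds)   # reportedly 50 x 4 x 200, not the expected 50 x 4 x 20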
@rburner: this is a bug, and actually a severe one. I'll have a look at this ASAP. |
@jarioksa Jari, wow ok, thanks for looking into this! Let me know. |
Function computePredictedValues always returned an array with the original number of samples even if the user defined thin or start for a smaller array. The real bug was that only the reduced number of samples was calculated, but the array was filled to the original number of samples by replicating the predicted values. Discussed in issue #86 on GitHub.
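That replication symptom is consistent with R's vector recycling rule when a too-short vector is poured into a fixed-size array. This toy snippet (not the Hmsc internals, just an illustration of the mechanism) shows how 20 thinned samples can silently fill an array still dimensioned for 200:

vals <- 1:20                          # stand-in for 20 thinned samples
a <- array(vals, dim = c(1, 1, 200))  # array still sized for 200 samples
dim(a)                                # 1 1 200: dimensions look unchanged
identical(a[, , 1], a[, , 21])        # TRUE: values repeat every 20 slices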
This should be fixed now on GitHub with commit 51d14ec. Please try it and see if this is sufficient (you still have 10k+ sampling units, and multiplication is a nasty operation: multiplying three numbers is much nastier than multiplying two, so 3-D arrays can be huge even if you make one dimension shorter). |
@jarioksa Great Jari, thanks so much for that! I will see if I can make it work with e.g. 500 samples! Best, Ryan |
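For what it's worth, a sketch of picking thin to land near those 500 samples, assuming the fixed version and a fitted model m with 6 chains of 1000 samples each (m and the counts are stand-ins for the poster's actual fit):

target <- 500
total  <- 1000 * 6                    # samples per chain x chains = 6000
thin   <- ceiling(total / target)     # 12: keep every 12th sample per chain
preds  <- computePredictedValues(m, thin = thin)
dim(preds)[3]                         # roughly 500 posterior samples deep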