Sporadic error during fitting or cross-validation in: sampleMcmc -> alignPosterior -> abind #27

Closed
rburner opened this issue Oct 23, 2019 · 13 comments
Labels
bug

Comments

@rburner rburner commented Oct 23, 2019

Hi. When fitting several lists of 3-4 models I've been getting the following error, but not every time.

For example, when I fit the models using short chains (e.g. 50 samples with thin = 3) as a test, all three models fit and cross-validate successfully.

But when I try with more samples (using the MCMC parameters below), I eventually get the error shown below. Sometimes the error comes up during model fitting, other times during cross-validation; sometimes on the first model in the list, sometimes on the second.

The models all have 100 sites, 100 species, 5 continuous environmental covariates, 3 trait covariates (one of which is a 3-level factor), and a phylogeny. Random effects include site (non-spatial), year, and project.

#read in models
redHPC1=readRDS("t02ns.rds")

#make list for cross validation results
crossFit=list("a")

#set fitting parameters
nChains = 2
thin = 5
samples = 1000
transient = 400*thin
verbose = 150*thin

for (i in 1:length(redHPC1)) {
#run the model in parallel
redHPC1[[i]] = sampleMcmc(redHPC1[[i]], thin = thin, samples = samples,
transient = transient, nChains = nChains, nParallel = 2,
verbose = verbose, initPar = "fixed effects")

#save results
saveRDS(redHPC1, file = "t02nsFIT.rds")

#do cross-validation
partition = createPartition(redHPC1[[i]], nfolds = 2, column = "Site")
preds = computePredictedValues(redHPC1[[i]],partition=partition,nParallel = 2)
crossFit[[i]] = evaluateModelFit(hM=redHPC1[[i]], predY=preds)

#save cross-validation results
saveRDS(crossFit, file = "t02nsCROSS.rds")
}

[1] "Setting updater$Gamma2=FALSE due to specified phylogeny matrix"
Error in abind(cpL[[j]]$Delta[[r]], array(1, DeltaAddDim), along = 1) :
arg 'X2' has dims=1, 2; but need dims=X, 1
Calls: sampleMcmc -> alignPosterior -> abind

Collaborator

@gtikhonov gtikhonov commented Oct 23, 2019

I think I fixed this issue just an hour ago. It was a bug of unclear origin that occurred while aligning the number of latent factors across chains after the HMSC model was fitted (commit 45f6071). Please try again now.

Author

@rburner rburner commented Oct 23, 2019

@gtikhonov Great, thank you! I'll retry

Author

@rburner rburner commented Oct 23, 2019

@gtikhonov I was about to ask about this issue too, but maybe it got fixed by the same modifications?

Same pattern, except this error shows up when I'm using a spatial random effect. Models again run fine with short chains, and sometimes fit and cross-validate with long chains, but eventually I get this error (either at fitting or cross val):

Error in if (alphapw[alpha[h], 1] > 0) { : argument is of length zero
Calls: computePredictedValues -> predict -> predict.Hmsc -> predictLatentFactor

Collaborator

@gtikhonov gtikhonov commented Oct 23, 2019

Yes, this issue can have the same conceptual source - varying number of factors in different chains. Did it happen with recent versions?

Author

@rburner rburner commented Oct 23, 2019

Yes, it happened with an installation from ~16 October

Collaborator

@gtikhonov gtikhonov commented Oct 23, 2019

Well, at the moment I do not see any logical flaw that could lead to this situation, although that does not guarantee there is none. Just to check a few things:
0) you are using the latest GitHub master branch version of the package
1) in the call to the sampler, the aligning argument is set to true: sampleMcmc(..., alignPost=TRUE)
2) you actually refit the model, rather than just running post-processing on a model fitted with an older version of the code
3) if both 1 and 2 are satisfied, try running alignPosterior(hM) before your call to predict(...). If any of these calls throws an error, there is definitely some bug.

You can start with point 3) if refitting your models takes considerable time. If you can replicate this behaviour with any sharable code that I can rerun, that would be helpful.
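
To see whether the chains really ended up with different numbers of latent factors, a quick diagnostic can be sketched as follows. It assumes samples are stored as postList[[chain]][[sample]] and that Delta[[r]] has one row per latent factor, which is what the abind error message in this thread suggests but is otherwise an assumption. `countFactors`, `mockSample` and the mock object are hypothetical names for illustration, not part of Hmsc.

```r
# Hypothetical helper: number of latent factors per sample and chain
# for random level r, read from the row count of Delta[[r]].
countFactors <- function(postList, r = 1) {
  sapply(postList, function(chain) {
    sapply(chain, function(s) nrow(s$Delta[[r]]))
  })
}

# Mock posterior (an assumption, not real Hmsc output):
# 2 chains x 3 samples; chain 1 has 2 factors, chain 2 has 3.
mockSample <- function(nf) list(Delta = list(matrix(1, nrow = nf, ncol = 1)))
postList <- list(
  replicate(3, mockSample(2), simplify = FALSE),
  replicate(3, mockSample(3), simplify = FALSE)
)

nf <- countFactors(postList)  # samples in rows, chains in columns
print(nf)

# TRUE if the factor count is not the same everywhere
nfVaries <- length(unique(as.vector(nf))) > 1
```

On a real fitted model the same helper would be called as countFactors(hM$postList, r), provided the object follows the layout assumed above.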

Author

@rburner rburner commented Oct 23, 2019

Ok, I will work through that list and see what I can figure out. The first thing I notice is that I was using the CRAN version, so I will re-install from github.

I haven't included alignPost=TRUE when using sampleMcmc but can start...but TRUE is maybe the default?

Author

@rburner rburner commented Oct 23, 2019

Ok, I re-installed from github using (correct?):
install_github("hmsc-r/HMSC", build_opts = c("--no-resave-data", "--no-manual"),force=TRUE)

Then I ran the following code and still got the same error. This was on an HPC cluster, just FYI.

Maybe I can upload a model object for you to try?
Thanks!

#####################################
R version 3.5.3 (2019-03-11) -- "Great Truth"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

#call packages
library(usdm)
Loading required package: sp
Loading required package: raster
library(Hmsc)
Loading required package: coda
library(MASS)
library(rockchalk)
library(ape)
library(corrplot)
library(MCMCpack)

#set working directory
setwd("/ddnB/work/rburne4/HmscFiles")

#read in models
redHPC1=readRDS("t03.rds")

#make list for cross validation results
crossFit=list("a")

#set fitting parameters
nChains = 2
thin = 5
samples = 1000
transient = 400*thin
verbose = 150*thin

for (i in 1:length(redHPC1)) {
#run the model in parallel
redHPC1[[i]] = sampleMcmc(redHPC1[[i]], thin = thin, samples = samples,
transient = transient, nChains = nChains, nParallel = 2,
verbose = verbose, initPar = "fixed effects", alignPost = TRUE)

redHPC1[[i]]=alignPosterior(hM=redHPC1[[i]])

#save results
saveRDS(redHPC1, file = "t03FIT2.rds")

#do cross-validation
partition = createPartition(redHPC1[[i]], nfolds = 2, column = "Site")
preds = computePredictedValues(redHPC1[[i]],partition=partition,nParallel = 2)
crossFit[[i]] = evaluateModelFit(hM=redHPC1[[i]], predY=preds)

#save cross-validation results
saveRDS(crossFit, file = "t03CROSS2.rds")
}
[1] "Setting updater$Gamma2=FALSE due to specified phylogeny matrix"
[1] "Cross-validation, fold 1 out of 2"
[1] "Setting updater$Gamma2=FALSE due to specified phylogeny matrix"
Error in if (alphapw[alpha[h], 1] > 0) { : argument is of length zero
Calls: computePredictedValues -> predict -> predict.Hmsc -> predictLatentFactor
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

@gtikhonov gtikhonov added the bug label Oct 24, 2019
Collaborator

@gtikhonov gtikhonov commented Oct 24, 2019

I've fixed the minor bug that caused this misbehaviour. It was just due to setting some indices to 0 instead of 1. Perhaps I just had too much Python coding recently and it did not catch my eye immediately.
You probably do need to refit the model though, or directly manipulate the

m$postList[[chain_index]][[sample_index]]$Alpha[[spatial_latent_factor_index]]

objects - these are vectors, and every value of 0 must be replaced with 1. Then you can use the fitted object in postprocessing.
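
The manual patch described above could look roughly like the following. This is only a sketch on a mock object, assuming nothing beyond the postList[[chain]][[sample]]$Alpha[[r]] layout quoted in the comment. `fixAlpha` and `mockSample` are hypothetical names, not Hmsc functions.

```r
# Hypothetical helper: replace every 0 in Alpha[[k]] with 1 across all
# chains and samples, for the random level index or indices given in r.
fixAlpha <- function(postList, r) {
  for (chain in seq_along(postList)) {
    for (s in seq_along(postList[[chain]])) {
      for (k in r) {
        a <- postList[[chain]][[s]]$Alpha[[k]]
        a[a == 0] <- 1
        postList[[chain]][[s]]$Alpha[[k]] <- a
      }
    }
  }
  postList
}

# Mock posterior (an assumption, not real Hmsc output):
# 2 chains x 2 samples, one spatial random level.
mockSample <- function(alpha) list(Alpha = list(alpha))
postList <- list(
  list(mockSample(c(0, 3)), mockSample(c(2, 0))),
  list(mockSample(c(1, 1)), mockSample(c(0, 0)))
)

fixed <- fixAlpha(postList, r = 1)
```

On a fitted model the equivalent would be to pass m$postList with the index of the spatial random level and assign the result back to m$postList before postprocessing. Refitting with the corrected package version remains the safer route.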

Author

@rburner rburner commented Oct 24, 2019

Great, thank you for the quick response! Will give it a try.

@rburner rburner closed this Oct 24, 2019
@plthompson plthompson commented Dec 6, 2019

Hi Gleb,

I seem to be getting the same error when running multiple chains using the sampleMcmc function.

Error in abind(cpL[[j]]$Delta[[r]], array(1, DeltaAddDim), along = 1) :
arg 'X2' has dims=1, 2; but need dims=X, 1

I don't get it every time, but it happens most of the time. It seems to be an issue with joining the chains at the end, as the chains themselves appear to run fine. I haven't yet had an issue when running a single chain.

Here is my model structure:

m <- Hmsc(Y = Y,
XData = XData,
XFormula = ~ poly(temp, degree = 2, raw = TRUE) * dispersal,
studyDesign = studyDesign,
ranLevels = list(tank = rL, metacommunity = rL_meta),
distr = "poisson")

Temp is continuous.
Dispersal is a three level factor - none, low, high.
Tank as a random factor has 48 levels, metacommunity as a random factor has 12 levels.

I have not encountered this error when I run models with temp or dispersal on their own as fixed effects.

I am running the current GitHub version of Hmsc in R 3.6.1.

Collaborator

@gtikhonov gtikhonov commented Dec 6, 2019

Yes, it is a very annoying problem caused by a small utility that is called from sampleMcmc() just before you get the results back. It is on my to-do list and I will try to fix it as soon as I find enough time.
But you can always disable that utility - just use sampleMcmc(... , alignPost = FALSE). This should avoid the problem.

@plthompson plthompson commented Dec 6, 2019

Thanks for the quick reply and solution. I really appreciate the package.
