number of samples for cross-validation #85

Closed
FabianRoger opened this issue Mar 3, 2021 · 5 comments

@FabianRoger

Hi,

I am trying to evaluate the model fit by two-fold cross-validation using the computePredictedValues function. However, I am confused by what I should specify for the sampling. I looked in both the documentation and the book but didn't find an explanation of how the models are fitted to each fold. Sorry if I missed it!

The computePredictedValues function has the parameters start, thin, and mcmcStep, each of which defaults to 1. Does that mean the model would be fitted with a single iteration? That seems implausible. Or do I need to specify the sampling to match the sampling from the model fitting?

For the model I fitted I used the following parameters:

nChains = 10
samples = 200
thin = 200
transient = 0.5*thin*samples

Sorry if I am missing the obvious.

PS: I also get the error

keeping only two first columns of 'distr' matrix
setting updater$GammaEta=FALSE: not implemented for spatial methods 'GPP' and 'NNGP'

(I set the updater to FALSE because it was suggested to me that this would speed up the MCMC sampling significantly. I didn't get an error during the initial model fitting.)

Is that anything to worry about?

Thank you for your help!

@jarioksa
Collaborator

jarioksa commented Mar 3, 2021

Neither of those is an error. A characteristic feature of an error is that the message starts with the word Error: and the execution of the code stops at that point.

Both are informative messages. The first implies that you had an old model fitted with an earlier version of Hmsc (earlier than 3.0-9), which had two unused columns in the distribution matrix. We now ignore those two unused columns. They really were unused earlier, too, but if you imagined that you could put something useful there, you are informed that it does not work and never worked.

The second tells you that one of the updaters (GammaEta) will not be used for your model, because your model uses a spatial method that this updater does not support.

@FabianRoger
Author

Sorry about that, I meant warning. Thanks for the explanation.

@jarioksa
Collaborator

jarioksa commented Mar 3, 2021

See issue #86 that touches the same problem.

@jarioksa
Collaborator

jarioksa commented Mar 3, 2021

@FabianRoger computePredictedValues computes predictions for all posterior samples, but you can skip some by setting thin and start. You then get smaller prediction arrays (all sampling units, all species, but fewer posterior samples). MCMC sampling is only performed with cross-validated predictions, but there we use the same parameters as in the original Hmsc models (samples, thin, transient). The argument mcmcStep is used to update random effects in conditional models: see the help pages for predict.Hmsc in addition to computePredictedValues.
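
A minimal sketch of the calls described above, under some assumptions: the small fitted example model TD$m that ships with Hmsc stands in for your own fitted model, nfolds = 2 mirrors the two-fold setup in the question, and thin = 2 is an arbitrary value chosen only to show that fewer posterior samples are returned. It is an illustration, not a recommended workflow.

```r
library(Hmsc)

# Assumption: TD$m, a small fitted example model bundled with Hmsc,
# stands in for your own fitted model.
m <- TD$m

# Predictions from the already-fitted model: no new MCMC sampling.
# start and thin only select which stored posterior samples are used.
predsAll  <- computePredictedValues(m)
predsThin <- computePredictedValues(m, start = 1, thin = 2)
dim(predsAll)   # sampling units x species x posterior samples
dim(predsThin)  # same first two dimensions, fewer posterior samples

# Two-fold cross-validation: the model is refitted for each fold, using
# the samples, thin and transient stored in the fitted model.
set.seed(1)  # the partition is random; the seed makes it repeatable
partition <- createPartition(m, nfolds = 2)
predsCV <- computePredictedValues(m, partition = partition)

# Compare explanatory and cross-validated predictive performance.
fitAll <- evaluateModelFit(hM = m, predY = predsAll)
fitCV  <- evaluateModelFit(hM = m, predY = predsCV)
str(fitCV)
```

Because each fold is refitted with the original sampling settings, the cross-validated call can take roughly as long per fold as the original model fit, whereas the first two calls only reuse the stored posterior.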

@FabianRoger
Author

MCMC sampling is only performed with cross-validated predictions, but there we use the same parameters as in the original Hmsc models

Thanks! This answers my question.
