Cross-validation of CAR models cannot be done with stratified folds #637

Open
qdread opened this issue Apr 9, 2019 · 6 comments

Comments

@qdread

qdread commented Apr 9, 2019

Hi there Paul -- first off thanks so much for all your work! I appreciate the improvements to the cross-validation options but I came across an issue 😕

I am trying to implement "block" cross-validation of a CAR model by sequentially holding out multiple regions for each fold of the CV. However, this gives the error message: "Error: Cannot handle new locations in CAR models." I cannot determine from the traceback what is going on.

Here is a minimal example:

library(brms)
set.seed(1)

adjmat <- rbind('a'=c(0,1,0,0),
                'b'=c(1,0,1,0),
                'c'=c(0,1,0,1),
                'd'=c(0,0,1,0))
region <- sample(letters[1:4], 100, replace = TRUE)
x <- sample(1:10, 100, replace = TRUE)
y <- 2 * x + 5 + rnorm(100, 0, 2) + as.numeric(factor(region))

dat <- data.frame(region, x, y)

fit <- brm(formula = y ~ x, data = dat, family = 'gaussian', autocor = cor_car(adjmat, formula = ~ 1|region),
           chains = 2, iter = 500, warmup = 250)

# Random CV (does not throw error)
cv_random <- kfold(fit, K = 4, Ksub = as.array(1), chains = 2, iter = 500, warmup = 250, save_fits = TRUE, group = NULL, seed = 111)

# Blocked CV (throws error)
cv_blocked <- kfold(fit, K = 4, Ksub = as.array(1), chains = 2, iter = 500, warmup = 250, save_fits = TRUE, group = 'region', seed = 111)

# Also throws error if specifying stratified folds
cv_blocked <- kfold(fit, K = 4, Ksub = as.array(1), folds = 'stratified', chains = 2, iter = 500, warmup = 250, save_fits = TRUE, group = 'region', seed = 111)
@paul-buerkner
Owner

The problem is that CAR structures cannot handle new locations (regions in your case) that were not present when fitting the model but are present when predicting new data. Stratified folds should work if you have enough observations per region so that each region is present in every fold. I will take a closer look at your example to see what's wrong.
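
The condition described above can be checked before refitting anything. The following is a minimal base R sketch, not brms code: `new_regions_per_fold` is a hypothetical helper, and the fold assignments are built by hand rather than by `kfold()`, so they only approximate what brms does internally. It lists, for each fold, the regions that appear in the held-out rows but not in the training rows.

```r
# Same region vector as in the minimal example above.
set.seed(1)
region <- sample(letters[1:4], 100, replace = TRUE)

# Hypothetical helper (not part of brms): for each fold, return the regions
# that occur in the held-out rows but are absent from the training rows.
new_regions_per_fold <- function(region, fold_id) {
  lapply(split(seq_along(region), fold_id), function(test_rows) {
    setdiff(unique(region[test_rows]), unique(region[-test_rows]))
  })
}

# Stratified folds (hand-rolled): within each region, deal the observations
# out across K folds in random order.
K <- 4
fold_stratified <- ave(seq_along(region), region,
                       FUN = function(i) sample(rep_len(seq_len(K), length(i))))

# Grouped folds: each region is held out as a whole.
fold_grouped <- as.integer(factor(region))

new_stratified <- new_regions_per_fold(region, fold_stratified)
new_grouped <- new_regions_per_fold(region, fold_grouped)

# Number of unseen regions per fold under each scheme.
print(lengths(new_stratified))
print(lengths(new_grouped))
```

Any fold with a non-empty entry would require predictions for a region the refitted model has never seen, which is the situation the error message refers to. Holding out whole regions produces such an entry in every fold by construction.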

@qdread
Author

qdread commented Apr 9, 2019

Thanks for getting back to me! I had previously constructed the holdout datasets so that each region is present in every fold, but a reviewer argued that this does not correctly address spatial dependence and gives artificially low RMSE. So I wanted to address the comment by holding out entire regions one by one. I guess what you are saying is that the CAR structure inherently cannot support that?

@paul-buerkner
Owner

Not in the current implementation. I don't know enough about CAR models to tell you whether it is possible in general. So if you can point me to some ideas, I am happy to take a look and see if we can make this possible eventually.

@qdread
Author

qdread commented Apr 9, 2019

Thanks so much! The paper cited by the reviewer was this one by Roberts et al.: https://onlinelibrary.wiley.com/doi/full/10.1111/ecog.02881
I will try to track down some more ideas.

@qdread
Author

qdread commented Apr 16, 2019

An additional resource which describes prediction in CAR models is a paper by Ver Hoef et al. that came out last year in Ecological Monographs: https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecm.1283
The appendix of that paper has the algorithm used for prediction. There is also a GitHub repo that goes with the paper: https://github.com/jayverhoef/spAReco. I think their methods can be used to generate fitted values for the regions that are missing from the holdout datasets.
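
To make the idea of predicting at held-out regions concrete, here is a rough base R sketch. It does not reproduce the algorithm from the Ver Hoef et al. appendix or anything in brms. It uses the standard Gaussian Markov random field conditioning result and assumes a zero-mean proper CAR effect with precision matrix Q = tau * (D - alpha * W). The values of `tau`, `alpha`, and `phi_obs`, and the helper `predict_car_effects`, are made up for illustration.

```r
# Adjacency matrix from the minimal example (chain a - b - c - d).
adjmat <- rbind('a' = c(0, 1, 0, 0),
                'b' = c(1, 0, 1, 0),
                'c' = c(0, 1, 0, 1),
                'd' = c(0, 0, 1, 0))
colnames(adjmat) <- rownames(adjmat)

# Assumed proper-CAR precision matrix: Q = tau * (D - alpha * W),
# where D holds the number of neighbours of each region on its diagonal.
tau <- 2      # assumed precision parameter
alpha <- 0.9  # assumed spatial dependence parameter, |alpha| < 1
Q <- tau * (diag(rowSums(adjmat)) - alpha * adjmat)
dimnames(Q) <- dimnames(adjmat)

# Hypothetical helper: conditional mean and covariance of the spatial effects
# at unobserved regions given the effects at observed regions, using
#   mean = -solve(Q_nn) %*% Q_no %*% phi_obs,  cov = solve(Q_nn).
predict_car_effects <- function(Q, phi_obs) {
  obs <- names(phi_obs)
  new <- setdiff(rownames(Q), obs)
  Q_nn <- Q[new, new, drop = FALSE]
  Q_no <- Q[new, obs, drop = FALSE]
  cond_cov <- solve(Q_nn)
  cond_mean <- -cond_cov %*% Q_no %*% phi_obs
  list(mean = drop(cond_mean), cov = cond_cov)
}

# Made-up spatial effects for the three regions kept in the training data;
# region 'd' plays the role of the held-out region.
phi_obs <- c(a = -1, b = 0.5, c = 2)
pred_d <- predict_car_effects(Q, phi_obs)
print(pred_d)
```

Because 'd' has 'c' as its only neighbour, the conditional mean here reduces to alpha times the effect of 'c', with variance 1 / tau. In a Bayesian fit this calculation would presumably be repeated for every posterior draw of the spatial effects and hyperparameters, which is not shown.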

@paul-buerkner
Owner

Thanks! I will take a look, but I don't know when I will be able to implement any satisfying solution.
