
error for predict along a gradient with NNGP #96

Open
FabianRoger opened this issue Apr 29, 2021 · 14 comments
Labels
bug Something isn't working

Comments

@FabianRoger

I fitted a model with a spatial random factor with the 'NNGP' method. Now I tried to predict with this model along a gradient.

My code is

Gradient = constructGradient(models$m1, focalVariable = "Granskog")
predY = predict(models$m1, Gradient = Gradient, expected = TRUE)

This gives me the error

    Error in get.knnx(data, query, k, algorithm) : 
    Number of columns must be same!.

Googling the error message brings me here, but I am not sure if this is what's going on internally. If necessary, I can try to provide a reprex later.

Thanks!

@jarioksa
Collaborator

I can reproduce this in an NNGP model. I'll have a look at this.

@jarioksa
Collaborator

This is now fixed in commit ee1367a. Please try the GitHub version of Hmsc to see if it cures your problem (the cause was sloppy coding).

@jarioksa jarioksa added the bug Something isn't working label Apr 29, 2021
@FabianRoger
Author

FabianRoger commented Apr 29, 2021

thanks!

I tried it with the updated version and the original error seems to be fixed. However, I now get an error saying

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': the leading minor of order 1 is not positive definite

Also, weirdly, when testing the same code on another model that is identical to m1 except that it uses abundance data conditional on presence (the second part of the hurdle model), the original error still occurs.

@jarioksa
Collaborator

This error does not come directly from Hmsc, but from some other package or R function that we use. After getting the error, run traceback() and post the output.

@FabianRoger
Author

This is what I get:

traceback()
9: h(simpleError(msg, call))
8: .handleSimpleError(function (cond) 
   .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
   base::quote(chol.default(W)))
7: chol.default(W)
6: chol(W)
5: t(chol(W))
4: predictLatentFactor(unitsPred = levels(dfPiNew[, r]), units = levels(object$dfPi[, 
   r]), postEta = postEta, postAlpha = postAlpha, rL = rL[[r]], 
   predictMean = predictEtaMean, predictMeanField = predictEtaMeanField)
3: predict.Hmsc(models$m1, Gradient = Gradient, expected = TRUE)
2: predict(models$m1, Gradient = Gradient, expected = TRUE)
1: predict(models$m1, Gradient = Gradient, expected = TRUE)

@ovaskain
Collaborator

ovaskain commented Apr 30, 2021 via email

@jarioksa
Collaborator

What bugs me about this error is that you should never reach the command t(chol(W)): it is executed only in full spatial models, never in NNGP models. Now I really need a reproducible example to see how you can get to this line in NNGP models.

@jarioksa
Collaborator

jarioksa commented Apr 30, 2021

For anybody trying to inspect the issue, here is how I generated the reproducible test case:

m <- TD$m
rL1 <- TD$rL1
rL1 <- update(rL1, sMethod="NNGP", nNeighbours=4)
m <- update(m, ranLevels = list(sample = TD$rL2, plot = rL1))
m <- sampleMcmc(m, samples=100, thin=1, transient=50, nChains=2, nParallel=2)
Gradient <- constructGradient(m, focalVariable = "x2")
predY <- predict(m, Gradient = Gradient, expected = TRUE) # fails before ee1367a

Can you use something similar to reproduce your problem?

@FabianRoger
Author

FabianRoger commented Apr 30, 2021

Re: distance between points:

It shouldn't be, no. The coordinates are centroids, about 20 km apart from each other. I checked, and the minimum distance between any two points is 26 km.

@FabianRoger
Author

Weirdly enough, the reprex above runs without error for me.

@fkeppeler

Are there any updates about this issue?
I'm having similar problems when using both GPP and NNGP models.

traceback()
8: h(simpleError(msg, call))
7: .handleSimpleError(function (cond) 
   .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
   base::quote(chol.default(W)))
6: chol.default(W)
5: chol(W)
4: t(chol(W))
3: predictLatentFactor(unitsPred = levels(dfPiNew[, r]), units = levels(object$dfPi[, 
   r]), postEta = postEta, postAlpha = postAlpha, rL = rL[[r]], 
   predictMean = predictEtaMean, predictMeanField = predictEtaMeanField)
2: predict.Hmsc(m2, Gradient = Gradient)
1: predict(m2, Gradient = Gradient)

@cgoetsch

I am also getting this error for a non-spatial model, although I have a temporal effect of year. I am just trying to create the gradient plots by covariate, not make spatial predictions - those are working fine.

studyDesign = data.frame(
  tow = as.factor(da$UniqueID),
  grid_id = as.factor(da$grid_id),
  year = as.factor(da$YEAR))

#set up the random effect of sample level to get species co-occurrence at the sample level
rL.tow = HmscRandomLevel(units = studyDesign$tow)
#set up the temporal random effect - temporally explicit for temporal autocorrelation
rL.time = HmscRandomLevel(sData=time_da) 

m = Hmsc(Y = Y, XData = X, XFormula = XFormula.1.d,
         distr = "probit", studyDesign = studyDesign,
         ranLevels = list("tow" = rL.tow, "year" = rL.time))

Gradient_cov = constructGradient(m, focalVariable = "rugosity")
predY_cov = predict(m, Gradient=Gradient_cov, expected = TRUE)

traceback()
9: h(simpleError(msg, call))
8: .handleSimpleError(function (cond) 
   .Internal(C_tryCatchHelper(addr, 1L, cond)), "the leading minor of order 1 is not positive definite", 
   base::quote(chol.default(W)))
7: chol.default(W)
6: chol(W)
5: t(chol(W))
4: predictLatentFactor(unitsPred = levels(dfPiNew[, r]), units = levels(object$dfPi[, 
   r]), postEta = postEta, postAlpha = postAlpha, rL = rL[[r]], 
   predictMean = predictEtaMean, predictMeanField = predictEtaMeanField)
3: predict.Hmsc(m, Gradient = Gradient_cov, expected = TRUE)
2: predict(m, Gradient = Gradient_cov, expected = TRUE)
1: predict(m, Gradient = Gradient_cov, expected = TRUE)

@Yo-B

Yo-B commented Nov 26, 2021

I have been having the same error with my temporal random effect. I cannot make predictions if I define it with the argument sData when training the model. I also have a spatial random effect that works fine with NNGP or GPP, and removing it does not help in passing my temporal factor as sData. If I pass the temporal factor as units, i.e. not as a longitudinal factor, it works, but I'm surely losing some information by doing this. I attach below a minimal reproducible example that triggers the same error. It is quite big, yet minimal enough I hope. Any suggestion on how I should define my longitudinal random effects here? BTW thanks guys for your awesome work!

library("dplyr")
library("Hmsc")

## data dimensions
n_cov <- 10
n_sample <- 100
n_species <- 20
n_trait <- 5

## generate data set with species occurrence, covariates, spatiotemporal coordinates and traits
XData <- data.frame(matrix(rnorm(n_sample*n_cov), ncol=n_cov))
colnames(XData) <- c("x", "y", paste0("var_", 1:(n_cov-2)))
XData$year <- sample(2000:2010, n_sample, replace=TRUE)

YData <- data.frame(matrix(rnorm(n_sample*n_species), ncol=n_species)) %>% 
  mutate_all(function(x) (x>.75)*1)
colnames(YData) <- paste0("sp_", 1:n_species)
TrData <- data.frame(matrix(rnorm(n_trait*n_species), ncol=n_trait))
colnames(TrData) <- paste0("tr_", 1:n_trait)
rownames(TrData) <- paste0("sp_", 1:n_species)

## subset the data set in train/test sets
train <- sample(n_sample, n_sample*.6)
designTrain <- XData[train, c("year", "x", "y")]
xTrain <- XData[train, paste0("var_", 1:(n_cov-2))]
yTrain <- YData[train, ]

## study design
studyDesign <- data.frame(year=factor(designTrain$year),
                          xy=factor(paste(designTrain$x, designTrain$y, sep="_")))
rownames(studyDesign) <- rownames(yTrain) <- rownames(xTrain) <- 1:nrow(yTrain)

## formulas
f_cov <- as.formula(paste0("~", paste(paste0("var_", 1:(n_cov-2)), collapse = "+")))
f_tra <- as.formula(paste0("~", paste(colnames(TrData), collapse = "+")))

## prepare temporal random effect
time <- data.frame(year=sort(unique(designTrain$year)))
rownames(time) <- time$year
## prepare spatial random effect
space <- data.frame(unique(designTrain[, c("x", "y")]))
rownames(space) <- paste(space$x, space$y, sep="_")
xyknots <- constructKnots(sData=space, knotDist = 1)
plot(space, asp=1) ; points(xyknots, col="red")
## define random effects
rL3 <- HmscRandomLevel(sData = time)
rL4 <- HmscRandomLevel(sData = space, sMethod = "GPP", sKnot = xyknots)

## Define the model
(m1 <- Hmsc(Y = yTrain,
            XData = xTrain, 
            studyDesign = studyDesign,
            TrData = TrData,
            TrFormula =  f_tra, 
            XFormula = f_cov,
            ranLevels = list(year=rL3, xy=rL4),
            distr = "probit"))

## train the model
m1 <- sampleMcmc(m1, thin=2, samples=100, nParallel=3, transient=10, nChains=3, verbose=10)

## predict on test and train data sets
designTest <- XData[!(1:n_sample %in% train), c("year", "x", "y")]
xTest <- XData[!(1:n_sample %in% train), paste0("var_", 1:(n_cov-2))]
yTest <- YData[!(1:n_sample %in% train), ]

## Try to make predictions
## (commenting out the `year` element below and removing rL3 from the model
## definition above will allow predictions)
Gradient <- prepareGradient(m1, XDataNew = data.frame(rbind(xTest, xTrain)), 
                            sDataNew = list(
                              year=data.frame(year=rbind(designTest, designTrain)$year),
                              xy=data.frame(rbind(designTest, designTrain)[, c('x','y')])))
pred_occ <- predict(m1, Gradient = Gradient) #### ERROR will occur here



@jarioksa
Collaborator

jarioksa commented Dec 2, 2021

@Yo-B This is one of the issues I have been studying recently: thank you for supplying a reproducible example.

I have an experimental, dirty and noisy fix for this, discussed in issue #123. You can try it by installing the experimental branch with the command devtools::install_github("hmsc-r/HMSC", ref="updater-errors"). It is experimental because I have not thoroughly tested it; it is dirty because I have not thoroughly thought it over; and it is noisy because it prints a large number of error messages that can be ignored, and you will get a result despite these messages.

The reason for failure in several of these cases seems to be that the Gradient values duplicate data, and therefore you have zero distances between Gradient points and data (training) points. This in turn gives you a zero W matrix, and a matrix of zeros is not positive definite. The purpose of this calculation is to get the standard error for a random normal variate; that standard error is distance-dependent, and zero when distance is zero.
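This failure mode can be illustrated with base R alone (an illustrative sketch only, not Hmsc's internal code): when a Gradient point coincides with a training point, the distance-dependent prediction variance vanishes, so the matrix passed to the Cholesky factorization is all zeros, and chol() rejects it with exactly the error quoted in the tracebacks above.

```r
## Sketch, not Hmsc internals: duplicated coordinates give zero distances,
## which here collapse the conditional variance matrix W to all zeros.
coords <- rbind(c(0, 0), c(0, 0))         # a Gradient point duplicating a data point
D <- as.matrix(dist(coords))              # all pairwise distances are zero
W <- matrix(0, nrow(D), ncol(D))          # degenerate (all-zero) variance matrix
msg <- tryCatch(t(chol(W)), error = conditionMessage)
msg
## "the leading minor of order 1 is not positive definite"
```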

The risk of duplicating data points is particularly high for temporal random variables which, as in your case, are integer years, so the generated Gradient values will duplicate data values.

This fix may also work for the cases of @cgoetsch, @fkeppeler and @FabianRoger, but I haven't tested it with their data. Remember: you will get a lot of error messages, but they will not stop the analysis, and you will still get a result.
