sampleMcmc() error in chol.default(W) #8

keilastark · 2019-05-30T00:21:51Z

I am trying to fit a model with 4 random effects, one is spatial coordinates to be specified with the sData argument, and three are hierarchical (Quadrat, Site, and Region). The input for the spatial random effect object (sData) is a matrix with two columns (lat and long), and a row for each observation. The spatial coordinates are in conventional lat and long to four decimal places, as some points are as close as 15m apart. However each row has a distinct set of coordinates; I have tested this. The object "Pi" below describes my three hierarchical random effects.

spat <- data.frame(spat = sprintf('spatial_%.2d',1:78))
studyDesign <- cbind(Pi, spat)

rL1 = HmscRandomLevel(units = studyDesign$Quadrat)
rL2 = HmscRandomLevel(units = studyDesign$Site)
rL3 = HmscRandomLevel(units = studyDesign$Region)
rL4 = HmscRandomLevel(sData = coords[,2:3])

m <- Hmsc(Y = Y, XData = X, XFormula = ~eelgrass_lai +  dissox + salinity + ph + sstmean + nitrate + chlomean, studyDesign = studyDesign, ranLevels = list(Quadrat = rL1, Site = rL2, Region = rL3, spat = rL4))

When I use the sampleMcmc() function I get the error:

Error in chol.default(W) : the leading minor of order 2 is not positive definite

I do not see this error when I attempt to fit the exact same model but with no spatial random effect (rL4 not included). I have also tried swapping the order of the rL's when specifying my Hmsc object, but this doesn't change anything.

The text was updated successfully, but these errors were encountered:

gtikhonov · 2019-05-30T06:15:18Z

@keilast thanks for posting this here instead of the personal email.
Indeed, swapping the order of elements in the rL argument should have no essential effect on the results. However, now since you explicitly mentioned that

The spatial coordinates are in conventional lat and long to four decimal places, as some points are as close as 15m apart.

I guess that potential source of error you encounter can be simply due to the numerical precision of the spatial covariance matrix. Thus, since you have very close locations and quite-far-away locations (guessing based on that you have Region random effect), the covariance matrix for your spatial random effect is going to be close to singular, leading to the Cholesky decomposition failure.
As a fast way to test this hypothesis, I would suggest that you try to test whether the code runs if you drop some of your spatial locations, so that none of them are closer than X meters apart (e.g. you can try with X = 0.1% of your maximum distance between locations). If that is the reason, we will give some more detailed advice on how you can handle the whole data.
One more question that would help us to pinpoint the problem - does your code throw this error before you see the "Computing chain 1" output (or any other output from the sampleMcmc function) to your console or after?

keilastark · 2019-05-31T05:53:07Z

Yes, I suspected the same re: the precision of the spatial points. I have considered converting the coordinates to UTM, however they span two UTM zones, and I am not sure this will do anything to fix the problem.

Yes, the code throws the error instantaneously, before any "Computing chain" outputs.

Removing sites so the minimum cutoff distance is >10% of the maximum distance does not fix the problem either, yet works fine when I try to fit that reduced-site version of the model without the spatial rL.

Thank you!

gtikhonov · 2019-05-31T07:20:59Z

@keilast , so now I am almost 100% sure that your problem happens in the internal function computeDataParameters defined in computeDataParameters.R, which is called from the sampleMcmc method. Particularly the call to the chol function in line 68 of that file.

Given your description of the spatial data you input to the model fitting, there should be no problem, yet you still encounter it. Thus, I suggest that you try to execute that part of code outside of package in separate script, which would allow to explicitly check what and why goes wrong. So, just copy-paste the lines 47-75 from that file and try to run them after you have defined your model as you did earlier, but save it to hM variable:
hM <- Hmsc(Y = Y, XData = X, XFormula = ~eelgrass_lai + dissox + salinity + ph + sstmean + nitrate + chlomean, studyDesign = studyDesign, ranLevels = list(Quadrat = rL1, Site = rL2, Region = rL3, spat = rL4))

If you have the same error message, check the content of variables W, distance and alpha. Also check the ag variable and please tell its value. The W must be positive-definite matrix. If it is not and alpha is not NA, you should check the distance - this is Euclidian distance between the locations you've provided. There should be neither NA in this matrix, nor off-diagonal zero values. If there are NA - check the s variable, it has to be something wrong with the coordinates you provide (e.g. they are read as strings instead of numerical values)....

keilastark · 2019-06-21T02:35:52Z

Hello,

So I have gone and done as you've recommended, and the same error came up.

W = square matrix of mostly NA's, except for the diagonal elements which = 1
distance = same as W
ag = 2
alpha = 0.92
s = 2 columns of NA's

Since you mentioned the coordinates may not be read properly, I saved them as a matrix (previously was a dataframe) before naming it as sData in rL4, and ran the exact same code. Now this error comes up:

Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds

, but I checked hM$rL[[4]]$s give the correct spatial coordinates. I have tried transforming the coordinate data (ie. multiplying by 10 000) but that did not change the outcome.

Thanks

gtikhonov · 2019-06-21T11:09:16Z

Hello,
Thanks for the testing you've conducted!

distance = same as W

This clearly indicates that there is some issue with imputation of the coordinates. I believe that the reason is due to reading-in some non we almost figured out the issue.

I saved them as a matrix (previously was a dataframe) before naming it as sData in rL4

Are you sure that your matrix consists of numbers and not strings and its rownames are the same values as you plug to the spat <- data.frame(spat = sprintf('spatial_%.2d',1:78))
E.g. that your matrix look like this:
matrix(1:10,5,2,dimnames=list(sprintf('spatial_%.2d',1:5),NULL)),
and not like this
matrix(as.character(1:10),5,2,dimnames=list(sprintf('spatial_%.2d',1:5),NULL))
or this
matrix(1:10,5,2)

Thanks for co-operation!

keilastark · 2019-07-08T22:44:13Z

Hello, sorry for the delayed response on this!

A colleague tinkered with my data and the function ran successfully:

spatial$longitude <- spatial$longitude-min(spatial$longitude)
studyDesign$Spatial <- factor(studyDesign$Quadrat)

#then establish random levels
rL1 = HmscRandomLevel(units = studyDesign$Quadrat)
rL2 = HmscRandomLevel(units = studyDesign$Site)
rL3 = HmscRandomLevel(units = studyDesign$Region)
rL4= HmscRandomLevel(sData = spatial)

Cheers!

LimaRAF · 2019-12-06T13:19:29Z

Dear @gtikhonov ,

I am having a similar problem trying to sampleMcmc a model 2 random effects, one that was the spatial coordinates and the other a regional categorical variable. Since I have >1000 community samples, the spatial method set to GPP, using about 117 knots. It is a hurdle model, so I am first fitting the presence/absence data and then species abundances (individuals ha-1 from 0.001 until ~2000) and relative densities (0-1 values). I log-transformed abundances and relative densities to use distr = "normal".

All my coordinates are in meters and I have jittered coordinates that were too close to each other to avoid problems related to the precision of the spatial coordinates. However, I still have distances between units that vary from 5 meters to 3e+06 meters. Anyways, not sure if this spatial precision is indeed an issue while using GPP.

Notably, the error only appeared when I increased the thin x samples combination and only for the abundance part of the model. The error appears error by the end of chain 1, sometimes of chain 2:
Error in chol.default(iU) :
the leading minor of order 19 is not positive definite

I run the tests you suggested for keilast here are the results for the computeDataParameters part of the sampleMCMC function. Here are the results:
alpha = 4257548
ag = 101
W is positive-definite and has no NAs
'distance' has no NAs but it is not positive-definite.
's' a matrix with 2 columns of numbers and no NAs, with upper off-diagonal = 8916294 and lower off-diagonal = 5766795 9000554.

In conclusion, I think the problem is probably in the small distance between some of my sites. But there may be a possible interaction between these distances and data type (problem not arises with presence/absence data, only abundances).

Any clues? You mentioned some that you had more detailed advice on how you can handle the whole data. I have spent the entire morning truing to solve this issue, so any help would be very welcome.

Thanks in advance,
Renato

gtikhonov · 2019-12-06T14:33:52Z

Hi @LimaRAF
Seems that your problem is quite different from the one above. And given your description of your spatial design, the numerical precision uses are very plausible (which were not the problem in the example above).

Generally, I would recommend to reconsider how you input the spatial effect to the study. However, the way you do that depends on what you want to capture with the spatial effect. One option is instead of having a spatial effect on the level of samples, you can have it on the level of "clusters of samples". and then extra no-spatial effect on the level of samples. Other option is to have a single spatial latent effect, but restrict its spatial range to somewhat small (although GPP is a poor choice for this one). The first option aims to capture long-range spatial correlation, the second - short range. Well, you can have both as well. But the key problem here is that it causes numerical instabilities if you do it in the straightforward way, so here is the idea of separating it in two levels.

LimaRAF · 2019-12-06T15:03:24Z

Dear @gtikhonov ,

Thanks for the quick reply and sorry for posting in an different issue type (do you want me to post it separatedly for the record?)

However, I am not sure I fully understood your suggestion. You are proposing to nest my samples in my regional categorical variable?

The region has a big impact on species distribution, but setting it as a fixed effects led to predictions on parts of the regions where the species actually does not occurrence or creating very sharp differences in the prediction in the transitions/contact of these regions. So the solution was to use it as an extra random effect, hoping to still capture the effect but without creating such 'weird' predictions.

Currently, I am defining my study design and random effects like this:
studyDesign = data.frame(site = as.factor(sites$SiteCode), ecoreg = as.factor(sites$ecoreg.af))
where 'SiteCode' are my samples and 'ecoreg.af' my biogeographical regions. 'sites' is my data frame

My Sdata:
xy = as.matrix(cbind(sites$x, sites$y)) # x and y are now in UTM (meters)
rownames(xy) = studyDesign[,1]

Defining the random effects and setting the spatial model:
Knots = constructKnots(xy, nKnots = 60, minKnotDist = 10000)
rL.gpp = HmscRandomLevel(sData = xy, sMethod = "GPP", sKnot = Knots)
rL.region = HmscRandomLevel(units = levels(studyDesign$ecoreg))

Are you suggesting to do:
rownames(xy) = studyDesign[,2]

and then drop rL.region from my the 'ranLevels'of my model?

Thanks again,
Renato

gtikhonov · 2019-12-06T15:26:30Z

I have opened and will reply in a separate issue.

aeiche01 · 2020-09-19T20:19:04Z

For people reading this in the future: I had the same issue as keilast Error in chol.default(W) : the leading minor of order 2 is not positive definite and then ran into the same subsequent error once I put the data into a matrix: Error in hM$rL[[r]]$s[levels(hM$dfPi[, r]), ] : subscript out of bounds

I was able to fix this error by making sure that the rownames of the latlong matrix matched the spatial plots/routes/whatever the coordinates are supposed to mark in the studyDesign data frame.

My code looks like this: rownames(xy)=studyDesign$routes

gtikhonov added bug Something isn't working algorithm Issues with the algorithm implementation labels May 30, 2019

keilastark closed this as completed Jul 8, 2019

gtikhonov mentioned this issue Dec 6, 2019

Numerical instability in a very uneven spatial study design #33

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sampleMcmc() error in chol.default(W) #8

sampleMcmc() error in chol.default(W) #8

keilastark commented May 30, 2019

gtikhonov commented May 30, 2019 •

edited

Loading

keilastark commented May 31, 2019

gtikhonov commented May 31, 2019 •

edited

Loading

keilastark commented Jun 21, 2019

gtikhonov commented Jun 21, 2019

keilastark commented Jul 8, 2019 •

edited

Loading

LimaRAF commented Dec 6, 2019

gtikhonov commented Dec 6, 2019 •

edited

Loading

LimaRAF commented Dec 6, 2019 •

edited

Loading

gtikhonov commented Dec 6, 2019

aeiche01 commented Sep 19, 2020 •

edited

Loading

sampleMcmc() error in chol.default(W) #8

sampleMcmc() error in chol.default(W) #8

Comments

keilastark commented May 30, 2019

gtikhonov commented May 30, 2019 • edited Loading

keilastark commented May 31, 2019

gtikhonov commented May 31, 2019 • edited Loading

keilastark commented Jun 21, 2019

gtikhonov commented Jun 21, 2019

keilastark commented Jul 8, 2019 • edited Loading

LimaRAF commented Dec 6, 2019

gtikhonov commented Dec 6, 2019 • edited Loading

LimaRAF commented Dec 6, 2019 • edited Loading

gtikhonov commented Dec 6, 2019

aeiche01 commented Sep 19, 2020 • edited Loading

gtikhonov commented May 30, 2019 •

edited

Loading

gtikhonov commented May 31, 2019 •

edited

Loading

keilastark commented Jul 8, 2019 •

edited

Loading

gtikhonov commented Dec 6, 2019 •

edited

Loading

LimaRAF commented Dec 6, 2019 •

edited

Loading

aeiche01 commented Sep 19, 2020 •

edited

Loading