Covariate-dependent latent variables #31

aminorberg · 2019-11-28T08:51:30Z

Would any of the developers happen to have a full working example of the usage of the covariate-dependent latent variables?

gtikhonov · 2019-11-28T10:45:47Z

Hei @aminorberg
The precise answer to your question very much depends on what you refer to as a full working example. As far as I am aware of, the functionality from the original methodological paper is properly implemented and anything not working with of that shall be treated as bug.
On the other hand, there are currently a lot of post processing utilities in the package, which bring the raw results of the MCMC model fitting closer to ecological interpretation. These are not guaranteed to work fine with variable latent loadings, both due to

the fact that often it is improper to interprete them in similar manner as the constant latent loadings and the underlying math is yet to be designed,
in some of those cases, when the interpretation is transferrable, the current implementation is deficient due to limited robustness caused by iterative development process.

In personal communication, @ovaskain claimed that they encountered multiple challenges with this class of models, both in terms of sensitivity of prior assumptions and/or sophisticated exploration of the posterior typical set using available MCMC algorithm.

aminorberg · 2019-11-28T12:41:16Z

Thanks @gtikhonov!

By working example I mean e.g. an example using the data provided with the package, since I can't even get that working, and I think it's just that I don't know the syntax. This is what I tried:

studyDesign <- data.frame(sample = as.factor(1:50), 
                          plot = as.factor(sample(1:20, 50, replace = TRUE)))
rL2 <- HmscRandomLevel(units = TD$studyDesign$plot)
rL1 <- HmscRandomLevel(xData = data.frame(x1 = rep(1, length(TD$X$x1)), 
                                          x2 = TD$X$x2))

m <- Hmsc(Y = TD$Y, XData = TD$X, 
          XFormula = ~x1+x2, 
          studyDesign = studyDesign, 
          ranLevels = list("sample" = rL1, "plot" = rL2))

ps <- sampleMcmc(m, samples = 1000)

And when attempting to sampleMcmc, I get an error:

Error in (Eta[[r]][Pi[, r], ] * rL[[r]]$x[as.character(dfPi[, r]), k]) %*%  : 
  non-conformable arguments
In addition: Warning message:
In Ops.factor(Eta[[r]][Pi[, r], ], rL[[r]]$x[as.character(dfPi[,  :
  ‘*’ not meaningful for factors

gtikhonov · 2019-11-28T12:43:49Z

Looks like a bug. Will have a close look at this.

jarioksa · 2019-11-28T17:22:36Z

Looks like this comes from updateZ in loop of lines 22 to 30, and exactly there in line 28:

   for(r in seq_len(nr)){
      if(rL[[r]]$xDim == 0){
         ...
      } else{
         LRan[[r]] = matrix(0,ny,ns)
         for(k in 1:rL[[r]]$xDim)
            LRan[[r]] = LRan[[r]] + (Eta[[r]][Pi[,r],]*rL[[r]]$x[as.character(dfPi[,r]),k]) %*% Lambda[[r]][,,k]
      }
   }

and basically the error is caused by this statement with innermost loop at k == 2

 rL[[r]]$x[as.character(dfPi[,r]),k]
 [1] o o o o o o o o o o o o o o o o o o o o o o o o o c c c c c c c c c c c c c
[39] c c c c c c c c c c c c
Levels: c o

which is a factor that is not meaningful in multiplication (like the warning said), and when used within

Eta[[r]][Pi[, r], ] * rL[[r]]$x[as.character(dfPi[,r]),k]

The result drops dimensions (50,2) and returns a vector of 100 NA values and this fails in the following matrix multiplication (%*%) with the error of non-conformable arguments. So it is about handling factor variables.

gtikhonov · 2019-11-28T17:47:07Z

Thanks, @jarioksa, I was going to mark the same thing. We have not yet implemented the XFormula type of data specification for this feature. However, there is one more issue - namely in the alignPosterior() function, which is be default called after the MCMC is done. I have not fixed it yet, but the following code is a fully operating example of model fitting for covariate-dependent associations:

library(Hmsc)
studyDesign <- data.frame(sample = as.factor(1:50),
                          plot = as.factor(sample(1:20, 50, replace = TRUE)))
rL2 <- HmscRandomLevel(units = TD$studyDesign$plot)
xData = data.frame(x1=rep(1, length(TD$X$x1)), x2=as.numeric(TD$X$x2=="c"))
rL1 <- HmscRandomLevel(xData = xData)
m <- Hmsc(Y = TD$Y,XData = TD$X,
          XFormula = ~x1+x2,
          studyDesign = studyDesign,
          ranLevels = list("sample" = rL1, "plot" = rL2))
ps <- sampleMcmc(m, samples = 1000, alignPost=FALSE)

jarioksa · 2019-11-28T17:58:53Z

@gtikhonov : It works in this case where x2 is a two-level factor which can be turned into one contrast variable. A more general solution is:

model.matrix(reformulate(names(TD$X[, -1])), TD$X)

model.matrix generates an (Intercept) which already is in TD$X and one of the replicated intercepts can be removed – this is one way, another way is to just drop the first column of the resulting model matrix. This solution will work when there is any number of factors with any number of levels. The generation of the model matrix could be incorporated in updateZ to allow flexible use of factor variables.

oysteiop · 2019-11-28T21:25:24Z

Hi, I have also been exploring this recently. Despite the minor bugs the models seem to fit. Regarding how to compute the Lambdas per site as a function of the covariates, I have tried (continuing from Gleb's working example above):

#Covariates for plot-level latent variable
xmat = as.matrix(m$rL[[1]]$x)
dim(xmat)

#Residual correlations given observed covariates
OmegaCor = getPostEstimate(ps, "OmegaCor", r=1, x=xmat)$mean
OmegaCor

#Residual correlations given observed covariates for first site
OmegaCor = getPostEstimate(ps, "OmegaCor", r=1, x=xmat[1,])$mean
OmegaCor

gtikhonov · 2019-11-28T21:44:21Z

@oysteiop, this does not look like a valid approach:

you repeat l variable (small L) both for the loop index and accumulated sum. At least my copy of your code indicates this, maybe you meant I and l. Such repetition would not bring you any good.
you shall not sum over the l1 at all, and your output should have dimension [2,4,50] ([nf,ns,ny]).

The package tensorA features a function that can replace the whole loop with single line. Additionally, the Hmsc function getPostEstimate features parameter x that controls the condition for association matrix if you set parName="Omega".

oysteiop · 2019-11-28T21:59:16Z

Thanks for spotting this typo. I have edited my comment above to you much simpler solution.

jarioksa · 2019-11-29T08:00:58Z

I think you are on a dangerous path: these tricks work when you study a case with a two-level factor where the internal numeric coding will give the correct contrast. However, this approach fails when you have a factor with three levels, where the internal numeric coding just changes the factor into an arbitrary continuous variable. Most dangerously, it will fail silently: no error messages, but wrong results. You must study the model matrix which changes a three-level factor into two contrast variables (and p-level factor into p-1 contrast variables). Solving a margin case will bring trouble and grievance later in life.

aminorberg · 2019-11-29T08:42:31Z

Hmm, @gtikhonov, running your example results in a new error:
Error in lambda * array(rep(rL[[r]]$x[unLdfPi[q], ], each = nf * ns), :
non-numeric argument to binary operator

gtikhonov · 2019-11-29T16:26:37Z

@jarioksa , I am not sure whom did you address, but here are some thoughts of mine regarding this aspect. To my mind, any of the package's internal functionality on for simplifying user's life when specifying the covariate matrix (e.g. XData + XFormula way) are potentially unsafe, since a user can easily define something very different that he/she desires without noticing it. However, I also do fully recognise that it is often nuch more gandy in this way + Otso had some ideas on using this formula-empowered way for more informative postprocessing (not implemented yet).

gtikhonov · 2019-12-01T17:53:24Z

@aminorberg is the problem still actual? As far as I understood from the responses by other collaborators on this branch, for them that example code was executing without issues...

aminorberg · 2019-12-02T11:13:49Z

@gtikhonov I reinstalled Hmsc (master version), and now your example works without errors. So only the CRAN version gives the error. Thank you!

aminorberg · 2019-12-09T09:36:24Z

@gtikhonov my problems with cov-dep latents variables continued as I tried to compute predicted values:

cv_partition <- createPartition(ps, nfolds = 2, column = "sample")
cv_preds <- computePredictedValues(ps, partition = cv_partition, expected = FALSE)

Error in LambdaPostMean[k, ] : incorrect number of dimensions

Related to the alignPost = FALSE ?

jarioksa · 2019-12-09T12:06:43Z

Immediate reason for this is that in your example, internal item LambdaPostMean is a 3-dim array, and too few indices are given when it is referred: It should be referred as LambdaPostMean[k, , ] – that is, one comma more since three indices are needed.

aminorberg · 2020-01-22T17:47:36Z

@gtikhonov Is there any chance that the predict function would be updated for cov-dep latent variables some time soon?

rburner · 2020-01-27T14:07:15Z

Hi @aminorberg - I've been following this discussion and also have a few projects where I'm hoping to use the covariate-dependent association features.

Am I correct in understanding that covariate(s) must be categorical?

Fabiogeography · 2022-06-11T17:42:40Z

Hi, I'm still struggling to get the covariate-dependent functionality to work.
The code that @gtikhonov provoided on 28th Jan 2019 makes a model, but computeAssociations fails:

Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent

This seems to be because in the postList object created by poolMcmcChains in computeAssociations, the element of Lambda that corresponds to the covariate-dependent random level (rL1) has two species * species matrices (see also @jarioksa comment on 9th Dec 2019):

postList[[1]]$Lambda

[[1]]
, , 1

[,1] [,2] [,3] [,4]
[1,] 0.0262100640 0.0186711924 5.804635e-02 0.1077498535
[2,] -0.0136396407 -0.0261227143 -5.455211e-03 -0.0122868940
[3,] 0.0013262262 0.0002285267 6.567342e-03 -0.0011980602
[4,] -0.0004133883 -0.0007863265 -3.854753e-05 0.0004773631

, , 2

[,1] [,2] [,3] [,4]
[1,] -0.2373420188 -0.0828967318 -0.066711818 -0.1553379441
[2,] -0.0478642459 0.0118648434 -0.032209240 -0.0578322325
[3,] 0.0001250210 0.0081198502 -0.003553992 0.0140523202
[4,] -0.0002287827 -0.0001740797 0.001141877 0.0001855759

[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.20153787 -0.10386425 -0.08746174 -0.04134730
[2,] 0.04805722 0.01782121 0.03143461 0.02981933

I can update computeAssociations to work around this, but I'm not certain how to use these two matrices to understand how the species associations are affected by the environmental variable. Am I right in thinking that the first matrix is the species associations independent of the environmental variable and the second matrix is the species associations dependent on the selected environmental variable?

Presumably this is explained in the Matlab code for the 2017 paper, but that's not accessible to me.

convertToCodaObject and the predict functions etc... also fails for the same reason I think. Again, I can fix for my data, but would need to understand what the two Lambda matrices are.

Thanks for any light you're able to shed!

gtikhonov added the bug Something isn't working label Nov 28, 2019

gtikhonov self-assigned this Nov 28, 2019

gtikhonov mentioned this issue Nov 28, 2019

constructGradient error and conditional estimates of species associations #29

Open

jarioksa mentioned this issue Mar 12, 2020

Error during conditional cross-validation in a NNGP model #40

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Covariate-dependent latent variables #31

Covariate-dependent latent variables #31

aminorberg commented Nov 28, 2019

gtikhonov commented Nov 28, 2019

aminorberg commented Nov 28, 2019 •

edited by jarioksa

Loading

gtikhonov commented Nov 28, 2019

jarioksa commented Nov 28, 2019 •

edited

Loading

gtikhonov commented Nov 28, 2019 •

edited

Loading

jarioksa commented Nov 28, 2019 •

edited

Loading

oysteiop commented Nov 28, 2019 •

edited by jarioksa

Loading

gtikhonov commented Nov 28, 2019

oysteiop commented Nov 28, 2019

jarioksa commented Nov 29, 2019

aminorberg commented Nov 29, 2019

gtikhonov commented Nov 29, 2019

gtikhonov commented Dec 1, 2019

aminorberg commented Dec 2, 2019

aminorberg commented Dec 9, 2019 •

edited by jarioksa

Loading

jarioksa commented Dec 9, 2019 •

edited

Loading

aminorberg commented Jan 22, 2020

rburner commented Jan 27, 2020

Fabiogeography commented Jun 11, 2022 •

edited

Loading

Covariate-dependent latent variables #31

Covariate-dependent latent variables #31

Comments

aminorberg commented Nov 28, 2019

gtikhonov commented Nov 28, 2019

aminorberg commented Nov 28, 2019 • edited by jarioksa Loading

gtikhonov commented Nov 28, 2019

jarioksa commented Nov 28, 2019 • edited Loading

gtikhonov commented Nov 28, 2019 • edited Loading

jarioksa commented Nov 28, 2019 • edited Loading

oysteiop commented Nov 28, 2019 • edited by jarioksa Loading

gtikhonov commented Nov 28, 2019

oysteiop commented Nov 28, 2019

jarioksa commented Nov 29, 2019

aminorberg commented Nov 29, 2019

gtikhonov commented Nov 29, 2019

gtikhonov commented Dec 1, 2019

aminorberg commented Dec 2, 2019

aminorberg commented Dec 9, 2019 • edited by jarioksa Loading

jarioksa commented Dec 9, 2019 • edited Loading

aminorberg commented Jan 22, 2020

rburner commented Jan 27, 2020

Fabiogeography commented Jun 11, 2022 • edited Loading

aminorberg commented Nov 28, 2019 •

edited by jarioksa

Loading

jarioksa commented Nov 28, 2019 •

edited

Loading

gtikhonov commented Nov 28, 2019 •

edited

Loading

jarioksa commented Nov 28, 2019 •

edited

Loading

oysteiop commented Nov 28, 2019 •

edited by jarioksa

Loading

aminorberg commented Dec 9, 2019 •

edited by jarioksa

Loading

jarioksa commented Dec 9, 2019 •

edited

Loading

Fabiogeography commented Jun 11, 2022 •

edited

Loading