Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Covariate-dependent latent variables #31

Open
aminorberg opened this issue Nov 28, 2019 · 19 comments
Open

Covariate-dependent latent variables #31

aminorberg opened this issue Nov 28, 2019 · 19 comments
Assignees
Labels
bug Something isn't working

Comments

@aminorberg
Copy link

Would any of the developers happen to have a full working example of the usage of the covariate-dependent latent variables?

@gtikhonov
Copy link
Member

Hei @aminorberg
The precise answer to your question very much depends on what you refer to as a full working example. As far as I am aware of, the functionality from the original methodological paper is properly implemented and anything not working with of that shall be treated as bug.
On the other hand, there are currently a lot of post processing utilities in the package, which bring the raw results of the MCMC model fitting closer to ecological interpretation. These are not guaranteed to work fine with variable latent loadings, both due to

  1. the fact that often it is improper to interprete them in similar manner as the constant latent loadings and the underlying math is yet to be designed,
  2. in some of those cases, when the interpretation is transferrable, the current implementation is deficient due to limited robustness caused by iterative development process.

In personal communication, @ovaskain claimed that they encountered multiple challenges with this class of models, both in terms of sensitivity of prior assumptions and/or sophisticated exploration of the posterior typical set using available MCMC algorithm.

@aminorberg
Copy link
Author

aminorberg commented Nov 28, 2019

Thanks @gtikhonov!

By working example I mean e.g. an example using the data provided with the package, since I can't even get that working, and I think it's just that I don't know the syntax. This is what I tried:

studyDesign <- data.frame(sample = as.factor(1:50), 
                          plot = as.factor(sample(1:20, 50, replace = TRUE)))
rL2 <- HmscRandomLevel(units = TD$studyDesign$plot)
rL1 <- HmscRandomLevel(xData = data.frame(x1 = rep(1, length(TD$X$x1)), 
                                          x2 = TD$X$x2))

m <- Hmsc(Y = TD$Y, XData = TD$X, 
          XFormula = ~x1+x2, 
          studyDesign = studyDesign, 
          ranLevels = list("sample" = rL1, "plot" = rL2))

ps <- sampleMcmc(m, samples = 1000)

And when attempting to sampleMcmc, I get an error:

Error in (Eta[[r]][Pi[, r], ] * rL[[r]]$x[as.character(dfPi[, r]), k]) %*%  : 
  non-conformable arguments
In addition: Warning message:
In Ops.factor(Eta[[r]][Pi[, r], ], rL[[r]]$x[as.character(dfPi[,  :*not meaningful for factors

@gtikhonov
Copy link
Member

Looks like a bug. Will have a close look at this.

@gtikhonov gtikhonov added the bug Something isn't working label Nov 28, 2019
@gtikhonov gtikhonov self-assigned this Nov 28, 2019
@jarioksa
Copy link
Collaborator

jarioksa commented Nov 28, 2019

Looks like this comes from updateZ in loop of lines 22 to 30, and exactly there in line 28:

   for(r in seq_len(nr)){
      if(rL[[r]]$xDim == 0){
         ...
      } else{
         LRan[[r]] = matrix(0,ny,ns)
         for(k in 1:rL[[r]]$xDim)
            LRan[[r]] = LRan[[r]] + (Eta[[r]][Pi[,r],]*rL[[r]]$x[as.character(dfPi[,r]),k]) %*% Lambda[[r]][,,k]
      }
   }

and basically the error is caused by this statement with innermost loop at k == 2

 rL[[r]]$x[as.character(dfPi[,r]),k]
 [1] o o o o o o o o o o o o o o o o o o o o o o o o o c c c c c c c c c c c c c
[39] c c c c c c c c c c c c
Levels: c o

which is a factor that is not meaningful in multiplication (like the warning said), and when used within

Eta[[r]][Pi[, r], ] * rL[[r]]$x[as.character(dfPi[,r]),k]

The result drops dimensions (50,2) and returns a vector of 100 NA values and this fails in the following matrix multiplication (%*%) with the error of non-conformable arguments. So it is about handling factor variables.

@gtikhonov
Copy link
Member

gtikhonov commented Nov 28, 2019

Thanks, @jarioksa, I was going to mark the same thing. We have not yet implemented the XFormula type of data specification for this feature. However, there is one more issue - namely in the alignPosterior() function, which is be default called after the MCMC is done. I have not fixed it yet, but the following code is a fully operating example of model fitting for covariate-dependent associations:

library(Hmsc)
studyDesign <- data.frame(sample = as.factor(1:50),
                          plot = as.factor(sample(1:20, 50, replace = TRUE)))
rL2 <- HmscRandomLevel(units = TD$studyDesign$plot)
xData = data.frame(x1=rep(1, length(TD$X$x1)), x2=as.numeric(TD$X$x2=="c"))
rL1 <- HmscRandomLevel(xData = xData)
m <- Hmsc(Y = TD$Y,XData = TD$X,
          XFormula = ~x1+x2,
          studyDesign = studyDesign,
          ranLevels = list("sample" = rL1, "plot" = rL2))
ps <- sampleMcmc(m, samples = 1000, alignPost=FALSE)

@jarioksa
Copy link
Collaborator

jarioksa commented Nov 28, 2019

@gtikhonov : It works in this case where x2 is a two-level factor which can be turned into one contrast variable. A more general solution is:

model.matrix(reformulate(names(TD$X[, -1])), TD$X)

model.matrix generates an (Intercept) which already is in TD$X and one of the replicated intercepts can be removed – this is one way, another way is to just drop the first column of the resulting model matrix. This solution will work when there is any number of factors with any number of levels. The generation of the model matrix could be incorporated in updateZ to allow flexible use of factor variables.

@oysteiop
Copy link
Collaborator

oysteiop commented Nov 28, 2019

Hi, I have also been exploring this recently. Despite the minor bugs the models seem to fit. Regarding how to compute the Lambdas per site as a function of the covariates, I have tried (continuing from Gleb's working example above):

#Covariates for plot-level latent variable
xmat = as.matrix(m$rL[[1]]$x)
dim(xmat)

#Residual correlations given observed covariates
OmegaCor = getPostEstimate(ps, "OmegaCor", r=1, x=xmat)$mean
OmegaCor

#Residual correlations given observed covariates for first site
OmegaCor = getPostEstimate(ps, "OmegaCor", r=1, x=xmat[1,])$mean
OmegaCor

@gtikhonov
Copy link
Member

@oysteiop, this does not look like a valid approach:

  1. you repeat l variable (small L) both for the loop index and accumulated sum. At least my copy of your code indicates this, maybe you meant I and l. Such repetition would not bring you any good.
  2. you shall not sum over the l1 at all, and your output should have dimension [2,4,50] ([nf,ns,ny]).

The package tensorA features a function that can replace the whole loop with single line. Additionally, the Hmsc function getPostEstimate features parameter x that controls the condition for association matrix if you set parName="Omega".

@oysteiop
Copy link
Collaborator

Thanks for spotting this typo. I have edited my comment above to you much simpler solution.

@jarioksa
Copy link
Collaborator

I think you are on a dangerous path: these tricks work when you study a case with a two-level factor where the internal numeric coding will give the correct contrast. However, this approach fails when you have a factor with three levels, where the internal numeric coding just changes the factor into an arbitrary continuous variable. Most dangerously, it will fail silently: no error messages, but wrong results. You must study the model matrix which changes a three-level factor into two contrast variables (and p-level factor into p-1 contrast variables). Solving a margin case will bring trouble and grievance later in life.

@aminorberg
Copy link
Author

Hmm, @gtikhonov, running your example results in a new error:
Error in lambda * array(rep(rL[[r]]$x[unLdfPi[q], ], each = nf * ns), :
non-numeric argument to binary operator

@gtikhonov
Copy link
Member

@jarioksa , I am not sure whom did you address, but here are some thoughts of mine regarding this aspect. To my mind, any of the package's internal functionality on for simplifying user's life when specifying the covariate matrix (e.g. XData + XFormula way) are potentially unsafe, since a user can easily define something very different that he/she desires without noticing it. However, I also do fully recognise that it is often nuch more gandy in this way + Otso had some ideas on using this formula-empowered way for more informative postprocessing (not implemented yet).

@gtikhonov
Copy link
Member

@aminorberg is the problem still actual? As far as I understood from the responses by other collaborators on this branch, for them that example code was executing without issues...

@aminorberg
Copy link
Author

@gtikhonov I reinstalled Hmsc (master version), and now your example works without errors. So only the CRAN version gives the error. Thank you!

@aminorberg
Copy link
Author

aminorberg commented Dec 9, 2019

@gtikhonov my problems with cov-dep latents variables continued as I tried to compute predicted values:

cv_partition <- createPartition(ps, nfolds = 2, column = "sample")
cv_preds <- computePredictedValues(ps, partition = cv_partition, expected = FALSE)

Error in LambdaPostMean[k, ] : incorrect number of dimensions

Related to the alignPost = FALSE ?

@jarioksa
Copy link
Collaborator

jarioksa commented Dec 9, 2019

Immediate reason for this is that in your example, internal item LambdaPostMean is a 3-dim array, and too few indices are given when it is referred: It should be referred as LambdaPostMean[k, , ] – that is, one comma more since three indices are needed.

@aminorberg
Copy link
Author

@gtikhonov Is there any chance that the predict function would be updated for cov-dep latent variables some time soon?

@rburner
Copy link

rburner commented Jan 27, 2020

Hi @aminorberg - I've been following this discussion and also have a few projects where I'm hoping to use the covariate-dependent association features.

Am I correct in understanding that covariate(s) must be categorical?

@Fabiogeography
Copy link

Fabiogeography commented Jun 11, 2022

Hi, I'm still struggling to get the covariate-dependent functionality to work.
The code that @gtikhonov provoided on 28th Jan 2019 makes a model, but computeAssociations fails:

Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent

This seems to be because in the postList object created by poolMcmcChains in computeAssociations, the element of Lambda that corresponds to the covariate-dependent random level (rL1) has two species * species matrices (see also @jarioksa comment on 9th Dec 2019):

postList[[1]]$Lambda

[[1]]
, , 1

[,1] [,2] [,3] [,4]
[1,] 0.0262100640 0.0186711924 5.804635e-02 0.1077498535
[2,] -0.0136396407 -0.0261227143 -5.455211e-03 -0.0122868940
[3,] 0.0013262262 0.0002285267 6.567342e-03 -0.0011980602
[4,] -0.0004133883 -0.0007863265 -3.854753e-05 0.0004773631

, , 2

[,1] [,2] [,3] [,4]
[1,] -0.2373420188 -0.0828967318 -0.066711818 -0.1553379441
[2,] -0.0478642459 0.0118648434 -0.032209240 -0.0578322325
[3,] 0.0001250210 0.0081198502 -0.003553992 0.0140523202
[4,] -0.0002287827 -0.0001740797 0.001141877 0.0001855759

[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.20153787 -0.10386425 -0.08746174 -0.04134730
[2,] 0.04805722 0.01782121 0.03143461 0.02981933

I can update computeAssociations to work around this, but I'm not certain how to use these two matrices to understand how the species associations are affected by the environmental variable. Am I right in thinking that the first matrix is the species associations independent of the environmental variable and the second matrix is the species associations dependent on the selected environmental variable?

Presumably this is explained in the Matlab code for the 2017 paper, but that's not accessible to me.

convertToCodaObject and the predict functions etc... also fails for the same reason I think. Again, I can fix for my data, but would need to understand what the two Lambda matrices are.

Thanks for any light you're able to shed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants