Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with convergence of hierarchical models. #281

Closed
urchgene opened this issue May 7, 2019 · 5 comments

Comments

@urchgene
Copy link

@urchgene urchgene commented May 7, 2019

Hi @goldingn,

I saw that you had a discussion about this exact issue in here [https://github.com//issues/107] but I am still having same issues.

First of all I am modeling a cov matrix using:

Omega1 = diag(sig_vec1) %*% LKJ(n) %*% diag(sig_vec1)
Omega2 = diag(sig_vec2) %*% LKJ(n) %*% diag(sig_vec2)

Then I found your post trying to avoid cholesky for a mvNormal using this cov matrix by doing:

Omega_U <- greta:::as.greta_array(Omega$node$representations$cholesky_factor)
U <- sweep(Omega_U, 2, sigma, '*')

I use Omega1 to draw samples from mvNormal as a conditional for mu as:
mu = XB + z %*% Omega1 where z = N(0,1).

To complicate mine, i have different mu for my mvNormal so I tried to write a function to get the density of observations given Omega using the re-parameterization trick:

logdet <- function (cholx){
ll <- 2 * sum(log(diag(cholx)))
ll
}


### Sigma here should be chol(Sigma)
dmvn <- function(x, mu, Sigma){
  k <- nrow(Sigma)
  Omega <- chol2inv(Sigma)
  ss <- x - mu
  z <- rowSums({ss %*% Omega} * ss)
  #zz <- c()
  #for (i in 1:nrow(x)){ zz[i] <- sum(z[i,])}
  dens <- -0.5 * (k * log(2 * pi) + logdet(Sigma)) - (0.5 * z)
  return(dens)
}

Finally, I try to pass this to greta distribution as:

# get Likelihood and logp from Greta

distribution(Y) <- dmvn(Y, mu, Omega2)

But I get the error message:
Error: cannot convert objects with missing or infinite values to greta_arrays.

I know for sure I am doing many things wrong here since the parameters cannot converge!

My questions will be:

  1. How can I recover the cholesky_factor for the LKJ in greta because If I follow what is above I have an error saying the greta_array is an atomic vector.

  2. Some of the parameters in Omega2 converge but NONE converged for Omega1 if I do
    distribution(Y) <- multivariate_normal(mu, Rcov, n_realisations=nrow(mu)) instead of using dmvn function I have . This is the reason I am trying to use the function to avoid the problem.

Your guidance and answers will be very valuable.
Thanks.

@goldingn

This comment has been minimized.

Copy link
Member

@goldingn goldingn commented May 8, 2019

The error is unsurprising, as the object on the right hand side of distribution<- needs to be a greta array with a probability distribution, not an R function. So you either need to use greta's existing probability distribution constructors (like multivariate_normal()) or write your own distribution R6 class, and function to create a variable greta array following that distribution. You can see examples of how these are written here and here, but this is not intended for users, is not documented, requires writing tensorflow code, and is not at all easy to do. I'm not sure you need to do that at all though.

If you observe the variable (Y) that follows a multivariate normal distribution, then there is no need to a decentred parameterisation. That parameterisation is to avoid awkward shapes in the posterior that arise from both the variable and its covariance being unknown. If you know one of them, then it isn't a problem.

There is still a computational benefit (and reduction in numerical issues) to avoiding an extra Cholesky decomposition though, so I'm assuming you want to work with the Cholesky factor for that reason. If not (and there's no point playing with Cholesky factors unless Omega2 is very large), then I would suggest you just use distribution(Y) <- multivariate_normal(mu, Omega2).

If you really do need to skip the additional Cholesky factorisation that multivariate_normal() would do in that case, then you can adapt the following code. The trick is to get the Cholesky factor for the correlation matrix, do the operations to yield the Cholesky factor for the covariance matrix, and then convert the Cholesky factor for the correlation matrix back to the symmetric matrix, so it can be passed into multivariate_normal(). If the covariance matrix passed into multivariate_normal() has a Cholesky factor representation (and chol_to_symmetric makes sure it does), then it uses that in exactly the way you describe above, avoiding the need to do another Cholesky factorisation.

# get some internal greta functions
as.greta_array <- .internals$greta_arrays$as.greta_array
get_node <- .internals$greta_arrays$get_node
chol_to_symmetric <- .internals$utils$greta_array_operations$chol_to_symmetric

# fake data
n_dim <- 5
n_obs <- 10
Y <- matrix(rnorm(n_obs * n_dim), n_obs, n_dim)

# parameters
sds <- normal(0, 1, dim = n_dim, truncation = c(0, Inf))
correl <- lkj_correlation(dimension = n_dim, eta = 1)
mu <- zeros(n_obs, n_dim)

# hack to find the cholesky factor representation
correl_U <- as.greta_array(get_node(correl)$representations$cholesky)

# convert to covariance on the cholesky scale
Sigma_U <- sweep(correl_U, 2, sds, '*')

# convert back from cholesky representation to a symmetric matrix
Sigma <- chol_to_symmetric(Sigma_U)

# use in likelihood
distribution(Y) <- multivariate_normal(mu, Sigma)

This obviously uses some internal functions. Ideally we could use the representations approach to hide this computational trick from the user. That would probably require us to add quad_form() and quad_form_diag() functions to greta, like those in Stan. That way we'll be able to detect that we can propagate Cholesky factors here (whereas with %*%, we wouldn't know the user's intent).

@urchgene

This comment has been minimized.

Copy link
Author

@urchgene urchgene commented May 8, 2019

Thanks a lot @goldingn ... This is very helpful!!

@urchgene

This comment has been minimized.

Copy link
Author

@urchgene urchgene commented May 8, 2019

Hi @goldingn, one more question...

How do I obtain the inverse of this Sigma_U matrix

# convert to covariance on the cholesky scale
Sigma_U <- sweep(correl_U, 2, sds, '*')

When I try chol2inv(), I get an error downstream that it cannot be converted to greta_array cos of infinite values.

I am trying to use it in an operation to simulate from MvNormal using the precision matrix where:

Sigma_A = kronecker( inv(Sigma_U), Ainv)

Sigma_A is a very large dense matrix and I do not want to use the multivariate_normal () for it.

Thanks.

@goldingn

This comment has been minimized.

Copy link
Member

@goldingn goldingn commented May 8, 2019

That sounds like a separate issue. Could you please open an new issue and provide a reproducible example so I can recreate the error.

@urchgene

This comment has been minimized.

Copy link
Author

@urchgene urchgene commented May 9, 2019

Hi @goldingn,

I tested and I think it's not a general issue.

I solved it by using solve(chol_to_symmetric(Sigma_U))

For a fake data example I tested chol2inv(sweep(correl_U, 2, sds, '*')) works but doesn't for my real data so I won't regard it as an issue.

Thanks again.

@urchgene urchgene closed this May 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.