RuntimeError: cholesky_cuda: For batch 0: U(6,6) is zero, singular U. #31248

Closed
LiUzHiAn opened this issue Dec 13, 2019 · 7 comments

Comments

@LiUzHiAn

🐛 Bug when using torch.distributions.kl_divergence(p, q)

Hi, I always get this RuntimeError during my training process:

File "train.py", line 121, in train
    loss_kl = kl_loss(mu, logvar, VAE_mu, VAE_logvar)
  File "/home/jessica/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jessica/anomaly_detection/ours/loss/losses.py", line 70, in forward
    p = torch.distributions.MultivariateNormal(p_mu, p_var)
  File "/home/jessica/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/distributions/multivariate_normal.py", line 149, in __init__
    self._unbroadcasted_scale_tril = torch.cholesky(covariance_matrix)
RuntimeError: cholesky_cuda: For batch 0: U(6,6) is zero, singular U.

Reproduce

There is a KL-loss term in my loss function, and I assume the two distributions are multivariate normal distributions, so I calculate it as follows:

import torch
import torch.nn as nn


class KL_Loss(nn.Module):
    def __init__(self):
        super(KL_Loss, self).__init__()

    def forward(self, p_mu, p_log_var, q_mu, q_log_var):
        # [batch_size, d], where d is the dimension of my multivariate Gaussian distribution
        assert q_mu.size() == p_mu.size()
        assert q_log_var.size() == p_log_var.size()

        # suppose the neural network estimates log_var
        q_var = torch.diag_embed(torch.exp(q_log_var))
        p_var = torch.diag_embed(torch.exp(p_log_var))

        p = torch.distributions.MultivariateNormal(p_mu, p_var)
        q = torch.distributions.MultivariateNormal(q_mu, q_var)
        kl_loss = torch.distributions.kl_divergence(p, q).mean()

        return kl_loss
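
A minimal sketch of one way this error can arise (illustrative values only, not the actual training script), assuming the predicted log-variances occasionally become very negative: once exp(log_var) underflows to zero in float32, the diagonal covariance contains an exact zero and the Cholesky factorization inside MultivariateNormal fails.

import torch

# hypothetical values, chosen only to illustrate the failure mode
log_var = torch.tensor([[0.0, 0.0, 0.0, 0.0, 0.0, -200.0]])  # exp(-200) underflows to 0 in float32
mu = torch.zeros(1, 6)
cov = torch.diag_embed(torch.exp(log_var))  # last diagonal entry is exactly 0 -> singular
p = torch.distributions.MultivariateNormal(mu, cov)  # fails: covariance is not positive-definite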

Environment

Here is my environment:

  • PyTorch 1.3.1
  • Ubuntu 18.04
  • CUDA 9.2.148 with cuDNN 7.6.3_0

Additional

I find that my KL loss falls in the range 1e-6 to 1e-5.

Any ideas on how to solve this problem? Thanks!

@nikitaved
Collaborator

nikitaved commented Dec 13, 2019

Sorry, log_var is just an element-wise log of the covariance matrix? Ah, got it. Could you please share your reproduction script? Ideally with a fixed seed, if the parameters for the Gaussians are randomly sampled... Also, why do you take the mean at the end? The KL divergence is already a scalar...

@nikitaved
Collaborator

nikitaved commented Dec 13, 2019

I guess Cholesky is used to circumvent the computation of the inverse covariance matrix in the quadratic form... But your covariance matrix is diagonal, so maybe it is possible to use this information before constructing the probability distribution object. @LiUzHiAn, do you know whether it is possible to do so, I mean just to provide a diagonal matrix to the MultivariateNormal constructor? Also, when the code fails, is it true that q_log_var and p_log_var have huge negative values?
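
One way to exploit the diagonal structure (a sketch with assumed helper names, not something from the thread): wrap per-dimension Normal distributions in Independent, so no Cholesky of a full covariance matrix is ever computed.

import torch
from torch.distributions import Normal, Independent, kl_divergence

def diag_gaussian_kl(p_mu, p_log_var, q_mu, q_log_var):
    # std = exp(0.5 * log_var); Independent(..., 1) treats the last dim as the event dimension
    p = Independent(Normal(p_mu, torch.exp(0.5 * p_log_var)), 1)
    q = Independent(Normal(q_mu, torch.exp(0.5 * q_log_var)), 1)
    return kl_divergence(p, q).mean()  # mean over the batch dimension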

@LiUzHiAn
Author

LiUzHiAn commented Dec 14, 2019

Hi, @nikitaved

In my case, I assume p and q are both multivariate Gaussians, as I mentioned above. Further, I assume each dimension is independent of the others, so each row of log_var holds the per-dimension log-variances (i.e. [log((sigma_1)^2), ..., log((sigma_N)^2)]). Things are similar for `mu` and the other params.

Then, with the mu and sigma**2, I can construct a batch of multivariate Gaussians using torch.distributions.MultivariateNormal(). As the docs illustrate, the second parameter should be the covariance matrix, which is why I apply torch.diag_embed(torch.exp(q_log_var)) beforehand.

The reason I use mean() in my KL loss is that p and q here are batches of distributions, one per row, so kl_divergence returns one value per row.
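
For concreteness, a small shape check of this construction (values are illustrative only): each row of log_var holds the per-dimension log-variances, and diag_embed(exp(log_var)) turns it into a batch of diagonal covariance matrices.

import torch

log_var = torch.randn(4, 6)                 # [batch_size, d]
cov = torch.diag_embed(torch.exp(log_var))  # [batch_size, d, d], one diagonal covariance per sample
mu = torch.zeros(4, 6)
p = torch.distributions.MultivariateNormal(mu, cov)
print(p.batch_shape, p.event_shape)         # torch.Size([4]) torch.Size([6])
# kl_divergence(p, q) then has shape [batch_size], hence the .mean()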

@nikitaved
Collaborator

nikitaved commented Dec 14, 2019

Ok, did not notice the batch dimension. Anyway, clearly your covariance matrix becomes singular, and MultivariateNormal assumes a positive-definite (full-rank) matrix. What about trying LowRankMultivariateNormal instead? https://pytorch.org/docs/stable/distributions.html?highlight=multivariatenormal#torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal

Cholesky decomposition assumes that the matrix is positive-definite, and hence full-rank. This is not your case, and also the covariance matrix in your case has a simple structure, so no need to do any decompositions, you could find the (pseudo)inverse right away!

So, I do not think it is a bug. Let me know whether my suggestion works, and if so, we could close the issue.
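
For illustration, one way to set this up (the zero cov_factor construction is an assumption for the purely diagonal case, not something tested on the model in this issue): with a zero low-rank factor, LowRankMultivariateNormal represents exactly the diagonal covariance diag(exp(log_var)) without factorizing a full matrix. Note that cov_diag still has to stay strictly positive.

import torch
from torch.distributions import LowRankMultivariateNormal, kl_divergence

def make_diag_lowrank(mu, log_var):
    cov_factor = torch.zeros(*mu.shape, 1)  # [batch_size, d, 1], contributes nothing to the covariance
    cov_diag = torch.exp(log_var)           # [batch_size, d], the per-dimension variances
    return LowRankMultivariateNormal(mu, cov_factor, cov_diag)

p = make_diag_lowrank(torch.zeros(4, 6), torch.randn(4, 6))
q = make_diag_lowrank(torch.zeros(4, 6), torch.randn(4, 6))
loss = kl_divergence(p, q).mean()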

@LiUzHiAn
Author

I've written some test code and it seems that the batch dimension is supported.

Ok, let's leave this issue here. Thank you so much

@nikitaved
Collaborator

No problem, just let me know whether LowRankMultivariateNormal solves your issue.

@rahulvigneswaran

> Ok, did not notice the batch dimension. Anyway, clearly your covariance matrix becomes singular, and MultivariateNormal assumes a positive-definite (full-rank) matrix. What about trying LowRankMultivariateNormal instead? https://pytorch.org/docs/stable/distributions.html?highlight=multivariatenormal#torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal
>
> Cholesky decomposition assumes that the matrix is positive-definite, and hence full-rank. This is not your case, and also the covariance matrix in your case has a simple structure, so no need to do any decompositions, you could find the (pseudo)inverse right away!
>
> So, I do not think it is a bug. Let me know whether my suggestion works, and if so, we could close the issue.

Also @nikitaved,

“In practice it may be necessary to add a small multiple of the identity matrix εI to the covariance matrix for numerical reasons. This is because the eigenvalues of the matrix K0 can decay very rapidly and without this stabilization the Cholesky decomposition fails. The effect on the generated samples is to add additional independent noise of variance ε. From the context ε can usually be chosen to have inconsequential effects on the samples, while ensuring numerical stability.” (A.2 Gaussian Identities).

Blog from which I found this - https://juanitorduz.github.io/multivariate_normal/
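
For illustration, a minimal sketch of that stabilization applied to the covariance built in this issue (the epsilon value is an arbitrary choice): add a small multiple of the identity so the Cholesky factorization stays numerically stable even when exp(log_var) underflows.

import torch

eps = 1e-6
log_var = torch.randn(4, 6)
cov = torch.diag_embed(torch.exp(log_var)) + eps * torch.eye(6)  # jitter: cov + eps * I
mu = torch.zeros(4, 6)
p = torch.distributions.MultivariateNormal(mu, cov)  # Cholesky succeeds on the jittered covariance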
