RuntimeError: cholesky_cuda: For batch 0: U(6,6) is zero, singular U. #31248

Closed
LiUzHiAn opened this issue Dec 13, 2019 · 7 comments

Comments

@LiUzHiAn

🐛 Bug when using torch.distributions.kl_divergence(p, q)

Hi, I always get this RuntimeError during my training process:

File "train.py", line 121, in train
    loss_kl = kl_loss(mu, logvar, VAE_mu, VAE_logvar)
  File "/home/jessica/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jessica/anomaly_detection/ours/loss/losses.py", line 70, in forward
    p = torch.distributions.MultivariateNormal(p_mu, p_var)
  File "/home/jessica/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/distributions/multivariate_normal.py", line 149, in __init__
    self._unbroadcasted_scale_tril = torch.cholesky(covariance_matrix)
RuntimeError: cholesky_cuda: For batch 0: U(6,6) is zero, singular U.

Reproduce

There is a KL-loss term in my loss function, and I assume the two distributions are multivariate normal distributions, so I calculate it as follows:

import torch
import torch.nn as nn


class KL_Loss(nn.Module):
    def __init__(self):
        super(KL_Loss, self).__init__()

    def forward(self, p_mu, p_log_var, q_mu, q_log_var):
        # [batch_size, d], where d is the dimension of my multivariate Gaussian distribution
        assert q_mu.size() == p_mu.size()
        assert q_log_var.size() == p_log_var.size()

        # suppose the neural network estimates log_var
        q_var = torch.diag_embed(torch.exp(q_log_var))
        p_var = torch.diag_embed(torch.exp(p_log_var))

        p = torch.distributions.MultivariateNormal(p_mu, p_var)
        q = torch.distributions.MultivariateNormal(q_mu, q_var)
        kl_loss = torch.distributions.kl_divergence(p, q).mean()

        return kl_loss
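
A minimal sketch of one way this error can arise (illustrative values only, not the actual training script), assuming the predicted log-variances occasionally become very negative: once exp(log_var) underflows to zero in float32, the diagonal covariance contains an exact zero and the Cholesky factorization inside MultivariateNormal fails.

import torch

# hypothetical values, chosen only to illustrate the failure mode
log_var = torch.tensor([[0.0, 0.0, 0.0, 0.0, 0.0, -200.0]])  # exp(-200) underflows to 0 in float32
mu = torch.zeros(1, 6)
cov = torch.diag_embed(torch.exp(log_var))  # last diagonal entry is exactly 0 -> singular
p = torch.distributions.MultivariateNormal(mu, cov)  # fails: covariance is not positive-definite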

Environment

Here is my environment:

  • PyTorch 1.3.1
  • Ubuntu 18.04
  • CUDA 9.2.148 with cuDNN 7.6.3_0

Additional

I find that my KL loss falls in the range 1e-6 to 1e-5.

Any ideas on how to solve this problem? Thanks!

@nikitaved
Collaborator

nikitaved commented Dec 13, 2019

Sorry, log_var is just an element-wise log of the covariance matrix? Ah, got it. Could you please share your reproduction script? Ideally with a fixed seed, if the parameters for the Gaussians are randomly sampled... Also, why do you take the mean at the end? The KL divergence is already a scalar...

@nikitaved
Collaborator

nikitaved commented Dec 13, 2019

I guess Cholesky is used to circumvent the computation of the inverse covariance matrix in the quadratic form... But your covariance matrix is diagonal, so maybe it is possible to use this information before constructing the probability distribution object. @LiUzHiAn, do you know whether it is possible to do so, I mean just to provide a diagonal matrix to the MultivariateNormal constructor? Also, when the code fails, is it true that q_log_var and p_log_var have huge negative values?
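
One way to exploit the diagonal structure (a sketch with assumed helper names, not something from the thread): wrap per-dimension Normal distributions in Independent, so no Cholesky of a full covariance matrix is ever computed.

import torch
from torch.distributions import Normal, Independent, kl_divergence

def diag_gaussian_kl(p_mu, p_log_var, q_mu, q_log_var):
    # std = exp(0.5 * log_var); Independent(..., 1) treats the last dim as the event dimension
    p = Independent(Normal(p_mu, torch.exp(0.5 * p_log_var)), 1)
    q = Independent(Normal(q_mu, torch.exp(0.5 * q_log_var)), 1)
    return kl_divergence(p, q).mean()  # mean over the batch dimension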

@LiUzHiAn
Author

LiUzHiAn commented Dec 14, 2019

Hi, @nikitaved

In my case, I assume p and q are both multivariate Gaussians, as I mentioned above. Further, I assume each dimension is independent of the others, so each row of log_var holds the per-dimension log-variances (i.e. [log((sigma_1)^2), ..., log((sigma_N)^2)]). Things are similar for `mu` and the other params.

Then, with the mu and sigma**2, I can construct a batch of multivariate Gaussians using torch.distributions.MultivariateNormal(). As the docs illustrate, the second parameter should be the covariance matrix, which is why I apply torch.diag_embed(torch.exp(q_log_var)) beforehand.

The reason I use mean() in my KL loss is that p and q here are batches of distributions, one per row, so kl_divergence returns one value per row.
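
For concreteness, a small shape check of this construction (values are illustrative only): each row of log_var holds the per-dimension log-variances, and diag_embed(exp(log_var)) turns it into a batch of diagonal covariance matrices.

import torch

log_var = torch.randn(4, 6)                 # [batch_size, d]
cov = torch.diag_embed(torch.exp(log_var))  # [batch_size, d, d], one diagonal covariance per sample
mu = torch.zeros(4, 6)
p = torch.distributions.MultivariateNormal(mu, cov)
print(p.batch_shape, p.event_shape)         # torch.Size([4]) torch.Size([6])
# kl_divergence(p, q) then has shape [batch_size], hence the .mean()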

@nikitaved
Collaborator

nikitaved commented Dec 14, 2019

Ok, did not notice the batch dimension. Anyway, clearly your covariance matrix becomes singular, and MultivariateNormal assumes a positive-definite (full-rank) matrix. What about trying LowRankMultivariateNormal instead? https://pytorch.org/docs/stable/distributions.html?highlight=multivariatenormal#torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal

Cholesky decomposition assumes that the matrix is positive-definite, and hence full-rank. This is not your case, and also the covariance matrix in your case has a simple structure, so no need to do any decompositions, you could find the (pseudo)inverse right away!

So, I do not think it is a bug. Let me know whether my suggestion works, and if so, we could close the issue.
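
For illustration, one way to set this up (the zero cov_factor construction is an assumption for the purely diagonal case, not something tested on the model in this issue): with a zero low-rank factor, LowRankMultivariateNormal represents exactly the diagonal covariance diag(exp(log_var)) without factorizing a full matrix. Note that cov_diag still has to stay strictly positive.

import torch
from torch.distributions import LowRankMultivariateNormal, kl_divergence

def make_diag_lowrank(mu, log_var):
    cov_factor = torch.zeros(*mu.shape, 1)  # [batch_size, d, 1], contributes nothing to the covariance
    cov_diag = torch.exp(log_var)           # [batch_size, d], the per-dimension variances
    return LowRankMultivariateNormal(mu, cov_factor, cov_diag)

p = make_diag_lowrank(torch.zeros(4, 6), torch.randn(4, 6))
q = make_diag_lowrank(torch.zeros(4, 6), torch.randn(4, 6))
loss = kl_divergence(p, q).mean()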

@LiUzHiAn
Author

I've written some test code and it seems that the batch dimension is supported.

Ok, let's leave this issue here. Thank you so much

@nikitaved
Collaborator

No problem, just let me know whether LowRankMultivariateNormal solves your issue.

@rahulvigneswaran

> Ok, did not notice the batch dimension. Anyway, clearly your covariance matrix becomes singular, and MultivariateNormal assumes a positive-definite (full-rank) matrix. What about trying LowRankMultivariateNormal instead? https://pytorch.org/docs/stable/distributions.html?highlight=multivariatenormal#torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal
>
> Cholesky decomposition assumes that the matrix is positive-definite, and hence full-rank. This is not your case, and also the covariance matrix in your case has a simple structure, so no need to do any decompositions, you could find the (pseudo)inverse right away!
>
> So, I do not think it is a bug. Let me know whether my suggestion works, and if so, we could close the issue.

Also @nikitaved,

“In practice it may be necessary to add a small multiple of the identity matrix εI to the covariance matrix for numerical reasons. This is because the eigenvalues of the matrix K0 can decay very rapidly and without this stabilization the Cholesky decomposition fails. The effect on the generated samples is to add additional independent noise of variance ε. From the context ε can usually be chosen to have inconsequential effects on the samples, while ensuring numerical stability.” (A.2 Gaussian Identities).

Blog from which I found this - https://juanitorduz.github.io/multivariate_normal/
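
For illustration, a minimal sketch of that stabilization applied to the covariance built in this issue (the epsilon value is an arbitrary choice): add a small multiple of the identity so the Cholesky factorization stays numerically stable even when exp(log_var) underflows.

import torch

eps = 1e-6
log_var = torch.randn(4, 6)
cov = torch.diag_embed(torch.exp(log_var)) + eps * torch.eye(6)  # jitter: cov + eps * I
mu = torch.zeros(4, 6)
p = torch.distributions.MultivariateNormal(mu, cov)  # Cholesky succeeds on the jittered covariance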
