
Debug positive definite constraints #68720

Conversation

@nonconvexopt (Contributor) commented Nov 22, 2021

While implementing #68644, during testing of 'torch.distributions.constraints.positive_definite', I found an error in the following code (location):

class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        # Assumes that the matrix or batch of matrices in value are symmetric
        # info == 0 means no error, that is, it's SPD
        return torch.linalg.cholesky_ex(value).info.eq(0).unsqueeze(0)

The error occurs when I check the positive definiteness of
torch.cuda.DoubleTensor([[2., 0], [2., 2]])
but it does not occur for
torch.DoubleTensor([[2., 0], [2., 2]])

You can easily reproduce the error with the following code:

Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0], [2., 2]]))
tensor([False], device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0], [2., 2]]))
tensor([True])

The cause of the error can be analyzed further by passing 'check_errors=True' as an additional argument to 'torch.linalg.cholesky_ex'. It seems to be caused by recent changes in 'torch.linalg'.
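
For instance, a minimal debugging sketch (added here for illustration, not part of the original report):

import torch

A = torch.tensor([[2., 0.], [2., 2.]], dtype=torch.double)
# With check_errors=True, cholesky_ex raises an informative error instead of
# only recording the failure in the returned `info` tensor.
try:
    torch.linalg.cholesky_ex(A, check_errors=True)
except RuntimeError as e:
    # On CPU this reports the failing leading minor; per the report above,
    # the CUDA path at the time did not flag any error.
    print(e)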
Given that, I suggest modifying the '_PositiveDefinite' class to use the 'torch.linalg.eig' function, like below:

class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        return (torch.linalg.eig(value)[0].real > 0).all(dim=-1)

With the above implementation, I get the following result:

Python 3.9.7 (default, Sep 16 2021, 13:09:58) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True, device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True)

FYI, I do not know which algorithms are used in 'torch.linalg.eig' and 'torch.linalg.cholesky_ex'. As far as I know, they generally have the same time complexity, O(n^3). With special algorithms or finer parallelization, the time complexity of Cholesky decomposition may be reduced to approximately O(n^2.5). If there is a reason that 'torch.distributions.constraints.positive_definite' previously used 'torch.linalg.cholesky_ex' rather than 'torch.linalg.eig', I would like to know it.
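
For reference, a rough micro-benchmark of the two candidate checks (a sketch assuming a single well-conditioned SPD input; exact numbers depend on backend and hardware):

import time
import torch

n = 512
# Build a well-conditioned symmetric positive-definite test matrix.
A = torch.randn(n, n, dtype=torch.double)
A = A @ A.T + n * torch.eye(n, dtype=torch.double)

def bench(fn, reps=20):
    fn()  # warm-up
    start = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - start) / reps

print('cholesky_ex:', bench(lambda: torch.linalg.cholesky_ex(A).info.eq(0)))
print('eigvals:    ', bench(lambda: (torch.linalg.eigvals(A).real > 0).all()))

In practice both are O(n^3), but the Cholesky factorization has a considerably smaller constant than a general eigendecomposition, which is likely part of why the Cholesky-based check was chosen.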

@pytorch-probot bot commented Nov 22, 2021

⚛️ CI Flow Status: ruleset v1; PR ciflow labels: ciflow/default. All workflows carrying the ciflow/default label were triggered (Linux CPU/CUDA builds, mobile, ASAN, ONNX, bazel, Android, and Windows jobs); workflows without it (libtorch, iOS, macOS, periodic, and slow jobs) were skipped. To rerun CI, comment on the PR and tag @pytorchbot:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Nov 22, 2021

💊 CI failures summary: as of commit 67b3448 (more details on the Dr. CI page), 💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI.

@jbschlosser added the triaged label Nov 22, 2021

@lezcano (Collaborator) left a comment

So, this code assumes that the matrix you're working with is symmetric. Note that the matrix you tried this with is not symmetric; see the definition of [definite matrix], and see also the comment in the check function. When given a non-symmetric matrix, cholesky reads only the lower half, in effect copying it onto the upper half. In your case, you'd get the matrix [[2, 2], [2, 2]], which is symmetric positive semidefinite but singular. Due to numerical errors, the CPU backend correctly classifies it as non-SPD, while the CUDA backend fails and reports that it is SPD.

Regardless of this, about your solution:

  1. It's not correct, as a matrix with complex eigenvalues may be considered "positive definite" (see the counterexample sketch below).
  2. To get just the eigenvalues, consider using linalg.eigvals in the future.
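
A concrete counterexample for point 1 (a sketch added for illustration, not from the original thread): the matrix below has complex eigenvalues whose real parts are positive, so the proposed eig-based check accepts it, yet its quadratic form takes negative values.

import torch

A = torch.tensor([[1., 5.], [-1., 1.]], dtype=torch.double)

# The proposed check passes: the eigenvalues are 1 ± sqrt(5)i.
print((torch.linalg.eigvals(A).real > 0).all())  # tensor(True)

# But A is not positive definite: x^T A x < 0 for x = (1, -1).
x = torch.tensor([1., -1.], dtype=torch.double)
print(x @ A @ x)                                 # tensor(-2.)

# The symmetric part (A + A^T) / 2 has a negative eigenvalue.
print(torch.linalg.eigvalsh((A + A.T) / 2))      # tensor([-1., 3.])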

@nonconvexopt (Contributor, Author) commented Nov 23, 2021

> So, this code assumes that the matrix you're working with is symmetric. […]

Thank you for the detailed explanation! I will check linalg.eigvals. I forgot the point about the symmetry of the tensor.

In fact, I have a suggestion for the implementation of constraints for matrices: inheritable_torch.distributions.constraints_for_matrices
@lezcano How about this design (sketched below)? It differs from the previous implementation style in 'torch.distributions.constraints' and requires more computation, but I think it is a more logical approach.
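
For concreteness, a hedged sketch of what such an inheritable design could look like (class names and details here are illustrative, not the final API):

import torch
from torch.distributions.constraints import Constraint

class _Symmetric(Constraint):
    """Constrain to symmetric square matrices."""
    event_dim = 2

    def check(self, value):
        return torch.isclose(value, value.transpose(-2, -1)).all(dim=-1).all(dim=-1)

class _PositiveDefinite(_Symmetric):
    """Constrain to symmetric positive-definite matrices."""

    def check(self, value):
        # Inherit the symmetry check, then verify positive definiteness.
        # cholesky_ex reads only the lower triangle, so its result is
        # meaningful only once symmetry has been established.
        sym_ok = super().check(value)
        return sym_ok & torch.linalg.cholesky_ex(value).info.eq(0)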

@lezcano (Collaborator) left a comment

I think that makes sense. Now, I do not maintain this part of PyTorch, so I would not be able to accept such a change in good faith; someone else would need to review it (cc @jbschlosser to find someone who could?).

I have proposed below how to implement that approach.

Review thread on torch/distributions/constraints.py (outdated, resolved)
Co-authored-by: Lezcano <Lezcano@users.noreply.github.com>
@nonconvexopt (Contributor, Author) commented

> I think that makes sense. […] I have proposed below how to implement that approach.

Thanks for your feedback. I will open a separate PR for inheritable_torch.distributions.constraints_for_matrices if that is clearer.

@mruberry (Collaborator) commented

I think @fritzo is our distributions maintainer?

@fritzo (Collaborator) commented Nov 23, 2021

Hi @mruberry I'm pretty busy over the next month, any chance @neerajprad has time to review these PRs?

@neerajprad (Contributor) commented

> Hi @mruberry I'm pretty busy over the next month […]

Thanks for tagging, I missed this. I'll take a look.

@neerajprad (Contributor) commented

> Thanks for your feedback. I will open a separate PR […]

There are definitely constraints that assume that the matrix corresponding to the event dims is square, so it is good to explicitly check for that (see the sketch below). Feel free to update this PR with your proposed change.
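
A minimal sketch of such an explicit squareness check (illustrative only; the shape test is a plain Python bool expanded over the batch dims so that check() returns the usual boolean tensor):

import torch
from torch.distributions.constraints import Constraint

class _Square(Constraint):
    """Constrain to square matrices."""
    event_dim = 2

    def check(self, value):
        return torch.full(
            value.shape[:-2],
            value.shape[-2] == value.shape[-1],
            dtype=torch.bool,
            device=value.device,
        )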

@neerajprad (Contributor) left a comment

Looks great overall! I just had one comment about the test. Thanks for contributing to PyTorch. It will be good to also get a go-ahead from @lezcano.

Review thread on test/distributions/test_constraints.py (resolved)
@neerajprad (Contributor) left a comment

I just had one small comment. Looks good to me if unit tests pass!

@lezcano (Collaborator) left a comment

Just left a small comment. Besides that, LGTM.

    pytest.param(True, marks=pytest.mark.skipif(not TEST_CUDA,
                                                reason='CUDA not found.'))])
def test_constraint(constraint_fn, result, value, is_cuda):
    t = torch.cuda.DoubleTensor if is_cuda else torch.DoubleTensor
@lezcano (Collaborator) left a comment

I believe that these constructors are deprecated. The correct way of writing this code would be to pass a device to this function and to do torch.tensor(data, dtype=torch.double, device=device).
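
For example, the test could be parametrized over a device string instead of an is_cuda flag (a sketch of the suggested style, not the actual contents of test_constraints.py):

import pytest
import torch

TEST_CUDA = torch.cuda.is_available()

@pytest.mark.parametrize('device', [
    'cpu',
    pytest.param('cuda', marks=pytest.mark.skipif(not TEST_CUDA,
                                                  reason='CUDA not found.')),
])
def test_positive_definite_constraint(device):
    # torch.tensor with explicit dtype/device replaces the deprecated
    # torch.DoubleTensor / torch.cuda.DoubleTensor constructors.
    value = torch.tensor([[2., 0.], [0., 2.]], dtype=torch.double, device=device)
    const = torch.distributions.constraints.positive_definite
    assert const.check(value).all()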

@nonconvexopt (Contributor, Author) replied

I will try to correct it in a new PR, since we would have to modify all the functions in test/distributions/test_constraints.py. Thanks for your feedback.

@facebook-github-bot (Contributor) commented

@neerajprad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@nonconvexopt nonconvexopt deleted the debug_positive_definite_constraints branch December 4, 2021 13:18
facebook-github-bot pushed a commit that referenced this pull request Dec 9, 2021
…puts. (#69069)

Summary:
While implementing #68720, we found out empirically that `torch.cholesky_inverse` supports batched inputs, but this is not explained in the docs: [link](#68720 (review))
`torch.cholesky_inverse` is implemented in #50269 and the doc was updated in #31275 but not merged.
neerajprad

Pull Request resolved: #69069

Reviewed By: mrshenli

Differential Revision: D32979362

Pulled By: neerajprad

fbshipit-source-id: 0967c969434ce6e0ab15889c240149c23c0bce44
PaliC added a commit that referenced this pull request Dec 10, 2021
…puts. (#69069) (same summary and trailers as above) [ghstack-poisoned]

PaliC added a commit that referenced this pull request Dec 10, 2021
…puts. (#69069) (same summary and trailers as above)

desertfire pushed a commit that referenced this pull request Dec 13, 2021
…puts. (#69069) (same summary and trailers as above)

desertfire pushed a commit that referenced this pull request Dec 14, 2021
…puts. (#69069) (same summary and trailers as above)
facebook-github-bot pushed a commit that referenced this pull request Dec 21, 2021
Summary:
Fixes #68050

TODO:
- [x] Unit Test
- [x] Documentation
- [x] Change the constraints of matrix variables to 'torch.distributions.constraints.symmetric' once #68720 is reviewed and merged.

Pull Request resolved: #68588

Reviewed By: bdhirsh

Differential Revision: D33246843

Pulled By: neerajprad

fbshipit-source-id: 825fcddf478555235e7a66de0c18368c41e935cd
facebook-github-bot pushed a commit that referenced this pull request Dec 30, 2021
Summary:
Implement #68050
Reopened the previously merged and reverted PR #68588, worked on with neerajprad
cc neerajprad

Sorry for the confusion.

TODO:

- [x] Unit Test
- [x] Documentation
- [x] Change the constraints of matrix variables to 'torch.distributions.constraints.symmetric' once Debug positive definite constraints #68720 is reviewed and merged.

Pull Request resolved: #70377

Reviewed By: mikaylagawarecki

Differential Revision: D33355132

Pulled By: neerajprad

fbshipit-source-id: e968c0d9a3061fb2855564b96074235e46a57b6c
wconstab pushed a commit that referenced this pull request Jan 5, 2022
(same summary and trailers as the Dec 30 commit above)
Labels
cla signed, open source, triaged

8 participants