
torch.linalg.lstsq gives wrong result on GPU #88101

Closed
fuzzyswan opened this issue Oct 31, 2022 · 4 comments
Labels

module: correctness (silent) - issue that returns an incorrect result silently
module: cuda - Related to torch.cuda, and CUDA support in general
module: linear algebra - Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply (matmul)
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

fuzzyswan commented Oct 31, 2022

🐛 Describe the bug

torch.linalg.lstsq gives different results on CPU and GPU, and the GPU result also differs from NumPy's.

import torch
import numpy as np

a = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
b = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
c = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).cuda()
d = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).cuda()
e = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).detach().numpy()
f = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).detach().numpy()

x = torch.linalg.lstsq(a, b)[0]
y = torch.linalg.lstsq(c, d)[0]
z = np.linalg.lstsq(e, f)[0]

print(x)
print(y)
print(z)

Results:

tensor([[ 0.8333,  0.3333, -0.1667],
        [ 0.3333,  0.3333,  0.3333],
        [-0.1667,  0.3333,  0.8333]])
tensor([[ 0.5814,  0.7631, -0.0000],
        [ 0.8372, -0.5261,  0.0000],
        [-0.4186,  0.7631,  1.0000]], device='cuda:0')
[[ 0.8333333   0.33333334 -0.16666667]
 [ 0.33333334  0.33333334  0.33333334]
 [-0.16666667  0.33333334  0.8333333 ]]

Versions

Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.22.6
Libc version: glibc-2.26

Python version: 3.7.15 (default, Oct 12 2022, 19:14:55) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.2.152
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.1+cu113
[pip3] torchaudio==0.12.1+cu113
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.13.1
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect

cc @ngimel @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano

ezyang added the module: cuda, triaged, module: linear algebra, and module: correctness (silent) labels on Oct 31, 2022
ezyang (Contributor) commented Oct 31, 2022

Your test matrix is not well conditioned; in fact it is singular, since its smallest singular value is numerically zero:

>>> import torch
>>> a = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
>>> torch.linalg.svd(a)
torch.return_types.linalg_svd(
U=tensor([[-0.2148,  0.8872,  0.4082],
        [-0.5206,  0.2496, -0.8165],
        [-0.8263, -0.3879,  0.4082]]),
S=tensor([1.6848e+01, 1.0684e+00, 1.0313e-07]),
Vh=tensor([[-0.4797, -0.5724, -0.6651],
        [-0.7767, -0.0757,  0.6253],
        [-0.4082,  0.8165, -0.4082]]))

ezyang closed this as not planned on Oct 31, 2022
IvanYashchuk (Collaborator) commented
torch.linalg.lstsq has different default algorithms on CPU and CUDA. CPU behavior matches NumPy because the same algorithm is used.
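
As a hedged illustration (a sketch following the torch.linalg.lstsq documentation, not code from this thread), the driver argument makes the difference explicit: on CPU, rank-revealing drivers such as 'gelsd' are available, while CUDA inputs accept only 'gels', which assumes a full-rank matrix.

import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
b = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

# CPU: SVD-based driver that handles rank-deficient A (similar to NumPy's gelsd).
x_cpu = torch.linalg.lstsq(a, b, driver='gelsd').solution

# CUDA: only 'gels' (QR-based) is accepted; it assumes A has full rank,
# so its output for this singular A is not meaningful.
if torch.cuda.is_available():
    x_gpu = torch.linalg.lstsq(a.cuda(), b.cuda(), driver='gels').solution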

lezcano (Collaborator) commented Nov 1, 2022

To elaborate on @ezyang's point: as the docs say, the CUDA implementation only supports full-rank matrices.
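
A possible workaround for rank-deficient inputs on the GPU (a sketch, assuming an explicit SVD-based pseudoinverse is acceptable for the problem size; not from the thread) is torch.linalg.pinv, which is well defined for singular matrices and runs on CUDA:

import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).cuda()
b = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).cuda()

# pinv computes the Moore-Penrose pseudoinverse via SVD, so pinv(a) @ b is the
# minimum-norm least-squares solution even when a is singular.
x = torch.linalg.pinv(a) @ b
print(x)  # should agree with the CPU/NumPy result above, up to float32 precision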

fuzzyswan (Author) commented

Thank you! I have just re-checked the docs, and they are clear on this.
