
torch.linalg.eigh fails on GPU #94772

Closed
gallego-posada opened this issue Feb 13, 2023 · 39 comments
Labels
actionable high priority module: cuda Related to torch.cuda, and CUDA support in general module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul module: regression It used to work, and now it doesn't triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@gallego-posada

gallego-posada commented Feb 13, 2023

🐛 Describe the bug

Calling torch.linalg.eigh on a CUDA tensor fails, but the computation succeeds when the tensor is on the CPU.

I have experienced this issue on CUDA 11.6, 11.7 and 11.8.

This is a blocker for running the Shampoo optimizer.

Minimal replication script

import torch
from torchvision import datasets, transforms

SEED = 123
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

batch_size = 512
num_classes = 10
num_features = 28**2
loss_fn = torch.nn.CrossEntropyLoss()

tforms = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
dataset = datasets.MNIST("~/data/", download=False, train=True, transform=tforms)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)

fc_layer = torch.nn.Linear(in_features=num_features, out_features=num_classes, bias=False).to(DEVICE)

for batch_ix, (inputs, targets) in enumerate(train_loader):

    inputs, targets = inputs.to(DEVICE), targets.to(DEVICE)

    fc_layer.weight.grad = None
    logits = fc_layer(inputs.view(inputs.shape[0], -1))
    loss = loss_fn(logits, targets)
    loss.backward()

    vec_grad = torch.flatten(fc_layer.weight.grad)
    precond_adagrad = torch.outer(vec_grad, vec_grad)

    # CPU computation works fine
    evals_adagrad, evecs_adagrad = torch.linalg.eigh(precond_adagrad.cpu())

    # But eigh computation on GPU fails
    evals_adagrad, evecs_adagrad = torch.linalg.eigh(precond_adagrad)

Error trace

Traceback (most recent call last):
  File "bug_report.py", line 47, in <module>
    evals_adagrad, evecs_adagrad = torch.linalg.eigh(precond_adagrad)
torch._C._LinAlgError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling `cusolverDnXsyevd( handle, params, jobz, uplo, n, CUDA_R_32F, reinterpret_cast<void*>(A), lda, CUDA_R_32F, reinterpret_cast<void*>(W), CUDA_R_32F, reinterpret_cast<void*>(bufferOnDevice), workspaceInBytesOnDevice, reinterpret_cast<void*>(bufferOnHost), workspaceInBytesOnHost, info)`. This error may appear if the input matrix contains NaN.

cc @ezyang @gchanan @zou3519 @ptrblck @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano @ngimel @hjmshi @mikerabbat @tsunghsienlee @dmudiger

Versions

PyTorch version: 2.0.0.dev20230201+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.25.2
Libc version: glibc-2.31

Python version: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 470.141.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          80
On-line CPU(s) list:             0-79
Thread(s) per core:              2
Core(s) per socket:              20
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Stepping:                        1
CPU MHz:                         2414.868
CPU max MHz:                     3600.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4399.78
Virtualization:                  VT-x
L1d cache:                       1.3 MiB
L1i cache:                       1.3 MiB
L2 cache:                        10 MiB
L3 cache:                        100 MiB
NUMA node0 CPU(s):               0-19,40-59
NUMA node1 CPU(s):               20-39,60-79
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

Versions of relevant libraries:
[pip3] mypy==0.991
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.24.0
[pip3] pytorch-triton==2.0.0+0d7e753227
[pip3] torch==2.0.0.dev20230201+cu118
[pip3] torchvision==0.15.0.dev20230201+cu118
[conda] numpy                     1.24.0                   pypi_0    pypi
[conda] pytorch-triton            2.0.0+0d7e753227          pypi_0    pypi
[conda] torch                     2.0.0.dev20230201+cu118          pypi_0    pypi
[conda] torchvision               0.15.0.dev20230201+cu118          pypi_0    pypi
@hjmshi
Contributor

hjmshi commented Feb 14, 2023

Just want to emphasize that this is a major issue for experimenting with full-matrix Adagrad and Shampoo. Any suggestions on how to proceed would be much appreciated!

cc: @shintaro-iwasaki @0x10cxR1 @kaushik88 @mhfb22 @dmudigere

@xwang233 xwang233 added the module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul label Feb 14, 2023
@dmudigere

also cc: @xwang233

@IvanYashchuk
Collaborator

The matrix is all zeros and unfortunately the cuSOLVER backend doesn't handle this edge case without an error. Could you try initializing vec_grad to non-zero values?

@gallego-posada
Author

If you are referring to precond_adagrad, this matrix is not all zeros. It is the outer product of the gradient with itself, and the gradient is non-zero at initialization.

For a typical run I observe:

(Pdb) precond_adagrad.abs().max()
tensor(0.0697, device='cuda:0')
(Pdb) precond_adagrad.abs().min()
tensor(3.6154e-13, device='cuda:0')
(Pdb) precond_adagrad.abs().median()
tensor(0.0002, device='cuda:0')

@hjmshi
Contributor

hjmshi commented Feb 14, 2023

Thanks for looking into it @IvanYashchuk! FYI, @mikerabbat has observed failure cases of torch.linalg.eigh on sparse diagonal matrices, with 6-7 non-zero entries on the diagonal. (This can be reproduced by running full-matrix Adagrad or Shampoo on a dense embedding table for sequence-to-sequence modeling, for example.) Should we consider incorporating checks on the matrix for these special cases, either in PyTorch or cuSOLVER? Thanks again.

@lezcano
Collaborator

lezcano commented Feb 14, 2023

It'd be good to have the exact matrix for which this fails. Could you provide that example?

My money is on the fact that the input matrix has repeated eigenvalues. Eigenvalue solvers struggle a bit with repeated eigenvalues...

@gallego-posada
Author

Thanks @lezcano -- I have saved the numpy array corresponding to the vec_grad tensor here:
vec_grad.txt

You can restore the matrix precond_adagrad by:

import torch
import numpy as np

vec_grad = torch.tensor(np.loadtxt("vec_grad.txt")).float()
precond_adagrad = torch.outer(vec_grad, vec_grad)

@lezcano
Collaborator

lezcano commented Feb 15, 2023

Right. An outer product gives a matrix of rank one. As such, in this case, you have a matrix of shape 7840x7840 with 1 non-zero eigenvalue and 7839 eigenvalues equal to zero. Eigensolvers choke on these, as their performance often depends on the inverse of the eigengap of the matrix.

In other words, your approach to the problem is a bit numerically unstable, similar to trying to solve a linear system with a matrix that's almost singular.

What I would suggest is that, if you know your matrix has rank-1 structure, or low-rank structure in general, you use other methods. For a rank-1 symmetric matrix of the form vv^T it's easy: its non-zero eigenvalue is simply the square of the norm of v, with eigenvector v/\norm{v}. For low-rank matrices, there are iterative methods in the literature that are fast and stable if you have a bound on the rank of the matrix (which I believe you do in your problem). Alas, we don't have any of these methods in PyTorch.
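
For illustration, here is a minimal sketch of exploiting that rank-1 structure directly instead of calling a dense eigensolver (the size, seed and cross-check are made up for this example, not taken from the thread):

import torch

torch.manual_seed(0)
v = torch.randn(1024)                  # stands in for the flattened gradient
A = torch.outer(v, v)                  # rank-1 "preconditioner" v v^T

# Closed-form eigendecomposition of v v^T:
#   - one non-zero eigenvalue equal to ||v||^2, with eigenvector v / ||v||
#   - all remaining eigenvalues are exactly 0
lam_max = v.dot(v)
u = v / v.norm()

# Cross-check against the dense solver (the CPU path, which succeeds here).
evals, evecs = torch.linalg.eigh(A)
print(torch.allclose(evals[-1], lam_max, rtol=1e-3))       # largest eigenvalue matches
print(torch.minimum((evecs[:, -1] - u).abs().max(),        # eigenvector matches up to sign
                    (evecs[:, -1] + u).abs().max()) < 1e-3)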

@hjmshi
Contributor

hjmshi commented Feb 15, 2023

@lezcano, thanks for your response. Just as additional context: torch.linalg.eigh does not fail on these matrices on CPU; this is specific to GPU/CUDA. We have observed this problem with CUDA 11.6, 11.7, and 11.8, but we do not observe these failures with CUDA 11.4. This may also be resolved in CUDA 12, but it would be great if we could have these problems ironed out on earlier versions, since the current stable build of PyTorch suggests CUDA 11.6 and 11.7.

We will try to dig up some additional failure cases where the matrix is not just low-rank and post our findings as we also investigate further on our side. Thanks!

@lezcano
Collaborator

lezcano commented Feb 15, 2023

Note that this difference in behaviour between CPU and CUDA, and this exact point about matrices with repeated eigenvalues, is documented here: https://pytorch.org/docs/master/notes/numerical_accuracy.html#extremal-values-in-linalg

@albanD albanD added module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Feb 15, 2023
@hjmshi
Contributor

hjmshi commented Feb 15, 2023

Hi @lezcano, just want to clarify that on CPU the algorithm at least succeeds (perhaps with numerical error), while on GPU it fails entirely. We are willing to tolerate numerical error in the eigenvectors if the matrix is low-rank, as it will still produce an orthonormal basis for the zero-eigenvalue subspace. I am referring to the error bounds described at https://netlib.org/lapack/lug/node90.html. Based on my understanding of your argument, even if we regularize by adding epsilon * I, wouldn't the algorithm still fail? Would it even fail on a diagonal matrix c * I?

If you recommend using a different algorithm for the low rank case, which algorithms would you suggest? Thanks again.

@lezcano
Collaborator

lezcano commented Feb 15, 2023

The error bounds in the link you provide say what I mentioned above: you see that $\theta$ is bounded by an expression that depends on the inverse of the eigengap, so...
You'd have the same issue if you add eps * I, as the eigenvalues of eps * I + vv^T are eps (with multiplicity n-1) and eps + \norm{v}^2.

Theoretically, these algorithms may even fail for diagonal matrices, but in that case, since the matrix is exactly low-rank in floating-point precision, the algorithm often succeeds. Note that vv^T is exactly rank-1 in floating-point precision, but when you perform operations on it you may get some very small numbers which cause these algorithms to fail.

As to which algorithm to use... well, that very much depends on your exact problem. I'd recommend you discuss it with your local linear algebra expert. For large and very sparse matrices, you have the classic randomized SVD approximation described in https://arxiv.org/abs/0909.4061, which I believe would work well for your case, as you can give a bound on the rank of your matrix. There is even a post from Meta describing it, and there is a fast implementation of it in cuSOLVER. Again, different algorithms may be more amenable to different problems.
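
A rough sketch of that randomized approach, specialized to a symmetric PSD matrix with a known rank bound k (the function name and the oversampling value are illustrative assumptions, not an existing PyTorch API):

import torch

def randomized_eigh_psd(A: torch.Tensor, k: int, oversample: int = 10):
    """Approximate eigendecomposition of a symmetric PSD matrix of rank <= k,
    following the randomized range-finder idea of Halko/Martinsson/Tropp."""
    n = A.shape[-1]
    # 1. Sketch the range of A with a random test matrix.
    G = torch.randn(n, k + oversample, dtype=A.dtype, device=A.device)
    Q, _ = torch.linalg.qr(A @ G)          # orthonormal basis for range(A @ G)
    # 2. Project A onto that subspace and solve the small (k + oversample)-dim problem.
    B = Q.T @ A @ Q
    evals, V = torch.linalg.eigh(B)
    # 3. Lift the eigenvectors back to the original space.
    return evals, Q @ V

For the preconditioners in this thread one could call, for example, randomized_eigh_psd(precond_adagrad, k=1).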

@anana10c

anana10c commented Jul 18, 2023

Thanks @lezcano for the quick response on #105359! I understand that failure is expected on nearly singular matrices, but the matrix becoming unrecoverable afterwards may be an issue for our use case. Is there anything that can be done besides implementing checks on our side to convert to float64 before calling eigh, or preconditioning the matrix as you suggested?

Additionally, I was able to reproduce the error yesterday on CUDA 12.1 as well. It seems to only occur on matrices larger than 512 x 512 - is the issue likely to be in the syevd solver?
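
For concreteness, a minimal sketch of the kind of caller-side guard mentioned above (the helper name and the size threshold are assumptions for illustration, and whether float64 actually avoids the failure is not established in this thread; the conversion happens before the call because, as noted above, a failed call can leave things in an unrecoverable state):

import torch

def guarded_eigh(A: torch.Tensor, cpu_threshold: int = 512):
    # Promote to float64 before calling eigh, and route large matrices
    # (the reported failures were only seen above 512 x 512) through the
    # CPU solver, then cast the results back.
    if A.is_cuda and A.shape[-1] > cpu_threshold:
        evals, evecs = torch.linalg.eigh(A.double().cpu())
        return evals.to(A.device, A.dtype), evecs.to(A.device, A.dtype)
    evals, evecs = torch.linalg.eigh(A.double())
    return evals.to(A.dtype), evecs.to(A.dtype)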

@lezcano
Collaborator

lezcano commented Jul 18, 2023

The issue is indeed on the cuSOLVER side, so there's very little we can do on our end, really. I don't think there's any other solution for this ATM.

@hjmshi
Contributor

hjmshi commented Jul 18, 2023

cc: @dmudigere @xwang233

@xwang233
Collaborator

Thanks for the reminder. We'll take a look

@gottbrath
Contributor

What is it that is left in an unrecoverable state? The tensor? The GPU?

@anana10c

It seems that nothing on the GPU can be accessed anymore without a CUDA illegal memory access error.

import torch

factor_matrix = torch.load("rank7_idx0.1.3.0_iter100_factor.pt")
factor_matrix = factor_matrix.to("cuda:0")

tmp = torch.arange(10)
tmp = tmp.to("cuda:0")

torch.linalg.eigh(factor_matrix)  # fails

print(tmp)  # CUDA illegal memory access

It turns out that I may have been incorrect about the error occurring in CUDA 12.1 - after changing some environment variables, it seems to no longer fail? I'll investigate this some more and keep everyone updated.

@anana10c

Seems like I was running on PyTorch nightly (2.1.0.dev20230717+cu121) by accident. Here's what I seem to have observed so far (please let me know if you can reproduce this!):

  • PyTorch from source (2.1.0a0+git6855053) + CUDA 11.8 fails (causes unrecoverable state)
  • PyTorch nightly (2.1.0.dev20230717+cu121) + CUDA 12.1 fails (causes unrecoverable state)
  • PyTorch from source (2.1.0a0+git6855053) + CUDA 12.1 succeeds (no cusolver error)

If another matrix were to fail in the last setup, I'm unsure whether it would cause the unrecoverable state or not.

@mrogowski

I believe this issue has been fixed starting from cuSOLVER 11.4.5, which shipped with CUDA 12.1 Update 1.

Can you please confirm if this matches your observations?

@anana10c

As before, while everything seems fine with PyTorch built from source + CUDA 12.1, if I use PyTorch nightly + CUDA 12.1 instead the error still occurs:

Python 3.9.17 (main, Jul  5 2023, 20:41:20) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.1.0.dev20230721+cu121'
>>> torch.version.cuda
'12.1'
>>> factor_matrix = torch.load("rank7_idx0.1.3.0_iter100_factor.pt")
>>> factor_matrix = factor_matrix.to("cuda:0")
>>> torch.linalg.eigh(factor_matrix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling `cusolverDnXsyevd( handle, params, jobz, uplo, n, CUDA_R_32F, reinterpret_cast<void*>(A), lda, CUDA_R_32F, reinterpret_cast<void*>(W), CUDA_R_32F, reinterpret_cast<void*>(bufferOnDevice), workspaceInBytesOnDevice, reinterpret_cast<void*>(bufferOnHost), workspaceInBytesOnHost, info)`
>>> factor_matrix
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/fsx/users/annacai/conda/envs/eigh2/lib/python3.9/site-packages/torch/_tensor.py", line 430, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/fsx/users/annacai/conda/envs/eigh2/lib/python3.9/site-packages/torch/_tensor_str.py", line 674, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/fsx/users/annacai/conda/envs/eigh2/lib/python3.9/site-packages/torch/_tensor_str.py", line 605, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/fsx/users/annacai/conda/envs/eigh2/lib/python3.9/site-packages/torch/_tensor_str.py", line 357, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/fsx/users/annacai/conda/envs/eigh2/lib/python3.9/site-packages/torch/_tensor_str.py", line 393, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "/fsx/users/annacai/conda/envs/eigh2/lib/python3.9/site-packages/torch/_tensor_str.py", line 393, in <listcomp>
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "/fsx/users/annacai/conda/envs/eigh2/lib/python3.9/site-packages/torch/_tensor_str.py", line 383, in get_summarized_data
    return torch.cat(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@anana10c

Is there a way for me to check the cusolver version? My CUDA runtime version seems to be 12.1.105.

@mrogowski

You can check the version stated in the filename of the library, for example: /usr/local/cuda/lib64/libcusolver.so.11.4.5.107

@anana10c

I see, libcusolver.so.11.4.5.107 seems to exist in /usr/local/cuda-12.1/lib64 for me. Both versions of PyTorch that I've tried should be built on the same CUDA version (in /usr/local/cuda-12.1).

@mrogowski

I was able to reproduce the issue using the script from the first post, but that error was definitely resolved in CUDA 12.1 Update 1. Can you share your file, rank7_idx0.1.3.0_iter100_factor.pt, so that I can try to reproduce your issue?

Also, please note that cuSOLVER had this issue in CUDA 12.1, and it was fixed in the update. You can check the version with nvcc -V (you will see 12.1.66 for CUDA 12.1 and 12.1.105 for CUDA 12.1 Update 1).

@anana10c

The file rank7_idx0.1.3.0_iter100_factor.pt can be found here: #105359

nvcc -V returns:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

@mrogowski

Regarding the CUDA version discussion, this is fixed in 12.1 Update 1.

@malfet
Contributor

malfet commented Jul 26, 2023

@lezcano I'm totally fine with numerical errors, but IMO any type of crash/unrecoverable error on a valid input should be considered high priority (as it breaks the pipeline). Also, based on @anana10c's comment, I added module: regression to the labels, as the same code used to work with CUDA 11.4 but is broken with CUDA 11.8 and above.

Regarding "there is not much else we can do but wait": I disagree; we can stick to CUDA 11.4, we can use MAGMA, and so on.

@malfet malfet added the module: regression It used to work, and now it doesn't label Jul 26, 2023
@soulitzer soulitzer added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module actionable and removed triage review labels Jul 31, 2023
@soulitzer
Contributor

From discussion in triage review:

  • we should add a test to prevent regressions
  • properly document support wrt different CUDA versions
  • possibly add support using MAGMA

@xwang233
Collaborator

xwang233 commented Aug 1, 2023

We'll add a test case for those inputs.

For "possibly add support using MAGMA", users may refer to this function to prefer MAGMA as the linear algebra backend library. This is a global runtime switch. https://pytorch.org/docs/stable/backends.html#torch.backends.cuda.preferred_linalg_library

torch.backends.cuda.preferred_linalg_library('magma')

There might be performance hits when switching from the default (cuSOLVER) to MAGMA for eigh and other eigh-based functions.
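
For reference, a minimal usage sketch of that switch (the size here is illustrative):

import torch

# Global runtime switch: prefer MAGMA over the default cuSOLVER backend for
# torch.linalg routines on CUDA (may cost performance for eigh-heavy code).
torch.backends.cuda.preferred_linalg_library("magma")

v = torch.randn(4096, device="cuda")
A = torch.outer(v, v)                  # the rank-1 case from this issue
evals, evecs = torch.linalg.eigh(A)    # now served by the MAGMA backend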

pytorchmergebot pushed a commit that referenced this issue Aug 16, 2023
…conditioned, in some cusolver version (#107082)

Related: #94772, #105359

I can locally reproduce this crash with pytorch 2.0.1 stable pip binary. The test already passes with the latest cuda 12.2 release.

Re: #94772 (comment)
> From discussion in triage review:

- [x] we should add a test to prevent regressions
- [x] properly document support wrt different CUDA versions
- [x] possibly add support using MAGMA
Pull Request resolved: #107082
Approved by: https://github.com/lezcano
summerdo pushed a commit to summerdo/pytorch that referenced this issue Aug 17, 2023
@ptrblck
Collaborator

ptrblck commented Aug 23, 2023

Reproduced the issue using 2.0.1+cu117 as well as a recent nightly with 11.8: torch-2.1.0.dev20230823+cu118. The latest 12.1 Update 1 nightly works fine and does not reproduce the issue anymore: torch-2.1.0.dev20230823+cu121.

jithunnair-amd added a commit to ROCm/builder that referenced this issue Feb 22, 2024
* update to CUDA 12.1U1 (pytorch#1476)

Should fix  pytorch/pytorch#94772 in wheel builds

* Use conda version 23.5.2 for conda pytorch build (pytorch#1477)

* Use py311 miniconda install (pytorch#1479)

* Windows conda build fix (pytorch#1480)

* Revert "Use py311 miniconda install (pytorch#1479)" (pytorch#1481)

This reverts commit 5585c05.

* Remove c/cb folder on windows (pytorch#1482)

* Add numpy install - fix windows smoke tests (pytorch#1483)

* Add numpy install

* Add numpy install

* Add hostedtoolcache purge step (pytorch#1484)

* Add hostedtoolcache purge step

* Change step name

* Update CUDA_UPGRADE_GUIDE.MD

* update CUDA to 12.1U1 for Windows (pytorch#1485)

* Small improvements in build pytorch script (pytorch#1486)

* Undo using conda activate (pytorch#1487)

* Update meta.yaml (pytorch#1389)

* Add pytorch-triton-rocm as an install dependency for ROCm (pytorch#1463)

* Add pytorch-triton-rocm as an install dependency for ROCm

* Update build_rocm.sh

* Add aarch64 to validation framework (pytorch#1474)

* Add aarch64 to validation framework (pytorch#1489)

* Add aarch64 to validation framework (pytorch#1490)

* Add aarch64 to validation framework

* Add aarch64 to validation framework

* Add aarch64 to validation framework (pytorch#1491)

* Add aarch64 to validation framework

* Add aarch64 to validation framework

* Add aarch64 to validation framework

* Temporary disable poetry test (pytorch#1492)

* Add torchonly option to validation workflows (pytorch#1494)

* Add torchonly option to validation workflows

* fix typo

* Remove pipy validation temporarily (pytorch#1495)

* Remove pipy validation temporarily (pytorch#1496)

* Add no-sudo to linux-aarch64 tests (pytorch#1499)

* Pass container image to aarch64 test jobs (pytorch#1500)

* Add setup aarch64 builds for aarch64 testing (pytorch#1501)

* Fix DESIRED_PYTHON setting for aarch64 validations (pytorch#1502)

* Use extra-index-url for aarch64 builds (pytorch#1503)

* Pypi validation enable (pytorch#1504)

* Validation pypi torchonly (pytorch#1505)

* Pipy validation workflow (pytorch#1506)

* Pipy validation workflow (pytorch#1507)

* Pipy validation workflow (pytorch#1508)

* Pipy validation workflow (pytorch#1509)

* Validate poetry workflow (pytorch#1511)

* Validate poetry workflow (pytorch#1512)

* Remove linux-aarch64 installation workaround (pytorch#1513)

* Temporary change test aarch64 builds (pytorch#1514)

* Remove torchonly restictions from aarch64 builds (pytorch#1517)

* Fix aarch64 nightly/release version override (pytorch#1518)

* Aarch64 fix overrdie passing from CI to build

* Aarch64 fix overrdie passing from CI to build

* Aarch64 fix overrdie passing from CI to build

* Revert "Temporary change test aarch64 builds (pytorch#1514)" (pytorch#1521)

This reverts commit 1e281be.

* Changes related to OVERRIDE_PACKAGE_VERSION in aarch64 builds (pytorch#1520) (pytorch#1523)

* Torchmetrics in S3 Index (pytorch#1522)

We will need the stable torchmetrics wheel in the S3 index, since torchrec depends on it. This is similar to how pytorch depends on numpy, etc. and these binaries need to be hosted in our index when uses try to pip install from download.pytorch.org.

* [aarch64] update ACL version to v23.05.1 and OpenBLAS to v0.3.20 (pytorch#1488)

* Changed runner for linux arm64 (pytorch#1525)

* Add torch-tensorrt to S3 PyPI Index (pytorch#1529)

As pytorch/tensorrt moves off of CCI onto Nova, we must to host their nightlies on our S3 index. This change allows the indexing to occur correctly for this package.

* Enable torch compile for python 3.11 smoke tests (pytorch#1534)

* Enable torch compile for python 3.11 smoke tests

* Make sure release is covered

* Fix typo

* add jinja2 (pytorch#1536)

* Remove restriction on 3.11 (pytorch#1537)

* Revert "add jinja2 (pytorch#1536)" (pytorch#1538)

This reverts commit 224a4c5.

* S3 Management Job Outside Docker (pytorch#1531)

* S3 Management Job Outside Docker

* job name

* remove failfast

* no matrix

* inherit secrets

* spacing?

* random nits

* add back secrets

* add back matrix

* export env vars correctlty

* Update update-s3-html.yml

* Add fbgemm-gpu to S3 Index (pytorch#1539)

* Update builder images to ROCm5.7 (pytorch#1541)

* Update docker build images for rocm5.7

* Fix erroneous logic that was skipping msccl files even for ROCm5.6; update msccl path for ROCm5.7

(cherry picked from commit 36c10cc)

* missing bzip2 package install for miopen

* Revert "missing bzip2 package install for miopen"

This reverts commit 8ef5fc9.

* ROCm 5.7 MIOpen does not need any patches, do not build from source

---------

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

* Update docker build convenience scripts to ROCm5.7 (pytorch#1543)

* Do not uninstall MIOpen if skipping build-from-source (pytorch#1544)

* Install nvtx3 on Windows (pytorch#1547)

* Provide file hashes in the URLs to avoid unnecessary file downloads (bandwidth saver) (pytorch#1433)

Supply sha256 query parameters using boto3 to avoid hundreds of extra Gigabytes of downloads each day during pipenv and poetry resolution lock cycles.

Fixes point 1 in pytorch/pytorch#76557
Fixes pytorch#1347

* Workaround for older files

* Bugfixes introduced by pytorch#1433

Replace `obj` with `obj.key` in few places
Dismantle pyramid of doom while iterating over objects

Test plan: Run `python manage.py whl/test --generate-pep503`

* [S3_management] Update boto3 to 1.28.53

* [manage_s3] Download objects metadata concurrently

Using `concurrent.futures.ThreadPoolExecutor`
This speeds up rebuilding `whl/test` index from 300 sec to 90 sec on my
laptop

* Make smoke-test runnable without envvars

* [aarch64] set acl_build_flags arch=armv8a, remove editing build flags (pytorch#1550)

Looking at this PR:
pytorch#1370
this line:
https://github.com/pytorch/builder/pull/1370/files#diff-54480d0a69ca27f54fb0736a9762caa8b03bd4736dcd77190d99ec3033c9bd2fR229

That fixed the issue:
pytorch/pytorch#97226

One of the changes is to set 
```
arch=armv8a
```
We are experiencing the same issue now: pytorch/pytorch#109312
Hence this fix.

* [BE] Fix all flake8 violations in `smoke_test.py` (pytorch#1553)

Namely:
 - `if(x):` -> `if x:`
 - `"dev\d+"` -> `"dev\\d+"`
 - Keep 2 newlines between functions
 - Add `assert foo is not None` to suppress "variable assigned but not used" warning

* [aarch64] patch mkl-dnn to use 'march=armv8-a' as the default build (pytorch#1554)

* [aarch64] patch pytorch 2.1 for mkl-dnn fix (pytorch#1555)

* patch ci script with mkldnn fix (pytorch#1556)

* [BE] Add lint workflow (pytorch#1557)

And format `smoke_test.py` with `ruff`
Invoke/confgure `ruff` using `lintrunner`
Copy lint runner adapters from https://github.com/pytorch/pytorch/tree/main/tools/linter/adapters

* [BE] Add `s3_management` to the linted folders (pytorch#1558)

Add `PERF401` to list of ignored suggestions, fix the rest.

* Fix path issue when building aarch64 wheels (pytorch#1560)

* Fix linalg smoke tests (pytorch#1563)

* Towards enabling M1 wheel builds

Do not try to install MKL on Apple Silicon

* And only install llvm-9 on x86 systems

* Do not build tests when building natively on M1

* And fix Python-3.8 native compilation on M1

There are no numpy=3.17 for M1

* Release 2.1 update promotion scripts (pytorch#1564)

* [BE] Small code cleanup

Fold multiple inidices and single index generation into one loop

As loop body is the same anyway...

* S3_management: Add option to  compute sha256

That will be used later to generate sha256 indexes in PEP503

* Remove debug print

* [S3_management] Minor improvements

- Refactor `fetch_obj_names` into class method
- Make sure that object remains public when ACL is computed
- Add `has_public_read` and `grant_public_read` class methods

* s3_management: compute checksum in cloud

I.e. file never gets downloaded on the client, which is a nice thing

* [S3Management] Add `undelete_prefix` method

That can be used to recover object in a versioned bucket

* Validate poetry for release (pytorch#1567)

* Validate poetry for release

* test

* test

* fixtypo

* Use released version of 3.12 (pytorch#1568)

As it was released on Oct 6 2023: https://www.python.org/downloads/release/python-3120/

* Move manywheel builds to `linux.12xlarge.ephemeral` (pytorch#1569)

Should be faster(<20 min vs 40+ min) and as secure as using GH ones

* Add cuSparseLt-0.5.0 to manywheel images

* Use `linux.12xlarge.ephemeral` for conda docker builds (pytorch#1570)

As `ubuntu.20.04` often OOM/failed to fetch data from RHEL repo

* Revert "Add cuSparseLt-0.5.0 to manywheel images"

This reverts commit 00841b6 as
cuSparseLT is not compatible with CentOS 7

* Move libtorch docker builder to `linux.12xlarge.ephemeral` (pytorch#1571)

As running it on `ubutu22.04` often results in flay infra failures/running out of disk space, for example, from https://github.com/pytorch/builder/actions/runs/6484948230/job/17609933012
```
cat: write error: No space left on device
```

* Add cuSparseLt-0.4.0 to manywheel images

But set USE_CUSPARSELT to 0 by default

* Add xformers to the list of indexable packages

* Build wheels with cuSparseLt

Build libtorch without cuSparseLt so far

Factor out `DEPS_LIST` to top level and add cuSparseLt of
`USE_CUSPARSELT` is set to 1

Tested in pytorch/pytorch#111245

* Do not build conda with CuSparseLT

* Add ROCM_PATH env var to Dockerfile for ROCm5.7 issue with finding HIP (pytorch#1572)

* [aarch64_wheel] Minor typing improvements

* [aarch64_wheel] Flake8 fix

* [aarch64_wheel] Cosmetic changes

* [aarch64_wheel] Fix readdir crash

Probably fixes pytorch/pytorch#111695

* [S3_management] generate libtorch index.html

* [CI] Update ruff to 0.1.1

To keep it in sync with pytorch

* Get rid of http://repo.okay.com.mx (pytorch#1575)

* [S3_management] Print time it takes to fetch index

* [S3_manage] Handle invalid versions

* [S3_management] Fix Version on error

And fix flake8 lint violation

* [S3_Management] Refactor `from_S3`

Move `fetch_metadata` into its own method, which could be called later on

Make S3Object non-frozen and introduce implicit __hash__ method

* [S3_Management] Filter nighly before `fetch_metadata`

This reduces time to call `from_S3Index` from 600 to 80 sec

* Add option to build -arm64- libtorch binaries

* [Docker] Remove trailing whitespace

And cause docker rebuild, to overwrite docker build from release/2.1
branch artifacts

* [MacOS] Small changes to libtorch naming

Intel x86 libtorch builds will have `x86_64` suffix and Apple Silicon ones will have `arm64` ones, but latest will point to Intel ones for now.

* Update libtorch/Dockerfile to use Ubuntu-20.04 (pytorch#1578)

As 18.04 EOLed

* Conda builds should respect `MAX_JOBS`

May be this help with OOMs

* [S3_management] Fix subpackage urls

Make them `lower()`

* Advance versions for release 2.1.1 (pytorch#1583)

* [aarch64] Release pypi prep script change for aarch64 builds (pytorch#1585)

* Changes needed for core enablement of 3.12 binary wheels (pytorch#1586)

* Fix aarch64 build on 3.8 (pytorch#1593)

* Add some more validation checks for torch.linalg.eigh and torch.compile (pytorch#1580)

* Add some more validation checks for torch.linalg.eigh and torch.compile

* Update test

* Also update smoke_test.py

* Fix lint

* Revert "Add some more validation checks for torch.linalg.eigh and torch.compile (pytorch#1580)" (pytorch#1594)

This reverts commit 4c7fa06.

* Release validations using release version matrix (pytorch#1611)

* Release pypi prep change (pytorch#1587)

* [aarch64] Release pypi prep script change for aarch64 builds

* Release versions for testing

Testing calling version (pytorch#1588)

Upstream/release validations (pytorch#1589)

* Testing calling version

* add release matrix

Upstream/release validations (pytorch#1590)

* Testing calling version

* add release matrix

* test

test (pytorch#1591)

test (pytorch#1592)

Release v1 (pytorch#1595)

* test

* test

Release v1 (pytorch#1596)

* test

* test

* test

test (pytorch#1597)

Test versions validations (pytorch#1598)

* test

* basedir

Test versions validations (pytorch#1599)

* test

* basedir

* test

test (pytorch#1600)

* test

* test

Add release versions everywhere (pytorch#1601)

* test

* test

* test

* test

test (pytorch#1602)

Test version validations (pytorch#1603)

* test

* test

Test version validations (pytorch#1604)

* test

* test

* test

tests (pytorch#1605)

More tests nov16 (pytorch#1606)

* tests

* test

More tests nov16 (pytorch#1607)

* tests

* test

* test

More tests nov16 (pytorch#1608)

* tests

* test

* test

* test

More tests nov16 (pytorch#1609)

* tests

* test

* test

* test

* test

* fix_lint

* fix: typo (pytorch#1581)

* desired_cuda -> DESIRED_CUDA (pytorch#1612)

* desired_cuda -> DESIRED_CUDA

Found with shellcheck

* Update manywheel/build_cuda.sh

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

---------

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

* [BE] Cleanup build unused code (pytorch#1613)

1. Upload Scripts are not used anymore. We use Github Action upload workflows
2. M1 Builds are now automated
3. build_all.bat run git grep in pytorch and builder - No result

* Changes to pypi release promotion scripts introduced for 2.1.0 and 2.1.1 (pytorch#1614)

* Changes topypi release promotion scripts introduced during 2.1.1

* typo

* Pin miniconda version for Windows

To Miniconda3-py311_23.9.0-0-Windows-x86_64.exe

* Fix poetry and pypi validations when version is specified (pytorch#1622)

* test (pytorch#1617)

Fix validations (pytorch#1618)

* test

* poetry_fix

* test

Fix validations (pytorch#1619)

* test

* poetry_fix

* test

* test

* restrict

* Validate pypi build only for release (pytorch#1623)

* Validate pypi build only for release (pytorch#1624)

* [Manywheel] Do not hardcode triton version

* [Manywheel][BE] Dedup Triton requirement spec

* [Manywheel] Restrict `pytorch-triton` to x86-64 Linux

Partially addresses pytorch/pytorch#114042

* Tweak py312 conda requirements

* Build PyTorch without TLS for 3.12

Because GLOO still expect OpenSSL-1, but 3.12 is build with OpenSSL-3

* [conda] Skip sympy for 3.12

As at the moment it is only available for Windows %)

* [conda] Do not depend on triton for 3.12 yet

* Tweak mkl requirements for win+py312

* Add aarch64 conda env lib to LD_LIBRARY_PATH (pytorch#1628)

After the change on pytorch#1586, nightly aarch64 wheel fails to find `libopenblas.so` which is now installed under `/opt/conda/envs/aarch64_env/lib/` instead of the base conda `/opt/conda/lib`.  Using CPU nightly wheels on aarch64 from Nov 16 then ends up with the error as described in pytorch/pytorch#114862: `Calling torch.geqrf on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support`.  The error can be found on night build log https://github.com/pytorch/pytorch/actions/runs/6887666324/job/18735230109#step:15:4933

Fixes pytorch/pytorch#114862

I double check `2.1.[0-1]` and the current RC for 2.1.2, the issue is not there because pytorch#1586 only change builder main, thus impacting nightly.

### Testing

Build nightly wheel manually on aarch64 runner and confirm that openblas is detected correctly:

```
-- Found a library with BLAS API (open). Full path: (/opt/conda/envs/aarch64_env/lib/libopenblas.so)
...
--   USE_BLAS              : 1
--     BLAS                : open
--     BLAS_HAS_SBGEMM     :
--   USE_LAPACK            : 1
--     LAPACK              : open
...
```

* Revert "[conda] Skip sympy for 3.12"

This reverts commit 88457a1.
As sympy has been updated to 1.12 and it now supports Python-3.12

* [aarch64] ACL, OpenBLAS and mkldnn updates for PyTorch 2.2 (pytorch#1627)

Note# ~~This PR has a dependency on updating the oneDNN version to v3.3 (via ideep submodule to v3.3)~~
ideep submodule update is done, so, this PR can be merged anytime now.

This PR is for:
ACL - build with fixed format kernels 
OpenBLAS - upgrade the version to 0.3.25
numpy - upgrade version to 1.26.2
and mkldnn - cleanup the patches that are already upstreamed.

* Validation scripts, install using version (pytorch#1633)

* Test Windows static lib (pytorch#1465)

Add support for testing Windows Cuda static lib

* Pin windows intel-openmp to 2023.2.0 (pytorch#1635) (pytorch#1636)

* Torch compile test for python 3.8-3.11 linux only (pytorch#1629)

This should fix failure on with Python 3.12 validations:
https://github.com/pytorch/builder/actions/runs/7064433251/job/19232483984#step:11:4859

* [aarch64] cleanup mkldnn patching (pytorch#1630)

pytorch is moved to oneDNN v3.3.2 and some of the
 old patches are not applicable any more.

* Add `aarch64_linux` to the list of linted files

* Actually fix lint this type

* Extend test_linalg from smoke_test.py

To take device as an argument and run tests on both cpu and cuda

* Run smoke_test_linalg during check_binary

This is a regression test for pytorch/pytorch#114862

* Fix linalg testing

* [BE] Add CI for check_binary.sh changes (pytorch#1637)

Make sure latest nightly passes the testing for:
 - Linux Wheel CPU
 - Linux Wheel CUDA

Tweak script a bit to work correctly with relative path to executable

* Keep nightly 20231010 for ExecuTorch alpha 0.1 for now (pytorch#1642)

* [Validations] do conda update before starting validations (pytorch#1643)

* [Validations] Validate aarch64 if all is slected (pytorch#1644)

* Fix validation workflow on aarch64 with conda 23.11.0 and GLIBC_2.25 (pytorch#1645)

* Debug aarch64 clone

* Debug

* Fix validation workflow with conda 23.11.0 and GLIBC_2.25

* Gate the change on linux-aarch64 and keep the old LD_LIBRARY_PATH

* Try to unset LD_LIBRARY_PATH in the workflow instead

* Fix copy/paste typo

* Do not hardcode triton version in builder code (pytorch#1646)

* Do not hardcode triton version in builder code

* Minor tweak to use pytorch_rootdir

* [Lint] Prohibit tabs in shell scripts

Fix current violations

* Link conda packages with cusparselt

Fixes pytorch/pytorch#115085

* aarch64: patch mkl-dnn for xbyak crashes due to /sys not accessible (pytorch#1648)

There are platforms with /sys not mounted. skip handling HW caps
for such platforms.

cherry-pick of: oneapi-src/oneDNN#1773
This fixes the issue# pytorch/pytorch#115482

* Update builder images to ROCm6.0 (pytorch#1647)

* Update ROCm versions for docker images

* Don't build MIOpen from source for ROCm6.0

* Temporarily use magma fork with ROCm6.0 patch

* Update ROCm versions for docker images

* Add gfx942

* Update MIOpen repo

* Magma PR 42 is merged, so use upstream repo master branch now

* gfx942 target only fully supported for ROCm6.0 and above

* Avoid finding out std::basic_string_view (pytorch#1528)

As pytorch moving to C++17, the binary can contain both "std::basic_string_view" and "std::__cxx11::basic_string<", change the pattern to avoid finding out std::basic_string_view, causing false positives.

* Add test ops validation for validation workflows (pytorch#1650)

* Add test ops validation

* include workflows

* Add test ops validation for validation workflows (pytorch#1651)

* Add test ops validation for validation workflows (pytorch#1652)

* Add test ops validation for validation workflows (pytorch#1653)

* Add test ops validation for validation workflows (pytorch#1654)

* Add test ops validation for validation workflows (pytorch#1655)

* [validations] Add missing required packages (pytorch#1656)

* [validations] Perform test_ops only on CUDA binaries (pytorch#1657)

* [validations] Adjust timeout for linux jobs (pytorch#1658)

* [validations] Restrict testing for python 3.8-3.11 (pytorch#1659)

* [validations] Fix use case if INCLUDE_TEST_OPS is not set (pytorch#1660)

* Add unit tests and one line reproducers to detect bad pytorch cuda wheels (pytorch#1663)

* Add one line reproducers and unit tests that would fail when bad wheels
were generated by the compiler(s).
nextafter reproducer thanks to @malfet!

* cosmetic fixes

* fix comments

* Fix quotation issues when migrating from python file to one line format (pytorch#1664)

Sorry, looks like the last line had an issue while porting it from multi-line python file to one-line.

Side question: when does this file get used? Is it only used during release binary generation/testing?

* Add nccl version print for cuda related smoke test (pytorch#1667)

* Apply nccl test to linux only (pytorch#1669)

* Build nccl after installing cuda (pytorch#1670)

Fix: pytorch/pytorch#116977

Nccl 2.19.3 don't exist for cuda 11.8 and cuda 12.1. Refer to https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-19-3.html#rel_2-19-3 CUDA 12.0, 12.2, 12.3 are supported.

Hence we do manual build. Follow this build process:
https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build

We want nccl version be exactly the same as installed here:
https://github.com/pytorch/pytorch/blob/main/.github/scripts/generate_binary_build_matrix.py#L45

* Update cusparselt to v0.5.2 (pytorch#1672)

This PR adds in support for cuSPARSELt v0.5.2 and updates the cuda 12.1 build step to use it instead of 0.4.0

Also fixes a typo when deleting the cusparselt folder after installing.

* Run test ops tests from outside of pytorch root folder (pytorch#1676)

* Remove s3 update html job and scripts (pytorch#1677)

* [BE] Remove unused nightly_defaults.bat (pytorch#1678)

* [Conda] Mark `blas * mkl` as x86 only dependency

* [Conda] Download arch appropriate Miniconda

By using `$(uname -m)` as suffix, which is arm64 on Apple Silicon and
x86 on Intel Macs

* [Conda] Do not depend on llvmdev-9 on ARM

As earliest available for the platform is llvmdev-11

* [Conda] Set correct developer dir for MacOS runners

* [Conda] Add llvm-openmp dependency for ARM64

PyTorch for M1 is finally built with OpenMP, so it needs to depend on it

* Use dynamic MKL on Windows (pytorch#1467)

Use dynamic MKL on Windows and updated MKL to 2021.4.0
On conda python 3.12 use mkl 2023.1

* Add torchrec to promote s3 script (pytorch#1680)

* Add torchrec to promote s3 script

* Add torchrec version to release_version.sh

* Revert "Dynamic MKL windows" (pytorch#1682)

* Revert "Revert "Dynamic MKL windows"" (pytorch#1683)

* Add numpy install to windows conda tests (pytorch#1684)

* Windows conda test. Install numpy in conda testenv (pytorch#1685)

* Add fbgemm to promote s3 script (pytorch#1681)

* Release 2.2.0 pypi prep script modifications (pytorch#1686)

* [Analytics] add pypi staging validations, remove circleci script (pytorch#1688)

* [Analytics] Pypi validations. Add call to check-wheel-contents (pytorch#1689)

* Modify Validate Nightly PyPI Wheel Binary Size to pick correct binary (pytorch#1690)

* Fix test_ops scripts on release validation testing (pytorch#1691)

* Add option to validate only from download.pytorch.org (pytorch#1692)

* Exclude pipy and poetry tests when USE_ONLY_DL_PYTORCH_ORG is set (pytorch#1693)

* [ROCm] add hipblaslt library files (pytorch#1695)

With pytorch/pytorch#114329 merged, we need to include hipblaslt library files within the ROCm wheel.

* Minor tweak to fbgemmgpu version to ignore RC suffix (pytorch#1694)

* Remove custom PyTorch build dependency logic on 3.11 (pytorch#1697)

* Remove custom PyTorch build dependency logic on 3.11

* Add a smoke test for openmp

* Pin conda-build to 3.28.4 (pytorch#1698)

* ci: aarch64 linux: fix torch performance issue with conda openblas package (pytorch#1696)

changing the conda openblas package from pthread version
to openmp version to match torch openmp runtime. The pthread
version was conflicting with the openmp runtime and causing
thread over-subscription and performance degradation.

* Add triton version for nightly and release (pytorch#1703)

* Bundle PTXAS into 11.8 wheel

* Add tensorrt promo script, bump release version for 2.2.1 (pytorch#1706)

* Pin Conda to 23.11.0

---------

Co-authored-by: Andrey Talman <atalman@fb.com>
Co-authored-by: Mike Schneider <104035434+xncqr@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Co-authored-by: ptrblck <ptrblck@users.noreply.github.com>
Co-authored-by: JYX <jyx21@mails.tsinghua.edu.cn>
Co-authored-by: Omkar Salpekar <osalpekar@gmail.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Danylo Baibak <baibak@meta.com>
Co-authored-by: Supadchaya <138070207+spcyppt@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: cyy <cyyever@outlook.com>
Co-authored-by: Matt Davis <matteius@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Luo Bo <84075753+0x804d8000@users.noreply.github.com>
Co-authored-by: Sergii Dymchenko <kit1980@gmail.com>
Co-authored-by: Ionuț Manța <ionut@janeasystems.com>
Co-authored-by: Wei Wang <143543872+nWEIdia@users.noreply.github.com>
Co-authored-by: Jesse Cai <jessecai@fb.com>
Co-authored-by: henrylhtsang <91030427+henrylhtsang@users.noreply.github.com>