
CUFFT_INTERNAL_ERROR on RTX 4090 #88038

Closed
Yujia-Yan opened this issue Oct 29, 2022 · 17 comments
Labels
module: cuda Related to torch.cuda, and CUDA support in general module: fft module: third_party triaged This issue has been looked at by a team member and triaged and prioritized into an appropriate module

Comments

@Yujia-Yan

Yujia-Yan commented Oct 29, 2022

🐛 Describe the bug

>>> import torch
>>> torch.fft.rfft(torch.randn(1000).cuda())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

There is a discussion on https://forums.developer.nvidia.com/t/bug-ubuntu-on-wsl2-rtx4090-related-cufft-runtime-error/230883/7 .

Versions

Using pytorch installed with
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

cc @ezyang @gchanan @zou3519 @peterjc123 @mszhanyi @skyline75489 @nbcsm @ngimel @mruberry @peterbell10

@soulitzer soulitzer added module: cuda, triaged, and module: fft labels Oct 31, 2022
@mruberry mruberry added high priority and removed triaged labels Oct 31, 2022
@malfet malfet added the module: windows label Nov 7, 2022
@malfet
Contributor

malfet commented Nov 7, 2022

Just to clarify, does this happen only in Windows Subsystem for Linux, or elsewhere as well?

@cpuhrsch cpuhrsch added the needs reproduction label Nov 7, 2022
@malfet
Contributor

malfet commented Nov 7, 2022

@ptrblck can you please confirm whether this indeed happens with a 4090 on Linux, or only in the WSL config?

@cpuhrsch cpuhrsch added triaged and removed high priority, triage review labels Nov 7, 2022
@Yujia-Yan
Author

Just to clarify, does this happen only in Windows Subsystem for Linux, or elsewhere as well?

Actually, I am using Ubuntu Server 22.04.
Update:
I later compiled PyTorch with CUDA 11.8 and the problem disappeared.

@ngimel
Collaborator

ngimel commented Nov 7, 2022

So in this case it looks like the cuFFT library doesn't honor the forward-compatibility guarantee (code compiled with an older toolkit version should run as long as the driver on the system supports the new hardware). cc @ptrblck, and we should start producing 11.8 nightlies.
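The compatibility gap described above can be summarized in a small sketch. The helper below is hypothetical (not part of PyTorch); it only encodes this thread's finding that cuFFT from toolkits before 11.8 fails on Ada-generation GPUs (compute capability 8.9, e.g. the RTX 4090):

```python
# Hypothetical helper illustrating the forward-compatibility gap discussed
# above: wheels built against CUDA < 11.8 ship a cuFFT that raises
# CUFFT_INTERNAL_ERROR on Ada (sm_89) GPUs, while 11.8+ builds work.

def cufft_likely_broken(toolkit_version: str, compute_capability: tuple) -> bool:
    """Return True if this thread suggests cuFFT will fail for the given
    torch.version.cuda string and torch.cuda.get_device_capability() tuple."""
    major, minor = (int(p) for p in toolkit_version.split(".")[:2])
    is_ada_or_newer = compute_capability >= (8, 9)   # RTX 4090 reports (8, 9)
    has_fixed_cufft = (major, minor) >= (11, 8)
    return is_ada_or_newer and not has_fixed_cufft

# On a real machine one would pass:
#   cufft_likely_broken(torch.version.cuda, torch.cuda.get_device_capability())
print(cufft_likely_broken("11.7", (8, 9)))  # True  -> expect the error
print(cufft_likely_broken("11.8", (8, 9)))  # False -> fixed build
print(cufft_likely_broken("11.7", (8, 6)))  # False -> e.g. an RTX 3080
```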

@Blackhex
Collaborator

Blackhex commented Nov 7, 2022

I don't have a 4090 available, so I can only add that it is not reproducible on Windows 11 or Ubuntu WSL with a 3080.

@ptrblck
Collaborator

ptrblck commented Nov 8, 2022

can you please confirm whether this indeed happens with a 4090 on Linux, or only in the WSL config?

Yes, this is a cuFFT error which is also visible on Linux.

and we should start producing 11.8 nightlies.

Also yes, and I've already started on its bringup, e.g. in pytorch/builder#1186

@malfet malfet added high priority and removed needs reproduction, module: windows labels Nov 14, 2022
@malfet malfet added this to the 1.13.1 milestone Nov 14, 2022
@malfet
Contributor

malfet commented Nov 14, 2022

Some updates:

  • This seems to be a bug in cuFFT in CUDA 11.7 that happens on both Linux and Windows, but appears to be fixed in 11.8
  • It is worth trying (and I think some investigation has already been done) to use cuFFT from 11.8 in the 11.7 build to see whether the fix can be deployed/verified in the nightlies first
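One hedged way to trial the 11.8 cuFFT against a cu117 wheel is to preload it ahead of the bundled library. This is an illustrative sketch only, not an official procedure; the toolkit path is an assumption for a default Linux install:

```shell
# Illustrative only: the path below assumes a default CUDA 11.8 toolkit
# install. Preload its cuFFT ahead of the one bundled with the cu117
# wheel, then re-run the repro from this issue.
export LD_PRELOAD=/usr/local/cuda-11.8/lib64/libcufft.so.10
python -c "import torch; print(torch.fft.rfft(torch.randn(1000).cuda()).sum())"
```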

@malfet
Contributor

malfet commented Nov 28, 2022

Let's create a frankenbuild for cuda-11.7 for the nightlies and see what happens

@malfet
Contributor

malfet commented Nov 29, 2022

The first thing that worries me a lot is the 2x binary size increase for CUDA 11.8, and nvprune does not help much:

$ ls -lah 11.7/libcufft/lib64/libcufft.so.10.7.2.50 11.8/libcufft/lib64/libcufft.so.10.9.0.58 
-rwxr-xr-x 1 nshulga nshulga 131M Nov 29 21:48 11.7/libcufft/lib64/libcufft.so.10.7.2.50
-rwxr-xr-x 1 nshulga nshulga 267M Nov 29 21:49 11.8/libcufft/lib64/libcufft.so.10.9.0.58

Considering that, I'm not sure it would be safe to include it as an update in 1.13.1

@malfet
Contributor

malfet commented Dec 19, 2022

Removing the high priority label, as this is a bug in a third-party library and there were big changes in cuFFT between 11.7 and 11.8

razarmehr pushed a commit to kulinseth/pytorch that referenced this issue Jan 4, 2023
This PR adds more nvidia pypi dependencies for cuda 11.7 wheel. Additionally, it pins cufft version to 10.9.0.58 to resolve pytorch#88038

Depends on: pytorch/builder#1196

Pull Request resolved: pytorch#89944
Approved by: https://github.com/atalman
@pranavmalikk

pranavmalikk commented Feb 4, 2023

Still getting this error on an RTX 4090 with CUDA 11.7 on Ubuntu 22.04. Any recommendations?

@ptrblck
Collaborator

ptrblck commented Feb 4, 2023

@pranavmalikk Yes, please use the nightly binaries with CUDA 11.8.

@pranavmalikk

pranavmalikk commented Feb 4, 2023

@pranavmalikk Yes, please use the nightly binaries with CUDA 11.8.

I'm still getting this on CUDA 11.8:

----> 1 torch.fft.rfft(torch.randn(1000).cuda())

RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

@ptrblck
Collaborator

ptrblck commented Feb 5, 2023

Could you post the output of torch.__version__ as well as python -m torch.utils.collect_env, please? I cannot reproduce the error with the CUDA 11.8 nightly binaries anymore:

import torch
print(torch.__version__)
out = torch.fft.rfft(torch.randn(1000).cuda())
print(out.sum())

with 11.7 it fails as reported:

python tmp.pt 
2.0.0.dev20230204+cu117
Traceback (most recent call last):
  File "tmp.pt", line 3, in <module>
    torch.fft.rfft(torch.randn(1000).cuda())
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

Update the nightlies to the 11.8 build:

pip uninstall torch -y
Found existing installation: torch 2.0.0.dev20230204+cu117
Uninstalling torch-2.0.0.dev20230204+cu117:
  Successfully uninstalled torch-2.0.0.dev20230204+cu117
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
...
Successfully installed torch-2.0.0.dev20230204+cu118

and it works:

python tmp.py 
2.0.0.dev20230204+cu118
tensor(670.6870+11.1756j, device='cuda:0')

@pranavmalikk

pranavmalikk commented Feb 5, 2023

Could you post the output of torch.__version__ as well as python -m torch.utils.collect_env, please, as I cannot reproduce the error in the CUDA 11.8 nightly binaries anymore [...]

Thank you for the help; it works now. I had mistakenly run `pip install torchaudio`, which downgraded me to an older version of torch. I fixed this by installing the nightly version of torchaudio as well.

@bensonbs

Due to package dependency issues, I am limited to PyTorch versions below 2.0.0. I understand that PyTorch 1.13.1 supports up to CUDA 11.7. Could you kindly advise whether there are any alternative solutions apart from upgrading to CUDA 11.8?
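For users stuck on a CUDA 11.7 build, one possible stopgap (a sketch, not an official recommendation from this thread) is to fall back to the CPU FFT when the cuFFT path raises, at the cost of a device round-trip:

```python
import torch

def rfft_with_cpu_fallback(x: torch.Tensor) -> torch.Tensor:
    """Sketch of a workaround for the cuFFT bug in CUDA 11.7 builds:
    if the GPU FFT raises (e.g. CUFFT_INTERNAL_ERROR), compute the FFT
    on the CPU and move the result back to the original device."""
    try:
        return torch.fft.rfft(x)
    except RuntimeError:
        # cuFFT failed; fall back to the (slower) CPU path.
        return torch.fft.rfft(x.cpu()).to(x.device)

out = rfft_with_cpu_fallback(torch.randn(1000))
print(out.shape)  # torch.Size([501])
```

This keeps code paths working but gives up the GPU speedup for the FFT itself, so it is only worth it where the FFT is not the bottleneck.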
