CUFFT_INTERNAL_ERROR on RTX 4090 #88038
Just to clarify, does this happen only in the Windows Subsystem for Linux, or elsewhere as well?
@ptrblck can you please confirm whether this indeed happens with a 4090 on Linux, or only in a WSL config?
Actually, I am using Ubuntu Server 22.04.
So in this case it looks like the cuFFT library doesn't uphold the forward compatibility guarantee (you can run code compiled with an older toolkit version, as long as the driver on the system supports the new hardware). cc @ptrblck, and we should start producing 11.8 nightlies.
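The forward-compatibility rule referenced above can be sketched as a small check (the helper name and arch lists are illustrative, and the SASS matching is simplified; real binary compatibility is per major architecture family): prebuilt `sm_XY` kernels target one architecture, while `compute_XY` PTX can be JIT-compiled by the driver for newer GPUs.

```python
def can_run(compiled_archs, device_sm):
    """Hypothetical sketch of the CUDA forward-compatibility rule.

    - 'sm_XY' entries are prebuilt SASS; here, simplified to an exact match.
    - 'compute_XY' entries are PTX, which the driver can JIT-compile for
      any newer (>= XY) architecture.
    """
    for arch in compiled_archs:
        kind, _, cc = arch.partition("_")
        if kind == "sm" and int(cc) == device_sm:
            return True
        if kind == "compute" and device_sm >= int(cc):
            return True
    return False

# Illustrative: an RTX 4090 is sm_89. A build that ships compute_86 PTX
# can still run on it via JIT, but a closed library like cuFFT that
# dispatches on exact architectures internally may fail anyway.
print(can_run(["sm_70", "sm_86", "compute_86"], 89))  # True (via PTX)
print(can_run(["sm_70", "sm_86"], 89))                # False (no PTX)
```

This is why the toolkit-level guarantee can hold while a single library inside it (here, cuFFT in 11.7) still breaks on new hardware.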
I don't have a 4090 available, so I can only add that it is not reproducible on Windows 11 or Ubuntu WSL with a 3080.
Yes, this is a cuFFT error which is also visible on Linux.
Also yes, and I've already started with its bringup, e.g. in pytorch/builder#1186.
Some updates:
Let's create a frankenbuild for CUDA 11.7 for nightlies and see what happens.
The first thing that worries me a lot is the 2x binary size increase for CUDA 11.8, and nvprune does not help much.
Considering that, I'm not sure if it will be safe to include it as an update in 1.13.1.
Removing the high-priority label, as this is a bug in a 3rd-party library and there were big changes in cuFFT between 11.7 and 11.8.
This PR adds more NVIDIA PyPI dependencies for the CUDA 11.7 wheel. Additionally, it pins the cuFFT version to 10.9.0.58 to resolve pytorch#88038. Depends on: pytorch/builder#1196. Pull Request resolved: pytorch#89944. Approved by: https://github.com/atalman
Still getting this error on an RTX 4090 with CUDA 11.7 on Ubuntu 22.04; any recommendations?
@pranavmalikk Yes, please use the nightly binaries with CUDA 11.8.
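For reference, the CUDA 11.8 nightlies could be installed with pip at the time (command as published on pytorch.org; the nightly index URL and package set may have changed since):

```shell
# Install the PyTorch nightly built against CUDA 11.8, whose bundled
# cuFFT supports sm_89 (RTX 4090) hardware.
pip3 install --pre torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/nightly/cu118
```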
I'm still getting this on CUDA 11.8:

```
----> 1 torch.fft.rfft(torch.randn(1000).cuda())
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
```
Could you post the output of

```python
import torch
print(torch.__version__)
out = torch.fft.rfft(torch.randn(1000).cuda())
print(out.sum())
```

With 11.7 it fails as reported:

```
python tmp.pt
2.0.0.dev20230204+cu117
Traceback (most recent call last):
  File "tmp.pt", line 3, in <module>
    torch.fft.rfft(torch.randn(1000).cuda())
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
```

Update the nightlies to the 11.8 build, and it works:

```
python tmp.py
2.0.0.dev20230204+cu118
tensor(670.6870+11.1756j, device='cuda:0')
```
Thank you for the help, it works now. I had mistakenly run `pip install torchaudio`, which set me back to an older version of torch. I fixed this by installing the nightly version of torchaudio.
Due to package dependency issues, I am limited to PyTorch versions below 2.0.0. I understand that PyTorch 1.13.1 supports up to CUDA 11.7. Could you kindly advise whether there are any alternative solutions apart from upgrading to CUDA 11.8?
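One possible stopgap while pinned below 2.0.0 is to fall back to the CPU for the failing FFT calls and move the result back to the GPU. This is a sketch, not an official fix (the helper name is mine), and it is slower, but it sidesteps the broken cuFFT path entirely:

```python
import torch

def rfft_cpu_fallback(x: torch.Tensor) -> torch.Tensor:
    """Compute rfft, retrying on the CPU if the CUDA cuFFT path fails.

    Sketch of a workaround for CUFFT_INTERNAL_ERROR on sm_89 GPUs with
    CUDA 11.7 builds; functionally equivalent, but much slower for
    large transforms because of the device-to-host round trip.
    """
    try:
        return torch.fft.rfft(x)
    except RuntimeError:
        # cuFFT failed; run the transform on the CPU and copy back.
        return torch.fft.rfft(x.cpu()).to(x.device)

out = rfft_cpu_fallback(torch.randn(1000))
print(out.shape)  # torch.Size([501]) — rfft returns n // 2 + 1 bins
```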
🐛 Describe the bug
There is a discussion on https://forums.developer.nvidia.com/t/bug-ubuntu-on-wsl2-rtx4090-related-cufft-runtime-error/230883/7 .
Versions
Using pytorch installed with:

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
```
cc @ezyang @gchanan @zou3519 @peterjc123 @mszhanyi @skyline75489 @nbcsm @ngimel @mruberry @peterbell10