CUFFT_INTERNAL_ERROR on RTX4090 #96

Open

shine-xia opened this issue Apr 10, 2024 · 4 comments

shine-xia commented Apr 10, 2024

The requirements.txt in MeloTTS pins `torch<2.0`,
but the code below is only valid on torch 1.13 or higher, so my only choices are torch 1.13.0/1.13.1.
All torch 1.13.x packages are built against CUDA 11.6/11.7.

torch.backends.cudnn.benchmark = True
torch.backends.cuda.sdp_kernel("flash")
torch.backends.cuda.enable_flash_sdp(True)
# torch.backends.cuda.enable_mem_efficient_sdp(
#     True
# )  # Not available if torch version is lower than 2.0
torch.backends.cuda.enable_math_sdp(True)
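
For what it's worth, a version-tolerant variant of the same settings can probe for the newer toggle instead of assuming the torch version (a sketch of mine, not code from the repo):

import torch

# Same backend settings as the snippet above, but only call the toggle that
# is missing before torch 2.0 when it is actually present.
torch.backends.cudnn.benchmark = True
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_math_sdp(True)
if hasattr(torch.backends.cuda, "enable_mem_efficient_sdp"):
    torch.backends.cuda.enable_mem_efficient_sdp(True)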

Now that I have an RTX 4090, I can't train MeloTTS on it with torch 1.13.1 because of a CUDA bug that is only fixed in CUDA 11.8:
pytorch/pytorch#88038

So I hope the MeloTTS developers could raise the torch requirement to 2.0 or higher.
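
For reference, a minimal check of whether a given torch/CUDA build hits this failure (my own snippet, not from MeloTTS):

import torch

# A single FFT on the GPU exercises cuFFT; per pytorch/pytorch#88038 this is
# where CUFFT_INTERNAL_ERROR is expected on an RTX 4090 with CUDA < 11.8.
x = torch.randn(1, 1024, device="cuda")
try:
    torch.fft.rfft(x)
    print("cuFFT OK on", torch.cuda.get_device_name(0))
except RuntimeError as err:
    print("cuFFT failed:", err)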

@shine-xia (Author)

But it turns out to run successfully with some warnings on torch 2.0.1...

MissingTwins commented Apr 10, 2024

1. Open `MeloTTS\requirements.txt`:
    Change `torch<2.0` to `torch`
    Remove the duplicate `mecab-python3==1.0.5` entry
    Change the remaining `mecab-python3==1.0.5` to `mecab-python3`
    Save the changes
2. Activate the melo virtual environment, then:
    run `cd MeloTTS`
    run `pip install -e .`
    run `python -m unidic download`
Should work with CUDA 12.2, but not CUDA 12.3

$ nvidia-smi 
Wed Apr 10 23:45:53 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  | 00000000:0A:00.0 Off |                  N/A |
|  0%   26C    P8              19W / 275W |      1MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
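
Worth noting (my addition): the "CUDA Version: 12.2" that nvidia-smi reports is the driver's maximum supported runtime, not the toolkit the installed torch wheel was built with, so checking both sides from Python avoids confusion:

import torch

print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)  # toolkit baked into the wheel
print("cuDNN:", torch.backends.cudnn.version())
print("GPU visible:", torch.cuda.is_available())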

@dennis-wr

Here is my method (Ubuntu 22.04):

  1. Uninstall CUDA completely.
$ sudo /usr/local/cuda-11.7/bin/cuda-uninstaller
$ sudo /usr/bin/nvidia-uninstall

$ sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
$ sudo apt-get --purge remove "*nvidia*"
$ sudo apt-get autoremove
$ sudo apt-get autoclean
$ sudo rm -rf /usr/local/cuda*

$ sudo dpkg -r cuda
$ sudo dpkg -r $(dpkg -l | grep '^ii  cudnn' | awk '{print $2}')

$ sudo apt-get update
  2. Install NVIDIA drivers.
$ ubuntu-drivers devices
$ sudo apt install nvidia-driver-525   # after checking the desired version with the command above

$ sudo ubuntu-drivers autoinstall      # or use this single command instead

$ sudo reboot
  3. Install CUDA Toolkit 11.8
$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
$ sudo sh cuda_11.8.0_520.61.05_linux.run   # uncheck the Driver component in the installer

$ vi ~/.bashrc
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

$ source ~/.bashrc
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
  4. Install cuDNN 8 (maybe optional?)
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.7.29/cudnn-local-8AE81B24-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8 libcudnn8-dev libcudnn8-samples
  5. Install PyTorch for CUDA 11.8
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

You may need to modify your code because of warnings or errors.
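
After the reinstall, a quick sanity check (hypothetical snippet, not part of the steps above) can confirm the wheel really targets CUDA 11.8 before retraining:

import torch

# The cu118 wheel should report the 11.8 toolkit and see the RTX 4090.
assert torch.version.cuda == "11.8", torch.version.cuda
print(torch.__version__, torch.cuda.get_device_name(0))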

@johnPertoft

Same as #80 for visibility.
