CUFFT_INTERNAL_ERROR on RTX4090 #96

Open

shine-xia opened this issue Apr 10, 2024 · 4 comments

shine-xia commented Apr 10, 2024

The requirements.txt in MeloTTS pins `torch<2.0`,
but the code below is only valid on torch 1.13 or higher, so my only choices are torch 1.13.0/1.13.1.
All torch 1.13.x packages are built against CUDA 11.6/11.7.

torch.backends.cudnn.benchmark = True
torch.backends.cuda.sdp_kernel("flash")
torch.backends.cuda.enable_flash_sdp(True)
# torch.backends.cuda.enable_mem_efficient_sdp(
#     True
# )  # Not available if torch version is lower than 2.0
torch.backends.cuda.enable_math_sdp(True)
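
For what it's worth, a version-tolerant variant of the same settings can probe for the newer toggle instead of assuming the torch version (a sketch of mine, not code from the repo):

import torch

# Same backend settings as the snippet above, but only call the toggle that
# is missing before torch 2.0 when it is actually present.
torch.backends.cudnn.benchmark = True
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_math_sdp(True)
if hasattr(torch.backends.cuda, "enable_mem_efficient_sdp"):
    torch.backends.cuda.enable_mem_efficient_sdp(True)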

Now that I have an RTX 4090, I can't train MeloTTS on it with torch 1.13.1 because of a CUDA bug that is only fixed in CUDA 11.8:
pytorch/pytorch#88038

So I hope the MeloTTS developers could raise the torch requirement to 2.0 or higher.
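
For reference, a minimal check of whether a given torch/CUDA build hits this failure (my own snippet, not from MeloTTS):

import torch

# A single FFT on the GPU exercises cuFFT; per pytorch/pytorch#88038 this is
# where CUFFT_INTERNAL_ERROR is expected on an RTX 4090 with CUDA < 11.8.
x = torch.randn(1, 1024, device="cuda")
try:
    torch.fft.rfft(x)
    print("cuFFT OK on", torch.cuda.get_device_name(0))
except RuntimeError as err:
    print("cuFFT failed:", err)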

@shine-xia (Author)

But it turns out to run successfully with some warnings on torch 2.0.1...

MissingTwins commented Apr 10, 2024

1. Open `MeloTTS\requirements.txt`:
    Change `torch<2.0` to `torch`
    Remove the duplicate `mecab-python3==1.0.5` entry
    Change the remaining `mecab-python3==1.0.5` to `mecab-python3`
    Save the changes
2. Activate the melo virtual environment, then:
    run `cd MeloTTS`
    run `pip install -e .`
    run `python -m unidic download`
Should work with CUDA 12.2, but not CUDA 12.3

$ nvidia-smi 
Wed Apr 10 23:45:53 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  | 00000000:0A:00.0 Off |                  N/A |
|  0%   26C    P8              19W / 275W |      1MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
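
Worth noting (my addition): the "CUDA Version: 12.2" that nvidia-smi reports is the driver's maximum supported runtime, not the toolkit the installed torch wheel was built with, so checking both sides from Python avoids confusion:

import torch

print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)  # toolkit baked into the wheel
print("cuDNN:", torch.backends.cudnn.version())
print("GPU visible:", torch.cuda.is_available())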

@dennis-wr

Here is my method (Ubuntu 22.04):

  1. Uninstall CUDA completely.
$ sudo /usr/local/cuda-11.7/bin/cuda-uninstaller
$ sudo /usr/bin/nvidia-uninstall

$ sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
$ sudo apt-get --purge remove "*nvidia*"
$ sudo apt-get autoremove
$ sudo apt-get autoclean
$ sudo rm -rf /usr/local/cuda*

$ sudo dpkg -r cuda
$ sudo dpkg -r $(dpkg -l | grep '^ii  cudnn' | awk '{print $2}')

$ sudo apt-get update
  2. Install NVIDIA drivers.
$ ubuntu-drivers devices
$ sudo apt install nvidia-driver-525   # after checking the desired version with the command above

$ sudo ubuntu-drivers autoinstall      # or use this single command instead

$ sudo reboot
  3. Install CUDA Toolkit 11.8
$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
$ sudo sh cuda_11.8.0_520.61.05_linux.run   # uncheck the Driver component in the installer

$ vi ~/.bashrc
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

$ source ~/.bashrc
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
  4. Install cuDNN 8 (maybe optional?)
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.7.29/cudnn-local-8AE81B24-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8 libcudnn8-dev libcudnn8-samples
  5. Install PyTorch for CUDA 11.8
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

You may need to modify your code because of warnings or errors.
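
After the reinstall, a quick sanity check (hypothetical snippet, not part of the steps above) can confirm the wheel really targets CUDA 11.8 before retraining:

import torch

# The cu118 wheel should report the 11.8 toolkit and see the RTX 4090.
assert torch.version.cuda == "11.8", torch.version.cuda
print(torch.__version__, torch.cuda.get_device_name(0))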

@johnPertoft

Same as #80 for visibility.
