
Slow tacotron training 1step/sec on AWS p3.2xlarge (Tesla V100) #228

Open
ScottyBauer opened this issue Feb 26, 2021 · 1 comment


ScottyBauer commented Feb 26, 2021

I'm experimenting with this project and the throughput I'm getting seems too slow, which leads me to believe I may have misconfigured something or there is some other issue.

I'm reusing the pre-trained models with my own custom dataset of ~750 audio clips ranging from 4 to 10 seconds.

I'm using:
PyTorch 1.7.1 with Python3.7 (CUDA 11.0 and Intel MKL)

In order to get the code to run properly I had to apply the fix from this bug (not sure if this is relevant, just want to give all the details):
#201

and I applied this pull request:
521179e

The only change I've made to hyperparams is switching peak_norm from False to True:

peak_norm = True                  # Normalise to the peak of each wav file

and setting my paths.

I can confirm that it is using the GPU (at least GPU memory), but I've never seen nvidia-smi show utilization above 38% (output further below).
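For reference, this is the kind of sanity check that confirms the tensors really are on the GPU (a minimal sketch; the model variable is illustrative, not the exact name used in train_tacotron.py):

import torch

print(torch.cuda.is_available())       # True on this instance
print(torch.cuda.get_device_name(0))   # Tesla V100-SXM2-16GB

# Assuming `model` is the Tacotron instance built in train_tacotron.py:
# print(next(model.parameters()).device)   # expect cuda:0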

Things I've tried:
increasing the batch size in hyperparams up to 64, and also adjusting the learning rate, neither of which helped (a sketch of the schedule change is below).
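For context, this is roughly the edit I tried in hparams.py (a sketch with illustrative values; I'm assuming each tuple in the schedule is (r, lr, max_step, batch_size) as in the stock config):

# hparams.py -- raising the batch size in the later stages of the schedule
tts_schedule = [(7, 1e-3,  10_000, 64),   # (r, lr, max_step, batch_size)
                (5, 1e-4, 100_000, 64),
                (2, 1e-4, 180_000, 64),
                (2, 1e-4, 350_000, 64)]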

Here is the nvidia-smi output:

Fri Feb 26 00:40:21 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P0    44W / 300W |   4461MiB / 16160MiB |     10%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      6844      C   python                           4459MiB |
+-----------------------------------------------------------------------------+

Here is what training is up to:

Trainable Parameters: 11.088M
Restoring from latest checkpoint...
Loading latest weights: /home/ubuntu/WaveRNN/checkpoints/ljspeech_lsa_smooth_attention.tacotron/latest_weights.pyt
/home/ubuntu/WaveRNN/models/tacotron.py:308: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than tensor.new_tensor(sourceTensor).
  self.decoder.r = self.decoder.r.new_tensor(value, requires_grad=False)
Loading latest optimizer state: /home/ubuntu/WaveRNN/checkpoints/ljspeech_lsa_smooth_attention.tacotron/latest_optim.pyt
+----------------+------------+---------------+------------------+
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
|   170k Steps   |     8      |    0.0001     |        2         |
+----------------+------------+---------------+------------------+

| Epoch: 1/1869 (61/91) | Loss: 0.7363 | 0.41 steps/s | Step: 180k |

If I change some of the training schedule parameters (r=7 and batch size 64, per the table below):

(pytorch_latest_p37) ubuntu@ip-172-31-46-96:~/WaveRNN$ python train_tacotron.py 
Using device: cuda

Initialising Tacotron Model...

Trainable Parameters: 11.088M
Restoring from latest checkpoint...
Loading latest weights: /home/ubuntu/WaveRNN/checkpoints/ljspeech_lsa_smooth_attention.tacotron/latest_weights.pyt
Loading latest optimizer state: /home/ubuntu/WaveRNN/checkpoints/ljspeech_lsa_smooth_attention.tacotron/latest_optim.pyt
+----------------+------------+---------------+------------------+
| Steps with r=7 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
|   169k Steps   |     64     |    0.0001     |        7         |
+----------------+------------+---------------+------------------+
 
| Epoch: 1/14154 (12/12) | Loss: 0.7744 | 0.91 steps/s | Step: 180k |  
| Epoch: 2/14154 (12/12) | Loss: 0.7742 | 0.94 steps/s | Step: 180k |  
| Epoch: 3/14154 (12/12) | Loss: 0.7733 | 0.92 steps/s | Step: 180k |  
| Epoch: 4/14154 (12/12) | Loss: 0.7785 | 0.93 steps/s | Step: 180k |  

and nvidia-smi:

Every 1.0s: nvidia-smi        ip-172-31-46-96: Fri Feb 26 00:50:23 2021

Fri Feb 26 00:50:23 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P0   202W / 300W |  11715MiB / 16160MiB |     33%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8291      C   python                          11713MiB |
+-----------------------------------------------------------------------------+

Let me know what other information I can provide to help debug this.

Thank you,
Scott


ghost commented Mar 7, 2021

It may be constrained by disk reads. Move your dataset to faster storage, for example by copying it to RAM under /dev/shm.
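For example, something along these lines (a sketch; the source path is just a placeholder for wherever your preprocessed dataset lives, and data_path in hparams.py would then point at the copy):

import shutil

# Copy the preprocessed dataset into RAM-backed storage (cleared on reboot).
src = '/home/ubuntu/WaveRNN/data'      # placeholder: your current data_path
dst = '/dev/shm/WaveRNN_data'
shutil.copytree(src, dst)
# then set data_path = '/dev/shm/WaveRNN_data/' in hparams.py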

The Tesla V100 is an old GPU. For the default model size, you're going to top out around 2-3 steps/sec at r=7 and 1 step/sec at r=2. Training will be faster if you discard your longer utterances.
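For example, a quick way to spot the long clips before preprocessing (a sketch using only the standard library; the wav directory is a placeholder and 8 seconds is an arbitrary cutoff):

import wave
from pathlib import Path

max_seconds = 8   # arbitrary cutoff
for wav_path in Path('my_dataset/wavs').glob('*.wav'):   # placeholder path
    with wave.open(str(wav_path)) as w:
        duration = w.getnframes() / w.getframerate()
    if duration > max_seconds:
        print(f'{wav_path}: {duration:.1f}s -- consider dropping it')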

You may wish to check out CorentinJ/Real-Time-Voice-Cloning. It uses the same Tacotron and WaveRNN models as this repo. Once you get the hang of Tacotron (synthesizer) training, check out CorentinJ/Real-Time-Voice-Cloning#437 as it describes what you are trying to do.
