
Running yaapt on-the-fly extremely slows the training #4

Closed

seungwonpark opened this issue Nov 18, 2021 · 4 comments

Comments

@seungwonpark

Hi, thanks for kindly releasing the code for the paper. (Also congratulations on the acceptance in INTERSPEECH!)

While I was running the code, I encountered a significant issue: pYAAPT.yaapt extremely slows the training.
Here's how I found this speed bottleneck:

  • I tried to run train_f0_vq.py as specified in the README.
  • However, training was far too slow: it looks like we need to train the f0 VQ model for 400,000 steps, but a single epoch (about 700 steps) took 2,657 seconds. GPU utilization was really low while the CPUs were running like crazy. (My server has a 3080 Ti with 64 CPU cores.)
  • I suspected pYAAPT.yaapt to be the cause. To test that, I forked the repository and added caching functionality: https://github.com/seungwonpark/speech-resynthesis
  • With caching, every epoch after the first (which populates the cache) took only 36 seconds.

So my question is: how did you manage to run yaapt on-the-fly without caching? Though I succeeded in training the model fast enough, I will need to disable caching again, since caching requires the _sample_interval method to sample the same interval for each audio file (i.e., it disables the data augmentation of randomly choosing the interval).
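For concreteness, the caching idea is roughly the following (a simplified sketch with hypothetical helper names, not the exact code in my fork). It caches the YAAPT pitch track of the full utterance and then slices a random interval out of the cached track, which should also let the random-interval augmentation stay enabled, assuming the F0 frames are spaced at a fixed hop:

```python
# Simplified sketch (hypothetical helpers, not the fork's actual code):
# cache the full-utterance YAAPT pitch once, then slice random intervals
# from the cached track so _sample_interval-style augmentation still works.
import os
import numpy as np
import amfm_decompy.basic_tools as basic
import amfm_decompy.pYAAPT as pYAAPT


def cached_full_f0(wav, sr, cache_path, frame_space_ms=5.0):
    """Return the per-frame F0 of the whole utterance, computing it only once."""
    if os.path.exists(cache_path):
        return np.load(cache_path)
    signal = basic.SignalObj(wav.astype(np.float64), sr)
    pitch = pYAAPT.yaapt(signal, frame_length=20.0, frame_space=frame_space_ms)
    f0 = pitch.samp_values.astype(np.float32)
    np.save(cache_path, f0)
    return f0


def sample_interval_with_f0(wav, f0, sr, seg_len, frame_space_ms=5.0):
    """Pick a random audio interval and slice the matching frames from the cached F0."""
    hop = int(sr * frame_space_ms / 1000)   # audio samples per F0 frame
    start = np.random.randint(0, max(1, len(wav) - seg_len))
    start -= start % hop                    # align the interval to an F0 frame boundary
    f0_start, f0_len = start // hop, seg_len // hop
    return wav[start:start + seg_len], f0[f0_start:f0_start + f0_len]
```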

@seungwonpark
Author

ping @adampolyak

@adampolyak
Contributor

Hi,

In our experiments, we were able to finish one epoch in ~760 seconds on the VCTK dataset. It is possible that our naive implementation simply ran faster on our hardware.

Going forward, it seems that adding caching speeds up training! Another option is to add a preprocessing step that extracts pitch values from all wav samples, and to update the dataset to load the preprocessed values instead of calculating them on the fly.
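A rough sketch of that preprocessing step could look like the following (the paths, file layout, and YAAPT parameters here are only illustrative, not something that exists in the repo):

```python
# Illustrative preprocessing sketch: run YAAPT over every wav once and
# store the pitch tracks as .npy files for the dataset to load later.
from pathlib import Path
import numpy as np
import soundfile as sf
import amfm_decompy.basic_tools as basic
import amfm_decompy.pYAAPT as pYAAPT

WAV_DIR, F0_DIR = Path("wavs"), Path("f0")   # illustrative locations
F0_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in sorted(WAV_DIR.rglob("*.wav")):
    audio, sr = sf.read(wav_path)
    signal = basic.SignalObj(audio.astype(np.float64), sr)
    pitch = pYAAPT.yaapt(signal, frame_length=20.0, frame_space=5.0)
    # Assumes unique file stems; mirror the directory structure instead if needed.
    np.save(F0_DIR / (wav_path.stem + ".npy"), pitch.samp_values.astype(np.float32))
```

The dataset's __getitem__ would then load the matching .npy file instead of calling pYAAPT.yaapt during training.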

Happy to review and add any pull requests!

@seungwonpark
Author

Got it. Thanks for your reply!

Closing this issue.

@aereobert

Hi, I think you might be training on a shared platform with a weak CPU, e.g. a Google Colab container. When training on Colab, I got the same epoch time as yours. However, when I trained on my 1080 Ti, I got the same training speed as the author.
