Does multi-GPU training with the weighted sampler work with VITS in this repo? #103

mrakotos · 2024-10-12T10:16:26Z

mrakotos
Oct 12, 2024

It was broken for years in the original Coqui repo. Thanks.

eginhard · 2024-10-13T20:28:49Z

eginhard
Oct 13, 2024
Maintainer

Can you test it and let me know? I haven't made any specific changes in that area, but happy to merge any fixes.

3 replies

mrakotos Oct 14, 2024
Author

OK, I'll test it later. Thank you

mrakotos Oct 17, 2024
Author

Hi, I have tested multi-GPU training and it does work if you don't use the batch weighted sampler and have accelerate set to true. I did not encounter any issues with LJSpeech. But when I tried to train on a larger dataset (LibriTTS), I got NCCL watchdog timeout issues. I tried setting os.environ["NCCL_BLOCKING_WAIT"] = "1", but without success. How can I disable the timeout? Precomputing the phonemes takes almost 45 minutes.
P.S. The formatter for LibriTTS doesn't allow training to continue if there is missing audio in LibriTTS. I have made some modifications to it; if you want, I can open a PR.
Thank you!
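
The formatter change mentioned in the P.S. is not shown in this thread. Purely as an illustration (not the author's actual patch), one way to tolerate missing audio is to drop samples whose file is absent before training starts. The sketch assumes samples are dictionaries with an `audio_file` key, as Coqui-style formatters usually return; `drop_missing_audio` is a hypothetical helper name.

```python
import os


def drop_missing_audio(samples, audio_key="audio_file"):
    """Split samples into those whose audio file exists and those without one.

    `samples` is assumed to be a list of dicts; `audio_key` is the assumed
    name of the field holding the path to the audio file.
    """
    kept, missing = [], []
    for sample in samples:
        # A sample with no path at all is treated the same as a missing file.
        if os.path.isfile(sample.get(audio_key, "")):
            kept.append(sample)
        else:
            missing.append(sample)
    return kept, missing
```

Returning the missing entries as well makes it possible to log how many samples were skipped instead of silently shrinking the dataset.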

Answer selected by mrakotos

eginhard Oct 18, 2024
Maintainer

Thanks for testing! You can try this to increase the timeout. Otherwise you can also do the phoneme precomputation step separately on a CPU. It only needs to be done once and the outputs are then saved and reused.

Sure, PRs are always welcome!
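
A minimal sketch of raising the collective timeout, assuming you control how the process group is created. `torch.distributed.init_process_group` accepts a `timeout` argument (a `datetime.timedelta`), and PyTorch's default is shorter than the 45 minutes reported above. `NCCL_BLOCKING_WAIT` appears to change only how a timeout is handled, not how long it is, which would explain why setting it did not help. The helper names and the safety factor below are hypothetical, and whether this repo's trainer exposes such a setting is not confirmed in this thread.

```python
from datetime import timedelta


def process_group_kwargs(precompute_minutes, safety_factor=2.0, backend="nccl"):
    """Build keyword arguments for torch.distributed.init_process_group.

    The timeout is sized from the slowest step the other ranks must wait for
    (here: phoneme precomputation) times a hypothetical safety factor.
    """
    timeout = timedelta(minutes=precompute_minutes * safety_factor)
    return {"backend": backend, "timeout": timeout}


def init_distributed_with_long_timeout(precompute_minutes=45):
    """Create the process group with the enlarged timeout.

    Assumes a launcher such as torchrun has already set RANK, WORLD_SIZE,
    MASTER_ADDR and MASTER_PORT in the environment.
    """
    import torch.distributed as dist  # imported lazily; requires PyTorch

    dist.init_process_group(**process_group_kwargs(precompute_minutes))
```

If the process group is created by Hugging Face Accelerate rather than by your own code, the corresponding knob should be `InitProcessGroupKwargs(timeout=...)` passed through `Accelerator(kwargs_handlers=[...])`; how that would be wired into this repo's trainer is an open question here.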
