Hello, I have been running into something very strange, and it is happening more and more often.
When fine-tuning a Tacotron model, the end result sounds distorted.
I have trained for more than 5k steps (as indicated in the notebook I am using, linked below).
The dataset is about 40 minutes of very good quality audio. All files are 22 kHz, mono, 16-bit.
Tacotron trained on it without problems, but HiFi-GAN did not.
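In case it helps anyone reproduce this, here is a minimal sketch of how the dataset format stated above could be double-checked before training. It uses only Python's standard `wave` module. It assumes that "22 kHz" means 22050 Hz, and the function names and the `wavs` folder name are hypothetical, not something taken from the notebook.

```python
import wave
from pathlib import Path

# Assumed target format: "22 kHz" is taken to mean 22050 Hz.
EXPECTED_RATE = 22050
EXPECTED_CHANNELS = 1       # mono
EXPECTED_SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit


def check_wav(path):
    """Return a list of format problems found in one WAV file (empty if none)."""
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()
    except wave.Error as exc:
        # The wave module only reads PCM WAV; other encodings end up here.
        return [f"not readable as PCM WAV: {exc}"]

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{width * 8}-bit samples")
    return problems


def check_dataset(folder):
    """Map each offending file name in `folder` to its list of problems."""
    report = {}
    for path in sorted(Path(folder).glob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[path.name] = problems
    return report


if __name__ == "__main__":
    # "wavs" is a hypothetical folder name; point this at the real dataset.
    for name, problems in check_dataset("wavs").items():
        print(name, "->", ", ".join(problems))
```

If this prints nothing, every file matched the assumed format, so the distortion would have to come from somewhere else (for example the training configuration).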
This is the notebook: https://colab.research.google.com/github/justinjohn0306/FakeYou-Tacotron2-Notebook/blob/main/FakeYou_HiFi_GAN_Fine_Tuning.ipynb?authuser=1#scrollTo=teF-Ut8Z7Gjp
This is the demo distorted audio: https://drive.google.com/file/d/1cuqfWGS1JmSMNlcnyd_PaH3Atv-PPxvB/view?usp=share_link
This is the original dataset audio sample: https://drive.google.com/file/d/1ReqoxwHSRfu3D186jhQynCJXPQ1vhWZx/view?usp=share_link
This is what I get when I synthesize:
Thank you!