Hi, thank you for the great work.
However, I am getting poor quality when synthesizing Japanese speech.
My dataset, about 12 hours of audio from 16 speakers, was extracted from three visual novels: Riddle Joker, CafeStella, and SenrenBanka.
When I fine-tune text2semantic and vits_decoder, the models do not converge at all.
Is 12 hours of data not enough for fine-tuning?