training error #8
Comments
Perhaps it is because of the lexicon. I'll fix it and try again.
Hi @MingZJU, sorry for the late response. Thanks for sharing your experiments with the AISHELL3 dataset; I hope to see them in a PR. As for the error you mentioned, I think it comes from the preprocessing stage. Please double-check during data loading that the input audio and all the other audio-related features have the same length.
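The length check suggested above could be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `find_length_mismatches` and its arguments are hypothetical, and it assumes the usual FastSpeech2-style layout where there is one duration per phoneme, the durations sum to the number of mel frames, and pitch and energy are stored either per phoneme or per frame.

```python
def find_length_mismatches(n_phonemes, durations, pitch, energy,
                           n_mel_frames, phoneme_level=True):
    """Return a list of human-readable length problems for one utterance.

    Hypothetical helper: all names and the feature layout are assumptions.
    """
    problems = []

    # One duration entry is expected per phoneme.
    if len(durations) != n_phonemes:
        problems.append(
            f"duration has {len(durations)} entries, expected {n_phonemes}"
        )

    # Durations are frame counts, so they should add up to the mel length.
    total_frames = sum(durations)
    if total_frames != n_mel_frames:
        problems.append(
            f"durations sum to {total_frames}, expected {n_mel_frames} mel frames"
        )

    # Pitch and energy follow either the phoneme axis or the frame axis.
    expected = n_phonemes if phoneme_level else n_mel_frames
    for name, feature in (("pitch", pitch), ("energy", energy)):
        if len(feature) != expected:
            problems.append(
                f"{name} has {len(feature)} values, expected {expected}"
            )

    return problems


# A consistent utterance yields no problems.
ok = find_length_mismatches(3, [2, 3, 1], [1.0, 2.0, 3.0], [0.1, 0.2, 0.3], 6)

# An utterance with one extra pitch value, like the 52 vs 53 case reported here.
bad = find_length_mismatches(52, [1] * 52, [0.0] * 53, [0.0] * 52, 52)
```

Running a check like this over every preprocessed utterance before training should point at the files whose alignment or feature extraction went wrong, instead of failing in the middle of an epoch.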
Thanks @keonlee9420. I will check the audio and features and try again soon.
Solved. The error was caused by MFA. I installed MFA following the official instructions and it works well.
Great to hear that! Thanks for sharing. If you'd like to share your experience further, please open a PR with it. It will be helpful for all users who need a Chinese dataset.
@MingZJU: I am getting the same error. Can you explain how you solved it? I am using conda installed
Thanks for sharing!
I tried both the naive and main branches using your checkpoints, and the former seems much better. So I trained AISHELL3 models with small changes to your code, and the synthesized waveforms sound good to me.
However, when I added my own data to AISHELL3, the following error occurred:
Training: 0%| | 3105/900000 [32:05<154:31:49, 1.61it/s]
Epoch 2: 69%|██████████████████████▏ | 318/459 [05:02<02:14, 1.05it/s]
  File "train.py", line 211, in <module>
    main(args, configs)
  File "train.py", line 87, in main
    output = model(*(batch[2:]))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/StyleSpeech-naive/model/StyleSpeech.py", line 83, in forward
    ) = self.variance_adaptor(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/StyleSpeech-naive/model/modules.py", line 404, in forward
    x = x + pitch_embedding
RuntimeError: The size of tensor a (52) must match the size of tensor b (53) at non-singleton dimension 1
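For readers unfamiliar with this message: "dimension 1" is the second axis of the two tensors, which for `x + pitch_embedding` is most likely the sequence axis, so one side has 52 positions and the other 53. The sketch below mimics the broadcasting rule in plain Python to show why such shapes cannot be added. The helper `first_broadcast_conflict` is hypothetical, and the batch size of 16 and hidden size of 256 are made-up values; only 52 and 53 come from the traceback.

```python
def first_broadcast_conflict(shape_a, shape_b):
    """Return (dim, size_a, size_b) for the first axis that cannot be
    broadcast, scanning from the trailing axis, or None if compatible.

    Hypothetical helper that illustrates the broadcasting rule only.
    """
    rank = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    padded_a = (1,) * (rank - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (rank - len(shape_b)) + tuple(shape_b)

    for dim in range(rank - 1, -1, -1):
        size_a, size_b = padded_a[dim], padded_b[dim]
        # Sizes are compatible when they are equal or one of them is 1.
        if size_a != size_b and size_a != 1 and size_b != 1:
            return dim, size_a, size_b
    return None


# Assumed shapes: (batch, sequence, hidden). Only 52 and 53 are from the error.
conflict = first_broadcast_conflict((16, 52, 256), (16, 53, 256))
```

Here `conflict` identifies axis 1 with sizes 52 and 53, matching the reported error, which suggests that one utterance in the batch has one more pitch value than it has positions in `x`.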
I only replaced two speakers and preprocessed the data the same way as described in the README.
Do you have any advice on this error? Any suggestion is appreciated.