training error #8

Closed
MingZJU opened this issue Nov 8, 2021 · 6 comments
MingZJU commented Nov 8, 2021

Thanks for sharing!

I tried both the naive and main branches using your checkpoints, and the naive branch seems much better. So I trained AISHELL3 models with small changes to your code, and the synthesized waveforms sound good to me.

However, when I added my own data to AISHELL3, the following error occurred:
Training: 0%| | 3105/900000 [32:05<154:31:49, 1.61it/s]
Epoch 2: 69%|██████████████████████▏ | 318/459 [05:02<02:14, 1.05it/s]
  File "train.py", line 211, in <module>
    main(args, configs)
  File "train.py", line 87, in main
    output = model(*(batch[2:]))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/StyleSpeech-naive/model/StyleSpeech.py", line 83, in forward
    ) = self.variance_adaptor(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/workspace/StyleSpeech-naive/model/modules.py", line 404, in forward
    x = x + pitch_embedding
RuntimeError: The size of tensor a (52) must match the size of tensor b (53) at non-singleton dimension 1

I only replaced two speakers and preprocessed the data the same way as described in the README.

Do you have any advice on this error? Any suggestions are appreciated.

MingZJU (Author) commented Nov 9, 2021

Perhaps it is because of the lexicon. I'll fix it and try again.

keonlee9420 (Owner) commented

Hi @MingZJU, sorry for the late response. Thanks for sharing your experiments with the AISHELL3 dataset. I hope to see them in a PR. As for the error you mentioned, I think it comes from the preprocessing stage. Please double-check that the input audio and all the other audio-related features have matching lengths during data loading.
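
A minimal sketch of such a length check, for anyone hitting the same mismatch. It is not code from this repository: the function name `find_length_mismatches` and its arguments are hypothetical, and it assumes phoneme-level pitch and energy (one value per phoneme) with durations given as mel-frame counts. If your preprocessing stores frame-level pitch instead, the pitch length should be compared against the mel length.

```python
def find_length_mismatches(n_phonemes, duration, pitch, energy, mel):
    """Return a list of human-readable problems for one utterance.

    Hypothetical helper. Assumes phoneme-level pitch/energy and a mel
    stored as a sequence of frames, so len(mel) is the frame count.
    """
    problems = []
    # Phoneme-level features need exactly one entry per input phoneme.
    for name, feat in (("duration", duration), ("pitch", pitch), ("energy", energy)):
        if len(feat) != n_phonemes:
            problems.append(f"{name}: {len(feat)} values for {n_phonemes} phonemes")
    # Durations are frame counts, so they should sum to the mel length.
    total_frames = int(sum(duration))
    if total_frames != len(mel):
        problems.append(f"mel: {len(mel)} frames but durations sum to {total_frames}")
    return problems


# Synthetic data mirroring the reported 52-vs-53 mismatch.
duration = [2] * 52
pitch = [0.0] * 53          # one value too many
energy = [0.0] * 52
mel = [[0.0] * 80 for _ in range(104)]
print(find_length_mismatches(52, duration, pitch, energy, mel))
```

Running a check like this over every preprocessed utterance before training can point to the specific files whose phoneme sequence and alignment disagree, rather than failing partway through an epoch.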

MingZJU (Author) commented Nov 9, 2021

Thanks, @keonlee9420. I will check the audio and features and try again soon.

MingZJU (Author) commented Nov 17, 2021

Solved. The error was caused by MFA. I installed MFA following the official instructions, and it works well now.

keonlee9420 (Owner) commented

Great to hear that! Thanks for sharing. If you'd like to share your experience further, please open a PR. It will be helpful for all users who need a Chinese dataset.

sirius0503 commented

> Solved. The error was caused by mfa. I installed mfa following the official instructions and it works good.

@MingZJU: I am getting the same error. Could you explain how you solved it? I am using MFA installed via conda.
