Training Error #1

Closed

thanhdo99 opened this issue Jun 23, 2021 · 8 comments

@thanhdo99

In this case, I ran the script:

```
python3 train.py -p config/vietnam/preprocess.yaml -m config/vietnam/model.yaml -t config/vietnam/train.yaml
```

and got the following error:

```
  File "train.py", line 199, in <module>
    main(args, configs)
  File "train.py", line 85, in main
    losses = Loss(batch, output)
  File "/home/thanhdo/envs/diffsinger_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/thanhdo/Documents/DiffSinger/model/loss.py", line 69, in forward
    log_duration_targets = log_duration_targets.masked_select(src_masks)
RuntimeError: The size of tensor a (39) must match the size of tensor b (136) at non-singleton dimension 1
```

(screenshot: Screen Shot 2021-06-23 at 3 56 10 PM)
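For context, a minimal sketch (not from the original report) of how masked_select produces this kind of error when the mask length and the target length disagree; the shapes below are only illustrative:

```python
import torch

# masked_select requires the mask to be broadcastable to the tensor it selects from.
# If the duration targets and the source mask are padded to different lengths,
# the broadcast fails with a RuntimeError like the one above.
log_duration_targets = torch.zeros(4, 39)          # padded to one length (39)
src_masks = torch.ones(4, 136, dtype=torch.bool)   # padded to a different length (136)

try:
    log_duration_targets.masked_select(src_masks)
except RuntimeError as e:
    print(e)  # sizes must match at non-singleton dimension 1
```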

@keonlee9420
Owner

keonlee9420 commented Jun 23, 2021

Hi @thanhdo99, I haven't checked parallel training yet. If you have several GPUs installed, please specify a single GPU with CUDA_VISIBLE_DEVICES, as follows:

```
CUDA_VISIBLE_DEVICES=0 python3 train.py -p config/vietnam/preprocess.yaml -m config/vietnam/model.yaml -t config/vietnam/train.yaml
```

This will run the code on the first GPU.
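A quick way to confirm that only one GPU is then visible to the process (a minimal check, not from the original comment; it assumes CUDA is available):

```python
import torch

# With CUDA_VISIBLE_DEVICES=0 set, PyTorch should report exactly one device,
# and it is addressed inside the process as cuda:0.
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # name of the selected GPU
```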

@thanhdo99
Author

thanhdo99 commented Jun 23, 2021

Hi @keonlee9420
I have tried everything I can, but it still fails with the same error as before.
(screenshot: Screen Shot 2021-06-23 at 5 54 04 PM)

@keonlee9420
Owner

I see. Can you print and share the shapes of log_duration_targets and src_masks? And what are the values of config["preprocessing"]["pitch"]["feature"] and config["preprocessing"]["energy"]["feature"]?
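For example (a sketch only; the config path is taken from the command above and the variable names from the traceback):

```python
import yaml

# Check the two preprocessing options asked about above.
with open("config/vietnam/preprocess.yaml") as f:
    preprocess_config = yaml.load(f, Loader=yaml.FullLoader)
print(preprocess_config["preprocessing"]["pitch"]["feature"])
print(preprocess_config["preprocessing"]["energy"]["feature"])

# And, as a temporary debug line in model/loss.py just before the failing
# masked_select call:
#   print(log_duration_targets.shape, src_masks.shape)
```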

@thanhdo99
Author

Here you go, thanks a lot bro

(screenshots: Screen Shot 2021-06-24 at 2 53 18 PM, Screen Shot 2021-06-24 at 2 42 18 PM)

@keonlee9420
Owner

Umm, did you follow the preprocessing process described in README.md for your dataset? If so, your src_masks should have the same shape as log_duration_targets. It would also be helpful to check whether your data loader works correctly; in other words, the duration information and the input phoneme sequence should be aligned in each minibatch.
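A rough sanity check along those lines might look like the following (a sketch only; the argument names mirror the usual FastSpeech2-style batch fields and are assumptions, so adjust them to the actual data loader):

```python
def check_batch_alignment(texts, src_lens, max_src_len, durations):
    """Padded phoneme sequences and padded duration targets should share the
    same max source length, which is also what src_masks is built from."""
    assert texts.shape[1] == max_src_len, (texts.shape, max_src_len)
    assert durations.shape[1] == max_src_len, (durations.shape, max_src_len)
    # Per utterance, the true (unpadded) phoneme count must not exceed the
    # padded duration length.
    for i, src_len in enumerate(src_lens):
        assert int(src_len) <= durations.shape[1], (i, int(src_len), durations.shape)
```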

@thanhdo99
Author

When I used the FastSpeech 2 model (https://github.com/ming024/FastSpeech2), the synthesized results were noisy, as in the samples below:
hifi-gan: https://drive.google.com/file/d/1QAWD9f9HYX1dNlojjUXtxi9p0QRAygQ6/view?usp=sharing
mel-gan: https://drive.google.com/file/d/1tWBhH8ekGUYCX8YzH8dugi1KMofM7vfH/view?usp=sharing

I wonder whether your DiffSinger model could help me fix this problem?
Thanks

@keonlee9420
Owner

Did you also train the vocoders on your dataset? If not, I recommend training the vocoder too. I think the noise can come from either 1. an insufficient amount of data or 2. a mismatch between the synthesizer and the vocoder. If neither problem is fixed, the DiffSinger model may show similar issues, since it is also a synthesizer and shares the same two-stage pipeline (synthesizer-vocoder).
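One concrete thing to check for point 2 is that the audio/mel settings match between the synthesizer preprocessing config and the vocoder config (a sketch only; the exact paths and key names are assumptions, so adjust them to your configs):

```python
import json
import yaml

# Settings the synthesizer's mel-spectrograms were computed with.
with open("config/vietnam/preprocess.yaml") as f:
    synth_cfg = yaml.load(f, Loader=yaml.FullLoader)
print(synth_cfg["preprocessing"]["audio"])  # e.g. sampling_rate
print(synth_cfg["preprocessing"]["stft"])   # e.g. filter_length, hop_length, win_length
print(synth_cfg["preprocessing"]["mel"])    # e.g. n_mel_channels, mel_fmin, mel_fmax

# Settings the vocoder was trained with (hypothetical HiFi-GAN config path).
with open("hifigan/config.json") as f:
    print(json.load(f))  # compare sampling_rate, hop_size, num_mels, fmin, fmax
```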

@keonlee9420
Owner

Closing the issue due to inactivity. You can reopen it anytime if you still have issues.
