Training Error #1

Closed

thanhdo99 opened this issue Jun 23, 2021 · 8 comments

@thanhdo99

In this case, I ran the script:

```
python3 train.py -p config/vietnam/preprocess.yaml -m config/vietnam/model.yaml -t config/vietnam/train.yaml
```

and got the following error:

```
  File "train.py", line 199, in <module>
    main(args, configs)
  File "train.py", line 85, in main
    losses = Loss(batch, output)
  File "/home/thanhdo/envs/diffsinger_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/thanhdo/Documents/DiffSinger/model/loss.py", line 69, in forward
    log_duration_targets = log_duration_targets.masked_select(src_masks)
RuntimeError: The size of tensor a (39) must match the size of tensor b (136) at non-singleton dimension 1
```

(screenshot: Screen Shot 2021-06-23 at 3 56 10 PM)
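For context, a minimal sketch (not from the original report) of how masked_select produces this kind of error when the mask length and the target length disagree; the shapes below are only illustrative:

```python
import torch

# masked_select requires the mask to be broadcastable to the tensor it selects from.
# If the duration targets and the source mask are padded to different lengths,
# the broadcast fails with a RuntimeError like the one above.
log_duration_targets = torch.zeros(4, 39)          # padded to one length (39)
src_masks = torch.ones(4, 136, dtype=torch.bool)   # padded to a different length (136)

try:
    log_duration_targets.masked_select(src_masks)
except RuntimeError as e:
    print(e)  # sizes must match at non-singleton dimension 1
```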

@keonlee9420
Owner

keonlee9420 commented Jun 23, 2021

Hi @thanhdo99, I haven't checked parallel training yet. If you have several GPUs installed, please specify a single GPU with CUDA_VISIBLE_DEVICES, as follows:

```
CUDA_VISIBLE_DEVICES=0 python3 train.py -p config/vietnam/preprocess.yaml -m config/vietnam/model.yaml -t config/vietnam/train.yaml
```

This will run the code on the first GPU.
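A quick way to confirm that only one GPU is then visible to the process (a minimal check, not from the original comment; it assumes CUDA is available):

```python
import torch

# With CUDA_VISIBLE_DEVICES=0 set, PyTorch should report exactly one device,
# and it is addressed inside the process as cuda:0.
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # name of the selected GPU
```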

@thanhdo99
Author

thanhdo99 commented Jun 23, 2021

Hi @keonlee9420
I have tried everything I can, but it still fails with the same error as before.
(screenshot: Screen Shot 2021-06-23 at 5 54 04 PM)

@keonlee9420
Owner

I see. Can you print and share the shapes of log_duration_targets and src_masks? And what are the values of config["preprocessing"]["pitch"]["feature"] and config["preprocessing"]["energy"]["feature"]?
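For example (a sketch only; the config path is taken from the command above and the variable names from the traceback):

```python
import yaml

# Check the two preprocessing options asked about above.
with open("config/vietnam/preprocess.yaml") as f:
    preprocess_config = yaml.load(f, Loader=yaml.FullLoader)
print(preprocess_config["preprocessing"]["pitch"]["feature"])
print(preprocess_config["preprocessing"]["energy"]["feature"])

# And, as a temporary debug line in model/loss.py just before the failing
# masked_select call:
#   print(log_duration_targets.shape, src_masks.shape)
```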

@thanhdo99
Author

Here you go, thanks a lot bro

(screenshots: Screen Shot 2021-06-24 at 2 53 18 PM, Screen Shot 2021-06-24 at 2 42 18 PM)

@keonlee9420
Owner

Umm, did you follow the preprocessing process described in README.md for your dataset? If so, your src_masks should have the same shape as log_duration_targets. It would also be helpful to check whether your data loader works correctly; in other words, the duration information and the input phoneme sequence should be aligned in each minibatch.
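A rough sanity check along those lines might look like the following (a sketch only; the argument names mirror the usual FastSpeech2-style batch fields and are assumptions, so adjust them to the actual data loader):

```python
def check_batch_alignment(texts, src_lens, max_src_len, durations):
    """Padded phoneme sequences and padded duration targets should share the
    same max source length, which is also what src_masks is built from."""
    assert texts.shape[1] == max_src_len, (texts.shape, max_src_len)
    assert durations.shape[1] == max_src_len, (durations.shape, max_src_len)
    # Per utterance, the true (unpadded) phoneme count must not exceed the
    # padded duration length.
    for i, src_len in enumerate(src_lens):
        assert int(src_len) <= durations.shape[1], (i, int(src_len), durations.shape)
```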

@thanhdo99
Author

When I used the FastSpeech 2 model (https://github.com/ming024/FastSpeech2), the synthesized results were noisy, as in the samples below:
hifi-gan: https://drive.google.com/file/d/1QAWD9f9HYX1dNlojjUXtxi9p0QRAygQ6/view?usp=sharing
mel-gan: https://drive.google.com/file/d/1tWBhH8ekGUYCX8YzH8dugi1KMofM7vfH/view?usp=sharing

I wonder whether your DiffSinger model could help me fix this problem?
Thanks

@keonlee9420
Owner

Did you also train the vocoders on your dataset? If not, I recommend training the vocoder too. I think the noise can come from either 1. an insufficient amount of data or 2. a mismatch between the synthesizer and the vocoder. If neither problem is fixed, the DiffSinger model may show similar issues, since it is also a synthesizer and shares the same two-stage pipeline (synthesizer-vocoder).
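One concrete thing to check for point 2 is that the audio/mel settings match between the synthesizer preprocessing config and the vocoder config (a sketch only; the exact paths and key names are assumptions, so adjust them to your configs):

```python
import json
import yaml

# Settings the synthesizer's mel-spectrograms were computed with.
with open("config/vietnam/preprocess.yaml") as f:
    synth_cfg = yaml.load(f, Loader=yaml.FullLoader)
print(synth_cfg["preprocessing"]["audio"])  # e.g. sampling_rate
print(synth_cfg["preprocessing"]["stft"])   # e.g. filter_length, hop_length, win_length
print(synth_cfg["preprocessing"]["mel"])    # e.g. n_mel_channels, mel_fmin, mel_fmax

# Settings the vocoder was trained with (hypothetical HiFi-GAN config path).
with open("hifigan/config.json") as f:
    print(json.load(f))  # compare sampling_rate, hop_size, num_mels, fmin, fmax
```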

@keonlee9420
Owner

Closing the issue due to inactivity. You can reopen it anytime if you still have issues.
