Problem with noisy, loud audio? #58

Closed
attitudechunfeng opened this issue Dec 4, 2017 · 11 comments

Comments

@attitudechunfeng

Hi @r9y9, I followed the documentation tutorial for synthesizing audio. I tried both your DNN-based and RNN-based models, step by step following the Python tutorial. However, after training, the audio I get in the test stage is loud noise. Can you give me some advice on this problem? In fact, I made some small changes: I split the training and test phases into separate files, and both the model file and the parameters are saved locally. In the test stage, I just load those files from disk and run through the test steps. Also, I don't use the test files from your tutorial; instead I randomly select a lab file from 'data/slt_arctic_full_data/label_state_align/'. Am I doing anything wrong?

@r9y9
Owner

r9y9 commented Dec 4, 2017

No, everything you did seems OK to me. Could you upload your code to https://gist.github.com/ (or anywhere else) so that I can reproduce the problem?

@attitudechunfeng
Author

attitudechunfeng commented Dec 4, 2017

Here is the gist link: https://gist.github.com/attitudechunfeng/58a052f18f6aa24235000cc50618e887. There are two files: nn.py is the training phase of the DNN-based model, and nn_synth.py is the test phase.

@r9y9
Owner

r9y9 commented Dec 4, 2017

Thank you. I can reproduce the problem, and here is the fix:

https://gist.github.com/attitudechunfeng/58a052f18f6aa24235000cc50618e887#file-nn_synth-py-L181

replace

wavfile.write('1.wav', rate=fs, data=waveform)

with

wavfile.write('1.wav', rate=fs, data=waveform.astype(np.int16))
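The cast matters because scipy.io.wavfile.write picks the WAV sample format from the array's dtype: a float64 waveform whose samples span the int16 range (roughly ±32768, as the synthesis code here produces) gets written as a float WAV, which players expect to lie in [-1.0, 1.0], so it plays back as loud, clipped noise. A minimal sketch of a safe conversion (the helper name is mine, not from the tutorial):

```python
import numpy as np

def to_int16(waveform):
    """Cast a float waveform already scaled to the int16 range to int16,
    clipping first to avoid integer wrap-around at the extremes."""
    return np.clip(waveform, -32768, 32767).astype(np.int16)

# Example: a float64 waveform with int16-range amplitudes
x = np.array([0.0, 1000.5, -40000.0, 32767.9])
y = to_int16(x)
```

Without the clip, a stray sample just past the int16 range would wrap around to the opposite sign when cast.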

By the way, I noticed you save models with torch.save(model, "model.pkl"), but the recommended way to save a model in PyTorch is torch.save(model.state_dict(), "model.pth"). See https://discuss.pytorch.org/t/how-to-save-load-torch-models/718 for details.
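The state_dict pattern looks like this — a minimal sketch with a toy model (nn.Linear stands in for the actual acoustic model in nn.py):

```python
import torch
import torch.nn as nn

# Toy stand-in for the acoustic model (hypothetical, not the model in nn.py)
model = nn.Linear(4, 2)

# Save only the parameters, not the pickled module object
torch.save(model.state_dict(), "model.pth")

# Loading requires constructing the model first, then restoring the weights
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("model.pth"))
```

torch.save(model, ...) pickles the whole object, including the import path of its class, so loading breaks if the code is later moved or renamed; saving the state_dict stores only tensors and survives refactoring.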

@attitudechunfeng
Author

It works! I will also adopt your advice about saving the model. Many thanks!

@r9y9
Owner

r9y9 commented Dec 4, 2017

You are welcome. Feel free to open new issues if you have any other problems.

@r9y9 r9y9 closed this as completed Dec 4, 2017
@attitudechunfeng
Author

Another question: if I want to synthesize arbitrary text, how do I generate the lab file? I compared an HTS full-context lab file with the one in the example. They are almost the same, except that each line in the example has a state index '[2-6]' appended at the end. How can I unify them?

@r9y9
Owner

r9y9 commented Dec 4, 2017

Generating full-context labels (which requires a language-dependent text processor) is out of the scope of this library. For that purpose, you can follow Merlin's guide to generating labels: https://github.com/CSTR-Edinburgh/merlin/tree/master/egs/build_your_own_voice/s1#prepare-labels.

See also CSTR-Edinburgh/merlin#28.

@attitudechunfeng
Author

attitudechunfeng commented Dec 6, 2017

I ran into another problem when trying to adapt the RNN and DNN models to my own dataset. First, I generated full-context labs using the hts-engine front-end tools, then normalized them to phone- and state-alignment files, and finally used the prepare_feature script to extract features. But during acoustic model training, errors occurred. The DNN part fails as follows:

Traceback (most recent call last):
  File "nn.py", line 247, in <module>
    X_min[ty], X_max[ty], Y_mean[ty], Y_scale[ty], utt_lengths[ty])
  File "nn.py", line 214, in train
    for x, y in dataset_loaders[phase]:
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 201, in __next__
    return self._process_next_batch(batch)
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/disk2/wangcf/TTS/siri/virsiri/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "nn.py", line 127, in __getitem__
    x, y = self.X[idx], self.Y[idx]
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 371, in __getitem__
    return self._getitem_one_sample(frame_idx)
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 362, in _getitem_one_sample
    return frames[frame_idx_in_focused_utterance]
IndexError: index 944 is out of bounds for axis 0 with size 880

The RNN part fails with a padding-size problem, even though the code is supposed to pad to the maximum utterance length:

Traceback (most recent call last):
  File "rnn.py", line 118, in <module>
    Y_mean[typ], Y_var[typ] = meanvar(Y[typ]["train"], utt_lengths[typ]["train"])
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/preprocessing/generic.py", line 328, in meanvar
    for idx, x in enumerate(dataset):
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 245, in __getitem__
    return self._getitem_one_sample(idx)
  File "/disk2/wangcf/TTS/nnmnkwii/nnmnkwii/datasets/__init__.py", line 234, in _getitem_one_sample
    len(x), self.padded_length))
RuntimeError: Num frames 1576 exceeded: 1546. Try larger value for padded_length.

Any ideas about these?

@r9y9
Owner

r9y9 commented Dec 6, 2017

IndexError: index 944 is out of bounds for axis 0 with size 880

Did you make sure your indexing is valid?

RuntimeError: Num frames 1576 exceeded: 1546. Try larger value for padded_length.

Did you try setting a larger padded_length, e.g. 2000?
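The second error comes from nnmnkwii's padded dataset wrapper: every utterance is zero-padded to a fixed padded_length, and an utterance longer than that raises. The check can be sketched in plain NumPy (an illustrative stand-in, not nnmnkwii's actual code; pad_utterance is a hypothetical helper):

```python
import numpy as np

def pad_utterance(x, padded_length):
    """Zero-pad a (frames, dim) feature matrix to padded_length frames,
    raising the same kind of error seen in the traceback when the
    utterance is longer than the padding budget."""
    if len(x) > padded_length:
        raise RuntimeError(
            "Num frames {} exceeded: {}. Try larger value for "
            "padded_length.".format(len(x), padded_length))
    out = np.zeros((padded_length, x.shape[1]), dtype=x.dtype)
    out[:len(x)] = x
    return out

# A safe choice is at least the longest utterance in the corpus:
lengths = [1546, 1576, 1200]      # example frame counts
padded_length = max(lengths)      # or round up generously, e.g. 2000
```

Picking padded_length from the training set's maximum frame count avoids guessing, at the cost of recomputing it when the corpus changes.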

@attitudechunfeng
Author

attitudechunfeng commented Dec 7, 2017

I've fixed this problem by increasing padded_length. However, the audio from the model trained on my own data has poor quality. I didn't change the training parameters. Any advice for improving the quality? P.S.: my dataset is 16 kHz.

@r9y9
Owner

r9y9 commented Dec 7, 2017

It's hard to say much without seeing your code and data, but I have a few guesses.

Also, the training parameters should be tuned for your data.
