Reproducing the results #41
Comments
Hi isn0gud, Hope it helps!
Hmm, weird. I did my own feature extraction and removed all silences longer than 100 ms. But thanks for your input! Gonna investigate :)
Yeah, the script is something crazy... and beware, some of the packages are sometimes unavailable. Just to be clear, you need to remove the silence at the beginning of the sentences. There is a lot of it! Also, a number of files are filled entirely with silence. I would cut tighter than 100 ms, but that is just a feeling... A good way to check whether what you are doing makes sense is to compare the size of your features with those from FAIR.
Did you rewrite the feature extraction?
No, I cleaned it up a little and made it independent from the web (I downloaded all packages to my computer).
Would you mind sharing yours? 😍 The silence removal using
I'm sorry, I can't, because I wrote this code for my company. But silence removal is pretty straightforward:
Remember that you do not want to remove the silence inside the sentences, because it is important for the model...
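Since the original script cannot be shared, here is a minimal sketch of the edge-trimming idea described above: per-frame energy decides what is silence, and only leading/trailing silence is cut, so pauses inside the sentence survive. The `threshold` and `frame_ms` values are illustrative assumptions, not the values used in the original code.

```python
import numpy as np

def trim_edge_silence(y, sr, threshold=0.01, frame_ms=10):
    """Drop leading/trailing silence but keep pauses inside the utterance."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))   # per-frame energy
    voiced = np.flatnonzero(rms > threshold)    # frames above the threshold
    if voiced.size == 0:
        return y[:0]                            # file is pure silence
    return y[voiced[0] * frame_len : (voiced[-1] + 1) * frame_len]

# Synthetic check: half a second of silence around a 1 s, 440 Hz tone
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
trimmed = trim_edge_silence(y, sr)  # only the edge silence is removed
```

Because the start/end indices come from the first and last voiced frames, any silent stretch between two voiced regions is kept untouched.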
That was my intuition too, but the provided extracted features have some silences removed in the middle of the sentences. Since this is done by phone alignment, the important silences are probably kept. So just removing everything below Anyway, I am extracting the features with the script now. It just takes forever.
Hi friesch, you can try librosa's trim method: https://librosa.github.io/librosa/generated/librosa.effects.trim.html#librosa-effects-trim. It does exactly what you want; I found top_db=15 to work well enough.
Hi, thanks for open sourcing the code!
I am trying to reproduce your results. However, I am running into problems. I have been training:
So the problem is that only some speakers actually produce a speech signal based on the input; the majority of speakers produce only noise. Moreover, which speakers produce speech depends on the actual phoneme input. The problem seems to be that attention does not work correctly for these samples: it basically stays at the beginning of the sequence and does not advance.
Did you have a similar issue when training the model? Or might you have an idea what the problem could be?
good attention with speech output:
p226_009_11.pdf
p225_005_4.pdf
somewhat working:
p226_009_2.pdf
Most examples:
p226_009_9.pdf
p226_009_13.pdf
p226_009_1.pdf
Thanks!