This repository has been archived by the owner on May 28, 2019. It is now read-only.

Reproducing the results #41

Closed
pfriesch opened this issue Mar 9, 2018 · 9 comments

Comments

@pfriesch

pfriesch commented Mar 9, 2018

Hi, thanks for open sourcing the code!

I am trying to reproduce your results, but I am running into problems. I have been training with the following settings:

  • sequence length: 100
  • epoch: 90
  • only American accent VCTK speaker samples
  • noise level 4

The problem is that only some speakers actually produce a speech signal based on the input; the majority of speakers only produce noise. However, which speakers produce speech depends on the actual phoneme input. The cause seems to be that the attention does not work correctly for these samples: it basically stays at the beginning of the sequence and does not advance.

Did you have a similar issue when training the model? Or do you have an idea what the problem could be?

good attention with speech output:
p226_009_11.pdf
p225_005_4.pdf

somewhat working:
p226_009_2.pdf

Most examples:
p226_009_9.pdf
p226_009_13.pdf
p226_009_1.pdf

Thanks!

@macarbonneau

Hi isn0gud,
I had a similar problem when I tried to learn from the features that I extracted from VCTK. When I use the VCTK data from the authors, everything is OK. A previous post mentions that silence at the start of the wav file causes problems for the attention module. After I removed these silences, everything worked perfectly. Otherwise I get the same attention figures you get.

Hope it helps!

@pfriesch
Author

pfriesch commented Mar 12, 2018

Hmm, weird. I did my own feature extraction and I removed all the silences longer than 100 ms. But thanks for your input! Gonna investigate :)

@macarbonneau

Yeah, the script is something crazy... and beware, some of the packages are sometimes unavailable. Just to be clear, you need to remove the silence at the beginning of the sentences. There are many! Also, there are a number of files filled with silence. I would cut tighter than 100 ms, but it is just a feeling. A good way to check whether what you are doing makes sense is to compare the size of your features with those from FAIR.

@pfriesch
Author

Did you rewrite the feature extraction?

@macarbonneau

No, I cleaned it up a little bit and made it independent from the web (I downloaded all the packages to my computer).

@pfriesch
Author

pfriesch commented Mar 14, 2018

Would you mind sharing yours? 😍 The silence removal using merlin is done by phone alignment from HTK/HTS, but it does not remove all silences. Actually, it seems like it leaves quite a few silences in there.

@macarbonneau

I'm sorry, I can't, because I wrote this code for my company. But silence removal is pretty straightforward:

  1. compute the signal envelope
  2. find the index of the first time the envelope is higher than a threshold.
  3. find the index of the last time the envelope is higher than a threshold.
  4. save the part of the file between these two indices.

Remember that you do not want to remove silences inside the sentence, because they are important for the model...
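
The four steps above can be sketched roughly as follows. This is only an illustration, not the code referred to in this comment: the function name `trim_edge_silence`, the moving-average envelope, and the default `threshold` and `window` values are all assumptions that would need tuning for real recordings.

```python
import numpy as np


def trim_edge_silence(signal, threshold=0.01, window=400):
    """Remove leading and trailing silence, keeping pauses inside the sentence.

    `threshold` and `window` are hypothetical defaults, not values from the thread.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return signal

    # 1. Envelope: moving average of the rectified signal (one possible choice).
    window = max(1, min(window, signal.size))
    kernel = np.ones(window) / window
    envelope = np.convolve(np.abs(signal), kernel, mode="same")

    # 2. and 3. First and last sample where the envelope exceeds the threshold.
    active = np.flatnonzero(envelope > threshold)
    if active.size == 0:
        # The file contains only silence.
        return signal[:0]
    first, last = active[0], active[-1]

    # 4. Keep everything between the two indices, interior pauses included.
    return signal[first:last + 1]
```

Because the result is one contiguous slice, silences in the middle of the sentence are left untouched, and a file that never crosses the threshold comes back empty so it can be filtered out.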

@pfriesch
Author

pfriesch commented Mar 16, 2018

Remember that you do not want to remove silences inside the sentence, because they are important for the model...

That was my intuition too, but the provided extracted features have some silences removed in the middle of the sentences. That removal is done by phone alignment, though, so the important silences are probably kept. So just removing everything below 35 dB seems to be insufficient.

Anyway, I am extracting the features with the script now. It just takes forever.

@adampolyak
Contributor

Hi friesch,

You can try out librosa's trim method: https://librosa.github.io/librosa/generated/librosa.effects.trim.html#librosa-effects-trim.

It does exactly what you want; I found top_db=15 to work well enough.
