This repository has been archived by the owner on May 28, 2019. It is now read-only.

Feature extraction #17

Closed
jayavanth opened this issue Sep 30, 2017 · 3 comments

Comments

@jayavanth

jayavanth commented Sep 30, 2017

I trained Loop on a subset of the VCTK data (American speakers). When I run generate.py with my trained model, the audio for those speakers is pretty bad: I hear only a couple of words per sentence, and the rest is silence or noise.

My guess is that something went wrong during feature extraction. When I compare the same file, for example p294_001.npz, from the provided S3 bucket with the one I extracted by running extract_feats.py, I see that vuv_idx from S3 has larger values (range: -5 to 5) than mine (range: -10e-02 to 5).

I also noticed that text_features and audio_features have different shapes:
(226, 420) - s3
(540, 420) - me

Other features like durations and code2phone also look different.
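A minimal sketch of the comparison described above, assuming both files are ordinary NumPy `.npz` archives. The helper names `summarize_npz` and `compare_npz` and the file paths in the usage note are hypothetical and not part of the repository; array names such as `audio_features` are read from the files themselves rather than hard-coded.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, min, max)} for every array in an .npz file.

    min/max are None for empty or non-numeric arrays (e.g. string or object arrays).
    """
    summary = {}
    # allow_pickle=True is an assumption: some arrays may be stored as object
    # arrays. Only use it on files you trust.
    with np.load(path, allow_pickle=True) as data:
        for key in data.files:
            arr = np.asarray(data[key])
            if arr.size and np.issubdtype(arr.dtype, np.number):
                summary[key] = (arr.shape, float(arr.min()), float(arr.max()))
            else:
                summary[key] = (arr.shape, None, None)
    return summary


def compare_npz(path_a, path_b):
    """Return the arrays whose shape or value range differs between two files."""
    a, b = summarize_npz(path_a), summarize_npz(path_b)
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            # A missing array shows up as None on that side.
            diffs[key] = (a.get(key), b.get(key))
    return diffs
```

For example, `compare_npz("s3/p294_001.npz", "local/p294_001.npz")` (both paths are placeholders) would list each array whose shape or range differs, which makes it easier to report exactly which features disagree.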

What changes do I have to make to extract_feats.py to get features similar to the ones in S3?

@hepower

hepower commented Oct 25, 2017

I see the same issue: the .npz files I generate locally with extract_feats.py differ in size from the ones downloaded with download_data.sh. What is the problem?

@hepower

hepower commented Nov 9, 2017

@jayavanth have you fixed the shape mismatch issue?

@adampolyak
Contributor

The mismatch in length is due to silence removal: preceding silence was removed from the uploaded data.
See #25.
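To illustrate why removing preceding silence changes the number of frames, here is a hypothetical sketch. It is not the procedure used to produce the uploaded data; the function name `trim_leading_silence`, the per-frame energy input, and the threshold value are all assumptions made for the example.

```python
import numpy as np


def trim_leading_silence(features, frame_energy, threshold=1e-3):
    """Drop every frame before the first one whose energy exceeds `threshold`.

    `features` has shape (n_frames, n_dims); `frame_energy` has shape (n_frames,).
    Returns the trimmed features and the number of frames removed.
    If no frame exceeds the threshold, nothing is removed.
    """
    frame_energy = np.asarray(frame_energy)
    active = np.flatnonzero(frame_energy > threshold)
    start = int(active[0]) if active.size else 0
    return features[start:], start
```

Any array aligned frame-by-frame with the audio would have to be trimmed by the same number of frames, which is one way a locally extracted file could end up longer than its trimmed counterpart.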
