Feature extraction #17

jayavanth · 2017-09-30T02:16:44Z

I trained loop on a subset of the VCTK data (American speakers). When I run generate.py with my trained model, the audio for those speakers is pretty bad: I can make out only a couple of words per sentence, and the rest is silence or noise.

My guess is that something went wrong during feature extraction. I compared the same utterance in both versions: p294_001.npz from the provided S3 bucket, and the p294_001.npz I produced by running extract_feats.py. The vuv_idx values in the S3 file span a larger range (-5 to 5) than mine (-10e-02 to 5).

I also noticed that text_features and audio_features have different shapes in the two files:
(226, 420) - s3
(540, 420) - me

Other features like durations and code2phone also look different.

What changes do I need to make to extract_feats.py to get features that match the ones in the S3 bucket?
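A quick way to put the two versions of an utterance side by side is to dump every array in each .npz file. The sketch below assumes only NumPy. The helper names compare_npz and print_comparison are hypothetical, and the code reports whatever keys the files actually contain rather than assuming a fixed layout.

```python
import numpy as np


def compare_npz(path_a, path_b):
    """Summarize every array stored in two .npz feature files side by side.

    Returns {key: {"a": info_or_None, "b": info_or_None}}, where info holds the
    array's shape, dtype and (for non-empty numeric arrays) its min/max value.
    """
    summary = {}
    # allow_pickle=True because some entries may be object arrays (e.g. a
    # pickled dict); only use it on files you trust.
    with np.load(path_a, allow_pickle=True) as a, np.load(path_b, allow_pickle=True) as b:
        for key in sorted(set(a.files) | set(b.files)):
            row = {}
            for label, archive in (("a", a), ("b", b)):
                if key not in archive.files:
                    row[label] = None  # key is missing from this file
                    continue
                arr = archive[key]
                numeric = arr.size > 0 and np.issubdtype(arr.dtype, np.number)
                value_range = (float(arr.min()), float(arr.max())) if numeric else None
                row[label] = {"shape": arr.shape, "dtype": str(arr.dtype), "range": value_range}
            summary[key] = row
    return summary


def print_comparison(summary):
    """Print each key with both files' info so shape and range differences stand out."""
    for key, row in summary.items():
        print(f"{key}:")
        for label in ("a", "b"):
            info = row[label]
            if info is None:
                print(f"  {label}: <missing>")
            else:
                print(f"  {label}: shape={info['shape']} dtype={info['dtype']} range={info['range']}")
```

Usage, with hypothetical paths for the downloaded and locally extracted files: `print_comparison(compare_npz("s3/p294_001.npz", "local/p294_001.npz"))`. Differences in the first dimension of frame-level arrays point to a length mismatch, while differing ranges for the same key point to a different normalization or encoding.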


hepower · 2017-10-25T04:00:08Z

I also see that the .npz files generated locally with extract_feats.py differ in size from the ones downloaded with download_data.sh. What is causing this?

hepower · 2017-11-09T12:22:48Z

@jayavanth have you fixed the shape mismatch issue?

adampolyak · 2017-11-12T13:40:22Z

The length mismatch is due to silence removal: leading silence was removed from the uploaded data.
See #25.
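To approximate leading-silence removal locally, one option is a simple energy threshold applied to the waveform before running feature extraction. This is a hypothetical sketch, not necessarily how the uploaded data was processed (see #25 for that). The function name trim_leading_silence and its frame_len and threshold_db defaults are assumptions chosen for illustration. Features and any frame-level alignments would then need to be recomputed from the trimmed audio so they stay consistent.

```python
import numpy as np


def trim_leading_silence(wav, frame_len=80, threshold_db=-40.0):
    """Drop leading frames whose energy is below threshold_db relative to the loudest frame.

    Hypothetical parameters: non-overlapping frames of frame_len samples and a
    threshold measured relative to the peak frame energy. Only leading silence
    is removed; trailing silence is kept. Returns (trimmed_wav, samples_removed).
    """
    wav = np.asarray(wav, dtype=np.float64)
    n_frames = len(wav) // frame_len
    if n_frames == 0:
        return wav, 0  # too short to frame; leave untouched
    frames = wav[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)  # mean power per frame
    peak = energy.max()
    if peak == 0.0:
        return wav, 0  # all-zero signal: nothing to compare against
    # Energy in dB relative to the loudest frame (floor avoids log of zero).
    energy_db = 10.0 * np.log10(np.maximum(energy, 1e-20) / peak)
    # The peak frame is at 0 dB, so at least one frame passes the threshold.
    first_loud = int(np.flatnonzero(energy_db > threshold_db)[0])
    start = first_loud * frame_len
    return wav[start:], start
```

If librosa is available, `librosa.effects.trim` implements a similar dB-threshold idea, but it trims silence from both ends rather than only the leading part.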

adampolyak closed this as completed Nov 12, 2017
