This repository has been archived by the owner on May 28, 2019. It is now read-only.
I trained loop on a subset of the VCTK data (American speakers). When I run generate.py with my trained model, the audio for those speakers is pretty bad: I hear only a couple of words of each sentence, and the rest is silence or noise.
My guess is that something went wrong during feature extraction. When I compare the same feature file, for example p294_001.npz, from the given S3 bucket against the one I extracted myself by running extract_feats.py, I see that vuv_idx from S3 has larger numbers (range: -5 to 5) compared to mine (range: -10e-02 to 5).
I also noticed that text_features and audio_features are of different shape:
(226, 420) - s3
(540, 420) - me
Other features like durations and code2phone also look different.
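For anyone who wants to reproduce this comparison, here is a minimal sketch of how the two files can be diffed key by key. It assumes only that both files are ordinary NumPy .npz archives; the helper names `summarize_npz` and `compare_npz` and the paths in the usage comment are hypothetical and not part of the repository.

```python
import numpy as np


def summarize_npz(path):
    """Return {key: (shape, min, max)} for every array stored in an .npz file."""
    summary = {}
    # allow_pickle=True is an assumption: some entries may be stored as object
    # arrays. Only use it on files from a source you trust.
    with np.load(path, allow_pickle=True) as data:
        for key in data.files:
            arr = np.asarray(data[key])
            if arr.size and np.issubdtype(arr.dtype, np.number):
                summary[key] = (arr.shape, float(arr.min()), float(arr.max()))
            else:
                # Non-numeric or empty arrays: report the shape only.
                summary[key] = (arr.shape, None, None)
    return summary


def compare_npz(path_a, path_b):
    """Return {key: (summary_a, summary_b)} for keys whose summaries differ."""
    a, b = summarize_npz(path_a), summarize_npz(path_b)
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            # A missing key shows up as None on the side that lacks it.
            diffs[key] = (a.get(key), b.get(key))
    return diffs


# Example usage (hypothetical paths):
# for key, (s3_side, local_side) in compare_npz("s3/p294_001.npz", "mine/p294_001.npz").items():
#     print(key, "s3:", s3_side, "local:", local_side)
```

This only reports shapes and value ranges, so it shows where the two extractions diverge but not why.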
What changes do I have to make to extract_feats.py to get features similar to the ones in S3?
I see the same issue: the .npz files generated locally with extract_feats.py differ in size from the ones downloaded with download_data.sh. What is causing this?