The generated wav is not good #14

pangtouyuqqq · 2022-11-30T13:47:55Z

Hi, thank you for open-sourcing this wonderful work!
I followed your instructions: 1) installed lightconv_cuda, 2) downloaded the checkpoint, and 3) downloaded the speaker embedding .npy file.
However, the generated result is not good.

Below is the command I ran:

python3 synthesize.py \
  --text "Hello world" \
  --speaker_id Actor_22 \
  --emotion_id sad \
  --restore_step 450000 \
  --mode single \
  --dataset RAVDESS

# sh run.sh 
2022-11-30 13:45:22.626404: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Device of XSpkEmoTrans: cuda
Removing weight norm...
Raw Text Sequence: Hello world
Phoneme Sequence: {HH AH0 L OW1 W ER1 L D}

ENV

python 3.6.8
fairseq                 0.10.2
torch                   1.7.0+cu110
CUDA 11.0

Hello world_Actor_22_sad.wav.zip


keonlee9420 · 2022-12-01T12:07:07Z

Hi @pangtouyuqqq, thanks for your interest. This is caused by the dataset, which contains only two different texts; you will get more natural output if you synthesize one of them. If you need to generate unseen text, training on another dataset with more generic text-speech pairs may help. When you do that, it may also help to replace the light convolution with a transformer.
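As a minimal sketch of the advice above (not part of the repository), the hypothetical helper `build_synthesize_cmd` below reuses the flags from the original command but swaps in one of the two RAVDESS statements, "Kids are talking by the door" and "Dogs are sitting by the door", and warns when the requested text is not one of them. The exact casing and punctuation of these sentences in the repo's preprocessed data is an assumption. It avoids `shlex.join` so it also runs on the Python 3.6 reported in this issue.

```python
import shlex
import warnings

# The two statements spoken in RAVDESS. Assumption: these match the texts the
# released checkpoint was trained on (casing/punctuation may differ in the repo).
RAVDESS_TEXTS = (
    "Kids are talking by the door",
    "Dogs are sitting by the door",
)


def build_synthesize_cmd(text, speaker_id, emotion_id, restore_step=450000):
    """Hypothetical helper: return the argv for synthesize.py using the flags
    from this issue, warning if `text` is not a RAVDESS training text."""
    # Compare case-insensitively and ignore a trailing period.
    normalized = text.strip().rstrip(".").lower()
    if normalized not in {t.lower() for t in RAVDESS_TEXTS}:
        warnings.warn(
            f"{text!r} is not one of the RAVDESS training texts; "
            "expect degraded output from the pretrained checkpoint."
        )
    return [
        "python3", "synthesize.py",
        "--text", text,
        "--speaker_id", speaker_id,
        "--emotion_id", emotion_id,
        "--restore_step", str(restore_step),
        "--mode", "single",
        "--dataset", "RAVDESS",
    ]


if __name__ == "__main__":
    cmd = build_synthesize_cmd("Kids are talking by the door", "Actor_22", "sad")
    # Print a shell-quoted command line (shlex.join needs Python 3.8+).
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually synthesize, run subprocess.run(cmd, check=True) from the repo root.
```

Running it with "Hello world" instead triggers the warning, which matches the behavior reported in this issue: the checkpoint has effectively only heard two sentences.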

deep-convai mentioned this issue on Jul 6, 2023 in #16, "audio produced by pretrained model is not correct..." (Open)
