Check: Does VAD change speech data in data prep (P1)
No. The VAD step only computes the VAD information and stores it in the dumpdir, in the file vad.scp. It marks the non-speech segments so that they can be excluded from training. It is true that the missing blanks (the excluded non-speech segments) could affect our reconstruction loss, but excluding them can improve the quality of the synthesized audio. It is a tradeoff we need to be aware of.
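As a rough illustration of the point above (the stored audio stays untouched and only the segment information decides what is used), here is a minimal sketch. The function name `keep_speech_samples` and the `(start_sec, end_sec)` segment layout are hypothetical and chosen for illustration only; this is not the actual vad.scp format or the ESPnet implementation.

```python
from typing import List, Sequence, Tuple


def keep_speech_samples(
    samples: Sequence[float],
    speech_segments: List[Tuple[float, float]],
    sample_rate: int,
) -> List[float]:
    """Return only the samples inside the given speech segments.

    Assumption: segments are (start_sec, end_sec) pairs marking speech.
    The input sequence is not modified; a new list is returned.
    """
    kept: List[float] = []
    for start_sec, end_sec in speech_segments:
        # Convert seconds to sample indices and clamp to the waveform bounds.
        start = max(0, int(round(start_sec * sample_rate)))
        end = min(len(samples), int(round(end_sec * sample_rate)))
        kept.extend(samples[start:end])
    return kept


# Hypothetical 1-second waveform at 16 kHz with two speech segments.
waveform = [0.0] * 16000
segments = [(0.10, 0.40), (0.60, 0.90)]
speech_only = keep_speech_samples(waveform, segments, 16000)
```

The original `waveform` keeps its full length, while `speech_only` holds just the samples inside the marked segments. This is the source of the tradeoff: the silences never reach the model.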
No, the decoded wav sample rate is still 22050 Hz. Trying the following steps:
Check the training process.
Check the tts_inference.py file for sample rate usage.
Inference jobs have not been eligible for submission since Feb 13th, so I couldn't decode to check whether the output meets the requirement.
Applied the retrained model. Speaker information is integrated! /ocean/projects/cis210027p/zzhou5/espnet/egs2/librispeech_100/tts_vits/exp/16k_xvector/tts_beta_lib100_vits_tts_all16k_char_xvector/decode_with_trained_16k_vocoder
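For the sample rate check on decoded files, a small script along these lines may help. It is only a sketch: it uses Python's standard `wave` module (which reads PCM WAV only), the helper names are hypothetical, and the expected rate of 16000 is an assumption based on the `16k` naming in the experiment directory.

```python
import os
import tempfile
import wave
from typing import Tuple


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path: str, expected: int = 16000) -> Tuple[bool, int]:
    """Return (matches_expected, actual_rate) for one WAV file.

    Assumption: 16000 Hz is the target rate for the decoded audio.
    """
    actual = wav_sample_rate(path)
    return actual == expected, actual


def write_silent_wav(path: str, sample_rate: int, num_frames: int = 100) -> None:
    """Write a short mono 16-bit silent WAV, used only to demo the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)


# Demo on a temporary file that mimics the mismatch described above.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.wav")
    write_silent_wav(demo_path, 22050)
    ok, rate = check_sample_rate(demo_path)
    print(f"matches 16 kHz: {ok}, actual rate: {rate}")
```

Pointing `check_sample_rate` at a file from the decode directory would confirm whether the output is at the expected rate before listening for speaker information.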
If 3 does not work, consult Jiatong (p2)
Run inference w/o trained vocoder
Integrate VITS model in cyclic systems (p3)
Todos
Keep VITS with xvector and VAD training
Used trained decoder (with aligned sample rate) to re-decode, see if speaker information is perceptible (p2) #3