Audio samples for the paper: Provable Speech Attributes Conversion via Latent Independence
Model performance comparison after expanding the training data to include a larger and more diverse speaker pool. The same architecture was retrained on multiple datasets:
- VCTK (110 speakers)
- LibriTTS train-clean-360 (904 speakers)
- LibriSpeech train-clean-100 (251 speakers)
- ESD English (10 speakers)
totaling 1,265 speakers. Training is performed on 3-second audio chunks for 300 epochs, whereas at inference the reference speaker is provided as a single full-length utterance (see the sketch below).
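
A minimal sketch of the chunking setup described above, assuming torchaudio for I/O: training consumes random 3-second crops, while inference passes the full reference utterance. The file paths and the cropping helper are illustrative, not the authors' released code.

```python
import torch
import torchaudio


def random_chunk(wav: torch.Tensor, sr: int, seconds: float = 3.0) -> torch.Tensor:
    """Return a random crop of `seconds` length, zero-padded if the clip is shorter."""
    target = int(seconds * sr)
    length = wav.shape[-1]
    if length <= target:
        return torch.nn.functional.pad(wav, (0, target - length))
    start = torch.randint(0, length - target + 1, (1,)).item()
    return wav[..., start:start + target]


# Hypothetical paths for illustration only.
train_wav, sr = torchaudio.load("train_utterance.wav")
train_input = random_chunk(train_wav, sr)   # random 3-second chunk used during training

ref_wav, ref_sr = torchaudio.load("reference_speaker.wav")
reference_input = ref_wav                   # full-length utterance used at inference
```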
| Model | WER (%) ↓ | CER (%) ↓ | EER (%) ↑ |
|---|---|---|---|
| Target* | 5.96 | 2.38 | 50.00 |
| YourTTS* | 11.93 | 5.51 | 25.32 |
| Free-VC* | 7.61 | 3.17 | 8.97 |
| KNN-VC | 17.37 | 8.55 | 24.01 |
| IVC (LibriSpeech-train-clean-100) | 11.38 | 4.92 | 10.77 |
| IVC (More data) | 9.11 | 3.78 | 16.50 |
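
The table compares word and character error rates from an ASR model run on converted speech against the reference transcripts, and the equal error rate of a speaker-verification system scoring converted speech against the target speaker. A hedged sketch of how such metrics are commonly computed, using jiwer and scikit-learn with toy inputs; this is an assumption about the metric definitions, not the paper's exact evaluation pipeline:

```python
import numpy as np
from jiwer import wer, cer
from sklearn.metrics import roc_curve

# WER/CER: reference transcripts vs. ASR output on converted audio (toy examples).
references = ["the quick brown fox", "speech attributes conversion"]
hypotheses = ["the quick brown box", "speech attribute conversion"]
print(f"WER: {100 * wer(references, hypotheses):.2f}%")
print(f"CER: {100 * cer(references, hypotheses):.2f}%")

# EER: speaker-verification similarity scores, label 1 = same speaker
# (converted vs. target), label 0 = different speaker. The EER is the
# operating point where false-accept and false-reject rates are equal.
labels = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.82, 0.55, 0.61, 0.58, 0.30, 0.45])
fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
idx = np.nanargmin(np.abs(fnr - fpr))
eer = (fpr[idx] + fnr[idx]) / 2
print(f"EER: {100 * eer:.2f}%")
```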