This is the official PyTorch implementation of the paper: Arbitrary Voice Conversion via Phoneme Attention
Demo: https://luckyluckyjl.github.io/TSDFVC-demo/
Abstract: Arbitrary voice conversion, also called zero-shot voice conversion, is a challenging task that involves transforming the voice of one speaker into that of another. Most existing solutions either compress the speaker information of an utterance into a fixed-length vector and fuse it directly with the deep content information without considering the underlying content, or adaptively normalize deep content features with the style to match their global statistics. To overcome this problem, we design a novel module, referred to as the Two Stride Style to Content Attention Net (TSCNet), which captures a time-varying speaking-style embedding using an attention mechanism. Considering both global statistics and local information, we propose the Two Scale Deep Fusion Voice Conversion (TSDF-VC) model for more similar and style-adaptive voice conversion. The code and pre-trained model are available at luckyluckyjl/TSDFVC.
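To illustrate the core idea of a time-varying style embedding, here is a minimal sketch of style-to-content cross-attention: each content frame (query) attends over the reference speaker's style frames (keys/values), producing a per-frame style vector instead of a single fixed-length one. This is a simplified illustration in numpy, not the paper's actual TSCNet implementation; the function name and shapes are assumptions for demonstration.

```python
import numpy as np

def style_to_content_attention(content, style):
    """Illustrative cross-attention: content frames attend over style frames.

    content: (T_c, d) array of content features (queries).
    style:   (T_s, d) array of style/reference features (keys and values).
    Returns a (T_c, d) time-varying style embedding, one vector per
    content frame, instead of a single fixed-length speaker vector.
    """
    d = content.shape[-1]
    # Scaled dot-product scores between every content and style frame.
    scores = content @ style.T / np.sqrt(d)          # (T_c, T_s)
    # Softmax over the style axis, shifted for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each content frame receives its own mixture of style frames.
    return weights @ style                           # (T_c, d)
```

Because the softmax weights differ per content frame, the resulting style embedding varies over time, which is the property the abstract contrasts against fixed-length speaker vectors.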