This is an official PyTorch implementation of the paper: Two Scale Deep Fusion Voice Conversion with Style to Content Attention

TSDFVC

This is an official PyTorch implementation of the paper: Two Scale Deep Fusion Voice Conversion with Style to Content Attention

Demo: https://luckyluckyjl.github.io/TSDFVC-demo/

Abstract: Arbitrary voice conversion, also called zero-shot voice conversion, is a challenging task that involves transforming voices from one speaker to another. Most existing solutions either compress the speaker information of an utterance into a fixed-length vector and then fuse it directly with the deep content features without taking the content itself into account, or adaptively normalize the deep content features with the style so that they match its global statistics. To overcome these problems, we design a novel module called the Two Stride Style to Content Attention Net (TSCNet), which captures a time-varying speaking-style embedding through an attention mechanism. Considering both global statistics and local information, we propose the Two Scale Deep Fusion Voice Conversion (TSDF-VC) model for voice conversion with higher speaker similarity and better style adaptation. The code and pre-trained model are available at luckyluckyjl/TSDFVC.
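The abstract contrasts two ways of injecting speaker style into content features: matching their global per-channel statistics (adaptive normalization) and a style-to-content attention that produces a time-varying style embedding. The NumPy sketch below illustrates that general idea only; it is not the TSCNet or TSDF-VC architecture from the paper. The function names, the single-head scaled dot-product attention, the random projection matrices, and the additive fusion of the two branches are hypothetical choices made for illustration.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Global branch (assumed AdaIN-style): give each content channel the
    per-channel mean/std of the style features.
    content: (T_c, d), style: (T_s, d)."""
    c_mean = content.mean(axis=0, keepdims=True)
    c_std = content.std(axis=0, keepdims=True) + eps
    s_mean = style.mean(axis=0, keepdims=True)
    s_std = style.std(axis=0, keepdims=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def style_to_content_attention(content, style, w_q, w_k, w_v):
    """Local branch (hypothetical): every content frame is a query over all
    style frames, giving one style vector per content frame, shape (T_c, d)."""
    q = content @ w_q  # (T_c, d) queries from the content
    k = style @ w_k    # (T_s, d) keys from the style
    v = style @ w_v    # (T_s, d) values from the style
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T_c, T_s)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


def two_scale_fusion(content, style, w_q, w_k, w_v):
    """Hypothetical fusion: add the global (statistics) and local (attention) parts."""
    global_part = adain(content, style)
    local_part, weights = style_to_content_attention(content, style, w_q, w_k, w_v)
    return global_part + local_part, weights


# Usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
d = 8
content = rng.normal(size=(50, d))                     # source-utterance content frames
style = rng.normal(loc=2.0, scale=3.0, size=(30, d))   # target-speaker frames
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = two_scale_fusion(content, style, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # (50, 8) (50, 30)
```

The attention weights have one row per content frame, so the style embedding can vary over time, whereas the normalization branch applies the same statistics to every frame; this is the global/local split the abstract describes, reduced to its simplest form.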
