kaen2891/voice_conversion_ijcnn2020

"Vocoder-free End-to-End Voice Conversion with Transformer Network"

Authors: June-Woo Kim, Ho-Young Jung, Minho Lee

This paper was accepted at IJCNN 2020.

Abstract: Mel-frequency filter bank (MFB) based approaches have the advantage of higher learning speeds compared to using the raw spectrum due to a smaller number of features. However, speech generators with the MFB approach require an additional computationally expensive vocoder for the training process. The pre- and post-processing needed by the MFB and the vocoder is not essential to convert human voices, because it is possible to use only the raw spectrum to generate different styles of voices with clear pronunciation. In this paper, we introduce a vocoder-free end-to-end voice conversion method using a transformer network to alleviate the computational burden from additional pre- and post-processing. Our transformer-based architecture, which does not have any CNN or RNN layers, has shown the benefit of learning fast while solving the limitation of sequential computation of the conventional RNN. For this reason, our model is a fast and effective approach to convert realistic voices using raw spectra in a parallel manner to generate different styles of voices with clear pronunciation. Furthermore, we can get an adapted MFB for speech recognition by multiplying the converted magnitude with the phase information, and therefore our conversion model is also suitable for speaker adaptation. We perform our voice conversion experiments on the TIDIGITS dataset using naturalness, similarity, and clarity with Mean Opinion Score as metrics.
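The vocoder-free idea in the abstract can be illustrated with a minimal NumPy sketch: split a short-time spectrum into magnitude and phase, pass only the magnitude through a converter, then multiply the converted magnitude with the phase and invert the transform directly. This is not the code used in the paper. The frame length, hop size, test signal, reuse of the source phase, and the `identity_converter` stand-in for the transformer are all assumptions made for illustration.

```python
import numpy as np

def stft(signal, frame_len=256, hop=64):
    """Hann-windowed short-time Fourier transform; returns (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len=256, hop=64):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop: i * hop + frame_len] += frame
        norm[i * hop: i * hop + frame_len] += window ** 2
    # Guard against division by zero where the window is exactly zero.
    return out / np.maximum(norm, 1e-8)

def identity_converter(magnitude):
    """Hypothetical stand-in for the trained conversion model."""
    return magnitude

# Synthetic two-tone signal (assumed 8 kHz sampling rate, 4096 samples).
t = np.arange(4096) / 8000.0
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

spec = stft(signal)
magnitude, phase = np.abs(spec), np.angle(spec)

# Convert the raw magnitude, then recombine it with the phase (assumption:
# the source phase is reused) and invert without any vocoder.
converted = identity_converter(magnitude)
waveform = istft(converted * np.exp(1j * phase))

print(magnitude.shape, waveform.shape)
```

With the identity stand-in, the reconstructed waveform matches the input away from the signal edges, which shows that the magnitude and phase path is lossless on its own; any change in the output would come from the converter. The raw magnitude has `frame_len // 2 + 1` bins per frame, which is the larger feature count the abstract contrasts with the MFB.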

For each sample, the first column is the source voice, the second is the target voice, and the last is the converted voice.

First domain voice: Girl

Source: Girl, Target: Man, Result: Converted Man

Saying 'Three'

Girl to Man (source)

source.

Girl to Man (target)

target.

Girl to Man (predict)

predict.

Saying 'Seven'

Girl to Man (source)

source.

Girl to Man (target)

target.

Girl to Man (predict)

predict.

Saying 'Nine'

Girl to Man (source)

source.

Girl to Man (target)

target.

Girl to Man (predict)

predict.

Saying 'Zero'

Girl to Man (source)

source.

Girl to Man (target)

target.

Girl to Man (predict)

predict.

Source: Girl, Target: Woman, Result: Converted Woman

Saying 'Two'

Girl to Woman (source)

source.

Girl to Woman (target)

target.

Girl to Woman (predict)

predict.

Saying 'Five'

Girl to Woman (source)

source.

Girl to Woman (target)

target.

Girl to Woman (predict)

predict.

Saying 'Eight'

Girl to Woman (source)

source.

Girl to Woman (target)

target.

Girl to Woman (predict)

predict.

Saying 'Oh'

Girl to Woman (source)

source.

Girl to Woman (target)

target.

Girl to Woman (predict)

predict.

Source: Girl, Target: Boy, Result: Converted Boy

Saying 'Four'

Girl to Boy (source)

source.

Girl to Boy (target)

target.

Girl to Boy (predict)

predict.

Saying 'Five'

Girl to Boy (source)

source.

Girl to Boy (target)

target.

Girl to Boy (predict)

predict.

Saying 'Six'

Girl to Boy (source)

source.

Girl to Boy (target)

target.

Girl to Boy (predict)

predict.

Saying 'Eight'

Girl to Boy (source)

source.

Girl to Boy (target)

target.

Girl to Boy (predict)

predict.

Second domain voice: Boy

Source: Boy, Target: Man, Result: Converted Man

Saying 'Four'

Boy to Man (source)

source.

Boy to Man (target)

target.

Boy to Man (predict)

predict.

Saying 'Six'

Boy to Man (source)

source.

Boy to Man (target)

target.

Boy to Man (predict)

predict.

Saying 'Nine'

Boy to Man (source)

source.

Boy to Man (target)

target.

Boy to Man (predict)

predict.

Saying 'Oh'

Boy to Man (source)

source.

Boy to Man (target)

target.

Boy to Man (predict)

predict.

Source: Boy, Target: Woman, Result: Converted Woman

Saying 'One'

Boy to Woman (source)

source.

Boy to Woman (target)

target.

Boy to Woman (predict)

predict.

Saying 'Two'

Boy to Woman (source)

source.

Boy to Woman (target)

target.

Boy to Woman (predict)

predict.

Saying 'Four'

Boy to Woman (source)

source.

Boy to Woman (target)

target.

Boy to Woman (predict)

predict.

Saying 'Eight'

Boy to Woman (source)

source.

Boy to Woman (target)

target.

Boy to Woman (predict)

predict.

Source: Boy, Target: Girl, Result: Converted Girl

Saying 'Two'

Boy to Girl (source)

source.

Boy to Girl (target)

target.

Boy to Girl (predict)

predict.

Saying 'Five'

Boy to Girl (source)

source.

Boy to Girl (target)

target.

Boy to Girl (predict)

predict.

Saying 'Eight'

Boy to Girl (source)

source.

Boy to Girl (target)

target.

Boy to Girl (predict)

predict.

Saying 'Zero'

Boy to Girl (source)

source.

Boy to Girl (target)

target.

Boy to Girl (predict)

predict.

Third domain voice: Woman

Source: Woman, Target: Man, Result: Converted Man

Saying 'One'

Woman to Man (source)

source.

Woman to Man (target)

target.

Woman to Man (predict)

predict.

Saying 'Two'

Woman to Man (source)

source.

Woman to Man (target)

target.

Woman to Man (predict)

predict.

Saying 'Three'

Woman to Man (source)

source.

Woman to Man (target)

target.

Woman to Man (predict)

predict.

Saying 'Nine'

Woman to Man (source)

source.

Woman to Man (target)

target.

Woman to Man (predict)

predict.

Source: Woman, Target: Boy, Result: Converted Boy

Saying 'Four'

Woman to Boy (source)

source.

Woman to Boy (target)

target.

Woman to Boy (predict)

predict.

Saying 'Six'

Woman to Boy (source)

source.

Woman to Boy (target)

target.

Woman to Boy (predict)

predict.

Saying 'Nine'

Woman to Boy (source)

source.

Woman to Boy (target)

target.

Woman to Boy (predict)

predict.

Saying 'Zero'

Woman to Boy (source)

source.

Woman to Boy (target)

target.

Woman to Boy (predict)

predict.

Source: Woman, Target: Girl, Result: Converted Girl

Saying 'Four'

Woman to Girl (source)

source.

Woman to Girl (target)

target.

Woman to Girl (predict)

predict.

Saying 'Five'

Woman to Girl (source)

source.

Woman to Girl (target)

target.

Woman to Girl (predict)

predict.

Saying 'Six'

Woman to Girl (source)

source.

Woman to Girl (target)

target.

Woman to Girl (predict)

predict.

Saying 'Zero'

Woman to Girl (source)

source.

Woman to Girl (target)

target.

Woman to Girl (predict)

predict.

Fourth domain voice: Man

Source: Man, Target: Woman, Result: Converted Woman

Saying 'Three'

Man to Woman (source)

source.

Man to Woman (target)

target.

Man to Woman (predict)

predict.

Saying 'Five'

Man to Woman (source)

source.

Man to Woman (target)

target.

Man to Woman (predict)

predict.

Saying 'Seven'

Man to Woman (source)

source.

Man to Woman (target)

target.

Man to Woman (predict)

predict.

Saying 'Oh'

Man to Woman (source)

source.

Man to Woman (target)

target.

Man to Woman (predict)

predict.

Source: Man, Target: Boy, Result: Converted Boy

Saying 'One'

Man to Boy (source)

source.

Man to Boy (target)

target.

Man to Boy (predict)

predict.

Saying 'Three'

Man to Boy (source)

source.

Man to Boy (target)

target.

Man to Boy (predict)

predict.

Saying 'Six'

Man to Boy (source)

source.

Man to Boy (target)

target.

Man to Boy (predict)

predict.

Saying 'Seven'

Man to Boy (source)

source.

Man to Boy (target)

target.

Man to Boy (predict)

predict.

Source: Man, Target: Girl, Result: Converted Girl

Saying 'Three'

Man to Girl (source)

source.

Man to Girl (target)

target.

Man to Girl (predict)

predict.

Saying 'Five'

Man to Girl (source)

source.

Man to Girl (target)

target.

Man to Girl (predict)

predict.

Saying 'Oh'

Man to Girl (source)

source.

Man to Girl (target)

target.

Man to Girl (predict)

predict.

Saying 'Zero'

Man to Girl (source)

source.

Man to Girl (target)

target.

Man to Girl (predict)

predict.

