
CFSD, SECS metrics for TTS #5235

Merged
merged 3 commits into from Jun 22, 2023


imdanboy (Contributor)

Hi, I implemented the following TTS objective metrics (#1665) by reusing a related ESPnet script and pretrained models.

  1. Conditional Frechet Speech Distance (CFSD)
  2. Speaker Embedding Cosine Similarity (SECS)

(An example TTS paper that uses these metrics: "Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers".)
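SECS itself is the cosine similarity between speaker embeddings of the reference and the synthesized utterance (higher is better), averaged over the evaluation set. Below is a minimal NumPy sketch, assuming the embeddings have already been extracted by some pretrained speaker encoder; which encoder the ESPnet script uses is not stated here. The `secs` helper, the 192-dim random stand-in vectors, and the "mean ± std" summary (assumed to match how the ± values in the results are reported) are hypothetical and for illustration only.

```python
import numpy as np


def secs(emb_ref: np.ndarray, emb_syn: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical helper: cosine similarity between two speaker embeddings."""
    ref = np.asarray(emb_ref, dtype=np.float64).ravel()
    syn = np.asarray(emb_syn, dtype=np.float64).ravel()
    # Guard against zero-norm embeddings instead of dividing by zero.
    denom = max(float(np.linalg.norm(ref) * np.linalg.norm(syn)), eps)
    return float(np.dot(ref, syn) / denom)


# Illustrative usage: random vectors stand in for real speaker-encoder outputs.
rng = np.random.default_rng(0)
scores = []
for _ in range(10):  # one (reference, synthesized) pair per utterance
    ref = rng.normal(size=192)
    syn = ref + 0.3 * rng.normal(size=192)
    scores.append(secs(ref, syn))
print(f"SECS {np.mean(scores):.4f} ± {np.std(scores):.4f}")
```

Scoring each utterance pair separately and then aggregating is what makes a per-utterance spread (the ± value) available at all.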

Example results on the LJSpeech eval set:

| Model | CFSD | SECS | MCD | logF0 RMSE |
|---|---|---|---|---|
| JETS | 6.6239 ± 2.4985 | 0.8465 ± 0.0534 | 6.6817 ± 0.5373 | 0.2905 ± 0.0682 |
| VITS | 6.8165 ± 1.6455 | 0.8440 ± 0.0585 | 6.9935 ± 0.6215 | 0.2812 ± 0.0589 |
| joint_finetune | 6.1623 ± 1.4250 | 0.8481 ± 0.0564 | 6.7344 ± 0.6166 | 0.2839 ± 0.0739 |

How about this?

mergify bot added the ESPnet2 label on Jun 16, 2023
codecov bot commented on Jun 16, 2023

Codecov Report

Merging #5235 (52220f6) into master (2839fb7) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5235   +/-   ##
=======================================
  Coverage   74.43%   74.43%           
=======================================
  Files         642      642           
  Lines       57611    57611           
=======================================
  Hits        42885    42885           
  Misses      14726    14726           
| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | 66.28% <ø> (+<0.01%) ⬆️ |
| test_integration_espnet2 | 47.52% <ø> (ø) |
| test_python | 65.15% <ø> (ø) |
| test_utils | 23.28% <ø> (ø) |

Flags with carried forward coverage won't be shown.


sw005320 added the Enhancement and TTS (Text-to-speech) labels on Jun 16, 2023
kan-bayashi (Member) left a comment

Very cool!
Could you update the doc (adding an example command)?
https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1#evaluation

mergify bot added the README label on Jun 20, 2023
imdanboy (Contributor, Author)
@kan-bayashi
Sure, I updated the doc!

sw005320 (Contributor)

Can you add a test script (perhaps in a later PR)?
We could also add some consistency checks (e.g., identical waveforms should yield zero distance, and cases with numerical issues should be handled correctly).
The evaluation script should not contain any errors.
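As a possible starting point for such a test script, here is a minimal NumPy sketch of a Fréchet distance between Gaussians fitted to two frame-level feature matrices, d² = ‖μ_a − μ_b‖² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}), with the two consistency checks from this comment exercised at the end: identical inputs give (numerically) zero distance, and a rank-deficient covariance (fewer frames than feature dimensions) does not produce NaN. Everything here is an assumption for illustration, not ESPnet's implementation: the hypothetical `frechet_distance` name, the (T, D) input layout, treating each utterance pair's frames as samples (one possible reading of "conditional"), and the eigenvalue-clipping strategy. Which pretrained encoder produces the features is not specified here.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Hypothetical helper: Fréchet distance between Gaussians fitted to (T, D) features."""
    a = np.asarray(feats_a, dtype=np.float64)
    b = np.asarray(feats_b, dtype=np.float64)
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    # Rows are frames (observations), columns are feature dimensions.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))

    # Tr((S_a S_b)^{1/2}) equals Tr((S_a^{1/2} S_b S_a^{1/2})^{1/2}); the inner
    # matrix is symmetric PSD, so a symmetric eigensolver can be used.
    # Round-off can push tiny eigenvalues below zero, so they are clipped.
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    inner = sqrt_a @ cov_b @ sqrt_a
    inner = (inner + inner.T) / 2.0  # re-symmetrize against round-off
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()

    diff = mu_a - mu_b
    dist = diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt
    return float(max(dist, 0.0))  # clip tiny negative results from round-off


# Consistency checks along the lines suggested above (random stand-in features).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 16))
assert frechet_distance(x, x) < 1e-8  # same input -> zero distance
few = rng.normal(size=(5, 16))  # T < D: singular covariance
d = frechet_distance(few, rng.normal(size=(5, 16)))
assert np.isfinite(d) and d >= 0.0
```

Computing the trace term through a symmetric eigendecomposition, rather than a general matrix square root of Σ_a Σ_b, is one way to avoid the small complex components that such square roots can return on near-singular inputs.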

imdanboy (Contributor, Author)

@sw005320
OK, I can add a test script in a later PR 😄

sw005320 merged commit 753d847 into espnet:master on Jun 22, 2023
24 of 25 checks passed
sw005320 (Contributor)

Thanks!

Labels: Enhancement, ESPnet2, README, TTS (Text-to-speech)

3 participants