CFSD, SECS metrics for TTS #5235

imdanboy · 2023-06-16T06:33:42Z

Hi, I implemented the following TTS objective metrics (#1665) by reusing the related ESPnet scripts and pretrained models.

Conditional Frechet Speech Distance (CFSD)
Speaker Embedding Cosine Similarity (SECS)

(An example TTS paper that uses these metrics: "Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers")
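For readers unfamiliar with SECS, here is a minimal NumPy sketch of the comparison step. It assumes the speaker embeddings have already been extracted by some pretrained speaker encoder (not shown). `secs` is a hypothetical helper name, not the function in the merged ESPnet script.

```python
import numpy as np


def secs(ref_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Cosine similarity between two 1-D speaker embeddings.

    Hypothetical helper: the embeddings are assumed to come from a
    pretrained speaker encoder that is not shown here.
    """
    ref = np.asarray(ref_emb, dtype=np.float64)
    gen = np.asarray(gen_emb, dtype=np.float64)
    denom = np.linalg.norm(ref) * np.linalg.norm(gen)
    if denom == 0.0:
        raise ValueError("cannot compare a zero-norm embedding")
    return float(np.dot(ref, gen) / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal(192)  # arbitrary embedding size for the demo
    print(secs(e, e))   # identical embeddings: 1.0 up to rounding
    print(secs(e, -e))  # opposite direction: -1.0 up to rounding
```

A corpus-level SECS would then be the average of `secs` over all reference/synthesized utterance pairs.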

Example results on the LJSpeech eval set:

Model            CFSD               SECS               MCD                logF0 RMSE
JETS             6.6239 ± 2.4985    0.8465 ± 0.0534    6.6817 ± 0.5373    0.2905 ± 0.0682
VITS             6.8165 ± 1.6455    0.8440 ± 0.0585    6.9935 ± 0.6215    0.2812 ± 0.0589
joint_finetune   6.1623 ± 1.4250    0.8481 ± 0.0564    6.7344 ± 0.6166    0.2839 ± 0.0739
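The `±` entries presumably report a mean and a standard deviation over per-utterance scores. The thread does not say whether population or sample standard deviation is used, so this hypothetical `mean_std` helper assumes the population form (NumPy's default `ddof=0`):

```python
import numpy as np


def mean_std(scores) -> str:
    """Format per-utterance scores as 'mean ± std'.

    Assumption: population standard deviation (ddof=0); the PR does not state which one it uses.
    """
    arr = np.asarray(scores, dtype=np.float64)
    return f"{arr.mean():.4f} ± {arr.std():.4f}"


if __name__ == "__main__":
    print(mean_std([1.0, 3.0]))  # 2.0000 ± 1.0000
```

Switching to `arr.std(ddof=1)` would give the sample standard deviation instead, which is slightly larger for small eval sets.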

How about this?

codecov · 2023-06-16T07:07:15Z

Codecov Report

Merging #5235 (52220f6) into master (2839fb7) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5235   +/-   ##
=======================================
  Coverage   74.43%   74.43%           
=======================================
  Files         642      642           
  Lines       57611    57611           
=======================================
  Hits        42885    42885           
  Misses      14726    14726

Flag                        Coverage Δ
test_integration_espnet1    66.28% <ø> (+<0.01%) ⬆️
test_integration_espnet2    47.52% <ø> (ø)
test_python                 65.15% <ø> (ø)
test_utils                  23.28% <ø> (ø)

Flags with carried forward coverage won't be shown.

kan-bayashi

Very cool!
Could you update the doc (adding an example command)?
https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1#evaluation

imdanboy · 2023-06-20T06:54:30Z

@kan-bayashi
Sure, I updated the doc!

sw005320 · 2023-06-20T10:10:27Z

Can you add a test script (perhaps in a later PR)?
We can also add some consistency checks (e.g., identical waveforms should yield zero distance, and cases with numerical issues should be handled correctly).
The evaluation script should not contain any errors.
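One possible shape for such checks, sketched under assumptions. Here CFSD is taken to be the squared Fréchet distance between Gaussians fitted to frame-level features of a reference and a synthesized utterance (the feature extractor is not shown). The "numerical issue" is assumed to be the usual one in Fréchet-distance code: a near-singular covariance product whose matrix square root is inaccurate or complex. Many FID implementations compute that root with `scipy.linalg.sqrtm`. This sketch swaps in a symmetric eigendecomposition instead and clips round-off negatives. `frechet_distance` and `_psd_sqrt` are hypothetical helpers, not the API of the merged ESPnet script.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a PSD matrix; tiny negative eigenvalues are clipped to 0."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Squared Fréchet distance between Gaussians fitted to (frames, dim) features.

    d^2 = ||mu_a - mu_b||^2 + Tr(S_a) + Tr(S_b) - 2 Tr((S_a S_b)^{1/2})
    Each input needs at least 2 frames so that np.cov is defined.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    s_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    s_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((S_a S_b)^{1/2}) equals the sum of square roots of the eigenvalues of the
    # symmetric PSD matrix S_a^{1/2} S_b S_a^{1/2}; using that form avoids complex output.
    root_a = _psd_sqrt(s_a)
    middle = root_a @ s_b @ root_a
    middle = (middle + middle.T) / 2.0  # re-symmetrize against round-off
    eigs = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    trace_sqrt = np.sqrt(eigs).sum()
    diff = mu_a - mu_b
    dist = diff @ diff + np.trace(s_a) + np.trace(s_b) - 2.0 * trace_sqrt
    return float(max(dist, 0.0))  # clamp tiny negative values caused by round-off


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 8))
    # Check 1: identical features give (numerically) zero distance.
    assert frechet_distance(x, x) < 1e-6
    # Check 2: a pure mean shift adds exactly ||shift||^2 (covariances are unchanged).
    shift = np.full(8, 0.5)
    assert abs(frechet_distance(x, x + shift) - shift @ shift) < 1e-6
    # Check 3: fewer frames than dimensions (singular covariance) still gives a finite value.
    y = rng.standard_normal((4, 8))
    assert np.isfinite(frechet_distance(y, y + 0.1))
    print("all checks passed")
```

Checks 1 and 2 follow directly from the formula in the docstring, so they make useful regression tests regardless of which feature extractor feeds the metric.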

imdanboy · 2023-06-20T14:15:17Z

@sw005320
Ok, I can add a test script in a later PR 😄

sw005320 · 2023-06-22T11:17:38Z

Thanks!

add pyscript for evaluating CFSD, SECS

b53ef12

mergify bot added the ESPnet2 label Jun 16, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

48cf53d

for more information, see https://pre-commit.ci

sw005320 added Enhancement Enhancement TTS Text-to-speech labels Jun 16, 2023

sw005320 requested a review from kan-bayashi June 16, 2023 07:54

kan-bayashi requested changes Jun 19, 2023

Update README.md

52220f6

mergify bot added the README label Jun 20, 2023

kan-bayashi approved these changes Jun 22, 2023

sw005320 merged commit 753d847 into espnet:master Jun 22, 2023
24 of 25 checks passed

imdanboy mentioned this pull request Jul 28, 2023

add test script on evaluate_secs, cfsd #5377

Open
