TTS evaluation script and monitoring functionality using MOS prediction model #5485

Merged 16 commits into espnet:master on Dec 17, 2023

Conversation

Takaaki-Saeki (Contributor)

What?

This PR:

  1. Adds an evaluation script for TTS using a MOS prediction model. It uses the SpeechMOS toolkit developed by @tarepan, currently with a pretrained UTMOS strong learner model (the best-performing method in the VoiceMOS Challenge 2022). Thanks to the toolkit, the predictor can be loaded directly from torch.hub without installing a separate module (see the sketch after this list).
  2. Adds functionality to monitor the predicted MOS values during training of fully end-to-end TTS models. It is currently supported for VITS and JETS.
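For reference, a minimal sketch of loading and using the predictor, following the SpeechMOS README (the repo tag `tarepan/SpeechMOS:v1.2.0` and entry point `utmos22_strong` come from that toolkit; `sample.wav` is a placeholder path):

```python
import torch
import torchaudio

# Load the pretrained UTMOS strong learner via torch.hub;
# no extra package installation is needed beyond torch itself.
predictor = torch.hub.load(
    "tarepan/SpeechMOS:v1.2.0", "utmos22_strong", trust_repo=True
)

# Score a mono synthesized waveform; the predictor expects a
# (batch, samples) tensor plus the sampling rate.
wave, sr = torchaudio.load("sample.wav")  # wave: (channels, samples)
with torch.no_grad():
    score = predictor(wave, sr)           # predicted MOS, shape (batch,)
print(f"Predicted MOS: {score.item():.3f}")
```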

Why?

  1. In addition to MCD and F0 RMSE, an evaluation script based on MOS prediction is worth having, since MOS prediction works even without ground-truth speech.
  2. For TTS (especially GAN-based TTS), it is often difficult to determine the best number of training steps. Monitoring the predicted MOS during training should help improve TTS performance (see the sketch after this list).
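As an illustration of point 2, a training-loop hook might look roughly like the following. This is a hypothetical sketch, not the PR's actual implementation; `log_predicted_mos` and the TensorBoard-style `writer` are illustrative names:

```python
import torch

@torch.no_grad()
def log_predicted_mos(predictor, generated_wave, sr, step, writer):
    """Predict MOS for a batch of generated waveforms and log the mean.

    predictor:      the torch.hub UTMOS model
    generated_wave: (batch, samples) tensor from the TTS generator
    writer:         any scalar logger, e.g. a TensorBoard SummaryWriter
    """
    scores = predictor(generated_wave, sr)  # (batch,) predicted MOS
    writer.add_scalar(
        "generator_predicted_mos", scores.mean().item(), step
    )
```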

See also

  • The figure below shows the predicted MOS during training of VITS on LJSpeech; although training is still in its early iterations, the predicted MOS is increasing.
    [Figure: generator_predicted_mos]

@mergify mergify bot added the ESPnet2 label Oct 19, 2023
codecov bot commented Oct 19, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (1c55053) 70.62% vs. head (c9bfeb4) 76.55%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5485      +/-   ##
==========================================
+ Coverage   70.62%   76.55%   +5.93%     
==========================================
  Files         719      720       +1     
  Lines       66513    66616     +103     
==========================================
+ Hits        46972    50998    +4026     
+ Misses      19541    15618    -3923     
Flag                         Coverage Δ
test_configuration_espnet2   ∅ <ø> (∅)
test_integration_espnet1     62.92% <ø> (ø)
test_integration_espnet2     50.09% <18.75%> (-0.02%) ⬇️
test_python_espnet1          19.08% <0.00%> (?)
test_python_espnet2          52.40% <100.00%> (+0.01%) ⬆️
test_utils                   22.15% <ø> (ø)

Flags with carried-forward coverage won't be shown.


ftshijt (Collaborator) commented Oct 19, 2023

Wow, this is super cool! I always run UTMOS evaluation on audio generated from ESPnet, so this will speed up a lot of my work. Many thanks for the contribution!

@mergify mergify bot added the README label Oct 20, 2023
@sw005320 sw005320 added the New Features and TTS (Text-to-speech) labels Oct 20, 2023
@sw005320 sw005320 added this to the v.202312 milestone Oct 20, 2023
sw005320 (Contributor)

@Takaaki-Saeki, very cool!

  • Can you add a test? You can check https://github.com/espnet/espnet/tree/master/test_utils
  • I was thinking of suggesting that this evaluation metric be added to the results of some recipes, but then I found that TTS recipes usually do not provide results (only pretrained models). The template does not even include evaluation. We should make some updates…

sw005320 (Contributor)

After you add a test, I can merge this PR.

Takaaki-Saeki (Contributor, Author) commented Oct 24, 2023

  • I was thinking of suggesting that this evaluation metric be added to the results of some recipes, but then I found that TTS recipes usually do not provide results (only pretrained models). The template does not even include evaluation. We should make some updates…

Thanks! This is an interesting direction, and the objective results should be included in the TTS recipes.
For objective evaluation of TTS recipes, it would be good to have clearly defined test and dev sets, as in ASR (although the criterion needs to be discussed). For example, IIUC, LJSpeech and VCTK do not have predefined test sets, while LibriTTS has one, as in LibriSpeech.
In line with this, TTS benchmarking (like SUPERB) with ESPnet2 would also be worthwhile.
Let me consider it as a future PR.

sw005320 (Contributor)

Cool!
We can discuss and design it!

@kan-bayashi kan-bayashi modified the milestones: v.202310, v.202312 Oct 25, 2023
ftshijt (Collaborator) commented Dec 17, 2023

Thanks for the contribution!

@ftshijt ftshijt merged commit 4771515 into espnet:master Dec 17, 2023
27 checks passed