Generating MFA alignments #4803
Conversation
Hi @Fhrozen, I remember that training MFA models from scratch is definitely another option.
@kan-bayashi I am wondering which option would be better. This PR includes MFA training and would cover other recipes such as VCTK. Should the MFA option be kept separate as optional (described in the TTS README), or become a permanent option in all the TTS recipes?
I like this one.
@kan-bayashi Instead of executing … Let me know about it, and I will try to finish the PR.
It looks good :) |
I completed the code, so please let me know if you have any comments on it. I tested the code on LJ, VCTK, and tsukuyomi.
I will add a function later for downloading pre-trained models from HuggingFace (there are some languages that MFA currently does not support). A frontend was added in the lab generation for Japanese (and Korean and Chinese will probably require one as well).
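For context, MFA expects one plain-text `.lab` transcript per utterance, placed next to its `.wav` in the corpus directory. A minimal sketch of that lab-generation step is below; the helper name and directory layout are illustrative assumptions, not the PR's actual code:

```python
from pathlib import Path


def write_lab_files(transcripts, corpus_dir):
    """Write one MFA-style .lab transcript per utterance.

    transcripts: dict mapping utterance id -> normalized text.
    corpus_dir: directory where MFA expects <utt_id>.lab next to <utt_id>.wav.
    """
    corpus_dir = Path(corpus_dir)
    corpus_dir.mkdir(parents=True, exist_ok=True)
    for utt_id, text in transcripts.items():
        (corpus_dir / f"{utt_id}.lab").write_text(text + "\n", encoding="utf-8")


# Example with two utterances of normalized text (hypothetical paths/ids):
write_lab_files(
    {
        "LJ001-0001": "printing in the only sense",
        "LJ001-0002": "in being comparatively modern",
    },
    "data/local/mfa/corpus",
)
```

For languages such as Japanese, the text passed in here would first go through the frontend (g2p or normalization) before being written out.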
Codecov Report

```
@@           Coverage Diff           @@
##           master    #4803   +/-  ##
=======================================
  Coverage   79.18%   79.18%
=======================================
  Files         557      557
  Lines       49279    49279
=======================================
  Hits        39020    39020
  Misses      10259    10259
=======================================
```
Sorry for the late review.
The code looks great; a nice design that allows various recipes to use MFA.
And I'm glad to hear that Japanese case also works.
Once you complete this PR, let us merge it.
Finished adding some fixes related to the frame generation, caused by floating-point division.
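To illustrate the kind of floating-point issue mentioned above (this is a hedged sketch of the general technique, not the PR's actual fix; the function name and hop size are assumptions): rounding each phone duration to frames independently lets error accumulate, while rounding the cumulative boundaries keeps the total frame count exact.

```python
def durations_to_frames(durations_sec, hop_sec=0.0125):
    """Convert phone durations in seconds to integer frame counts.

    Rounding each duration separately accumulates floating-point error;
    rounding the cumulative end times instead keeps the total exact.
    """
    frames = []
    elapsed = 0.0
    prev_frame = 0
    for d in durations_sec:
        elapsed += d
        end_frame = round(elapsed / hop_sec)
        frames.append(end_frame - prev_frame)
        prev_frame = end_frame
    return frames


durs = [0.0317, 0.0317, 0.0317, 0.0317]
naive = [round(d / 0.0125) for d in durs]  # each rounds up to 3 -> 12 frames total
exact = durations_to_frames(durs)          # sums to round(0.1268 / 0.0125) = 10 frames
```

With naive per-phone rounding the total drifts by two frames here, which is exactly the kind of mismatch that breaks duration targets for models like FastSpeech 2.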
@kan-bayashi In the case of English, I also tested with VCTK, and some samples are here. The files were generated using:
For Japanese, I tested with JVS multispeaker + Tsukuyomi
The issue with Japanese is the vocoder (I suppose), so I am training on a larger set to train a vocoder and will test again.
Hi @Fhrozen -- thanks for integrating the MFA scripts. I wonder about … Also, the readme says:
I think we're supposed to run all (5) stages, right?
In the case of using espeak_ng_english_us_vits (which is one of the choices of the ESPnet frontend), you need to set up …
@kan-bayashi @Fhrozen There seems to be a naming conflict in the LibriTTS corpus, because MFA expects …
Also, in …
@iamanigeeit I do not understand the current issue.
As far as I can see, it uses normalized text to generate the annotations: espnet/egs/libritts/tts1/local/data_prep.sh Lines 57 to 58 in aa88f3a
espnet/egs2/libritts/tts1/local/data.sh Lines 78 to 81 in aa88f3a
The lab files are employed for phoneme alignment, I suppose.
The setup is for using pretrained ESPnet MFA models. I will update this soon to include MFA models stored on HuggingFace.
I think this is OK since they are in separate folders (MFA .lab in data/local/mfa/corpus/xx and LibriTTS .lab in downloads/LibriTTS/xx/xx). I was confused because they have the same name.
I see what you mean now, thanks. If …
@sw005320 @kan-bayashi @kamo-naoyuki
I prepared this draft based on PRs #4557 and #4801.
It is almost complete for general support of different datasets.
I only tested on LJSpeech, so some parts are hard-coded (stage 1 of the .sh script).
In this case, the code trains the MFA models from scratch.
I had some issues with the original PRs: some words did not obtain a phoneme list and instead got an `spn` token. Training the models allows using espnet2-based g2p, which reduces the `spn`s to one or none (I cannot remember exactly, but I still got very few). Using espnet2's g2p then allows generating the phoneme list, and the Python script generates a fixed duration for the phone list.
Some samples generated from a trained FastSpeech 2 model + HiFi-GAN are located at: https://1drv.ms/u/s!AliZ3I0uDW8HhKB14mq0Sumx_vSbpg?e=IlmQ6T
Let me know if you have any comments.
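A sketch of the `spn`-handling idea described above (the function name, inputs, and even-split policy are my assumptions, not the PR's implementation): when MFA falls back to an `spn` token for an unknown word, one can substitute the g2p phonemes for that word and divide the aligned interval's frames among them.

```python
def expand_spn(phones, durations, spn_replacements):
    """Replace 'spn' tokens with g2p phonemes, splitting frames evenly.

    phones: aligned phone labels, possibly containing 'spn'.
    durations: frame counts per phone (same length as phones).
    spn_replacements: dict mapping the index of each 'spn' in `phones`
        to the list of g2p phonemes for the underlying word.
    """
    out_phones, out_durs = [], []
    for i, (p, d) in enumerate(zip(phones, durations)):
        if p == "spn" and i in spn_replacements:
            reps = spn_replacements[i]
            base, rem = divmod(d, len(reps))
            for j, rp in enumerate(reps):
                # Give the leftover frames to the first `rem` phones so
                # the total duration is preserved exactly.
                out_phones.append(rp)
                out_durs.append(base + (1 if j < rem else 0))
        else:
            out_phones.append(p)
            out_durs.append(d)
    return out_phones, out_durs


# Hypothetical example: an 'spn' spanning 7 frames becomes K AE T.
new_phones, new_durs = expand_spn(
    ["HH", "spn", "AH"], [5, 7, 4], {1: ["K", "AE", "T"]}
)
```

Splitting the interval evenly is the simplest policy; the total frame count is unchanged, which keeps the durations consistent for duration-predictor training.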