How to add marker of sil, sp to TextGrid after MFA? #21

nampdn · 2022-04-30T08:15:01Z

Hi @NTT123,
First of all, thank you for your brilliant work! I have successfully trained on my dataset with MFA, but the generated .TextGrid files contain no sil or sp markers for silence and spaces. Could you please help me detect these symbols and add them to the TextGrid files?


NTT123 · 2022-04-30T11:39:20Z

~~Hi @nampdn, thank you for reporting this. The newest version of MFA removes these markers.~~

~~According to MontrealCorpusTools/Montreal-Forced-Aligner#377~~
~~you have to run mfa align or mfa train with an additional argument --disable_textgrid_cleanup.~~

NTT123 · 2022-04-30T15:32:31Z

@nampdn, please check out the fix_sil branch for a quick fix. This branch can read TextGrid files that have no "sil" or "sp" markers.
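
For readers who want to add the markers themselves, here is a minimal sketch of one possible approach. It is not the code from the fix_sil branch. It assumes the aligned phone tier has already been read into a list of (start, end, label) tuples, and that silence shows up either as an empty label or as a gap between intervals. The function name fill_silence and its parameters are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start_sec, end_sec, label)


def fill_silence(
    intervals: List[Interval],
    total_duration: float,
    sil: str = "sil",
    eps: float = 1e-6,
) -> List[Interval]:
    """Return a gap-free tier in which every unlabeled stretch is marked `sil`.

    Assumption: `intervals` is sorted by start time and does not overlap.
    """
    filled: List[Interval] = []
    cursor = 0.0  # end time of the last interval emitted so far
    for start, end, label in intervals:
        if start - cursor > eps:
            # Unlabeled gap before this interval: treat it as silence.
            filled.append((cursor, start, sil))
        # Empty or whitespace-only labels are also treated as silence.
        filled.append((start, end, label if label.strip() else sil))
        cursor = end
    if total_duration - cursor > eps:
        # Trailing silence up to the end of the audio.
        filled.append((cursor, total_duration, sil))
    return filled


if __name__ == "__main__":
    tier = [(0.5, 1.0, "a"), (1.0, 1.2, ""), (1.5, 2.0, "b")]
    for row in fill_silence(tier, total_duration=2.5):
        print(row)
```

Adjacent silence intervals are not merged in this sketch; whether to merge them, or to distinguish "sil" from "sp" by duration, depends on what the data loader expects.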

nampdn · 2022-05-01T04:15:34Z

Woot! I'm so grateful. I'll try it now.
Have a happy holiday!

nampdn · 2022-05-24T18:04:45Z

Hi @NTT123 ,
After pulling the latest fixes for sil, I still have a problem with some utterances that contain numbers.

('n', 'g', 'ư', 'ờ', 'i', ' ', 'đ', 'o', ' ', 'c', 'h', 'i', 'ề', 'u', ' ', 'r', 'ộ', 'n', 'g', ' ', 'c', 'ủ', 'a', ' ', 'l', 'ố', 'i', ' ', 'v', 'à', 'o', ' ', 'c', 'ổ', 'n', 'g', ' ', 'sil', 'l', 'à', ' ', 'n', 'ă', 'm', ' ', 'sil', '3', ' ', 'm', 'é', 't', ' ', 'sil', 'v', 'à', ' ', 'c', 'h', 'i', 'ề', 'u', ' ', 'd', 'à', 'i', ' ', 'l', 'à', ' ', 's', 'á', 'u', ' ', 'sil', '9', ' ', 'm', 'é', 't', ' ', 'sil')
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 181, in <module>
    train()
  File "/content/vietTTS/vietTTS/nat/acoustic_trainer.py", line 100, in train
    batch = next(train_data_iter)
  File "/content/vietTTS/vietTTS/nat/data_loader.py", line 111, in load_textgrid_wav
    ps = [phonemes.index(p) for p in ps]
  File "/content/vietTTS/vietTTS/nat/data_loader.py", line 111, in <listcomp>
    ps = [phonemes.index(p) for p in ps]
ValueError: '3' is not in list

Can you take a look at this sample? Can I add 0-9 to the phonemes list, or do I have to spell the numbers out as readable text?

NTT123 · 2022-05-25T01:01:38Z

You have to normalize the transcripts. For example, "3" should be converted to "ba".
This is the reason why numbers are not included in the phonemes list.
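
A minimal sketch of such a normalization step, assuming it is enough to read each digit on its own. The helper names normalize_digits and find_unknown_symbols are hypothetical and not part of vietTTS. Multi-digit numbers are spoken differently in Vietnamese than digit by digit, so real transcripts with such numbers need a fuller number-to-words rule set.

```python
import re

# Spoken Vietnamese form of each single digit.
DIGIT_WORDS = {
    "0": "không", "1": "một", "2": "hai", "3": "ba", "4": "bốn",
    "5": "năm", "6": "sáu", "7": "bảy", "8": "tám", "9": "chín",
}


def normalize_digits(text: str) -> str:
    """Replace each ASCII digit with its spoken word (digit-by-digit reading)."""

    def spell(match: "re.Match[str]") -> str:
        words = " ".join(DIGIT_WORDS[d] for d in match.group())
        return f" {words} "

    text = re.sub(r"[0-9]+", spell, text)
    # Collapse the extra whitespace introduced around the replacements.
    return " ".join(text.split())


def find_unknown_symbols(text: str, phonemes) -> list:
    """List the characters in `text` that are missing from `phonemes`."""
    return sorted({ch for ch in text if ch not in phonemes})


if __name__ == "__main__":
    print(normalize_digits("3 m"))
    # Toy symbol set for illustration only, not the project's phonemes list.
    print(find_unknown_symbols("a3 b9", ["a", "b", " "]))
```

Running find_unknown_symbols over every transcript before training surfaces all unsupported characters at once, instead of hitting a ValueError on the first one in the data loader.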

nampdn · 2022-05-25T01:04:11Z

Oh I got that point, cheers!

nampdn closed this as completed May 16, 2022

nampdn reopened this May 24, 2022

nampdn closed this as completed May 25, 2022
