Lexicon update #952

nikigre · 2020-02-22T10:20:17Z

Hi!
I successfully built a few voices using HMM and UnitSelection, but now I have a problem: a lot of words are mispronounced. I updated the SAMPA text and repeated steps 3 and 4 (https://github.com/marytts/marytts/wiki/New-Language-Support), then built marytts-lang-sl-5.1.2 and tried synthesising again, but the results are the same.
Do I need to rebuild the whole voice? Is this procedure still applicable?
Thank you for all your help :D

nikigre · 2020-03-21T15:52:25Z

Hi!
Does anyone have an idea on what to do?
Thank you

seblemaguer · 2020-03-21T17:12:48Z

Hello,

Have you tried generating only the XML file to see whether your changes are taken into account in the front end?

nikigre · 2020-03-22T11:47:07Z

Hi!
I tried again. I changed the pronunciation of the word "Avstrija". The first example is the output before the change and the second is the output after the change.

<?xml version="1.0" encoding="UTF-8"?><maryxml xmlns="http://mary.dfki.de/2002/MaryXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5" xml:lang="sl">
<p>
<s>
<prosody pitch="+5%" range="+20%">
<phrase>
<t accent="L+H*" g2p_method="lexicon" ph="s O s e: d_n n j E" pos="content">
Sosednje
<syllable ph="s O s e: d_n n j E">
<ph p="s"/>
<ph p="O"/>
<ph p="s"/>
<ph p="e:"/>
<ph p="d_n"/>
<ph p="n"/>
<ph p="j"/>
<ph p="E"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="d r Z a: v E" pos="content">
države
<syllable ph="d r Z a: v E">
<ph p="d"/>
<ph p="r"/>
<ph p="Z"/>
<ph p="a:"/>
<ph p="v"/>
<ph p="E"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="s o:" pos="content">
so
<syllable ph="s o:">
<ph p="s"/>
<ph p="o:"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="a v s t r i: j a" pos="content">
Avstrija
<syllable ph="a v s t r i: j a">
<ph p="a"/>
<ph p="v"/>
<ph p="s"/>
<ph p="t"/>
<ph p="r"/>
<ph p="i:"/>
<ph p="j"/>
<ph p="a"/>
</syllable>
</t>
<t pos="$PUNCT">
,
</t>
<boundary breakindex="4" tone="H-L%"/>
</phrase>
</prosody>
<prosody pitch="-5%" range="-20%">
<phrase>
<t accent="L+H*" g2p_method="lexicon" ph="m a dZ a: r s k a" pos="content">
Madžarska
<syllable ph="m a dZ a: r s k a">
<ph p="m"/>
<ph p="a"/>
<ph p="dZ"/>
<ph p="a:"/>
<ph p="r"/>
<ph p="s"/>
<ph p="k"/>
<ph p="a"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="i n" pos="content">
in
<syllable ph="i n">
<ph p="i"/>
<ph p="n"/>
</syllable>
</t>
<t accent="!H*" g2p_method="lexicon" ph="x r v a: S k a" pos="content">
Hrvaška
<syllable ph="x r v a: S k a">
<ph p="x"/>
<ph p="r"/>
<ph p="v"/>
<ph p="a:"/>
<ph p="S"/>
<ph p="k"/>
<ph p="a"/>
</syllable>
</t>
<t pos="$PUNCT">
.
</t>
<boundary breakindex="5" tone="L-L%"/>
</phrase>
</prosody>
</s>
</p>
</maryxml>

Second:

<?xml version="1.0" encoding="UTF-8"?><maryxml xmlns="http://mary.dfki.de/2002/MaryXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5" xml:lang="sl">
<p>
<s>
<prosody pitch="+5%" range="+20%">
<phrase>
<t accent="L+H*" g2p_method="lexicon" ph="s O s e: d_n n j E" pos="content">
Sosednje
<syllable ph="s O s e: d_n n j E">
<ph p="s"/>
<ph p="O"/>
<ph p="s"/>
<ph p="e:"/>
<ph p="d_n"/>
<ph p="n"/>
<ph p="j"/>
<ph p="E"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="d r Z a: v E" pos="content">
države
<syllable ph="d r Z a: v E">
<ph p="d"/>
<ph p="r"/>
<ph p="Z"/>
<ph p="a:"/>
<ph p="v"/>
<ph p="E"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="s o:" pos="content">
so
<syllable ph="s o:">
<ph p="s"/>
<ph p="o:"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="a v s t r i: j a" pos="content">
Avstrija
<syllable ph="a v s t r i: j a">
<ph p="a"/>
<ph p="v"/>
<ph p="s"/>
<ph p="t"/>
<ph p="r"/>
<ph p="i:"/>
<ph p="j"/>
<ph p="a"/>
</syllable>
</t>
<t pos="$PUNCT">
,
</t>
<boundary breakindex="4" tone="H-L%"/>
</phrase>
</prosody>
<prosody pitch="-5%" range="-20%">
<phrase>
<t accent="L+H*" g2p_method="lexicon" ph="m a dZ a: r s k a" pos="content">
Madžarska
<syllable ph="m a dZ a: r s k a">
<ph p="m"/>
<ph p="a"/>
<ph p="dZ"/>
<ph p="a:"/>
<ph p="r"/>
<ph p="s"/>
<ph p="k"/>
<ph p="a"/>
</syllable>
</t>
<t accent="L+H*" g2p_method="lexicon" ph="i n" pos="content">
in
<syllable ph="i n">
<ph p="i"/>
<ph p="n"/>
</syllable>
</t>
<t accent="!H*" g2p_method="lexicon" ph="x r v a: S k a" pos="content">
Hrvaška
<syllable ph="x r v a: S k a">
<ph p="x"/>
<ph p="r"/>
<ph p="v"/>
<ph p="a:"/>
<ph p="S"/>
<ph p="k"/>
<ph p="a"/>
</syllable>
</t>
<t pos="$PUNCT">
.
</t>
<boundary breakindex="5" tone="L-L%"/>
</phrase>
</prosody>
</s>
</p>
</maryxml>

So as you can see, the word "Avstrija" is the same...
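For longer outputs, comparing two MaryXML dumps by eye gets error-prone. Below is a minimal sketch of how the comparison could be automated, assuming both outputs are available as strings. The function names `token_pronunciations` and `changed_tokens` are made up for illustration and are not part of MaryTTS; only the namespace URI and the `t`/`ph` names come from the XML above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <maryxml> root element shown above.
MARYXML_NS = {"m": "http://mary.dfki.de/2002/MaryXML"}


def token_pronunciations(maryxml_text):
    """Map each token's text to its ph attribute.

    Tokens without a ph attribute (e.g. punctuation) are skipped.
    If a word occurs more than once, the last occurrence wins.
    """
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(maryxml_text.encode("utf-8"))
    result = {}
    for token in root.iterfind(".//m:t", MARYXML_NS):
        ph = token.get("ph")
        if ph is not None:
            # token.text is the text before the first child (<syllable>).
            result[(token.text or "").strip()] = ph
    return result


def changed_tokens(before_xml, after_xml):
    """Return {word: (old_ph, new_ph)} for tokens whose ph differs."""
    before = token_pronunciations(before_xml)
    after = token_pronunciations(after_xml)
    return {
        word: (before.get(word), after.get(word))
        for word in before.keys() | after.keys()
        if before.get(word) != after.get(word)
    }
```

An empty dictionary from `changed_tokens` means the front end produced identical pronunciations for both runs, which is the situation shown in the two outputs above.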

seblemaguer · 2020-03-22T19:11:47Z

Hello,

I don't see the word "tako" in your xml. So I am confused about your problem.

nikigre · 2020-03-22T19:45:27Z

@seblemaguer
I am so sorry!
I have now updated the comment, so it is about the word "Avstrija". The correct form should be a: v s t r i j a, not a v s t r i: j a.

psibre · 2020-03-23T06:39:01Z

As you can see from the token attribute g2p_method="lexicon", the pronunciation is taken from your lexicon. So if you want to fix that, you need to update that component. Note that you do not need to rebuild your voice all the time while you're working on the lexicon component. I recommend you run a local MaryTTS server with the lexicon and language components (but without your voice), then ensure your pronunciation is correct via GET requests to http://localhost:59125, with OUTPUT_TYPE=PHONEMES and LOCALE=sl. You can easily do this through your browser from http://localhost:59125/documentation.html (scroll to the bottom).
Note that you will have to restart the local MaryTTS server after changing the loaded components.

After you fix your lexicon, you can rebuild your voice.
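The same check can also be scripted instead of going through the browser. This is only a sketch, assuming a local server on port 59125: OUTPUT_TYPE=PHONEMES and LOCALE=sl are the values mentioned above, while the `/process` path and the INPUT_TEXT and INPUT_TYPE parameters are what the MaryTTS HTTP interface normally expects and should be verified against the documentation page of your own server. The helper names `phonemes_url` and `fetch_phonemes` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def phonemes_url(text, locale="sl", base="http://localhost:59125"):
    """Build the GET URL asking the server for the phonemes of `text`."""
    query = urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",        # assumption: plain-text input
        "OUTPUT_TYPE": "PHONEMES",
        "LOCALE": locale,
    })
    # Assumption: /process is the synthesis endpoint of the local server.
    return f"{base}/process?{query}"


def fetch_phonemes(text, locale="sl"):
    """Send the request and return the response body as a string."""
    with urlopen(phonemes_url(text, locale)) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(fetch_phonemes("Avstrija"))` should show the MaryXML for that word, so the `ph` attribute can be checked after every lexicon change and server restart.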

nikigre · 2020-03-26T11:25:06Z

Hi!
I did as you said. I removed my voice and tried with a GET request, but I got the same result, which mystified me.
So I went through the whole process again (changing a word in the lexicon, training it, rebuilding the NLP component) and now it looks like it works!
PS: One question about the steps MaryTTS takes to convert text to an audio file. Is there a defined order in which MaryTTS calls the functions that filter text, and where could I set that? I wrote a method that converts numbers to strings, but I don't know where to call it. I tried in "Tokenizer.java", but all values appear to be NULL.
@psibre and @seblemaguer Thank you both for your help :D

psibre added newlanguage question voicebuilding labels Mar 23, 2020

nikigre closed this as completed Jul 13, 2022
