Cmn voice not correctly translated #1370

Closed
hgneng opened this issue Sep 13, 2022 · 23 comments

@hgneng
Contributor

hgneng commented Sep 13, 2022

@jaacoppi

This is a follow-up to #1044.

I apologize for not testing carefully. I found these issues after @cary-rowen told me that there are still problems with the Chinese Mandarin voice.

Here is the current voice output:

$ espeak-ng -x -vcmn '此电脑 means this computer'
c'i3_| di11'a4n_| na11'o3_| (en)mi:11nz_| DI11s_| k@11mpju:11t311(cmn)_|

The following output from an older version is much better than the current one:

tsh'i[21_| t'iE51n_| n'Au21_| (en)mi:44nz_| DI11s_| k@11mpju:11t311(cmn)_|

Maybe this is why you left this comment: #1044 (comment)

Here is the revert code:
cmn_revert.patch.txt

In fact, it's a temporary solution for #1163.

I will compare the Cantonese and Mandarin rules to look for clues on how to make both Mandarin and English read correctly.

@hgneng
Contributor Author

hgneng commented Sep 15, 2022

I think the problem is that eSpeak does not handle diphthongs correctly in cmn. It inserts a low level tone 11 between the two vowels of a diphthong.

The current situation is:
$ espeak-ng -x -vcmn '此电脑'
c'i3_| di11'a4n_| na11'o3_|

It should be translated as:
c'i3_| d'ia4n_| n'ao3_|
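
A quick way to spot this symptom in `-x` output is to count the tone numbers in each syllable. The sketch below is only a heuristic on the printed phoneme text and is not part of eSpeak NG. The name `find_split_syllables` is made up for illustration, and it assumes pure Mandarin input in which syllables are separated by `_|` and each syllable should carry exactly one tone number.

```python
import re

def find_split_syllables(phoneme_text):
    """Return syllables that contain more than one run of tone digits."""
    suspicious = []
    for chunk in phoneme_text.split("_|"):
        syllable = chunk.strip()
        # Skip empty pieces and anything carrying a language switch like (en).
        if not syllable or "(" in syllable:
            continue
        # A well-formed Mandarin syllable should have a single tone number.
        if len(re.findall(r"\d+", syllable)) > 1:
            suspicious.append(syllable)
    return suspicious

print(find_split_syllables("c'i3_| di11'a4n_| na11'o3_|"))
print(find_split_syllables("c'i3_| d'ia4n_| n'ao3_|"))
```

The first call reports the two syllables with an inserted 11, the second reports nothing.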

@hgneng
Contributor Author

hgneng commented Sep 16, 2022

"此电脑" in the cmn_list is translated as "ci3 dian4 nao3" in Pinyin.

As I have mentioned above, the pronunciation of "此电脑" in the old version (i.e. 1.50) is correct: tsh'i[21_| t'iE51n_| n'Au21_|

The latest pronunciation of "此电脑" is not correct: c'i3_| di11'a4n_| na11'o3_|

So I tried making some changes to phsource/ph_cmn:

  1. add phoneme ia like iE
phoneme iE
  vwl  starttype #i  endtype #e
  length 250 
  FMT(vwl_zh/ie)
endphoneme

phoneme ia
  vwl  starttype #i  endtype #a
  length 250 
  FMT(vwl_zh/ie)
endphoneme
  2. add phoneme ao like Au
phoneme Au
  vwl starttype #a endtype #u
  length 250
  FMT(vwl_zh/aau)
endphoneme

phoneme ao
  vwl starttype #a endtype #o
  length 250
  FMT(vwl_zh/aau)
endphoneme
  3. replace phoneme i with i[
//phoneme i
//  vwl  starttype #i  endtype #i
//  length 250
//  IF nextPh(N) THEN
//    FMT(vowel/ii_2)
//  ENDIF
//  IF nextPh(n) THEN
//    FMT(vowel/ii_5)
//  ENDIF
//  FMT(vowel/i)
//endphoneme

phoneme i
  vwl  starttype #i  endtype #i
  length 250
  FMT(vowel/i#_7)
endphoneme

phoneme i[ //after ts tsh s 
  vwl  starttype #i  endtype #i
  length 250
  FMT(vowel/i#_7)
endphoneme

phoneme i. //after ts. ts.h s. z.
  vwl  starttype #i  endtype #i
  length 250
  FMT(vowel/i#_6)
endphoneme

After applying the changes and rebuilding, it sounds fine:

$ espeak-ng -x -vcmn '此电脑'
c'i3_| d'ia4n_| n'ao3_|

@jaacoppi
I need your help. Changes 1 and 2 above seem fine. Should I make similar changes to the other cmn phonemes?

Change 3 is probably not the right approach; I just don't understand how the rules work. How does eSpeak know when to use phoneme "i[" or "i." instead of "i"? I suppose this is not defined by the comment "//after ts tsh s" or "//after ts. ts.h s. z.". How do I write a rule for "i after c"?

Thank you very much!

@hgneng
Contributor Author

hgneng commented Sep 16, 2022

I see. In yue_rules, the rules are quite simple:

.group i
?1        i        _^_EN
?2        i        i
?2        iu       iu

In cmn_rules, the rules are complicated. The choice between "i", "i[" and "i." is defined in cmn_rules. After dictrules 1 is set in lang/sit/cmn, rules without the ?1 prefix are ignored, so cmn phoneme translation breaks.

.group i
?1     i        _^_EN
       i        i //i in ing
    z) i        i[
    c) i        i[
    s) i        i[
    h) i        i. //after zh ch sh
    r) i        i.
       ia       iA
       ia (DnK  iE
       ia (DngK iA
       iao      jAu
       ia1o     jAu55
       ia2o     jAu35
       ia3o     jAu214
       ia4o     jAu51
       ie       iE
       io (DngK y
    q) io (DngK u
       iu       iou

@jaacoppi
What do you think we should do about "ci3"? How do we write a rule for "i" after "c" in Pinyin?

@jaacoppi
Collaborator

@hgneng I speak very little Chinese. If you think it sounds better, then you should open a PR. Just remember to test both combinations: Chinese characters with English text, and Chinese characters with Pinyin text.

Syntax for rules is explained in docs/dictionary.md. For "i" after "c", use something like:
c) i

@hgneng
Contributor Author

hgneng commented Sep 19, 2022

@jaacoppi Thank you for replying.

Since the current cmn_rules is complicated, I think adding English rules on top of the Mandarin rules is easier than the reverse. I will try reverting cmn dictrules to 2 and fixing the English reading. The following test case seems to work. I will write more test cases and fix the other rules.

$ espeak-ng -x -vcmn '此电脑 means this computer 你好吗 ma is a horse'
tsh'i[21_| t'iE51n_| n'Au21_| (en)mi:44nz_| DI11s_| k@11mpju:11t311(cmn)_| n'i35_| X'Au21_| mA44_| (en)mA:11_| I11z_| a#11_| hO@11s(cmn)_|

@hgneng
Contributor Author

hgneng commented Sep 20, 2022

I have rewritten most of the cmn rules. A vowel is spoken as Mandarin only when it comes with a tone number. Otherwise, it is treated as English. This makes English words translate more correctly.

$ espeak-ng -x -vcmn '人之初,性本善。性相近,习相远。Men at their birth are naturally good. Their natures are much the same; their habits become widely different.'
z.'@35n_| ts.'i.55_| ts.h'u55_|
S;'i51N_| p'@21n_| s.'a51n_|
S;'i51N_| S;'iA55N_| tS;'i51n_|
S;'i35_| S;'iA55N_| 'y&214n_|
(en)m'E55n_| a55t_| De@55_| b'3:55T_| A@55_| n'a55tS@55r@L55i55_| g'U55d(cmn)_|
(en)De@55_| n'eI55tS355z_| A@55_| m'V55tS_| D@55_| s'eI55m(cmn)_|
(en)De@55_| h'a55bI55ts_| bI55k,V55m_| w'aI55dli55_| d'I55fr@55nt(cmn)_|

However, there is one more thing that can be improved. Every English vowel gets a tone 55 added. Cantonese has the same problem. I don't know how to fix it. English should sound more natural without the tone.
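
To illustrate the "Mandarin only with a tone number" idea: the following is a toy model in Python, not how cmn_rules is actually written. The helper name `classify` and the token-level regular expression are assumptions made for this sketch; the real rules work on letter groups inside the rules file.

```python
import re

# Assumption for this sketch: a Pinyin syllable is lowercase letters
# followed by exactly one tone digit from 1 to 5.
PINYIN_WITH_TONE = re.compile(r"[a-z]+[1-5]")

def classify(token):
    """Return "cmn" for toned Pinyin such as "ma3", otherwise "en"."""
    if PINYIN_WITH_TONE.fullmatch(token.lower()):
        return "cmn"
    return "en"

for token in ["ma3", "ma", "dian4", "computer"]:
    print(token, classify(token))
```

Under this model "ma3" is read as Mandarin while a bare "ma" falls through to English, which matches the 'ma is a horse' test case earlier in this thread.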

@jaacoppi
Collaborator

jaacoppi commented Sep 20, 2022 via email

@hgneng
Contributor Author

hgneng commented Sep 20, 2022

@ssb22
Dear Silas,
Have you been using espeak-ng in recent years? I found that the Mandarin extra dictionary no longer works.

For example, there is an item in cmn_listx:
(天 地) tian1di4

But espeak-ng still matches the default single-character items in cmn_list:

$ espeak-ng -X -vcmn '天地'
Replace: 天   tian1 
Translate 'tian1'
  1	t        [th]

  1	i        [_^_]
 57	ia (nL02 [iE]

  1	n        [n]

 22	1        [55]

Replace: 地   de5 
Translate 'de5'
  1	d        [t]

 22	d) e     [@]
  1	e        [_^_]
 21	e (L02   [o-]

 22	5        [11]

th'iE55n_| t@22_|

I picked another word at random, (长 大) zhang3da4, and it has the same issue.

The Cantonese extra dictionary yue_listx doesn't have this issue. espeak-ng can translate Cantonese words correctly.

This issue seems to have existed for a long time, at least since 1.50. Could you help confirm it? And do you have any idea why cmn_listx doesn't work?

I also found that there are seemingly meaningless items in cmn_listx. For example, there is an item (天 主) tian1zhu3, but 天 and 主 each have only one Pinyin reading, so there is no need for an item for 天主. I don't know when or why this happened.

@hgneng
Contributor Author

hgneng commented Sep 21, 2022

Single characters in cmn_listx are loaded correctly. For example, 㐀 does not exist in cmn_list, only in cmn_listx, and espeak-ng translates 㐀 correctly.

$ espeak-ng -x -vcmn '㐀'
tS;h'iou55_|

@hgneng
Contributor Author

hgneng commented Sep 21, 2022

I tested with eSpeak 1.47.11 and got the correct result:

$ ./espeak -X -vzh '天地'
Replace: 天 地   tian1di4 
Translate 'tian1di4'
  1	t        [th]

  1	i        [i]
 22	ia       [iA]
 65	ia (DnK  [iE]

  1	n        [n]

 22	1        [55]

  1	d        [t]

  1	i        [i]

 22	4        [51]

 thiE55nt'i51_|

eSpeak 1.47 with Chinese dictionaries can be downloaded from eSpeak-Chinese: https://sourceforge.net/projects/e-guidedog/files/eSpeak-Chinese/1.47.11/

(I have not maintained eSpeak-Chinese since 2013.)

It seems that the problem lies in the "Replace" stage.

@hgneng
Contributor Author

hgneng commented Sep 22, 2022

Cantonese takes a different route, "Found", rather than Mandarin's "Replace".

$ espeak-ng -X -vyue '长江'
Found: '长' [coeng4]   $text
Found: '江' [gong1]   $text
c'oeng4_| g'ong1_|
$ espeak-ng -X -vyue '长大'
Found: '长 大
' [zoeng2daai6]   $text
zoeng2d'aai6_|

@hgneng
Contributor Author

hgneng commented Sep 22, 2022

The following change fixes the Mandarin dictionary word matching issue:

diff --git a/src/libespeak-ng/tr_languages.c b/src/libespeak-ng/tr_languages.c
index 610676bc..a2050b8a 100644
--- a/src/libespeak-ng/tr_languages.c
+++ b/src/libespeak-ng/tr_languages.c
@@ -1590,8 +1590,8 @@ Translator *SelectTranslator(const char *name)
                tr->langopts.our_alphabet = 0x3100;
                tr->langopts.word_gap = 0x21; // length of a final vowel is less dependent on the next consonant, don't merge consonant with next word
                tr->langopts.textmode = true;
+               tr->langopts.listx = 1; // compile *_listx after *_list
                if (name2 == L3('y', 'u', 'e')) {
-                       tr->langopts.listx = 1; // compile zh_listx after zh_list
                        tr->langopts.numbers = NUM_DEFAULT;
                        tr->langopts.numbers2 = NUM2_ZERO_TENS;
                        tr->langopts.break_numbers = BREAK_INDIVIDUAL;

However, the second character of the word is then read twice. It seems that the string pointer doesn't advance correctly for Mandarin words. I haven't found the cause yet.

$ espeak-ng -X -vcmn '天地'
Replace: 天 地   tian1di4 
Translate 'tian1di4'
  1	t        [th]

  1	i        [_^_]
 57	ia (nL02 [iE]

  1	n        [n]

 22	1        [55]

  1	d        [t]

  1	i        [_^_]
 21	i (L02   [i]

 22	4        [51]

Replace: 地   de5 
Translate 'de5'
  1	d        [t]

 22	d) e     [@]
  1	e        [_^_]
 21	e (L02   [o-]

 22	5        [11]

thiE55nt'i51_| t@11_|

@jaacoppi
Collaborator

jaacoppi commented Sep 22, 2022 via email

@hgneng
Contributor Author

hgneng commented Sep 22, 2022

@jaacoppi
After a whole day of trying, here is my fix for the Mandarin word issue. Do you think it's reasonable? Is there a test script in espeak-ng that I can run to confirm that this patch doesn't break other functions?

$ git diff src/libespeak-ng/translate.c
diff --git a/src/libespeak-ng/translate.c b/src/libespeak-ng/translate.c
index 8759b48c..ca60cc84 100644
--- a/src/libespeak-ng/translate.c
+++ b/src/libespeak-ng/translate.c
@@ -164,6 +164,10 @@ int TranslateWord(Translator *tr, char *word_start, WORD_TAB *wtab, char *word_o
                                wtab->flags &= ~FLAG_FIRST_UPPER;
                        }
 
+                       // dictionary_skipwords is a global variable and TranslateWord3 will reset it to 0 at the beginning.
+                       // However, dictionary_skipwords value is still needed outside this scope.
+                       // So we backup and restore it at the end of this scope.
+                       int skipwords = dictionary_skipwords;
                        TranslateWord3(tr, word_out, wtab, NULL, &any_stressed_words, current_alphabet, word_phonemes, sizeof(word_phonemes));
 
                        int n;
@@ -182,6 +186,7 @@ int TranslateWord(Translator *tr, char *word_start, WORD_TAB *wtab, char *word_o
                                while (!isspace(*word_out)) ++word_out;
                                while (isspace(*word_out))  ++word_out;
                        }
+                       dictionary_skipwords = skipwords;
                }
 
                // If the list file contains a text replacement to another
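
Why the second character was read twice can be pictured with a small Python model. This is a sketch under assumptions, not the real control flow of translate.c: `State`, `lookup`, `nested_translate` and `translate` are hypothetical names, and the two tiny dictionaries stand in for cmn_listx and cmn_list. The only idea taken from the patch is that a nested call resets a shared counter which the caller still needs afterwards.

```python
LISTX = {("天", "地"): "tian1di4"}   # multi-character entry
LIST = {"天": "tian1", "地": "de5"}  # per-character defaults

class State:
    """Holds the shared counter (plays the role of dictionary_skipwords)."""
    def __init__(self):
        self.skipwords = 0

def lookup(chars, i, state):
    """Prefer a two-character entry and record the extra characters used."""
    pair = tuple(chars[i:i + 2])
    if pair in LISTX:
        state.skipwords = 1
        return LISTX[pair]
    state.skipwords = 0
    return LIST[chars[i]]

def nested_translate(text, state):
    """Stand-in for the nested call that resets the shared counter."""
    state.skipwords = 0
    return text

def translate(chars, restore):
    state = State()
    out = []
    i = 0
    while i < len(chars):
        replacement = lookup(chars, i, state)
        saved = state.skipwords          # back up before the nested call
        out.append(nested_translate(replacement, state))
        if restore:
            state.skipwords = saved      # restore, as the patch does
        i += 1 + state.skipwords         # the caller still needs the counter
    return out

print(translate(list("天地"), restore=False))  # ['tian1di4', 'de5']
print(translate(list("天地"), restore=True))   # ['tian1di4']
```

Without the restore step the model reproduces the symptom above: the word entry is used, and then 地 is translated again on its own.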

@jaacoppi
Collaborator

jaacoppi commented Sep 22, 2022 via email

@hgneng
Contributor Author

hgneng commented Sep 23, 2022

make check passes on both my branch and the latest espeak-ng master.

I will do some more testing and commit later.

@ssb22
Contributor

ssb22 commented Sep 23, 2022

Sorry I'm away from my computer and haven't been able to look into this properly. I have not used espeak-ng myself.
Gradint still ships with original eSpeak because Jonathan didn't mind me bundling OALD pronunciation data with the English voice in the Mac and Windows installers. This OALD data is "non-commercial use only", which is not GPL compatible, so I wouldn't be able to make a Gradint bundle with OALD preloaded if I didn't have special permission (and I assume Jonathan's separate permission applies only to original eSpeak and not to eSpeak-NG). It is entirely possible that at some point eSpeak-NG accidentally broke the Mandarin dictionary parser and I didn't notice because I've not been testing every new espeak-ng release. This may need an investigation into the git history.

@ssb22
Contributor

ssb22 commented Sep 23, 2022

Regarding unnecessary items in zh_listx (now cmn_listx), my script was programmed not to include words which would have been correctly pronounced anyway, but it also looked out for situations like: if there exists a 2-character word AB which needs no override, but there also exists a word BC which requires C to be overridden, and yet in the 3-character combination ABC the C should not be overridden because it's AB + C rather than A + BC, then do include the "unnecessary" word AB in the dictionary just to prevent eSpeak from using the overridden version of BC if there's an A before it. I need to get back to my computer before I can check the specific reason why 天主 is included, but if my script did it then it will be because there exists another word starting with 主 with an override which should have lower priority than 天主. But it is also possible that the script was faulty or the word was put in at another time. I should probably redo that script anyway and double-check the licensing of all sources it uses. It's on an old recordable CD in our loft when we get back to Cambridge.
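
The AB / BC / ABC situation can be sketched in Python. This assumes the dictionary is matched greedily from left to right with the longest entry winning, which is what the reasoning above relies on; `segment` is a hypothetical helper and the letters A, B, C are the same placeholders as in the explanation, not real dictionary entries.

```python
def segment(text, entries, max_len=2):
    """Greedy left-to-right segmentation, longest dictionary entry first."""
    out = []
    i = 0
    while i < len(text):
        for length in range(max_len, 0, -1):
            piece = text[i:i + length]
            # A single character always matches as a fallback.
            if length == 1 or piece in entries:
                out.append(piece)
                i += length
                break
    return out

# Only BC is listed: the override for C is wrongly applied inside ABC.
print(segment("ABC", {"BC"}))        # ['A', 'BC']
# Listing the "unnecessary" AB makes it win first, so C keeps its default.
print(segment("ABC", {"AB", "BC"}))  # ['AB', 'C']
```

The AB entry carries no pronunciation change of its own; it only claims B before BC can.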

@ssb22
Contributor

ssb22 commented Sep 23, 2022

Oh and I also seem to remember putting in a rule about words ending with third tone to help with sandhi.

Beginners are taught "if there are two third-tones together, then the first one becomes a second tone" and "if there are three third-tones together, then the middle one becomes a second tone". Generalisation: if there are N third-tones together, then syllable number N-2x-1 (counting from 1, for x = 0, 1, 2, ... as long as the number stays positive) becomes second tone. But, crucially, this rule stops at word boundaries. So for example 给你好消息 is read with gei2 ni3 hao3, not gei3 ni2 hao3, because 给你 acts as one word and 好消息 acts as another, and the sandhi inside each is treated separately.

I have verified this by asking multiple native speakers to record various phrases involving 3 or more tone-3 sequences and not telling them why I was asking before they recorded, to see how they naturally read it. (Some of the speakers did more complex things, like using a high tone-2 for the first and a low tone-2 for the second, but I assumed that was a regional variation and it would require a lower-level code change to implement, plus we wouldn't be able to do it in Ekho without Praat-processing all the tone 2s into high tone 2s, so I only focused on the recordings that used a normal tone 2 to replace tone 3 in some places.)

I implemented this in Gradint (sort_out_pinyin_3rd_tones in synth.py) and relied on eSpeak's dictionary having words that end with third tone. If eSpeak itself won't implement this then I guess the correct solution would have been to ship a custom zh_listx with Gradint instead of putting extra entries into upstream eSpeak, sorry.
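
A minimal Python sketch of the generalised rule, applied one word at a time. This is not Gradint's sort_out_pinyin_3rd_tones and not eSpeak's code; the function names and the (syllable, tone) data layout are assumptions for illustration, and the tones given for 消息 are my own annotation.

```python
def sandhi_in_word(word):
    """word is a list of (syllable, tone) pairs; returns a new list."""
    tones = [tone for _, tone in word]
    result = list(tones)
    n = len(tones)
    i = 0
    while i < n:
        if tones[i] != 3:
            i += 1
            continue
        # Find the end of this run of consecutive third tones.
        j = i
        while j < n and tones[j] == 3:
            j += 1
        # The last syllable keeps tone 3; every second one before it
        # becomes tone 2 (positions N-1, N-3, ... counting from 1).
        k = j - 2
        while k >= i:
            result[k] = 2
            k -= 2
        i = j
    return [(syl, tone) for (syl, _), tone in zip(word, result)]

def sandhi_in_phrase(words):
    """Apply the rule inside each word only, never across a boundary."""
    out = []
    for word in words:
        out.extend(sandhi_in_word(word))
    return out

phrase = [[("gei", 3), ("ni", 3)], [("hao", 3), ("xiao", 1), ("xi", 5)]]
print(sandhi_in_phrase(phrase))
```

With the word boundary respected this gives gei2 ni3 hao3. Feeding gei3 ni3 hao3 in as a single word would give gei3 ni2 hao3 instead, which is the reading described above as wrong.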

@hgneng
Contributor Author

hgneng commented Sep 23, 2022

I have compared the code of espeak and espeak-ng. The dictionary bug was probably introduced when the TranslateWord3 function was restructured. I have made a patch for it and it seems to work.

I'm afraid I never thought the third-tone rules would be so complicated. I am still at beginner level :(

I see. (天 主) tian1zhu3 may have been added to prevent 主 from being pronounced zhu2 by some third-tone rule.

I found another interesting item: (主 体) zhu3ti3. 主体 should be read as zhu2ti3. Why does such an item exist in cmn_listx? Even more interesting, espeak-ng correctly reads 主体 as zhu2ti3. It seems that a third-tone rule is applied somewhere in espeak-ng.

@jaacoppi
Collaborator

@hgneng can you make a PR of all your current changes? I think they could be merged already.

For 主 体, check out intonation.c and especially this code block:

// Mandarin
if (tr->translator_name == L('z', 'h') || tr->translator_name == L3('c', 'm', 'n')) {

hgneng added a commit to hgneng/espeak-ng that referenced this issue Sep 26, 2022
1. Rewrite cmn_rules. Vowel will be spoken as Mandarin only when it's with a tone number. Otherwise, it will be regarded as English. This will make English words translated more correctly.
2. Fix issue of word item in cmn_listx not taking effect.
3. Fix dictionary_skipwords bug. It should be injected when re-structing method TranslateWord3.
@hgneng
Contributor Author

hgneng commented Sep 28, 2022

@jaacoppi I have made a PR. Could you please review it?

@jaacoppi
Collaborator

jaacoppi commented Sep 28, 2022 via email


3 participants