Unable to compile zhy dictionary on Windows #933

feerrenrut · 2021-05-04T03:56:31Z

The NVDA project has had an issue open for a while. We'd like to ask for some assistance to identify the issue. This process works for all other languages.

Our build script for espeak dictionaries:

Calls espeak_Initialize
Constructs an espeak_VOICE struct (see struct definition below) with language set to zhy\0
Calls espeak_SetVoiceByProperties which returns 2 (ENS_VOICE_NOT_FOUND)

Struct:

import ctypes

# ctypes mirror of the espeak_VOICE struct passed to espeak_SetVoiceByProperties.
class espeak_VOICE(ctypes.Structure):
	_fields_=[
		('name',ctypes.c_char_p),
		('languages',ctypes.c_char_p),  # language code to request, e.g. zhy
		('identifier',ctypes.c_char_p),
		('gender',ctypes.c_byte),
		('age',ctypes.c_byte),
		('variant',ctypes.c_byte),
		('xx1',ctypes.c_byte),
		('score',ctypes.c_int),
		('spare',ctypes.c_void_p),
	]
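
For readers who want to reproduce the failing step, here is a minimal sketch of the voice lookup. It is not NVDA's build script: `try_set_voice` is a hypothetical helper, `lib` is assumed to be an already loaded espeak library (for example via `ctypes.CDLL`) on which `espeak_Initialize` has already been called, and the value 2 is simply the return code reported above.

```python
import ctypes


class espeak_VOICE(ctypes.Structure):
    # Same layout as the struct quoted in this issue.
    _fields_ = [
        ("name", ctypes.c_char_p),
        ("languages", ctypes.c_char_p),
        ("identifier", ctypes.c_char_p),
        ("gender", ctypes.c_byte),
        ("age", ctypes.c_byte),
        ("variant", ctypes.c_byte),
        ("xx1", ctypes.c_byte),
        ("score", ctypes.c_int),
        ("spare", ctypes.c_void_p),
    ]


VOICE_NOT_FOUND = 2  # return value reported in this issue for language "zhy"


def try_set_voice(lib, language):
    """Ask `lib` (assumed: a loaded, initialized espeak library) for a voice.

    Only the `languages` field is filled in; every other field stays zeroed.
    Returns whatever espeak_SetVoiceByProperties returns.
    """
    voice = espeak_VOICE()
    voice.languages = language.encode("ascii")  # ctypes appends the trailing NUL
    return lib.espeak_SetVoiceByProperties(ctypes.pointer(voice))
```

With a real library, the thread suggests that "zhy" yields 2 while "yue" and "cmn" are the names to request.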

Thanks!

jaacoppi · 2021-05-04T06:03:12Z

Short answer: Try "yue" for Cantonese. Note that Cantonese support is very basic. There are open issues about how to deal with mixed Chinese characters and Latin letters. We welcome all contributions to both Cantonese and Mandarin.

Long answer: Confusingly, Mandarin and Cantonese have multiple names across the codebase. docs/languages.md and espeak-ng --voices list them as:
- "yue" for Cantonese
- "cmn" for Mandarin

When calling espeak-ng, both "cmn" and "zh" are accepted for Mandarin. Even more strangely, the files related to Cantonese are in dictsource/zhy_* and those for Mandarin in zh_*. This doesn't make any sense to me.

The history behind these names can probably be found in git log. I suspect it has something to do with the BCP 47 classification standard mentioned in the documentation. I think the naming convention should be clarified.

Hope this helps.

rhdunn · 2021-05-05T07:38:43Z

Yes, that's correct. The zhy name is not a valid BCP 47 name. The IANA language subtag registry (based on ISO 639-* for language codes) lists yue for Cantonese and cmn for Mandarin. Other voices have a similar change as well.

In the voice files, the old espeak names are listed as options for compatibility reasons.

The code still refers to them by the old names because it hasn't been refactored to align with the changes. The naming of the phoneme files is not consistent either (espeak used the language names, e.g. ph_dutch, but not for Cantonese and Mandarin for some reason). I just hadn't got around to addressing it, as other things like emoji support were higher priority and I got burned out after that.

feerrenrut · 2021-05-06T09:42:56Z

Thanks for the explanations. We are currently splitting the "zhy_rules" to get the language to switch to. I'll add an exception for that language.
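
The filename-splitting approach with an exception for zhy could look roughly like the sketch below. It is an illustration, not NVDA's actual build code: `voice_for_rules_file` and `VOICE_OVERRIDES` are hypothetical names, and the single override is the one discussed in this thread (zhy_* files need the yue voice).

```python
import os

# Hypothetical override table: rules-file prefix -> voice language to request.
# Only the case discussed in this thread is listed.
VOICE_OVERRIDES = {"zhy": "yue"}


def voice_for_rules_file(path):
    """Derive the voice language from a '<prefix>_rules' filename."""
    name = os.path.basename(path)
    prefix, sep, suffix = name.rpartition("_")
    if not sep or suffix != "rules" or not prefix:
        raise ValueError("not a *_rules file: %r" % name)
    # Most prefixes are already valid voice languages; zhy is the exception.
    return VOICE_OVERRIDES.get(prefix, prefix)
```

The override table keeps the exception visible in one place, so it is easy to delete once the files are renamed upstream.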

jaacoppi · 2021-05-06T10:45:43Z

Would it be easier for NVDA if there was yue_dict and cmn_dict instead of current zhy_dict and zh_dict? The change for us should be easy since it's mostly about renaming files, not about changing code.

feerrenrut · 2021-05-06T10:54:34Z

I misspoke in my last comment (now edited), we are deriving the language codes from the *_rules files not from *_dict files. But if the same offer applies to those files, that would also simplify this.

I have a potential workaround for this in nvaccess/nvda#12370. However, perhaps we are only creating one dictionary when we should be creating two?

jaacoppi · 2021-05-06T18:50:34Z

I'll try to rename the files to yue_* and cmn_* this weekend or next week. It looks like it's a bit more complicated than I thought, because changes are needed in the Android and Windows project files and in the language configuration files. The dictionary files for Cantonese and Mandarin are different, so you'll have to create both.

feerrenrut · 2021-05-07T07:36:50Z

Ok, thanks @jaacoppi. We might go ahead with the workaround for now, but it would be good to be able to remove it in the future. On the other hand, perhaps we should have an explicit listing of the voices to use with language rules to produce dictionaries.

feerrenrut · 2021-05-07T10:33:42Z

So to confirm, I should compile the dictionaries using voice lang yue with zhy rules as well as using voice lang cmn with zhy rules to produce two dictionaries. Doing this seems to produce zhy_dict and zh_dict. If that is the case, it might be better for us just to have an explicit mapping rather than iterate over the files.

jaacoppi · 2021-05-07T11:09:24Z

> So to confirm, I should compile the dictionaries using voice lang yue with zhy rules as well as using voice lang cmn with zhy rules to produce two dictionaries. Doing this seems to produce zhy_dict and zh_dict. If that is the case, it might be better for us just to have an explicit mapping rather than iterate over the files.

Yes, except for the typo (cmn uses zh rules, not zhy). To say this in another way, at the moment:
1) "espeak-ng --compile=yue" uses zhy_rules (and other zhy_* files) to produce zhy_dict (Cantonese)
2) both "espeak-ng --compile=cmn" and "espeak-ng --compile=zh" use zh_rules (and other zh_* files) to produce zh_dict (Mandarin)

Once I'm done with the restructuring:
1) --compile=zh will be deprecated (so make sure to start using --compile=cmn for Mandarin to avoid problems in the future)
2) zhy_* files will be renamed yue_* (you'll need to rename them in your build scripts)
3) zh_* files will be renamed cmn_* (you'll need to rename them in your build scripts)
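
The mapping described here can be written down as an explicit table. The sketch below is only an illustration: `DictBuild`, `BEFORE_RENAME`, `AFTER_RENAME` and `compile_command` are hypothetical names, and the post-rename output names (yue_dict, cmn_dict) are an assumption based on the renaming proposed earlier in this thread. The `--compile=<voice>` option is the one quoted in this comment.

```python
from typing import NamedTuple


class DictBuild(NamedTuple):
    voice: str          # value passed to --compile
    source_prefix: str  # prefix of the *_rules / *_list source files
    output: str         # dictionary file that gets produced


# State described in this comment, before the restructuring.
BEFORE_RENAME = [
    DictBuild("yue", "zhy", "zhy_dict"),  # Cantonese
    DictBuild("cmn", "zh", "zh_dict"),    # Mandarin
]

# Assumed state after the restructuring (file names follow the voice).
AFTER_RENAME = [
    DictBuild("yue", "yue", "yue_dict"),
    DictBuild("cmn", "cmn", "cmn_dict"),
]


def compile_command(build):
    """Command line that compiles the dictionary for one voice."""
    return ["espeak-ng", "--compile=%s" % build.voice]
```

Note that the voice names are identical in both tables; only the source prefix and the output name change, which is why a build script keyed on voice names is less affected by the rename.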

jaacoppi · 2021-05-08T13:30:03Z

I've refactored zhy to yue without problems. Two questions for @rhdunn before I push the changes:

Are the extended dictionaries and original espeak compatibility still relevant? Do we keep them, or can we simplify and just use one _list file per language, as for most languages? See commit f672211 to refresh your memory.
The codebase has instances of "zh" (like the switch case in tr_languages.c and the language tags in espeak-ng-data/lang/sit/cmn and espeak-ng-data/lang/sit/zhy). There's "language zh-cmn", "language zh 8" and so on. Can I get rid of everything that mentions zh so that the code explicitly uses either Mandarin or Cantonese, not generic Chinese?

rhdunn · 2021-05-08T14:49:41Z

The extended dictionaries are still relevant. When enabled, they add a lot of entries generated from a dictionary (which I believe is from the Unicode Unihan database, but I'm not 100% sure on that). This allows distributions that want to save space to ignore the listx files. It also keeps the generated lists separate from the custom exception lists. Ideally, the listx files should be autogenerated from the Unihan list, but I haven't figured out how to do that yet, which version was used for the original list (to compare when using a script to generate the list), or what changes (if any) were made to that process.

Espeak compatibility is important where possible, especially around the use of espeak. Keeping the old names in the lang files allows users/applications that have e.g. zh set as their TTS voice for Orca/speech-dispatcher/etc. to still work when using espeak-ng. Likewise if/when they are using the espeak API. Therefore, the language zh 8, etc. should stay, but changing the dictionary/language file names should be OK as users don't directly interact with those.

feerrenrut · 2021-05-12T05:06:31Z

Thanks for the explanation @jaacoppi and @rhdunn. Could you link the PR / change to this issue when possible? I think it would be handy to confirm that the espeak mechanism for compiling the dictionaries matches our usage of the espeak DLL in NVDA.

jaacoppi · 2021-05-16T13:21:01Z

@feerrenrut: The renaming has now been done in #940.

I'll also point out the dictrules setting for Cantonese. It's in the file espeak-ng-data/lang/sit/yue.
dictrules 1 means Latin characters are presumed to be English
dictrules 2 means Latin characters are presumed to be Jyutping

You might want to make two versions of the voice for different use cases.

Mandarin doesn't have such a choice yet. There's an open issue at #347. It seems I've forgotten it.

At the moment there's no way to easily set another default language than English.

jaacoppi · 2021-05-22T13:07:30Z

@feerrenrut: Can this be closed?

feerrenrut · 2021-05-27T06:12:43Z

Yes, thanks @jaacoppi. We are updating NVDA to make use of this in nvaccess/nvda#12370.

…12370) Compiling the zhy dictionary has failed for a long time; it was excluded because the cause was unknown. It was suspected that there was an error in the format of the files.

Looking into this, I found the issue was caused by trying to set the voice to "zhy" by calling espeak_SetVoiceByProperties. The result was 2 (ENS_VOICE_NOT_FOUND). Compilation was based on using glob to find the *_rules files, and splitting the filename to get the language to use for the voice. Espeak-ng has renamed zhy and zh files to match the language code that should be used: yue and cmn for Cantonese and Mandarin respectively. See espeak-ng/espeak-ng#933

Description of how this pull request fixes the issue: This change makes the compilation of espeak dictionaries explicit. An explicit listing of the dictionaries NVDA expects (rather than using glob) allows us to be aware of the introduction or removal of languages.
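
One way an explicit listing can surface added or removed languages is to compare it against the *_rules files actually present. The sketch below is a guess at the idea, not the code from the pull request: `check_rules_files` is a hypothetical helper and the expected set is supplied by the caller.

```python
from pathlib import Path

RULES_SUFFIX = "_rules"


def check_rules_files(dictsource, expected):
    """Compare the *_rules files in `dictsource` with an explicit listing.

    Returns (unexpected, missing): prefixes found on disk but not listed,
    and prefixes listed but not found on disk.
    """
    found = {
        p.name[: -len(RULES_SUFFIX)]
        for p in Path(dictsource).glob("*" + RULES_SUFFIX)
    }
    expected = set(expected)
    return found - expected, expected - found
```

A build could fail when either set is non-empty, so that a rename such as zhy to yue is noticed at build time instead of silently producing a different set of dictionaries.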

feerrenrut mentioned this issue May 6, 2021

Fix compilation of espeak dictionary for language zhy nvaccess/nvda#12370

Merged

7 tasks

jaacoppi added a commit to jaacoppi/espeak-ng that referenced this issue May 12, 2021

Rename zhy (Cantonese) to yue across the codebase.

875174c

See discussion in espeak-ng#933.

jaacoppi added a commit to jaacoppi/espeak-ng that referenced this issue May 12, 2021

Rename zh (Mandarin) to cmn across the codebase.

d91b7cf

See discussion in espeak-ng#933.

jaacoppi mentioned this issue May 12, 2021

Rename zh and zhy to cmn and yue #940

Merged

jaacoppi mentioned this issue May 16, 2021

cmn: handle latin characters as English text. #943

Merged

feerrenrut closed this as completed May 27, 2021

feerrenrut mentioned this issue Jun 1, 2021

Downgrade espeak to aafd2e720 nvaccess/nvda#12495

Merged

8 tasks
