Unable to compile zhy dictionary on Windows #933
Short answer: Try "yue" for Cantonese. Note that Cantonese support is very basic. There are open issues about how to deal with mixed Chinese characters and Latin letters. We welcome all contributions to both Cantonese and Mandarin.
Long answer:
Confusingly, Mandarin and Cantonese have multiple names across the codebase.
docs/languages.md and espeak-ng --voices list them as:
- "yue" for Cantonese
- "cmn" for Mandarin
When calling espeak-ng, both "cmn" and "zh" are accepted for Mandarin. Even more strangely, the files related to Cantonese are in dictsource/zhy_* and those for Mandarin are in zh_*. This doesn't make any sense to me.
The history behind these names can probably be found in git log. I suspect it has something to do with the BCP-47 classification standard mentioned in the documentation. I think the naming convention should be clarified.
Hope this helps.
Yes, that's correct. In the voice files, the old espeak names are listed as options for compatibility reasons. The code still refers to them by the old names because they haven't been refactored to align with the changes. The naming of the phoneme files is not consistent either (espeak used the language names, e.g. ph_dutch, but not for Cantonese and Mandarin for some reason). I just hadn't got around to addressing it, as other things like emoji support were higher priority and I got burned out after that.
Thanks for the explanations. We are currently splitting the "zhy_rules" to get the language to switch to. I'll add an exception for that language.
Would it be easier for NVDA if there was yue_dict and cmn_dict instead of the current zhy_dict and zh_dict?
The change for us should be easy since it's mostly about renaming files, not about changing code.
I misspoke in my last comment (now edited), we are deriving the language codes from the I have a potential work-around for this nvaccess/nvda#12370 however perhaps we are only creating one dictionary when we should be creating two?
I'll try to rename yue_ and cmn_ files this weekend or next week. Looks like it's a bit more complicated than I thought because changes are needed in Android and Windows project files and in the language configuration files.
The dictionary files for Cantonese and Mandarin are different, so you'll have to create both.
Ok, thanks @jaacoppi. We might go ahead with the workaround for now, but it would be good to be able to remove it in the future. On the other hand, perhaps we should have an explicit listing of the voices to use with language rules to produce dictionaries.
So to confirm, I should compile the dictionaries using voice lang yue with zhy rules as well as using voice lang cmn with zhy rules to produce two dictionaries. Doing this seems to produce zhy_dict and zh_dict. If that is the case, it might be better for us just to have an explicit mapping rather than iterate over the files.
Yes, except for the typo (cmn uses zh rules, not zhy).
To say this in another way, at the moment:
1) "espeak-ng --compile=yue" uses zhy_rules (and other zhy_* files) to produce zhy_dict (Cantonese)
2) both "espeak-ng --compile=cmn" and "espeak-ng --compile=zh" use zh_rules (and other zh_* files) to produce zh_dict (Mandarin)
Once I'm done with the restructuring:
1) --compile=zh will be deprecated (so make sure to start using --compile=cmn for Mandarin to avoid problems in the future)
2) zhy_* files will be renamed yue_* (you'll need to rename them in your build scripts)
3) zh_* files will be renamed cmn_* (you'll need to rename them in your build scripts)
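The mapping described above can be written out explicitly. The following is a minimal Python sketch, not code from NVDA or espeak-ng: the names `DictBuild`, `CURRENT`, `AFTER_RENAME`, and `compile_commands` are hypothetical, and the post-rename output names (yue_dict, cmn_dict) are an assumption based on the renaming plan and the earlier yue_dict/cmn_dict suggestion.

```python
from typing import NamedTuple


class DictBuild(NamedTuple):
    """One dictionary build: the --compile language and the files involved."""
    compile_lang: str   # value passed to `espeak-ng --compile=<lang>`
    source_prefix: str  # prefix of the dictsource files, e.g. "zhy" for zhy_rules
    language: str       # human-readable language name

    @property
    def rules_file(self) -> str:
        return f"{self.source_prefix}_rules"

    @property
    def dict_file(self) -> str:
        return f"{self.source_prefix}_dict"


# State described in the thread before the restructuring.
CURRENT = [
    DictBuild("yue", "zhy", "Cantonese"),
    DictBuild("cmn", "zh", "Mandarin"),
]

# Planned state (assumed): file prefixes match the language codes.
AFTER_RENAME = [
    DictBuild("yue", "yue", "Cantonese"),
    DictBuild("cmn", "cmn", "Mandarin"),
]


def compile_commands(builds):
    """Return the command lines a build script would run, one per dictionary."""
    return [f"espeak-ng --compile={b.compile_lang}" for b in builds]


if __name__ == "__main__":
    for label, builds in (("current", CURRENT), ("after rename", AFTER_RENAME)):
        for b in builds:
            print(f"{label}: --compile={b.compile_lang} "
                  f"reads {b.rules_file}, writes {b.dict_file}")
```

With a table like this, a build script only needs to swap `CURRENT` for `AFTER_RENAME` once the rename lands; the `--compile` arguments stay the same.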
I've refactored zhy to yue without problems. Two questions for @rhdunn before I push the changes:
The extended dictionaries are still relevant. When enabled, they add a lot of entries generated from a dictionary (which I believe is from the Unicode Unihan database, but I'm not 100% sure on that). This allows distributions that want to save space to ignore the listx files. It also keeps the generated lists separate from the custom exception lists.
Ideally, the listx files should be autogenerated from the Unihan list, but I haven't figured out how to do that yet, what version was used for the original list (to compare when using a script to generate the list), and what changes (if any) were made to that process.
Ideally, espeak compatibility where possible is important, especially around the use of espeak. So keeping the old names in the lang files allows users/applications that have e.g. zh set as their TTS voice for orca/speech-dispatcher/etc. to still work when using espeak-ng. Likewise if/when they are using the espeak API. Therefore, the
See discussion in espeak-ng#933.
@feerrenrut: The renaming has now been done in #940. I'll also point out the dictrules setting for Cantonese. It's in the file espeak-ng-data/lang/sit/yue. You might want to make two versions of the voice for different use cases. Mandarin doesn't have such a choice yet. There's an open issue at #347. It seems I've forgotten it. At the moment there's no way to easily set another default language than English.
@feerrenrut: Can this be closed?
Yes, thanks @jaacoppi. We are updating NVDA to make use of this in nvaccess/nvda#12370
…12370)
Compiling the zhy dictionary has failed for a long time; it was excluded because the cause was unknown. It was suspected that there was an error in the format of the files.
Looking into this, I found the issue was caused by trying to set the voice to "zhy" by calling espeak_SetVoiceByProperties. The result was 2 (ENS_VOICE_NOT_FOUND).
Compilation was based on using glob to find the *_rules files and splitting the filename to get the language to use for the voice. Espeak-ng has renamed the zhy and zh files to match the language codes that should be used: yue and cmn for Cantonese and Mandarin respectively. See espeak-ng/espeak-ng#933
Description of how this pull request fixes the issue:
This change makes the compilation of espeak dictionaries explicit. An explicit listing of the dictionaries NVDA expects (rather than using glob) allows us to be aware of the introduction or removal of languages.
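To illustrate the difference between globbing for *_rules files and keeping an explicit listing, here is a minimal Python sketch. It is not NVDA's actual build code: `EXPECTED_DICTS`, `rules_prefixes`, and `check_listing` are hypothetical names, and only the two languages discussed in this thread are listed. The idea is that the glob is still useful as a cross-check, so that a language added or removed upstream is noticed rather than silently compiled or skipped.

```python
from pathlib import Path
import tempfile

# Hypothetical explicit listing: rules-file prefix -> voice language to select.
# Only the two entries discussed in the thread are shown.
EXPECTED_DICTS = {
    "yue": "yue",  # Cantonese
    "cmn": "cmn",  # Mandarin
}


def rules_prefixes(dictsource: Path) -> set:
    """Prefixes of all *_rules files in dictsource (the glob-based approach)."""
    return {p.name[: -len("_rules")] for p in dictsource.glob("*_rules")}


def check_listing(dictsource: Path, expected: dict):
    """Compare the explicit listing with what is on disk.

    Returns (added, removed): prefixes found on disk but not listed,
    and prefixes listed but missing on disk, each as a sorted list.
    """
    found = rules_prefixes(dictsource)
    listed = set(expected)
    return sorted(found - listed), sorted(listed - found)


if __name__ == "__main__":
    # Demonstrate with a throwaway directory instead of a real dictsource tree.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        for name in ("yue_rules", "cmn_rules", "xx_rules"):
            (src / name).touch()
        added, removed = check_listing(src, EXPECTED_DICTS)
        print("on disk but not listed:", added)
        print("listed but missing on disk:", removed)
```

A build could fail, or just warn, when either list is non-empty; which of the two is appropriate is a project decision the thread does not settle.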
The NVDA project has had an issue open for a while. We'd like to ask for some assistance to identify the issue. This process works for all other languages.
Our build script for espeak dictionaries:
- espeak_Initialize
- espeak_VOICE struct (see struct definition below) with language set to zhy\0
- espeak_SetVoiceByProperties which returns 2 (ENS_VOICE_NOT_FOUND)
Struct:
Thanks!