Add support for custom symbol dictionaries #16823
See test results for failed build of commit 11e0eb3e6d |
Walkthrough
The changes introduce support for including and customizing speech symbol dictionaries in NVDA add-on packages. This includes updates to the manifest structure, new methods for handling symbol dictionaries, and a revamped settings UI to allow users to select which symbol dictionaries to use.
Actionable comments posted: 2
Outside diff range and nitpick comments (3)
source/config/profileUpgradeSteps.py (1)
2-2: Update copyright
Ensure that the year range is accurate.
- 2016-2024
+ 2016-2023
source/characterProcessing.py (2)
3-3: Update copyright
Ensure that the year range is accurate.
- 2010-2024
+ 2010-2023
[!TIP]
Codebase Verification
Ensure file reading robustness
The `load` method lacks explicit error handling for IO errors that may occur during file operations. Consider adding a try-except block to handle potential IO errors when opening or reading the file.
- File: source/characterProcessing.py
- Line: 226-243
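The try-except suggestion above could look roughly like the following. This is a minimal sketch, not the actual `load` method from `characterProcessing`: the helper name `read_symbol_lines` and the "log, then re-raise" policy are assumptions made for illustration only.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def read_symbol_lines(file_name: str) -> list[str]:
	"""Return the non-empty lines of a symbols file.

	Hypothetical helper: it only illustrates the error handling,
	not the real parsing logic of NVDA's load method.
	"""
	try:
		text = Path(file_name).read_text(encoding="utf-8-sig")
	except OSError:
		# Covers missing files, permission problems and other IO failures.
		log.exception("Could not read symbol file %s", file_name)
		raise
	# Skip blank lines; everything else is left for the parser.
	return [line.strip() for line in text.splitlines() if line.strip()]
```

Re-raising keeps the decision about how to recover with the caller, while the log entry records which file failed.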
Analysis chain
Line range hint 226-243: Ensure file reading robustness
The `load` method reads symbol information from a file. Ensure robust error handling for file operations.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify robust error handling in the `load` method.
# Test: Search for error handling in the `load` method. Expect: Proper error handling exists.
rg --type python $'def load'
Length of output: 60
Script:
#!/bin/bash
# Description: Verify robust error handling in the `load` method.
# Find the file containing the SpeechSymbols class
speech_symbols_file=$(rg -l 'class SpeechSymbols')
# Search for the load method within the identified file and display its content
rg -A 20 'def load' $speech_symbols_file
Length of output: 1674
Script:
#!/bin/bash
# Description: Verify robust error handling in the `load` method.
# Find the file containing the SpeechSymbols class
speech_symbols_file=$(rg -l 'class SpeechSymbols')
# Search for the load method within the identified file and display its complete content
rg -A 50 'def load' $speech_symbols_file
Length of output: 3199
|
@yeatersink and @paulGeoghegan you will likely be interested in a try build for this pull request, available here. |
|
@LeonarddeR I'll try to get a chance to check this out and test it later today or tomorrow. It seems good and I like your solution though. |
seanbudd
left a comment
Hi @LeonarddeR - a preliminary review, as I am concerned about a missing aspect of the design: translating the braille table display names. I haven't reviewed the code in detail.
source/characterProcessing.py
Outdated
| """Save symbol information to a file. | ||
| @param fileName: The name of the file to which to save symbol information, | ||
| C{None} to use the file name last passed to L{load} or L{save}. | ||
| C{None} to use the file name last passed to L{load} or L{save}. |
I guess so, maybe auto-formatting was doing something weird there. I will go over the code and fix it accordingly.
this is still spaces instead of tabs
Co-authored-by: Sean Budd <seanbudd123@gmail.com>
|
|
||
| Each of these language directories can contain a locale-specific manifest file called manifest.ini, which can contain a small subset of the manifest fields for translation. | ||
| These fields are summary and description. | ||
| You can also override the `displayName` field for speech symbol dictionaries and braille translation tables. |
Fyi @seanbudd
We probably need to make this more explicit, with examples.
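As one possible illustration of the override described above, here is a small sketch of how a locale-specific manifest might be layered over the base manifest. The dictionary layout, the section names `symbolDictionaries` and `brailleTables`, and the function `apply_locale_manifest` are assumptions for this example and are not taken from NVDA's add-on handling code.

```python
import copy

# Top-level fields that a locale manifest may translate (per the text above).
TRANSLATABLE_FIELDS = ("summary", "description")
# Section names are assumptions made for this sketch.
DISPLAY_NAME_SECTIONS = ("symbolDictionaries", "brailleTables")


def apply_locale_manifest(base: dict, translated: dict) -> dict:
	"""Return a copy of base with the translatable fields overridden."""
	merged = copy.deepcopy(base)
	for field in TRANSLATABLE_FIELDS:
		if field in translated:
			merged[field] = translated[field]
	for section in DISPLAY_NAME_SECTIONS:
		for name, fields in translated.get(section, {}).items():
			# Only the displayName of an entry that already exists is overridden.
			if name in merged.get(section, {}) and "displayName" in fields:
				merged[section][name]["displayName"] = fields["displayName"]
	return merged


# Illustrative data only: a base manifest and a hypothetical Dutch translation.
base = {
	"summary": "Ancient languages",
	"symbolDictionaries": {
		"hebrew": {"displayName": "Biblical Hebrew", "mandatory": False},
	},
}
dutch = {
	"summary": "Oude talen",
	"symbolDictionaries": {"hebrew": {"displayName": "Bijbels Hebreeuws"}},
}
merged = apply_locale_manifest(base, dutch)
print(merged["symbolDictionaries"]["hebrew"]["displayName"])
```

Fields other than `displayName`, such as `mandatory` here, stay as the base manifest defines them.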
See test results for failed build of commit 56b17f2f70 |
|
@paulGeoghegan There's now a new build
That's now covered, that is, the built-in CLDR dictionary will always come first.
Could you reproduce this reliably?
I copied the dictionaries from your initial pull request. However, I don't think the dictionaries are valid. For example, they require the "most" symbol level and "Send actual symbol to synthesizer" is set to always. |
|
@LeonarddeR This recent build looks great. Right now the Hebrew is only showing the punctuation. When we add the languages, we will update you on how it works. |
|
@LeonarddeR we can't seem to get it to read the symbols correctly. If we add them to an existing symbols file, like the one in the en locale, they are read correctly, but when we add them to an extension they are hardly read at all. |
|
@paulGeoghegan : The |
|
@LeonarddeR yes that fixed it. We have tested with a few languages now and it seems to work perfectly so far. We will let you know if we encounter any issues. |
Yes, exactly. If it is true, the symbols will be always enabled and won't be shown in the list. This is meant for cases where the correct functioning of an add-on is dependent on the enabled state of the symbol dictionary. |
|
Hey @LeonarddeR. This works great. Quick question: how can we make it read a line? Right now it reads nothing unless I arrow through every letter. |
|
@yeatersink wrote:
Nice to hear!
You should either change your NVDA symbol level to most or higher, or change the symbol level in the dictionary to something lower than most, e.g. |
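The relationship between the two levels can be sketched as a simple comparison. This is a simplified model under stated assumptions: the `Level` enum and `is_spoken` function are hypothetical, and other per-symbol settings that NVDA applies are ignored.

```python
from enum import IntEnum


class Level(IntEnum):
	"""Hypothetical symbol levels, ordered from least to most verbose."""

	NONE = 0
	SOME = 100
	MOST = 200
	ALL = 300


def is_spoken(symbol_level: Level, user_level: Level) -> bool:
	"""A symbol is announced when the user's level is at least the symbol's level."""
	return user_level >= symbol_level


# A symbol defined at "most" stays silent for a user whose level is "some".
print(is_spoken(Level.MOST, Level.SOME))
# Raising the user's level, or lowering the symbol's level, makes it audible.
print(is_spoken(Level.MOST, Level.MOST), is_spoken(Level.SOME, Level.SOME))
```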
|
@LeonarddeR thanks for the pro tip, it works gloriously well. I have another question related to this subject, but more focused on braille. @seanbudd and @LeonarddeR, the ability to check multiple languages and symbols is a huge step forward. The ability to choose which add-on we want to enable is monumental. I am so thankful as an NVDA user myself. What would be the icing on the cake is the ability to enable several braille tables alongside the tables from liblouis. Is there any way that we can enable the same type of feature with braille tables? This way we could have both speech and braille with all of these symbols, without having to create an all-inclusive braille table for liblouis. Also, if we can do that, could we have the option to enable a braille table that we have created on our own computer? I know that people have been asking for these features for years. My team and I are willing to help. @LeonarddeR this will be a major help to you in your language studies, especially if you want to learn Hebrew and Greek in Dutch or German. |
Great to hear.
There is an issue for this, namely #2044.
Yes, see the developer guide as well as the associated pull request that implemented this: #16208 |
|
Thanks @LeonarddeR |
| """ | ||
|
|
||
|
|
||
| def listAvailableSymbolDictionaryDefinitions() -> list[SymbolDictionaryDefinition]: |
Hi @LeonarddeR
I need information on this already merged PR.
This `listAvailableSymbolDictionaryDefinitions` function is a member of the public API and, if possible, I'd like to have clarification on it before the 2025.1 API is completely frozen. I need to use it in my Character Information add-on.
I have modified your add-on for testing purposes and ran the following code in the console:
>>> print('\n'.join([str((d.source, d.displayName)) for d in characterProcessing.listAvailableSymbolDictionaryDefinitions()]))
(<_SymbolDefinitionSource.BUILTIN: 'builtin'>, None)
(<_SymbolDefinitionSource.BUILTIN: 'builtin'>, 'Données du Consortium Unicode (incluant les emoji)')
('ancientLanguagesTest', 'Aaa First')
('ancientLanguagesTest', 'Biblical Hebrew')
('ancientLanguagesTest', 'transliteratedCuneiform')
(<_SymbolDefinitionSource.USER: 'user'>, None)
('ancientLanguagesTest', 'Zzz Last')
- This function is only used to show user-visible dictionaries in the GUI. Why haven't you used it elsewhere in the code? It would have made sense for the visible order in the GUI to match the real symbol processing order in the code. That is, `_SymbolDefinitionSource.USER` should have come last under the first sort key.
- You use `strxfrm(dct.displayName or dct.name)` as the secondary sorting key. Why does `dct.displayName` control the sorting order? I do not think there will be so many dictionaries that sorting them purely for presentation is useful. Moreover, as written above, it would be much better if the sorting order were representative of the processing order. Finally, using a localized name to sort them means that the sorting order may depend on the translation and on the language used in the GUI (or speech). This would make no sense if the order is meant to match the real processing order.
Let me know what you think and if I should open an issue or PR to have things changed. Thanks.
Thanks for chiming in @CyrilleB79
I think you're raising valid questions here. The method is documented as the only public method to retrieve the definitions, yet it behaves as if it were specifically meant for GUI presentation. That's why the sorting is there. I added this sorting heuristic to have a way of sorting that is identical to the sorting of braille tables, no more and no less.
I think you have a very valid point here and that we should remove the sorting from the list method and instead, return a copy of the list. Then, either
- We should refrain from sorting
- For listing purposes, I think it is useful to sort in a similar way as braille tables are sorted, but the sorting should be performed by the GUI and not by characterProcessing itself.
Curious to know your thoughts.
If the GUI sorting can be made representative of the order in which symbols are processed, I'd say we should use processing order. Though, it will never be exactly the same as this list does not make a difference between English and locale versions of the symbol dictionaries.
If it is not possible to make something representative of the real processing order, in any case, CLDR (i.e. built-in) should remain first IMO, as it is the most commonly used. For the remaining (custom) dictionaries, I'd prefer not to have a sorting order depending on the GUI language, but I have no strong argument for that.
In any case, for 2025.1, if you agree, we should just make listAvailableSymbolDictionaryDefinitions a public accessor to the underlying private variable, i.e. remove sorting from this list. GUI sorting can be discussed and implemented in 2025.2 if needed.
I agree, as long as the public list method returns a copy of the list.
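The outcome of this thread could be sketched as follows: the list function returns a copy in registration order, and any sorting for presentation is done by the caller. This is only a sketch; `DictionaryDefinition`, `list_definitions`, and `sorted_for_gui` are hypothetical stand-ins, not the names or code used in `characterProcessing` or the GUI.

```python
import locale
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryDefinition:
	"""Hypothetical stand-in for a symbol dictionary definition."""

	name: str
	source: str
	displayName: Optional[str] = None


# Private registry, kept in the order the definitions were registered.
_definitions: list[DictionaryDefinition] = [
	DictionaryDefinition("zzz", "ancientLanguagesTest", "Zzz Last"),
	DictionaryDefinition("cldr", "builtin", "Unicode Consortium data"),
	DictionaryDefinition("aaa", "ancientLanguagesTest", "Aaa First"),
]


def list_definitions() -> list[DictionaryDefinition]:
	"""Return a shallow copy so callers cannot mutate the registry."""
	return list(_definitions)


def sorted_for_gui(defs: list[DictionaryDefinition]) -> list[DictionaryDefinition]:
	"""Presentation-only order: built-in first, then by localized name."""
	return sorted(
		defs,
		key=lambda d: (d.source != "builtin", locale.strxfrm(d.displayName or d.name)),
	)
```

Because `sorted_for_gui` works on the copy, the display order can change with the GUI language without affecting the order in which dictionaries are registered.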
Link to issue number:
Closes #16739
Summary of the issue:
There is a desire to provide symbol pronunciation rules in optional symbol dictionaries, e.g. for ancient languages. In #16739, we concluded that having the ability to provide them in add-ons would be most helpful.
Description of user facing changes
- Removed "Include Unicode Consortium data (including emoji) when processing characters and symbols" from the speech settings category and replaced it with a checkable list box to enable optional dictionaries. This only contains the CLDR dictionary by default, but can be expanded by add-ons.
Description of development approach
- Added a `SymbolDictionaryDefinition` class to `characterProcessing` that contains all data about a dictionary.
- `characterProcessing` now creates speech symbol processors based on enabled definitions.
Testing strategy:
I created an add-on with dictionaries based on #16732. Special thanks to @yeatersink and @paulGeoghegan. Remove the .zip extension before installing this in a build from this pull request.
Test cases:
With an add-on:
Known issues with pull request:
None known
Code Review Checklist: