Skip to content

Add support for custom symbol dictionaries#16823

Merged
seanbudd merged 28 commits intonvaccess:masterfrom
LeonarddeR:customSymbolDicts
Jul 30, 2024
Merged

Add support for custom symbol dictionaries#16823
seanbudd merged 28 commits intonvaccess:masterfrom
LeonarddeR:customSymbolDicts

Conversation

@LeonarddeR
Copy link
Collaborator

@LeonarddeR LeonarddeR commented Jul 5, 2024

Link to issue number:

Closes #16739

Summary of the issue:

There is a desire to provide symbol pronunciation rules in optional symbol dictionaries, e.g. for ancient languages. IN #16739, we concluded that having the ability to provide them in add-ons would be most helpful.

Description of user facing changes

  1. Removed the option Include Unicode Consortium data (including emoji) when processing characters and symbols from the speech settings category and replaced it by a checkable list box to enable optional dictionaries. This only contains the CLDR dictionary by default, but can be expanded by add-ons.
  2. Added the ability for add-ons to provide extra dictionaries. They could be mandatory, meaning that enabling the add-on means enabling the dictionary. If optional, they are listed in the listbox mentioned above.

Description of development approach

  1. Added a SymbolDictionaryDefinition class to characterProcessing that contains all data about a dictionary.
  2. Changed the logic in characterProcessing to create speech symbol processors based on enabled definitions.
  3. For dictionary registration in add-ons, I chose a similar approach as for custom braille tables, e.g. you define the dictionaries in a subsection of the add-on manifest.

Testing strategy:

I created an add-on with dictionaries based on #16732. SPecial thanks to @yeatersink and @paulGeoghegan. Remove the .zip extension before installing this in a build from this pull request.

Test cases:

  • Enable and disable CLDR from the GUI
  • Tested fallback to English
  • Tested profile upgrade with CLDR enabled and disabled

With an add-on:

  • Tested that mandatory dictionaries are always enabled if the add-on is enabled
  • Tested that non-mandatory dictionaries are available from the GUI
  • Tested that dictionary names are translatable
  • Tested that inheritance works properly for languages other than English

Known issues with pull request:

None known

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

Summary by CodeRabbit

  • New Features

    • Expanded NVDA add-ons to include custom speech symbol dictionaries for enhanced customization.
    • Added user interface options to manage and configure speech symbol dictionaries within the settings.
  • Improvements

    • Updated documentation to detail the inclusion of speech symbol dictionaries in NVDA add-ons.
    • Enhanced backward compatibility for speech settings by mapping deprecated values to new fields.
  • Bug Fixes

    • Fixed issues related to backward compatibility in speech configuration handling.
  • Documentation

    • Added detailed instructions on creating and using custom speech symbol dictionaries in the developer guide.

@AppVeyorBot
Copy link

  • PASS: Translation comments check.
  • PASS: License check.
  • FAIL: Unit tests. See test results for more information.
  • PASS: Lint check.
  • PASS: System tests (tags: installer NVDA).
  • Build (for testing PR): https://ci.appveyor.com/api/buildjobs/p3fbjpsr1tn3fvaf/artifacts/output/nvda_snapshot_pr16823-32794,11e0eb3e.exe
  • CI timing (mins):
    INIT 0.0,
    INSTALL_START 1.1,
    INSTALL_END 1.1,
    BUILD_START 0.0,
    BUILD_END 12.7,
    TESTSETUP_START 0.0,
    TESTSETUP_END 0.4,
    TEST_START 0.0,
    TEST_END 20.9,
    FINISH_END 0.2
  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit 11e0eb3e6d

@LeonarddeR LeonarddeR force-pushed the customSymbolDicts branch from 96459cc to 552793f Compare July 5, 2024 17:14
@LeonarddeR LeonarddeR marked this pull request as ready for review July 5, 2024 18:15
@LeonarddeR LeonarddeR requested a review from a team as a code owner July 5, 2024 18:15
@LeonarddeR LeonarddeR requested a review from michaelDCurran July 5, 2024 18:15
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 5, 2024

Walkthrough

The changes introduce support for including and customizing speech symbol dictionaries in NVDA add-on packages. This includes updates to the manifest structure, new methods for handling symbol dictionaries, and a revamped settings UI to allow users to select which symbol dictionaries to use.

Changes

File/Path Change Summary
projectDocs/dev/developerGuide/developerGuide.md Expanded section on packaging NVDA add-ons to include speech symbol dictionaries and detailed setup instructions.
source/addonHandler/__init__.py Added symbolDictionaries section in AddonManifest class and logic to manage display names.
source/characterProcessing.py Refactored existing methods and introduced new ones for symbol dictionary handling. Added a new class SymbolDictionaryDefinition.
source/config/__init__.py Added aliasing for deprecated includeCLDR field in favor of symbolDictionaries.
source/config/configSpec.py Incremented latestSchemaVersion and replaced includeCLDR with symbolDictionaries.
source/config/profileUpgradeSteps.py Added upgrade steps to handle configuration changes related to symbolDictionaries.
source/core.py Added import and initialization/termination calls for characterProcessing.
source/gui/settingsDialogs.py Updated UI logic to manage symbolDictionaries selection and saving.
tests/unit/__init__.py Added imports and initialization for characterProcessing in test setup.

Assessment against linked issues

Objective Addressed Explanation
Add a generic way to expand symbols regardless of translation (#16739)
Replace includeCLDR checkbox with a checkable list for symbol dictionaries (#16739)
Ensure backward compatibility with old configurations (#16739)
Provide UI for managing multiple symbol dictionaries (#16739)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range and nitpick comments (3)
source/config/profileUpgradeSteps.py (1)

2-2: Update copyright

Ensure that the year range is accurate.

- 2016-2024
+ 2016-2023
source/characterProcessing.py (2)

3-3: Update copyright

Ensure that the year range is accurate.

- 2010-2024
+ 2010-2023

[!TIP]
Codebase Verification

Ensure file reading robustness

The load method lacks explicit error handling for IO errors that may occur during file operations. Consider adding a try-except block to handle potential IO errors when opening or reading the file.

  • File: source/characterProcessing.py
  • Line: 226-243
Analysis chain

Line range hint 226-243:
Ensure file reading robustness

The load method reads symbol information from a file. Ensure robust error handling for file operations.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify robust error handling in the `load` method.

# Test: Search for error handling in the `load` method. Expect: Proper error handling exists.
rg --type python $'def load'

Length of output: 60


Script:

#!/bin/bash
# Description: Verify robust error handling in the `load` method.

# Find the file containing the SpeechSymbols class
speech_symbols_file=$(rg -l 'class SpeechSymbols')

# Search for the load method within the identified file and display its content
rg -A 20 'def load' $speech_symbols_file

Length of output: 1674


Script:

#!/bin/bash
# Description: Verify robust error handling in the `load` method.

# Find the file containing the SpeechSymbols class
speech_symbols_file=$(rg -l 'class SpeechSymbols')

# Search for the load method within the identified file and display its complete content
rg -A 50 'def load' $speech_symbols_file

Length of output: 3199

@LeonarddeR
Copy link
Collaborator Author

@yeatersink and @paulGeoghegan you will likely be interested in a try build for this pull request, available here.

@paulGeoghegan
Copy link

@LeonarddeR I'll try get a chance to check this out and test it later today or tomorrow. It seems good and I like your solution though.

Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @LeonarddeR - a preliminary review as I am concerned about a missing aspect of the design - translating the braille table display names. I haven't reviewed the code in detail

"""Save symbol information to a file.
@param fileName: The name of the file to which to save symbol information,
C{None} to use the file name last passed to L{load} or L{save}.
C{None} to use the file name last passed to L{load} or L{save}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be tabs still?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, may be auto formatting was doing weird there. I will go over the code and will fix it accordingly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still spaces instead of tabs

@seanbudd seanbudd marked this pull request as draft July 11, 2024 04:55
Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Each of these language directories can contain a locale-specific manifest file called manifest.ini, which can contain a small subset of the manifest fields for translation.
These fields are summary and description.
You can also override the `displayName` field for speech symbol dictionaries and braille translation tables.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fyi @seanbudd
We probably need to make this more explicit, with examples.

@AppVeyorBot
Copy link

See test results for failed build of commit 56b17f2f70

@LeonarddeR LeonarddeR marked this pull request as ready for review July 12, 2024 16:41
@LeonarddeR
Copy link
Collaborator Author

@paulGeoghegan There's now a new build

1. Dictionaries in alphabetical sorted order,

That's now covered, that is, the built-in CLDR dictionary will always come first.

2. the dictionaries seemed to not always show up correctly,

Could you reproduce this reliably?

3. I'm not sure if you just added a sample of the characters or what but most of them aren't being read.

I copied the dictionaries from your initial pull request. However I don't think the dictionaries are valid. For example, they require the "most" symbol level and "Send actual symbol to syntheszier" is set to always.

"""Save symbol information to a file.
@param fileName: The name of the file to which to save symbol information,
C{None} to use the file name last passed to L{load} or L{save}.
C{None} to use the file name last passed to L{load} or L{save}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still spaces instead of tabs

@seanbudd seanbudd marked this pull request as draft July 16, 2024 07:03
@yeatersink
Copy link

@LeonarddeR This recent build looks great. Right now the Hebrew is only showing the punctuation. When we add the languages, we will update you on how it works.

@paulGeoghegan
Copy link

@LeonarddeR we can't seem to get it to read the symbols correctly. If we add them to an existing symbols file like in the en locale then they are read correctly but when we add them to an extention they are not read almost at all.
We tried using the extention you made and we also made our own:
https://drive.google.com/drive/folders/1KUWBdicue60IrTPKL1U1QvQ0_UXE-cv5?usp=sharing
We have the symbols set to most but as I said they work fine when added to an existing locale's symbols.dic file.

@LeonarddeR
Copy link
Collaborator Author

LeonarddeR commented Jul 19, 2024

@paulGeoghegan : The symbols-akk-Xsux.dic file in your add-on is missing the symbols: start header just below the copyright notice and above the symbols

@paulGeoghegan
Copy link

@LeonarddeR yes that fixed it. We have tested with a few languages now and it seems to work perfectly so far. We will let you know if we encounter any issues.
I do wonder what does the mandatory = false option do?
If we set this to true does it force the symbols to be enabled?

@LeonarddeR
Copy link
Collaborator Author

I do wonder what does the mandatory = false option do? If we set this to true does it force the symbols to be enabled?

Yes, exactly. If it is true, the symbols will be always enabled and won't be shown in the list. This is meant for cases where the correct functioning of an add-on is dependent on the enabled state of the symbol dictionary.

@LeonarddeR LeonarddeR marked this pull request as ready for review July 20, 2024 11:09
@yeatersink
Copy link

Hey @LeonarddeR . This work s great. Quick question, how can we make it read a line? Right now it reads nothing, unless I arrow through every letter.
I would like it to read the entire line if it is possible. Any ideas on how we can make this happen?

@LeonarddeR
Copy link
Collaborator Author

LeonarddeR commented Jul 23, 2024

@yeatersink wrote:

Hey @LeonarddeR . This work s great.

Nice to hear!

Quick question, how can we make it read a line? Right now it reads nothing, unless I arrow through every letter. I would like it to read the entire line if it is possible. Any ideas on how we can make this happen?

You should either change your NVDA symbol level to most or higher, or change the symbol level in the dictionary to something lower than most, e.g. none or some. None will deffinitely break pronunciation by synths that support the hcaracters properly, e.g. in Hebrew.

@yeatersink
Copy link

@LeonarddeR thanks for the pro tip, it works glorasly well. I have another question relevant to this subject, but more focuses on braille.

@seanbudd and @LeonarddeR the ability to check multiple languages and symbols is a huge step forward. The ability to check which add on we want to enable is monumental. I am so thankful as a NVDA user myself.

Wat would be the iceing on the cake is the ability to enable several braille tables with the tables from Lib Louie. Is there any way that we can enable the same type of feature with braille tables? This way we can have both speech and braille with all of these symbols, with out having to create an all inclusive braille table for Lib Louis?

Also, if we can do that, can we employ an opportunity to enable a braille table that we have created on our own computer?

I know that people have been asking for these features for years. Me and my team are willing to help.

@LeonarddeR this will be a major help to you in your language studies, especially if you want to learn Hebrew and Greek in Dutch or German.

@LeonarddeR
Copy link
Collaborator Author

@LeonarddeR thanks for the pro tip, it works glorasly well.

Great to hear.

Wat would be the iceing on the cake is the ability to enable several braille tables with the tables from Lib Louie. Is there any way that we can enable the same type of feature with braille tables? This way we can have both speech and braille with all of these symbols, with out having to create an all inclusive braille table for Lib Louis?

There is an issue for this, namely #2044.

Also, if we can do that, can we employ an opportunity to enable a braille table that we have created on our own computer?

Yes, see the developer guide as well as the associated pull request that implemented this: #16208

@LeonarddeR LeonarddeR requested a review from seanbudd July 25, 2024 11:17
@seanbudd
Copy link
Member

Thanks @LeonarddeR

"""


def listAvailableSymbolDictionaryDefinitions() -> list[SymbolDictionaryDefinition]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @LeonarddeR

I need information on this already merged PR.
This listAvailableSymbolDictionaryDefinitions function is member of the public API and, if possible, I'd like to have clarification on it before 2025.1 API is totally frozen. I need to use it in my Character Information add-on.

I have modified your add-on for test purpose and run the following code in console:

>>> print('\n'.join([str((d.source, d.displayName)) for d in characterProcessing.listAvailableSymbolDictionaryDefinitions()]))
(<_SymbolDefinitionSource.BUILTIN: 'builtin'>, None)
(<_SymbolDefinitionSource.BUILTIN: 'builtin'>, 'Données du Consortium Unicode (incluant les emoji)')
('ancientLanguagesTest', 'Aaa First')
('ancientLanguagesTest', 'Biblical Hebrew')
('ancientLanguagesTest', 'transliteratedCuneiform')
(<_SymbolDefinitionSource.USER: 'user'>, None)
('ancientLanguagesTest', 'Zzz Last')
  1. This function is only used to show user visible dictionaries in the GUI. Why haven't you use it elsewhere in the code? Indeed, it would have made sense that the visible order in the GUI be the same as the real symbol processing order in the code. That is _SymbolDefinitionSource.USER should have been last when using first key sort order.

  2. You use strxfrm(dct.displayName or dct.name) as secondary sorting key. Why does dct.displayName control the sorting order? I do not think that there will be so many dictionaries so displaying them sorted for presentation only is not useful. Moreover, as written before, it would be much better if the sorting order could be representative of the processing order. At last, using a localized name to sort them means that the sorting order may depend on the translation made and the language used in the GUI (or speech). This would make no sense if it matches the real processing order.

Let me know what you think and if I should open an issue or PR to have things changed. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for chiming in @CyrilleB79

I think you're raising valid questions here. The method is documented as the only public method to retrieve the definitions, yet it behaves as it is specifically meant for GUI presentation. That's why the sorting is there. I added this sorting heuristic to have a way of sorting that is identical to the sorting of braille tables, No more and no less.

I think you have a very valid point here and that we should remove the sorting from the list method and instead, return a copy of the list. Then, either

  1. We should refrain from sorting
  2. For listing purposes, I think it is useful to sort in a similar way as braille tables are sorted, but the sorting should be performed by the GUI and not by characterProcessing itself.

Curious to know your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the GUI sorting can be made representative of the order in which symbols are processed, I'd say we should use processing order. Though, it will never be exactly the same as this list does not make a difference between English and locale versions of the symbol dictionaries.

If it is not possible to make something representative of the real processing order, in any case, CLDR (i.e. built-in) should remain first iMO as it is the most commonly used. For the remaining (custom) dictionaries, I'd prefer not to have a sorting order depending on the GUI language, but I have no strong argument for that.

In any case, for 2025.1, if you agree, we should just make listAvailableSymbolDictionaryDefinitions a public accessor to the underlying private variable, i.e. remove sorting from this list. GUI sorting can be discussed and implemented in 2025.2 if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, as long as the public list method returns a copy of the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a generic way to expand symmbols regardless of translation

6 participants