Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Unicode CLDR to create speech symbol dictionaries with emojis #8758

Merged
merged 16 commits into from Sep 25, 2018

Conversation

LeonarddeR
Copy link
Collaborator

Link to issue number:

Closes #6523

Summary of the issue:

NVDA has no built-in mechanism to read emoji descriptions. Currently, it relies on the dictionaries that are available for speech synthesizers, such as Windows OneCore and ESpeak. However, synthesizers like Vocalizer do not have this data and therefore can't speak emojis.

Description of how this pull request fixes the issue:

This pr includes the emoji descriptions from the Unicode Common Locale Data Repository. The emoji descriptions are build with NVDA and added to locale specific speech symbol dictionaries using scons, making it very easy to update the emoji sources whenever the CLDR is updated. For this, we use a nice github repository hosted and maintained by @fujiwarat.

To NVDA's speech settings, I added the option "Include Unicode Consortium data when processing characters and symbols" which makes it easy to disable the inclusion of these databases.

I also had to add some functionality to the config manager in order to allow making a dump of the configuration in Python dictionary format (i.e. a deep copy like how configobj does this). This changes include a new prevConf argument that is passed to config.post_configProfileSwitch, allowing handlers to compare the current configuration against the previous and decide on what to do. This is used to clear the caches of the character processing framework.

Testing performed

  1. Made emojis read within the Windows 10 emoji panel. When switching from Dutch to English, the emoji descriptions were accurately switched from Dutch to English.
  2. Switched CLDR data on and off in NVDA's speech settings, the data was accurately included and excluded when processing symbols.
  3. Switched CLDR data on and off by means of a configuration profile.

Known issues

  • I agree that the wording of the new GUI option is somewhat vague, but this is because the annotations database doesn't only contain emoji, but it also contains descriptions for characters like ® (trademark). Therefore, talking about emoji doesn't cover the whole spectrum of what this database adds to the list of speech symbols.
  • Other languages than Dutch and English have to be tested for accuracy. I'm quite confident that the Unicode data has decent quality, but you'll never know.
  • The user guide has yet to be updated as soon as we agree about the wording of the GUI option.
  • There is currently no braille support, as basically, there is no braille symbol processing mechanism yet. It felt a bit too far-fetched to implement that as part of this pr.

Change log entries

  • New features
    • NVDA is now able to read descriptions for emoji as well as other characters that are part of the Unicode Common Locale Data Repository. (nvda can read emojies #6523)
  • Changes vor developers
    • The config.post_configProfileSwitch action now takes the optional prevConf keyword argument, allowing handlers to take action based on differences between configuration before and after the profile switch.

@LeonarddeR
Copy link
Collaborator Author

@derekriemer: Also requested review from you because I recall you're pretty good with scons.

@LeonarddeR
Copy link
Collaborator Author

@jcsteh: I did not ask review from you specifically, however since the implementation is roughly based on your research and proposal, you might be interested to have a look.

@jcsteh
Copy link
Contributor

jcsteh commented Sep 18, 2018 via email

michaelDCurran
michaelDCurran previously approved these changes Sep 18, 2018
Copy link
Member

@michaelDCurran michaelDCurran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful. And even working much more accurately than eSpeak's own support at the moment.

sconstruct Outdated
@@ -174,7 +174,7 @@ env64['projectResFile'] = resFile
#Fill sourceDir with anything provided for it by miscDeps
env.recursiveCopy(sourceDir,Dir('miscdeps/source'))

env.SConscript('source/comInterfaces_sconscript',exports=['env'])
env.SConscript('source/comInterfaces_sconscript',exports=['env', 'sourceDir'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change meant to be here?

@josephsl
Copy link
Collaborator

josephsl commented Sep 18, 2018 via email

@jcsteh
Copy link
Contributor

jcsteh commented Sep 19, 2018 via email

@LeonarddeR
Copy link
Collaborator Author

LeonarddeR commented Sep 19, 2018 via email

@michaelDCurran
Copy link
Member

@LeonarddeR: I'm really not too bothered, but I'd be fine with @jcsteh's final suggestion.

@LeonarddeR
Copy link
Collaborator Author

The espeak emoji dictionaries are now deleted, basically using a copy of #7810.

@LeonarddeR
Copy link
Collaborator Author

Python 3 unicodedata is only useful if it includes CLDR annotation data for multiple languages. I'm not sure that it does.

I can't find anything about this in the python docs, so I'm afraid that it does not.

michaelDCurran
michaelDCurran previously approved these changes Sep 25, 2018
@michaelDCurran michaelDCurran merged commit 21065fa into nvaccess:master Sep 25, 2018
@nvaccessAuto nvaccessAuto added this to the 2018.4 milestone Sep 25, 2018
@jage9
Copy link
Contributor

jage9 commented Sep 25, 2018

This works great when reading text. It doesn't currently function when arrowing by character.

  1. Open Notepad
  2. Insert an Emoji such as 🌮
  3. Arrow over the emoji by character.

You can also arrow over the taco emoji above to observe the same results. Using the up and down arrow keys reads it correctly.

I'm not sure if some emojis are represented by two characters, especially in browse mode. I can create a new issue for this if desired.

@LeonarddeR
Copy link
Collaborator Author

Feel free to create a new issue for this please.

@tmthywynn8
Copy link
Sponsor

Is there an explanation for the rationale behind choosing some for the Level field rather than none? After all, if this is able to be toggled anyway, you'd have to know to change your punctuation level to some if you manually enabled it dependent on situation, i.e., through a configuration profile.

@LeonarddeR
Copy link
Collaborator Author

LeonarddeR commented Nov 20, 2018 via email

@tmthywynn8
Copy link
Sponsor

tmthywynn8 commented Nov 21, 2018 via email

@LeonarddeR
Copy link
Collaborator Author

If the OneCore voices are the default though, then having the box unchecked will have the Emoji read, if known, regardless of punctuation level anyway so there's a consistency issue to consider.

This is certainly a valid point!

@tmthywynn8
Copy link
Sponsor

tmthywynn8 commented Nov 29, 2018

Looks like we've an unexpected error if you change the Unicode Consortium data via the Punctuation/symbol pronunciation dialog.

Steps to reproduce:

  1. Under the Speech catrogory of NVDA's preferences, check "Include Unicode Consortium data (including emoji) when processing characters and symbols".
  2. Make a change to one of the Emojis, e.g., change the level of the "face savoring food" Emoji (😋) to none using the Punctuation/symbol pronunciation dialog.
  3. Under the Speech catrogory of NVDA's preferences, uncheck "Include Unicode Consortium data (including emoji) when processing characters and symbols".
  4. Find the Emoji somewhere, e.g., through the cldr.dic file, and read it.

Expected: The synthesizer processes the line pre-Unicode Consortium data functionality.

Actual: An error noise, synthesizer staying silent, with the following in the log:

Replacement not defined in locale en for symbol: 😋
ERROR - eventHandler.executeEvent (HH:mm:ss.SSS"):
error executing event: gainFocus on <baseObject.Dynamic_IAccessibleEditWindowNVDAObject object at 0x05078AB0> with extra args of {}
Traceback (most recent call last):
  File "eventHandler.pyo", line 155, in executeEvent
  File "eventHandler.pyo", line 92, in __init__
  File "eventHandler.pyo", line 100, in next
  File "NVDAObjects\behaviors.pyo", line 169, in event_gainFocus
  File "NVDAObjects\__init__.pyo", line 961, in event_gainFocus
  File "NVDAObjects\__init__.pyo", line 849, in reportFocus
  File "speech.pyo", line 418, in speakObject
  File "speech.pyo", line 1029, in speakTextInfo
  File "speech.pyo", line 568, in speak
  File "speech.pyo", line 80, in processText
  File "characterProcessing.pyo", line 635, in processSpeechSymbols
  File "characterProcessing.pyo", line 550, in processText
  File "characterProcessing.pyo", line 534, in _regexpRepl
KeyError: u'\U0001f60b'

Possible solutions:

  1. Don't allow users to change the symbol definitions for the Unicode Consortium data.
  2. Suppress the error somehow, passing through the original text with the symbol not defined.
  3. Have a separate cldr-en.dic (or whatever locale) instead of putting the definition in symbols-en.dic. That way, you can check if NVDA is using the Unicode Consortium data or not, and if it is, then load in the custom cldr entries, otherwise don't load the entries from that file.

I don't know how complicated any of the proposed solutions would be as I'm not a coder, but hopefully a solution can be divides before 2018.4 is released.

@LeonarddeR
Copy link
Collaborator Author

Looks like we've an unexpected error if you change the Unicode Consortium data via the Punctuation/symbol pronunciation dialog.

There is a pr for this: #8932

feerrenrut referenced this pull request in nvaccess/nvda-cldr Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nvda can read emojies
7 participants