Use Unicode CLDR to create speech symbol dictionaries with emojis #8758

Merged
merged 16 commits into from Sep 25, 2018

6 participants
@leonardder
Collaborator

leonardder commented Sep 18, 2018

Link to issue number:

Closes #6523

Summary of the issue:

NVDA has no built-in mechanism to read emoji descriptions. It currently relies on the dictionaries that ship with speech synthesizers such as Windows OneCore and eSpeak. Synthesizers like Vocalizer do not have this data and therefore can't speak emoji.

Description of how this pull request fixes the issue:

This PR includes the emoji descriptions from the Unicode Common Locale Data Repository (CLDR). The descriptions are built together with NVDA and added to locale-specific speech symbol dictionaries using scons, which makes it very easy to update the emoji sources whenever the CLDR is updated. For this, we use a nice GitHub repository hosted and maintained by @fujiwarat.
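To illustrate the build step described above, here is a minimal sketch of turning CLDR annotation XML into symbol dictionary lines. It is not the scons code from this PR. The sample XML follows the CLDR `annotation` element layout, but its contents are illustrative rather than copied from CLDR, and the function name `cldr_to_symbol_lines`, the tab-separated output layout, and the `none` level are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative sample in the CLDR annotations layout (not copied from CLDR).
SAMPLE_CLDR_XML = """<ldml>
  <annotations>
    <annotation cp="🌮">mexican | taco</annotation>
    <annotation cp="🌮" type="tts">taco</annotation>
    <annotation cp="®">r | registered</annotation>
    <annotation cp="®" type="tts">registered</annotation>
  </annotations>
</ldml>
"""


def cldr_to_symbol_lines(xml_text, level="none"):
    """Return symbol dictionary lines built from the type="tts" annotations.

    The output layout (a "symbols:" header followed by tab-separated
    symbol, replacement and level) is an assumption for this sketch.
    """
    root = ET.fromstring(xml_text)
    lines = ["symbols:"]
    for node in root.iter("annotation"):
        # Entries without type="tts" are search keywords, not spoken names.
        if node.get("type") != "tts":
            continue
        lines.append("\t".join((node.get("cp"), node.text.strip(), level)))
    return lines


if __name__ == "__main__":
    print("\n".join(cldr_to_symbol_lines(SAMPLE_CLDR_XML)))
```

Generating the dictionaries at build time rather than committing them keeps the repository free of large derived files, at the cost of a slightly longer build.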

To NVDA's speech settings, I added the option "Include Unicode Consortium data when processing characters and symbols" which makes it easy to disable the inclusion of these databases.

I also had to add some functionality to the config manager so that the configuration can be dumped in Python dictionary format (i.e. as a deep copy, similar to how configobj does this). These changes include a new prevConf argument that is passed to config.post_configProfileSwitch, allowing handlers to compare the current configuration against the previous one and decide what to do. This is used to clear the caches of the character processing framework.
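The prevConf idea can be sketched outside NVDA as follows. Everything here is a stand-in: the `Action` class, the `current_conf` dictionary, the `includeCLDR` key, and `symbol_cache` are hypothetical names for the example and are not NVDA's actual objects. The point is only the pattern: take a deep copy before the switch, pass it to the handlers, and let each handler decide whether its cache is stale.

```python
import copy


class Action:
    """Tiny stand-in for an extension point that notifies its handlers."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)

    def notify(self, **kwargs):
        for handler in self._handlers:
            handler(**kwargs)


# Hypothetical state for the example.
post_profile_switch = Action()
current_conf = {"speech": {"includeCLDR": True}}
symbol_cache = {"en": ["cached symbols"]}


def handle_profile_switch(prevConf=None):
    """Clear the symbol cache only if the relevant setting changed."""
    if prevConf is None:
        # Callers that do not supply the previous configuration are ignored.
        return
    if prevConf["speech"]["includeCLDR"] != current_conf["speech"]["includeCLDR"]:
        symbol_cache.clear()


post_profile_switch.register(handle_profile_switch)


def switch_profile(new_speech_settings):
    """Apply new settings and notify handlers with a snapshot of the old ones."""
    prev = copy.deepcopy(current_conf)  # snapshot taken before mutation
    current_conf["speech"].update(new_speech_settings)
    post_profile_switch.notify(prevConf=prev)
```

The deep copy matters: a shallow copy would share the nested `speech` dictionary with the live configuration, so the handler would always see identical old and new values.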

Testing performed

  1. Made emojis read within the Windows 10 emoji panel. When switching from Dutch to English, the emoji descriptions were accurately switched from Dutch to English.
  2. Switched CLDR data on and off in NVDA's speech settings. The data was correctly included and excluded when processing symbols.
  3. Switched CLDR data on and off by means of a configuration profile.

Known issues

  • I agree that the wording of the new GUI option is somewhat vague, but this is because the annotations database doesn't only contain emoji; it also contains descriptions for characters like ® (registered trademark). Therefore, talking about emoji doesn't cover the whole spectrum of what this database adds to the list of speech symbols.
  • Languages other than Dutch and English still have to be tested for accuracy. I'm quite confident that the Unicode data is of decent quality, but you never know.
  • The user guide still has to be updated once we agree on the wording of the GUI option.
  • There is currently no braille support, as there is no braille symbol processing mechanism yet. It felt a bit too far-fetched to implement that as part of this PR.

Change log entries

  • New features
    • NVDA is now able to read descriptions for emoji as well as other characters that are part of the Unicode Common Locale Data Repository. (#6523)
  • Changes for developers
    • The config.post_configProfileSwitch action now takes the optional prevConf keyword argument, allowing handlers to take action based on differences between configuration before and after the profile switch.

@leonardder leonardder added the Speech label Sep 18, 2018

@leonardder

Collaborator

leonardder commented Sep 18, 2018

@derekriemer: Also requested review from you because I recall you're pretty good with scons.

@leonardder

Collaborator

leonardder commented Sep 18, 2018

@jcsteh: I did not request a review from you specifically, but since the implementation is roughly based on your research and proposal, you might be interested in having a look.

@jcsteh

Contributor

jcsteh commented Sep 18, 2018

@michaelDCurran

Wonderful. And even working much more accurately than eSpeak's own support at the moment.

sconstruct Outdated
@@ -174,7 +174,7 @@ env64['projectResFile'] = resFile
 #Fill sourceDir with anything provided for it by miscDeps
 env.recursiveCopy(sourceDir,Dir('miscdeps/source'))
-env.SConscript('source/comInterfaces_sconscript',exports=['env'])
+env.SConscript('source/comInterfaces_sconscript',exports=['env', 'sourceDir'])

@michaelDCurran

michaelDCurran Sep 18, 2018

Contributor

Was this change meant to be here?

@josephsl

Collaborator

josephsl commented Sep 18, 2018

@jcsteh

Contributor

jcsteh commented Sep 19, 2018

@leonardder

Collaborator

leonardder commented Sep 19, 2018

@michaelDCurran

Contributor

michaelDCurran commented Sep 19, 2018

@leonardder: I'm really not too bothered, but I'd be fine with @jcsteh's final suggestion.

@leonardder

Collaborator

leonardder commented Sep 19, 2018

The eSpeak emoji dictionaries are now deleted, basically using a copy of #7810.

@leonardder

Collaborator

leonardder commented Sep 19, 2018

Python 3 unicodedata is only useful if it includes CLDR annotation data for multiple languages. I'm not sure that it does.

I can't find anything about this in the Python documentation, so I'm afraid that it does not.
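A quick check supports this. The standard library's unicodedata module exposes the Unicode Character Database, and its `name()` function returns the single English character name with no locale parameter. The helper name `describe` below is just for the example, and the taco lookup assumes a Python version whose Unicode database includes Unicode 8.0 or later.

```python
import unicodedata


def describe(char):
    """Return the Unicode character name, or a placeholder if unknown."""
    # unicodedata.name() accepts a character and an optional default only;
    # there is no argument for selecting a language.
    return unicodedata.name(char, "<no name in this Python's Unicode database>")


if __name__ == "__main__":
    print(describe("\U0001F32E"))  # taco emoji
    print(describe("\u00ae"))      # registered sign
```

These are formal character names rather than CLDR annotations, so they cannot replace the locale-specific descriptions this PR builds.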

@michaelDCurran michaelDCurran merged commit 21065fa into nvaccess:master Sep 25, 2018

1 check passed

continuous-integration/appveyor/pr AppVeyor build succeeded

@nvaccessAuto nvaccessAuto added this to the 2018.4 milestone Sep 25, 2018

@jage9

Contributor

jage9 commented Sep 25, 2018

This works great when reading text. It doesn't currently function when arrowing by character.

  1. Open Notepad
  2. Insert an emoji such as 🌮
  3. Arrow over the emoji by character.

You can also arrow over the taco emoji above to observe the same results. Using the up and down arrow keys reads it correctly.

I'm not sure if some emojis are represented by two characters, especially in browse mode. I can create a new issue for this if desired.
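On the "two characters" question, one possible explanation (an assumption, not something confirmed in this thread) is UTF-16: code points above U+FFFF, which include the taco emoji, are stored as a surrogate pair of two 16-bit code units. If character navigation steps by code unit, each half would be reported separately and neither half matches a dictionary entry. The helper `utf16_code_units` is a name made up for this sketch.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return [unit for (unit,) in struct.iter_unpack("<H", data)]


if __name__ == "__main__":
    taco = "\U0001F32E"
    units = utf16_code_units(taco)
    # One code point in Python 3, but two code units in UTF-16.
    print(len(taco), [hex(unit) for unit in units])
    # prints: 1 ['0xd83c', '0xdf2e']
```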

@leonardder

Collaborator

leonardder commented Sep 25, 2018

Feel free to create a new issue for this please.
