Use Unicode CLDR to create speech symbol dictionaries with emojis #8758
Link to issue number:
Summary of the issue:
NVDA has no built-in mechanism to read emoji descriptions. Currently, it relies on the dictionaries that ship with speech synthesizers, such as Windows OneCore and eSpeak. However, synthesizers like Vocalizer do not have this data and therefore can't speak emojis.
Description of how this pull request fixes the issue:
This PR includes the emoji descriptions from the Unicode Common Locale Data Repository (CLDR). The emoji descriptions are built with NVDA and added to locale-specific speech symbol dictionaries using SCons, making it easy to update the emoji sources whenever the CLDR is updated. For this, we use a GitHub repository hosted and maintained by @fujiwarat.
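As a rough illustration of the build step described above, CLDR annotation files (e.g. common/annotations/en.xml) contain per-emoji entries, where the `type="tts"` entries are the text-to-speech names. A minimal sketch of extracting those into a symbol mapping, with the function name and output format as assumptions rather than NVDA's actual code:

```python
# Hypothetical sketch: pull CLDR "tts" annotations (emoji names) out of a
# CLDR annotation XML file. The function name and the plain-dict output
# format are illustrative; NVDA's build emits speech symbol dictionaries.
import xml.etree.ElementTree as ET

def parse_cldr_annotations(xml_text):
    """Return a dict mapping emoji string -> spoken description."""
    root = ET.fromstring(xml_text)
    symbols = {}
    for node in root.iter("annotation"):
        # type="tts" entries are the text-to-speech names;
        # entries without a type are keyword lists, which we skip.
        if node.get("type") == "tts":
            symbols[node.get("cp")] = node.text.strip()
    return symbols

sample = """<ldml><annotations>
  <annotation cp="🌮">mexican | taco</annotation>
  <annotation cp="🌮" type="tts">taco</annotation>
</annotations></ldml>"""
print(parse_cldr_annotations(sample))  # {'🌮': 'taco'}
```

Because each locale has its own annotation file, the same step can be run per language to produce the locale-specific dictionaries.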
To NVDA's speech settings, I added the option "Include Unicode Consortium data when processing characters and symbols" which makes it easy to disable the inclusion of these databases.
I also had to add some functionality to the config manager to allow dumping the configuration as a Python dictionary (i.e. a deep copy, like configobj does). These changes include a new prevConf argument that is passed to config.post_configProfileSwitch, allowing handlers to compare the current configuration against the previous one and decide what to do. This is used to clear the caches of the character processing framework.
Change log entries
I took a quick look at the code. It looks good to me!
Hi, one thing to keep in mind: if we move to Python 3, unicodedata may provide an interesting solution, as Python 3.7 uses Unicode 11.0, which does include emoji characters. Thanks.

@michaelDCurran approved this pull request. Wonderful. And it's even working much more accurately than eSpeak's own support at the moment.
In sconstruct (#8758 (comment)):

@@ -174,7 +174,7 @@ env64['projectResFile'] = resFile
 #Fill sourceDir with anything provided for it by miscDeps
 env.recursiveCopy(sourceDir,Dir('miscdeps/source'))
-env.SConscript('source/comInterfaces_sconscript',exports=['env'])
+env.SConscript('source/comInterfaces_sconscript',exports=['env', 'sourceDir'])

Was this change meant to be here?
We should consider removing the eSpeak Emoji data (if that's possible) in favour of this solution, rather than doubling up on the data. Python 3 unicodedata is only useful if it includes CLDR annotation data for multiple languages. I'm not sure that it does.
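To illustrate the limitation mentioned above: Python 3's unicodedata does expose official Unicode character names, which cover emoji in recent Unicode versions, but those names are English-only codepoint names, not the localized CLDR annotations this PR uses:

```python
# unicodedata gives the official (English) Unicode name of a character.
# It carries no per-locale annotation data, so it can't replace the CLDR
# dictionaries for non-English NVDA locales.
import unicodedata

print(unicodedata.name("\U0001F32E"))  # TACO
print(unicodedata.name("\U0001F600"))  # GRINNING FACE
```

So unicodedata could at best serve as an English fallback, not a source for the locale-specific dictionaries.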
Sep 25, 2018
This works great when reading text. It doesn't currently function when arrowing by character.
You can also arrow over the taco emoji above to observe the same results. Using the up and down arrow keys reads it correctly.
I'm not sure whether this is because some emojis are represented by more than one character, especially in browse mode. I can create a new issue for this if desired.
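For context on the multi-character suspicion above: many emoji are indeed sequences of several code points (variation selectors, skin tone modifiers, zero-width-joiner sequences), so moving the caret by one "character" can land inside a sequence. A quick Python illustration:

```python
# One visible emoji glyph can span several code points, so per-character
# caret movement may read fragments of a sequence rather than the emoji.
woman_technologist = "\U0001F469\u200D\U0001F4BB"  # WOMAN + ZWJ + LAPTOP
print(len(woman_technologist))  # 3 code points, one visible glyph

thumbs_up_medium = "\U0001F44D\U0001F3FD"  # THUMBS UP + skin tone modifier
print(len(thumbs_up_medium))  # 2 code points, one visible glyph
```

Reading by grapheme cluster rather than by code point would be the usual fix, which may be worth its own issue.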