Use Unicode CLDR to create speech symbol dictionaries with emojis #8758
Conversation
@derekriemer: Also requested review from you because I recall you're pretty good with scons.
@jcsteh: I didn't request a review from you specifically; however, since the implementation is roughly based on your research and proposal, you might be interested in taking a look.
I took a quick look at the code. It looks good to me! 👍
That GUI option name really worries me, though. I understand the reasoning
behind it, but almost no user is going to understand what it means. I'd
almost rather call it "Speak Emoji" and just document that it also includes
some other symbols, even despite the fact that this is a bit misleading.
Do we know what the criteria are for inclusion in this annotations database?
For example, why trademark and not mathematical symbols? That might be
worth looking into.
Alternatively, we could filter out anything but Emoji when generating the
dictionary in scons (by testing for specific Unicode ranges). It does seem
sad to miss out on translated names for other symbols, though.
Finally, we could consider naming the option something like "Include
Unicode Consortium data (including Emoji) when processing characters and
symbols". That's ugly, but at least it gives the user some idea what it's
talking about.
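The "filter by specific Unicode ranges" idea above could be sketched roughly as follows. This is an illustrative sketch only: the range list is a small sample of well-known emoji blocks, not an exhaustive emoji definition, and the function names are made up for this example.

```python
# A few well-known emoji blocks (illustrative, not exhaustive).
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x2600, 0x26FF),    # Miscellaneous Symbols
]

def isEmoji(symbol):
    # True if every code point of the symbol falls inside one of the ranges.
    return all(
        any(low <= ord(ch) <= high for low, high in EMOJI_RANGES)
        for ch in symbol
    )

def filterAnnotations(annotations):
    # Keep only emoji entries from a {symbol: description} mapping.
    return {s: d for s, d in annotations.items() if isEmoji(s)}
```

With a filter like this, a trademark sign (U+2122) would be dropped while a taco emoji (U+1F32E) would be kept, at the cost of the translated names for non-emoji symbols mentioned above.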
Wonderful. It's even working much more accurately than eSpeak's own support at the moment.
sconstruct (Outdated)
@@ -174,7 +174,7 @@ env64['projectResFile'] = resFile
 #Fill sourceDir with anything provided for it by miscDeps
 env.recursiveCopy(sourceDir,Dir('miscdeps/source'))
 
-env.SConscript('source/comInterfaces_sconscript',exports=['env'])
+env.SConscript('source/comInterfaces_sconscript',exports=['env', 'sourceDir'])
Was this change meant to be here?
Hi, one thing to keep in mind: if we move to Python 3, unicodedata may provide an interesting solution, as Python 3.7 uses Unicode 11.0, which does include emoji characters. Thanks.
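For what it's worth, a quick check of what Python 3's unicodedata module offers: the formal (English) Unicode character name of an emoji is available, but nothing like CLDR's per-language annotations.

```python
import unicodedata

# The formal Unicode character name is available for emoji code points;
# this is the English standard name, not a translated CLDR annotation.
print(unicodedata.name("\U0001F32E"))  # TACO
```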
@michaelDCurran approved this pull request.
We should consider removing the eSpeak Emoji data (if that's possible) in
favour of this solution, rather than doubling up on the data.
Python 3 unicodedata is only useful if it includes CLDR annotation data for
multiple languages. I'm not sure that it does.
@michaelDCurran, what do you think about @jcsteh's concerns regarding the wording of the new GUI option?
@LeonarddeR: I'm really not too bothered, but I'd be fine with @jcsteh's final suggestion.
The eSpeak emoji dictionaries are now deleted, basically using a copy of #7810.
I can't find anything about this in the Python docs, so I'm afraid it does not.
This works great when reading text. It doesn't currently function when arrowing by character.
You can also arrow over the taco emoji above to observe the same results. Using the up and down arrow keys reads it correctly. I'm not sure whether some emojis are represented by two characters, especially in browse mode. I can create a new issue for this if desired.
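The "two characters" suspicion above likely comes down to UTF-16 surrogate pairs: emoji outside the Basic Multilingual Plane occupy two UTF-16 code units, which is what character-by-character navigation in a Windows text API may land on. A quick illustration:

```python
# Python 3 sees an astral-plane emoji as a single code point, but UTF-16
# based APIs (as used on Windows) see two code units (a surrogate pair).
taco = "\U0001F32E"
print(len(taco))                           # 1 code point
print(len(taco.encode("utf-16-le")) // 2)  # 2 UTF-16 code units
```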
Please feel free to create a new issue for this.
Is there an explanation of the rationale behind choosing some for the Level field rather than none? After all, if this can be toggled anyway, you'd have to know to change your punctuation level to some if you manually enabled it depending on the situation, e.g. through a configuration profile.
Yes. The some level is NVDA's default level, and we want emoji to be spoken at that level. On the other hand, emoji are symbols, and if a user chooses not to hear symbols at all, we want emoji to be left unspoken as well.
What would you suggest otherwise?
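The behaviour described above can be sketched in a few lines. The names here are made up for illustration, not NVDA's actual symbol-processing API: a symbol assigned the some level is spoken only when the user's punctuation level is some or higher.

```python
# Illustrative mapping of punctuation levels to ordered values.
LEVELS = {"none": 0, "some": 100, "most": 200, "all": 300}

def shouldSpeakSymbol(symbolLevel, userLevel):
    # A symbol is spoken when the user's level is at or above the
    # level assigned to the symbol; a userLevel of "none" silences it.
    return LEVELS[userLevel] >= LEVELS[symbolLevel]
```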
I'm not sure what the solution should be, though I understand the rationale now. It just didn't seem intuitive: if I had the box checked, I'd expect emojis to be read across the board regardless of punctuation level; if I didn't want them read, instead of changing punctuation to a lower level, I'd just uncheck the box and I'd be set.
Granted, I'm not exactly the common case here, as my punctuation level is at none for most scenarios, and the box stays unchecked until I encounter an emoji, which, depending on the synthesizer, may or may not be announced. If the OneCore voices are the default, though, then having the box unchecked will still have known emoji read regardless of punctuation level, so there's a consistency issue to consider.
This is certainly a valid point!
Looks like we have an unexpected error if you change the Unicode Consortium data setting via the Punctuation/symbol pronunciation dialog. Steps to reproduce:
Expected: the synthesizer processes the line as it did before the Unicode Consortium data functionality. Actual: an error noise and the synthesizer staying silent, with the following in the log:
Possible solutions:
I don't know how complicated any of the proposed solutions would be as I'm not a coder, but hopefully a solution can be devised before 2018.4 is released.
There is a PR for this: #8932.
Link to issue number:
Closes #6523
Summary of the issue:
NVDA has no built-in mechanism to read emoji descriptions. Currently, it relies on the dictionaries available for speech synthesizers, such as Windows OneCore and eSpeak. However, synthesizers like Vocalizer do not have this data and therefore can't speak emojis.
Description of how this pull request fixes the issue:
This PR includes the emoji descriptions from the Unicode Common Locale Data Repository (CLDR). The emoji descriptions are built and added to locale-specific speech symbol dictionaries with SCons when NVDA is built, making it very easy to update the emoji sources whenever the CLDR is updated. For this, we use a GitHub repository hosted and maintained by @fujiwarat.
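The core of such a build step can be sketched roughly as below. This is a simplified illustration, not the PR's actual SCons code: it reads a CLDR annotations XML file (where entries with type="tts" carry the name meant for text-to-speech) and emits lines loosely following NVDA's symbols.dic layout (symbol, replacement, level, preserve); treat the output details as illustrative.

```python
import xml.etree.ElementTree as ET

def cldrToSymbols(xmlSource):
    # Collect the speakable names from a CLDR annotations file: entries
    # with type="tts" hold the text-to-speech name for each emoji.
    lines = ["symbols:"]
    for annotation in ET.parse(xmlSource).getroot().iter("annotation"):
        if annotation.get("type") == "tts":
            # Hypothetical output line: symbol, description, level, preserve.
            lines.append("%s\t%s\tsome\tnever" % (annotation.get("cp"), annotation.text))
    return "\n".join(lines)
```

Running this over each locale's annotation file would yield one generated symbol dictionary per language, which is what makes updates as simple as refreshing the CLDR source data.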
To NVDA's speech settings, I added the option "Include Unicode Consortium data when processing characters and symbols" which makes it easy to disable the inclusion of these databases.
I also had to add some functionality to the config manager to allow making a dump of the configuration in Python dictionary format (i.e. a deep copy, similar to what configobj does). These changes include a new prevConf argument that is passed to config.post_configProfileSwitch, allowing handlers to compare the current configuration against the previous one and decide what to do. This is used to clear the caches of the character processing framework.
Testing performed
Known issues
Change log entries